23
Nov

Scraping and Mapping Product URL Lists

Posted By admin

Project Title: Scraping and Mapping Product URL Lists

Project Description:
In the URL Search tab, you will find a list of URLs for websites that are known or likely to sell the items in the Product Catalog tab.

Your task is to go through each URL and search for the products listed in the Product Catalog tab. This may involve searching by brand, brand variants, UPC, SKU, or partial product name.

If you cannot find the brand on the website, do a quick Google search for the website and the name of the brand to see if any results appear. Once you’ve found the products, record each URL in the Mappings tab, along with the relevant item name, SKU, UPC, and price (from the Product Catalog).

We’ve provided a sample of how it should look: we copied the product catalog over for each of the first three websites and entered one matching URL in the appropriate row for cosmotress.com.

Then go back to the URL Search tab, update the Status (there’s a drop-down menu of options), and enter the # of Products Matching and # of Products Not Matching for that website, based on your search.
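As a rough illustration of how the two counts relate to the Mappings tab, here is a minimal Python sketch. It assumes the Mappings rows have been exported as dictionaries with hypothetical keys `website` and `url`, and that a product counts as matching only when a URL was recorded for it. These names are placeholders, not the actual column headers of the spreadsheet.

```python
def tally_matches(mapping_rows):
    """Count matching and non-matching products per website.

    Assumption: each row is a dict with hypothetical keys "website" and
    "url"; a row with an empty or missing URL counts as not matching.
    Returns {website: (matching, not_matching)}.
    """
    counts = {}
    for row in mapping_rows:
        matching, not_matching = counts.get(row["website"], (0, 0))
        if (row.get("url") or "").strip():
            matching += 1
        else:
            not_matching += 1
        counts[row["website"]] = (matching, not_matching)
    return counts


# Placeholder rows for illustration only (not real catalog data).
rows = [
    {"website": "site-a.example", "url": "https://site-a.example/product-1"},
    {"website": "site-a.example", "url": ""},
    {"website": "site-b.example", "url": None},
]
print(tally_matches(rows))
```

The two numbers for each website can then be typed into the URL Search tab next to its Status.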

Mapping URLs – tips:
• If a website adds a variation or variant parameter to the end of the URL to identify a specific product on the page, be sure to include it, e.g. https://desertfamilypets.us/products/canidae-all-life-stages-formula-dry-dog-food?variant=16759366746179 or http://www.pawdpet.supplies/product/canidae-all-life-stages-chicken-turkey-lamb-fish-meals/?attribute_weight=15+lb

• If a website adds a variation or variant parameter to the end of the URL, make sure the URL for the default drop-down option also includes it. To get that variation, you may need to select another option first and then reselect the default option. e.g. Use https://desertfamilypets.us/products/canidae-all-life-stages-formula-dry-dog-food?variant=16759366647875 not https://desertfamilypets.us/products/canidae-all-life-stages-formula-dry-dog-food

• If a website lists multiple products on a page and does not use variations on the URL, add the product to the cart and then view the product to see if the URL changes for each variation. If so, record the URL with the variation.

• If you do add a product to the cart to get the variation, make sure not to include extra information on the URL. e.g. Use https://www.ebags.com/product/dakine/pickup-pad/362936?productid=10687931 not https://www.ebags.com/product/dakine/pickup-pad/362936?productid=10687931&cartItemId=2

• In general, make sure not to include any extra information on the URL. e.g. Use https://gravitycoalition.com/products/dakine-hot-laps-5l-bike-waist-bag?variant=22135035559984 not https://gravitycoalition.com/products/dakine-hot-laps-5l-bike-waist-bag?_pos=1&_sid=afbab28ac&_ss=r&variant=22135035559984. Another example is when a website adds ?search= to the URL after you search for something. As long as the product page works without the extra information (and shows the correct variation when possible), extra parameters on the URL should be removed.

• Only map a website if it sells in USD.

• Only map a website if it lists a selling price. If a website lists only an MSRP, do not map it. If a website requires a login to view the price, do not map it. There are a few exceptions where we’ll ask you to get a fake account with that website, and we’ll tell you what those are if they come up. One common example that requires a login and that we would like mapped is Rakuten.com, but we only want products sold by Rakuten.com itself mapped (not the ones they list from other websites).

• Please do a thorough quality check before completing the mapping task, making sure URLs and formatting are accurate.

• Make sure any leading zeros (in the “UPC” column) are still in the Mapping tab after being copied over from the Product Catalog tab.
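The URL clean-up tips above can be sketched in Python as follows. This is only an illustration under stated assumptions: the allowlist of variation parameters (`variant`, `productid`, and anything starting with `attribute_`) is inferred from the example URLs in the tips and is not a complete list, and the helper names are hypothetical. The code cannot confirm that the page still loads the correct variation, so each cleaned URL should still be opened and checked by hand.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters that identify a specific product variation.
# Inferred from the example URLs in the tips; extend per website as needed.
VARIANT_PARAMS = {"variant", "productid"}
VARIANT_PREFIXES = ("attribute_",)


def is_variant_param(name: str) -> bool:
    """Return True if the parameter name looks like a variation identifier."""
    lowered = name.lower()
    return lowered in VARIANT_PARAMS or lowered.startswith(VARIANT_PREFIXES)


def clean_product_url(url: str) -> str:
    """Drop every query parameter except variation identifiers.

    Search, tracking, and cart parameters (e.g. _pos, _sid, cartItemId)
    are removed, and any #fragment is dropped as well.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if is_variant_param(key)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(clean_product_url(
    "https://www.ebags.com/product/dakine/pickup-pad/362936"
    "?productid=10687931&cartItemId=2"
))
# prints https://www.ebags.com/product/dakine/pickup-pad/362936?productid=10687931
```

An allowlist is used here rather than a blocklist because the set of extra parameters a website might add is open-ended, while the variation parameter for a given website is usually a single known name.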

When done, please send the completed Excel file back to us, along with what you would have charged us for this project and what you will charge per URL discovered should we choose to hire you.

For similar work requirements, feel free to email us at info@webscrapingexpert.com.
