Scraper / Spider / Bot to Scrape Amazon Product Prices
We are looking for a scraper / spider / bot to scan Amazon product prices and store them in our own database.
The Amazon API allows us to check and update product prices, but for the volume and frequency we need, the API is unreliable and restrictive.
We have a database of Amazon products and ASIN numbers (the unique product numbers for items on Amazon) that we need to check every hour, 24 hours a day, 7 days a week. The current database stands at around 25,000 ASIN numbers, although this is likely to double in size in the next week, so we need to work on the assumption that we have 50,000 products that need checking.
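To put the volume above in concrete terms, the following is a rough back-of-the-envelope sketch in Python. The function names and the per-worker rate of 2 checks per second are illustrative assumptions, not measured figures from this project.

```python
import math

def required_rate(products: int, window_seconds: int = 3600) -> float:
    """Average checks per second needed to cover every product once per window."""
    return products / window_seconds

def workers_needed(products: int, per_worker_rate: float,
                   window_seconds: int = 3600) -> int:
    """Number of parallel workers needed, given an ASSUMED sustained
    per-worker rate (checks per second)."""
    return math.ceil(required_rate(products, window_seconds) / per_worker_rate)

print(round(required_rate(25_000), 2))   # current list: 6.94 checks/second
print(round(required_rate(50_000), 2))   # planned list: 13.89 checks/second
print(workers_needed(50_000, per_worker_rate=2.0))  # 7 workers at the assumed rate
```

The same two functions can be re-run with a larger product count to estimate how many workers a future list size would need.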
Our database stores the ASIN number, the product price and the stock status of the item (available / not available).
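As an illustration of the three stored fields, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, the choice to store prices as integer minor units (pence / cents), the added timestamp column, and the placeholder ASIN are all assumptions made for the example; our actual database may differ.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per ASIN, overwritten on every hourly check.
SCHEMA = """
CREATE TABLE IF NOT EXISTS product_status (
    asin        TEXT PRIMARY KEY,
    price_minor INTEGER,           -- price in minor currency units; NULL if unknown
    in_stock    INTEGER NOT NULL,  -- 1 = available, 0 = not available
    checked_at  TEXT NOT NULL      -- UTC timestamp of the last check (assumed extra field)
)
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def save_result(conn: sqlite3.Connection, asin: str,
                price_minor, in_stock: bool) -> None:
    """Insert the latest result for an ASIN, replacing any earlier row."""
    checked_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO product_status "
            "(asin, price_minor, in_stock, checked_at) VALUES (?, ?, ?, ?)",
            (asin, price_minor, int(in_stock), checked_at),
        )

conn = open_db()
save_result(conn, "B000000001", 1999, True)    # placeholder ASIN, not a real product
save_result(conn, "B000000001", 1899, False)   # a later check overwrites the first
rows = conn.execute(
    "SELECT asin, price_minor, in_stock FROM product_status").fetchall()
print(rows)
```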
We wish to deploy a scraper / spider / bot – or multiple scrapers / spiders / bots to check this list of 50,000 ASINs every single hour, returning the price and stock status to our database, where we can update / adjust price / stock status as required.
We need to have this data returned every hour, so we are looking for a tool that is:
1) Fast – needing to scan up to 50,000 products per hour, or multiple bots / spiders that each scan a portion per hour.
2) Reliable – we need a safeguard in place in case Amazon blocks it or the spider / bot fails. We would need to be alerted (email / text) if it broke, stopped or was unable to access the site, so that we could deploy / rename / use a new IP for another bot / spider to begin again.
3) Automated – this will need to run every hour, 24 hours a day, 7 days a week without intervention, so it would need to run automatically without fail.
4) Hidden / Cloaked – we don’t want the spider / bot to be blocked if Amazon detects it and objects to it spidering the site. We therefore need a system in which the spider / bot is launched from varying IPs or hidden services hourly / daily (depending on what you think), or a spider / bot that renames itself each time it is launched, hourly / daily, to avoid detection. This may turn out not to be an issue and the tool may run undetected without problems; however, it is something to consider in case it does occur.
5) Expandable – we may add more products to our database, so we need to be able to increase our scans from 50,000 to any higher number. We would therefore need to know the limitations of the bot / spider so that we can plan future growth accordingly.
6) Run remotely – we do not want to run it from our own servers or have it directly on our own IPs. Can we therefore run this from a remote service, or from randomly purchased domain and hosting packages on different IPs across the world?
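Two of the requirements above, splitting the list across multiple bots / spiders and raising an alert when one stops working, can be sketched as follows. This is only an outline in Python: the function names, the round-robin split and the 5% failure threshold are assumptions for illustration, and the actual email / text delivery is left out.

```python
from typing import List, Sequence

def shard(asins: Sequence[str], n_workers: int) -> List[List[str]]:
    """Split the ASIN list round-robin so each worker gets a near-equal share."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [list(asins[i::n_workers]) for i in range(n_workers)]

def should_alert(attempted: int, failed: int,
                 max_failure_ratio: float = 0.05) -> bool:
    """Decide whether a worker's hourly run warrants an alert.

    A worker that attempted nothing is treated as stalled; otherwise an alert
    is raised when the failure ratio exceeds the ASSUMED threshold.
    """
    if attempted == 0:
        return True
    return failed / attempted > max_failure_ratio

# Placeholder identifiers, not real ASINs.
asins = [f"ASIN{i:05d}" for i in range(10)]
shards = shard(asins, 3)
print([len(s) for s in shards])      # sizes of the three shares

print(should_alert(1000, 10))        # 1% failures: below the threshold
print(should_alert(1000, 100))       # 10% failures: above the threshold
print(should_alert(0, 0))            # nothing attempted: treated as stalled
```

Because the split depends only on the list and the worker count, growing from 50,000 products to a larger list means re-running shard with more workers rather than changing the workers themselves.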
I look forward to your thoughts on this and a quotation on the tool if you believe you are able to create it.
For similar work requirements, feel free to email us at email@example.com