Scraping Data from Walmart
I need to scrape specific pages on Walmart.com:
Search results pages for specific search terms.
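As a rough sketch of how the list of search results pages could be generated, the snippet below builds URLs in the same shape as the example job URL further down. The function names and the 1-based `page` parameter are assumptions made for illustration.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.walmart.com/search"


def build_search_url(term: str, page: int = 1) -> str:
    """Return the search results URL for one search term and page number."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # urlencode escapes the term and turns spaces into '+'.
    return f"{SEARCH_BASE}?{urlencode({'q': term, 'page': page})}"


def build_search_urls(terms, pages_per_term: int):
    """Return one URL per (term, page) pair, pages numbered from 1."""
    return [
        build_search_url(term, page)
        for term in terms
        for page in range(1, pages_per_term + 1)
    ]


print(build_search_url("garlic press", 3))
# https://www.walmart.com/search?q=garlic+press&page=3
```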
Reliable: Walmart uses several anti-bot measures to make scraping difficult. The system needs to avoid, detect and get around these.
The system needs to detect issues and act accordingly, without getting stuck in infinite retry loops.
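One possible way to avoid infinite retry loops is to cap the number of attempts and to separate problems worth retrying from problems that retrying cannot fix. The sketch below assumes a hypothetical `fetch(url)` callable supplied by the scraper; the exception classes, the `FetchResult` type and the default limits are illustrative, not part of the brief.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class RetryableError(Exception):
    """Transient problem (timeout, dropped connection): worth another attempt."""


class FatalError(Exception):
    """Problem that retrying will not fix (for example, an unexpected page layout)."""


@dataclass
class FetchResult:
    html: Optional[str]
    attempts: int
    error: Optional[str]


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> FetchResult:
    """Call fetch(url) at most max_attempts times, with exponential backoff."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return FetchResult(fetch(url), attempt, None)
        except FatalError as exc:
            # Give up immediately: another attempt would fail the same way.
            return FetchResult(None, attempt, f"fatal: {exc}")
        except RetryableError as exc:
            last_error = f"retryable: {exc}"
            if attempt < max_attempts:
                # Wait 1x, 2x, 4x, ... base_delay between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    # The attempt budget is used up; report the failure instead of looping on.
    return FetchResult(None, max_attempts, last_error)
```

The caller always gets a result back after a bounded number of attempts, so a job can be marked as failed and reported rather than retried forever.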
The system needs to validate the HTML for expected results and notify the developer when a validation fails, including the data needed to investigate the issue (which validation failed, why it failed, the HTML source code, etc.).
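A minimal sketch of such a validation step is shown below. It records which check failed and why, and keeps the HTML source so it can be attached to the notification. The marker strings are passed in by the caller because the real markers on Walmart pages are not specified here; the check names, the `min_length` threshold and the report classes are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationFailure:
    check: str   # which validation failed
    reason: str  # why it failed


@dataclass
class ValidationReport:
    url: str
    html: str  # kept so the developer can inspect the exact source
    failures: List[ValidationFailure] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures


def validate_search_html(url, html, required_markers, block_markers, min_length=1000):
    """Run simple string-based checks against a search results page."""
    report = ValidationReport(url, html)
    if len(html) < min_length:
        report.failures.append(ValidationFailure(
            "min_length",
            f"HTML is {len(html)} characters, expected at least {min_length}"))
    lowered = html.lower()
    for marker in block_markers:
        if marker.lower() in lowered:
            report.failures.append(ValidationFailure(
                "block_page", f"found block-page marker {marker!r}"))
    for marker in required_markers:
        if marker.lower() not in lowered:
            report.failures.append(ValidationFailure(
                "required_marker", f"missing expected marker {marker!r}"))
    return report


def format_notification(report: ValidationReport) -> str:
    """Build the message body to send to the developer."""
    lines = [f"Validation failed for {report.url}"]
    lines += [f"- {f.check}: {f.reason}" for f in report.failures]
    lines.append("--- HTML source ---")
    lines.append(report.html)
    return "\n".join(lines)
```

How the message is delivered (email, chat webhook, log file) is left open; the point is that every failure carries the check name, the reason and the page source.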
The system needs to account for known or recurring issues. You don’t have to use Selenium, but as an example: Selenium sometimes loses its connection with the browser process; the system needs to detect this, kill the old process and start a new browser process.
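The restart behaviour described above could be sketched as a small supervisor that owns the browser object. It is written against generic `start` and `stop` callables so it does not depend on any particular tool; with Selenium, `start` might create a `webdriver.Chrome()`, `stop` might call `driver.quit()`, and `connection_errors` might be `(WebDriverException,)`. The class name and the `max_restarts` default are assumptions.

```python
from typing import Any, Callable, Tuple, Type


class BrowserSupervisor:
    """Run actions against a browser, replacing it when the connection is lost."""

    def __init__(
        self,
        start: Callable[[], Any],
        stop: Callable[[Any], None],
        connection_errors: Tuple[Type[BaseException], ...],
        max_restarts: int = 2,
    ):
        self._start = start
        self._stop = stop
        self._errors = connection_errors
        self._max_restarts = max_restarts
        self._browser = None
        self.restarts = 0  # total restarts, useful for monitoring

    def _ensure_browser(self):
        if self._browser is None:
            self._browser = self._start()
        return self._browser

    def _discard_browser(self):
        browser, self._browser = self._browser, None
        if browser is not None:
            try:
                self._stop(browser)
            except Exception:
                pass  # the process may already be dead; nothing left to clean up

    def run(self, action: Callable[[Any], Any]) -> Any:
        """Call action(browser); on a connection error, restart and try again."""
        restarts_left = self._max_restarts
        while True:
            browser = self._ensure_browser()
            try:
                return action(browser)
            except self._errors:
                # Kill the old process so it does not linger in the background.
                self._discard_browser()
                if restarts_left == 0:
                    raise  # bounded: surface the error instead of looping
                restarts_left -= 1
                self.restarts += 1
```

Because the number of restarts per action is capped, a browser that can never be started cleanly ends in an exception the caller can log and report.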
I’d prefer Python, but if you have a ready solution in another language, that’s OK as long as the system is reliable.
Example document format in collection scraping_jobs:
{
    'url': 'https://www.walmart.com/search?q=garlic+press&page=3',
    'status': 'pending',
    'scraper_timetime_started': None,
    'scraper_retries': 0,
    'scraping_issues': False,
    'html_path': None,
}
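To illustrate how a job document might move through its lifecycle, the sketch below uses plain dictionaries with the field names copied verbatim from the example. Only the 'pending' status comes from the example; the 'in_progress', 'done' and 'failed' statuses, the retry limit and the function names are assumptions. In a MongoDB collection the same changes would typically be applied as updates to the stored document.

```python
from datetime import datetime, timezone

MAX_RETRIES = 3  # assumed limit; not specified in the brief


def new_job(url: str) -> dict:
    """Build a fresh job document in the format of the example."""
    return {
        "url": url,
        "status": "pending",
        "scraper_timetime_started": None,  # field name as in the example
        "scraper_retries": 0,
        "scraping_issues": False,
        "html_path": None,
    }


def mark_started(job: dict, now: datetime = None) -> dict:
    """Return a copy of the job marked as being scraped."""
    job = dict(job)
    job["status"] = "in_progress"
    job["scraper_timetime_started"] = now or datetime.now(timezone.utc)
    return job


def mark_succeeded(job: dict, html_path: str) -> dict:
    """Return a copy of the job marked as done, pointing at the saved HTML."""
    job = dict(job)
    job["status"] = "done"
    job["html_path"] = html_path
    return job


def mark_failed(job: dict, max_retries: int = MAX_RETRIES) -> dict:
    """Return a copy with the retry counted; give up once the limit is reached."""
    job = dict(job)
    job["scraper_retries"] += 1
    job["scraping_issues"] = True
    # Back to 'pending' for another attempt, or 'failed' when out of retries.
    job["status"] = "failed" if job["scraper_retries"] >= max_retries else "pending"
    return job
```

Keeping the retry count on the document itself means the limit still holds if the scraper process is restarted between attempts.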