Scraping Data from Yelp.com
Project Title: Scraping Data from Yelp.com
On a monthly basis, scrape all records from www.yelp.com across the US for the 7 categories mentioned below.
Arts & Entertainment (arts)
Beauty & Spas (beauty)
Hotels & Travel (hotel)
I have also attached a file with sample records in the output format. The sample file shows the variable information and the expected output layout. Note that the layout and the variables remain the same across all the categories.
Note that there should not be any duplicate records within a specific category; however, there can be duplicate records/businesses across different categories, i.e., a business can be captured in both the restaurant and the nightlife category, but the same business should not appear twice in the restaurant category. The de-dupe logic has to be based on the combination of category name + reference URL so as to get the maximum number of unique records within each category.
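The de-dupe rule above can be sketched roughly as follows. This is only an illustration, not a required implementation: the field names `category` and `reference_url` are assumptions (the actual variable names are in the attached sample file), and trimming whitespace and lowercasing the category name before comparison is also an assumption.

```python
def dedupe_records(records):
    """Keep the first record seen for each (category, reference URL) pair."""
    seen = set()
    unique = []
    for rec in records:
        # Hypothetical field names; replace with those from the sample file.
        key = (rec["category"].strip().lower(), rec["reference_url"].strip())
        if key in seen:
            continue  # duplicate within the same category: drop it
        seen.add(key)
        unique.append(rec)
    return unique


# Made-up sample data: the same business in two categories, plus one repeat.
records = [
    {"category": "restaurant", "reference_url": "https://www.yelp.com/biz/example-a"},
    {"category": "nightlife", "reference_url": "https://www.yelp.com/biz/example-a"},
    {"category": "restaurant", "reference_url": "https://www.yelp.com/biz/example-a"},
]
unique = dedupe_records(records)
```

Here the business is kept once under restaurant and once under nightlife, while the second restaurant entry is dropped.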
Based on our experience, we expect a total of approx. 3.5 to 4 million records for all the 7 categories combined.
Please evaluate the requirements and share your PoC work within the next few days at the earliest, after which we can look to partner with you on this project.
Do let us know if you have any questions, or we can connect on Zoom for a short call.
For similar work requirements, feel free to email us at email@example.com.