Scraping Yelp Attractions Data
Project Title: Scraping Yelp verified Attractions Data
Objective: Retrieve the basic listing data from Yelp, for a verified subset of categories, for every city in every state in the US.
1. Go to http://www.yelp.com/locations/states
2. Click into a state (e.g. California)
a. This brings you to a page with a list of all cities in the state.
b. We would like to crawl each city listed
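The state and city pages are both lists of links, so one way to enumerate them is to collect every anchor on the page and resolve it against the page URL. The sketch below is a minimal illustration using only the Python standard library. The helper names (LinkCollector, extract_links) and the sample paths are hypothetical, and Yelp's real markup is not assumed here, so the caller would still need to keep only the links that point at states or cities.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (label, absolute_url) pairs for every <a href=...> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the label and make the URL absolute.
            label = " ".join("".join(self._text).split())
            self.links.append((label, urljoin(self.base_url, self._href)))
            self._href = None


def extract_links(html, base_url):
    """Return all (label, absolute_url) pairs found in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Calling extract_links on the HTML of a state page, with that page's URL as base_url, would give the candidate city links to visit in Step 3.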
3. Click into a city (e.g. Danville)
a. Once on this page, there is a heading that says “Best of Yelp”
4. Click on the button on the right side labeled “See More” (shown below)
a. This brings you to a page with a section with a rectangular block of pictures.
5. At the bottom right of this block of pictures, click a button labeled “More Restaurants”
a. This page should have a list of categories and filters, followed by a listing of businesses
b. On the top left there should be a breadcrumb feature that says “Businesses > Restaurants”
6. Click on the portion of the breadcrumb that says “Businesses”
a. The page should then refresh and the breadcrumb should disappear
b. Underneath the city name, there should be a list of categories
c. We would like to crawl only the categories “Active Life”, “Arts & Entertainment”, and “Hotels & Travel”
7. Click on one of the categories (only “Active Life”, “Arts & Entertainment”, or “Hotels & Travel”)
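The category restriction can be expressed as a small allow-list check. This is only a sketch: the function names are hypothetical, and the normalization (ignoring case and spacing around “&”) is an assumption made so that minor differences in how a category label is written do not cause it to be skipped.

```python
# The three categories named in this brief, stored in normalized form.
TARGET_CATEGORIES = {"active life", "arts & entertainment", "hotels & travel"}


def normalize_category(name):
    """Lower-case the label and normalize spacing, including around '&'."""
    padded = name.replace("&", " & ")
    return " ".join(padded.split()).lower()


def is_target_category(name):
    """True if the category label is one of the three we want to crawl."""
    return normalize_category(name) in TARGET_CATEGORIES
```

For example, is_target_category("Hotels &Travel") is true, while is_target_category("Restaurants") is false.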
a. The page should refresh again and the breadcrumb should reappear
b. Underneath this section, there are filters under the labels of “Sort By”, “Cities”, “Distance”, and “Features”
8. Under the “Distance” label, click “Bird’s-eye View”. This shows all listings in the city; otherwise the page may default to a 2-mile radius or another setting that shows only a subset of the city’s listings.
a. The page will refresh again with new listings
b. We want to crawl every listing on this page
i. Some listings will be located outside the city originally clicked on in Step 3. Compare the city of each attraction on the list with the name of the city being crawled, and exclude anything that is not in that city. Please do not include listings outside the original location, since they will be duplicates.
c. Note: The listings are paginated
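Step 8 combines two rules: walk every page of results, and keep only listings whose city matches the city being crawled. A minimal sketch of that loop follows. It assumes a hypothetical fetch_page(page_number) callable that returns a list of dictionaries with "city" and "url" keys and an empty list once the pages run out; none of these names come from Yelp. The URL-based duplicate check is an extra safeguard added here, not a stated requirement.

```python
def same_city(listing_city, target_city):
    """Compare city names, ignoring case and surrounding or repeated spaces."""
    clean = lambda s: " ".join(s.split()).casefold()
    return clean(listing_city) == clean(target_city)


def crawl_city_listings(fetch_page, target_city, max_pages=1000):
    """Yield in-city listings from every results page, skipping repeats.

    fetch_page is a hypothetical callable: page number -> list of dicts.
    max_pages is a safety cap so a misbehaving site cannot loop forever.
    """
    seen_urls = set()
    for page_number in range(max_pages):
        listings = fetch_page(page_number)
        if not listings:
            break  # an empty page is treated as the end of pagination
        for listing in listings:
            if not same_city(listing.get("city", ""), target_city):
                continue  # located outside the city being crawled
            url = listing.get("url")
            if url in seen_urls:
                continue  # already emitted from an earlier page
            seen_urls.add(url)
            yield listing
```

Passing a stub fetch_page that serves canned pages is an easy way to check the filtering before pointing the crawler at live pages.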
9. Click into the name of a listing
a. This is where the data will be pulled. On the next page, you will find a list of all the data that we want plus an annotated screenshot for guidance
b. Note: many attractions will not have all of the data available. Some display nothing but the name of the attraction.
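Because many listings will be missing fields, the output step should tolerate gaps rather than fail. The sketch below shows one way to do that when writing CSV. The field names in FIELDS are placeholders chosen for illustration, not the actual data list for this project, and the rule that only the name is required is an assumption based on the note above.

```python
import csv

# Placeholder field names; substitute the real list of data points.
FIELDS = ["name", "city", "address", "phone"]


def to_row(scraped):
    """Build a full output row, using "" for any field that was not found."""
    if not scraped.get("name"):
        raise ValueError("a listing must at least have a name")
    return {field: (scraped.get(field) or "") for field in FIELDS}


def write_csv(records, out):
    """Write the records to an open text file object as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(to_row(record))
```

A listing that shows only its name then produces a row with the name filled in and every other column left empty.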