08 Oct

Scrape hotel listings, reviews, location, property information, user data

Posted By admin
Data Mining and Web Research Services

Project Title: Scrape Hotel Listings, Reviews, Location, Property Information, User Data

Project Description:
1. Context
– We are looking for a partner to whom we can outsource our website crawling / scraping / data extraction activities.
– We require an SLA of 99%. Quality and accuracy are of utmost importance.
– The number of websites to crawl will increase over time and exceed 100.
– Industry: Hotels.
– Type of data: (1) hotel listings, (2) hotel property information, (3) hotel location data, (4) travel reviews, (5) user data.
– Languages: All.
– Period: historical data (back to a specified cut-off date) plus incremental data on a monthly basis.
– Coverage:
– Short-term: list of cities to be provided by Travelsify.
– Long-term: Worldwide.
– Additional challenge beyond standard scraping: all scraped data must be normalized per hotel property, which means hotel properties must be matched across websites:
– Example: Hotel Property #000001 is composed of:
– https://www.expedia.com/Barcelona-Hotels-W-Barcelona.h2578680.Hotel-Information
– http://www.tripexpert.com/barcelona/hotels/w-barcelona
– https://www.tripadvisor.com/Hotel_Review-g187497-d1465497-Reviews-W_Barcelona-Barcelona_Catalonia.html
– http://www.booking.com/hotel/es/w-barcelona.en-gb.html
– http://www.telegraph.co.uk/travel/destinations/europe/spain/catalonia/barcelona/hotels/w-barcelona-hotel/
– Pilot: before entering into any commercial collaboration, we would like to run a test on the following cities to validate accuracy, performance, and way of working:
a. Paris (city and surroundings), France
b. Barcelona (downtown city), Spain
c. Boston (downtown city), MA-USA
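The cross-site matching requirement above (one property ID grouping the Expedia, TripExpert, TripAdvisor, Booking and Telegraph pages of W Barcelona) can be approached in many ways. The sketch below is one simple, assumed approach, not a method stated in the brief: two listings are treated as the same property when their normalized names are similar (via Python's `difflib.SequenceMatcher`) and either their coordinates lie within a distance threshold or, when coordinates are missing, their postal codes agree. The `Listing` type, the noise-word list, the 0.85 similarity threshold and the 150 m radius are all hypothetical choices that would need tuning against real pilot data.

```python
import math
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

@dataclass
class Listing:
    site: str
    name: str
    postal_code: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None

# Hypothetical noise words ignored when comparing hotel names.
NOISE_WORDS = {"hotel", "the"}

def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, drop noise words."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.findall(r"[a-z0-9]+", ascii_name.lower())
    return " ".join(t for t in tokens if t not in NOISE_WORDS)

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def same_property(a: Listing, b: Listing,
                  name_threshold: float = 0.85, max_distance_m: float = 150.0) -> bool:
    similarity = SequenceMatcher(None, normalize_name(a.name), normalize_name(b.name)).ratio()
    if similarity < name_threshold:
        return False
    if None not in (a.lat, a.lon, b.lat, b.lon):
        return haversine_m(a.lat, a.lon, b.lat, b.lon) <= max_distance_m
    if a.postal_code and b.postal_code:
        return a.postal_code.replace(" ", "") == b.postal_code.replace(" ", "")
    return False  # not enough evidence: leave the pair for manual review

def assign_property_ids(listings: list[Listing]) -> dict[str, list[Listing]]:
    """Greedy single-link clustering; IDs mimic the '#000001' style of the brief."""
    clusters: list[list[Listing]] = []
    for item in listings:
        for cluster in clusters:
            if any(same_property(item, other) for other in cluster):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return {f"{i + 1:06d}": cluster for i, cluster in enumerate(clusters)}
```

Because the clustering is greedy and single-link, its result can depend on input order, and one bad pairwise match can merge two properties; a production pipeline would likely add a manual review queue for borderline pairs.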

2. Websites
– For the purpose of the pilot, we are looking to scrape the following 5 websites:
a. Booking: https://www.booking.com
b. TripAdvisor: https://www.tripadvisor.com
c. Expedia: https://www.expedia.com
d. TripExpert: http://www.tripexpert.com
e. Telegraph Travel: http://www.telegraph.co.uk/travel/
– Should the pilot on the 3 cities mentioned above be conclusive, the list of websites to crawl/scrape will be extended to more than 100.

3. Hotel Property Listings
– For each of the websites to scrape, the list of hotel properties per city needs to be retrieved.
– The hotel property listings should exclude apartments, bed and breakfasts, boats, guesthouses, hostels, holiday homes, and love hotels.
– Hotels rated from 0 to 5 stars are in scope.
– Additional work: matching hotel properties across websites (see section 1).
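The scope rules above (excluded property types, 0 to 5 stars) can be expressed as a small filter. This is a sketch under assumptions: it presumes each scraped listing exposes a raw property-type label and an optional star rating, the `ALIASES` table is a hypothetical example of how differing site labels might be mapped onto the brief's categories, and treating unrated properties as 0-star hotels is an assumption, not something the brief states.

```python
from typing import Optional

# Property types excluded by the brief, in normalized form.
EXCLUDED_TYPES = {
    "apartment", "bed and breakfast", "boat", "guesthouse",
    "hostel", "holiday home", "love hotel",
}

# Hypothetical aliases: each site labels property types in its own way.
ALIASES = {
    "apartments": "apartment",
    "b&b": "bed and breakfast",
    "guest house": "guesthouse",
    "holiday homes": "holiday home",
    "vacation home": "holiday home",
}

def normalize_type(raw: str) -> str:
    """Lowercase, collapse whitespace, then map known aliases."""
    label = " ".join(raw.lower().split())
    return ALIASES.get(label, label)

def keep_listing(raw_type: str, stars: Optional[float]) -> bool:
    """True if a listing is an in-scope hotel according to section 3."""
    if normalize_type(raw_type) in EXCLUDED_TYPES:
        return False
    # Assumption: an unrated property counts as a 0-star hotel.
    rating = 0.0 if stars is None else stars
    return 0.0 <= rating <= 5.0
```

Keeping the alias table separate from the exclusion list means new site labels can be added per website without touching the scope rules themselves.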

4. Hotel Location Data
– For each of the websites to scrape, the following location data shall be retrieved:
a. Hotel name
b. Hotel page url from the website, e.g. http://www.booking.com/hotel/es/rivoliramblas.engb.html
c. Hotel street
d. Hotel street number
e. Hotel city
f. Hotel postal code
g. Hotel country
h. Hotel longitude (only for Expedia, TripAdvisor, Booking)
i. Hotel latitude (only for Expedia, TripAdvisor, Booking)
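One way to hold the location fields listed above is a flat record per site listing, with a validation pass that reflects the rule that coordinates are expected only from Expedia, TripAdvisor and Booking. The `HotelLocation` class, the lowercase site keys and the chosen checks are illustrative assumptions rather than a required format.

```python
from dataclasses import dataclass
from typing import Optional

# Sites for which the brief expects coordinates (items h and i).
COORD_SITES = {"expedia", "tripadvisor", "booking"}

@dataclass
class HotelLocation:
    site: str
    name: str
    url: str
    street: str
    street_number: str  # text, since values such as "19-21" occur
    city: str
    postal_code: str    # text, so leading zeros survive
    country: str
    longitude: Optional[float] = None
    latitude: Optional[float] = None

def validate(loc: HotelLocation) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    has_coords = loc.latitude is not None and loc.longitude is not None
    if loc.site in COORD_SITES and not has_coords:
        problems.append("missing coordinates")
    if has_coords and not (-90 <= loc.latitude <= 90 and -180 <= loc.longitude <= 180):
        problems.append("coordinates out of range")
    for field_name in ("name", "url", "city", "country"):
        if not getattr(loc, field_name).strip():
            problems.append(f"empty {field_name}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report data-quality figures against the 99% SLA.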

5. Hotel Property Information
– For each of the websites to scrape, the following hotel property data shall be retrieved where available:
a. Hotel name
b. Hotel other names, e.g. Expedia indicates whether the hotel has other names and which ones: https://www.expedia.com/Barcelona-Hotels-W-Barcelona.h2578680.Hotel-Information (the other names for W Barcelona are: Barcelona W, W Hotel, W Barcelona Catalonia, W Barcelona, W Hotel Barcelona, w Barcelona Hotel Barcelona)
c. Hotel ID
d. Full hotel description from the website, in all available languages
e. Hotel description as provided by the hotel (“official description” on TripAdvisor, “an inside look at…” on Booking.com)
f. Number of rooms
g. Hotel chain name
h. List of all hotel amenities/facilities (only in English)
i. List of all hotel policies (only in English), including the hotel price range shown on TripAdvisor
Strictly confidential, April 23, 2016

6. Travel reviews
– For each of the websites to scrape, the following travel review data shall be retrieved where available. Note that travel reviews can come from users (TripAdvisor, Booking, Expedia) but also from travel experts/journalists (TripExpert, Telegraph Travel).
a. Overall review score
b. Review score per category (cleanliness, location, staff, comfort, free wi-fi, value for money, etc.)
c. All reviews in all languages posted since November 2014. Please note that translations of reviews are not accepted; only the original review text should be collected.
d. Date of the travel review
e. Language of the travel review
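The review rules above (everything since November 2014, original language only) combine with the monthly incremental delivery from section 1 into a simple date-window filter. In this sketch the `Review` fields and especially the `is_translation` flag are hypothetical: they assume the scraper can tell when a site is displaying a translated review instead of the original.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional

CUTOFF = date(2014, 11, 1)  # "since November 2014"

@dataclass
class Review:
    site: str
    review_date: date
    language: str                   # e.g. an ISO 639-1 code such as "en" or "fr"
    text: str
    overall_score: Optional[float] = None
    is_translation: bool = False    # hypothetical flag set by the scraper

def eligible_reviews(reviews: Iterable[Review], since: date = CUTOFF,
                     until: Optional[date] = None) -> Iterator[Review]:
    """Yield original-language reviews dated in the half-open window [since, until)."""
    for review in reviews:
        if review.is_translation:
            continue  # translated reviews are not accepted
        if review.review_date < since:
            continue
        if until is not None and review.review_date >= until:
            continue
        yield review
```

For a monthly increment, the same function can be called with a one-month window, for example `eligible_reviews(batch, since=date(2016, 4, 1), until=date(2016, 5, 1))`; the half-open window keeps consecutive months from overlapping.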

7. User Data
– For each travel review, the following user data shall be retrieved where available:
a. User name
b. User gender (male/female)
c. User age/age range
d. User city
e. User country
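Because user fields are optional and sites present them differently, normalizing them on ingestion helps keep the data consistent. The sketch below assumes, without confirmation from the brief, that gender appears as a plain "male"/"female" label and that age appears as an exact age ("41"), a range ("25-34") or an open range ("65+"); the `ReviewUser` type and the parsing helpers are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# (lower, upper) bounds; upper is None for open-ended ranges such as "65+".
AgeRange = Tuple[int, Optional[int]]

@dataclass
class ReviewUser:
    name: Optional[str] = None
    gender: Optional[str] = None          # "male", "female", or None if not shown
    age_range: Optional[AgeRange] = None
    city: Optional[str] = None
    country: Optional[str] = None

def parse_gender(raw: Optional[str]) -> Optional[str]:
    if raw is None:
        return None
    value = raw.strip().lower()
    return value if value in {"male", "female"} else None

def parse_age(raw: Optional[str]) -> Optional[AgeRange]:
    """Accept an exact age ("41"), a range ("25-34") or an open range ("65+")."""
    if raw is None:
        return None
    text = raw.strip()
    if m := re.fullmatch(r"(\d+)\s*[-\u2013]\s*(\d+)", text):
        return int(m.group(1)), int(m.group(2))
    if m := re.fullmatch(r"(\d+)\s*\+", text):
        return int(m.group(1)), None
    if text.isdecimal():
        return int(text), int(text)
    return None
```

Storing an exact age as a degenerate range such as `(41, 41)` lets exact ages and age ranges share one field.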

For similar work requirements, feel free to email us at info@webscrapingexpert.com.

Comments
  • 7 years ago Maciej Stopierzyński

    Hello, we want to get hotel ratings from Trivago or TripAdvisor. Can you provide us with this data?

  • 7 years ago Jens Hoffmann

    Do you offer scraping software or service?

    I need contact info, emails and maybe websites for hotels, resorts or vacation villas and travel agents by locations or countries worldwide.

    How much does it cost?

  • 7 years ago Jim Noone

    Looking for a quote for getting owner email addresses for all hotels, motels, and bed & breakfast businesses in Pennsylvania and New Jersey, US.

  • 6 years ago Nancy R.

    Can you scrape New York hotels from Agoda.com?

    Please give a quote.

  • 6 years ago ELEANOR HUNT

    How much would it cost to get data scraped from Booking.com for a USA hotels database?

  • 6 years ago J LAWSON

    Are you able to work with us on the following project?

    I would like to know the price for scraping hotel reviews from the following travel directory:
    http://www.trivago.com

