European Hotel Review Analysis
Project Overview
Sally wants to go on vacation. She is deciding where to go and which hotel to stay at. What are her options?
- Choose randomly
- Choose based on experience or recommendation of friends and family
- Choose based on hours of research
To help her decide, this project sets out to:
- Create a database and host it on AWS.
- Run various machine learning models to predict review scores and compare which model is most accurate.
- Create a fully functional, interactive dashboard using Tableau.
- Create and host a web application on GitHub to showcase the results.
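The model-comparison step can be sketched in a few lines. This is only an illustration under stated assumptions: the two models (a mean baseline and a one-feature least-squares line), the made-up numbers, and the use of mean absolute error as the comparison metric are all stand-ins chosen here, not the models or metric the project actually uses.

```python
from statistics import mean

def fit_mean_baseline(xs, ys):
    """Model A: ignore the feature and always predict the training mean."""
    avg = mean(ys)
    return lambda x: avg

def fit_linear(xs, ys):
    """Model B: one-feature least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predicted and actual scores."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

# Made-up numbers: x = a hotel's average score, y = one reviewer's score.
train_x = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
train_y = [6.5, 7.9, 7.6, 8.8, 9.2, 9.4]
test_x = [7.2, 8.2, 9.2]
test_y = [7.0, 8.4, 9.0]

# Fit every candidate on the training split, score it on the held-out split.
models = {"mean baseline": fit_mean_baseline, "linear regression": fit_linear}
errors = {
    name: mean_absolute_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in models.items()
}
best = min(errors, key=errors.get)
print(errors, "best:", best)
```

The point of the sketch is the procedure: every candidate is trained on the same data and scored on the same held-out data, so the errors are directly comparable.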
Data Source Description
This dataset contains 515,000 customer reviews and scores for 1,493 luxury hotels across Europe. The geographical location of each hotel is also provided. The data was scraped from Booking.com, which originally owns it. Everything in the file was already publicly available.
Data content
The CSV file contains 17 fields, described below:
- Hotel_Address: Address of the hotel.
- Review_Date: Date when the reviewer posted the corresponding review.
- Average_Score: Average Score of the hotel, calculated based on the latest comment in the last year.
- Hotel_Name: Name of Hotel
- Reviewer_Nationality: Nationality of Reviewer
- Negative_Review: Negative review the reviewer gave to the hotel. If the reviewer gave no negative review, the field contains 'No Negative'.
- Review_Total_Negative_Word_Counts: Total number of words in the negative review.
- Positive_Review: Positive review the reviewer gave to the hotel. If the reviewer gave no positive review, the field contains 'No Positive'.
- Review_Total_Positive_Word_Counts: Total number of words in the positive review.
- Reviewer_Score: Score the reviewer gave the hotel, based on their experience.
- Total_Number_of_Reviews_Reviewer_Has_Given: Number of reviews the reviewer has given in the past.
- Total_Number_of_Reviews: Total number of valid reviews the hotel has.
- Tags: Tags reviewer gave the hotel.
- days_since_review: Number of days between the review date and the scrape date.
- Additional_Number_of_Scoring: Some guests gave only a score without writing a review. This number indicates how many valid scores without a review the hotel has.
- lat: Latitude of the hotel
- lng: Longitude of the hotel
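A quick way to confirm that a downloaded copy matches this description is to check its header row. The sketch below assumes the underscore-separated spellings of the field names (for example `Total_Number_of_Reviews` and `days_since_review`). The helper `check_header` is a name introduced here for illustration, and the in-memory sample stands in for the real file.

```python
import csv
import io

# The 17 field names described above (underscore spellings are an assumption).
EXPECTED_FIELDS = [
    "Hotel_Address", "Additional_Number_of_Scoring", "Review_Date",
    "Average_Score", "Hotel_Name", "Reviewer_Nationality",
    "Negative_Review", "Review_Total_Negative_Word_Counts",
    "Total_Number_of_Reviews", "Positive_Review",
    "Review_Total_Positive_Word_Counts",
    "Total_Number_of_Reviews_Reviewer_Has_Given", "Reviewer_Score",
    "Tags", "days_since_review", "lat", "lng",
]

def check_header(csv_file):
    """Return (missing, unexpected) column names found in the CSV header."""
    header = next(csv.reader(csv_file))
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    unexpected = [name for name in header if name not in EXPECTED_FIELDS]
    return missing, unexpected

# Header-only stand-in for the real file.
sample = io.StringIO(",".join(EXPECTED_FIELDS) + "\n")
missing, unexpected = check_header(sample)
print("missing:", missing, "unexpected:", unexpected)
```

With a real file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.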
Questions we are trying to answer
- Train and evaluate various machine learning models to determine which one predicts the review score with the highest accuracy.
- What is the average review score per country?
- What percentage of reviews are positive and what percentage are negative per country?
- Over time, did the average review score of each country improve or decline?
- What are the top 5 hotels with the highest review score per country?
- Over time, did the average review score of the top 5 hotels per country improve or decline?
- What is the number of reviews per reviewer nationality?
- What are the top 5 reviewer nationalities per country?
- What percentage of reviews are positive and what percentage are negative for the top 5 reviewer nationalities per country?
- Based on the reviewer's nationality, what are the average positive and negative word counts for the top 5 hotels per country?
- What is the recommended hotel per country?
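Several of these questions come down to grouping reviews by country and aggregating. The dataset has no country field, so the sketch below assumes the country can be read from the end of `Hotel_Address`. The `COUNTRIES` list, the helper names, and the sample rows are all made up for illustration and would need checking against the real data.

```python
import csv
import io
from collections import defaultdict

# Assumption: every Hotel_Address ends with one of these country names.
COUNTRIES = ["United Kingdom", "Netherlands", "France", "Spain", "Italy", "Austria"]

def country_of(address):
    """Return the country an address ends with, or 'Unknown' if none matches."""
    for country in COUNTRIES:
        if address.strip().endswith(country):
            return country
    return "Unknown"

def average_score_by_country(rows):
    """Average Reviewer_Score per country, rounded to two decimals."""
    totals = defaultdict(lambda: [0.0, 0])  # country -> [score sum, review count]
    for row in rows:
        entry = totals[country_of(row["Hotel_Address"])]
        entry[0] += float(row["Reviewer_Score"])
        entry[1] += 1
    return {country: round(total / count, 2)
            for country, (total, count) in totals.items()}

# Made-up rows with only the two columns this sketch needs.
sample_csv = """Hotel_Address,Reviewer_Score
1 Example Street London United Kingdom,8.0
2 Example Street London United Kingdom,9.0
3 Example Straat Amsterdam Netherlands,7.5
"""
averages = average_score_by_country(csv.DictReader(io.StringIO(sample_csv)))
print(averages)
```

The same group-then-aggregate pattern extends to the other questions, for example by keying on `Reviewer_Nationality` or `Hotel_Name` instead of the derived country.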