Uncovering hidden sentiment and subjectivity

Research question: Can Natural Language Processing (NLP) be used to identify review sentiment and customer satisfaction trends?

Abstract: This project provides an overview of how businesses can utilise Natural Language Processing (NLP) to analyse online reviews and gain valuable insights into customer experiences. It highlights the time-consuming nature of reviewing large amounts of customer feedback and emphasises how NLP can help businesses to analyse data faster and more accurately. In addition, it provides real-world examples of how NLP can be used for analysing online reviews, including sentiment analysis, topic modelling, and entity recognition. It also emphasises the importance of selecting the NLP tools and techniques that best suit a company's specific needs, and highlights the benefits of using NLP for online review analysis, such as identifying emerging trends in real time, reducing costs, and improving decision-making.

An Intro to Sentiment Analysis

Sentiment analysis is a technique used to automatically identify and extract subjective information from text, such as opinions, emotions, and attitudes. It has become increasingly popular in recent years as more and more businesses recognise the importance of understanding customer sentiment. There are several approaches to sentiment analysis, including rule-based, machine learning-based, and hybrid methods. Rule-based methods use handcrafted rules to identify sentiment, while machine learning-based methods use algorithms to learn from data. Hybrid methods combine both approaches. Sentiment analysis can be applied to a wide range of data sources, including social media, customer reviews, and news articles. It can determine whether the sentiment expressed is positive, negative, or neutral, which can provide valuable insights into customer satisfaction, brand reputation, and market trends. However, it is essential to be aware of the limitations of sentiment analysis, such as the difficulty of detecting sarcasm and irony.
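To make the rule-based idea concrete, here is a minimal sketch in plain Python. The lexicon, its weights, and the single negation rule are all hypothetical and chosen only for illustration; this is not how VADER or TextBlob are implemented.

```python
# Toy lexicon: hypothetical words and weights, chosen only for illustration
LEXICON = {"great": 1.0, "clean": 0.5, "friendly": 0.5,
           "dirty": -0.5, "rude": -0.5, "terrible": -1.0}
NEGATIONS = {"not", "never", "no"}


def rule_based_score(text):
    """Sum lexicon weights, flipping the sign of a word that follows a negation."""
    score = 0.0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        weight = LEXICON.get(word, 0.0)
        score += -weight if negate else weight
        negate = False  # a negation only affects the next word
    return score


def label(score):
    """Map a raw score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(rule_based_score("The room was not clean")))         # negative
print(label(rule_based_score("Oh great, another broken lift")))  # positive
```

The second call shows the sarcasm limitation mentioned above: a word-level rule sees "great" and scores the sentence as positive, even though a human reader would not.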

Sentiment Analysis Methodology 

The client data, a CSV file of approximately 5,000 reviews of a hotel left on the Tripadvisor website, spans 2003 to 2022. Each review is paired with a customer rating out of 5, which is used for comparative analysis. Note: Code segments are not 'complete' and are provided for reference only.

Table 1 - Raw Review Data

Establishing the Environment

Python, a high-level, object-oriented programming language, was used to conduct this analysis, together with several Python libraries:

  • NLTK (Natural Language Toolkit) is a library for working with human language data in Python. 
  • NumPy (Numerical Python) is a library for working with arrays and matrices. It provides functions for performing mathematical operations on arrays, such as linear algebra, statistical analysis, and random number generation. 
  • Pandas is a library for working with data in Python. It provides data structures for efficiently storing and manipulating large datasets, such as data frames and series. 
  • Matplotlib is a plotting library for Python. It provides a wide range of functions for creating static, animated, and interactive visualizations, such as line plots, bar plots, scatter plots, histograms, and heatmaps. 
  • Seaborn is a data visualization library built on top of Matplotlib. It provides a high-level interface for creating beautiful and informative visualizations, such as violin, box, and pair plots. 

These libraries are imported as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk

Each library has its own strengths and is used in data science and scientific computing for different purposes. Combining them allows us to perform a wide range of data-related tasks, from cleaning and manipulating data to visualising and analysing it. Two additional libraries, VADER and TextBlob, were chosen to conduct the sentiment analysis. The following points compare them.

  • Approach: TextBlob uses a lexicon-based approach, where it checks the sentiment of a given text against a pre-defined sentiment lexicon. VADER also starts from a sentiment lexicon but adds a set of rules (covering, for example, negation, punctuation, and capitalisation) to determine the sentiment of a given text.
  • Accuracy: In terms of accuracy, both TextBlob and VADER perform well, but VADER has explicitly been designed for social media text, which tends to have a unique set of characteristics, such as informal language, emoticons, and slang. As a result, VADER tends to perform better than TextBlob on social media text.
  • Speed: TextBlob is generally faster than VADER, as its lexicon-based approach requires less computational power. However, the speed difference may not be noticeable for small-scale projects.
  • Customisation: TextBlob allows for the customisation of its sentiment lexicon, which means users can add or remove words and their corresponding sentiments. On the other hand, VADER does not allow for customisation of its rules, but it provides a set of pre-defined sentiments that can be used out of the box.
  • Multi-language Support: TextBlob supports multiple languages, including English, Spanish, German, and French. VADER, on the other hand, is specifically designed for English, and its performance in other languages may not be as good.

Both libraries rely on a lexicon of words with associated sentiment scores, and both report an overall polarity between -1 (negative) and +1 (positive). In VADER, for example, the score for a body of text is calculated by summing the sentiment scores of the individual words, taking into account the intensity of the sentiment, and adjusting for contextual cues such as negations, punctuation, and capitalisation; the adjusted sum is then normalised into the -1 to +1 range. Both have strengths and weaknesses, and choosing between them will depend on a project's specific requirements. If accuracy is a top priority, VADER might be the better choice, especially for social media text. If customisation and speed are more critical, TextBlob might be the better option. Below, we import these libraries and load the reviews from the CSV file into a Pandas DataFrame:

from textblob import TextBlob
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-off download of the lexicon that VADER needs
nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()

# Load and format CSV file
df = pd.read_csv('hotel_reviews.csv',sep='\t').astype(str)
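As a rough illustration of how an unbounded sum of word scores ends up between -1 and +1, the sketch below reproduces only the normalisation step used for VADER's compound score, x / sqrt(x² + alpha) with alpha = 15, as it appears in the publicly available VADER source. The per-word valences are made up, and none of VADER's other rules are modelled here.

```python
import math


def normalise(raw_sum, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


# Hypothetical per-word valences for a short review (illustrative values only)
valences = [1.9, 3.1, -0.5]
compound = normalise(sum(valences))
print(round(compound, 4))
```

Because the function flattens out for large sums, a long, strongly worded review approaches +1 or -1 but never exceeds it, which is why many reviews cluster near the extremes of the compound score.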

Data Cleaning and Preprocessing

Several text cleaning and preprocessing steps were required in the review column. The first step was downloading and importing the stopwords module from the NLTK library. Stopwords are commonly used words that are often irrelevant in text analysis, such as 'the', 'and', and 'it'. Code was used to remove these stopwords from each review using a lambda function. The next step was to perform spelling correction on each word in the review column using the TextBlob module. This was done by iterating over each word in a review, correcting it, and then joining the corrected words back together.

After spelling correction, the reviews were converted to lowercase to ensure consistency in the analysis. The code then removed any remaining punctuation from the reviews using the str.replace method, replacing all non-word and non-space characters with an empty string. Finally, several corrupted rows were removed. These preprocessing steps helped to clean and standardise the text data before performing sentiment analysis.

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
stop = stopwords.words('english')

df['review'] = df['review'].apply(lambda x: " ".join(x for x in x.split() if x not in stop))

# Spelling correction
df['review'] = df['review'].apply(lambda x: " ".join([str(TextBlob(word).correct()) for word in x.split()]))

# To lowercase
df['review'] = df['review'].str.lower()

# Remove punctuation (regex=True is required in recent pandas versions)
df['review'] = df['review'].str.replace(r'[^\w\s]', '', regex=True)

# Lemmatise the words
lemmatizer = WordNetLemmatizer()
df['review'] = df['review'].apply(lambda x: " ".join(lemmatizer.lemmatize(word) for word in x.split()))

Lemmatisation is a process in NLP where words are reduced to their base or root form, called a lemma. This is done by applying morphological analysis to determine the base form of the word, based on its context and definition. For example, the word "running" could be reduced to its lemma "run". The purpose of lemmatising is to group together different forms of the same word so that they can be analyzed as a single item. This can improve the accuracy of natural language processing tasks such as sentiment analysis, topic modelling, and machine translation. By reducing words to their base form, we can more easily identify patterns and relationships between words in a sentence or document.
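The cleaning steps described earlier in this section can also be sketched without pandas or NLTK. The snippet below is a simplified stand-in, not the pipeline used for the client data: the stopword set is a small assumed sample, and spelling correction and lemmatisation are left out.

```python
import re

# Small illustrative stopword set (an assumption; NLTK's English list is much longer)
STOPWORDS = {"the", "and", "it", "was", "a", "to"}


def clean_review(text):
    """Lowercase, strip punctuation, then drop stopwords."""
    text = text.lower()                  # lowercase first so "The" matches "the"
    text = re.sub(r"[^\w\s]", "", text)  # remove anything that is not a word character or whitespace
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_review("The room was spotless, and the staff went out of their way to help!"))
# room spotless staff went out of their way help
```

The order of the steps matters. In the listing above, stopwords are removed before the text is lowercased, so capitalised forms such as "The" survive the filter, which is one reason they may need to be added to custom stopword lists later on.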

Descriptive & Exploratory Data Analysis

We could now begin to explore the cleaned corpus of text. We were left with 4,724 rows, and the reviews constituted 2,409,990 words in total. It was deemed useful to break down the reviews and categorise them by year, which was achieved using the code below. Figure 1 shows a steady climb in the number of reviews per year from 2003, a decline after 2014, and a significant drop in 2020 during the national lockdown.

# Convert the date column to a datetime type
df['date'] = pd.to_datetime(df['date'])

# Extract the year from the date column
df['year'] = df['date'].dt.year

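plt.figure(figsize=(14, 5))

# Plot the histogram of the row counts by year
# (plotting call reconstructed; the original line is not shown)
year_counts = df['year'].value_counts().sort_index()
plt.bar(year_counts.index, year_counts.values)

# Add labels and title to the plot
plt.xlabel('Year')
plt.ylabel('Number of Reviews')
plt.title('Row Counts by Year')

# Show the plot
plt.show()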

Figure 1 - Reviews by Year
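The yearly tally behind a chart like Figure 1 can be cross-checked with the standard library alone. This is a small sketch with hypothetical dates, assuming ISO-formatted date strings; the client file may use a different format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical review dates in ISO format (illustrative only, not the client data)
dates = ["2013-05-02", "2013-11-19", "2014-07-30", "2020-02-14"]

# Parse each date and tally the number of reviews per calendar year
reviews_per_year = Counter(datetime.strptime(d, "%Y-%m-%d").year for d in dates)

for year in sorted(reviews_per_year):
    print(year, reviews_per_year[year])
```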

The Python code below imports several libraries, including Seaborn, the NLTK corpus for stopwords, and Collections for counting. The code defines a function called "plot_top_non_stopwords_barchart" that takes text as input. This function first defines a list of stopwords to exclude from the analysis, then creates a list of words from the input text by splitting it into words and flattening it into a single list. It then uses the Counter class from the collections module to count the occurrences of each word in the text.

Next, the code takes the 80 most common words, keeps those that are not in the stopword list, and creates two lists, one containing the word counts and the other containing the corresponding words. It then creates a horizontal bar chart using seaborn's barplot method, with the word counts on the x-axis and the words on the y-axis. Finally, the function sets the label for the y-axis, the tick labels for the y-axis, and the font size for both. The function is then called with a DataFrame column 'review' from a variable called 'sentiment'. The resulting bar chart shows the most frequently mentioned words in the 'review' column, excluding the stopwords defined in the function.

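from collections import Counter
from matplotlib.pyplot import figure

# Set Figure Size
figure(figsize=(16, 8), dpi=120)

def plot_top_non_stopwords_barchart(text):
    # Combine the NLTK stopwords with the additional words to exclude
    stops = set(stopwords.words('english'))
    new_stopwords = ["-", "I", "The", "would", "We", "Hey", "get", "one", "it", "It", "us", "could", "This", "also"]
    stops.update(new_stopwords)

    # Split each review into words and flatten into a single list
    new = text.str.split()
    corpus = [word for i in new for word in i]

    # Count the occurrences of each word, most common first
    most = Counter(corpus).most_common()

    x, y = [], []
    for word, count in most[:80]:
        if word not in stops:
            x.append(word)
            y.append(count)

    # Horizontal bar chart: counts on the x-axis, words on the y-axis
    ax = sns.barplot(x=y, y=x)
    ax.set_ylabel('Most Mentioned Words', fontsize=16)
    ax.set_yticklabels(labels=x, fontsize=14)
    ax.tick_params(axis='x', labelsize=14)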

Figure 2 - Most Popular Words

Another way to visualise this output is through a word cloud, which can easily be generated in Python. The code below stems and tokenises the individual words, updates the stopwords list and produces the visualisation in Figure 3.

from nltk.stem import WordNetLemmatizer,PorterStemmer
from nltk.tokenize import word_tokenize
from wordcloud import WordCloud, STOPWORDS

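text = " ".join(review for review in df.review.astype(str))

# Use a separate name so the NLTK stopwords module imported earlier is not overwritten
wc_stopwords = set(STOPWORDS)
wc_stopwords.update(["The", "hotel", "u", "etc", "two", "back", "go", "bit", "told", "came", "one", "us"])

# Generate a word cloud image
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white", width=1600, height=800).generate(text)

# Display the generated image
plt.figure(figsize=(16, 8))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()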

Figure 3 - Analysis Wordcloud

Results and Analysis

Beyond the descriptive analysis, we could now seek to ascertain the sentiment of each review by looking at the polarity and subjectivity. The output from the two chosen libraries will differ; therefore, it was decided to smooth out these results and work from a mean score. Following this, the subjectivity of each review was calculated, and the reviews were then classified according to the mean score. The accuracy of the rating was analysed by comparing this with the polarity classification and finally the Net Promoter Score® was calculated.

Sentiment - Polarity & Subjectivity

Using the NLTK library, the code below adds a new column to the df DataFrame called 'v_scores'. The function being applied is a lambda function that takes a single argument, review, and calls the polarity_scores() function of a previously defined object named sid on it. polarity_scores() is a function provided by the NLTK package's SentimentIntensityAnalyzer class. This function takes a text string and returns a dictionary of scores representing the text's positivity, negativity, and neutrality, as well as an overall compound score summarising the sentiment. The lambda function returns this dictionary for each review in the DataFrame, and the resulting list of dictionaries is stored in the new 'v_scores' column.

df['v_scores'] = df['review'].apply(lambda review:sid.polarity_scores(review))
df['v_compound'] = df['v_scores'].apply(lambda d:d['compound'])

The second line adds a second new column to the df DataFrame called 'v_compound'. The column is created by applying another lambda function to each value in the 'v_scores' column of the DataFrame using the apply() method. The lambda function takes a single argument, d, a dictionary of sentiment scores for a particular review. The lambda function returns just the 'compound' score from this dictionary, representing an overall sentiment score for the review. The resulting list of compound scores is stored in the new 'v_compound' column.

Table 2 - VADER Calculations

The analysis was then conducted using the TextBlob library, and this was added to the DataFrame, before calculating the mean polarity score. Descriptive statistics for these can be seen in Table 3 and Figure 4.

# Extract polarity score
df["tb_polarity"] = [TextBlob(comment).polarity for comment in df["review"]]

# Extract subjectivity score
df["tb_subjectivity"] = [TextBlob(comment).subjectivity for comment in df["review"]]

# Calculate the mean sentiment score across TextBlob and VADER and add a new column to the dataframe
df['mean_score'] = df[['tb_polarity', 'v_compound']].mean(axis=1)

Table 3 - Descriptive Statistics

Figure 4 - Descriptive Histograms

We then brought these metrics together by way of a scatterplot visualisation. Scatterplots map the relationship between two variables and can help identify correlations, clusters, outliers, and other data features that might be important for further analysis.

# Create a scatter plot showing the relationship between the 'v_compound' and 'tb_subjectivity' columns
plt.figure(figsize=(12, 6), dpi=120)
plt.scatter(df['v_compound'], df['tb_subjectivity'], c=df['v_compound'], cmap='coolwarm')

# Set the colorbar label and tick labels
cbar = plt.colorbar()
cbar.set_label('VADER Compound Score')
cbar.set_ticks([-1, 0, 1])

# Set the chart title and axis labels
plt.title('Sentiment Analysis Results')
plt.xlabel('VADER Compound Score')
plt.ylabel('TextBlob Subjectivity Score')

# Show the chart
plt.show()

Figure 5 - VADER and TextBlob Scatterplot
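A scatterplot gives a visual impression of a relationship; a correlation coefficient puts a number on it. As a hedged sketch, the function below computes the Pearson correlation between two lists of scores, which is one way to check how closely the VADER and TextBlob polarity scores agree before averaging them. The scores shown are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five reviews (illustrative values only)
vader_compound = [0.9, 0.6, 0.1, -0.4, -0.8]
textblob_polarity = [0.5, 0.4, 0.0, -0.1, -0.6]
r = pearson(vader_compound, textblob_polarity)
print(round(r, 2))
```

With the DataFrame used in this analysis, the equivalent pandas call would be df['v_compound'].corr(df['tb_polarity']).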

A simple way to classify the mean polarity value was to map it to one of several categorical values that describe the sentiment of a text review. The following code defines a function to map numeric values to sentiment categories and applies the function to a DataFrame column to create a new column with those categories. It should be noted that the range boundaries (for example, 0.6 as the lower bound of "Very Positive") are not fixed and can be altered to suit. We can then see the output of this in Table 4.

# Defining all the conditions inside a function
def condition(x):
    # Each branch only needs a lower bound because the earlier
    # branches have already handled the higher values
    if x > 0.6:
        return "Very Positive"
    elif x > 0.2:
        return "Moderately Positive"
    elif x > -0.2:
        return "Neutral"
    elif x > -0.6:
        return "Moderately Negative"
    else:
        return "Very Negative"

# Apply the conditions
df['result'] = df['mean_score'].apply(condition)

Table 4 - Mean Score and Classification
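Because the band boundaries are meant to be adjustable, one option is to keep them in a list rather than in the branches of an if/elif chain. The sketch below is an alternative formulation, not the code used to produce Table 4; it uses the standard library's bisect module and treats each upper bound as inclusive.

```python
from bisect import bisect_left

# Upper bound of each band except the last; adjust these to move the boundaries
THRESHOLDS = [-0.6, -0.2, 0.2, 0.6]
LABELS = ["Very Negative", "Moderately Negative", "Neutral",
          "Moderately Positive", "Very Positive"]


def classify(score):
    """Return the label of the band that contains `score` (upper bounds inclusive)."""
    return LABELS[bisect_left(THRESHOLDS, score)]


for value in (0.7, 0.6, 0.0, -0.3, -1.0):
    print(value, classify(value))
```

Changing a boundary then means editing one number in THRESHOLDS, and every score, including one that sits exactly on a boundary, is guaranteed to fall into exactly one band.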

Once the categories had been applied, a bar chart was created to visualise the distribution further. The code below, leveraging the Matplotlib library, produces the chart in Figure 6.

def plot_result_barchart(result):
    counts = result.value_counts()
    plt.figure(figsize=(12, 6), dpi=120)
    # Set the chart title and axis labels
    plt.title('Category Count')
    # Draw one bar per sentiment category
    plt.bar(counts.index, counts, color=('#7ca9f2','#99de9d','#b7b7b7','#eb8f67','#ff6565'))
    # Show values
    for i, v in enumerate(counts):
        plt.text(i - 0.1, v + 50, str(v))
    plt.show()

Figure 6 - Mean Score Classification Count

Sentiment - Accuracy & Alignment

The next function classifies the accuracy of the rating based on the absolute difference between the rating (divided by 5 to scale it) and the mean_score. The function returns a string indicating the level of accuracy, ranging from 'Very accurate' to 'Very inaccurate'. Like the classification of the mean score, the range boundaries are fluid.

# Define a function to compare the 'mean score' and 'rating' columns for each row
def compare_rating_accuracy(row):
    rating_level = int(row['rating'])/5
    diff = abs(row['mean_score'] - rating_level)

    if diff < 0.2:
        return 'Very accurate'
    elif diff < 0.4:
        return 'Accurate'
    elif diff < 0.6:
        return 'Somewhat accurate'
    elif diff < 0.8:
        return 'Inaccurate'
    else:
        return 'Very inaccurate'

# Apply the 'compare_rating_accuracy' function to each row of the DataFrame
df['rating_alignment'] = df.apply(compare_rating_accuracy, axis=1)

# Get the value counts of the 'rating_alignment' column
value_counts = df['rating_alignment'].value_counts()

Figure 7 - Rating Accuracy Alignment
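One detail worth keeping in mind when reading Figure 7 is that the two quantities being compared sit on different scales: a rating divided by 5 runs from 0.2 to 1.0, whereas the mean score runs from -1 to +1. The sketch below contrasts the scaling used above with one possible alternative, a linear mapping of the 1-5 rating onto -1 to +1. The alternative is an assumption offered for comparison, not the method used in this analysis, and the example values are hypothetical.

```python
def rating_to_unit_scale(rating):
    """Scaling used above: divide a 1-5 rating by 5, giving 0.2 to 1.0."""
    return rating / 5


def rating_to_polarity_scale(rating):
    """Assumed alternative: map 1-5 linearly onto -1 to +1 (3 becomes 0)."""
    return (rating - 3) / 2


# Hypothetical one-star review with a clearly negative mean sentiment score
rating, mean_score = 1, -0.7

diff_original = abs(mean_score - rating_to_unit_scale(rating))
diff_rescaled = abs(mean_score - rating_to_polarity_scale(rating))
print(round(diff_original, 2), round(diff_rescaled, 2))
```

Under the thresholds above, the first difference would be labelled 'Very inaccurate' and the second 'Accurate', even though a one-star rating and a negative sentiment score arguably agree. Which scaling is appropriate depends on what the alignment measure is meant to capture.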

Net Promoter Score

We then looked to calculate a further measure, the Net Promoter Score® (NPS), which is a useful tool for measuring customer loyalty and gauging customer satisfaction with a business or product. It is based on a single question: "How likely is it that you would recommend our product/service to a friend or colleague?" The NPS is calculated by subtracting the percentage of detractors (customers who respond with a score of 0-6) from the percentage of promoters (customers who respond with a score of 9-10). The result is a score between -100 and 100, where a score of 100 indicates that all customers are promoters and a score of -100 indicates that all customers are detractors.

The following lines of code transform the 'rating' variable into three binary variables that classify each observation as a detractor, passive, or promoter. Because the reviews use a five-point rating rather than the 0-10 NPS scale, ratings of 1-3 are treated as detractors, 4 as passives, and 5 as promoters. The output can be seen in Table 5. The code then produces some descriptive NPS statistics, and we note the following: Promoters: 646, Passives: 958, Detractors: 3120. The Net Promoter Score is -52.37 across the entire corpus.

df = df.astype({'rating':'int'})

df['detractor'] = np.where(df['rating'] < 4, 1, 0)
df['passive'] = np.where(df['rating'] == 4, 1, 0)
df['promoter'] = np.where(df['rating'] == 5, 1, 0)

# Calculate the total number of respondents
total = len(df)

# Calculate the number of promoters, passives and detractors
promoters = len(df[df['promoter'] == 1])
passives = len(df[df['passive'] == 1])
detractors = len(df[df['detractor'] == 1])

# Calculate the net promoter score
nps = (promoters - detractors) / total * 100

df[['review', 'rating', 'detractor', 'passive', 'promoter']].sample(5)

Table 5 - NPS Output
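As a worked check of the NPS arithmetic, the small function below recomputes the score from the three counts reported above. The function name and the guard against an empty sample are additions for illustration.

```python
def net_promoter_score(promoters, passives, detractors):
    """NPS = percentage of promoters minus percentage of detractors."""
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("NPS is undefined when there are no respondents")
    return (promoters - detractors) / total * 100


# Counts reported for the full corpus
nps = net_promoter_score(promoters=646, passives=958, detractors=3120)
print(f"{nps:.2f}")  # -52.37
```

Passives do not appear in the numerator, but they do count towards the total, so a large passive group pulls the score towards zero.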

N-gram analysis is a technique in NLP that involves counting the frequency of contiguous sequences of N words (called N-grams) in a text corpus. This code provides two functions that perform N-gram analysis on text descriptions using scikit-learn's CountVectorizer class. The functions return the top N-grams and their counts as a list of tuples. The output can be seen in Table 6.

# Create DataFrame objects
df_detractor = df[df['detractor']==1]
df_passive = df[df['passive']==1]
df_promoter = df[df['promoter']==1]

# N-Gram Analysis
from sklearn.feature_extraction.text import CountVectorizer

def code_bigrams(descriptions, n=None):
    vec = CountVectorizer(ngram_range = (2,2), max_features = 20000).fit(descriptions)
    bag_of_words = vec.transform(descriptions)
    sum_words = bag_of_words.sum(axis = 0) 
    words_freq = [(word, sum_words[0, i]) for word, i in vec.vocabulary_.items()]
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse = True)
    return words_freq[:n]

def code_trigrams(descriptions, n=None):
    vec = CountVectorizer(ngram_range = (3,3), max_features = 20000).fit(descriptions)
    bag_of_words = vec.transform(descriptions)
    sum_words = bag_of_words.sum(axis = 0) 
    words_freq = [(word, sum_words[0, i]) for word, i in vec.vocabulary_.items()]
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse = True)
    return words_freq[:n]

Table 6 - NPS Promoter vs Detractor Words
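To show what N-gram counting does under the hood, here is a minimal standard-library sketch that counts contiguous word sequences with a Counter. It is a simplified stand-in for the CountVectorizer functions above (whitespace tokenisation only), and the sample reviews are hypothetical.

```python
from collections import Counter


def top_ngrams(texts, n=2, k=3):
    """Count contiguous n-word sequences across all texts and return the k most common."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zip over n staggered copies of the token list to form the n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), count) for gram, count in counts.most_common(k)]


# Hypothetical cleaned reviews (illustrative only)
reviews = ["front desk staff friendly", "front desk slow", "friendly front desk staff"]
print(top_ngrams(reviews, n=2, k=1))  # [('front desk', 3)]
print(top_ngrams(reviews, n=3, k=1))  # [('front desk staff', 2)]
```

Running the same function separately on promoter and detractor reviews gives the kind of side-by-side comparison shown in Table 6.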

Putting it all Together

Thus far, the analysis has treated the corpus of reviews as one body, but we could make more informed decisions and recommendations by understanding how this data changed over time. There are several benefits, including:

  • Trend identification: By tracking changes over time, you can identify long-term trends, seasonal fluctuations, and other patterns to help you make more informed decisions.
  • Seasonal variations: By understanding these patterns, you can better plan for busy and slow periods, optimize your marketing efforts, and make more informed decisions about inventory and staffing.
  • Performance measurement: By setting benchmarks and tracking your progress, you can identify areas where you are doing well and areas that need improvement.
  • Prediction and forecasting: By analyzing historical data and identifying trends, you can make more accurate predictions about future demand, revenue, and other key metrics.
  • Root cause analysis: By tracking changes over time and correlating them with other variables, you can identify the factors contributing to changes in your data and take steps to address them.

The code below plots the Mean Sentiment, Subjectivity and NPS Over Time and allows us to see clear trends and alignment. The output can be seen in Figure 8.

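# Aggregate by year (reconstructed; the original aggregation code is not shown)
by_year = df.groupby('year')
yearly_scores = by_year[['mean_score', 'tb_subjectivity']].mean()
yearly_nps = (by_year['promoter'].mean() - by_year['detractor'].mean()) * 100

# Plot the mean sentiment and subjectivity on the left y-axis
fig, ax = plt.subplots(figsize=(14, 6), dpi=120)
years = yearly_scores.index.astype(str)
ax.plot(years, yearly_scores['mean_score'], marker='o', label="Mean Sentiment")
ax.plot(years, yearly_scores['tb_subjectivity'], marker='o', label="Mean Subjectivity")

# Set the y-axis label for the sentiment scores
ax.set_ylabel("Score")

# Rotate the x-axis labels for better readability
ax.tick_params(axis='x', labelrotation=45)

# Add horizontal grid lines for the y-axis
ax.grid(axis='y')

# Set the chart title and axis labels
ax.set_title("Mean Sentiment, Subjectivity and NPS Over Time")
ax.set_xlabel("Year")

# Create a second y-axis for the NPS values
ax2 = ax.twinx()

# Plot the NPS values on the right y-axis
ax2.plot(yearly_nps.index.astype(str), yearly_nps.values, label="NPS", color="gray", linestyle="--")

# Add value labels above the points for each line
for line in ax.lines:
    for x, y in zip(line.get_xdata(), line.get_ydata()):
        ax.annotate('{:.2f}'.format(y), xy=(x, y), xytext=(0, 5),
                    textcoords='offset points', ha='center', va='bottom', fontsize=8)

# Set the y-axis label for the NPS values
ax2.set_ylabel("NPS")

# Add a legend to the chart
ax.legend(loc="upper left")
ax2.legend(loc="upper center", bbox_to_anchor=(0.36, 0.997), ncol=1)

# Show the chart
plt.show()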

Figure 8 - Sentiment, Subjectivity and NPS Over Time


In conclusion, sentiment analysis of text is a powerful tool for organisations seeking to gain insights into customer opinions, preferences, and behaviours. The process uses natural language processing and machine learning algorithms to categorise the sentiment expressed in text data, such as these online reviews, as positive, negative, or neutral. Once sentiment analysis has been performed on review data, several key metrics and insights can be obtained:

  • Sentiment polarity: This metric indicates the overall sentiment expressed in the data, usually as positive, negative, or neutral.
  • Sentiment subjectivity: This metric measures the subjectivity of the sentiment expressed in the data, with values ranging from objective (0) to subjective (1).
  • Sentiment intensity: This metric measures the strength of the sentiment expressed in the data, with values ranging from weak to strong.
  • Top positive and negative keywords: This insight identifies the data's most frequently used positive and negative words and phrases.
  • Customer pain points: This insight identifies common complaints, frustrations, and areas for improvement expressed by customers on social media.
  • Customer preferences: This insight identifies common themes and topics of interest among customers on social media, providing insight into customer preferences and behaviours.
  • Brand perception: This insight provides a broad understanding of how customers perceive a brand or company based on the sentiment expressed about the brand on social media.
  • Competitor analysis: This insight compares the sentiment expressed about a company and its competitors, allowing companies to understand how they stack up against their competitors in customers' eyes.

Organisations can also use these metrics and insights to monitor the effectiveness of their marketing and customer service efforts and make adjustments as needed.

Recommendations for Further Study

Recommendations for further study in the area of Natural Language Processing (NLP) for online review analysis could include developing more advanced NLP techniques to identify sarcasm, irony, and other forms of implicit sentiment in online reviews. Further research could also investigate the impact of differences in language and culture on online review sentiment analysis and the impact of changes in product quality or features on sentiment analysis. Additionally, it would be valuable to examine how sentiment analysis can be used to predict future customer behaviour, such as churn, and how these predictions can be incorporated into marketing strategies. Further studies could also analyze the effectiveness of different sentiment analysis tools and techniques across different industries and business models.