NLP Sentiment Analysis - A POC for User Research

Sentiment analysis is a natural language processing (NLP) technique that aims to detect or extract the tone or feeling of a body of text. Sentiment is usually classified as positive, negative, or neutral, which gives a measure of "the level of satisfaction." The same approach can also be used to classify emotion or intention, and even to predict user ratings.

Problem

A client report needed to showcase the overall satisfaction of a customer base that had written over 50k reviews, which required a comprehensive understanding of every one of those reviews.

Solution

Data was aggregated from several sources and processed using a machine learning model trained for sentiment analysis. The results were then filtered to show relevant comparisons among various brands. All of this was done in a proof-of-concept application written in Python.

from transformers import pipeline
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from wordcloud import WordCloud
from wordcloud import STOPWORDS
from collections import Counter
from sklearn.metrics import accuracy_score

Using Pre-trained Models with Huggingface 🤗

In machine learning there is a concept called transfer learning which, in short, means using a pre-trained model as the starting point for another task. Huggingface is a platform that provides access to thousands of these models so that you don't need to spend time and resources creating models from scratch. There are more than 200 sentiment analysis models available on the hub, and using them only takes a few lines of code.

model = pipeline(model = "nlptown/bert-base-multilingual-uncased-sentiment")

This model is a BERT model fine-tuned for sentiment analysis tasks, and it predicts a review's rating as a number of stars from 1 to 5. For context, BERT, which stands for Bidirectional Encoder Representations from Transformers, is a deep learning model in which every output element is connected to every input element, and the weightings between them are dynamically calculated based upon their connection. As a transformer, it differs from the neural networks previously used for language modeling by adding what's known as attention. Without getting too deep into it, attention allows the model to give more weight to the parts of the text that are most relevant.
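To make the idea of attention a little more concrete, here is a minimal sketch of single-head scaled dot-product attention in NumPy. It is a simplification for illustration only: BERT itself uses multiple heads and learned projections for the queries, keys, and values, and the function name and toy input below are assumptions, not part of the project.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Return (output, weights) for single-head scaled dot-product attention."""
    d_k = queries.shape[-1]
    # Similarity of every query token with every key token, scaled by sqrt(d_k).
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax over the key axis so each row of weights sums to 1.
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ values, weights

# Toy input: 3 tokens with 4-dimensional embeddings (random, illustration only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` says how much one token "looks at" every other token, which is the sense in which the model gives more weight to the most relevant parts of the text.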

Balancing Data

To protect data privacy, the case data has been substituted here with a Kaggle dataset of over 65k cell phone reviews from different companies. The dataset is split into two tables, so the two were joined to get the "main" dataset.

reviews = pd.read_csv('20191226-reviews.csv')
# rename the "body" column to "review"
reviews.rename({'body':'review'}, axis = 1, inplace = True)
items = pd.read_csv('20191226-items.csv')
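The join itself is not shown above, so the following is only a sketch of how the two tables might be combined. It assumes both tables share a product identifier column, here called `asin`, and uses small made-up frames in place of the real CSV files; the result name `dataframe` is also just a placeholder.

```python
import pandas as pd

# Toy stand-ins for the two Kaggle tables; the real files have more columns.
items = pd.DataFrame({
    "asin": ["A1", "B2"],  # assumed product identifier shared by both tables
    "brand": ["Motorola", "Samsung"],
})
reviews = pd.DataFrame({
    "asin": ["A1", "A1", "B2"],
    "rating": [1, 5, 4],
    "review": ["Stopped charging", "Great phone", "Solid battery"],
})

# Attach each review to its product so every row carries a brand.
# validate="many_to_one" raises if a product id appears twice in items.
dataframe = reviews.merge(items, on="asin", how="left", validate="many_to_one")
```

If both tables happen to contain columns with the same name, pandas appends `_x` and `_y` suffixes by default, so it may be worth renaming those columns before merging.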

Preprocess Data

To minimize processing time, rows without a valid review or company name were dropped.

dataset = dataframe.dropna(subset = ['brand', 'review'])

Predicting User Sentiment

Next, the reviews were passed to the sentiment model to obtain predicted ratings.

reviews = subset['review'].tolist()
ratings = model(reviews, truncation = True)

The model returned a list of dictionaries, one for each review, containing the predicted rating.

[{ 'label': '2 stars', 'score': 0.1618033988749894 }]

Then both the "label" and the "score" were extracted from each prediction and added to the existing dataframe as new columns.

predicted_rating = [int(r['label'][0]) for r in ratings]
rating_confidence = [round(r['score'] * 100, 2) for r in ratings]
subset['predicted_rating'] = predicted_rating
subset['rating_confidence'] = rating_confidence

To get a more in-depth understanding, the data was filtered by brand, and charts were used to compare the predicted rating with the actual rating. This was helpful when evaluating the findings.
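A brand-level comparison like this could be sketched as follows. The data is made up, and the `rating` column name for the actual star rating is an assumption about the dataset.

```python
import pandas as pd

# Toy data; "rating" is assumed to hold the actual star rating.
subset = pd.DataFrame({
    "brand": ["Motorola", "Motorola", "Samsung", "Samsung"],
    "rating": [1, 5, 4, 2],
    "predicted_rating": [2, 5, 4, 1],
})

# Filter to a single brand for a closer look.
motorola = subset[subset["brand"] == "Motorola"]

# Mean actual vs. predicted rating per brand, ready for a bar chart.
brand_means = subset.groupby("brand")[["rating", "predicted_rating"]].mean()
```

Calling `brand_means.plot(kind="bar")` would then draw the two averages side by side for each brand using matplotlib.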

Results

Both the model's performance at predicting sentiment and the predictions themselves were used to learn about customer satisfaction for each company.

Since the original dataset contains the actual rating for each review, an accuracy score could be calculated to get an idea of how well the model performed. A way to convert star ratings to sentiment was also created.
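As a rough illustration of the accuracy calculation, the sketch below compares actual and predicted stars on made-up data. The `rating` column name is an assumption, and the "within one star" measure is an extra suggestion rather than something taken from the project.

```python
import pandas as pd

subset = pd.DataFrame({
    "rating": [1, 5, 4, 2, 3],  # assumed column holding the actual stars
    "predicted_rating": [2, 5, 4, 1, 3],
})

# Exact-match accuracy: share of reviews where predicted stars equal actual stars.
exact_accuracy = (subset["rating"] == subset["predicted_rating"]).mean()

# Off-by-one accuracy: share of predictions within one star of the actual rating.
within_one = ((subset["rating"] - subset["predicted_rating"]).abs() <= 1).mean()
```

The exact-match figure is the same value that `accuracy_score(subset["rating"], subset["predicted_rating"])` from scikit-learn would return. The off-by-one figure can be a useful companion, since confusing 4 stars with 5 is a much smaller error than confusing 1 with 5.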

labeler = {
  1: 'Very negative',
  2: 'Negative',
  3: 'Neutral',
  4: 'Positive',
  5: 'Very positive'
}
subset['sentiment'] = subset['predicted_rating'].apply(lambda rating: labeler[rating])

Very negative: Rating of 1
Negative: Rating of 2
Neutral: Rating of 3
Positive: Rating of 4
Very positive: Rating of 5

Now a clear view of the overall sentiment could be seen.

The results of the model could also be used to create a report that showcases the percentage of each sentiment category across all reviews for each company. For example, here is a report on a negative Motorola review, comparing the actual rating with the predicted rating along with the confidence of the prediction.

Actual Rating: 1
Predicted Rating: 2
Confidence: 57.72%
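The percentage of each sentiment category per company can be computed with a row-normalized cross-tabulation. This is a sketch on made-up data, not the report code used in the project.

```python
import pandas as pd

subset = pd.DataFrame({
    "brand": ["Motorola", "Motorola", "Motorola", "Motorola", "Samsung", "Samsung"],
    "sentiment": ["Very negative", "Very positive", "Very positive", "Neutral",
                  "Positive", "Very negative"],
})

# normalize="index" divides each row by its total, so every brand sums to 100%.
sentiment_share = (
    pd.crosstab(subset["brand"], subset["sentiment"], normalize="index") * 100
).round(2)
```

Each row of `sentiment_share` is one company and each column one sentiment category, which maps directly onto a stacked bar chart or a report table.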

Although this is a good way to measure performance per company, it isn't a very good method for understanding how one company compares to others, since the number of observations differs from company to company.
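One possible way to put companies on an equal footing, assuming down-sampling is acceptable for the analysis, is to draw the same number of reviews from each brand. The sketch below uses made-up data and an arbitrary random seed.

```python
import pandas as pd

subset = pd.DataFrame({
    "brand": ["Motorola"] * 5 + ["Samsung"] * 3 + ["Nokia"] * 2,
    "predicted_rating": [1, 5, 4, 2, 3, 5, 4, 4, 1, 2],
})

# The smallest brand sets the sample size for every brand.
n_per_brand = int(subset["brand"].value_counts().min())

# Draw the same number of rows from each brand; random_state is an arbitrary seed.
balanced = subset.groupby("brand").sample(n=n_per_brand, random_state=42)
```

The trade-off is that reviews from the larger brands are discarded, so this suits cross-brand comparisons better than per-brand reporting.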

Word Cloud

Lastly, a word cloud visualization was created to give an idea of the words commonly associated with each sentiment category. For example, the word cloud below was generated by filtering the data for "Very positive" reviews of Motorola.
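The word counts behind such a cloud could be prepared along these lines. The review texts, the tiny stop-word list, and the helper name `word_frequencies` are all hypothetical and only serve as an illustration.

```python
import re
from collections import Counter

# Hypothetical review texts and a tiny stop-word list, for illustration only.
very_positive_reviews = [
    "Great phone, great battery life",
    "The battery lasts all day and the camera is great",
]
stop_words = {"the", "and", "is", "all", "a"}

def word_frequencies(texts, stop_words):
    """Count lower-cased words across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts

frequencies = word_frequencies(very_positive_reviews, stop_words)
```

With the wordcloud package imported earlier, `WordCloud().generate_from_frequencies(frequencies)` should turn these counts into an image, and its `STOPWORDS` set could replace the hand-written list.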