LING-460: Textual Analysis with R at UNC-Chapel Hill, Spring 2025.
Consumer reviews provide rich insights into the evaluative language that people use to express their satisfaction or dissatisfaction. The way consumers articulate their experiences can reveal underlying linguistic patterns, where factors such as lexical diversity and emotional tone contribute to the expression of opinions. Understanding these patterns has implications for marketing, consumer research, and sentiment analysis in natural language processing. Our project addresses the broader question of whether the language used in negative reviews differs fundamentally from that used in positive reviews.
Dataset: https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews?resource=download.
Do negative reviews exhibit lower lexical diversity and higher negative sentiment intensity than positive reviews?
We hypothesize that:
1. Negative reviews exhibit lower lexical diversity (measured as the type-token ratio, TTR) than positive reviews.
2. Negative reviews show higher negative sentiment intensity (a greater proportion of negative words) than positive reviews.
Based on these hypotheses, our approach involved the following steps:
Data Acquisition:
We used the publicly available Amazon Fine Food Reviews dataset from Kaggle, containing over 500,000 reviews with ratings spanning from 1 to 5 stars.
Data Cleaning:
We converted the Time column from Unix timestamps to human-readable dates and verified that the Score field was numeric.
Text Processing:
Using the tidytext package, we tokenized the review texts and counted the frequency of negative words.
Summary statistics by sentiment group:

Sentiment | Mean TTR | SD TTR | Mean Negative Intensity | SD Negative Intensity | Count |
---|---|---|---|---|---|
Negative | 0.807 | 0.102 | 0.0368 | 0.0296 | 82,037 |
Positive | 0.827 | 0.100 | 0.0179 | 0.0206 | 443,777 |
Interpretation:
Negative reviews have a lower average TTR and higher negative sentiment intensity compared to positive reviews.
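The preprocessing and feature-extraction steps above can be sketched in R roughly as follows. The column names (Id, Score, Time, Text) come from the Kaggle file; the star-rating cutoffs (1–2 stars negative, 4–5 positive, 3-star reviews dropped) are an inference from the group counts in the table, and the variable names are illustrative.

```r
library(dplyr)
library(tidytext)

# Load the Kaggle file; Score, Time, and Text are its standard columns.
reviews <- read.csv("Reviews.csv", stringsAsFactors = FALSE) %>%
  mutate(
    Date = as.POSIXct(Time, origin = "1970-01-01"),  # Unix time -> date
    sentiment = case_when(
      Score <= 2 ~ "Negative",   # 1-2 stars
      Score >= 4 ~ "Positive"    # 4-5 stars; 3-star reviews fall out as NA
    )
  ) %>%
  filter(!is.na(sentiment))

# Negative words from the Bing lexicon bundled with tidytext
neg_words <- get_sentiments("bing") %>% filter(sentiment == "negative")

# Tokenize and compute per-review TTR and negative-word intensity
features <- reviews %>%
  select(Id, sentiment, Text) %>%
  unnest_tokens(word, Text) %>%
  group_by(Id, sentiment) %>%
  summarise(
    ttr = n_distinct(word) / n(),                    # type-token ratio
    neg_intensity = mean(word %in% neg_words$word),  # share of negative tokens
    .groups = "drop"
  )
```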
The linear regression model predicting TTR from a binary sentiment indicator and negative sentiment intensity produced significant results, discussed under Regression Modeling below.
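A minimal sketch of this model, assuming the per-review feature table built during preprocessing (the names `features`, `ttr`, and `neg_intensity` are illustrative):

```r
# 0/1 indicator: 1 for negative reviews, 0 for positive
features$sentiment_negative <- as.integer(features$sentiment == "Negative")

model <- lm(ttr ~ sentiment_negative + neg_intensity, data = features)
summary(model)  # coefficient estimates, standard errors, t- and p-values
```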
Our analysis included:
Descriptive Comparison:
The summary statistics indicated clear differences in both TTR and negative sentiment intensity between negative and positive reviews.
Statistical Testing:
Both independent-sample t-tests resulted in highly significant differences between the groups (p < 2.2e-16). The significant t-test results for TTR confirm that negative reviews use more repetitive language (lower lexical diversity) than positive reviews. The t-test results for negative sentiment intensity demonstrate that negative reviews incorporate a higher proportion of negative words.
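These comparisons can be sketched with R's formula interface to `t.test` (which defaults to the Welch two-sample test); the data frame `features` with columns `ttr`, `neg_intensity`, and `sentiment` is assumed from the preprocessing step:

```r
t.test(ttr ~ sentiment, data = features)            # lexical diversity (TTR)
t.test(neg_intensity ~ sentiment, data = features)  # negative-word intensity
```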
Regression Modeling:
The regression analysis further supported these findings. The significant negative coefficient for the sentiment binary variable confirmed that negative sentiment is associated with reduced lexical diversity. Although the contribution of negative intensity to TTR was marginally significant, its inclusion in the model offers additional nuance to the relationship between review language and sentiment.
The full report also includes key visualizations of these results.
Based on our analyses, we conclude the following:
Lexical Diversity:
Negative reviews exhibit significantly lower lexical diversity (i.e., a lower TTR) than positive reviews. This supports our hypothesis that negative reviews tend to be more repetitive in their language use.
Negative Sentiment Intensity:
Negative reviews have a significantly higher intensity of negative words than positive reviews, indicating more emotionally charged language in negative reviews.
Regression Findings:
The regression model reinforces these findings by revealing that review sentiment is a significant predictor of TTR. Even though the effect of negative sentiment intensity on lexical diversity was less pronounced, the overall model confirms that sentiment plays an important role.
Implications:
These conclusions suggest that evaluative language in negative reviews is characterized by less lexical variety and a greater focus on negative sentiment. This may have implications for improving sentiment analysis tools, refining review summarization techniques, and understanding consumer behavior.
Future Directions:
Further research could explore additional linguistic features (e.g., syntactic complexity, use of modifiers) or investigate how these patterns might evolve over time. Additionally, comparing these findings with other types of online reviews could provide broader insights.