Sentiment score calculation

An in-depth look at how we calculate the sentiment score using the VADER lexicon tool.

After collecting the verbatim (open-ended comment) from the survey or other data source:

  1. We process it with the sentiment tool.
  2. The tool uses industry-standard VADER sentiment analysis.
  3. The score attributed ranges from -5 to 5.
  4. A positive comment has a score > 0.
  5. A negative comment has a score < 0.
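
As a minimal sketch of the last two steps, the sign of the score can be mapped to a label as shown here. The function name sentiment_label is hypothetical, and treating a score of exactly 0 as "neutral" is an assumption, since only the positive and negative cases are defined above.

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score to a label using only its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Assumption: a score of exactly 0 is reported as neutral.
    return "neutral"


for example_score in (3.2, -1.5, 0.0):
    print(example_score, sentiment_label(example_score))
```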
     

Lexicon and rule-based sentiment analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media and open text. It is fully open source, released under the MIT License.

This is an overview of the technical aspects of VADER: https://github.com/cjhutto/vaderSentiment

And this is a more theoretical overview of VADER’s application: https://blog.quantinsti.com/vader-sentiment/

 


VADER is sensitive to both the

  • Polarity of a word (whether the sentiment is positive or negative), and the
  • Intensity of the emotions associated with the word (how positive or negative the attributed sentiment is)


Valence Scoring

VADER captures both by assigning a valence score to each word it takes into consideration.

Example valence scores for some context-free text (that is, words taken out of context, with no surrounding text to amplify or change their face-value meaning) are:

  • Positive Valence: "okay" is 0.9, "good" is 1.9, and "great" is 3.1
  • Negative Valence: "horrible" is -2.5, the emoticon ":(" is -2.2, and "sucks" and its slang derivative "sux" are both -1.5
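
To illustrate how a lexicon lookup works, here is a small sketch that uses only the valence values listed above. The dictionary is a tiny stand-in for the full VADER lexicon, and the function name word_valences and the simple whitespace tokenization are assumptions made for this example.

```python
from __future__ import annotations

# Valence values quoted in this article; the real lexicon holds thousands of entries.
LEXICON = {
    "okay": 0.9,
    "good": 1.9,
    "great": 3.1,
    "horrible": -2.5,
    ":(": -2.2,
    "sucks": -1.5,
    "sux": -1.5,
}


def word_valences(text: str) -> list[tuple[str, float]]:
    """Return (token, valence) pairs for every token found in the lexicon."""
    found = []
    for token in text.lower().split():
        # Keep emoticons such as ":(" intact; otherwise strip edge punctuation.
        key = token if token in LEXICON else token.strip(".,!?")
        if key in LEXICON:
            found.append((key, LEXICON[key]))
    return found


print(word_valences("The food was great, but the service sucks :("))
# prints [('great', 3.1), ('sucks', -1.5), (':(', -2.2)]
```

Words that are not in the lexicon, such as "food" or "service", contribute nothing on their own.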

Note that the positive, negative and neutral proportions that VADER reports represent the "raw categorization" of each lexical item (e.g., words, emoticons/emojis, or initialisms) into positive, negative, or neutral classes.

They do not account for the VADER rule-based enhancements such as:

  • word-order sensitivity for sentiment-laden multi-word phrases
  • degree modifiers
  • word-shape amplifiers
  • punctuation amplifiers
  • negation polarity switches, or
  • contrastive conjunction sensitivity.


How does VADER calculate the Valence score of an input sentence?

VADER makes use of certain rules to incorporate the impact of each sub-text on the perceived intensity of sentiment in sentence-level text.

Five Heuristics of VADER:

  1. Punctuation, namely the exclamation point (!), increases the magnitude of the intensity without modifying the semantic orientation. For example: “The weather is hot!!!” is more intense than “The weather is hot.”
     
  2. Capitalization, specifically using ALL-CAPS to emphasize a sentiment-relevant word in the presence of other non-capitalized words, increases the magnitude of the sentiment intensity without affecting the semantic orientation. For example: “The weather is HOT.” conveys more intensity than “The weather is hot.”
     
  3. Degree modifiers (also called intensifiers, booster words, or degree adverbs) impact sentiment intensity by either increasing or decreasing the intensity. For example: “The weather is extremely hot.” is more intense than “The weather is hot.”, whereas “The weather is slightly hot.” reduces the intensity.
     
  4. Polarity shift due to conjunctions: the contrastive conjunction “but” signals a shift in sentiment polarity, with the sentiment of the text following the conjunction being dominant. For example: “The weather is hot, but it is bearable.” has mixed sentiment, with the latter half dictating the overall rating.

  5. Catching polarity negation: by examining the contiguous sequence of 3 items preceding a sentiment-laden lexical feature, VADER catches nearly 90% of cases where negation flips the polarity of the text.
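
The word-level heuristics (3, 4 and 5) can be sketched with a toy scorer like the one below. This is a simplified illustration and not VADER's actual implementation: the booster and negation word lists and every weight are assumed values chosen for the example, and only the lexicon valences come from this article.

```python
# Lexicon valences quoted in this article.
LEXICON = {"good": 1.9, "great": 3.1, "horrible": -2.5}

# Everything below is an assumption made for illustration.
BOOSTERS = {"extremely": 0.3, "very": 0.3, "slightly": -0.3}
NEGATIONS = {"not", "never", "isn't", "wasn't"}
NEGATION_SCALE = -0.7               # flips and dampens a negated word
BUT_BEFORE, BUT_AFTER = 0.5, 1.5    # weights for text before/after "but"


def score_tokens(text: str) -> float:
    """Sum word valences after applying the toy word-level heuristics."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    scores = []
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        if valence != 0.0:
            # Heuristic 3: a degree modifier directly before the word
            # pushes its valence away from or toward zero.
            if i > 0 and tokens[i - 1] in BOOSTERS:
                boost = BOOSTERS[tokens[i - 1]]
                valence += boost if valence > 0 else -boost
            # Heuristic 5: a negation within the 3 preceding tokens flips polarity.
            if any(word in NEGATIONS for word in tokens[max(0, i - 3):i]):
                valence *= NEGATION_SCALE
        scores.append(valence)
    # Heuristic 4: text after "but" dominates text before it.
    if "but" in tokens:
        pivot = tokens.index("but")
        scores = [s * (BUT_BEFORE if i < pivot else BUT_AFTER)
                  for i, s in enumerate(scores)]
    return sum(scores)


for sentence in (
    "The food is good",
    "The food is extremely good",
    "The food is slightly good",
    "The food is not good",
    "The food is good, but the service is horrible",
):
    print(f"{score_tokens(sentence):+.2f}  {sentence}")
```

With these assumed weights, the boosted sentence scores higher than the plain one, the negated sentence turns negative, and the "but" sentence is pulled toward the sentiment of its second half.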


In short, details matter:

  • punctuation - how many exclamation marks are used, if any
  • emoticons/emojis - the algorithm recognizes these too, as well as common slang words (like ‘meh’ and ‘bleh’)
  • using capitals - words written in all caps are scored differently


So, for example, a group of sentences that at first glance appear identical, but in fact have different punctuation and emphasis, can be scored quite differently.
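
A minimal sketch of this effect, using a toy scorer rather than VADER itself, is shown below. The step sizes, the cap on exclamation marks and the function name emphasis_score are assumptions chosen for illustration; only the lexicon valences come from this article.

```python
# Lexicon valences quoted in this article.
LEXICON = {"good": 1.9, "great": 3.1, "horrible": -2.5}

# Assumed values, for illustration only.
EXCLAMATION_STEP = 0.3   # extra intensity per "!"
MAX_EXCLAMATIONS = 4     # "!" beyond this count adds nothing more
CAPS_STEP = 0.7          # extra intensity for an ALL-CAPS sentiment word


def emphasis_score(text: str) -> float:
    """Score a sentence, adding emphasis for ALL-CAPS words and "!"."""
    words = [w.strip(".,!?") for w in text.split()]
    words = [w for w in words if w]
    # ALL-CAPS only counts as emphasis when other words are not capitalized.
    mixed_case = (any(w.isupper() for w in words)
                  and not all(w.isupper() for w in words))
    total = 0.0
    for word in words:
        valence = LEXICON.get(word.lower(), 0.0)
        if valence and mixed_case and word.isupper():
            valence += CAPS_STEP if valence > 0 else -CAPS_STEP
        total += valence
    # Exclamation marks raise the magnitude without changing the sign.
    if total:
        bump = min(text.count("!"), MAX_EXCLAMATIONS) * EXCLAMATION_STEP
        total += bump if total > 0 else -bump
    return total


for sentence in (
    "The service is great.",
    "The service is great!",
    "The service is great!!!",
    "The service is GREAT!!!",
):
    print(f"{emphasis_score(sentence):.2f}  {sentence}")
```

Each variant in the loop scores higher than the one before it, even though the wording is the same.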

 

 
