Dictionary-Based Sentiment Analysis in Python


This recipe shows how to conduct dictionary-based sentiment analysis on a collection of passages, such as tweets or reviews. It uses pre-existing dictionaries of positive and negative words, and loads a text file of passages to analyze.

Exercise Steps 

Prepare the Data for Analysis
Dictionary-based sentiment analysis works by comparing the words in a text or corpus with pre-established dictionaries of words. These dictionaries could be built around positive/negative words or around other contrasts, such as professional/casual language. Each dictionary contains a list of words that the user believes correspond to a particular quality (e.g. 'positivity', 'impoliteness'). This recipe will focus on evaluating the positivity/negativity of a text. The first steps are to create or acquire dictionary files for positive and negative words, import the dictionaries and tokenize them into positive/negative word lists, and import a text to be analyzed.
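
The loading steps above might be sketched as follows. This is a minimal illustration, not a definitive implementation: the file names (positive.txt, negative.txt, passages.txt), the helper names, and the one-passage-per-line layout are all assumptions, and the sketch writes tiny stand-in files to a temporary folder so that it runs on its own.

```python
import re
import tempfile
from pathlib import Path


def tokenize(text):
    """Lowercase the text and return its words as a list of tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def load_word_list(path):
    """Read a dictionary file and return its words as a tokenized list."""
    return tokenize(Path(path).read_text(encoding="utf-8"))


def load_passages(path):
    """Read a text file, assuming one passage per line; skip blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Stand-in files for demonstration only; real dictionaries are far larger.
with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    (folder / "positive.txt").write_text("good\ngreat\nlove\nhappy\n", encoding="utf-8")
    (folder / "negative.txt").write_text("bad\nawful\nhate\nsad\n", encoding="utf-8")
    (folder / "passages.txt").write_text(
        "I love this great phone\n\nThe battery is awful\n", encoding="utf-8"
    )

    positive_words = load_word_list(folder / "positive.txt")
    negative_words = load_word_list(folder / "negative.txt")
    passages = load_passages(folder / "passages.txt")

print(positive_words)
print(negative_words)
print(passages)
```

With real data, the paths would point at your own dictionary and passage files instead of the temporary folder.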

Create Reusable Functions
You will want the ability to easily analyze new passages, so create functions to automate this task. One function simply takes a string of the passage to be analyzed and returns the words as a tokenized list. The main function takes the passage to analyze and the positive/negative word lists. It uses the first function to tokenize the passage, then compares the passage tokens with the word lists, evaluating whether each word is positive, negative, or neutral. The function sums the results and returns a score for overall positivity/negativity. If it were instead scanning for the professionalism of the language, it might return a value reflecting the ratio of professional/unprofessional words used.
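
One possible sketch of these two functions is shown below. The names tokenize and sentiment_score, the regular-expression tokenizer, and the scoring rule (+1 for each positive word, -1 for each negative word, 0 otherwise) are assumptions chosen for illustration; the short word lists stand in for full dictionaries.

```python
import re


def tokenize(passage):
    """Lowercase the passage and return its words as a list of tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", passage.lower())


def sentiment_score(passage, positive_words, negative_words):
    """Return (positive matches) minus (negative matches) for the passage."""
    positive = set(positive_words)  # sets make each lookup fast
    negative = set(negative_words)
    score = 0
    for token in tokenize(passage):
        if token in positive:
            score += 1
        elif token in negative:
            score -= 1
        # Tokens in neither list are neutral and leave the score unchanged.
    return score


# Tiny illustrative word lists, not real dictionaries.
positive_words = ["good", "great", "love", "happy"]
negative_words = ["bad", "awful", "hate", "sad"]

print(sentiment_score("I love this great phone", positive_words, negative_words))  # 2
print(sentiment_score("The battery is awful and I hate it", positive_words, negative_words))  # -2
```

A raw count like this favors longer passages, so dividing the score by the number of tokens is a common variation when passages differ greatly in length.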

Analyze Passages
The function can give us an overall score for a passage's positivity/negativity, but the raw score alone does not capture the subtleties of the results; for example, a passage that scores just above zero may not be meaningfully positive. This can be handled by creating variables that allow the user to set thresholds: a passage that scores below the lower threshold is negative, one that scores above the upper threshold is positive, and anything in between is neutral. The analysis function can now be run and its result compared to the thresholds to present the user with a clear "Positive," "Negative," or "Neutral" result.
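
The threshold comparison could look something like the sketch below. The threshold values, the variable names, and the classify function are hypothetical; the scores passed in are assumed to come from whatever scoring function you wrote in the previous step.

```python
# User-adjustable thresholds (illustrative values, not recommendations).
NEGATIVE_THRESHOLD = -1  # scores below this are labeled "Negative"
POSITIVE_THRESHOLD = 1   # scores above this are labeled "Positive"


def classify(score, negative_threshold=NEGATIVE_THRESHOLD,
             positive_threshold=POSITIVE_THRESHOLD):
    """Turn a numeric sentiment score into a Positive/Negative/Neutral label."""
    if score < negative_threshold:
        return "Negative"
    if score > positive_threshold:
        return "Positive"
    return "Neutral"


# Example scores, assumed to have been produced by the analysis function.
for score in (2, 1, 0, -2):
    print(score, classify(score))
```

Widening the gap between the two thresholds makes the "Neutral" band larger, which may be useful when short passages with a single matching word should not count as clearly positive or negative.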

Organize the Data
Now that we can evaluate each passage, we may want to gather comparable passages (e.g. all positive ones). We can do this by creating an empty list for all positive passages, and running a loop to analyze the positivity of each passage. We can then add it to the list whenever the analysis returns it as positive. We may also want to gather all of the positive words being used in a passage. Create an empty list, and a new function that compares each word to the positive dictionary. Each time it finds a match, the word is added to the list. This approach can be duplicated to gather the negative words in a passage.
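
Both collection steps might be sketched as follows. All names here (sentiment_score, matching_words, POSITIVE_THRESHOLD) and the sample passages are assumptions for illustration, and the scoring rule is the simple +1/-1 count rather than any particular published method.

```python
import re


def tokenize(passage):
    """Lowercase the passage and return its words as a list of tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", passage.lower())


def sentiment_score(passage, positive_words, negative_words):
    """Return (positive matches) minus (negative matches) for the passage."""
    positive, negative = set(positive_words), set(negative_words)
    score = 0
    for token in tokenize(passage):
        if token in positive:
            score += 1
        elif token in negative:
            score -= 1
    return score


def matching_words(passage, word_list):
    """Collect every token in the passage that appears in word_list."""
    words = set(word_list)
    found = []  # start with an empty list
    for token in tokenize(passage):
        if token in words:
            found.append(token)  # add each match as it is found
    return found


# Tiny illustrative data, not real dictionaries or a real corpus.
positive_words = ["good", "great", "love", "happy"]
negative_words = ["bad", "awful", "hate", "sad"]
passages = [
    "I love this great phone",
    "The battery is awful and I hate it",
    "It arrived on Tuesday",
]
POSITIVE_THRESHOLD = 0  # hypothetical cut-off; scores above it count as positive

# Gather all passages that the analysis rates as positive.
positive_passages = []
for passage in passages:
    if sentiment_score(passage, positive_words, negative_words) > POSITIVE_THRESHOLD:
        positive_passages.append(passage)

print(positive_passages)
print(matching_words(passages[0], positive_words))
print(matching_words(passages[1], negative_words))
```

Because matching_words takes the word list as a parameter, the same function gathers negative words when it is given the negative dictionary.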

Further Information
Sentiment analysis is widely used on social media, and so there are many tools available for undertaking it. LIWC (Linguistic Inquiry and Word Count) is a notable commercial example that analyzes how much certain categories of words are used in a text.