TATR: General Inquirer Category

Introduction 

This recipe is part of the Text Analysis for Twitter Research (TATR) series. It shows how to categorize text using the General Inquirer categories released by Harvard.

Ingredients 
Steps 
  • Open a new Jupyter Notebook and import the following libraries:
    • pandas
    • NumPy
    • Matplotlib
    • csv
  • Open your CSV file and load its contents into a variable
  • Declare the categories you wish to use
  • Define helper functions for the categories
    • If a word has not been seen yet, store all of its categories in a set
    • Combine all matching words into one larger set
    • Assign the categories to every word in the set
    • Create one column per category and fill it with the number of words in each text that belong to that category
    • Loop through the tokens and increment the count of every category each token belongs to
  • Initialize the categories using the helper functions
  • Use the helper functions to count the occurrences of each category, and set the index to the date
  • Plot the category counts on one graph to compare them with each other
  • Define a function to convert the category data into percentages
  • Plot the category percentages on a graph to see how they change over time
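The category-building and counting steps above can be sketched in Python as follows. This is a minimal sketch, not the recipe's own code. It assumes the General Inquirer spreadsheet has an `Entry` column plus one column per category, where a non-empty cell marks membership, and that separate word senses are written like `ABOUT#1` and `ABOUT#2`. The sample rows, the example tweets, and the names `build_lexicon` and `count_categories` are hypothetical, and tokenization is a naive whitespace split.

```python
import io
from collections import defaultdict

import pandas as pd

# Hypothetical excerpt in the ASSUMED General Inquirer layout: an "Entry"
# column, then one column per category (non-empty cell = membership).
# Word senses are assumed to be suffixed like "#1", "#2".
GI_SAMPLE = """Entry,Positiv,Negativ,Strong
ABANDON,,Negativ,
ABLE,Positiv,,Strong
ABOUT#1,,,
ABOUT#2,,,Strong
WIN,Positiv,,Strong
"""

CATEGORIES = ["Positiv", "Negativ", "Strong"]  # categories we choose to track


def build_lexicon(gi, categories):
    """Map each lowercase word to the set of categories it belongs to,
    merging all senses of a word (e.g. ABOUT#1, ABOUT#2) into one set."""
    lexicon = defaultdict(set)
    for _, row in gi.iterrows():
        word = str(row["Entry"]).split("#")[0].lower()  # drop the sense suffix
        for cat in categories:
            if pd.notna(row[cat]) and str(row[cat]).strip():
                lexicon[word].add(cat)
    return dict(lexicon)


def count_categories(tweets, lexicon, categories):
    """Return a copy of `tweets` with one column per category holding how
    many tokens of each text belong to that category."""
    counts = {cat: [] for cat in categories}
    for text in tweets["text"]:
        row = dict.fromkeys(categories, 0)
        for token in text.lower().split():  # naive whitespace tokenizer
            for cat in lexicon.get(token.strip(".,!?"), ()):
                row[cat] += 1
        for cat in categories:
            counts[cat].append(row[cat])
    out = tweets.copy()
    for cat in categories:
        out[cat] = counts[cat]
    return out


gi = pd.read_csv(io.StringIO(GI_SAMPLE))
lexicon = build_lexicon(gi, CATEGORIES)

# Hypothetical tweets; in the recipe these come from your own CSV file.
tweets = pd.DataFrame({
    "date": pd.to_datetime(["2018-05-01", "2018-05-02"]),
    "text": ["We will win, not abandon hope!", "Able to talk about it"],
})
result = count_categories(tweets, lexicon, CATEGORIES).set_index("date")
print(result[CATEGORIES])
```

With real data, `GI_SAMPLE` would be replaced by the General Inquirer file you downloaded and `tweets` by the DataFrame loaded from your CSV. Storing each word's categories as a set means a word listed under several senses is still counted only once per category.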
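The final steps, converting category counts into percentages and plotting them over time, might look like the sketch below. It assumes that "percentage" means each category's share of all categorized tokens for a given day; the recipe does not pin down the denominator, so adjust it if you want a share of all tokens instead. The function name `to_percentages` and the sample counts are hypothetical.

```python
import pandas as pd


def to_percentages(counts):
    """Convert per-row category counts into each category's share (0-100)
    of that row's total categorized tokens (ASSUMED definition)."""
    totals = counts.sum(axis=1)
    # Rows with no categorized tokens would divide by zero; report 0 instead.
    return counts.div(totals.where(totals != 0), axis=0).mul(100).fillna(0)


# Hypothetical per-tweet counts indexed by date, as produced by the counting step.
counts = pd.DataFrame(
    {"Positiv": [3, 1, 0], "Negativ": [1, 1, 0], "Strong": [0, 2, 0]},
    index=pd.to_datetime(["2018-05-01", "2018-05-01", "2018-05-02"]),
)
daily = counts.resample("D").sum()  # aggregate tweets per day
percentages = to_percentages(daily)
print(percentages)
```

In the notebook, `percentages.plot(title="General Inquirer categories (%)")` then draws one line per category with the date on the x-axis, using pandas' built-in Matplotlib plotting.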
Discussion 

The TATR library was presented as an academic poster at the 2018 Congress, held in Regina, SK. For a PDF version of the full poster, please visit:

Next steps / further information 

Certain aspects of this recipe draw upon code from the companion TATR notebooks and recipes. In particular, please see:

TATR: Panda and CSV of Tweets

TATR: Graphing Twitter Data

Status