TATR: General Inquirer Category
Introduction
This recipe is part of the Text Analysis for Twitter Research (TATR) series. It categorizes text using the General Inquirer categories released by Harvard.
Ingredients
- Python 3
- Pandas
- NumPy
- Matplotlib (Plotting)
- A Comma Separated Values (CSV) file
- Kynan Ly’s sample code available on TAPoR
Steps
- Open a new Jupyter Notebook and import the following libraries:
  - pandas
  - numpy
  - matplotlib
  - csv
- Open your CSV file and load its contents into a variable
- Declare the categories you wish to use
- Define helper functions for the categories:
  - Store all the categories of a word in a set, if that word has not been seen already
  - Combine all matching words into one larger set
  - Assign the categories to every word in a set
  - Create columns and fill them with how many words of each category the text contains
  - Look through the tokens and increment the count of each category a token belongs to
- Initialize the categories using the helper functions
- Use the helper functions to count the occurrences of each category, and index the resulting counts by date
- Display the categories in comparison to each other using a graph
- Define a function to convert the category data into percentages
- Plot the category data percentages in a graph to see the percentage over time
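
The steps above can be sketched roughly as follows. This is a minimal illustration, not Kynan Ly's sample code: the two inline CSV strings stand in for the General Inquirer spreadsheet and your tweet file, and their layouts (an `Entry` column, one column per category, `date` and `text` columns) are assumptions. The function names `load_lexicon`, `count_categories`, `to_percentages` and `plot_categories` are hypothetical.

```python
import csv
import io
import re

import pandas as pd

# Assumed stand-in for the General Inquirer spreadsheet: one row per word
# sense, one column per category, a non-empty cell marking membership.
LEXICON_CSV = """Entry,Positiv,Negativ,Strong,Weak
GOOD,Positiv,,,
LOVE#1,Positiv,,,
LOVE#2,Positiv,,Strong,
BAD,,Negativ,,
FAIL,,Negativ,,Weak
"""

# Assumed stand-in for the tweet CSV (the column names are illustrative).
TWEETS_CSV = """date,text
2018-05-01,I love this good idea
2018-05-01,Bad plan and it will fail
2018-05-02,Love it
"""

# The categories we wish to use.
CATEGORIES = ["Positiv", "Negativ", "Strong", "Weak"]


def load_lexicon(handle, categories):
    """Map each word to the set of categories it belongs to."""
    lexicon = {}
    for row in csv.DictReader(handle):
        # Drop sense markers such as "#1" so all senses share one entry.
        word = row["Entry"].split("#")[0].lower()
        tags = {c for c in categories if row.get(c)}
        # Create the set on first sight of the word, then merge into it.
        lexicon.setdefault(word, set()).update(tags)
    return lexicon


def count_categories(text, lexicon, categories):
    """Count how many tokens of `text` fall into each category."""
    counts = dict.fromkeys(categories, 0)
    for token in re.findall(r"[a-z']+", text.lower()):
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def to_percentages(counts):
    """Convert each row of counts to percentages of that row's total."""
    totals = counts.sum(axis=1)
    # Rows with no matches become NaN after division, then 0.
    return counts.div(totals.where(totals != 0), axis=0).fillna(0) * 100


def plot_categories(frame, title):
    """Draw one line per category; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    frame.plot(ax=ax, title=title)
    ax.set_xlabel("date")
    return ax


lexicon = load_lexicon(io.StringIO(LEXICON_CSV), CATEGORIES)
tweets = pd.read_csv(io.StringIO(TWEETS_CSV), parse_dates=["date"])

# One row of category counts per tweet, indexed by the tweet's date.
per_tweet = pd.DataFrame(
    [count_categories(t, lexicon, CATEGORIES) for t in tweets["text"]],
    index=tweets["date"],
)
daily_counts = per_tweet.groupby(level=0).sum()
daily_percentages = to_percentages(daily_counts)
```

In a notebook, calling `plot_categories(daily_counts, "Category counts")` and `plot_categories(daily_percentages, "Category share (%)")` would draw the two graphs. With the sample data above, 1 May 2018 yields 2 Positiv, 2 Negativ, 1 Strong and 1 Weak match. To use real data, replace the `io.StringIO(...)` arguments with open file handles or file paths and adjust the column names to match your files.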
Discussion
The TATR library was presented as an academic poster at the 2018 Congress held in Regina, SK. For a PDF version of the full poster, please visit:
Status
Submitted by Jason on Tue, 05/01/2018 - 16:49