TATR: Finding Popular Hashtags
Introduction
This recipe is part of the Text Analysis for Twitter Research (TATR) series. It shows how to use a dataset of tweets to find the most popular hashtag for each date. The results can then be stored in a pandas DataFrame for further manipulation and visualized by plotting each date's most popular hashtag over time.
Ingredients
- Python 3
- Natural Language Toolkit (NLTK)
- pandas
- NumPy
- Matplotlib (plotting)
- Python's ast module (Abstract Syntax Trees)
- Twitter Tweet data
- Kynan Ly’s sample code available on TAPoR
Steps
- Open a new Jupyter Notebook and import the following libraries:
  - NLTK
  - pandas
  - NumPy
  - Matplotlib
  - ast
- Import Twitter data
- Collapse each date’s hashtags into a list
- Write a helper function to determine the most popular hashtag for each date
- Use pandas' apply method with a lambda function to save each date's most popular hashtag into new columns and display the result as a graph
- Save the pandas DataFrame of the most popular hashtags as a CSV file
- Apply a graphing function to visualize the results
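The steps above can be sketched roughly as follows. This is a minimal illustration, not Kynan Ly's sample code: the column names `date` and `hashtags`, the inline sample data, and the helper names `most_popular` and `export_and_plot` are all assumptions made for the example. It also assumes the hashtags are stored as stringified Python lists (hence `ast.literal_eval`) and uses `collections.Counter` from the standard library for the frequency count rather than NLTK.

```python
import ast
from collections import Counter
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for a real tweet export. The column
# names "date" and "hashtags" are assumptions, not a documented schema.
SAMPLE_CSV = """date,hashtags
2018-05-01,"['python', 'data']"
2018-05-01,"['python']"
2018-05-01,"[]"
2018-05-02,"['nlp', 'data']"
2018-05-02,"['nlp', 'python']"
2018-05-02,"['nlp']"
"""

# Import the Twitter data (here from the inline sample).
tweets = pd.read_csv(StringIO(SAMPLE_CSV))

# Each hashtag cell is a string such as "['python', 'data']";
# ast.literal_eval safely turns it back into a Python list.
tweets["hashtags"] = tweets["hashtags"].apply(ast.literal_eval)

# Collapse each date's hashtags into a single flat list.
daily = (
    tweets.groupby("date")["hashtags"]
    .apply(lambda lists: [tag for tags in lists for tag in tags])
    .reset_index()
)


def most_popular(tags):
    """Return (hashtag, count) for the most frequent tag, or (None, 0)."""
    if not tags:
        return (None, 0)
    return Counter(tags).most_common(1)[0]


# Use apply with a lambda to store the winner and its count as columns.
daily["top_hashtag"] = daily["hashtags"].apply(lambda tags: most_popular(tags)[0])
daily["top_count"] = daily["hashtags"].apply(lambda tags: most_popular(tags)[1])


def export_and_plot(frame, csv_path="popular_hashtags.csv"):
    """Save the results as a CSV file and plot the top hashtag per date."""
    # Imported here so the counting steps above do not require Matplotlib.
    import matplotlib.pyplot as plt

    frame[["date", "top_hashtag", "top_count"]].to_csv(csv_path, index=False)

    positions = list(range(len(frame)))
    fig, ax = plt.subplots()
    ax.plot(positions, frame["top_count"], marker="o")
    ax.set_xticks(positions)
    ax.set_xticklabels(frame["date"])
    # Label each point with the hashtag that won on that date.
    for x, tag, count in zip(positions, frame["top_hashtag"], frame["top_count"]):
        ax.annotate(str(tag), (x, count))
    ax.set_xlabel("Date")
    ax.set_ylabel("Uses of most popular hashtag")
    plt.show()
```

In a notebook, calling `export_and_plot(daily)` writes the CSV file (the default name `popular_hashtags.csv` is arbitrary) and draws the graph. To run the sketch on real data, replace `StringIO(SAMPLE_CSV)` with the path to your own tweet file and adjust the column names to match it.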
Discussion
The TATR library was presented as an academic poster at Congress 2018, held in Regina, SK. For a PDF version of the full poster, please visit:
Status
Submitted by Jason on Tue, 05/01/2018 - 16:44