This recipe is part of the Text Analysis for Twitter Research (TATR) series and describes how to begin plotting basic graphs using Twitter data.
This recipe is part of the Text Analysis for Twitter Research (TATR) series. The recipe will look at categorizing text using the General Inquirer Categories released by Harvard.
This recipe is part of the Text Analysis for Twitter Research (TATR) series. In this recipe we will show you how to use a dataset of Tweets to find the most popular hashtags by date. The results can then be manipulated by placing them in a Panda dataframe and visualized by plotting the most popular hashtag points over time.
Multiple Correspondence Analysis (MCA) is a data analysis technique that can detect and represent the underlying structures of a dataset. In terms of textual analysis, we can identify and graph simultaneously occurring variables from the texts that comprise a corpus.
You can retrieve the MCA module at https://pypi.python.org/pypi/mca/1.0.3, or by typing the following line in the Python terminal:
pip install --user mca
This recipe uses regular expressions (or Regex) to clean a text document. This recipe is based on the Using Regular Expressions to Clean a Text code.
This recipe will use regular expressions to clean up a webpage. This is useful if you want to carry out any meaningful textual analysis of the content in a web page. We can remove the html tags and other unnecessary textual elements with this method.
This recipe with show you how to prepare Voronoi diagrams, one way of showing relationships between words in a text and a search term. In order to do this, we will employ the use of word embeddings. These represent individual words in a text as real-valued vectors in a confined vector space. This recipe is based on Kynan Ly's cookbook as seen on this notebook.
This recipe will show how to generate basic concordances of a word and showing it within a textual context. We will use "find and replace" strategy known as regular expressions in this approach. Regex, as it is also known is universal to most programming languages and is a well documented method of parsing text. This recipe is based on Jinman's cookbook.