There are many different methods for studying Twitter data. Unlike more traditional texts (e.g. books), discourse on Twitter can encompass a wide range of perspectives, ideas, thoughts, and contexts. Analyzing this discourse requires different methods of cleaning, refinement, and categorization.
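As a rough illustration of tweet-specific cleaning, the sketch below strips links and @mentions, keeps hashtag words without the "#", and lowercases the result. The function name `clean_tweet`, the regular expressions, and the sample tweet are hypothetical choices for illustration, not rules taken from any particular tutorial.

```python
import re

# Hypothetical cleaning rules; real projects choose their own.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

def clean_tweet(text):
    """Return a lowercased tweet with URLs and @mentions removed."""
    text = URL_RE.sub("", text)       # drop links
    text = MENTION_RE.sub("", text)   # drop user mentions
    text = text.replace("#", "")      # keep hashtag words, drop the '#' symbol
    return " ".join(text.lower().split())  # lowercase and collapse whitespace

print(clean_tweet("Loving the #DigitalHumanities panel! @someone https://t.co/abc123"))
```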

In this lesson, we will make the list we created in the ‘From HTML to a List of Words’ lesson easier to analyze by normalizing this data.
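Normalization of this kind typically means lowercasing the text and discarding punctuation so that "Hello" and "hello," count as the same word. The following is a minimal sketch of that idea using Python's `re` module; the function names are hypothetical and the lesson's own code may differ in detail.

```python
import re

def strip_non_alphanum(text):
    """Split text on runs of non-alphanumeric characters, dropping empty strings."""
    # \W+ matches one or more characters that are not letters, digits, or underscore
    return [w for w in re.split(r"\W+", text) if w]

def normalize(text):
    """Lowercase the text, then split it into a list of bare words."""
    return strip_non_alphanum(text.lower())

print(normalize("Hello, World! Hello again."))
```

Note that splitting on `\W+` also breaks contractions apart (e.g. "it's" becomes "it" and "s"), which may or may not be acceptable for a given analysis.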

In this lesson, you will learn the Python commands needed to implement the second part of the algorithm begun in the lesson ‘From HTML to a List of Words (part 1)’.

In this two-part lesson, we will build on what you’ve learned in ‘Downloading Web Pages with Python’, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods, and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert the content from a long string to a list of words that can later be sorted, indexed, and counted.
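The looping-and-branching idea can be sketched as a single pass over the page: a flag records whether the current character is inside a tag, and only characters outside tags are kept. This is a simplified sketch with a hypothetical `strip_tags` function and a made-up HTML snippet, not the lesson's exact code; it does not handle edge cases such as `<` inside scripts or comments.

```python
def strip_tags(page_contents):
    """Remove anything between '<' and '>' by walking the string one character at a time."""
    inside = False  # are we currently inside an HTML tag?
    text = []
    for char in page_contents:
        if char == "<":
            inside = True
        elif char == ">":
            inside = False
        elif not inside:
            text.append(char)
    return "".join(text)

html_snippet = "<p>The <b>prisoner</b> was tried.</p>"
words = strip_tags(html_snippet).split()  # long string -> list of words
print(words)
```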

This tutorial teaches you how to count the frequency of specific words in a list, which can provide illustrative data.
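One compact way to count word frequencies is Python's standard `collections.Counter`. This sketch uses it on a made-up word list; the tutorial itself may use a different approach (for example `list.count` combined with `zip`), so treat this as an alternative rather than its method.

```python
from collections import Counter

wordlist = ["it", "was", "the", "best", "of", "times",
            "it", "was", "the", "worst", "of", "times"]

# Count every word once, then look up the words we care about.
counts = Counter(wordlist)
for word in ["times", "best", "age"]:
    print(word, counts[word])  # Counter returns 0 for words that never appear

# (word, frequency) pairs from most to least frequent
print(counts.most_common(3))
```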

This is a tutorial about how to use Voyant Tools for text analysis. The tutorial starts with the most basic functions of Voyant Tools and then goes deeper into functions that can be used in text analysis, such as using the Knots tool to visualize the text.

This tutorial covers natural language processing using Python. It focuses on how to use NLTK for text preprocessing tasks such as tokenizing text, getting synonyms and antonyms, and word stemming.
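A minimal NLTK sketch of two of these steps, tokenizing and stemming, is shown below. It deliberately uses `wordpunct_tokenize` and `PorterStemmer`, which work without extra data downloads; the tutorial may instead use `word_tokenize` (which needs the "punkt" data), and looking up synonyms and antonyms through WordNet requires `nltk.download("wordnet")`, so that step is not shown. The sample sentence is made up.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

sentence = "The cats were running quickly."

# wordpunct_tokenize is regex-based, so it needs no extra NLTK data downloads
tokens = wordpunct_tokenize(sentence)
print(tokens)

# Reduce each token to its Porter stem
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
print(stems)
```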

This tutorial takes the frequency pairs collected in ‘Counting Frequencies’ and outputs them in HTML. It teaches how to select a keyword and output all instances of that keyword, along with the words to its left and right, making it easy to see at a glance how the keyword is used.
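The keyword-in-context idea can be sketched as follows: for each occurrence of the keyword, collect a few words on either side, then render each hit as an HTML list item. The function names, the `width` parameter, and the sample sentence are hypothetical; the tutorial's own implementation may build n-grams differently.

```python
import html

def keyword_in_context(words, keyword, width=3):
    """Return (left, keyword, right) tuples for each occurrence of keyword."""
    hits = []
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

def kwic_to_html(hits):
    """Render the hits as a simple HTML list, escaping any markup in the text."""
    rows = [
        f"<li>{html.escape(left)} <b>{html.escape(kw)}</b> {html.escape(right)}</li>"
        for left, kw, right in hits
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"

words = "the cat sat on the mat near the door".split()
hits = keyword_in_context(words, "the", width=2)
print(kwic_to_html(hits))
```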

This tutorial teaches how to load raw data, sample it, and visually explore and present it using the Bokeh and pandas libraries in Python.

This tutorial covers:

  • how to load tabular CSV data
  • how to perform basic data manipulation such as aggregating and subsampling raw data
  • how to visualize quantitative, categorical, and geographic data for web display
  • how to add varying types of interactivity to your visualizations
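The loading, aggregating, and subsampling steps can be sketched with pandas alone; the Bokeh plotting that the tutorial builds on top of this is left out here. The inline CSV and its column names are hypothetical stand-ins for a real data file.

```python
import io

import pandas as pd

# A tiny inline CSV stands in for a real data file (hypothetical columns).
csv_text = """city,category,count
Boston,A,3
Boston,B,5
Denver,A,2
Denver,B,7
Austin,A,4
"""

df = pd.read_csv(io.StringIO(csv_text))

# Aggregate: total count per city
totals = df.groupby("city")["count"].sum()
print(totals)

# Subsample: a reproducible random sample of 3 rows
sample = df.sample(n=3, random_state=42)
print(sample)
```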

This tutorial covers the basics of Python and working with text files to compute something interesting. The first set of tutorials is designed to teach the basic programming knowledge needed to analyze text data with Python. In the second set, the author computes the proportion of positive words in tweets after cleaning up the data a bit. The third set expands the code written in the previous two to explore the positive and negative sentiment of any set of text.
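The core calculation, the share of words in a text that appear in a positive or negative word list, can be sketched as below. The tiny `POSITIVE` and `NEGATIVE` sets and the function name are hypothetical placeholders; a real analysis would load a published sentiment lexicon, and the tutorial's own cleaning steps may differ.

```python
# Hypothetical mini-lexicons; real analyses use a published word list.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "hate", "awful"}

def sentiment_proportions(text):
    """Return (positive share, negative share) of the words in text."""
    # Light cleanup: lowercase and strip surrounding punctuation from each word
    words = [w.strip(".,!?;:\"'") for w in text.lower().split()]
    words = [w for w in words if w]
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos / len(words), neg / len(words)

print(sentiment_proportions("I love this great day, but the traffic was bad!"))
```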