Intermediate

This tutorial takes the frequency pairs collected in Counting frequencies and outputs them in HTML. It teaches how to select a keyword and output all instances of that keyword, along with the words to its left and right, making it easy to see at a glance how the keyword is used.
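The keyword-in-context idea described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the tutorial's own code: the function names `kwic` and `kwic_to_html`, the `width` parameter, and the sample sentence are all assumptions made for this example.

```python
import html
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples for each occurrence of keyword.

    Hypothetical helper: `width` is the number of context words kept per side.
    """
    words = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, word in enumerate(words):
        if word == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows


def kwic_to_html(rows):
    """Render the rows as a minimal HTML table, escaping all text."""
    lines = ["<table>"]
    for left, word, right in rows:
        lines.append(
            "<tr><td align='right'>{}</td><td><b>{}</b></td><td>{}</td></tr>".format(
                html.escape(left), html.escape(word), html.escape(right)
            )
        )
    lines.append("</table>")
    return "\n".join(lines)


# Invented sample text, used only to exercise the functions.
sample = "The cat sat on the mat. The dog chased the cat around the yard."
rows = kwic(sample, "cat", width=2)
print(kwic_to_html(rows))
```

Keeping the search (`kwic`) separate from the rendering (`kwic_to_html`) means the same rows could be written to a different output format without touching the matching logic.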

This tutorial teaches how to load raw data, sample it, and visually explore and present it using the Bokeh and Pandas libraries in Python.

This tutorial covers:

  • how to load tabular CSV data 
  • how to perform basic data manipulation such as aggregating and subsampling raw data 
  • how to visualize quantitative, categorical, and geographic data for web display
  • how to add varying types of interactivity to your visualizations. 

 

This recipe will compare two machine learning approaches to see which is more likely to give an accurate analysis of sentiment. Both approaches analyse a corpus of positive and negative Movie Review data by training and then testing to get an accuracy score. The techniques are Support Vector Machines (SVM) and Naive Bayes.

This recipe is part of the Text Analysis for Twitter Research (TATR) series, and will look at tokenizing and extracting key features from a Tweet.
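As a rough illustration of what tokenizing a Tweet and pulling out key features can look like, here is a minimal standard-library sketch. It does not use the TATR library; the regular expression, the function names `tokenize_tweet` and `extract_features`, and the sample Tweet are assumptions made for this example.

```python
import re

# Alternatives are tried in order: URLs, @mentions, #hashtags, then plain words.
TOKEN_PATTERN = re.compile(r"https?://\S+|@\w+|#\w+|\w+(?:'\w+)?")


def is_url(token):
    """True for tokens matched by the URL alternative of the pattern."""
    return token.startswith(("http://", "https://"))


def tokenize_tweet(text):
    """Split a Tweet into tokens, keeping URLs, mentions and hashtags whole."""
    return TOKEN_PATTERN.findall(text)


def extract_features(text):
    """Group the tokens of a Tweet by type (hypothetical feature set)."""
    tokens = tokenize_tweet(text)
    return {
        "hashtags": [t.lower() for t in tokens if t.startswith("#")],
        "mentions": [t.lower() for t in tokens if t.startswith("@")],
        "urls": [t for t in tokens if is_url(t)],
        "words": [
            t.lower()
            for t in tokens
            if not t.startswith(("#", "@")) and not is_url(t)
        ],
    }


# Invented sample Tweet, used only to exercise the functions.
tweet = "Loving the new #Python release! Thanks @gvanrossum https://example.com/notes"
features = extract_features(tweet)
print(features)
```

A general-purpose word splitter would break `#Python` and the URL into pieces, which is why the pattern lists those Twitter-specific forms before the plain-word alternative.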

This recipe is part of the Text Analysis for Twitter Research (TATR) series. This recipe will describe pandas DataFrame manipulation, in particular the techniques used for some of the more advanced Twitter analysis found in the TATR library.

This recipe is part of the Text Analysis for Twitter Research (TATR) series. The recipe will show how to load a CSV (comma-separated values) file into a pandas data structure and how to save it again.

This recipe is part of the Text Analysis for Twitter Research (TATR) series and describes how to begin plotting basic graphs using Twitter data.

This recipe is part of the Text Analysis for Twitter Research (TATR) series. The recipe will look at categorizing text using the General Inquirer Categories released by Harvard.

This recipe is part of the Text Analysis for Twitter Research (TATR) series. In this recipe we will show you how to use a dataset of Tweets to find the most popular hashtags by date. The results can then be manipulated by placing them in a pandas DataFrame and visualized by plotting the most popular hashtags over time.
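The counting step behind "most popular hashtag by date" can be sketched with the standard library alone. This is a plain-Python stand-in, not the recipe's own code: the function name `top_hashtag_by_date`, the `(date, text)` input shape, and the sample Tweets are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")


def top_hashtag_by_date(tweets):
    """Map each date to its most frequent hashtag as a (tag, count) pair.

    `tweets` is assumed to be an iterable of (date_string, text) pairs.
    Dates with no hashtags at all are left out of the result.
    """
    counts = defaultdict(Counter)
    for date, text in tweets:
        # Lowercase so "#Python" and "#python" are counted together.
        counts[date].update(tag.lower() for tag in HASHTAG.findall(text))
    return {
        date: counter.most_common(1)[0]
        for date, counter in sorted(counts.items())
        if counter
    }


# Invented sample data, used only to exercise the function.
tweets = [
    ("2017-03-01", "Reading about #dh and #python today"),
    ("2017-03-01", "More #Python notes"),
    ("2017-03-02", "#dh workshop starts"),
    ("2017-03-02", "No tags here"),
    ("2017-03-02", "#dh #textanalysis"),
]
top = top_hashtag_by_date(tweets)
print(top)
```

The resulting dictionary has one entry per date, which is a convenient shape to convert into a table for plotting.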

Multiple Correspondence Analysis (MCA) is a data analysis technique that can detect and represent the underlying structures of a dataset. In terms of textual analysis, we can identify and graph simultaneously occurring variables from the texts that comprise a corpus.

You can retrieve the MCA module at https://pypi.python.org/pypi/mca/1.0.3, or by typing the following line in your terminal:

pip install --user mca
