This tutorial teaches how to load raw data, sample it, and visually explore and present it using the Bokeh and pandas libraries in Python.
This tutorial covers:
- how to load tabular CSV data
- how to perform basic data manipulation such as aggregating and subsampling raw data
- how to visualize quantitative, categorical, and geographic data for web display
- how to add varying types of interactivity to your visualizations.
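The loading, aggregating, and subsampling steps listed above might look like the following minimal sketch. The inline CSV text and the column names (`date`, `city`, `sales`) are hypothetical stand-ins for a real file, not data from the tutorial.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to read_csv.
RAW_CSV = """date,city,sales
2020-01-01,Toronto,10
2020-01-01,Edmonton,4
2020-01-02,Toronto,7
2020-01-02,Edmonton,9
2020-01-03,Toronto,3
2020-01-03,Edmonton,6
"""

# Load tabular CSV data, parsing the date column as datetimes.
df = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=["date"])

# Aggregate: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Subsample: draw 3 rows at random; random_state makes the draw repeatable.
sample = df.sample(n=3, random_state=0)

print(totals)
print(sample)
```

Aggregated or subsampled frames like `totals` and `sample` are usually what gets handed to the plotting library, since raw data is often too large to draw point by point.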
This tutorial shows how to chart time series data with line plots and categorical quantities with bar charts, how to summarize data distributions with histograms and box plots, and how to summarize the relationship between variables with scatter plots.
This recipe is part of the Text Analysis for Twitter Research (TATR) series and describes how to begin plotting basic graphs using Twitter data.
This recipe is part of the Text Analysis for Twitter Research (TATR) series. The recipe will look at categorizing text using the General Inquirer Categories released by Harvard.
This recipe is part of the Text Analysis for Twitter Research (TATR) series. In this recipe we will show you how to use a dataset of Tweets to find the most popular hashtags by date. The results can then be manipulated by placing them in a pandas DataFrame and visualized by plotting the most popular hashtags over time.
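As a rough illustration of the idea, the sketch below counts hashtags per date and keeps the most frequent one for each day. The sample tweets, the `top_hashtags_by_date` helper, and the column names are hypothetical and are not taken from the recipe itself.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical (date, text) pairs standing in for a real Tweet dataset.
tweets = [
    ("2019-05-01", "Loving #python and #pandas today"),
    ("2019-05-01", "More #python tips"),
    ("2019-05-02", "Plotting with #bokeh"),
    ("2019-05-02", "#bokeh and #python together"),
    ("2019-05-02", "Still on #bokeh"),
]

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags_by_date(rows):
    """Return a DataFrame with the most frequent hashtag for each date."""
    counts = {}
    for date, text in rows:
        tags = [tag.lower() for tag in HASHTAG.findall(text)]
        counts.setdefault(date, Counter()).update(tags)

    records = []
    for date in sorted(counts):
        if not counts[date]:
            continue  # no hashtags were used on this date
        tag, n = counts[date].most_common(1)[0]
        records.append({"date": date, "hashtag": tag, "count": n})

    frame = pd.DataFrame(records, columns=["date", "hashtag", "count"])
    frame["date"] = pd.to_datetime(frame["date"])
    return frame


result = top_hashtags_by_date(tweets)
print(result)
```

Converting the `date` column to datetimes keeps the rows in chronological order when they are later plotted on a time axis.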
Multiple Correspondence Analysis (MCA) is a data analysis technique that can detect and represent the underlying structures of a dataset. In terms of textual analysis, we can identify and graph simultaneously occurring variables from the texts that comprise a corpus.
You can retrieve the MCA module at https://pypi.python.org/pypi/mca/1.0.3, or install it by typing the following line in a terminal:
pip install --user mca
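To make the technique itself more concrete, here is a minimal sketch of MCA computed directly with NumPy, as correspondence analysis of the indicator (one-hot) matrix. It deliberately does not use the `mca` package's API; the `mca_coordinates` helper and the sample columns (`genre`, `tone`) are hypothetical.

```python
import numpy as np
import pandas as pd


def mca_coordinates(frame, n_components=2):
    """MCA as correspondence analysis of the one-hot indicator matrix."""
    # One column per category of every variable, with 0/1 entries.
    indicator = pd.get_dummies(frame).to_numpy(dtype=float)

    # Correspondence matrix and its row and column masses.
    p = indicator / indicator.sum()
    row_mass = p.sum(axis=1)
    col_mass = p.sum(axis=0)

    # Standardized residuals from the independence model.
    expected = np.outer(row_mass, col_mass)
    residuals = (p - expected) / np.sqrt(expected)

    u, s, vt = np.linalg.svd(residuals, full_matrices=False)

    # Principal coordinates for rows (observations) and columns (categories).
    rows = (u[:, :n_components] * s[:n_components]) / np.sqrt(row_mass)[:, None]
    cols = (vt.T[:, :n_components] * s[:n_components]) / np.sqrt(col_mass)[:, None]
    return rows, cols, s ** 2  # s ** 2 holds the principal inertias


# Hypothetical categorical data describing four texts.
texts = pd.DataFrame(
    {
        "genre": ["news", "blog", "news", "forum"],
        "tone": ["formal", "casual", "formal", "casual"],
    }
)

rows, cols, inertia = mca_coordinates(texts)
print(rows)
print(cols)
```

Categories that tend to occur together in the same observations end up close to each other in `cols`, which is what the MCA plot displays.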
This recipe will show you how to prepare Voronoi diagrams, one way of showing relationships between words in a text and a search term. To do this, we will use word embeddings, which represent individual words in a text as real-valued vectors in a confined vector space. This recipe is based on Kynan Ly's cookbook, as seen in this notebook.
Multidimensional Scaling (MDS) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or dissimilar, the contents of one text are to another. This recipe deals with several advanced text analysis concepts and methods. Links are provided to additional information on these terms.
pandas DataFrame: https://pandas.pydata.org/
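One way to see how MDS turns distances between texts into plottable coordinates is the classical (Torgerson) variant, sketched below with NumPy only. This is an assumption about which flavour of MDS to illustrate, not necessarily the one the recipe uses; the term counts and the `classical_mds` helper are hypothetical.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Classical MDS: embed points so Euclidean distances match the input."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]

    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering

    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]

    # Clip tiny negative eigenvalues caused by floating-point rounding.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical term counts: three documents, three terms.
term_counts = np.array(
    [
        [3.0, 0.0, 1.0],
        [2.0, 1.0, 1.0],
        [0.0, 4.0, 2.0],
    ]
)

# Pairwise Euclidean distances between the documents.
diffs = term_counts[:, None, :] - term_counts[None, :, :]
distances = np.linalg.norm(diffs, axis=-1)

coords = classical_mds(distances)
print(coords)
```

Each row of `coords` is a point for one document; documents with similar term counts land close together, so the array can be wrapped in a DataFrame and passed to a scatter plot.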
Principal Component Analysis (PCA) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or dissimilar, the contents of one text are to another. PCA tries to identify a smaller number of uncorrelated variables, called "principal components", in the dataset. The goal is to explain the maximum amount of variance with the fewest principal components. This recipe deals with several advanced text analysis concepts and methods.
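The idea of explaining the most variance with the fewest components can be sketched with a singular value decomposition in NumPy. The document-term matrix and the `pca` helper below are hypothetical, and the recipe itself may rely on a different library.

```python
import numpy as np


def pca(matrix, n_components=2):
    """Project rows onto the top principal components via SVD."""
    x = np.asarray(matrix, dtype=float)

    # Centre each column (term) so the components describe variance.
    centered = x - x.mean(axis=0)

    u, s, vt = np.linalg.svd(centered, full_matrices=False)

    # Scores are the coordinates of each document on the components.
    scores = u[:, :n_components] * s[:n_components]

    # Share of total variance explained by each kept component.
    # Assumes the rows are not all identical (total variance is non-zero).
    variance = s ** 2
    explained = variance[:n_components] / variance.sum()
    return scores, explained


# Hypothetical document-term counts: four documents, three terms.
doc_terms = np.array(
    [
        [4, 0, 1],
        [3, 1, 0],
        [0, 5, 2],
        [1, 4, 3],
    ]
)

scores, explained = pca(doc_terms)
print(scores)
print(explained)
```

The columns of `scores` are uncorrelated by construction, and `explained` indicates how much of the original variation a two-dimensional plot retains.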