Visualization

Multidimensional Scaling (MDS) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or dissimilar, the contents of one text are to another. This recipe deals with several advanced text analysis concepts and methods. Links are provided to additional information on these terms.

Panda Data frame: https://pandas.pydata.org/

TF-IDF: https://en.wikipedia.org/wiki/Tf%E2%80%93idf

Principal Component Analysis (PCA) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or dissimilar, the contents of one text are to another. PCA tries to identify a smaller number of uncorrelated variables, called "principal components" from the dataset. The goal is to explain the maximum amount of variance with the fewest number of principal components.  This recipe deals with several advanced text analysis concepts and methods.

In this recipe we use 3 ebooks to show how topic analysis can identify the different topics each text represents. We will use Latent Dirichlet Allocation (LDA) approach which is the most common modelling method to discover topics. We can then spice it up with an interactive visualization of the discovered themes. This recipe is based on Zhang Jinman's notebook found on TAPoR.

NB: Any number of texts can be used, we choose 3 for this recipe.

Word frequencies and counts are text analysis methods that return results about the words in a text or set of texts. Counts return the amount of times a word is used in the text, whereas frequencies give a sense of how often a word is used in comparison to others in the text.

This recipe shows how to graph data in Python using the Matlablib library.

This recipe takes data provided by textual analysis tools and uses Microsoft Excel to create graphs to aid in its interpretation.

    This is a recipe to use textual visualization tools such as a Word cloud or visual Collocation tool to identify streams of thought, trends and potential avenues for scholarly investigation.

    Pages