Construct

This recipe is part of the Text Analysis for Twitter Research (TATR) series. It describes pandas DataFrame manipulation, in particular the techniques used for some of the more advanced Twitter analysis in the TATR library.

This recipe is part of the Text Analysis for Twitter Research (TATR) series. It shows how to load a CSV (comma-separated values) file into a pandas data structure and how to save that structure back to a CSV file.
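As a rough illustration of the load and save round trip, here is a minimal sketch. The column names and sample rows are hypothetical stand-ins for a real tweet export, and an in-memory buffer is used instead of a file on disk so the example stays self-contained.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real tweet export.
csv_text = "id,user,text\n1,alice,hello world\n2,bob,pandas is handy\n"

# Load: read_csv accepts a file path or any file-like object.
tweets = pd.read_csv(io.StringIO(csv_text))

# Save: write to a buffer here; pass a filename instead to write to disk.
# index=False leaves the row index out of the output.
buffer = io.StringIO()
tweets.to_csv(buffer, index=False)
saved_text = buffer.getvalue()

# Reading the saved text back should reproduce the same table.
reloaded = pd.read_csv(io.StringIO(saved_text))
```

With a real file, the same two calls would take a path, for example `pd.read_csv("tweets.csv")` and `tweets.to_csv("tweets_out.csv", index=False)`, where both filenames are placeholders.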

Multidimensional Scaling (MDS) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or dissimilar, the contents of one text are to another. This recipe deals with several advanced text analysis concepts and methods. Links are provided to additional information on these terms.

pandas DataFrame: https://pandas.pydata.org/

TF-IDF: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
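One possible way to go from texts to plottable MDS coordinates is sketched below using scikit-learn. This is an assumption about tooling, not necessarily what the recipe itself does: the sample documents are invented, cosine distance over TF-IDF vectors is one choice of dissimilarity among several, and the `dissimilarity` parameter name may differ in other scikit-learn versions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical mini-corpus: two texts about pets, two about finance.
docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

# Represent each document as a TF-IDF weighted term vector.
tfidf = TfidfVectorizer().fit_transform(docs)

# Pairwise cosine distances: 0 means identical direction, 1 means no shared terms.
distances = cosine_distances(tfidf)

# Project the distance matrix into two dimensions for plotting.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(distances)

# One row per document, ready to hand to a scatter plot.
frame = pd.DataFrame(coords, columns=["x", "y"])
```

Each row of `frame` is a point; documents with similar vocabulary should tend to land nearer each other, though the exact layout depends on the random seed.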

Principal Component Analysis (PCA) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or dissimilar, the contents of one text are to another. PCA tries to identify a smaller number of uncorrelated variables, called "principal components", from the dataset. The goal is to explain the maximum amount of variance with the fewest principal components. This recipe deals with several advanced text analysis concepts and methods.
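The idea of explaining as much variance as possible with few components can be sketched with scikit-learn. The library choice, the sample documents, and the use of TF-IDF as the term weighting are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus: two texts about pets, two about finance.
docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

# Build a dense document-term matrix (PCA here needs a dense array).
matrix = TfidfVectorizer().fit_transform(docs).toarray()

# Keep only the two components that explain the most variance.
pca = PCA(n_components=2)
coords = pca.fit_transform(matrix)

# One row per document, one column per principal component.
frame = pd.DataFrame(coords, columns=["pc1", "pc2"])

# Share of total variance captured by each component, largest first.
explained = pca.explained_variance_ratio_
```

Inspecting `explained` gives a rough sense of how much of the variation between documents survives the reduction to two dimensions.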

Word frequencies and counts are text analysis methods that return results about the words in a text or set of texts. Counts return the number of times a word is used in the text, whereas frequencies give a sense of how often a word is used relative to the other words in the text.
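The difference between a count and a frequency can be shown with a few lines of standard-library Python. The sample sentence and the simple regular-expression tokenizer are assumptions for illustration; a real analysis would likely need more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical sample text.
text = "the cat sat on the mat and the dog sat too"

# Naive tokenizer: lowercase, then keep runs of letters and apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Count: how many times each word occurs.
counts = Counter(tokens)

# Frequency: each word's count as a share of all tokens in the text.
total = sum(counts.values())
frequencies = {word: n / total for word, n in counts.items()}
```

Here "the" has a count of 3 and, with 11 tokens in total, a frequency of 3/11, which makes it comparable across texts of different lengths in a way the raw count is not.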

This section presents a concise summary of what the recipe will teach, focusing primarily on: (1) the outcome of the recipe (what are you trying to achieve, in non-technical terms); (2) the main technical approaches employed; and (3) whether the recipe is based on someone else’s work/code (they should be cited if so).

This is a recipe for looking at the changes that have taken place in a Wikipedia article over time, and generating a corpus of the different edited versions.

This recipe will guide you in developing a digital archival file collection suitable for deposit in an academic institution’s digital archive, including descriptive XML metadata documentation and a readme document.

This recipe is a guide to developing a detailed summary for text analysis or other research-oriented tools, based on primary and secondary sources. It is particularly useful for providing information on legacy tools that are no longer available, but is also extensible to modern tools.

This recipe is a guide to developing a review for a text analysis tool that will enable other users to decide whether that tool is suitable for their research tasks.
