Statistical

In this recipe, we measure a corpus to determine the authorship of its texts and visualize the texts by author. We use multidimensional scaling (MDS), a technique for analyzing similarity and dissimilarity in data. This recipe is based on Jinman Zhang's Cookbook on GitHub.
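
To make the MDS step concrete, here is a minimal sketch of classical (Torgerson) MDS in plain Python. It is an illustration under stated assumptions, not the recipe's own code, which may well use a library routine: the helper names (`profile`, `distance_matrix`, `top_eigenpair`, `classical_mds`), the four-word vocabulary, and the three toy texts are all hypothetical. Texts are compared by the Euclidean distance between their word-frequency profiles, and the eigenvectors are found by simple power iteration.

```python
import math
import random
from collections import Counter

def profile(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocab]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(points):
    return [[euclidean(p, q) for q in points] for p in points]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpair(m, iterations=500, seed=0):
    """Power iteration: dominant eigenpair of a symmetric, positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract (matrix is numerically zero)
            return 0.0, [0.0] * len(m)
        v = [x / norm for x in w]
    value = sum(x * y for x, y in zip(v, matvec(m, v)))
    return value, v

def classical_mds(dist, dims=2):
    """Place items in `dims` dimensions so Euclidean distances approximate `dist`."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (dist is assumed symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate: remove the component just found before the next pass.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical toy corpus: three short texts profiled on four function words.
vocab = ["the", "of", "and", "to"]
texts = [
    "the cat and the dog went to the end of the road",
    "to be of use and to be of help is the aim of all",
    "the sun and the moon and the stars of the night",
]
points = [profile(t, vocab) for t in texts]
for text, xy in zip(texts, classical_mds(distance_matrix(points))):
    print([round(c, 3) for c in xy], text[:20])
```

Each text ends up as an (x, y) point; texts with similar word-frequency profiles land close together, which is what makes a scatter plot colored by author informative. Classical MDS assumes the distances behave like Euclidean ones, so a different dissimilarity measure may call for a different MDS variant.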

This recipe shows how to classify texts into general topics. We use a supervised machine learning model called a Support Vector Machine (SVM), trained on the 20 Newsgroups subject-matter data set, as the topic classifier. This recipe is based on Jinman Zhang's Cookbook.
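
As a rough illustration of what an SVM topic classifier does, the sketch below trains a tiny linear SVM from scratch by subgradient descent on the regularized hinge loss. Everything here is an assumption made for illustration: the function names, the six toy sentences, the two topics (+1 for "space", -1 for "sport"), and the hyperparameters `lam`, `eta`, and `epochs`. It does not load 20 Newsgroups, it has no bias term, and it handles only two classes; covering all twenty topics would need something like a one-vs-rest scheme. The recipe itself presumably relies on a library implementation.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words counts for one document."""
    return Counter(text.lower().split())

def score(weights, text):
    """Linear decision value: sum of word weights times word counts."""
    return sum(weights.get(word, 0.0) * count
               for word, count in featurize(text).items())

def train_linear_svm(docs, labels, lam=0.01, eta=0.1, epochs=50):
    """Binary linear SVM (labels +1 / -1) trained by subgradient descent."""
    weights = {}
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            margin = y * score(weights, doc)
            # L2 regularization: shrink every weight a little on each step.
            for word in weights:
                weights[word] *= 1.0 - eta * lam
            # Hinge loss: only examples inside the margin push the weights.
            if margin < 1.0:
                for word, count in featurize(doc).items():
                    weights[word] = weights.get(word, 0.0) + eta * y * count
    return weights

def predict(weights, text):
    return 1 if score(weights, text) > 0 else -1

# Hypothetical toy training set: +1 = "space", -1 = "sport".
docs = [
    "the rocket reached orbit",
    "astronauts launch the satellite into orbit",
    "the telescope observed a distant galaxy",
    "the striker scored a goal",
    "the team won the match",
    "the goalkeeper saved a penalty",
]
labels = [1, 1, 1, -1, -1, -1]

weights = train_linear_svm(docs, labels)
for text in ["rocket orbit", "striker goal"]:
    print(text, "->", "space" if predict(weights, text) == 1 else "sport")
```

Words that occur only in one topic's training documents end up with weights of that topic's sign, so a new text is assigned to whichever topic its words pull toward most strongly.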

This recipe performs three different tasks:
(1) Plot the cumulative type/token ratio of a text;
(2) Track the occurrences of a particular word in a text and plot them all in a dispersion plot;
(3) Plot the relative frequency of the word across n equal sub-parts of the text, annotated with the chi-square statistic and a dispersion measure (Juilland's D by default).
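
The quantities behind these three plots can be sketched in a few lines of plain Python. This is a minimal illustration, not the recipe's code: the function names and the sample sentence are hypothetical, plotting is left out, and the word is assumed to occur at least once with n of at least 2. Juilland's D is computed here as 1 - V / sqrt(n - 1), where V is the population standard deviation of the per-part counts divided by their mean.

```python
import math

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [word for word in stripped if word]

def cumulative_ttr(tokens):
    """Type/token ratio after each successive token (task 1)."""
    seen, ratios = set(), []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        ratios.append(len(seen) / i)
    return ratios

def word_positions(tokens, word):
    """Token offsets where `word` occurs: the x-values of a dispersion plot (task 2)."""
    return [i for i, token in enumerate(tokens) if token == word]

def part_counts(tokens, word, n):
    """Occurrences of `word` in each of n near-equal consecutive parts (task 3)."""
    size = len(tokens) / n
    counts = [0] * n
    for pos in word_positions(tokens, word):
        counts[min(int(pos / size), n - 1)] += 1
    return counts

def chi_square(counts):
    """Chi-square statistic against a perfectly even spread over the parts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def juilland_d(counts):
    """Juilland's D: 1 means perfectly even dispersion, 0 means all in one part."""
    n = len(counts)
    mean = sum(counts) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)
    return 1.0 - (sd / mean) / math.sqrt(n - 1)

tokens = tokenize("The cat sat on the mat, and the dog sat on the log.")
counts = part_counts(tokens, "the", 4)
print("final TTR:", cumulative_ttr(tokens)[-1])
print("positions of 'the':", word_positions(tokens, "the"))
print("per-part counts:", counts)
print("chi-square:", chi_square(counts), "Juilland's D:", juilland_d(counts))
```

Because the parts are equal in size, the raw per-part counts and the per-part relative frequencies yield the same value of D; dividing each count by the part length only matters for the values shown on the plot's axis.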

This recipe lists words that suggest the themes of a text.