Recipes

This recipe will use regular expressions to clean up a web page. This is a useful first step before carrying out any meaningful textual analysis of a page's content. We can remove the HTML tags...
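As a rough illustration of this kind of regex clean-up (not necessarily the patterns the recipe itself uses), this Python sketch drops `<script>`/`<style>` blocks, comments and any remaining tags, then decodes entities such as `&amp;` with the standard `html` module. The function name `strip_html` is hypothetical. Regular expressions are brittle on messy real-world HTML, so a proper HTML parser is the safer choice for anything beyond quick clean-up.

```python
import html
import re

def strip_html(page: str) -> str:
    """Hypothetical helper: remove markup from an HTML string using regular expressions."""
    # Drop <script> and <style> blocks together with their contents.
    text = re.sub(r"<(script|style)\b[^>]*>.*?</\1\s*>", " ", page,
                  flags=re.IGNORECASE | re.DOTALL)
    # Drop HTML comments.
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    # Drop every remaining tag, e.g. <p>, </div> or <a href="...">.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode character entities such as &amp; and &nbsp;.
    text = html.unescape(text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

sample = ("<html><head><style>p {color: red}</style></head><body>"
          "<p>Tom &amp; Jerry</p><!-- note --><div>run!</div></body></html>")
print(strip_html(sample))  # Tom & Jerry run!
```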

This recipe will show you how to classify text into general topics. We will use a supervised machine learning model called Support Vector Machines (...
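The recipe's own tooling is not shown here, so this is a stdlib-only sketch of the same idea: a linear SVM trained with the Pegasos subgradient method on bag-of-words counts. The toy documents, the labels (`1` for sport, `-1` for cooking), the small stopword list and the function names are all hypothetical; in practice a library implementation such as scikit-learn's `LinearSVC` would be the usual choice.

```python
import random
from collections import Counter

# Hypothetical stopword list for this sketch only.
STOPWORDS = {"a", "an", "and", "in", "of", "the"}

def features(text):
    """Bag-of-words counts, lower-cased, with a few function words dropped."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Hypothetical helper: Pegasos training of a linear SVM; labels are +1 or -1."""
    rng = random.Random(seed)
    data = [(features(d), y) for d, y in zip(docs, labels)]
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:  # regularisation step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: step towards this example
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, text):
    """Sign of the linear score decides the class."""
    score = sum(w.get(f, 0.0) * v for f, v in features(text).items())
    return 1 if score > 0 else -1

docs = [
    "the team won the match",
    "a player scored a goal",
    "bake bread in the oven",
    "stir soup and add salt",
]
labels = [1, 1, -1, -1]  # 1 = sport, -1 = cooking
weights = train_linear_svm(docs, labels)
print(predict(weights, "the team scored"))  # 1
print(predict(weights, "salt the bread"))   # -1
```

No bias term is learned here, which keeps the update rule short; real classifiers usually include one.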

This recipe will show you how to prepare Voronoi diagrams, one way of showing relationships between words in a text and a search term. To do this, we will use word...

This recipe will show how to generate basic concordances of a word, showing it within its textual context. We will use a "find and replace" strategy known as...
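The recipe's own "find and replace" strategy is not spelled out in this description, so this is a different, hedged sketch: a plain keyword-in-context (KWIC) concordance built with Python's `re` module. `concordance` and its `width` parameter are hypothetical names.

```python
import re

def concordance(text, keyword, width=30):
    """Hypothetical helper: KWIC lines with up to `width` characters either side of each match."""
    text = re.sub(r"\s+", " ", text)  # flatten line breaks so each line reads cleanly
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword lines up in one column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

sample = "The whale surfaced. Ahab watched the whale dive,\nand then the Whale was gone."
for line in concordance(sample, "whale", width=15):
    print(line)
```

The `\b` word boundaries keep "whale" from matching inside "whales", and `re.IGNORECASE` lets "Whale" match while the original casing is shown in brackets.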

Multidimensional Scaling (MDS) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or...
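A minimal, dependency-free sketch of classical (Torgerson) MDS, assuming the distances between documents have already been computed, for example from term-count vectors. It double-centres the squared distance matrix and extracts the top eigenvectors by power iteration. The function name and parameters are hypothetical, and this is not necessarily the variant the recipe uses: scikit-learn's `sklearn.manifold.MDS`, for instance, uses the iterative SMACOF algorithm instead. The returned coordinates could then be loaded into a data frame and plotted.

```python
import math
import random

def classical_mds(dist, k=2, iters=500, seed=0):
    """Hypothetical helper: embed n items in k dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J (rows and columns share means as D is symmetric).
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):  # power iteration for the dominant remaining eigenvector
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 0:  # no positive variance left in this direction
            break
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

points = [(0, 0), (1, 0), (0, 2), (3, 3)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
```

With genuinely Euclidean input like this, the 2-D coordinates reproduce the original distances up to rotation and reflection; with document distances such as cosine distance the result is an approximation.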

Principal Component Analysis (PCA) is a method to convert sets of document terms into a data frame that can then be visualized. The distances expressed in the visualization show how similar, or...
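A stdlib-only sketch of PCA on a small documents-by-terms count matrix, assuming the counts have already been computed; the column names, the counts and the function name `pca_2d` are hypothetical. It centres each term column, builds the covariance matrix and finds the first two components by power iteration with deflation. A library such as scikit-learn (`sklearn.decomposition.PCA`) would normally be used instead.

```python
import math
import random

def pca_2d(rows, iters=300, seed=0):
    """Hypothetical helper: project a documents x terms matrix onto its first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # centre every term column
    # Sample covariance matrix of the terms: C = X^T X / (n - 1)
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(d)]
         for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):  # power iteration: converges to the dominant eigenvector
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # no variance left in any direction
                v = [0.0] * d
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    # A document's score on a component is the dot product with its centred row.
    return [[sum(x[j] * comp[j] for j in range(d)) for comp in components] for x in X]

# Hypothetical term counts; columns are "whale", "ship", "sea".
counts = [[3, 0, 1], [2, 0, 1], [0, 3, 1], [0, 2, 1]]
for doc, (pc1, pc2) in zip(["A", "B", "C", "D"], pca_2d(counts)):
    print(doc, round(pc1, 3), round(pc2, 3))
```

Documents with similar term profiles (A and B, or C and D) land close together in the projected plane.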

Similar to finding People and Characters, finding locations in text is a common exploratory...

In this recipe we use 3 ebooks to show how topic analysis can identify the different topics each text represents. We will use the Latent Dirichlet Allocation (LDA) approach, which is the most common ...
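A compact, stdlib-only sketch of LDA fitted by collapsed Gibbs sampling, with three tiny hypothetical token lists standing in for the three ebooks. The function name and the hyperparameters (`alpha`, `beta`, number of topics, iterations) are illustrative guesses, not the recipe's settings. Libraries such as gensim or scikit-learn provide tested LDA implementations, based on variational inference rather than Gibbs sampling.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical helper: LDA via collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):  # start with a random topic for every token
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + V * beta) for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed topic proportions for each document (each row sums to 1).
    theta = [[(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
             for row, doc in zip(doc_topic, docs)]
    top_words = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
    return theta, top_words

books = [
    "whale ship sea harpoon whale sea".split(),
    "ship sea whale captain harpoon".split(),
    "love marriage ball estate love".split(),
]
theta, top_words = lda_gibbs(books, num_topics=2)
for i, row in enumerate(theta):
    print(f"book {i}:", [round(p, 2) for p in row])
print(top_words)
```

Sampling is random, so the exact topic proportions depend on the seed; real ebooks would also need tokenizing and stopword removal first.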

Word frequencies and counts are text analysis methods that return results about the words in a text or set of texts. Counts return the number of times a word is used in the text, whereas...
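The description of frequencies is cut off here, so this sketch assumes the common definition: a word's count divided by the total number of words. Treating tokens as lower-cased runs of letters and apostrophes is also a simplifying assumption, and both function names are hypothetical.

```python
import re
from collections import Counter

def word_counts(text):
    """Hypothetical helper: count lower-cased word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def relative_frequencies(counts):
    """Hypothetical helper: frequency = count / total tokens, so the values sum to 1."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

text = "The cat sat. The cat ran. A dog sat."
counts = word_counts(text)
freqs = relative_frequencies(counts)
print(counts.most_common(2))   # [('the', 2), ('cat', 2)]
print(round(freqs["the"], 3))  # 0.222
```

Relative frequencies make texts of different lengths comparable, which raw counts do not.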

Tokenization is the process of splitting a sentence or a chunk of text into its constituent parts. These “tokens” may be the letters, punctuation, words, or sentences. They could even be a...
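A naive regex-based sketch of tokenizing at three levels (sentences, words with punctuation, and characters). The function names and patterns are hypothetical simplifications; libraries such as NLTK or spaCy handle abbreviations, quotes and other edge cases far better.

```python
import re

def sentence_tokens(text):
    """Hypothetical helper: split after ., ! or ? (wrongly splits abbreviations like "Dr.")."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokens(sentence):
    """Hypothetical helper: words (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

text = "Tokens can be words. Can't they be letters, too?"
for sentence in sentence_tokens(text):
    print(word_tokens(sentence))
print(list("word"))  # character tokens: ['w', 'o', 'r', 'd']
```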
