Intermediate

Compare word usage across two corpora to see whether they differ. For a given word that appears in both texts, the Mann-Whitney U test determines whether the difference in its usage is statistically significant. This helps determine how distinctive a term is to one corpus.
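
As a rough illustration of the comparison described above, the sketch below computes a word's relative frequency in each document of two corpora and then applies a two-sided U test to the two sets of frequencies. It is a minimal version under stated assumptions: tokenization is a plain whitespace split, the p-value uses the normal approximation without tie or continuity correction, and the function names (`relative_freq`, `rank_with_ties`, `mann_whitney_u`) are illustrative and not taken from the recipe itself.

```python
import math


def relative_freq(word, doc):
    """Share of tokens in doc equal to word (naive whitespace tokenization)."""
    tokens = doc.lower().split()
    return tokens.count(word) / len(tokens) if tokens else 0.0


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation.

    Assumption: no tie or continuity correction, so p is only approximate
    for small samples or heavily tied data.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))


# Toy corpora: each string stands in for one document.
corpus_a = ["the whale swam near the whale boat", "a whale surfaced", "whale oil lamps"]
corpus_b = ["the ship sailed at dawn", "a quiet harbour town", "the whale was gone by then"]

freqs_a = [relative_freq("whale", d) for d in corpus_a]
freqs_b = [relative_freq("whale", d) for d in corpus_b]
u_stat, p_value = mann_whitney_u(freqs_a, freqs_b)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

With only a handful of documents per corpus the approximation is crude; for real corpora a tested statistics library would be the safer choice.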

Term Frequency-Inverse Document Frequency, or TF-IDF, measures how important a word is to a single document within a collection or corpus. A word's weight rises with how often it appears in the document, but is offset by how often the word occurs across the whole collection.
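
The weighting described above can be sketched in a few lines. This version assumes the plain textbook formulation, where term frequency is a term's count divided by document length and inverse document frequency is log(N / df) with no smoothing. Libraries often use smoothed or normalized variants, so their numbers will differ. The function name `tf_idf` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumption: tf = count / document length, idf = log(N / df), unsmoothed.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the whale swims".split(),
    "the ship sails".split(),
    "the whale sings".split(),
]
for i, doc_weights in enumerate(tf_idf(docs)):
    ranked = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)
    print(i, ranked)
```

A word such as "the", which occurs in every document, gets weight zero, while a word unique to one document ranks highest there.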

This recipe explores how to analyze a corpus for the locations that are written about within it. These results can be mapped to visualize the spatial focus of the corpus. This recipe is based on an IPython notebook by Matthew Wilkins.
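
To give a feel for the idea, the sketch below counts mentions of known place names and pairs each with coordinates ready for plotting. It is not the notebook's method: it assumes a tiny hand-made gazetteer (`GAZETTEER`, with rounded coordinates) in place of the named-entity recognition and geocoding a full pipeline would normally rely on, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "London": (51.51, -0.13),
    "Paris": (48.86, 2.35),
    "Cairo": (30.04, 31.24),
}


def count_places(texts, gazetteer=GAZETTEER):
    """Count whole-word mentions of each gazetteer entry across all texts."""
    counts = Counter()
    for text in texts:
        for name in gazetteer:
            # \b keeps "Paris" from matching inside "Parisian".
            pattern = r"\b" + re.escape(name) + r"\b"
            counts[name] += len(re.findall(pattern, text))
    return counts


def map_points(counts, gazetteer=GAZETTEER):
    """Return (name, lat, lon, mentions) for every place mentioned at least once."""
    return [
        (name, gazetteer[name][0], gazetteer[name][1], n)
        for name, n in counts.most_common()
        if n > 0
    ]


texts = [
    "She left London for Paris in the autumn.",
    "Paris was cold, and the Parisian cafes were full.",
]
for point in map_points(count_places(texts)):
    print(point)
```

The resulting tuples could be passed to any mapping tool, with the mention count used to scale each marker.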