This recipe discusses ways to find electronic texts (e-texts) online that can be used by other text analysis tools.
This recipe compares word usage across two corpora to see whether they differ. For a given word, it uses the U test (commonly known as the Mann-Whitney U test) to determine whether the difference in usage between the two corpora is significant. This helps identify how distinctive a term is of one corpus.
Term Frequency-Inverse Document Frequency (TF-IDF) is used to determine how important a word is to a single document within a collection or corpus. It ranks a word by how often it appears in the document, but that rank is offset by how common the word is across the whole collection.
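As a rough illustration of the comparison described above, the following sketch splits each corpus into equal-sized chunks, measures a word's relative frequency in each chunk, and computes the U statistic from the ranked frequencies. The function names, the chunking approach, and the sample data are assumptions made for this example and are not taken from the recipe itself; in practice a library routine such as `scipy.stats.mannwhitneyu` would also report a p-value.

```python
def relative_frequencies(tokens, word, chunk_size):
    """Relative frequency of `word` in each full, consecutive chunk of `tokens`."""
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    return [tokens[s:s + chunk_size].count(word) / chunk_size for s in starts]


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), giving tied values the average of their ranks."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Find the end of the run of tied values starting at position i.
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical toy corpora, already tokenized and lowercased.
corpus_a = "the whale swam and the whale dived and the whale rose".split()
corpus_b = "the ship sailed and the crew sang and the wind blew".split()
freqs_a = relative_frequencies(corpus_a, "whale", chunk_size=4)
freqs_b = relative_frequencies(corpus_b, "whale", chunk_size=4)
u_a, u_b = mann_whitney_u(freqs_a, freqs_b)
print(u_a, u_b)
```

The smaller of the two U values is the one usually compared against a critical value (or converted to a p-value) to judge significance.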
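The weighting described above can be sketched in a few lines of plain Python. This is a minimal version under stated assumptions: term frequency is the count divided by document length, and inverse document frequency is the natural log of (number of documents / documents containing the term). Other TF-IDF variants use different scaling or smoothing, and the function name `tf_idf` and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Given a list of token lists, return one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document collection.
docs = [["whale", "sea", "whale"], ["sea", "ship"]]
for doc_weights in tf_idf(docs):
    print(sorted(doc_weights.items(), key=lambda item: -item[1]))
```

A word that appears in every document ("sea" here) gets a weight of zero, which is the offsetting effect the description refers to.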
This recipe explores how to analyze a corpus for the locations that are written about within it. These results can be mapped to visualize the spatial focus of the corpus. This recipe is based on an IPython notebook by Matthew Wilkins.
This recipe uses Python and the NLTK to explore repeating phrases (ngrams) in a text. An ngram is a sequence of n consecutive words, where 'n' is the number of words in the sequence; e.g. a 'trigram' is a three-word ngram.
This recipe explores how to analyze a text based on its parts of speech (e.g. nouns, adjectives, and prepositions).
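To make the idea concrete, here is a small sketch that builds ngrams and keeps only those that repeat. It deliberately uses plain Python rather than the NLTK so that it runs without extra installs (the NLTK offers an equivalent `nltk.ngrams` helper). The function names and the `min_count` parameter are assumptions for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngrams(tokens, n, min_count=2):
    """Return (ngram, count) pairs that occur at least min_count times."""
    counts = Counter(ngrams(tokens, n))
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


tokens = "to be or not to be".split()
print(ngrams(tokens, 3))           # all trigrams in order
print(repeated_ngrams(tokens, 2))  # bigrams that appear more than once
```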
This recipe uses Python and WordNet to explore meaning in a text.
This recipe shows how to graph data in Python using the Matplotlib library.
This recipe shows how to create a basic concordance tool in Python.
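A basic concordance of this kind is often displayed as keyword-in-context (KWIC) lines. The sketch below is one possible approach, not the recipe's own code: it finds each whole-word, case-insensitive match and shows a fixed number of characters on either side. The function name, the `width` parameter, and the sample sentence are assumptions for illustration.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start, end = match.start(), match.end()
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


sample = "The whale surfaced. A white whale!"
for line in concordance(sample, "whale", width=10):
    print(line)
```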
This recipe shows how to perform collocation analysis on a text, finding which words co-occur with a search term.
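One simple way to find the words that appear near a search term is to count every word within a fixed window on either side of each occurrence. The sketch below assumes the text is already tokenized; the function name `collocates`, the `window` parameter, and the sample sentence are invented for this example, and a fuller analysis would typically filter stop words or apply an association measure.

```python
from collections import Counter


def collocates(tokens, term, window=2):
    """Count the words found within `window` tokens of each occurrence of term."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == term:
            start = max(0, i - window)
            # Words before and after the term, excluding the term itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


tokens = "the white whale and the white ship".split()
print(collocates(tokens, "white", window=1).most_common())
```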