Locate and identify

In this recipe we use 3 ebooks to show how topic analysis can identify the different topics each text represents. We will use Latent Dirichlet Allocation (LDA), one of the most common modelling methods for discovering topics. We can then spice it up with an interactive visualization of the discovered themes. This recipe is based on Zhang Jinman's notebook found on TAPoR.
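To make the LDA step less of a black box, here is a minimal, dependency-free sketch of LDA fitted with collapsed Gibbs sampling, one standard way of inferring the topics. It is not the code from the original notebook, and in practice a library implementation (e.g. gensim or scikit-learn) would normally be used. The function names `lda_gibbs` and `top_words`, the hyperparameter defaults, and the toy "books" are all hypothetical; real ebooks would first be tokenized, lowercased and stripped of stopwords.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc-topic counts, topic-word counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    nk = [0] * num_topics                       # tokens per topic
    z = []                                      # topic assignment of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # ... resample its topic from the collapsed conditional ...
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # ... and put it back under the newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=5):
    """The n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)[:n]]
        for row in nkw
    ]


# Toy stand-ins for three pre-processed ebooks (hypothetical data).
books = [
    "whale sea ship captain whale harpoon sea".split(),
    "ship sea whale sailor ocean captain".split(),
    "love marriage estate ball love sister".split(),
]
doc_topics, topic_words, vocab = lda_gibbs(books, num_topics=2)
for k, words in enumerate(top_words(topic_words, vocab)):
    print(f"topic {k}: {words}")
```

Each row of `doc_topics` shows how a book's words are spread across the topics, which is what an interactive visualization would then display.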

NB: Any number of texts can be used; we chose 3 for this recipe.

A common requirement in information extraction is the ability to identify all persons or characters referred to in a text. This is an elaborate process: it involves identifying the parts of speech in the text, tagging them, and retrieving the names associated with the dialogue. The common approach is to use a part-of-speech (POS) tagger, which analyses a sentence and associates each word with its lexical category, i.e. whether it is a noun, verb, adjective, adverb, conjunction, etc. NLTK is a robust library for this and is therefore the main ingredient of our recipe.
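One simple way to turn tagger output into candidate names is to group runs of consecutive proper nouns. NLTK's `nltk.pos_tag` returns a list of `(word, tag)` tuples with Penn Treebank tags, in which proper nouns are `NNP` (singular) or `NNPS` (plural). To keep the sketch runnable without downloading tagger models, the tagged sentence here is written by hand and only assumed to resemble what the tagger would produce; the helper `proper_noun_runs` is hypothetical. Note that proper nouns also cover places and organisations, so the result is a list of candidates; NLTK's `nltk.ne_chunk` can go further and label chunks as `PERSON`.

```python
def proper_noun_runs(tagged):
    """Group consecutive NNP/NNPS tokens into candidate names."""
    names, current = [], []
    for word, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            # A non-proper-noun ends the current run.
            names.append(" ".join(current))
            current = []
    if current:  # a run may end the sentence
        names.append(" ".join(current))
    return names


# Hand-written stand-in for nltk.pos_tag(nltk.word_tokenize(sentence)).
tagged = [("Elizabeth", "NNP"), ("Bennet", "NNP"), ("met", "VBD"),
          ("Mr.", "NNP"), ("Darcy", "NNP"), ("at", "IN"),
          ("the", "DT"), ("ball", "NN"), (".", ".")]
print(proper_noun_runs(tagged))  # ['Elizabeth Bennet', 'Mr. Darcy']
```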

This recipe explores how to analyze a corpus for the locations mentioned in it. These results can then be mapped to visualize the spatial focus of the corpus. This recipe is based on an IPython notebook by Matthew Wilkins.
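The sketch below is not taken from the original notebook; it uses a simple gazetteer lookup, assuming a small hand-made table of place names with approximate coordinates (hypothetical, for illustration only). A fuller pipeline would more likely find places with named-entity recognition and then geocode them. The output rows (place, latitude, longitude, count) are in a shape that most mapping tools can plot directly.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer: place name -> (latitude, longitude), approximate.
GAZETTEER = {
    "London": (51.51, -0.13),
    "Paris": (48.86, 2.35),
    "Bath": (51.38, -2.36),
}


def count_places(text, gazetteer=GAZETTEER):
    """Count whole-word, case-sensitive mentions of each gazetteer place."""
    counts = Counter()
    for place in gazetteer:
        pattern = r"\b" + re.escape(place) + r"\b"
        counts[place] = len(re.findall(pattern, text))
    # Keep only places that occur, most frequent first.
    return [(p, *gazetteer[p], n) for p, n in counts.most_common() if n > 0]


text = "They left London for Bath, and later returned to London."
print(count_places(text))
```

Matching case-sensitively keeps "Bath" the city apart from "bath" the noun, although a capitalised common word at the start of a sentence can still produce false hits.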

This recipe uses the Collocation and Co-occurrence tools to explore syntactic dependencies within the construction of a text.
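At their core, co-occurrence tools count which words appear within a fixed window around a chosen node word. This is a minimal sketch of that counting step only; the function name `cooccurrences` and the window size are hypothetical choices. Collocation tools usually go further and rank word pairs with an association measure (NLTK, for instance, provides `BigramCollocationFinder` with `BigramAssocMeasures`), and window counts are only a proxy for syntactic dependencies, which a parser would identify directly.

```python
from collections import Counter


def cooccurrences(tokens, node, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


tokens = "the old man and the sea and the old ship".split()
print(cooccurrences(tokens, "old", window=1))
```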

This recipe identifies simple themes within a text using the Concordance, Collocation and List Words tools.
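A concordance shows every occurrence of a keyword with a few words of context on either side (keyword in context, or KWIC), which is often the quickest way to see how a candidate theme word is actually used. A minimal sketch, assuming the text is already split into tokens; the function name and the `width` parameter are hypothetical.

```python
def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines for every (case-insensitive) match."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


tokens = "It was the best of times it was the worst of times".split()
for line in concordance(tokens, "times", width=2):
    print(line)
```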

This recipe examines a text in a language in which you are not fluent and demonstrates a strategic approach to comprehension using text analysis tools.

This recipe uses Frequency lists, Concordance and Collocation to efficiently explore web content on a particular topic that has been compiled into a single aggregate text.
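Building the aggregate text and its frequency list can be sketched in a few lines, assuming the web pages have already been saved as plain text. The page snippets, the tiny stopword list and the function name `frequency_list` are hypothetical; a real stopword list would be much longer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical; real lists are longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def frequency_list(text, n=10, stopwords=STOPWORDS):
    """Most common non-stopword words in `text`, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords).most_common(n)


# Hypothetical pages collected on one topic, joined into an aggregate text.
pages = [
    "Climate change is accelerating.",
    "The effects of climate change are global.",
    "Change in the climate of the Arctic.",
]
aggregate = "\n".join(pages)
print(frequency_list(aggregate, n=2))
```

The top words then become natural starting points for the Concordance and Collocation tools.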

This recipe demonstrates how to use the command-line grep utility and regular expressions to find patterns within a plain-text file.
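On the command line, a search such as `grep -n -E 'pattern' file.txt` prints each matching line prefixed with its line number (`-n`), using extended regular expressions (`-E`). The same idea can be sketched in Python with the `re` module; the `grep` function here is a hypothetical illustration, and Python's regex syntax differs from POSIX extended regular expressions in some details (e.g. `\b` word boundaries and `\d` are Python features).

```python
import re


def grep(pattern, lines, ignore_case=False):
    """Yield (line_number, line) for lines matching `pattern`, like grep -n."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


# A file object works too, e.g.:
#   with open("book.txt", encoding="utf-8") as f:  # hypothetical file name
#       for number, line in grep(r"\bwhale\b", f): ...
text = ["Call me Ishmael.\n", "Some years ago\n",
        "never mind how long\n", "I thought I would sail about\n"]
print(list(grep(r"\bI\b", text)))
```

The `\b` anchors make the pattern match the pronoun "I" on its own, so "Ishmael" on the first line is not reported.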

This recipe extracts and examines a character’s dialogue from a play to explore a particular discourse in a linear fashion.
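Extracting one character's lines depends entirely on how the play is laid out. This sketch assumes a hypothetical plain-text convention in which each speech starts with the speaker's name in capitals followed by a full stop (e.g. `HAMLET. To be...`) and continuation lines follow until the next speaker; other editions will need a different pattern. The function name `speeches` and the abridged excerpt are illustrative only. Keeping the speeches in order of appearance preserves the linear reading the recipe describes.

```python
import re

# Assumed speaker-line format: "NAME. first line of speech"
SPEAKER = re.compile(r"^([A-Z][A-Z ]+)\.\s*(.*)$")


def speeches(play_text, character):
    """Return the character's speeches, in order of appearance."""
    result, current, speaker = [], [], None
    for raw in play_text.splitlines():
        line = raw.strip()
        match = SPEAKER.match(line)
        if match:
            # A new speaker: close off the previous speech if it was ours.
            if speaker == character and current:
                result.append(" ".join(current))
            speaker = match.group(1)
            current = [match.group(2)] if match.group(2) else []
        elif line and speaker is not None:
            current.append(line)  # continuation of the current speech
    if speaker == character and current:
        result.append(" ".join(current))
    return result


# Abridged excerpt in the assumed format (not contiguous in the play).
play = """HAMLET. To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
OPHELIA. Good my lord,
How does your honour for this many a day?
HAMLET. I humbly thank you; well, well, well.
"""
for speech in speeches(play, "HAMLET"):
    print(speech)
```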

This recipe takes a text and explores the tenses and senses of its word usage by combining a sense-finding service with the Concordance and Collocation tools.