Using Python and WordNet to Explore a Text
This recipe uses Python and WordNet to explore meaning in a text.
- A text to analyze for meaning
- Python 3
- A Python notebook editor such as Jupyter
- The NLTK library
- Example code in The Art of Literary Text Analysis
- Import the text to analyze and tokenize it
- Normalize the word tokens using stemming or lemmatization
- Search for meaning with WordNet
- Import WordNet
- Generate a cognitive synonym set ("synset") for a word you want to search
- Select a single use of the word (e.g. "bug" as a noun, not a verb)
- With the word's synset, generate lists of hypernyms (words with more general meanings) and hyponyms (words with more specific meanings)
- Create a recursive function to "drill down" and find the lists of hyponyms for each hyponym
- Use the function to search for all of the hyponyms of the hypernym of the word you are interested in (e.g. "insect" for "bug")
- Look at the results and remove any words that should be treated as stopwords
- Create a dispersion plot of the words
- Combine the words into a single identity and plot it
- Use the NLTK similar() function to find words that appear in contexts similar to the identity
- Use a ContextIndex object and its word_similarity_dict() function to score how similar other words in the document are to the identity
Next steps / further information
- This example is also available in The Art of Literary Text Analysis.
- Compare the count of the word you're searching for in the raw token list with the count you get using the WordNet and lemmatization approach.
- What's the percentage increase in coverage that we get by lemmatizing and looking for related words using WordNet?
Submitted by GregWS on Sun, 10/09/2016 - 21:45