Using Python and WordNet to Explore a Text


This recipe uses Python and WordNet to explore meaning in a text.

  • Import the text to analyze and tokenize it
  • Normalize the word tokens using stemming or lemmatization
  • Search for meaning with WordNet
    • Import WordNet
    • Generate a cognitive synonym set ("synset") for a word you want to search
    • Select a single use of the word (e.g. "bug" as a noun, not a verb)
    • With the word's synset, generate lists of hypernyms (words with more general meanings) and hyponyms (words with more specific meanings)
    • Create a recursive function to "drill down" and find the lists of hyponyms for each hyponym
    • Use the function to search for all of the hyponyms of the hypernym of the word you are interested in (e.g. "insect" for "bug")
    • Look at the results and remove any words that should be treated as stopwords
    • Create a dispersion plot of the words
    • Combine the words into a single identity and plot it
    • Use NLTK's similar() function to find words that appear in contexts similar to the identity's
    • Use a ContextIndex object and its word_similarity_dict() function to score how similar each word's contexts are to the identity's across the document
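The tokenizing, lemmatizing, and synset-lookup steps above can be sketched with NLTK. This is a minimal illustration, not the notebook's exact code: the sample sentence is invented, and the required NLTK data packages are downloaded inline.

```python
import nltk
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

# Fetch the required corpora (a no-op if they are already installed).
for pkg in ("punkt", "punkt_tab", "wordnet", "omw-1.4"):
    nltk.download(pkg, quiet=True)

# A tiny invented sample; in practice, read in the text you want to analyze.
text = "The bugs were crawling everywhere, and one bug landed on the lamp."
tokens = [t for t in nltk.word_tokenize(text.lower()) if t.isalpha()]

# Lemmatize so that "bugs" and "bug" normalize to the same word form.
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in tokens]

# All senses ("synsets") of "bug", then a single noun sense of interest.
senses = wn.synsets("bug")
bug = wn.synset("bug.n.01")

print(bug.definition())  # what this sense means
print(bug.hypernyms())   # more general synsets (the insect sense sits above it)
print(bug.hyponyms())    # more specific synsets, if any
```

Selecting `bug.n.01` pins the analysis to one noun sense, which is what the "single use of the word" step is about.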
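The recursive "drill down" step can be a short function that walks a synset's hyponym tree. A sketch, assuming the `wordnet` data package is available:

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def all_hyponyms(synset):
    """Recursively collect every hyponym below the given synset."""
    result = []
    for hyp in synset.hyponyms():
        result.append(hyp)
        result.extend(all_hyponyms(hyp))  # drill down one level further
    return result

# Start from the hypernym of "bug" (insect) and gather all specific terms.
insect = wn.synset("insect.n.01")
insect_synsets = all_hyponyms(insect)

# Flatten to lower-case lemma names, replacing underscores with spaces.
insect_words = sorted({l.name().lower().replace("_", " ")
                       for s in insect_synsets for l in s.lemmas()})
print(len(insect_words))
```

The resulting `insect_words` list is the vocabulary you would then prune of stopword-like entries before plotting.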
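The identity-combining, similar(), and ContextIndex steps can be sketched as below. The token list and the "insectlike" identity label are invented for illustration, and the dispersion-plot line is commented out because it needs matplotlib and a display:

```python
from nltk.text import Text, ContextIndex

# Toy token stream; in practice use the tokens from your own text.
tokens = ("the ant crawled and the beetle flew while the bug hid "
          "under the leaf near another ant").split()

# Combine several related words into a single identity token.
insect_words = {"ant", "beetle", "bug"}
combined = ["insectlike" if t in insect_words else t for t in tokens]

text = Text(combined)
# text.dispersion_plot(["insectlike"])  # draws where the identity occurs

# Words appearing in contexts similar to the combined identity's.
text.similar("insectlike")

# Context-based similarity scores across the whole document.
idx = ContextIndex(combined)
scores = idx.word_similarity_dict("insectlike")
```

On a real text, `similar()` and `word_similarity_dict()` surface words that share surrounding contexts with the identity; on a sample this small they may find few or no matches.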
Next steps / further information 
  • This example is also available in The Art of Literary Text Analysis.
  • Compare the count of the word you're searching for in the raw token list with its count using the WordNet and lemmatization approach.
  • What's the percentage increase in coverage that we get by lemmatizing and looking for related words using WordNet?