Using Python and WordNet to Explore a Text

Introduction 

This recipe uses Python and WordNet to explore meaning in a text.

Ingredients 
Steps 
  • Import the text to analyze and tokenize it
  • Normalize the word tokens using stemming or lemmatization
  • Search for meaning with WordNet
    • Import WordNet
    • Generate a cognitive synonym set ("synset") for a word you want to search
    • Select a single use of the word (e.g. "bug" as a noun, not a verb)
    • With the word's synset, generate lists of hypernyms (words with more general meanings) and hyponyms (words with more specific meanings)
    • Create a recursive function to "drill down" and find the lists of hyponyms for each hyponym
    • Use the function to search for all of the hyponyms of the hypernym of the word you are interested in (e.g. "insect" for "bug")
    • Look at the results, and remove any words that should be stopwords
    • Create a dispersion plot of the words
    • Combine the words into a single identity and plot it
    • Use the NLTK similar() function to loosely find words that appear in contexts similar to the identity
    • Use a ContextIndex object and its word_similarity_dict() function to score how often other words in the document share contexts with the identity
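
The "drill down" and "single identity" steps above can be sketched as follows. This is a minimal illustration, not the recipe's own code. FakeSynset is a hypothetical stand-in that exposes only hyponyms() and lemma_names(), the two NLTK Synset methods the function relies on, so the sketch runs without the WordNet corpus. The toy hierarchy and token list are invented for illustration.

```python
class FakeSynset:
    """Hypothetical stand-in exposing the two Synset methods used below."""

    def __init__(self, name, lemmas, children=()):
        self._name = name
        self._lemmas = list(lemmas)
        self._children = list(children)

    def hyponyms(self):
        return self._children

    def lemma_names(self):
        return self._lemmas


def collect_hyponym_words(synset, seen=None):
    """Recursively gather the lemma names of every hyponym below `synset`."""
    if seen is None:
        seen = set()
    for hyponym in synset.hyponyms():
        for lemma in hyponym.lemma_names():
            # WordNet joins multiword lemmas with underscores, e.g. "fruit_fly".
            seen.add(lemma.replace("_", " ").lower())
        # Drill down: repeat the search for this hyponym's own hyponyms.
        collect_hyponym_words(hyponym, seen)
    return seen


# Toy hierarchy (invented for illustration, not real WordNet output).
fruit_fly = FakeSynset("fruit_fly.n.01", ["fruit_fly", "pomace_fly"])
fly = FakeSynset("fly.n.01", ["fly"], [fruit_fly])
beetle = FakeSynset("beetle.n.01", ["beetle"])
insect = FakeSynset("insect.n.01", ["insect"], [fly, beetle])

insect_words = collect_hyponym_words(insect)

# Combine the related words into a single identity: every matching token is
# relabelled "insect" so the group can be counted or plotted as one term.
tokens = ["the", "fly", "and", "the", "beetle", "met", "a", "bug"]
merged = ["insect" if token in insect_words else token for token in tokens]

print(sorted(insect_words))
print(merged)
```

With NLTK installed and the WordNet corpus downloaded, the same function should accept a real synset such as wordnet.synset('insect.n.01') in place of the toy object. Multiword entries such as "fruit fly" will not match single-word tokens, so they need separate handling.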
Next steps / further information 
  • This example is also available in The Art of Literary Text Analysis.
  • Compare the count of the word you're searching for in the raw tokens list with its count using the WordNet and lemmatization approach.
  • What percentage increase in coverage do we get by lemmatizing and looking for related words using WordNet?
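
One way to answer the coverage question is a simple percentage-increase calculation. The sketch below assumes the tokens are already lemmatized and that the set of related words came from the WordNet search. Both the token list and the related-word set are made up for illustration, and percent_increase is a hypothetical helper name.

```python
def percent_increase(baseline_count, expanded_count):
    """Return the percentage increase from baseline_count to expanded_count."""
    if baseline_count == 0:
        raise ValueError("baseline count is zero, so the increase is undefined")
    return (expanded_count - baseline_count) / baseline_count * 100


# Illustrative data only: lemmatized tokens and a WordNet-derived word set.
tokens = ["a", "bug", "bite", "me", "then", "a", "fly", "and", "a", "beetle", "leave"]
related = {"bug", "fly", "beetle"}

baseline = tokens.count("bug")                             # exact matches only
expanded = sum(1 for token in tokens if token in related)  # word plus related words
increase = percent_increase(baseline, expanded)

print(f"{baseline} -> {expanded} matches ({increase:.0f}% increase)")
```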