Basic Collocation in Python


This recipe shows how to perform collocation analysis on a text, finding which words occur near (collocate with) a search word.

  1. Import the text to be analyzed into a string
  2. Tokenize the string
  3. Choose which word to search for
  4. Choose how many context words to include on either side
  5. Analyze the tokens for collocation
    1. Initialize a counter variable and an empty list of collocates
    2. Use a for loop to check whether each token matches the search word
    3. Use a nested for loop to add the words on either side of each matching token to the collocates list, checking that those positions exist (e.g. there are no words before the first token or after the last one).
  6. Analyze the collocates list
    1. Count the number of unique collocates using the set() function
    2. Use the NLTK library's FreqDist class to find the most frequent collocates
    3. Plot the results
  7. Export the collocates as a CSV file
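Steps 1 to 5 could look roughly like the sketch below. It is an illustration rather than the code from The Art of Literary Text Analysis: tokenization uses a simple regular expression (an assumption; NLTK's `word_tokenize` is a common alternative but needs its tokenizer data downloaded first), and the names `tokenize`, `find_collocates`, the sample sentence and the filename are hypothetical.

```python
import re

# Step 1: in practice the text would come from a file, e.g.
# text = open("my_text.txt", encoding="utf-8").read()  (hypothetical filename)
text = "The cat sat on the mat. The dog sat by the cat."


def tokenize(text):
    """Step 2: lowercase the text and split it into word tokens (regex-based)."""
    return re.findall(r"[\w']+", text.lower())


def find_collocates(tokens, keyword, context):
    """Step 5: collect the words within `context` positions on either side of `keyword`."""
    count = 0          # how many times the keyword occurs
    collocates = []    # every word found in a context window
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        count += 1
        # Clamp the window so it never reaches before the first token or past the last.
        start = max(0, i - context)
        end = min(len(tokens), i + context + 1)
        for j in range(start, end):
            if j != i:  # skip the keyword itself
                collocates.append(tokens[j])
    return count, collocates


tokens = tokenize(text)
keyword = "cat"   # Step 3: the search word
context = 2       # Step 4: words to include on each side
count, collocates = find_collocates(tokens, keyword, context)
print(count)       # 2
print(collocates)  # ['the', 'sat', 'on', 'by', 'the']
```

Clamping the window with `max` and `min` is one way to do the boundary check from step 5; it means a match near the start or end of the text simply contributes fewer collocates instead of raising an `IndexError`.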
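The first two parts of step 6 can be illustrated with the standard library's `collections.Counter` on a small, hypothetical list of collocates. NLTK's `FreqDist` is a subclass of `Counter`, so `nltk.FreqDist(collocates).most_common(3)` should give the same pairs, and `FreqDist.plot()` can draw the frequency chart for the plotting step if matplotlib is installed.

```python
from collections import Counter

# Hypothetical collocates, e.g. the words found around "cat" in a short sample text.
collocates = ["the", "sat", "on", "by", "the"]

# set() drops duplicates, so its length is the number of distinct collocates.
unique_count = len(set(collocates))
print(unique_count)  # 4

# Count each collocate and list the most frequent ones first
# (ties keep the order in which the words were first seen).
freqs = Counter(collocates)
top = freqs.most_common(3)
print(top)  # [('the', 2), ('sat', 1), ('on', 1)]
```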
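For step 7, one option is the standard-library `csv` module, writing one row per distinct collocate with its frequency. This is a sketch: the function name `export_collocates`, the column headers, and the output filename `collocates.csv` are all hypothetical choices.

```python
import csv
from collections import Counter


def export_collocates(collocates, path):
    """Write each distinct collocate and its count to a CSV file, most frequent first."""
    # newline="" stops the csv module from inserting blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["collocate", "count"])  # header row
        for word, count in Counter(collocates).most_common():
            writer.writerow([word, count])


export_collocates(["the", "sat", "on", "by", "the"], "collocates.csv")
```

The resulting file opens directly in a spreadsheet, which makes it easy to sort or chart the collocates outside Python.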
Next steps / further information 
  • This recipe is based on a utility in The Art of Literary Text Analysis
  • Play with different context values to see how they change the results
  • Explore different ways of plotting the results
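To see how the context value changes the results, a compact variant of the collocation loop can be rerun with several window sizes. This is a sketch under the same assumptions as a basic version of the recipe: tokenization is a simple regular expression rather than NLTK, and `collocate_counts` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def collocate_counts(text, keyword, context):
    """Count the words within `context` tokens on either side of each `keyword`."""
    tokens = re.findall(r"[\w']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            # Slices stop at the list ends, so no explicit bounds check is needed on the right;
            # max() prevents a negative start from wrapping around on the left.
            window = tokens[max(0, i - context):i] + tokens[i + 1:i + 1 + context]
            counts.update(window)
    return counts


text = "The cat sat on the mat. The dog sat by the cat."
for context in (1, 2, 5):
    print(context, collocate_counts(text, "cat", context).most_common(3))
```

Wider windows tend to pull in more distant and more generic words such as "the", so the ranking can shift noticeably as `context` grows.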