Basic Collocation in Python
Introduction
This recipe shows how to perform collocation analysis on a text, finding which words collocate with a search term.
Ingredients
- A text to perform collocation on
- Python 3
- A notebook editor such as Jupyter
- Example code from The Art of Literary Text Analysis
Steps
- Import the text to be analyzed into a string
- Tokenize the string
- Choose which word to search for
- Choose how many context words to include on either side of it
- Analyze the tokens for collocation
  - Initialize a counter variable to track each token's position, and an empty list to hold the collocates
  - Use a for loop to check whether each token is the search word
  - For each match, use a nested for loop to add the words on either side of it to the collocates list, first checking that those words exist (e.g. the search word is not the first or last token)
- Analyze the collocates list
  - Count the number of unique collocates using the set() function
  - Use the NLTK library's FreqDist class to find the most frequent collocates
- Plot the results
- Export the collocates as a CSV file
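The tokenizing and collocation steps can be sketched in Python as follows. This is a minimal sketch, not the code from The Art of Literary Text Analysis: the helper names tokenize and find_collocates are hypothetical, the regular-expression tokenizer is a simple stand-in for a library tokenizer such as NLTK's word_tokenize, and enumerate() takes the place of a manually incremented counter variable. Clipping the window with max() and min() performs the check that there are words on either side.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    # Keep runs of word characters, allowing internal apostrophes as in "don't";
    # punctuation is dropped.
    return re.findall(r"\w+(?:'\w+)*", text.lower())

def find_collocates(tokens, search_term, window=2):
    """Collect the tokens within `window` positions of each occurrence of search_term."""
    search_term = search_term.lower()
    collocates = []
    # enumerate() supplies the position counter for each token.
    for i, token in enumerate(tokens):
        if token != search_term:
            continue
        # Clip the window so it never runs past the first or last token.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the search term itself
                collocates.append(tokens[j])
    return collocates

text = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(text)
print(find_collocates(tokens, "cat", window=2))
# ['the', 'sat', 'on', 'on', 'the']
```

A token that falls inside the window of two different occurrences of the search term is added once for each occurrence.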
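A minimal sketch of analyzing a list of collocates, assuming they have already been collected as a list of lowercase strings. summarize_collocates is a hypothetical helper; it uses NLTK's FreqDist when NLTK is installed and otherwise falls back to collections.Counter, which FreqDist subclasses, so most_common() behaves the same way in both cases.

```python
try:
    from nltk import FreqDist  # NLTK's frequency distribution (a Counter subclass)
except ImportError:
    # Fallback if NLTK is not installed: Counter provides the same most_common().
    from collections import Counter as FreqDist

def summarize_collocates(collocates, top_n=10):
    """Return (number of unique collocates, top_n most frequent collocates with counts)."""
    unique_count = len(set(collocates))  # set() removes duplicates
    freq = FreqDist(collocates)
    return unique_count, freq.most_common(top_n)

collocates = ["the", "sat", "on", "on", "the", "the", "mat"]
unique_count, top = summarize_collocates(collocates, top_n=2)
print(unique_count)  # 4
print(top)           # [('the', 3), ('on', 2)]
```

With NLTK installed, the plotting step can use FreqDist's own plot() method, which requires matplotlib; for example, FreqDist(collocates).plot(20) charts the 20 most frequent collocates. Counter has no plot() method.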
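The CSV export can be sketched with Python's standard csv module. The function export_collocates_csv and the output filename collocates.csv are hypothetical; the sketch writes a header row followed by one row per unique collocate with its frequency, most frequent first.

```python
import csv
from collections import Counter

def export_collocates_csv(collocates, path):
    """Write each unique collocate and its frequency to a CSV file."""
    counts = Counter(collocates)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["collocate", "count"])  # header row
        for word, count in counts.most_common():  # most frequent first
            writer.writerow([word, count])

export_collocates_csv(["the", "sat", "on", "on", "the", "the"], "collocates.csv")
```

The resulting file can be opened in a spreadsheet program or loaded back into Python with csv.reader.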
Status
Submitted by GregWS on Mon, 10/03/2016 - 12:21