Graphing Data in Python
Introduction
This recipe shows how to graph data in Python using the Matplotlib library.
Ingredients
- Textual data to graph
- Python 3
- A notebook editor such as Jupyter
- The Matplotlib library (preinstalled with the Anaconda bundle)
- Example code in The Art of Literary Text Analysis
Steps
- Import the library (assign it a shorthand if you want)
- Instruct the kernel to produce graphs inline in the notebook using the code: %matplotlib inline
- If you forget this step, your notebook may freeze, and you'll have to shut down the kernel and start again
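The two setup steps above might look like the following in a notebook cell. This is a minimal sketch: the alias plt is a common convention rather than a requirement, and the sample numbers are made up only to confirm that a graph appears. The %matplotlib inline line is IPython magic syntax rather than plain Python, so it is shown here as a comment; in a Jupyter notebook, uncomment it.

```python
# In a Jupyter notebook, run this magic first (IPython syntax, not plain Python):
# %matplotlib inline

import matplotlib.pyplot as plt  # "plt" is a conventional shorthand, not required

# Made-up sample values, used only to check that a graph is drawn
sample_values = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
ax.plot(sample_values)
ax.set_title("Test plot")
plt.show()
```

If the graph appears directly below the cell, inline plotting is working.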
- Plot word frequency
- Tokenize the text into a list of words, then count how often each word occurs using NLTK's FreqDist() function
- Plot the frequency distribution, passing the number of terms to plot and a title for the graph
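As a rough illustration of the word frequency steps, the sketch below counts tokens with FreqDist and plots the most frequent ones. The sample sentence, the regular-expression tokenizer, and the variable names are assumptions made for this example; any tokenized word list can be substituted.

```python
import re

import nltk

# Made-up sample text; replace it with the text you have loaded
text = "the cat sat on the mat and the dog sat on the log"

# A simple regex tokenizer (an assumption for this sketch);
# lowercasing makes "The" and "the" count as the same word
tokens = re.findall(r"[a-z']+", text.lower())

# FreqDist counts how often each token occurs
freqs = nltk.FreqDist(tokens)
print(freqs.most_common(3))

# Plot the 5 most frequent tokens, with a title for the graph
freqs.plot(5, title="Top Frequency Words")
```

The first argument to plot() limits the graph to that many of the most frequent terms.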
- Create a Word Length Characteristic Curve
- Create a new list that replaces each word in the tokens list with its length
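- Count how often each length occurs and try plotting the result, which is ordered by frequency
- Create a new list from these counts, sorted by word length rather than frequency
- Each item in this list is a pair of numbers in a fixed order: word length, then frequency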
- Create a new number list that only contains the word length (x axis)
- Create a new number list that only contains the word frequency (y axis)
- Plot the two lists as the x/y axes
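The word length curve steps could be sketched as follows. The sample sentence, the tokenizer, and all variable names are assumptions for illustration; the sketch counts lengths with FreqDist, sorts the (length, frequency) pairs by length, and splits them into x and y lists.

```python
import re

import matplotlib.pyplot as plt
import nltk

# Made-up sample text; replace it with your own tokens
text = "the cat sat on the mat and the quick dog jumped over a log"
tokens = re.findall(r"[a-z']+", text.lower())

# Replace each word in the token list with its length
token_lengths = [len(token) for token in tokens]

# Count how many words have each length
length_freqs = nltk.FreqDist(token_lengths)

# Sort the (length, frequency) pairs by length rather than by frequency
length_pairs = sorted(length_freqs.items())

# Split the pairs into one list per axis
lengths = [length for length, count in length_pairs]  # x axis
counts = [count for length, count in length_pairs]    # y axis

fig, ax = plt.subplots()
ax.plot(lengths, counts)
ax.set_xlabel("Word length")
ax.set_ylabel("Frequency")
ax.set_title("Word Length Characteristic Curve")
plt.show()
```

Sorting the pairs works because Python compares tuples by their first element, which here is the word length.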
- Graph Word Distribution of Proper Nouns
- Remove stopwords from the tokenized word list
- Re-plot the top frequency words
- Take note of frequent words that sound like proper nouns
- Re-tokenize the imported text, but don't convert it to lowercase
- Run the concordance() function or a concordance tool to see the word in context, and whether it is capitalized
- Use the dispersion_plot() function to graph where the two proper nouns occur across the full text
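The proper noun steps might be sketched like this. The sample sentences, the short hand-written stopword set, and the chosen names ("Alice", "Oxford") are assumptions for illustration only; NLTK also ships a fuller stopword list in nltk.corpus.stopwords, which requires a one-time data download and is not used here.

```python
import re

import nltk

# Made-up sample text; replace it with the text you have imported
text = (
    "Alice walked to Oxford in the rain. In Oxford the rain stopped and "
    "Alice met a friend. The friend asked Alice about the walk. "
    "Alice said Oxford was far."
)

# A tiny hand-written stopword set (an assumption for this sketch)
stop_words = {"the", "and", "to", "in", "a"}

# Lowercased tokens with stopwords removed, then re-plot the top words
lower_tokens = re.findall(r"[a-z]+", text.lower())
content_tokens = [t for t in lower_tokens if t not in stop_words]
nltk.FreqDist(content_tokens).plot(5, title="Top Frequency Words (stopwords removed)")

# Re-tokenize without lowercasing so that capitalization is preserved
cased_tokens = re.findall(r"[A-Za-z]+", text)
cased_text = nltk.Text(cased_tokens)

# Show a candidate word in context to check whether it is capitalized
cased_text.concordance("Alice", width=60, lines=5)

# Graph where the two proper nouns occur across the text
cased_text.dispersion_plot(["Alice", "Oxford"])
```

dispersion_plot() matches the strings exactly as given, which is why the tokens passed to nltk.Text keep their original capitalization.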
Status
Submitted by GregWS on Mon, 10/03/2016 - 13:37