Graphing Data in Python


This recipe shows how to graph data in Python using the Matplotlib library.

  1. Import the library (optionally under a shorter alias)
  2. Tell the kernel to render graphs inline in the notebook by running: %matplotlib inline
    1. If you forget this step, your notebook may freeze, and you'll have to shut down the kernel and start again
  3. Plot word frequency
    1. Tokenize the text, then build a frequency distribution of its words using NLTK's FreqDist()
    2. Plot the distribution, specifying how many of the top terms to include and a title for the graph
  4. Create a Word Length Characteristic Curve
    1. Create a new list that replaces each word in the tokens list with its length
    2. Try plotting this list
    3. Create a new list from the previous one, sorted by word length rather than by frequency
    4. Each item in the list is a pair of numbers in a fixed order: the word length, then its frequency
    5. Create a new number list that only contains the word length (x axis)
    6. Create a new number list that only contains the word frequency (y axis)
    7. Plot the two lists as the x/y axes
  5. Graph Word Distribution of Proper Nouns
    1. Remove stopwords from the tokenized word list
    2. Re-plot the top frequency words
    3. Take note of frequent words that sound like proper nouns
    4. Re-tokenize the imported text, but don't convert it to lowercase
    5. Run the concordance() function or a concordance tool to see each word in context and check whether it is capitalized
    6. Use the dispersion_plot() function to graph where the two proper noun strings occur across the full text
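
The data preparation behind steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the recipe's exact code: the helper names (top_words, word_length_curve, plot_curve) and the sample sentence are made up for the example, and collections.Counter stands in for NLTK's FreqDist (which is a Counter subclass) so that the counting logic runs without NLTK installed. Matplotlib is only imported inside the plotting helper.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs (step 3)."""
    return Counter(tokens).most_common(n)


def word_length_curve(tokens):
    """Return word lengths (x axis) and their frequencies (y axis)."""
    # Step 4.1: replace each word with its length
    lengths = [len(word) for word in tokens]
    # Count how often each length occurs
    length_freq = Counter(lengths)
    # Step 4.3: (length, frequency) pairs sorted by length, not by frequency
    pairs = sorted(length_freq.items())
    # Steps 4.5 and 4.6: split the pairs into separate x and y lists
    x = [length for length, _ in pairs]
    y = [count for _, count in pairs]
    return x, y


def plot_curve(x, y, title="Word Length Characteristic Curve"):
    """Step 4.7: plot the two lists as the x/y axes."""
    # Imported here so the helpers above work without Matplotlib
    import matplotlib.pyplot as plt

    plt.plot(x, y)
    plt.title(title)
    plt.xlabel("Word length")
    plt.ylabel("Frequency")
    plt.show()


# Hypothetical sample text; in the recipe this would be the tokenized import
tokens = "a rose is a rose is a rose but not a daisy".split()
x, y = word_length_curve(tokens)
```

In a notebook, calling plot_curve(x, y) after running %matplotlib inline should draw the curve in the output cell. With NLTK available, the FreqDist object offers its own plot() method that takes the number of terms and a title, which is likely what step 3 refers to.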
Next steps / further information