TATR: Tokenization and Extraction

Introduction 

This recipe is part of the Text Analysis for Twitter Research (TATR) series. It looks at tokenizing a Tweet and extracting key features from it.
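To illustrate what tokenizing and extracting mean for a single Tweet, here is a minimal sketch. It assumes NLTK's `TweetTokenizer`, which keeps hashtags, @-handles, URLs and emoticons together as single tokens. It also treats hashtags and mentions as the "key features". The example Tweet is invented, and the choice of features is an assumption rather than the TATR notebook's own definition.

```python
from nltk.tokenize import TweetTokenizer

# An invented example Tweet (not taken from any real dataset).
tweet = "Presenting the #TATR poster at #Congress2018 with @methodica! https://example.org"

# TweetTokenizer splits on whitespace and punctuation but keeps
# Twitter-specific units such as hashtags and @-handles as whole tokens.
tokenizer = TweetTokenizer()
tokens = tokenizer.tokenize(tweet)

# Assumed "key features": hashtags and user mentions.
hashtags = [tok for tok in tokens if tok.startswith("#")]
mentions = [tok for tok in tokens if tok.startswith("@")]

print(tokens)
print(hashtags)  # ['#TATR', '#Congress2018']
print(mentions)  # ['@methodica']
```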

Ingredients 
Steps 
  • Open a new Jupyter Notebook and import the following libraries:
    • NLTK
    • pandas
    • NumPy
  • Import the Twitter data
  • Load the data into a pandas DataFrame
  • Write a function to tokenize the text within the DataFrame
  • Apply the function to every cell of the text column (see pandas' "apply" method and Python's "lambda" expressions)
  • Write a function to extract the desired column of token data from the DataFrame
  • Save the extracted data into a new pandas DataFrame
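The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions and not the TATR notebook's own code. A small inline DataFrame stands in for imported Twitter data. The column names `id` and `text` and the helpers `tokenize_text` and `extract_column` are hypothetical. NLTK's `TweetTokenizer` is assumed as the tokenizer, and NumPy is not needed for these particular steps.

```python
import pandas as pd
from nltk.tokenize import TweetTokenizer

# Hypothetical stand-in for imported Twitter data; a real notebook would load
# an export instead, e.g. pd.read_csv("tweets.csv") (file name assumed).
tweets = pd.DataFrame({
    "id": [1, 2, 3],
    "text": [
        "Excited for #Congress2018 in Regina! @methodica",
        "Tokenizing tweets with #NLTK is fun :)",
        "No hashtags here, just text.",
    ],
})

# preserve_case=False lowercases tokens (emoticons keep their case).
tokenizer = TweetTokenizer(preserve_case=False)


def tokenize_text(text):
    """Hypothetical helper: split one Tweet into a list of tokens."""
    return tokenizer.tokenize(str(text))


# Apply the tokenizer to every cell of the "text" column using apply + lambda.
tweets["tokens"] = tweets["text"].apply(lambda text: tokenize_text(text))


def extract_column(df, column, key="id"):
    """Hypothetical helper: copy one token column (plus an ID column) into a new DataFrame."""
    return df[[key, column]].copy()


# Save the extracted token data into a new DataFrame.
token_table = extract_column(tweets, "tokens")
print(token_table)
```

Passing `tokenize_text` directly to `apply` would work just as well. The `lambda` is kept only to mirror the step that mentions it. Copying the extracted columns with `.copy()` keeps the new DataFrame independent of the original, so later edits to one do not affect the other.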
Discussion 

The TATR library was presented as an academic poster at the 2018 Congress held in Regina, SK. For a PDF version of the full poster, please visit:

Next steps / further information 

Certain aspects of this recipe draw upon code from the companion TATR notebooks and recipes. In particular, please see:

TATR: Panda and CSV of Tweets

This recipe describes components that are fundamental for some of the more advanced TATR notebooks.
