Analyzing discourse around cultural phenomena
Introduction
This recipe provides a process for gathering a corpus of discourse about a cultural phenomenon from web sources and then analyzing it with a text analysis tool. The analysis identifies notable features in the text, such as common words, examines these in context to identify the major topics and concepts of the discourse, and supports forming hypotheses about the discourse and the cultural phenomenon that are backed by textual evidence from the corpus.
Ingredients
- a cultural phenomenon attracting a significant level of interest in current debate
- a number of topic-related documents gathered from the web
- your curiosity
- a web browser with a search engine (Google etc.) to collect discourse documents from the web
- a text editor to paste these documents into a single text file
- a text analytical tool (such as CATMA, downloadable at www.catma.de)
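If the documents are collected programmatically rather than pasted by hand, the single text file can be assembled as in the minimal sketch below. The function name `build_corpus`, the source labels and the `<source>` markers are assumptions made for illustration; they are not a CATMA import format.

```python
def build_corpus(documents):
    """Join several documents into one text, marking each source.

    `documents` maps a source label (hypothetical, e.g. a site name)
    to the text collected from that source. The <source> markers are
    an illustrative convention only, not a format required by any tool.
    """
    parts = []
    for label, text in documents.items():
        # Trim stray whitespace left over from copying and pasting.
        parts.append(f'<source name="{label}">\n{text.strip()}\n</source>')
    # A blank line between sources keeps the file readable in a text editor.
    return "\n\n".join(parts)


# Invented example documents, for illustration only.
corpus = build_corpus({
    "blog": "  A blog post about the phenomenon.  ",
    "forum": "A forum thread about the phenomenon.",
})
print(corpus)
```

The resulting string can be saved as a plain text file and loaded into the analysis tool. Marking where each source begins also prepares the segmentation by source described in the steps.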
Steps
- load the text file and auto-generate a concordance
- segment the text file by inserting a tag (“source tag”)
- analyse the word list for potential markers (keywords) of topic focus; if useful, create keyword groups
- define the respective tags and tag the text
- run a distribution and collocation analysis
- interpret the results
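For readers who want to see what the tool does at each step, the word list, concordance and collocation steps can be approximated in a few lines of Python. This is a rough sketch, not CATMA's implementation: the function names, the window sizes and the sample sentence are assumptions, and the collocation function returns raw co-occurrence counts rather than a test of statistical significance.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(tokens, top=10):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keywords, window=3):
    """Count tokens found within `window` positions of any keyword.

    `keywords` plays the role of a keyword group; members of the group
    are not counted as collocates of each other.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in keywords:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(t for t in span if t not in keywords)
    return counts


# Invented sample text, for illustration only.
sample = ("The Russian spy met the British agent. "
          "The British agent trusted the Russian spy.")
tokens = tokenize(sample)
nationality = {"russian", "british"}  # hypothetical keyword group

print(word_list(tokens, top=3))
print(concordance(tokens, "spy", width=2))
print(collocates(tokens, nationality, window=2).most_common(3))
```

Raw counts like these only suggest candidates. Each one still has to be read in context through the concordance before it is interpreted.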
Discussion
This recipe is generic and can be put into practice with a variety of text analysis tools, such as Voyant or CATMA, or a combination of these.
Our example analysis of collocates for the group of words tagged as referencing notions of nationality generated an unexpected result, namely a statistically significant relationship with 'Love'. On validating this result, however, we found that it was misleading: we had included in the corpus a number of bibliographies and filmographies that cited the title "From Russia with Love".
The example analysis thus illustrates how a badly constituted corpus can invalidate the results of textual analysis. In the group discussion of the example it was also noted that the problem could easily be eliminated in CATMA by tagging title citations in the corpus separately and then excluding them from the collocation analysis.
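The effect of excluding title citations can be illustrated outside CATMA with a small sketch. It assumes, purely for illustration, that titles are cited in straight double quotes. The names `strip_titles` and `collocates` and the sample sentence are hypothetical, and the regular expression stands in for the manual tagging described above.

```python
import re
from collections import Counter

# Assumption: cited titles appear between straight double quotes.
TITLE = re.compile(r'"[^"]*"')


def strip_titles(text):
    """Blank out quoted titles so they cannot contribute collocates.

    Note: removing a title makes the words on either side of it
    adjacent, which a tag-based exclusion would not do.
    """
    return TITLE.sub(" ", text)


def collocates(text, keywords, window=3):
    """Count tokens found within `window` positions of any keyword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in keywords:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Invented sample text, for illustration only.
sample = ('Filmography: "From Russia with Love". '
          'Critics debated how Russia is portrayed.')

raw = collocates(sample, {"russia"})
cleaned = collocates(strip_titles(sample), {"russia"})
print(raw["love"], cleaned["love"])
```

Comparing the two counts shows how much of an apparent association comes from citations rather than from the discourse itself.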
Status
Submitted by sondheim on Sun, 02/27/2011 - 00:00