Background on Text Analysis

Program Overview 

What is text analysis? A very short answer.

Word processors have search tools that allow you to find a word or phrase. Such tools can be used as a simple text analysis environment. Your word processor is not, however, suited to searching large texts interactively, nor does it show you the results of a search in a way that can help you understand a text. Computer-assisted text analysis environments do three types of things beyond what the "Find" tool of a word processor might do:

  • Text analysis systems can search large texts quickly. They do this by preparing electronic indexes to the text so that the computer does not have to read through the entire text. When finding words can be done so quickly that it is "interactive", it changes how you can work with the text - you can serendipitously explore without being frustrated by the slowness of the search process.
  • Text analysis systems can conduct complex searches. Text analysis systems will often allow you to search for lists of words or for complex patterns of words. For example, you can search for the co-occurrence of two words.
  • Text analysis systems can present the results in ways that suit the study of texts. Text analysis systems can display the results in a number of ways; for example, a Keyword In Context display shows you all the occurrences of the found word with one line of context.
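
The three capabilities above can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular text analysis system: the function names (build_index, cooccurrences, kwic), the word pattern, and the default window and width values are all assumptions made for the example.

```python
import re
from collections import defaultdict

def build_index(text):
    """Map each lowercased word to (token position, character offset) pairs."""
    index = defaultdict(list)
    for position, match in enumerate(re.finditer(r"[A-Za-z']+", text)):
        index[match.group().lower()].append((position, match.start()))
    return index

def cooccurrences(index, first, second, window=5):
    """Return token-position pairs where the two words lie within `window` tokens."""
    return [(i, j)
            for i, _ in index.get(first.lower(), [])
            for j, _ in index.get(second.lower(), [])
            if abs(i - j) <= window]

def kwic(text, index, keyword, width=30):
    """Return one line of context per occurrence, with the keyword in the middle."""
    lines = []
    for _, start in index.get(keyword.lower(), []):
        end = start + len(keyword)
        left = text[max(0, start - width):start].replace("\n", " ").rjust(width)
        right = text[end:end + width].replace("\n", " ").ljust(width)
        lines.append(f"{left}[{text[start:end]}]{right}")
    return lines

sample = "The cat sat on the mat. The dog saw the cat."
index = build_index(sample)          # built once, then reused for every query
print(cooccurrences(index, "cat", "dog", window=3))
print("\n".join(kwic(sample, index, "cat", width=10)))
```

The index is prepared once, so each later search is a dictionary lookup rather than a fresh pass through the whole text. That is what makes interactive exploration practical on large texts.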

Why bother with computer-assisted text analysis? A short answer.

Text analysis tools aid the interpreter asking questions of electronic texts:

Much of the evidence used by humanists is in textual form. One way of interpreting texts involves bringing questions to these texts and using various reading practices to reflect on the possible answers. Simple text analysis tools can help with this process of asking questions of a text and retrieving passages that help one think through the questions. The computer does not replace human interpretation; it enhances it. The concordance is an example of a research aid that predates computing. The concordance, like the index, allows the interpreter to find passages that share a common word about which they are asking. Unlike the index, the concordance presents the passages that "concord" together for reflection. Thus a Key Word In Context display shows one line of text with the key word searched for in the middle so that the reader can see if there are patterns in the way the word is used. Most text analysis tools build on the concordance. They break down the text (analysis) and then represent passages in new arrangements (synthesis) that aid the questioner, such as graphs of the distribution of a word.

  1. Text-analysis tools break a text down into smaller units like words, sentences, and passages, and then
  2. Gather these units into new views on the text that aid interpretation.
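
As a rough sketch of these two steps, the Python below breaks a text into words (analysis) and then gathers the occurrences of one word into a distribution across the text (synthesis). The function names, the choice of ten equal segments, and the plain-text bar chart are assumptions made for illustration; real tools offer many other views.

```python
import re

def word_distribution(text, keyword, segments=10):
    """Count occurrences of `keyword` in each of `segments` equal slices of the text."""
    # Step 1 (analysis): break the text down into word units.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Step 2 (synthesis): gather the matching units into a new view.
    counts = [0] * segments
    for position, word in enumerate(words):
        if word == keyword.lower():
            counts[position * segments // len(words)] += 1
    return counts

def bar_chart(counts):
    """Render the counts as one row of '#' marks per segment."""
    return "\n".join(f"{i + 1:>2} | {'#' * n}" for i, n in enumerate(counts))

print(bar_chart(word_distribution("a b a b c c c a", "a", segments=4)))
```

A view like this does not answer a question by itself. It shows where in the text a word clusters, which tells the reader which passages to go back and reread.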

Text analysis practices encourage reflection on the questions asked and formalization of queries:

One of the side-effects of computer-assisted text analysis is that it forces interpreters to think about their interpretative practices in order to use the tool. When you use an index or concordance you have to ask yourself what word or concept you want to follow through the text. In order for a computer to aid in interpretation you need to describe the questions you bring to bear and then formalize them into queries that the computer can handle. To look for an abstract concept in a text with a concordance you have to ask what words or patterns would be indicative of the discussion of that concept. In the formalization the interpreter can learn about what they are asking and develop new questions. With interactive systems you start out thinking you are asking one question, but in the formalization and refinement you discover anomalies or other questions you want to ask. Interacting with a text through text analysis tools then becomes an iterative practice of discovery with its own serendipitous paths, comparable to, but not identical to, the serendipitous discoveries that happen in rereading a text.
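
To make the idea of formalization concrete, here is a minimal Python sketch in which an abstract concept is approximated by a list of word stems and turned into a pattern query. The stems, the sample sentences, and the function name passages_matching are hypothetical, and the sentence splitting is deliberately naive.

```python
import re

def passages_matching(text, stems):
    """Return the sentences containing a word that begins with any of `stems`."""
    if not stems:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(stem) for stem in stems) + r")\w*",
        re.IGNORECASE,
    )
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [sentence for sentence in sentences if pattern.search(sentence)]

text = "The citizens obey the law. He mowed the lawn. Dinner was served."
for passage in passages_matching(text, ["citizen", "law"]):
    print(passage)
```

The query for a "political" concept also retrieves the sentence about the lawn. That is the kind of anomaly that sends the interpreter back to refine the query and, in doing so, to reconsider what they were asking.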

Text analysis is a way of targeted rereading that tests intuitions

For some, text analysis is a way of "thinking through" or targeted rereading. When interpreting a text you develop intuitions about what may be an interesting rereading of the evidence. Most intuitions have implied quantitative or comparative elements of the sort "Rousseau is more interested in political issues while Diderot is interested in domestic philosophical issues." Such intuitions can be checked with text analysis. What language use would we expect in a philosopher interested in state issues rather than personal or domestic issues? Is there a higher frequency of certain words in Rousseau? In this formal thinking through of intuitions one doesn't necessarily prove the intuition; instead one uses interpretative aids in practices of reflection. Interestingly, these practices are often not shared as formal methods in the humanities. The resulting papers we write often don't mention the text analysis, just as they don't mention the library practices. The practices are undertheorized in the humanities, which is one of the reasons for this Text Analysis Developers Alliance wiki. It was set up to provide a place for those interested in thinking about computer-assisted interpretative practices, associated tools, and how they can be developed.
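
A comparative intuition of this kind can be checked, in a rough way, by comparing how often a chosen set of words appears in each text relative to its length. The Python below is a sketch under stated assumptions: the term list and the two placeholder strings are invented for the example and are not drawn from Rousseau or Diderot, and the per-1000-words scale is just one common convention.

```python
import re
from collections import Counter

def relative_frequency(text, terms, per=1000):
    """Occurrences of any of `terms` per `per` words of `text`."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    hits = sum(counts[term.lower()] for term in terms)
    return per * hits / len(words)

# Hypothetical stand-ins; a real comparison would load the full texts.
political_terms = ["state", "law"]
text_a = "the state and the law of the state"
text_b = "the home and the family"

print(relative_frequency(text_a, political_terms))
print(relative_frequency(text_b, political_terms))
```

Normalizing by length matters because the texts being compared are rarely the same size. A higher figure for one author does not prove the intuition, but it gives the interpreter something specific to reflect on and to test against a rereading.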

You are already doing text analysis

As an increasing amount of the evidence that we use is accessed electronically online, we are forced to use analytical techniques in everyday research. When you Google for documents and then follow trails through the web you are using sophisticated tools that rank results of queries for words or combinations of words. Few understand how Google operates, but we learn to use it. Likewise, when we search bibliographic databases for articles and then read them online we are using computer research tools. To use large full-text databases we learn to search for keywords and we learn to interpret the resulting views. This is simple text analysis. Aren't you bothered by not being able to ask more refined questions of an electronic text? Doesn't it bother you not to know how the search works when you don't get what you expected? Text analysis as a practice is reflective. Text analysis researchers and developers are asking about the tools and the way they constrain or make possible practices of reflection.

Last Update 
Jan 21, 2013
Note 
This document was retrieved from the Internet Archive.