- Find an electronic text at a source such as Project Gutenberg;
- Prepare the text by removing any material added by later editors or publishers;
- Generate a word list, sorted by frequency, using the Voyant Corpus Term Frequencies tool;
- Examine the list to see whether anything unusual stands out;
- Refine the word list by applying a stop list, which filters out common function words such as "the" and "and";
- Re-examine the list for particular words you do or do not expect to see;
- Explore keywords in context (KWIC) using Voyant Document KWICs;
- Identify collocated words using Voyant Links to determine usage patterns.
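Voyant performs the word-list, stop-list, KWIC, and collocation steps in the browser. To make the underlying ideas concrete, here is a minimal Python sketch of each one. It is not Voyant's implementation: the tokenizer, the tiny `STOP_WORDS` set, and the helper names (`term_frequencies`, `kwic`, `collocates`) are assumptions for illustration, and the collocation function uses a plain co-occurrence count within a fixed window rather than whatever measure Voyant Links applies.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(tokens, stop_words=frozenset()):
    """Count tokens, most frequent first, skipping any word in stop_words."""
    return Counter(t for t in tokens if t not in stop_words).most_common()


def kwic(tokens, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def collocates(tokens, keyword, window=3, stop_words=frozenset()):
    """Count non-stop words appearing within `window` tokens of each occurrence of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in span if w not in stop_words and w != keyword)
    return counts.most_common()


if __name__ == "__main__":
    sample = "The sea was calm. The old man watched the sea, and the sea watched the old man."
    tokens = tokenize(sample)
    print(term_frequencies(tokens)[:3])               # → [('the', 5), ('sea', 3), ('old', 2)]
    print(term_frequencies(tokens, STOP_WORDS)[:3])   # → [('sea', 3), ('old', 2), ('man', 2)]
    for line in kwic(tokens, "sea"):
        print(line)
    print(collocates(tokens, "sea", stop_words=STOP_WORDS))
```

Note how "the" dominates the raw list and disappears once the stop list is applied, which is exactly why the stop-list step comes before the closer reading of the list.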
- Finding a Text
Possible sources for electronic texts are listed on the Electronic Texts Panel of TAPoR. When preparing a text for analysis, be aware that academic apparatus added to it, such as notes and introductions by later editors, may get in the way of reading the work as it was originally constructed. It can therefore be useful to remove notes and other material that subsequent authors added to the original work. Tools such as TAPoR Extract Text can remove this added material.
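As a rough alternative to a dedicated tool, the publisher's header and license footer on a Project Gutenberg file can be trimmed with a short script. This sketch assumes the file marks the work with lines beginning `*** START OF THE PROJECT GUTENBERG EBOOK` and `*** END OF THE PROJECT GUTENBERG EBOOK` (older files say "THIS" instead of "THE"); the wording varies between files, so check yours first. The function name `strip_gutenberg_frame` is hypothetical, and it does not remove footnotes or editorial introductions inside the work itself, which still need hand editing or a tool such as TAPoR Extract Text.

```python
import re

# Assumed marker wording; Project Gutenberg files differ, so verify against your file.
START_RE = re.compile(r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                      re.IGNORECASE | re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                    re.IGNORECASE | re.MULTILINE)


def strip_gutenberg_frame(text):
    """Return the text between the start and end marker lines.

    If either marker is missing, or they appear in the wrong order,
    the text is returned unchanged so nothing is silently lost.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    if not start or not end or end.start() < start.end():
        return text
    return text[start.end():end.start()].strip()


if __name__ == "__main__":
    raw = (
        "The Project Gutenberg eBook of Sample\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "Chapter I\n"
        "The text itself begins here.\n"
        "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "License text follows.\n"
    )
    print(strip_gutenberg_frame(raw))
```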
The word list can provide a first clue about the nature of the text. Questions you might ask of the word list include:
- What are the basic preoccupations of this text?
- What is unusual in the text?
- Are there any patterns in the tenses of the verbs used?
- Given your expectations, are any words missing from the word list?
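Two of these questions can be given a quick first pass in code. The sketch below, whose helper names are hypothetical, checks which expected words never appear and measures what share of tokens end in a given suffix. The suffix count is only a crude stand-in for tense: "-ed" also matches words like "bed" and "need", and it misses irregular past forms such as "went", so treat the number as a prompt for closer reading rather than a measurement. A proper analysis would use a part-of-speech tagger.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens (runs of letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def missing_words(counts, expected):
    """Return the expected words that never occur, in the order given."""
    return [w for w in expected if counts[w] == 0]


def suffix_share(counts, suffix):
    """Fraction of all tokens ending in suffix; a rough proxy, e.g. '-ed' for past tense."""
    total = sum(counts.values())
    hits = sum(n for w, n in counts.items() if w.endswith(suffix))
    return hits / total if total else 0.0


if __name__ == "__main__":
    counts = word_counts("She walked home and looked at the sea")
    print(missing_words(counts, ["sea", "ship", "storm"]))  # → ['ship', 'storm']
    print(suffix_share(counts, "ed"))                       # → 0.25
```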