Compare Two Texts

Introduction: 

This is a recipe for comparing two different texts, and exploring where they are similar and divergent in terms of words used, word distribution, theme and so on.

Ingredients: 

Steps: 

  1. Make sure your texts are in the same format. This recipe assumes they are plain text files, though comparable tools exist for HTML and XML.
  2. Use the TAPoRware Comparator to compare your two texts. The Comparator shows the vocabulary common to both texts, sorted by the ratio of relative frequencies. Words shared by both texts but far more frequent in text 1 appear first, and the list runs down to those more frequent in text 2.
  3. Look for themes among the words that appear more frequently in one text than in the other. You can explore these themes with the following tools:
    • The Voyant Document KWICs tool can be used to search for a word and see it in context. You can compare KWIC (keyword in context) concordances of the same word or phrase from the different texts.
    • Voyant Links will let you see what words collocate with the words you are exploring. If you compare the collocation results for the same word from different texts, you can get a sense of the differences in how the word is used.
    • The TAPoRware List Word Pairs tool will let you see frequently used two-word phrases in each of the texts. These, again, can be compared.
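
The sorting described in step 2 can be illustrated with a short Python sketch. This is not the TAPoRware Comparator's own code: the tokenizer, the function names and the tie-breaking rule are all assumptions made for illustration. It only shows the general idea of ranking shared vocabulary by the ratio of relative frequencies.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def compare_vocabulary(text1, text2):
    """Return (word, ratio) pairs for words found in both texts.

    ratio = relative frequency in text1 / relative frequency in text2.
    Words far more frequent in text1 come first; ties are broken
    alphabetically (a hypothetical choice, not taken from the tool).
    """
    freq1 = relative_frequencies(tokenize(text1))
    freq2 = relative_frequencies(tokenize(text2))
    shared = freq1.keys() & freq2.keys()
    ratios = [(word, freq1[word] / freq2[word]) for word in shared]
    return sorted(ratios, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical sample texts; replace with the contents of your two files.
    sample1 = "the sea the sea the ship and the storm"
    sample2 = "the garden and the rose and the sea"
    for word, ratio in compare_vocabulary(sample1, sample2):
        print(f"{word}\t{ratio:.2f}")
```

A ratio above 1 means the word takes up a larger share of text 1 than of text 2, and a ratio below 1 means the reverse. Words that occur in only one text are left out, since the recipe is concerned with the common vocabulary.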

Discussion: 

  • One way to think of comparing is that you find the common themes or clusters of words with the TAPoRware Comparator, then follow those themes separately through each text and compare how they play out. See the Explore Themes within a Text recipe at the TADA wiki for more information.
  • You can use the TAPoR Portal instead of the TAPoRware tools listed here if you want to track your results. The portal also lets you save aggregations of more than one text, so you can have a text that combines the two you are comparing. Note, however, that some of the experimental tools in TAPoRware are not in the portal.
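
Following a theme through each text separately can also be sketched in Python. The example below is a rough stand-in for a KWIC concordance and a word-pair list, not the Voyant or TAPoRware implementations. The function names, the context window size and the tokenizer are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplistic assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return one line per occurrence of keyword, with up to `window`
    words of context on each side and the keyword in square brackets."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def word_pairs(tokens, top=5):
    """Count adjacent two-word phrases and return the `top` most frequent."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(top)


if __name__ == "__main__":
    # Hypothetical sample texts; replace with the contents of your two files.
    samples = {
        "text 1": "The sea was calm. The sea was cold.",
        "text 2": "She crossed the sea to find the garden by the sea.",
    }
    for name, text in samples.items():
        tokens = tokenize(text)
        print(name)
        for line in kwic(tokens, "sea", window=2):
            print("  ", line)
        print("  ", word_pairs(tokens, top=3))
```

Running the same keyword through both texts and reading the two sets of output side by side is the manual comparison the recipe describes. This sketch ignores sentence boundaries, so a context window or word pair may span two sentences.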

Next steps / further information: 

Tools: 

Categories: 

Status: