What are electronic texts and how can we analyze them?

An extended answer.

Based on Electronic Texts and Text Analysis by Geoffrey Rockwell and Ian Lancashire

The written word is one of the most important ways we communicate and preserve information. Whether it is legal records, novels, historical records, medical case studies, or, now, web pages, written text is an important form of data. It is one of the primary means by which we communicate in industry, in academia, and for pleasure, and an increasing share of the texts we care about are created and accessed in electronic form.

What is an Electronic Text?

Electronic texts digitally represent oral or written language in a form suitable for analysis with a computer. Typically, an electronic text is an electronic version of a written work, an electronic version of a transcript of an oral event, or a document composed on the computer. In each case, the information in an electronic text is meant to be in a natural language that humans can read when it is displayed properly. Some examples of electronic texts are:

  • An e-mail message
  • A medical case study that is stored on a computer
  • Web pages that contain enough text that they must be read to be understood
  • A hypertext manual
  • An electronic edition of a play with markup and links to images of the original manuscript
  • A corpus of texts collected for linguistic or lexicographical study, such as a collection of exemplary texts used in the creation of a dictionary
  • A business document like a report, proposal, or contract
  • An interactive CD-ROM dictionary or encyclopedia
  • An interactive text adventure game where you read passages and make decisions
  • A collection of legal documents accessible through a digital retrieval system
  • A transcript of a series of interviews with embedded interpretative information
  • A transcript of a court case or administrative proceeding


Electronic texts come in four major forms:

  1. A copy of a work that was originally on paper, such as a digital representation of a literary, dramatic, or other written work that originally existed in analogue form.
  2. A work composed on a computer and stored in that form but intended to be printed, such as a word-processing file or PDF (Portable Document Format) file.
  3. A work composed on a computer and meant to be accessed on a computer, such as a web page, electronic text database, or hypertext.
  4. A transcript of a conversation or other oral event.


What can we do with electronic texts?

We can use computers to present, manage, and learn from electronic texts in ways that would be difficult by hand. We can archive large quantities of text and make reliable copies of these archives. We can quickly retrieve passages from a text database of millions of pages. We can ask where two or more words occur within the same paragraph. From a hypertext, we can link automatically to other information. We can quantify writing style or try to identify the author of a disputed work by his or her style. We can compare written works or study the evolution of language usage over a collection of texts.
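To make "quantifying style" concrete, here is a minimal sketch in Python. It assumes that the relative frequencies of a small, hand-picked list of common function words (`FUNCTION_WORDS`, a hypothetical illustrative list) can serve as a style marker, and it compares profiles with cosine similarity. That is a simple stand-in for the more careful measures, such as Burrows's Delta, used in real attribution studies. The function names are illustrative and do not come from any particular tool.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative feature set: a handful of common English function words.
# Real attribution studies choose and validate their features far more carefully.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was",
                  "i", "for", "on", "you", "he", "be", "with", "as", "by", "not"]


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def style_profile(text: str) -> list[float]:
    """Frequency of each function word per 1,000 tokens of the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_author(disputed: str, known: dict[str, str]) -> str:
    """Return the candidate whose known writing has the profile most similar to the disputed text."""
    target = style_profile(disputed)
    return max(known, key=lambda author: cosine(target, style_profile(known[author])))
```

For example, `closest_author(disputed_text, {"A": text_by_a, "B": text_by_b})` returns whichever key's known writing has the closest function-word profile. With texts this short, the result says little. Meaningful profiles need thousands of words per candidate and a validated feature set.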

In general, computer-assisted text analysis uses computers to search, retrieve, manipulate, measure, and classify natural-language documents by author, subject, genre, or type, and to find patterns in them. Here is a partial list of what researchers do with electronic texts:

  • Editors and translators add interpretative information to electronic versions of historically important texts to create rich electronic editions for use by other scholars, students, or interested readers. Such electronic editions can include modern spellings, commentary, variant translations, references, multimedia supplements, and images of the original manuscript, all available at the click of a button.
  • Researchers across the humanities and social sciences use electronic text collections to find passages where issues are discussed and to retrieve documents that are relevant to their questions.
  • Social scientists use text analysis to study interviews, responses to questionnaires, collections of policy documents, or letters. Through qualitative analysis, they characterize or model the topics, opinions, or psychological traits exhibited in the texts.
  • Linguists annotate texts with information about language features so that they can study language use. Using these corpora (collections of texts), they write dictionaries, grammars, studies of language change over time, and analyses of language use in different communities.
  • Researchers who are interested in the meaning of words analyze them by the company they keep, that is, by the terms that co-occur or collocate with them, using statistical techniques. Their research benefits those developing automatic translation tools for global commerce.
  • Sociologists, educationalists, and psychologists sometimes analyze one aspect of human behavior, verbal communication (commonly termed 'talk'), by studying the utterances of individual speakers for traits such as sentence length, rate of repetition, pauses, questions, negatives, and turns.
  • Experts are called upon in court to use combinations of these techniques to establish the authorship of disputed texts. Forensic linguistics is a growing field because an increasing number of the documents we exchange are electronic, and traditional ways of establishing authorship often do not apply to them. (There is no fingerprint on an e-mail message, only patterns of language use.) In general, a quantitative or qualitative profile of the disputed text is compared to profiles of texts known to have been written by candidate authors.
  • Documenters and usability analysts employ such techniques to improve client manuals and business technical reports, and to help customers summarize documents.
  • Language instruction researchers use text tools to study language learning problems and develop collections of electronic texts with which to teach languages. They work with linguists to develop text collections with which to train translation systems.
  • Researchers from all areas publish in electronic journals, creating more electronic texts for others to study and access.
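Studying a word "by the company it keeps" is usually called collocation analysis. Here is a minimal sketch in Python. It assumes simple regular-expression tokenization and a symmetric window of `window` tokens on each side of the node word. It scores each neighbour with a basic pointwise mutual information (PMI) estimate, log2(f(node, w) · N / (f(node) · f(w))). The function names and defaults are illustrative assumptions. Corpus tools also offer other association measures, such as the t-score and log-likelihood. PMI on its own tends to overrate rare words, which is why `min_count` filters them out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens: list[str], node: str, window: int = 4,
               min_count: int = 2) -> list[tuple[str, int, float]]:
    """Words occurring within `window` tokens of `node`, scored by pointwise mutual information.

    Returns (word, co-occurrence count, PMI) tuples, highest PMI first.
    PMI is estimated as log2(f(node, w) * N / (f(node) * f(w))), where N is the corpus size.
    """
    freq = Counter(tokens)
    n = len(tokens)
    together = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every other token in the symmetric window around this occurrence of the node.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                together[tokens[j]] += 1
    scored = []
    for word, joint in together.items():
        if word == node or joint < min_count:  # skip the node itself and rare pairs
            continue
        pmi = math.log2(joint * n / (freq[node] * freq[word]))
        scored.append((word, joint, pmi))
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

For example, on the toy text "strong tea is good. strong tea again. the sky is blue. strong tea wins. the tea is hot.", `collocates(tokenize(text), "tea", window=1)` ranks "strong" first (3 co-occurrences, PMI log2(4.5) ≈ 2.17), ahead of "is" (2 co-occurrences, PMI log2(3) ≈ 1.58).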

Selected Canadian Links:

Dictionary Projects:

Dictionary of Old English: http://www.doe.utoronto.ca/

Waterloo Centre for the Study of the New OED: http://db.uwaterloo.ca/OED/

Dictionnaire de l'Académie française: http://www.chass.utoronto.ca/~wulfric/academie/

Termium: http://www.translationbureau.gc.ca/pwgsc_internet/english/03_tools/03_termium.htm

Comparative Lexicography of French and English in Canada: http://balzac.sti.uottawa.ca/


Electronic Editions:

Internet Shakespeare Editions: http://web.uvic.ca/shakespeare/

TLFQ: http://www.ciral.ulaval.ca/tlfq/

Canadian Poetry Database: http://www.lib.unb.ca/Texts/projects.html

Representative Poetry On-line: http://www.library.utoronto.ca/utel/rp/intro.html

Laboratoire de Français Ancien: http://www.uottawa.ca/academic/arts/lfa/

Web Joyce - Finnegans Web: http://www.trentu.ca/jjoyce/fw.htm

Complete Poems and Letters of E.J. Pratt: http://www.trentu.ca/pratt/

Canadian Poetry: http://www.library.utoronto.ca/canpoetry/

Early Canadiana Online: http://www.canadiana.org/

The Orlando Project: http://www.artsrn.ualberta.ca/orlando/


Text Projects Elsewhere:

Arts and Humanities Data Service (no longer being operated): https://web.archive.org/web/20120716205617/http://www.ahds.ac.uk/

Oxford Text Archive: http://ota.ahds.ac.uk/

University of Virginia Electronic Text Centre: http://dcs.library.virginia.edu/digital-stewardship-services/etext/

University of Virginia Institute for Advanced Technology in the Humanities: http://www.iath.virginia.edu/

Project Gutenberg: https://www.gutenberg.org/

Text Encoding Initiative: http://www.tei-c.org/index.xml



