Program Overview 

COCOA was a program for creating KWIC concordances, word frequency reports and word counts on texts (Day and Marriott 56).

COCOA was first available in the early to mid-1960s. D. B. Russell's 1965 review describes the earliest version as "a system which allows users to generate word-counts and concordances from literary (or other) texts. It was written originally for Atlas after consultation with various British Universities, and is currently being implemented for System 4-75 at Edinburgh" (Russell).

It was still in use ten years later. In 1974, Paul Corcoran described COCOA as a highly flexible device for processing natural language texts, able to produce word and vocabulary counts, concordances with context formats and location references, and word-frequency profiles. Texts supplied by the user could be in any language, although non-Roman characters or alphabets had to be transliterated using characters or character combinations available on the input devices and acceptable to the computer system running the program (Corcoran 566).

COCOA was written in FORTRAN and housed on approximately 4000 punched cards (Day and Marriott 56). Compared with other programs of its type, COCOA was powerful and flexible: it could perform lemmatization, select words of a specified length or suffix, and restrict processing to a given section of a text (56). COCOA required three interrelated parts to run: the COCOA program itself, a 13-card 'control file', and the text to be processed. The version adapted for DECsystem-10 computers, however, allowed users to combine the control file and the text in a single deck of cards (Corcoran 566). Users could declare their own character set, punctuation, word separators and connectors, special characters and signs (to assist the program's processing), references to other parts of the text or text library, a reference format for concordances, inclusion or exclusion of particular words, and suffixes for special processing (566).
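To make the idea of a user-declared character set more concrete, the following Python sketch splits text into words given a set of letters and a set of connectors. It is not COCOA's FORTRAN code: the names tokenize, LETTERS and connectors are hypothetical, and the rule that a connector counts as part of a word only when it is followed by a letter is an assumption made for illustration.

```python
# Hypothetical sketch of word splitting under a user-declared character set.
# Not COCOA's actual algorithm; names and rules are illustrative assumptions.

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed upper-case-only text


def tokenize(text, letters=LETTERS, connectors=""):
    """Split text into words.

    letters:    characters that may form part of a word
    connectors: characters kept only when they follow part of a word and
                precede a letter (e.g. the apostrophe in THAT'S)
    Every other character is treated as a word separator.
    """
    words = []
    current = []
    for i, ch in enumerate(text):
        if ch in letters:
            current.append(ch)
        elif (ch in connectors and current
              and i + 1 < len(text) and text[i + 1] in letters):
            current.append(ch)
        elif current:
            # Any other character ends the word in progress.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

For example, tokenize("TO BE, OR NOT TO BE: THAT'S IT", connectors="'") keeps THAT'S as one word, whereas with no connectors declared it would be split into THAT and S.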

Word counts were output in tables, with the program automatically sorting them in frequency, alphabetic, and rhyme (alphabetic on word endings) order (Russell). Word frequencies were also arranged in a table (Russell). For concordances, COCOA listed every line on which a word appeared, with a limited amount of context and a reference to the source of the text (Russell). The concorded word appeared in the center of the output line, surrounded by its context (Russell). All output was suitable for printing without further processing (Corcoran 566). In its earliest version, COCOA required that texts in non-Roman alphabets be transliterated (Russell), but by 1974 it could handle texts using up to 256 distinct characters, or 128 where the alphabet includes both upper and lower case forms (Corcoran 566).
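The three sort orders and the centred concordance line described above can be illustrated with a short modern sketch. This is not how COCOA implemented them; the function names, the (reference, text) input format and the fixed context width are assumptions chosen for clarity. Rhyme order is modelled as alphabetic order on the reversed spelling, which groups words by their endings.

```python
from collections import Counter

# Hypothetical sketch of COCOA-style output orders and a KWIC concordance.
# Function names and the reference format are assumptions, not COCOA's own.


def frequency_order(counts):
    # Most frequent words first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def alphabetic_order(counts):
    return sorted(counts.items())


def rhyme_order(counts):
    # Alphabetic order on the reversed spelling, so words that share an
    # ending (e.g. -ING, -ONG) sort next to each other.
    return sorted(counts.items(), key=lambda kv: kv[0][::-1])


def kwic(lines, keyword, width=20):
    """Return one output line per occurrence of keyword.

    lines is a list of (reference, text) pairs. The keyword is printed in a
    fixed central column, with up to `width` characters of context on each
    side, followed by the reference.
    """
    out = []
    for ref, text in lines:
        words = text.split()
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[:i])[-width:]
                right = " ".join(words[i + 1:])[:width]
                out.append(f"{left:>{width}} {word} {right:<{width}}  {ref}")
    return out
```

With counts = Counter(["SONG", "RING", "SONG", "KING", "LONG"]), rhyme_order places KING and RING together before LONG and SONG. Truncating context by characters may cut a neighbouring word in half; that is a simplification of this sketch.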

COCOA was notable for permitting users to insert comments directly into the text by enclosing them in square brackets (Russell). Identification records, which assisted COCOA in processing the text, were enclosed in angle brackets (Russell). As both forms of bracket were treated as special characters by the program, they could not be included in the text body (Russell). Russell described the use and treatment of the identification records:

The identification records will typically be copied from the headings of the original script, e.g.

< W SHAKESPEAR > < T HAMLET > < A 1 > < S 1 >

the W standing for Writer not William, T for Title, A for Act and S for Scene and so on. The choice of letters is arbitrary and is left to the individual user except that L always stands for Line Number, which is automatically initialized and incremented by the system. These identification records allow COCOA to identify the different sections of text. By reference to these records, users can program COCOA to select those sections of his archive which are appropriate for a particular study. He can also program COCOA to generate references for occurrences of words appearing in the output from a concordance. (Russell)
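The handling of identification records and bracketed comments described above can be sketched as follows. The record syntax is modelled on Russell's example, but the parsing rules, the function names read_archive, select and reference, and the default reference format are hypothetical. In particular, this sketch assumes the automatic line number L simply counts text lines and is never reset, which Russell does not specify.

```python
import re

# Hypothetical sketch of COCOA-style identification records, modelled on
# Russell's example "< W SHAKESPEAR > < T HAMLET > < A 1 > < S 1 >".
# The exact syntax COCOA accepted may have differed.
RECORD = re.compile(r"<\s*([A-Z])\s+([^>]*?)\s*>")
COMMENT = re.compile(r"\[[^\]]*\]")


def read_archive(lines):
    """Yield (references, text) for every line that contains text.

    Records update the current identification values; [comments] are
    discarded. The L (line number) value is maintained automatically.
    Assumption: L simply counts text lines and is never reset.
    """
    refs = {}
    line_no = 0
    for raw in lines:
        raw = COMMENT.sub("", raw)
        for key, value in RECORD.findall(raw):
            refs[key] = value
        text = RECORD.sub("", raw).strip()
        if text:
            line_no += 1
            yield {**refs, "L": line_no}, text


def select(archive, **wanted):
    # Keep only lines whose identification values match every criterion.
    for refs, text in archive:
        if all(refs.get(k) == v for k, v in wanted.items()):
            yield refs, text


def reference(refs, fmt="{T} {A}.{S}.{L}"):
    # Build a reference string from the current identification values.
    return fmt.format(**refs)
```

For instance, after reading the record line from Russell's example followed by a line of text, reference() would produce a locator such as HAMLET 1.1.1, and select(archive, S="2") would keep only the lines of Scene 2.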

The control cards provided with the program were used to specify functions; they had the drawback of being both unclear and prone to human error (Day and Marriott 56). Otherwise, however, COCOA was well commented and had good technical support (56). Although COCOA could in theory process a text of any size, in practice it was limited by the core memory of the machine running it and by the user's available disk space (Corcoran 566). To work around these limits, COCOA provided controls for reducing the output for particularly large texts to manageable proportions (Russell). Users could narrow the scope of their text by supplying a list of words of particular interest, supplying a list of words to exclude, setting the program to concord only words within a specified frequency range, or breaking up the concordance into blocks by alphabetic range (Russell).
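The four output-limiting controls listed above can be combined in a single filter, as in the following sketch. The function name limit_words and all of its parameters are hypothetical; they mirror the controls Russell describes rather than COCOA's actual control-card syntax. The alphabetic block is assumed here to be judged on a word's first letter only.

```python
from collections import Counter


def limit_words(words, include=None, exclude=(), min_freq=1,
                max_freq=None, first="A", last="Z"):
    """Return the sorted list of distinct words that survive every filter.

    All parameter names are hypothetical; they mirror the four controls
    Russell lists rather than COCOA's actual card syntax.
    """
    counts = Counter(words)
    keep = []
    for word, n in counts.items():
        # Only words of particular interest, if such a list is given.
        if include is not None and word not in include:
            continue
        # Words the user asked to leave out.
        if word in exclude:
            continue
        # Frequency range.
        if n < min_freq or (max_freq is not None and n > max_freq):
            continue
        # Alphabetic block: compare on the first letter only.
        if not (first <= word[0] <= last):
            continue
        keep.append(word)
    return sorted(keep)
```

Splitting a large concordance into alphabetic blocks then amounts to calling the function once per range, for example first="A", last="M" and then first="N", last="Z".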

COCOA was a success with academic users from the start. Russell reports that by the time it had been out for six months, there were academic studies using it in over six languages (Russell). As of 1976, COCOA was still regarded as a valuable tool for literary research (Day and Marriott 56).

Last Update 
Sep 19, 2013
This document was retrieved from the Internet Archive.