Add a French Language Text to TAPoR

Introduction 

This recipe takes a French language text and adds it to the TAPoR workspace for textual analysis. This recipe ensures that the fundamental task of loading text into a text analysis environment is accomplished correctly. For proper analysis, the text must be interpreted by the computer in the same way in which you enter it, including accented characters. There are a variety of ways in which text can be encoded by operating systems and applications during text entry and storage. This recipe will ensure that your text has been entered and encoded properly for analysis and that you can enter search terms and parameters from your browser to complete analytical tasks.
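
As a small illustration of why encoding matters, the following Python sketch shows how the same bytes are read differently under UTF-8 and Latin-1. It is not part of TAPoR, and the sample word is an assumption chosen only for illustration.

```python
# Illustrative only: the sample word is an assumption, not taken from TAPoR.
word = "été"

utf8_bytes = word.encode("utf-8")      # each é is stored as two bytes in UTF-8
latin1_bytes = word.encode("latin-1")  # each é is stored as one byte in Latin-1

# Reading UTF-8 bytes as if they were Latin-1 garbles the accented letters.
misread = utf8_bytes.decode("latin-1")

# Reading the bytes with the encoding they were written in restores the word.
restored = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), len(latin1_bytes))
print(misread)
print(restored)
```

The misread version comes out as "Ã©tÃ©", which is the kind of damage a text analysis tool would then count and search as if it were a real word.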

Ingredients 
Steps 
  1. Prepare the text in an external editor to ensure that it is encoded correctly;
    or
  2. Confirm that the French language web page that you wish to use is encoded properly;
  3. Log in to TAPoR;
  4. Add your encoded French language text file to MyTexts;
  5. Generate a word list (sorted by frequency) using the TAPoR List Words Tool;
  6. Explore an accented word individually using the TAPoR Find Words - Concordance Tool.
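
To make steps 5 and 6 concrete, here is a minimal Python sketch of what a frequency-sorted word list and a simple concordance look like for accented text. It is not TAPoR's implementation: the sample sentence, the function names, and the tokenisation rule are all assumptions made for illustration, and the TAPoR tools may split and count words differently.

```python
import re
from collections import Counter

# Hypothetical sample text; TAPoR's own tokenisation rules may differ.
text = "L'été est passé. Cet été a été très chaud."

def tokenize(s):
    # In Python 3, \w matches accented letters in str patterns by default.
    return re.findall(r"\w+", s.lower())

def word_list(s):
    # Words sorted by descending frequency, then by spelling for stable output.
    counts = Counter(tokenize(s))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def concordance(s, term, width=10):
    # Each hit is returned with up to `width` characters of context per side.
    hits = []
    pattern = r"\b" + re.escape(term) + r"\b"
    for m in re.finditer(pattern, s, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(s), m.end() + width)
        hits.append(s[start:end])
    return hits

for word, count in word_list(text):
    print(count, word)
for hit in concordance(text, "été"):
    print(hit)
```

If the text had been decoded with the wrong encoding, "été" would not appear as a single word in either result, which is why the encoding steps come first.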
Discussion 
  • Text Editors

You may need a text editor to save your text as UTF-8 or Latin-1 so that the accents and special characters of the language are preserved. On a Windows system, this can be done with Notepad, and on Mac OS X with TextEdit. Unix-based systems include a text editor as part of the standard system install. Word processors typically provide a much deeper tool set for formatting text and generally save documents in their native format, which is not appropriate for importing into a text analysis environment. However, they can also save a plain text file with the appropriate encoding if you follow the appropriate steps.
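
If you prefer to convert a file programmatically rather than through an editor's Save As dialog, the conversion can be sketched in Python as follows. This is one possible approach, not a TAPoR feature: the function name and the default source encoding are assumptions, and you must know (or correctly guess) the encoding the file was originally saved in.

```python
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding="latin-1"):
    # Hypothetical helper: read the file using the encoding it was saved in
    # (the caller must supply this), then write the same characters as UTF-8.
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text
```

For example, `convert_to_utf8("texte_latin1.txt", "texte_utf8.txt")` would read a Latin-1 file and write a UTF-8 copy; both file names here are placeholders.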

  • Instructions for saving as UTF-8 or Latin-1 using Notepad
  • Instructions for saving as UTF-8 or Latin-1 using TextEdit
  • Instructions for saving as UTF-8 or Latin-1 using Microsoft Word

  • Web Page Encoding

To verify that the web page that you wish to import into TAPoR is encoded in either UTF-8 or Latin-1, check your browser settings. In Internet Explorer, go to the View menu and select the Encoding option. This should read Unicode (UTF-8). In Firefox, the option is Character Encoding under the View menu. This should also read Unicode (UTF-8). If this is not the case, you can manually select the encoding you wish to use from this menu. On other web browsers, the process should be similar; please consult their help files for specific instructions on character encoding. If you view the page source for your web page, it may contain an HTML line that declares the character set as UTF-8 or as Latin-1, which indicates that it is encoded properly for text analysis.
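
The page-source check can also be sketched in Python. The two sample tags below are common forms of an HTML character set declaration and are assumptions for illustration; they are not necessarily the exact lines the original recipe showed. The function names are likewise hypothetical.

```python
import re

# Matches charset declarations in both older and HTML5-style meta tags.
CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)

def declared_charset(html):
    # Returns the declared character set in lower case, or None if absent.
    m = CHARSET_RE.search(html)
    return m.group(1).lower() if m else None

def is_suitable(html):
    # UTF-8 and Latin-1 (ISO-8859-1) are the encodings this recipe relies on.
    return declared_charset(html) in {"utf-8", "iso-8859-1", "latin-1"}

# Sample declarations (assumed, for illustration only).
older_style = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
html5_style = '<meta charset="utf-8">'

print(declared_charset(older_style), is_suitable(older_style))
print(declared_charset(html5_style), is_suitable(html5_style))
```

Note that a declaration only states what the page claims; if the page was saved in a different encoding than it declares, accented characters can still appear garbled.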

Status