This utility is for creating a simple web scraper with Python.
This recipe discusses ways to find electronic texts (e-texts) online that can be used by other text analysis tools.
This recipe compares word usage across two corpora to see whether there is any difference between them. For a given word that appears in both texts, the U test is used to determine whether the difference in its usage is statistically significant. This helps determine how distinctive a term is to one corpus.
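The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the recipe's own code: it assumes the "U test" is the Mann-Whitney U test, that each corpus is split into equal-sized chunks of tokens, and that the word's relative frequency per chunk is the value being compared. The function names `chunk_frequencies` and `mann_whitney_u`, the chunk size, and the toy token lists are all hypothetical. The p-value uses a normal approximation without tie correction, so it is only rough for small samples or heavily tied data.

```python
import math


def chunk_frequencies(tokens, word, chunk_size):
    """Relative frequency of `word` in each consecutive full chunk of `tokens`."""
    freqs = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        freqs.append(chunk.count(word) / chunk_size)
    return freqs


def mann_whitney_u(xs, ys):
    """Return (U, p) for two samples.

    U is the smaller of the two U statistics. p is a two-sided p-value from
    the normal approximation, with no correction for ties (an assumption
    made here to keep the sketch short).
    """
    n1, n2 = len(xs), len(ys)
    # Count pairs where a value from xs beats a value from ys; ties count half.
    u_x = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    u = min(u_x, n1 * n2 - u_x)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Toy example: hypothetical token lists standing in for two corpora.
corpus_a = "the whale swam and the whale dived and the whale rose again".split()
corpus_b = "the ship sailed and the crew sang and the wind blew hard".split()
freqs_a = chunk_frequencies(corpus_a, "whale", chunk_size=4)
freqs_b = chunk_frequencies(corpus_b, "whale", chunk_size=4)
u_stat, p_value = mann_whitney_u(freqs_a, freqs_b)
print(u_stat, p_value)
```

With real corpora, many more chunks per corpus would be needed for the p-value to be meaningful; a statistics library with an exact or tie-corrected test may be a better choice in practice.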
Term Frequency-Inverse Document Frequency, or TF-IDF, is used to determine how important a word is to a single document within a collection or corpus. It ranks the importance of a word based on how often it appears in that document, but the rank is offset by how often the word occurs across the whole collection.
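The weighting described above can be illustrated with a short Python sketch. TF-IDF has several variants; this one assumes term frequency is the word's count divided by the document length, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The function name `tf_idf` and the toy documents are hypothetical, and the recipe itself may use a different formula or tool.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF scores for a list of tokenized documents.

    `documents` is a list of token lists. Returns one {term: score} dict
    per document, in the same order.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy collection: hypothetical documents, already split into tokens.
docs = [
    ["whale", "sea", "sea"],
    ["sea", "ship"],
]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

In this sketch a word that appears in every document gets a score of zero, which reflects the offset described above: frequent use within one text counts for little if the word is equally common across the whole collection.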