Finding Locations in a Text Using Named-Entity Recognition in NLTK
Similar to finding people and characters, finding locations in a text is a common exploratory technique. This recipe shows how to extract places, countries, and cities from a text. We will use the Named-Entity Recognition (NER) module of the NLTK library to achieve this. This recipe is based on Jinman Zhang's cookbook.
- Open the text file and load the contents into a container
- Create an empty list to store the location names that will be found in the text
- Split the loaded text by sentences
- Split the sentences further into words
- Tag each word with its part of speech, then label the named entities in each sentence
- Finally, extract the words labelled "GPE" (geopolitical entity) and add them to the list created in the second step.
- Depending on the text, the populated list will contain words referring to places, nations, cities, towns, etc.
- The final step is to use the geotext library to group the results into nationalities, countries, and cities
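The steps above can be sketched roughly as follows. This is a minimal illustration, not the original cookbook's code: the function names `collect_gpe`, `extract_locations`, and `group_locations` are hypothetical, and the sketch assumes that NLTK and geotext are installed and that the NLTK tokenizer, tagger, and chunker data have already been fetched with `nltk.download` (the exact resource names depend on the NLTK version).

```python
def collect_gpe(chunked_sentence):
    """Return the text of every entity labelled "GPE" in one chunked sentence."""
    locations = []
    for node in chunked_sentence:
        # Named entities are subtrees with a label; ordinary tokens are
        # plain (word, tag) tuples and have no label() method.
        if hasattr(node, "label") and node.label() == "GPE":
            locations.append(" ".join(word for word, _tag in node.leaves()))
    return locations


def extract_locations(text):
    """Split text into sentences and words, tag them, and keep the GPE entities."""
    import nltk  # imported here so collect_gpe stays usable without NLTK

    locations = []  # the empty list that the steps fill in
    for sentence in nltk.sent_tokenize(text):   # split the text by sentences
        words = nltk.word_tokenize(sentence)    # split each sentence into words
        tagged = nltk.pos_tag(words)            # part-of-speech tag per word
        chunked = nltk.ne_chunk(tagged)         # label the named entities
        locations.extend(collect_gpe(chunked))
    return locations


def group_locations(locations):
    """Group the extracted names with geotext (assumed to be installed)."""
    from geotext import GeoText

    # Joining with ", " keeps neighbouring names apart; this is a choice made
    # for this sketch, not something the recipe prescribes.
    places = GeoText(", ".join(locations))
    return {
        "nationalities": places.nationalities,
        "countries": places.countries,
        "cities": places.cities,
    }
```

To try it, read a text file into a string (the first step), pass the string to `extract_locations`, and pass the resulting list to `group_locations`. The results depend on the models that ship with the installed NLTK version, so some place names may be missed or mislabelled.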
The results could form the initial steps for mapping the locations and for gauging whether a map visualization is needed.