With a Case Study of Holocaust NER


Dr. W.J.B. Mattingly

Postdoctoral Fellow at the Smithsonian Institution's Data Science Lab


United States Holocaust Memorial Museum

How to Cite

Mattingly, William. Introduction to Named Entity Recognition, 2021 (2nd ed.).


This 2nd edition updates the textbook to align with the syntax of spaCy 3. This series of notebooks is meant to function as a textbook for named entity recognition (NER), a natural language processing task. The purpose of NER is to extract structured data from unstructured texts, specifically entities such as people, places, and dates. To date, there has been no freely available, extensive treatment of the subject and methods of NER, from using off-the-shelf frameworks to creating custom, domain-specific solutions. These notebooks use several different datasets to demonstrate both the utility of NER and the methods for applying it. They are designed to be used alongside YouTube videos, which are embedded in the relevant sections. The complete playlist can be found here: Introduction to NER. If you find typos or errors in these notebooks, please do not hesitate to contact me either via Twitter or here on GitHub.


This NER textbook was created during my postdoctoral fellowship at the Smithsonian Institution’s Data Science Lab in collaboration with the United States Holocaust Memorial Museum. It would not have been possible without the help of Rebecca Dikow, Mike Trizna, and those in the Data Science Lab who listened to, aided, and advised me while I created these notebooks. I would also like to thank the content experts at the USHMM, specifically Michael Haley Goldman, Michael Levy, and Robert Ehrenreich.
