Named Entity Recognition (NER) Primer

Named Entity Recognition (NER)

A subtask of information extraction that locates named entities mentioned in text and classifies them into predefined categories such as person names, organizations, locations, time expressions, quantities, monetary values, and percentages.

Chunking

Also known as shallow parsing: the process of grouping tokens into flat, non-overlapping phrases (for example, noun phrases) without building a full parse tree. The resulting chunks can serve as candidate spans or input features for NER.
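
A minimal sketch of a rule-based noun-phrase chunker, assuming the tokens have already been POS-tagged with Penn Treebank tags. The pattern (optional determiner, any adjectives, one or more nouns) and the function name `np_chunks` are illustrative assumptions, not a standard chunking grammar.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs, assumed Penn Treebank tags


def np_chunks(tagged: Tagged) -> List[str]:
    """Greedy chunker for the hypothetical pattern: DT? JJ* NN+."""
    chunks: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun matched: emit the phrase
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:  # no noun phrase starts here; move on one token
            i += 1
    return chunks


sentence = [("The", "DT"), ("European", "JJ"), ("Commission", "NNP"),
            ("fined", "VBD"), ("Google", "NNP"), ("in", "IN"), ("Brussels", "NNP")]
print(np_chunks(sentence))  # ['The European Commission', 'Google', 'Brussels']
```

Each chunk is a candidate span that a later NER step could then classify (for example, as ORG or LOC).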

IOB Tagging Scheme

A common format for labelling token sequences in which 'B' marks the Beginning of a named entity, 'I' marks a token Inside it, and 'O' marks a token Outside any entity. Tags are usually suffixed with the entity type (e.g. B-PER, I-PER), which turns span annotation into per-token classification.
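
A minimal sketch of converting between entity spans and IOB tags, assuming the IOB2 variant (every entity starts with a B- tag) and spans given as hypothetical `(start, end_exclusive, label)` token-index tuples. The function names are illustrative.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token index, end index exclusive, label)


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as IOB2 tags (B- on the first token of every entity)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode IOB tags back into spans; a stray I- tag is treated as a new entity."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # Close the open entity on O, on a new B-, or on an I- of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
        # Open a new entity on B-, or on an I- with no entity currently open.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tokens = ["Barack", "Obama", "visited", "Paris", "."]
tags = spans_to_iob(len(tokens), [(0, 2, "PER"), (3, 4, "LOC")])
print(tags)                # ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']
print(iob_to_spans(tags))  # [(0, 2, 'PER'), (3, 4, 'LOC')]
```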

CoNLL

Short for the Conference on Computational Natural Language Learning. Its NER shared tasks (CoNLL-2002 for Spanish and Dutch, CoNLL-2003 for English and German) supplied standard datasets and a common evaluation framework that helped drive progress in NER methods.

CRF (Conditional Random Fields)

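A discriminative probabilistic model for structured prediction, widely used in NER. A linear-chain CRF scores the whole label sequence by combining per-token feature scores with transition scores between adjacent labels, so the surrounding context and label dependencies (e.g. an I-PER tag should not follow O) inform each prediction.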
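
A minimal sketch of Viterbi decoding, the standard way to find the highest-scoring label sequence in a linear-chain CRF. Training is not shown: the emission and transition scores below are hand-picked, hypothetical numbers standing in for learned weights.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str]) -> List[str]:
    """Return the label sequence maximizing sum(emission) + sum(transition).

    Assumes a non-empty sentence; missing transition pairs score 0.0.
    """
    # best[i][y]: best total score of any labelling of tokens 0..i that ends in y
    best: List[Dict[str, float]] = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = [{}]  # back[i][y]: best previous label for y at i
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda t: t[1],
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]
transitions = {("O", "I-PER"): -10.0, ("B-PER", "I-PER"): 2.0}  # hypothetical weights
emissions = [  # hypothetical per-token scores for "Ada Lovelace wrote"
    {"O": 0.5, "B-PER": 1.0, "I-PER": 0.2},
    {"O": 0.2, "B-PER": 1.0, "I-PER": 0.8},
    {"O": 1.5, "B-PER": 0.1, "I-PER": 0.1},
]
print(viterbi(emissions, transitions, labels))  # ['B-PER', 'I-PER', 'O']
```

Picking the best label for each token independently would give B-PER, B-PER, O here; the transition bonus for B-PER followed by I-PER is what makes the sequence-level decision prefer a single two-token entity.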

Gazetteer

A list or dictionary of known named entities (e.g. city or company names) used as a lookup table during entity recognition. Matching text against a gazetteer supplies external knowledge, either directly as a rule or as a feature for a statistical model.
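
A minimal sketch of gazetteer lookup using greedy longest match over tokens. The gazetteer contents, the lowercased tuple keys, and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Gazetteer = Dict[Tuple[str, ...], str]  # lowercased token tuple -> entity type


def gazetteer_match(tokens: List[str], gazetteer: Gazetteer) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) for the longest gazetteer match at each position."""
    max_len = max((len(key) for key in gazetteer), default=0)
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # try longest entries first
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                matches.append((i, i + n, gazetteer[key]))
                i += n  # skip past the matched entry
                break
        else:  # no entry starts at this token
            i += 1
    return matches


gazetteer = {("new", "york"): "LOC", ("new", "york", "times"): "ORG", ("paris",): "LOC"}
tokens = "She read the New York Times in Paris".split()
print(gazetteer_match(tokens, gazetteer))  # [(3, 6, 'ORG'), (7, 8, 'LOC')]
```

Preferring the longest match keeps "New York Times" from being split into the location "New York" plus a stray token.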

Entity Disambiguation

The task of resolving ambiguous mentions by determining which of several candidate entities a term refers to (e.g. 'Paris' the city versus a person named Paris), typically using the surrounding context or an external knowledge base.
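
A minimal sketch of context-based disambiguation using a simplified word-overlap heuristic (in the spirit of the Lesk algorithm). This is a swapped-in illustration, not how production entity linkers work; the candidate IDs, descriptions, and stopword list are hypothetical stand-ins for a real knowledge base.

```python
import re
from typing import Dict, Set

STOPWORDS: Set[str] = {"a", "an", "and", "of", "the", "to", "with"}  # hypothetical, tiny list


def words(text: str) -> Set[str]:
    """Lowercased alphabetic words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(mention: str, context: str, candidates: Dict[str, str]) -> str:
    """Pick the candidate whose description shares the most words with the context."""
    context_words = words(context) - words(mention)  # the mention itself is not evidence
    return max(candidates, key=lambda cid: len(context_words & words(candidates[cid])))


candidates = {  # hypothetical knowledge-base entries: id -> short description
    "Jaguar_(animal)": "large wild cat species native to the Americas",
    "Jaguar_Cars": "British manufacturer of luxury car models and engine technology",
}
print(disambiguate("Jaguar", "The dealer said the new Jaguar car has a powerful engine.", candidates))
# Jaguar_Cars
```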

BiLSTM (Bidirectional Long Short-Term Memory)

A recurrent neural network (RNN) architecture that runs one LSTM forward and another backward over a sequence and concatenates their states, so each token's representation reflects both preceding and following context. It is widely used in NER, often with a CRF layer on top.

Feature Engineering

The process of using domain knowledge to design input features from raw data for machine learning models. It is especially important for classical NER models such as CRFs, which rely on hand-crafted cues like capitalization, prefixes and suffixes, and neighbouring words.
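
A minimal sketch of a hand-crafted feature function for one token, in the feature-dictionary style that CRF toolkits commonly accept. The particular features, key names, and `<BOS>`/`<EOS>` markers are illustrative assumptions.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Surface and context features for tokens[i]."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": word.lower(),
        "is_title": word.istitle(),  # capitalized words are often entity names
        "is_upper": word.isupper(),  # acronyms such as "NASA"
        "is_digit": word.isdigit(),
        "prefix3": word[:3],
        "suffix3": word[-3:],        # suffixes like "-ton" or "-ville" can hint at locations
    }
    # Neighbouring words give the model a small window of context.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


print(token_features(["Apple", "hired", "engineers"], 0))
```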

Transfer Learning

A machine learning method where a model developed for one task is reused as the starting point for a different but related task. In NER, it allows leveraging pre-trained models to improve entity recognition performance.

BERT (Bidirectional Encoder Representations from Transformers)

A transformer-based model designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. Fine-tuned for token classification, it has delivered significant improvements on NER tasks.

False Positive

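Occurs when an NER system outputs an entity that does not match the gold-standard annotation: either no entity is actually present, or the predicted span or type is wrong. Under exact-match evaluation, a correct span with the wrong type counts as both a false positive and a false negative.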

False Negative

Occurs when an NER system fails to recognize an entity that is present in the gold-standard annotation, either by missing it entirely or by predicting the wrong span or type.

Precision and Recall

Precision is the fraction of entities predicted by the system that are correct, TP / (TP + FP). Recall is the fraction of gold-standard entities the system finds, TP / (TP + FN). Both are used to evaluate NER, usually together with their harmonic mean, the F1 score.
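
A minimal sketch of span-level, exact-match precision, recall, and F1, assuming entities are represented as hypothetical `(start, end_exclusive, label)` tuples. An entity counts as a true positive only if both its span and its type match.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end exclusive, label)


def prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity sets."""
    tp = len(gold & pred)  # predicted and correct
    fp = len(pred - gold)  # predicted but not in the gold annotation
    fn = len(gold - pred)  # in the gold annotation but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 2, "PER"), (3, 4, "LOC"), (6, 7, "ORG")}
pred = {(0, 2, "PER"), (3, 4, "ORG"), (8, 9, "LOC")}
print(prf(gold, pred))  # each value is 1/3: TP = 1, FP = 2, FN = 2
```

Note how the mistyped entity (3, 4) is counted twice: once as a false positive (the ORG prediction) and once as a false negative (the missed LOC).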

NER in a Multilingual Context

The ability of NER systems to work on languages other than English. Models must handle differing scripts, morphology, and capitalization conventions (some scripts have no upper/lower case distinction at all), which makes entity recognition harder.