NLP Evaluation Metrics
10 Flashcards
BLEU Score
The BLEU Score (Bilingual Evaluation Understudy) compares machine-generated text with one or more reference translations. It computes modified (clipped) n-gram precision, typically for n = 1 to 4, combines these precisions with a geometric mean, and multiplies the result by a brevity penalty that lowers the score for translations shorter than the references.
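The card above can be sketched as follows. This is a minimal, unsmoothed sentence-level BLEU, assuming pre-tokenized input, uniform weights over n = 1 to 4, and the common convention of taking the reference length closest to the candidate length for the brevity penalty. The function names are illustrative, not a library API.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Count every contiguous n-gram in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (illustrative)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n  # geometric mean in log space
    # Brevity penalty: closest reference length, shorter one on ties
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)
```

For example, `sentence_bleu("the cat sat on".split(), ["the cat sat on the mat".split()])` has perfect n-gram precision but is four words against six, so the brevity penalty alone reduces it to exp(-0.5), about 0.61. Corpus-level BLEU, as usually reported, sums the clipped counts and lengths over all sentences before combining them.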
F-Measure (F1 Score)
The F-Measure, or F1 Score, is the harmonic mean of precision and recall.
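As a small illustration of the harmonic mean, here is a sketch with a hypothetical `f1_score` helper. It assumes precision and recall are already computed and uses the common convention that F1 is 0 when both are 0.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0 (a convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean is pulled toward the smaller value. With precision 1.0 and recall 0.1, `f1_score(1.0, 0.1)` is about 0.18, while the arithmetic mean would be 0.55, so a model cannot score well by maximizing only one of the two.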
WER (Word Error Rate)
WER is the minimum number of word-level errors (substitutions, insertions, and deletions) needed to turn an automatic speech recognition (ASR) system's output into the reference transcription, divided by the number of words in the reference transcription.
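The error count is a word-level edit (Levenshtein) distance. A minimal dynamic-programming sketch, assuming whitespace tokenization and unit cost for every edit; `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference length, via edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, dropping one word from a six-word reference gives a WER of 1/6. Because insertions are counted against the reference length, WER is not capped at 1: `word_error_rate("a", "b c d")` is 3.0.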
Precision
Precision is the proportion of positive identifications that were actually correct.
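In count form, precision is TP / (TP + FP). A minimal sketch over parallel label lists, assuming a single positive label (default `1`) and returning 0.0 when nothing is predicted positive. That zero-division choice is a convention, not part of the definition.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): share of predicted positives that are truly positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    return true_pos / predicted_pos if predicted_pos else 0.0  # convention for no positive predictions
```

With `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 1, 1, 0, 0]`, three items are predicted positive and two of them are correct, so precision is 2/3.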
AUC-ROC (Area Under the Receiver Operating Characteristic Curve)
AUC-ROC is a performance measure for binary classification problems. The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and AUC is the area under that curve. The AUC equals the probability that the model ranks a randomly chosen positive example higher than a randomly chosen negative example.
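The ranking interpretation gives a direct way to compute AUC without drawing the curve. This sketch compares every positive score with every negative score and counts ties as half a win. It assumes labels are 0/1 and that higher scores mean "more positive"; `auc_roc` is an illustrative name.

```python
def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75. The pairwise loop is O(P·N); for large datasets a rank-based (Mann-Whitney U) computation after sorting gives the same value faster.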
Perplexity
Perplexity measures how well a probability model predicts a sample. It is the exponential of the average per-token negative log-likelihood that the model assigns to a sequence; lower perplexity means the model assigns higher probability to the observed text.
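A minimal sketch of the formula, assuming you already have the probability the model assigned to each actual token in the sequence (how those are obtained depends on the model). It uses natural logs; any base works as long as the exponentiation uses the same base.

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the model's probabilities for each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)
```

A model that spreads probability uniformly over four choices at every step has perplexity 4, as if it were picking among four equally likely options. Equivalently, perplexity is the geometric mean of the inverse probabilities: for `[0.5, 0.125]` it is sqrt(2 × 8) = 4.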
ROUGE Score
The ROUGE Score (Recall-Oriented Understudy for Gisting Evaluation) measures the quality of a summary by comparing it to reference summaries. Its ROUGE-N variant focuses on recall: the fraction of n-grams in the reference summaries that also appear in the generated summary.
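The recall computation can be sketched as clipped n-gram overlap divided by the total number of reference n-grams, pooled over all references. This assumes pre-tokenized input and omits stemming, stopword handling, and the precision and F-measure variants that ROUGE toolkits also report; `rouge_n_recall` is an illustrative name.

```python
from collections import Counter

def rouge_n_recall(candidate, references, n=2):
    """Matched reference n-grams / total reference n-grams (ROUGE-N recall sketch)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate)
    overlap = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref)
        # Each reference n-gram matches at most as many times as it appears in the candidate
        overlap += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0
```

Comparing "the cat is on the mat" with the reference "the cat sat on the mat" gives ROUGE-1 recall 5/6 (only "sat" is missed) and ROUGE-2 recall 3/5 ("the cat", "on the", and "the mat" match).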
Accuracy
Accuracy is the proportion of correct predictions (both true positives and true negatives) over all predictions.
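A minimal sketch, assuming two equal-length, non-empty label lists; `accuracy` is an illustrative name.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Accuracy can be misleading on imbalanced data. With 95 negatives and 5 positives, a model that always predicts negative scores 0.95 accuracy while finding none of the positives, which is why precision and recall are often reported alongside it.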
METEOR Score
The METEOR Score (Metric for Evaluation of Translation with Explicit Ordering) is a machine translation metric that aligns the generated text with reference translations using exact, stemmed, and synonym matches. It combines unigram precision and recall through a harmonic mean weighted toward recall, then applies a fragmentation penalty that lowers the score when the matched words are out of order relative to the reference.
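A heavily simplified sketch of METEOR's scoring, for illustration only. It assumes exact word matches only (no stemming or synonyms), a single reference, and a greedy left-to-right alignment, whereas full METEOR chooses the alignment with the fewest chunks. The parameters `alpha=0.9`, `beta=3.0`, and `gamma=0.5` reproduce the original weighting Fmean = 10PR / (R + 9P) and penalty 0.5 × (chunks / matches)³. `meteor_exact` is a hypothetical name.

```python
def meteor_exact(candidate, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR: exact unigram matches, greedy alignment, one reference."""
    # Greedy one-to-one alignment of identical words (simplification of METEOR's aligner)
    used = [False] * len(reference)
    alignment = []  # (candidate_index, reference_index), in candidate order
    for i, word in enumerate(candidate):
        for j, ref_word in enumerate(reference):
            if not used[j] and ref_word == word:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    # Harmonic mean weighted toward recall (alpha = 0.9 gives 10PR / (R + 9P))
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # A chunk is a maximal run of matches adjacent in both candidate and reference
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    penalty = gamma * (chunks / matches) ** beta  # fragmentation penalty
    return fmean * (1 - penalty)
```

Reordering words keeps precision and recall unchanged but raises the chunk count: "sat the cat" against "the cat sat" has all three words matched in two chunks, so it scores lower than the correctly ordered sentence.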
Recall
Recall is the proportion of actual positives that were identified correctly.
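In count form, recall is TP / (TP + FN). A minimal sketch, assuming a single positive label (default `1`) and returning 0.0 when there are no actual positives, which is a convention rather than part of the definition.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of actual positives the model found."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0  # convention for no actual positives
```

With `y_true = [1, 1, 1, 1, 0]` and `y_pred = [1, 0, 0, 1, 1]`, the model finds two of the four actual positives, so recall is 0.5. The false positive at the end does not affect recall, only precision.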
© Hypatia.Tech. 2024 All rights reserved.