Speech Recognition Fundamentals

15 flashcards

Acoustic Model

A statistical model that maps audio signals to phonetic units.

Word Error Rate (WER)

A common metric for evaluating the performance of a speech recognition system, calculated as the number of word errors (substitutions, deletions, and insertions) divided by the number of words in the reference transcript.
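
As a minimal sketch of the calculation, the snippet below counts word errors with a word-level edit distance and divides by the reference length. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.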

Language Model

A probabilistic model that estimates the likelihood of a sequence of words.
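
One simple way to illustrate this is a bigram model, which scores a sentence as the product of each word's probability given the previous word. The sketch below is an assumption-laden toy: the function names, the `<s>` and `</s>` boundary markers, and the two-sentence training set are all hypothetical, and no smoothing is applied.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and adjacent word pairs, with sentence boundaries."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])           # every token that has a successor
        pair_counts.update(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    return context_counts, pair_counts


def sentence_probability(sentence, context_counts, pair_counts):
    """Product of P(word | previous word), using unsmoothed relative frequencies."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: an unsmoothed model gives zero probability
        prob *= pair_counts[(prev, word)] / context_counts[prev]
    return prob


contexts, pairs = train_bigram(["the cat sat", "the dog sat"])
p_seen = sentence_probability("the cat sat", contexts, pairs)
p_scrambled = sentence_probability("cat the sat", contexts, pairs)
```

The model prefers the word order it saw during training and gives the scrambled sentence zero probability, which is why practical language models add smoothing or use neural networks.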

Connectionist Temporal Classification (CTC)

A training objective and output formulation for neural network models that considers all possible alignments between input audio frames and output labels, so no frame-level alignment is needed in advance; often used in end-to-end ASR.
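
A CTC model emits one label per audio frame, including a special blank label, and the frame labels are mapped to an output sequence by merging consecutive repeats and then dropping blanks. The sketch below shows only that collapsing rule; the `_` blank symbol and the name `ctc_collapse` are assumptions for illustration, and the training loss itself is not shown.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_collapse(frame_labels):
    """Merge consecutive duplicate labels, then remove blank labels."""
    merged = [label for label, _group in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


# Eleven frame labels collapse to five output labels.
decoded = "".join(ctc_collapse(list("hh_e_ll_llo")))
```

The blank between the two runs of `l` is what lets the output keep a genuine double letter instead of merging it into one.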

Feature Extraction

The process of transforming raw audio data into a set of numerical features that can be processed by an ASR system.

End-to-End Speech Recognition

An ASR approach that maps audio (or features derived from it) directly to text using a single neural network model, rather than separately trained acoustic and language models.

Hidden Markov Model (HMM)

A statistical model of a stochastic process whose states are not directly observed, often employed in ASR to model the probability of transitions between acoustic model states.

Phoneme

The smallest unit of sound in speech that can distinguish one word from another.

Automatic Speech Recognition (ASR)

The use of computer algorithms to convert spoken language into text.

Beam Search

A search algorithm that expands only a limited set of the most promising nodes at each step, often used in ASR to decode the spoken words from an acoustic model's output.
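
The pruning idea can be sketched as follows: at every step, each surviving hypothesis is extended by every possible token, and only the `beam_width` highest-scoring hypotheses are kept. The scoring callback `next_log_probs` and the tiny `toy_model` table are hypothetical stand-ins for a real decoder's scores.

```python
import math


def beam_search(next_log_probs, steps, beam_width=2):
    """Keep the `beam_width` best partial sequences after each step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Prune: sort by score and keep only the most promising hypotheses.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix):
    """Hypothetical next-token probabilities that depend on the prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    return {token: math.log(p) for token, p in table[prefix].items()}


best_wide = beam_search(toy_model, steps=2, beam_width=2)[0][0]
best_greedy = beam_search(toy_model, steps=2, beam_width=1)[0][0]
```

With a beam width of 1 the search commits to `a` after the first step and never considers `b`, while a width of 2 keeps `b` alive long enough to find the higher-probability sequence `b x`.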

Deep Neural Network (DNN)

A neural network with multiple hidden layers between the input and output layers, used in modern ASR for modeling complex patterns in data.

Voice Activity Detection (VAD)

A technique used in ASR to detect the presence or absence of human speech within an audio stream.
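
A very simple (and not very robust) way to approximate this is to threshold the energy of short frames. The sketch below assumes samples scaled to the range -1.0 to 1.0; the frame length of 160 samples and the threshold of 0.02 are arbitrary illustrative values, and practical detectors typically use trained models instead.

```python
import math


def energy_vad(samples, frame_len=160, threshold=0.02):
    """Return one True/False decision per full frame based on RMS energy."""
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        decisions.append(rms > threshold)
    return decisions


# One frame of silence followed by one frame of a 440 Hz tone sampled at 16 kHz.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
flags = energy_vad(silence + tone)
```

An energy threshold cannot tell speech from other loud sounds, so it marks the tone as active even though it is not speech.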

Mel-Frequency Cepstral Coefficients (MFCCs)

A representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

Speaker Diarization

The process of partitioning an input audio stream into homogeneous segments according to speaker identity.

Continuous Speech Recognition

Recognition of natural speech where words are spoken in full sentences without pausing between them.

© Hypatia.Tech. 2024 All rights reserved.