This glossary collects key terms from the field of computational audio content analysis. It was originally intended to provide Finnish translations for key terms in the field in order to stabilize the terminology. To make the list useful to researchers beyond Finland, each term also has a brief English definition with links to Wikipedia and Wiktionary. To make the list more accessible in other languages, some terms have also been translated into German, Spanish, and French through Wikipedia. The glossary is a work in progress and does not aim to be complete.
The data file used to create this glossary is published as a repository:
If you see an error, or you want to contribute, make a pull request to the repository or send me an email.
Special thanks to Tomasz Mąka for the Polish translations and Irene Martin Morato for the Spanish translations.
Dictionaries and glossaries from related fields:
- English-Finnish dictionary for general audio signal processing terms by Vesa Välimäki
- English-Finnish dictionary for statistics and probability theory terms by Petri Koistinen
- English-Finnish dictionary/glossary for language technology by Kimmo Koskenniemi
- Bank of Finnish terminology in arts and sciences
- Glossary of statistical terms by ISI
- Machine learning glossary by Google
- Tilastotieteen sanasto by Juha Alho, Elja Arjas, Esa Läärä and Pekka Pere
Terms: 443. Translations: 329, 107, 113, 118, 120. Updated: 2025-07-24.
A
accuracy
The fraction of system output which was predicted correctly
See also: evaluation metric
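As a rough illustration of the definition above, accuracy can be computed by comparing system output with reference labels. This is a minimal sketch; the function name and the example labels are hypothetical and not part of the glossary data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for four audio clips; three of the four match.
reference = ["dog", "car", "speech", "dog"]
predicted = ["dog", "car", "dog", "dog"]
print(accuracy(reference, predicted))  # 0.75
```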




acoustic feature
See also: feature

acoustic model
in a speech recognition system, a model learned from acoustic data

acoustic pattern recognition

acoustic scene
acoustic scene analysis
acoustics




activation function
in a neural network, a function that defines the output of a neuron



active learning
a learning technique where the algorithm selects the data it learns from

additive noise

aggregation




Amazon Mechanical Turk (AMT)
crowdsourcing marketplace enabling the use of human intelligence to perform tasks
annotation
adding metadata to audio

annotator

area under the curve (AUC)
in binary classification, an evaluation metric that considers all classification thresholds
See also: receiver operating characteristic curve

artificial general intelligence (AGI)
See also: strong artificial intelligence, full artificial intelligence




artificial intelligence (AI)
an ability to have machines act with apparent intelligence




assisted living
a housing facility for people with disabilities or for adults who cannot or choose not to live independently


audification
audio-visual

audio analysis

audio captioning
audio classification
See also: classification

audio dataset
a collection of audio examples used for system development

audio signal processing

audio source separation

audio tagging
audiovisual data

auditory
relating to hearing

auditory event
auditory scene

auditory scene analysis (ASA)
a model proposed by Albert Bregman for the basis of auditory perception

augmented intelligence

auralization
B
background noise

backpropagation, backprop
method used in neural networks to calculate the gradients of the loss with respect to the weights, as needed by gradient descent



bag of frames
representing frames without taking into account their order
balanced accuracy (BACC)
batch
in neural network training, a set of examples used in one iteration for model training

batch normalization (BN)
a technique for improving the performance and stability of neural networks
See also: deep neural network, neural network, batch

beamforming
technique used in sensor arrays for directional signal reception or transmission


big data



bigram
binary classification
a type of classification which outputs one of two mutually exclusive classes

binary mask

binaural
related to two ears

bioacoustics
cross-disciplinary science that combines biology and acoustics




block mixing
data augmentation technique
See also: data augmentation
boosting
a machine learning technique which iteratively combines weak classifiers into a classifier with higher accuracy

brute-force search
Systematically going through all possible candidate solutions for the problem
See also: exhaustive search




C
category
a group to which items are assigned based on similarity or defined criteria

cepstrum

class label

classification
identification of which categories an item belongs to




classification model
See also: model, classification
classification of events, activities and relationships (CLEAR)
evaluation campaign organized in 2006 and 2007
classification threshold
See also: classification
classifier
See also: classification

closed set classification
See also: open set classification
cluster
See also: cluster analysis




cluster analysis




clustering
grouping related examples together




cognitive modeling

collaborative learning
See also: federated learning
computational audio content analysis
See also: content analysis

computational auditory scene analysis (CASA)

computational linguistics
an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective

computational modeling

computer audition (CA)
field of study of algorithms and systems for audio understanding by machine

confusion matrix
an NxN table to summarize classification performance (predicted class versus actual class)
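The table described above can be sketched as follows. This is a minimal example, assuming the common convention that rows are actual classes and columns are predicted classes; the function name and the labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return an NxN table: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical three-class example.
labels = ["car", "dog", "speech"]
reference = ["dog", "car", "speech", "dog", "speech"]
predicted = ["dog", "car", "dog", "dog", "speech"]
matrix = confusion_matrix(reference, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Correct predictions fall on the diagonal; here one "speech" example is counted in the "dog" column of the "speech" row.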


connectionist temporal classification (CTC)
constant-Q cepstral coefficients (CQCC)
constant-Q transform (CQT)
content analysis




context

context-aware
convolution
mathematical operation of two functions to produce a third function that expresses how the shape of one is modified by the other
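For discrete signals, the operation above can be illustrated with a direct implementation of y[n] = sum over k of x[k] * h[n - k]. This is a minimal sketch; the function name and the two-point averaging kernel are chosen only for illustration.

```python
def convolve(x, h):
    """Full discrete convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            # x[i] * h[j] contributes to output index i + j.
            y[i + j] += xi * hj
    return y


signal = [1.0, 2.0, 3.0]
kernel = [0.5, 0.5]  # hypothetical two-point moving average
print(convolve(signal, kernel))  # [0.5, 1.5, 2.5, 1.5]
```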




convolutional neural network (CNN)
a neural network with convolutional layers along with pooling and fully connected layers
See also: neural network, deep neural network



convolutional recurrent neural network (CRNN)
See also: neural network, deep neural network
corpus

crowdsourcing

D
data

data acquisition



data augmentation
artificially increasing the number of training examples

data post-processing
See also: data preprocessing

data preprocessing
See also: data post-processing

decision boundary
learned separating boundary between classes

decision tree
a learning method using a tree-like decision graph

deep learning (DL)
machine learning with multi-layer models that learn representations at gradually higher levels of abstraction


deep machine learning (DML)
See also: deep learning
deep neural network (DNN)
neural network containing multiple hidden layers

detection

detection and classification of acoustic scenes and events (DCASE)
detection error tradeoff (DET)
Plot of the false rejection rate versus false acceptance rate for classification systems

deterministic

diarization error rate (DER)
dimensionality

direction of arrival (DOA)

discrete-time Fourier transform (DTFT)

discrete cosine transform (DCT)
Transform to represent data points with a sum of cosine functions




discriminant analysis




discriminative learning
modeling the dependence of a target variable y on an observed variable x
See also: generative learning
dissimilarity
See also: similarity




domain adaptation
machine learning field dealing with cases in which a model trained on a source distribution is used on a different target distribution

downmixing, down-mixing
mixing audio channels together
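One simple way to mix channels together is to average them sample by sample. The sketch below assumes equal weights for all channels, which is only one of several possible downmix conventions; the function name and sample values are hypothetical.

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample into a single channel."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    return [sum(samples) / len(channels) for samples in zip(*channels)]


# Hypothetical stereo signal with three samples per channel.
left = [0.5, 0.25, -0.5]
right = [0.0, 0.75, 0.5]
print(downmix_to_mono([left, right]))  # [0.25, 0.5, 0.0]
```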
duration
the length of an audio signal

dynamic range




E
early fusion
Features from multiple sources are combined into a single feature set before feeding to a classifier.
See also: feature level fusion
edge AI
edge computing
embeddings
a low-dimensional space into which high-dimensional vectors can be translated
See also: word embedding

empirical


ensemble
ensemble learning
using multiple learning algorithms to obtain better predictive performance than any of the constituent learning algorithms alone

epoch
when training neural networks, one full pass over the training set
See also: deep neural network, neural network

equal error rate (EER)
error rate

evaluation metric
event-based metric
See also: evaluation metric

event offset
event onset
everyday environment
everyday listening
the interpretation of the sound in terms of its source
exhaustive search
See also: brute-force search

expectation maximization (EM)
an iterative method to find maximum likelihood or maximum a posteriori estimates of parameters in statistical models




F
F-score, F1-score
an evaluation metric that takes into account both the precision and the recall
See also: evaluation metric
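The F1-score is the harmonic mean of precision and recall. The sketch below computes all three from counts of true positives, false positives, and false negatives; the function name and the counts are hypothetical, and returning 0.0 when a ratio is undefined is an assumption made here for convenience.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from TP, FP, and FN counts."""
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.5 0.615
```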

false negative (FN)
an example wrongly predicted as negative class




false positive (FP)
an example wrongly predicted as positive class




fast Fourier transform (FFT)




feature
feature engineering
using domain knowledge of the data to manually create suitable features for machine learning
See also: feature learning
feature extraction


feature learning
automatically discovering the representations needed for machine learning
See also: feature engineering

feature level fusion
Features from multiple sources are combined into a single feature set before feeding to a classifier.
See also: early fusion
feature selection



federated learning
machine learning technique to train a model across multiple devices
See also: collaborative learning


feedback

feedforward

feedforward neural network (FNN)
See also: deep neural network, neural network



filter




filter bank
an array of band-pass filters


folksonomy
classification based on users' tags



frame




frame blocking
See also: frame
frame stacking
free field

frequency domain
See also: time domain




frequency resolution

full artificial intelligence (Full AI)
fully connected layer
See also: deep neural network, neural network

fundamental frequency

G
gammatone feature cepstral coefficients (GFCC)
gammatone filter
gated recurrent unit (GRU)
See also: neural network, deep neural network
Gaussian mixture model (GMM)
See also: mixture model

generative adversarial network (GAN)
technique where a generator generates data candidates and a discriminator evaluates them.
See also: deep neural network, neural network

generative learning
See also: discriminative learning

ground truth
See also: reference label, annotation
H
hand-crafted feature
a feature created manually using domain knowledge of the data
See also: feature engineering
harmonic

head-related transfer function (HRTF)

heuristic
a practical but possibly suboptimal solution



hidden layer
in a neural network, a layer between the input layer and the output layer
See also: deep neural network, neural network

hidden Markov model (HMM)




hierarchical classification




histogram of oriented gradients (HOG)

holdout data
examples which are only used for testing the system's performance
See also: cross-validation
hyperparameter
in machine learning, a variable which is set before the learning process starts
See also: parameter

hyponym

I
i-vector
implementation

impulse response



independent component analysis (ICA)
indexing

information retrieval

input

input layer
See also: deep neural network, neural network

inter-annotator agreement
a measure of how well human annotators agree on an annotation task

interclass correlation




intermediate statistics
intraclass correlation




inverse fast Fourier transform (IFFT)
J
jitter

K
k-fold cross-validation
See also: cross-validation

k-nearest-neighbor (kNN)
See also: nearest neighbor

kernel

knowledge

L
labeled example
an example with audio and assigned category label
labeling

language acquisition




late fusion
Combining outputs from multiple classifiers.
latent variable
a variable that is not directly observed but is inferred based on a model from other observed variables




layer
leaderboard
a board showing the ranking of participants in a competition

learning rate
a hyperparameter that controls the size of the learning step (gradient step)
See also: deep neural network, neural network
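To show where the learning rate enters, the sketch below runs plain gradient descent on a one-dimensional function. The function name, the objective f(x) = (x - 3)^2, and the chosen values are hypothetical and only illustrate how the step size scales each update.

```python
def gradient_descent(gradient, x0, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
def grad(x):
    return 2 * (x - 3)


x_min = gradient_descent(grad, x0=0.0, learning_rate=0.1, steps=100)
print(round(x_min, 4))
```

With this objective each update shrinks the distance to the minimum by a factor of 1 - 2 * learning_rate, so a smaller learning rate converges more slowly and a learning rate above 1.0 would diverge.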

likelihood




likelihood ratio test




linear discriminant analysis (LDA)

linear prediction
a mathematical operation to estimate future values as a linear function of previous values


linear prediction cepstral coefficients (LPCC)
local binary patterns (LBP)

localization

long short-term memory (LSTM)
See also: deep neural network, neural network

loss function
a function that measures how far a model's predictions are from the labels




loudness




loudness level

M
machine learning (ML)
field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" from data, without being explicitly programmed




machine listening
field of study of algorithms and systems for audio understanding by machine

macro-averaging
See also: evaluation metric

magnitude response

majority voting
maximum likelihood estimator (MLE)




mean square error (MSE)
See also: root mean square error




mel-frequency cepstral coefficients (MFCCs)


mel-scaled spectrogram
mel scale
a non-linear perceptual frequency scale on which listeners judge equally spaced pitches to be equal in distance from one another



meta learning

metadata

micro-averaging
See also: evaluation metric

mini-batch
See also: batch

misclassification




mixture model
See also: Gaussian mixture model

mixture signal
modal

modality
model
in a machine learning system, a parameter set learned from the training data

modeling

monaural
related to one ear

monitoring

monoaural
See also: monophonic




monophonic
See also: monoaural

multi-annotator
multi-class classification
classification type where prediction is done between three or more classes

multi-condition training
multi-label classification
classification type where multiple class labels may be assigned to each instance
See also: single-label classification
multi-task learning
approach where multiple learning tasks are solved at the same time

multichannel, multiple channel
See also: single-channel

multilayer perceptron (MLP)
See also: neural network, deep neural network




multimodal

multiple kernel learning (MKL)
machine learning method that uses a predefined set of kernels and learns an optimal combination of these kernels
music information retrieval (MIR)
interdisciplinary science of retrieving information from music


N
naive Bayesian classification

naive listener

narrowband

near field

nearest neighbor
See also: k-nearest-neighbor


neural network (NN)
network of (artificial) neurons




neuron
a node in a neural network taking in multiple values and generating a single value as an output

noise



noise suppression

noisy label
non-negative matrix factorization (NMF)

nonlinear, non-linear

normal distribution




normalization
converting values into a standard range of values
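One common form of normalization is min-max scaling to the range [0, 1]. The sketch below is only one possible choice among several normalization schemes; the function name and values are hypothetical, and mapping a constant input to zeros is an assumption.

```python
def min_max_normalize(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant input is mapped to all zeros.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


print(min_max_normalize([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```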


null hypothesis
general statement that there is no relationship between two measured phenomena




O
objective
a metric the algorithm tries to optimize

one-hot encoding
representing categorical variables as binary vectors so that only a single element is set to one
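A minimal sketch of the encoding described above, assuming a fixed, ordered list of classes; the function name and class names are hypothetical.

```python
def one_hot(label, classes):
    """Return a binary vector with a single 1 at the position of the label."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector


classes = ["car", "dog", "speech"]
print(one_hot("dog", classes))  # [0, 1, 0]
```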
one-shot learning
machine learning approach where the aim is to learn from a single training example
ontology
a structure of concepts or entities within a domain which are organized by relationships



open set classification
See also: closed set classification
optimizer
in neural network training, an implementation of a gradient descent algorithm

outliers
observation points that are distant from other observations




output
See also: input

output layer
last layer of a neural network outputting predictions
See also: deep neural network, neural network

overfitting
a condition in which a model fits the training data too closely and fails to predict correctly on new data
See also: underfitting




P
paralinguistics
parallel

parameter
in machine learning, a variable which is adjusted during the learning process
See also: hyperparameter

parsing

part-of-speech tagging (POS tagging)
the process of marking up a word in a text as corresponding to a particular part of speech
See also: part of speech




part of speech (POS)
See also: part-of-speech tagging




pattern

pattern recognition




perception



perceptual
See also: perception

perceptual spread
performance
in machine learning, refers to the goodness of the model's predictions

pitch



pitch shifting
See also: pitch


polyphonic annotation
pooling
in a neural network, reducing a matrix into a smaller matrix
See also: deep neural network, neural network
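As an illustration, the sketch below applies max pooling with a 2x2 window and a stride of 2, one common pooling variant. The function name and the input matrix are hypothetical, and dropping a trailing odd row or column is an assumption made here for simplicity.

```python
def max_pool_2x2(matrix):
    """Reduce a matrix by taking the maximum of each non-overlapping 2x2 block."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, cols - 1, 2)  # a trailing odd column is dropped
        ]
        for r in range(0, rows - 1, 2)  # a trailing odd row is dropped
    ]


# Hypothetical 4x4 feature map reduced to 2x2.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 4],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [3, 7]]
```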

posterior probability



pre-trained model
a model which has already been trained

precision
a measure of how often the prediction is correct when predicting the positive class
See also: recall, F-score, evaluation metric




prediction error

principal component analysis (PCA)
a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components



prior distribution

prior probability, a priori probability
See also: prior distribution




probability




probability measure

pruning

psychoacoustics
the scientific study of sound perception and audiology




Q
quantization, quantizing




R
random effect

random forest (RF)
an ensemble learning method which constructs multiple decision trees at the training stage


random noise

randomization




recall
a measure of how many examples of the positive class were correctly predicted
See also: precision, F-score, evaluation metric

receiver operating characteristic curve (ROC curve)
a curve of true positive rate versus false positive rate at different classification thresholds

recognition

recurrent neural network (RNN)
a neural network that models sequential interactions through a hidden state or memory
See also: neural network, deep neural network



recursive quantitative analysis
reference label
See also: ground truth, annotation
regression analysis



regularization
in machine learning, penalizing a model's complexity in order to prevent overfitting



reinforcement learning
machine learning technique that focuses on performance, finding a balance between exploration of new knowledge and exploitation of current knowledge




repository

reproducibility




retrieval

reverberation



reverberation time (RT)

robotics




robust classification

robustness




room acoustics

room response

room simulation
root mean square error (RMSE)
See also: mean square error




roughness

S
saliency

salient

sampling frequency
See also: sampling rate

sampling rate
See also: sampling frequency

search algorithm




segment-based metric
See also: evaluation metric

segmentation

self-organizing map (SOM)
artificial neural network that is trained using unsupervised learning to produce a low-dimensional discretized representation of the input space




semantic information
semi-supervised learning
machine learning technique that uses a small amount of labeled data and a large amount of unlabeled data in the learning stage



sensitivity
See also: evaluation metric

sensor

sensor node
See also: sensor


sharpness
short-time Fourier transform (STFT)




signal-to-interference ratio (SIR)


signal-to-noise ratio (SNR)



signal modeling

signal processing




significance level

similarity




similarity measure
single-channel
See also: multichannel
single-label classification
classification type where only a single class label is assigned to each instance
See also: binary classification, multi-label classification
sinusoidal modeling
situational awareness (SA)
the perception of environmental elements and events with respect to time or space, the comprehension of their meaning, and the projection of their future status



smoothing




sonification

sound event

sound event detection (SED)
See also: sound event

sound event instance
sound event localization and detection (SELD)
sound pressure




sound pressure level (SPL)
See also: sound pressure

sound quality

sound source

soundscape

source proximity

sparse matrix
matrix in which the elements are predominantly zero




sparsity
number of zero elements in a matrix divided by the total number of elements
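The ratio above can be computed directly. This is a minimal sketch with a matrix stored as a list of rows; the function name and the example matrix are hypothetical.

```python
def sparsity(matrix):
    """Number of zero elements divided by the total number of elements."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix must contain at least one element")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Hypothetical 2x3 matrix with four zero elements out of six.
example = [
    [0, 0, 3],
    [0, 5, 0],
]
print(round(sparsity(example), 3))  # 0.667
```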
speaker diarisation
Process of splitting an audio signal into segments according to the speakers

specificity
See also: evaluation metric

spectral centroid

spectral clustering
grouping related examples together using the eigenvalues of the similarity matrix


spectral envelope
spectral flatness
spectral flux

spectral moments
spectral roll-off

spectral slope

spectrogram




spectrum




speech analysis

speech enhancement
improvement of speech quality by using various algorithms

speech processing

speech recognition




speech segmentation

speech separation
standard deviation

statistical model




statistical significance




stride
in convolution or pooling, the step size by which the input window moves along the horizontal or vertical dimension to the next input slice
strong annotation
See also: annotation, weak annotation
strong artificial intelligence (Strong AI)
strong label
See also: strong annotation, weak label, weak annotation
subband power distribution (SPD)
supervised learning
learning method which learns from labeled examples
See also: unsupervised learning




support vector machine (SVM)



system

system development

T
tag

taxonomy
a classification in a hierarchical system



temporal integration

test set
subset of data used to test the system, disjoint from the training set
See also: training set, validation set

testing data
See also: test set
textual label
texture

timbre




time-frequency representation

time domain




time domain envelope
time stretching
changing the duration of an audio signal without affecting its pitch
See also: pitch shifting
training
a process of determining the optimal parameters of the model

training data

training example
training set
subset of data used to train the system, disjoint from the test set
See also: test set, validation set

transfer function




transfer learning
a research problem focusing on storing knowledge gained while solving one problem and applying it to a different problem


transformation

transient

transition probability

trigram
true negative (TN)
an example correctly predicted as negative class

true positive (TP)
an example correctly predicted as positive class

U
underfitting
a condition in which a model has low predictive ability because it neither models the training data well nor generalizes to new data
See also: overfitting

unsupervised learning
machine learning technique to learn from unlabeled data
See also: supervised learning




V
validation

validation set
subset of data used to adjust hyperparameters, disjoint from the training set and test set
See also: training set, test set

W
waveform


wavelets



weak annotation
See also: annotation
weak artificial intelligence (Weak AI)



weak label
See also: weak annotation, strong label, strong annotation
weakly labeled
See also: annotation, weak annotation
wideband

wildlife monitoring

windowing
See also: windowing function

windowing function
function that is zero-valued outside of some chosen interval
See also: windowing
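As one example of a windowing function, the sketch below generates a symmetric Hann window and applies it to a frame. Samples outside the frame are implicitly treated as zero. The function name and the constant test frame are hypothetical; the Hann window is just one commonly used choice.

```python
import math


def hann_window(length):
    """Symmetric Hann window of the given length (zero at both ends)."""
    if length < 2:
        return [1.0] * length
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


# Applying the window to a hypothetical constant frame of five samples.
frame = [1.0] * 5
window = hann_window(len(frame))
windowed = [sample * weight for sample, weight in zip(frame, window)]
print([round(value, 3) for value in windowed])
```

The window tapers the frame toward zero at its edges and leaves the center sample at full amplitude.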




word embedding
mapping a word or phrase from the vocabulary into a vector of real numbers
See also: embeddings

word error rate (WER)
word sense disambiguation
identifying which sense of a word is used in a sentence, when the word has multiple meanings




WordNet

Z
zero-shot learning (ZSL)

zero crossing rate (ZCR)
