This glossary collects key terms from the field of computational audio content analysis. Its aim is to provide Finnish translations for these terms in order to stabilize the terminology. To make the glossary usable for readers other than Finnish researchers as well, a brief English definition is added for each term, with links to Wikipedia and Wiktionary. Furthermore, some of the terms have been translated into German, Spanish, and French through Wikipedia. The glossary is a work in progress and does not try to be complete.
The data file used to create this glossary is published as a repository:
If you see an error or you want to contribute, make a pull request to the repository or send me an email.
Special thanks to Tomasz Mąka for the Polish translations.
Dictionaries and glossaries from related fields:
- English-Finnish dictionary for general audio signal processing terms by Vesa Välimäki
- English-Finnish dictionary for statistics and probability theory terms by Petri Koistinen
- English-Finnish dictionary/glossary for language technology by Kimmo Koskenniemi
- Bank of Finnish terminology in arts and sciences
- Glossary of statistical terms by ISI
- Machine learning glossary by Google
Terms: 387
Translations: 279, 94, 98, 102, 120
Updated 2020-03-31
A
accuracy
the fraction of system outputs that were predicted correctly
See also: evaluation metric
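A minimal Python sketch of this definition, assuming predictions and reference labels are given as two lists of equal length; the function name `accuracy` and the example scene labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical acoustic scene labels: 3 of the 4 predictions are correct.
print(accuracy(["park", "bus", "park", "street"],
               ["park", "bus", "street", "street"]))  # 0.75
```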





acoustic feature
See also: feature


acoustic model
in a speech recognition system, a model learned from acoustic data


acoustic scene

activation function
in a neural network, a function that defines the output of a neuron given its inputs
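Two widely used activation functions, the logistic sigmoid and the rectified linear unit (ReLU), sketched in Python. These two are illustrative choices; the glossary does not prescribe any particular activation, and the function names are assumptions.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: output lies between 0 and 1."""
    # Branch on the sign of x so that math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Rectified linear unit: keeps positive inputs, maps negative ones to 0."""
    return max(0.0, x)


# z stands for the weighted sum of a neuron's inputs.
for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 3))
```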




active learning
a learning technique where the algorithm selects the data it learns from


aggregation





Amazon Mechanical Turk (AMT)
a crowdsourcing marketplace that enables the use of human intelligence to perform tasks
annotation
adding metadata to audio


area under the curve (AUC)
in binary classification, an evaluation metric that considers all classification thresholds (the area under the ROC curve)
See also: receiver operating characteristic curve
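A minimal Python sketch that computes AUC through its pairwise interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. This formulation is equivalent to the area under the ROC curve but takes time proportional to the number of positive-negative pairs, so it only suits small examples. The function name and the scores are hypothetical.

```python
def auc(scores, labels):
    """AUC via pairwise comparisons; labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# 3 of the 4 positive-negative pairs are ranked correctly.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```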

artificial intelligence (AI)
the ability of machines to act with apparent intelligence





assisted living
a housing facility for people with disabilities or for adults who cannot or choose not to live independently


audio classification
See also: classification


audio dataset
a collection of audio examples used for system development
audio signal processing


audio source separation

audio tagging
audiovisual data

auditory
relating to hearing


auditory event
See also: auditory scene

auditory scene

auditory scene analysis (ASA)
a model, proposed by Albert Bregman, of how the auditory system organizes sound into perceptually meaningful elements

B
background noise


backpropagation, backprop
method used in neural networks to calculate the gradient of the loss function with respect to the weights, which is then used in gradient descent



bag of frames
representing a signal as a collection of frames without taking their order into account
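A minimal Python sketch of a bag-of-frames representation, assuming each frame has already been turned into a feature vector and that the frames are summarized by per-dimension mean and standard deviation. This summary statistic is one common, order-independent choice, not the only one; the function name and the example values are hypothetical.

```python
import statistics


def bag_of_frames(frames):
    """Summarize frame-level feature vectors by per-dimension mean and
    standard deviation. Reordering the frames does not change the result."""
    if not frames:
        raise ValueError("at least one frame is required")
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means + stds


# Three hypothetical 2-dimensional frame feature vectors.
frames = [[1.0, 2.0], [3.0, 5.0], [5.0, 8.0]]
print(bag_of_frames(frames))
print(bag_of_frames(frames[::-1]))  # same summary for reversed frame order
```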
balanced accuracy (BACC)
batch
in neural network training, the set of examples used in one training iteration
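A minimal Python sketch of splitting a training set into batches, assuming the examples are held in a list; the function name `batches` and the stand-in data are hypothetical. It also shows the related notion of an epoch, one full pass over all batches.

```python
def batches(examples, batch_size):
    """Split training examples into consecutive batches; the last batch is
    smaller when the number of examples is not divisible by batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


training_set = list(range(10))  # stand-ins for 10 training examples
for epoch in range(2):  # an epoch is one full pass over the training set
    sizes = [len(batch) for batch in batches(training_set, batch_size=4)]
    print(epoch, sizes)  # each batch would drive one training iteration
```

In practice the training set is often shuffled before each epoch so that batches differ between passes; this sketch keeps a fixed order for clarity.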

batch normalization (BN)
a technique for improving the performance and stability of neural networks
See also: deep neural network, neural network, batch

beamforming
technique used in sensor arrays for directional signal reception or transmission



bigram
binary classification
a type of classification which outputs one of two mutually exclusive classes


binary mask

binaural
related to two ears


bioacoustics
cross-disciplinary science that combines biology and acoustics





block mixing
data augmentation technique
See also: data augmentation
boosting
a machine learning technique which iteratively combines weak classifiers into a classifier with higher accuracy

C
category
a group to which items are assigned based on similarity or defined criteria


cepstrum

class label

classification
identification of the category to which an item belongs




classification model
See also: model, classification
classification of events, activities and relationships (CLEAR)
an evaluation campaign organized in 2006 and 2007
classification threshold
See also: classification
classifier
See also: classification


closed set classification
See also: open set classification
cluster
See also: cluster analysis




cluster analysis




clustering
grouping related examples together





cognitive modeling


computational audio content analysis
See also: content analysis

computational auditory scene analysis (CASA)

computational linguistics
an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective

computational modeling

computer audition (CA)
field of study of algorithms and systems for audio understanding by machine

confusion matrix
an N×N table summarizing classification performance for N classes (predicted class versus actual class)
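A minimal Python sketch that builds a confusion matrix with actual classes as rows and predicted classes as columns. This orientation is a convention assumed here (some tools transpose it); the function name and the example labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count (actual, predicted) pairs: rows = actual, columns = predicted."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


classes = ["dog", "car", "speech"]
y_true = ["dog", "dog", "car", "speech", "speech"]
y_pred = ["dog", "car", "car", "speech", "dog"]
for name, row in zip(classes, confusion_matrix(y_true, y_pred, classes)):
    print(name, row)
# dog [1, 1, 0]
# car [0, 1, 0]
# speech [1, 0, 1]
```

The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total count (here 3 / 5).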


connectionist temporal classification (CTC)
constant-Q cepstral coefficients (CQCC)
constant-Q transform (CQT)
content analysis





context

context-aware
convolution
a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other





convolutional neural network (CNN)
a neural network with convolutional layers along with pooling and fully connected layers
See also: neural network, deep neural network



convolutional recurrent neural network (CRNN)
See also: neural network, deep neural network
corpus


cross-validation
a method for estimating how well a system generalizes to new data by reserving a subset of the dataset only for testing





crowdsourcing

D
data

data acquisition
data augmentation
artificially increasing the number of training examples

data post-processing
See also: data preprocessing

data preprocessing
See also: data post-processing

decision boundary
learned separating boundary between classes

decision tree
a learning method using a tree-like graph of decisions


deep learning
a family of machine learning methods based on multi-level models that gradually identify things at higher levels of abstraction


deep neural network (DNN)
neural network containing multiple hidden layers

detection


detection and classification of acoustic scenes and events (DCASE)
detection error tradeoff (DET)
deterministic


diarization error rate (DER)
dimensionality

direction of arrival (DOA)

discrete cosine transform (DCT)




discrete Fourier transform (DFT)

discriminant analysis





discriminative learning
modeling the dependence of a target variable y on an observed variable x
See also: generative learning
dissimilarity
See also: similarity




domain adaptation
downmixing, down-mixing
mixing audio channels together
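A minimal Python sketch of downmixing, assuming the channels are equally long lists of samples and that they are combined by averaging. Averaging rather than plain summing is a design choice assumed here to keep the mono signal within the original amplitude range; the function name and samples are hypothetical.

```python
def downmix_to_mono(channels):
    """Average several equal-length channels sample by sample into one channel."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    return [sum(samples) / len(channels) for samples in zip(*channels)]


left = [0.5, 0.25, -0.5]
right = [0.5, -0.25, 0.0]
print(downmix_to_mono([left, right]))  # [0.5, 0.0, -0.25]
```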
duration
the length of an audio signal

dynamic range





E
embeddings
a low-dimensional space into which high-dimensional vectors can be translated
See also: word embedding

empirical

ensemble
ensemble learning
using multiple learning algorithms to obtain better predictive performance than any of the constituent learning algorithms alone

epoch
in neural network training, one full pass over the training set
See also: deep neural network, neural network


equal error rate (EER)
error rate

evaluation metric
event-based metric
See also: evaluation metric

event offset
event onset
everyday environment
everyday listening
the interpretation of a sound in terms of its source
expectation maximization (EM)
an iterative method to find maximum likelihood or maximum a posteriori estimates of parameters in statistical models




F
F-score, F1-score
an evaluation metric that takes into account both precision and recall
See also: evaluation metric
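A minimal Python sketch computing precision, recall, and the F1-score (the harmonic mean of precision and recall) from true positive, false positive, and false negative counts. Reporting 0.0 when a ratio is undefined is a convention assumed here; the function name and counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from TP, FP and FN counts.
    Undefined ratios (division by zero) are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which gives 16 / 22 for these counts.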

false negative (FN)
an example wrongly predicted as belonging to the negative class




false positive (FP)
an example wrongly predicted as belonging to the positive class




fast Fourier transform (FFT)





feature
feature engineering
using domain knowledge of the data to manually create suitable features for machine learning
See also: feature learning
feature extraction



feature learning
automatically discovering the representations needed for a task from the data
See also: feature engineering

feature selection




feedforward neural network (FNN)
See also: deep neural network, neural network



filter





filter bank
an array of band-pass filters



folksonomy
classification based on tags assigned by users



frame





frame blocking
See also: frame
frame stacking
frequency domain
See also: time domain





frequency resolution


fully connected layer
See also: deep neural network, neural network

G
gammatone feature cepstral coefficients (GFCC)
gammatone filter
gated recurrent unit (GRU)
See also: neural network, deep neural network
Gaussian mixture model (GMM)
See also: mixture model

generative adversarial network (GAN)
a technique where a generator network generates data candidates and a discriminator network evaluates them
See also: deep neural network, neural network
generative learning
See also: discriminative learning

ground truth
See also: reference label, annotation
H
hand-crafted feature
a feature created manually using domain knowledge of the data
See also: feature engineering
harmonic


head-related transfer function (HRTF)

heuristic
a practical solution that is not guaranteed to be optimal
hidden layer
in a neural network, a layer between the input layer and the output layer
See also: deep neural network, neural network


hidden Markov model (HMM)





hierarchical classification





histogram of oriented gradients (HOG)

holdout data
examples which are only used for testing the system's performance
See also: cross-validation
hyperparameter
in machine learning, a variable which is set before the learning process starts
See also: parameter

hyponym

I
i-vector
implementation


impulse response


independent component analysis (ICA)
indexing

information retrieval


input

input layer
See also: deep neural network, neural network


inter-annotator agreement
a measure of how well human annotators agree with each other in an annotation task
interclass correlation




intermediate statistics
intraclass correlation




inverse fast Fourier transform (IFFT)

J
jitter


K
k-fold cross-validation
See also: cross-validation
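A minimal Python sketch of generating k-fold cross-validation splits over example indices, assuming examples can be shuffled freely; the function name `k_fold_indices` and the fixed seed are hypothetical. Each example ends up in the test part of exactly one fold.

```python
import random


def k_fold_indices(n_examples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(n_examples=10, k=3):
    print(len(train), len(test))  # 6 4, then 7 3, then 7 3
```

This sketch splits individual examples. When several audio examples are cut from the same recording, they are usually kept in the same fold so that the test data stays truly unseen.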

k-nearest-neighbor (kNN)
See also: nearest neighbor

kernel


knowledge


L
labeled example
an example consisting of audio and its assigned category label
language acquisition




latent variable
a variable that is not directly observed but is inferred based on a model from other observed variables




layer
leaderboard
a board showing the ranking of participants in a competition

learning rate
a hyperparameter controlling the size of the learning step (gradient step)
See also: deep neural network, neural network
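A minimal Python sketch of how the learning rate scales each gradient descent step, assuming a hypothetical one-parameter loss L(w) = (w - 3)^2 whose gradient is 2(w - 3); all function names are illustrative.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Plain gradient descent on a single parameter w."""
    w = w0
    for _ in range(steps):
        # The learning rate scales how far each step moves against the gradient.
        w -= learning_rate * grad(w)
    return w


def quadratic_grad(w):
    """Gradient of the hypothetical loss L(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


for lr in (0.01, 0.1, 1.1):
    w = gradient_descent(quadratic_grad, w0=0.0, learning_rate=lr, steps=50)
    print(f"learning rate {lr}: w after 50 steps = {w:.4f}")
```

For this loss, each step multiplies the distance to the minimum by (1 - 2 · learning rate): 0.98 for 0.01 (slow progress), 0.8 for 0.1 (fast convergence), and -1.2 for 1.1 (divergence). A learning rate that is too small or too large both hurt training.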

likelihood





likelihood ratio test




linear discriminant analysis (LDA)
linear prediction
a mathematical operation to estimate future values as a linear function of previous values



linear prediction cepstral coefficients (LPCC)
local binary patterns (LBP)

localization


long short-term memory (LSTM)
See also: deep neural network, neural network

loss function
a function measuring how far predictions are from their labels




loudness


M
machine learning (ML)
field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" from data, without being explicitly programmed





machine listening
field of study of algorithms and systems for audio understanding by machine


macro-averaging
See also: evaluation metric

magnitude response

majority voting

maximum likelihood estimator (MLE)




mean square error (MSE)
See also: root mean square error





mel-frequency cepstral coefficients (MFCCs)


mel scale
mel-scaled spectrogram
micro-averaging
See also: evaluation metric

mini-batch
See also: batch

misclassification




mixture model
See also: Gaussian mixture model


mixture signal
modal

model
in a machine learning system, a set of parameters learned from the training data


modeling

monaural
related to one ear

monitoring

multi-class classification
a classification type where the prediction is made among three or more classes

multi-condition training
multi-label classification
classification type where multiple class labels may be assigned to each instance
See also: single-label classification
multi-task learning
approach where multiple learning tasks are solved at the same time
multichannel, multiple channel
See also: single-channel

multilayer perceptron (MLP)
See also: neural network, deep neural network





multiple kernel learning (MKL)
a machine learning method that uses a predefined set of kernels and learns an optimal combination of them
music information retrieval (MIR)
interdisciplinary science of retrieving information from music


N
naive Bayesian classification


narrowband


nearest neighbor
See also: k-nearest-neighbor


neural network
network of (artificial) neurons





neuron
a node in a neural network taking in multiple values and generating a single value as output


noise


noise suppression
non-negative matrix factorization (NMF)


nonlinear, non-linear


normal distribution





normalization
converting values into a standard range of values
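A minimal Python sketch of one common normalization, min-max scaling, assuming the target range is [0, 1] by default; the function name and values are hypothetical, and other schemes (such as scaling to zero mean and unit variance) also exist.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that the smallest becomes new_min and the
    largest becomes new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: there is no range to stretch, so map to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


print(min_max_normalize([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```

When normalizing features for a learning system, the minimum and maximum are typically taken from the training set only and reused for the test set.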



null hypothesis
general statement that there is no relationship between two measured phenomena





O
objective
a metric the algorithm tries to optimize

one-hot encoding
representing categorical variables as binary vectors so that only a single element is set to one
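A minimal Python sketch of one-hot encoding, assuming a fixed, ordered list of class names; the function name and classes are hypothetical.

```python
def one_hot(label, classes):
    """Binary vector with a single 1 at the position of label in classes."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1  # raises ValueError for unknown labels
    return vector


classes = ["bird", "car", "dog"]
print(one_hot("car", classes))  # [0, 1, 0]
```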
one-shot learning
a machine learning approach where the aim is to learn from a single training example
ontology
a structure of concepts or entities within a domain which are organized by relationships




open set classification
See also: closed set classification
optimizer
in a neural network, an implementation of the gradient descent algorithm

outliers
observation points that are distant from other observations





output
See also: input

output layer
last layer of a neural network outputting predictions
See also: deep neural network, neural network


overfitting
a model that fits the training data too closely and fails to predict correctly on new data
See also: underfitting





P
parallel


parameter
in machine learning, a variable which is adjusted during the learning process
See also: hyperparameter

parsing

part of speech (POS)
See also: part-of-speech tagging





part-of-speech tagging (POS tagging)
the process of marking up a word in a text as corresponding to a particular part of speech
See also: part of speech




pattern

pattern recognition





perception




perceptual
See also: perception

perceptual spread
performance
in machine learning, how good the model's predictions are

pitch




pitch shifting
See also: pitch


polyphonic annotation
pooling
in a neural network, reducing a matrix into a smaller matrix, for example by taking the maximum or the average over local regions
See also: deep neural network, neural network
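A minimal Python sketch of max pooling on a 2D feature map, assuming a square window, no padding, and a configurable stride (the step between consecutive windows); the function name and the feature map values are hypothetical.

```python
def max_pool_2d(matrix, size=2, stride=2):
    """Slide a size x size window over the matrix, moving `stride` steps at a
    time, and keep the maximum of each window (no padding)."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(max(window))
        out.append(out_row)
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
print(max_pool_2d(feature_map))  # [[4, 5], [6, 7]]
```

With stride 2 the windows do not overlap and the 4×4 map shrinks to 2×2; a stride of 1 would produce overlapping windows and a 3×3 output.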

posterior probability




pre-trained model
a model that has already been trained

precision
a measure of how often a prediction is correct when the positive class is predicted
See also: recall, F-score, evaluation metric




prediction error


principal component analysis (PCA)
a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
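A minimal sketch of PCA in Python with NumPy, assuming observations are the rows of a 2D array. It uses the singular value decomposition of the centered data, one standard way to obtain the principal components; the function name and the example data are hypothetical, and the sign of each component is arbitrary.

```python
import numpy as np


def pca(X, n_components):
    """Project observations (rows of X) onto the first n_components principal
    components. Returns the projected data and the components (as rows)."""
    X_centered = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T, components


# Hypothetical observations of two perfectly correlated variables (y = 2x).
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
scores, components = pca(X, n_components=2)
print(components[0])  # direction along y = 2x (sign is arbitrary)
print(scores[:, 1])   # close to 0: no variance is left for the second component
```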



prior distribution

prior probability, a priori probability
See also: prior distribution




probability





probability measure

pruning

psychoacoustics
the scientific study of sound perception and audiology





Q
quantization, quantizing





R
random effect


random forest (RF)
an ensemble learning method that constructs multiple decision trees at the training stage


random noise


randomization




recall
a measure of how many of the actual positive examples were correctly predicted
See also: precision, F-score, evaluation metric

receiver operating characteristic curve (ROC curve)
a curve of true positive rate versus false positive rate at different classification thresholds

recognition

recurrent neural network (RNN)
a neural network that models sequential interactions through a hidden state or memory
See also: neural network, deep neural network



recurrence quantification analysis
reference label
See also: ground truth, annotation
regression analysis




regularization
in machine learning, penalizing a model's complexity in order to prevent overfitting




reinforcement learning
a machine learning technique focusing on performance, finding a balance between exploration of new knowledge and exploitation of current knowledge




repository

reproducibility




retrieval

reverberation




robustness




room response
room simulation
root mean square error (RMSE)
See also: mean square error
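A minimal Python sketch of the mean square error and its square root, the RMSE, assuming the targets and predictions are equally long lists of numbers; the function names and values are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, expressed in the same unit as the targets."""
    return math.sqrt(mse(y_true, y_pred))


# Errors of 1, 0 and -3 give MSE = (1 + 0 + 9) / 3.
print(mse([2.0, 3.0, 5.0], [1.0, 3.0, 8.0]))
print(rmse([2.0, 3.0, 5.0], [1.0, 3.0, 8.0]))
```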




S
saliency

salient

sampling frequency


sampling rate

search algorithm




segment-based metric
See also: evaluation metric

segmentation


self-organizing map (SOM)
artificial neural network that is trained using unsupervised learning to produce a low-dimensional discretized representation of the input space





semi-supervised learning
a machine learning technique that uses a small amount of labeled data and a large amount of unlabeled data in the learning stage



sensitivity
See also: evaluation metric

sensor

sensor node
See also: sensor
sharpness
short-time Fourier transform (STFT)




signal modeling

signal processing



signal-to-interference ratio (SIR)


signal-to-noise ratio (SNR)




significance level


similarity





similarity measure
a measure to determine how similar two examples are
See also: similarity
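A minimal Python sketch of one common similarity measure, cosine similarity between two feature vectors; the choice of cosine similarity and the function name are assumptions, and other measures (for example, based on Euclidean distance) are also used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 for the same direction,
    0.0 for orthogonal vectors, -1.0 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # (approximately) 1.0
```

A corresponding dissimilarity can be obtained as 1 minus the cosine similarity.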


single-channel
See also: multichannel
single-label classification
a classification type where exactly one class label is assigned to each instance
See also: binary classification, multi-label classification
sinusoidal modeling
situational awareness (SA)
the perception of environmental elements and events with respect to time or space, the comprehension of their meaning, and the projection of their future status



smoothing




sound event


sound event detection (SED)
See also: sound event

sound event instance
sound pressure





sound pressure level (SPL)
See also: sound pressure

sound quality


sound source

source proximity

sparse matrix
a matrix in which most of the elements are zero





sparsity
the number of zero elements in a matrix divided by the total number of elements
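A minimal Python sketch of this definition, assuming the matrix is given as a list of rows; the function name and the example matrix are hypothetical.

```python
def sparsity(matrix):
    """Number of zero elements divided by the total number of elements."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


m = [
    [0, 0, 3],
    [0, 5, 0],
]
print(sparsity(m))  # 0.6666666666666666 (4 zeros out of 6 elements)
```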
speaker diarisation
specificity
See also: evaluation metric

spectral centroid

spectral clustering
grouping related examples together using the eigenvalues of a similarity matrix


spectral envelope

spectral flatness
spectral flux

spectral moments
spectral roll-off

spectral slope

spectrogram





spectrum





speech analysis


speech enhancement
improvement of speech quality by using various algorithms

speech processing

speech recognition





speech segmentation


standard deviation


statistical model





statistical significance




stride
in convolution or pooling, the step size by which the window moves along the horizontal or vertical dimension to reach the next input slice
strong annotation
See also: annotation, weak annotation
strong label
See also: strong annotation, weak label, weak annotation
subband power distribution (SPD)
supervised learning
learning method which learns from labeled examples
See also: unsupervised learning





support vector machine (SVM)



system

system development

T
tag

taxonomy
a classification in a hierarchical system



temporal integration

test set
subset of data used to test the system, disjoint from the training set
See also: training set, validation set

testing data
See also: test set
textual label
texture

time domain




time domain envelope

time-frequency representation


time stretching
changing the duration of an audio signal without affecting its pitch
See also: pitch shifting
training
a process of determining the optimal parameters of the model

training data

training example
training set
subset of data used to train the system, disjoint from the test set
See also: test set, validation set

transfer function




transfer learning
a research problem focusing on storing knowledge gained while solving one problem and applying it to a different problem


transformation

transition probability

trigram
true negative (TN)
an example correctly predicted as belonging to the negative class

true positive (TP)
an example correctly predicted as belonging to the positive class

U
underfitting
a model with low predictive ability because it neither models the training data well nor generalizes to new data
See also: overfitting

unsupervised learning
machine learning technique to learn from unlabeled data
See also: supervised learning




V
validation

validation set
subset of data used to adjust hyperparameters, disjoint from the training set and the test set
See also: training set, test set

W
waveform


wavelets



weak annotation
See also: annotation
weak label
See also: weak annotation, strong label, strong annotation
weakly labeled
See also: annotation, weak annotation
wideband

wildlife monitoring

windowing
See also: windowing function

windowing function
function that is zero-valued outside of some chosen interval
See also: windowing
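A minimal Python sketch of windowing with one common window function, the Hann window, applied to a single frame before spectral analysis to taper its edges. The choice of the periodic Hann variant and the function names are assumptions; the window is defined only over the frame, so it is implicitly zero outside it.

```python
import math


def hann_window(length):
    """Periodic Hann window: 0 at the frame start, peaking at 1 mid-frame."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def apply_window(frame):
    """Multiply a frame sample by sample with a Hann window of equal length."""
    window = hann_window(len(frame))
    return [x * w for x, w in zip(frame, window)]


frame = [1.0] * 8  # hypothetical frame of a constant signal
print([round(v, 3) for v in apply_window(frame)])
```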




word embedding
mapping a word or phrase from the vocabulary to a vector of real numbers
See also: embeddings

word error rate (WER)
word sense disambiguation
identifying which sense of a word is used in a sentence, when the word has multiple meanings




WordNet

Z
zero crossing rate (ZCR)
