Publications of Tuomas Virtanen

Publications on Google Scholar.

Publications

M. Neri, A. Politis, and T. Virtanen. Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation. In proc. International Workshop on Acoustic Signal Enhancement, accepted for publication.

M. Heikkinen, A. Politis, K. Drossos, and T. Virtanen. Neural Array-Generic Direction-of-Arrival Estimation Exploiting Array Transfer Functions. In proc. International Workshop on Acoustic Signal Enhancement, accepted for publication.

P. Sudarsanam and T. Virtanen. Self-Supervised Object-Centric Representation for Multi-Task Audio Analysis. In proc. International Workshop on Acoustic Signal Enhancement, accepted for publication.

Y. Wang, K. Lahtinen, P. Lauha, S. Zhang, P. Somervuo, O. Ovaskainen, and T. Virtanen. Mixture-Constrained Max Pooling Improves Separation-Based Bird Species Classification. In proc. International Workshop on Acoustic Signal Enhancement, accepted for publication.

D. Luong, K. Drossos, M. Heikkinen, and T. Virtanen. Automatic Contextual Audio Denoising. In proc. European Signal Processing Conference 2026, accepted for publication.

B. Turi, A. Politis, P. Sudarsanam, and T. Virtanen. Speaker Head Orientation Estimation with a Single Microphone Array Using Phase Spectrogram Features. In proc. European Signal Processing Conference 2026, accepted for publication.

M. Neri and T. Virtanen. Multi-Channel Replay Speech Detection using Acoustic Maps. In proc. European Signal Processing Conference 2026, accepted for publication.

R. Pandey, J. Garcia-Martinez, P. Cabañas-Molero, D. Diaz-Guerra, R. Falcón Pérez, T. Virtanen, J.J. Carabias-Orti, and P. Vera-Candeas. Learning Input-Channel Permutation Equivariance for Multi-Channel Source Separation: Reducing Bleeding in Small Music Ensembles. In proc. European Signal Processing Conference 2026, accepted for publication.

M. Dumpis and T. Virtanen. Evaluating the Temporal Detection Capability of Integrated Gradients Applied on Sound Classifier. In proc. European Signal Processing Conference 2026, accepted for publication.

J. Garcia-Martinez, D. Diaz-Guerra, J. Anderson, R. Falcon-Perez, P. Cabañas-Molero, T. Virtanen, J. J. Carabias-Orti, and P. Vera-Candeas. The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval. IEEE Transactions on Audio, Speech and Language Processing, accepted for publication.

Y. Wang, A. Politis, K. Drossos, and T. Virtanen. Moving Speaker Separation via Parallel Spectral-Spatial Processing. IEEE Transactions on Audio, Speech and Language Processing, accepted for publication.

M. Adamopoulou, P. Sudarsanam, D. Diaz-Guerra, M. Jiang, A. Politis, S. J. Mousavirad, T. Virtanen, and J. Lundgren. CNN Models for Microphone Array Covariance Matrix Upsampling and Acoustic Imaging. In proc. IEEE International Symposium on Artificial Intelligence for Instrumentation and Measurement, 2026.

Y. Li, J. Tan, Q. Li, G. Chen, S. Huang, and T. Virtanen. Few-Shot Open-Set Audio Classification Using Attention Information-Fused Prototypes. IEEE Transactions on Audio, Speech and Language Processing, accepted for publication.

M. Heikkinen, A. Politis, K. Drossos, and T. Virtanen. Beyond Omnidirectional: Neural Ambisonics Encoding for Arbitrary Microphone Directivity Patterns using Cross-Attention. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2026, accepted for publication.

M. Silaev, K. Drossos, and T. Virtanen. Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers. In proc. the Joint Workshop on HSCMA and CHiME 2026, accepted for publication.

I. Martin, P. Sudarsanam, and T. Virtanen. Analysing Human-Generated Captions for Audio and Visual Scenes. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2025.

D. Dogan, H. Xie, T. Heittola, and T. Virtanen. On the Role of Training Class Distribution in Zero-Shot Audio Classification. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2025.

K. Shimada, A. Politis, I. Roman, P. Sudarsanam, D. Diaz-Guerra, R. Pandey, K. Uchida, Y. Koyama, N. Takahashi, T. Shibuya, S. Takahashi, T. Virtanen, and Y. Mitsufuji. Stereo Sound Event Localization and Detection with Onscreen/Offscreen Classification. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2025.

A. Triantafyllopoulos, I. Tsangko, A. Gebhard, A. Mesaros, T. Virtanen, and B. Schuller. Computer Audition: From Task-Specific Machine Learning to Foundation Models. Proceedings of the IEEE, Volume 113, Issue 4, 2025.

J. Martinsson, T. Virtanen, M. Sandsten, and O. Mogren. The Accuracy Cost of Weakness: A Theoretical Analysis of Fixed-Segment Weak Labeling for Events in Time. Transactions on Machine Learning Research, September 2025.

Y. Wang, A. Politis, K. Drossos, and T. Virtanen. Multi-Utterance Speech Separation and Association Trained on Short Segments. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2025.

P. Sudarsanam, I. Martin-Morato, A. Hakala, and T. Virtanen. AVCaps: An Audio-visual Dataset with Modality-specific Captions. IEEE Open Journal of Signal Processing, Volume 6, 2025.

Y. Wang, A. Politis, K. Drossos, and T. Virtanen. Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers. In proc. Interspeech 2025.

W. Dai, A. Politis, and T. Virtanen. Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction. In proc. Interspeech 2025.

S. Zhang and T. Virtanen. Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection. In proc. European Signal Processing Conference 2025.

E. Tunturi, D. Diaz-Guerra, A. Politis, and T. Virtanen. Score-Informed Music Source Separation: Improving Synthetic-To-Real Generalization in Classical Music. In proc. European Signal Processing Conference 2025.

P. Sudarsanam, I. Martín-Morató, and T. Virtanen. Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities. In proc. European Signal Processing Conference 2025.

M. Neri and T. Virtanen. Impact of Microphone Array Mismatches to Learning-Based Replay Speech Detection. In proc. European Signal Processing Conference 2025.

M. Neri and T. Virtanen. Multi-channel Replay Speech Detection using an Adaptive Learnable Beamformer. IEEE Open Journal of Signal Processing, Volume 6, 2025.

M. Heikkinen, A. Politis, K. Drossos, and T. Virtanen. Gen-A: Generalizing Ambisonics Neural Encoding to Unseen Microphone Arrays. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2025.

A. Mesaros, R. Serizel, T. Heittola, T. Virtanen, and M. Plumbley. A Decade of DCASE: Achievements, Practices, Evaluations and Future Challenges. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2025.

H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen. Text-based Audio Retrieval by Learning from Similarities between Audio Captions. IEEE Signal Processing Letters, Volume 32, 2025.

J. Garcia-Martinez, D. Diaz-Guerra, A. Politis, T. Virtanen, J. J. Carabias-Orti, and P. Vera-Candeas. SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation. IEEE Open Journal of Signal Processing, Volume 6, 2025.

M. Moritz, T. Olán, and T. Virtanen. Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking. In proc. IEEE International Symposium on the Internet of Sounds 2024.

H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen. Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2024.

D. Diaz-Guerra, A. Politis, P. Sudarsanam, K. Shimada, D. Krause, K. Uchida, Y. Koyama, N. Takahashi, S. Takahashi, T. Shibuya, Y. Mitsufuji, and T. Virtanen. Baseline models and evaluation of sound event localization and detection with distance estimation in DCASE 2024 Challenge. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2024.

D. Dogan, H. Xie, T. Heittola, and T. Virtanen. Multi-Label Zero-Shot Audio Classification with Temporal Attention. In proc. 18th International Workshop on Acoustic Signal Enhancement, 2024.

L. Hekanaho, M. Hirvonen, and T. Virtanen. Language-based machine perception: linguistic perspectives on the compilation of captioning datasets. Digital Scholarship in the Humanities, 2024.

J. Martinsson, O. Mogren, M. Sandsten, and T. Virtanen. From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning. In proc. European Signal Processing Conference 2024.

W. Dai, X. Li, A. Politis, and T. Virtanen. Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement. In proc. European Signal Processing Conference 2024.

A. Hakala, T. Kincy, and T. Virtanen. Automatic Live Music Song Identification Using Multi-level Deep Sequence Similarity Learning. In proc. European Signal Processing Conference 2024.

M. Neri, A. Politis, D. Krause, M. Carli, and T. Virtanen. Speaker Distance Estimation in Enclosures from Single-Channel Audio. IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32, 2024.

S. Drgas, L. Bramsløw, A. Politis, G. Naithani, and T. Virtanen. Dynamic Processing Neural Network Architecture for Hearing Loss Compensation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32, 2024.

S. Gharib, M. Tran, D. Luong, K. Drossos, and T. Virtanen. Adversarial Representation Learning for Robust Privacy Preservation in Audio. IEEE Open Journal of Signal Processing, Volume 5, 2024.

M. Heikkinen, A. Politis, and T. Virtanen. Neural Ambisonics Encoding for Compact Irregular Microphone Arrays. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024.

Y. Wang, A. Politis, and T. Virtanen. Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024.

K. Shimada, A. Politis, P. Sudarsanam, D. Krause, K. Uchida, S. Adavanne, A. Hakala, Y. Koyama, N. Takahashi, S. Takahashi, T. Virtanen, and Y. Mitsufuji. STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events. In proc. NeurIPS 2023.

M. Neri, A. Politis, D. Krause, M. Carli, and T. Virtanen. Single-Channel Speaker Distance Estimation in Reverberant Environments. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2023.

D. Luong, M. Tran, S. Gharib, K. Drossos, and T. Virtanen. Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2023.

H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen. Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2023.

D. Diaz-Guerra, A. Politis, A. Miguel, J. R. Beltran, and T. Virtanen. Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications. In proc. 10th Convention of the European Acoustics Association – Forum Acusticum 2023.

P. Sudarsanam and T. Virtanen. Attention-Based Methods for Audio Question Answering. In proc. 31st European Signal Processing Conference, 2023.

D. Diaz-Guerra, A. Politis, and T. Virtanen. Position Tracking of a Varying Number of Sound Sources with Sliding Permutation Invariant Training. In proc. 31st European Signal Processing Conference, 2023.

K. Khorrami, M. A. C. Blandón, T. Virtanen, and O. Räsänen. Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System. In proc. 31st European Signal Processing Conference, 2023.

P. Magron and T. Virtanen. Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints. In proc. 31st European Signal Processing Conference, 2023.

W. Xie, Y. Li, Q. He, W. Cao, and T. Virtanen. Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes. In proc. Interspeech 2023.

H. Xie, O. Räsänen, and T. Virtanen. On negative sampling for contrastive audio-text retrieval. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2023.

S. Wang, A. Politis, A. Mesaros, and T. Virtanen. Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment. IEEE Journal of Selected Topics in Signal Processing, Volume 16, Issue 6, 2022.

H. Xie, S. Lipping, and T. Virtanen. Language-based Audio Retrieval Task in DCASE 2022 Challenge. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2022.

A. Politis, K. Shimada, P. Sudarsanam, S. Adavanne, D. Krause, Y. Koyama, N. Takahashi, S. Takahashi, Y. Mitsufuji, and T. Virtanen. STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2022.

I. Martin, F. Paissan, A. Ancilotto, T. Heittola, A. Mesaros, E. Farella, A. Brutti, and T. Virtanen. Low-complexity acoustic scene classification in DCASE 2022 Challenge. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2022.

Y. Li, W. Cao, K. Drossos, and T. Virtanen. Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network. In proc. International Workshop on Multimedia Signal Processing, 2022.

G. Naithani, K. Pietilä, R. Niemistö, E. Paajanen, T. Takala, and T. Virtanen. Subjective Evaluation of Deep Neural Network Based Speech Enhancement Systems in Real-World Conditions. In proc. International Workshop on Multimedia Signal Processing, 2022.

S. Lipping, P. Sudarsanam, K. Drossos, and T. Virtanen. Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering. In proc. European Signal Processing Conference, 2022.

D. Dogan, H. Xie, T. Heittola, and T. Virtanen. Zero-Shot Audio Classification using Image Embeddings. In proc. European Signal Processing Conference, 2022.

V.-V. Eklund, A. Diment, and T. Virtanen. Noise, Device and Room Robustness Methods for Pronunciation Error Detection. In proc. European Signal Processing Conference, 2022.

H. Xie, O. Räsänen, K. Drossos, and T. Virtanen. Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2022.

B. W. Schuller, T. Virtanen, M. Riveiro, G. Rizos, J. Han, A. Mesaros, and K. Drossos. Towards Sonification in Multimodal and User-friendly Explainable Artificial Intelligence. In proc. the 2021 International Conference on Multimodal Interaction, 2021.

S. Wang, T. Heittola, A. Mesaros, and T. Virtanen. Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2021.

I. Martín-Morató, T. Heittola, A. Mesaros, and T. Virtanen. Low-complexity acoustic scene classification for multi-device audio: analysis of DCASE 2021 Challenge systems. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2021.

A. Politis, S. Adavanne, D. Krause, A. Deleforge, P. Srivastava, and T. Virtanen. A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2021.

S. Djukanović, J. Matas, and T. Virtanen. Acoustic vehicle speed estimation from single sensor measurements. IEEE Sensors Journal, Volume 21, Issue 20, 2021.

S. Adavanne, A. Politis, and T. Virtanen. Differentiable Tracking-Based Training of Deep-Learning Sound Source Localizers. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021.

A. Mesaros, T. Heittola, T. Virtanen, and M. D. Plumbley. Sound Event Detection: A Tutorial. IEEE Signal Processing Magazine, Volume 38, Issue 5, 2021.

S. Wang, G. Naithani, A. Politis, and T. Virtanen. Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair. In proc. 29th European Signal Processing Conference, 2021.

S. Djukanović, Y. Patel, J. Matas, and T. Virtanen. Neural network-based acoustic vehicle counting. In proc. 29th European Signal Processing Conference, 2021.

A. Tran, K. Drossos, and T. Virtanen. WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information. In proc. 29th European Signal Processing Conference, 2021.

P. Pertilä, E. Cakir, A. Hakala, E. Fagerlund, T. Virtanen, A. Politis, and A. Eronen. Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments. In proc. 29th European Signal Processing Conference, 2021.

S. Drgas and T. Virtanen. Joint Speaker Separation And Recognition Using Non-Negative Matrix Deconvolution With Adaptive Dictionary. Computer Speech & Language, volume 70, 2021.

H. Xie and T. Virtanen. Zero-Shot Audio Classification via Semantic Embeddings. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 29, 2021.

S. Wang, A. Mesaros, T. Heittola, and T. Virtanen. A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.

X. Favory, K. Drossos, T. Virtanen, and X. Serra. Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.

H. Xie, O. Räsänen, and T. Virtanen. Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.

A. Politis, A. Mesaros, S. Adavanne, T. Heittola, and T. Virtanen. Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 29, 2021.

A. Kivinummi, G. Naithani, O. Tammela, T. Virtanen, E. Kurkela, M. Alhainen, D. J. Niehaus, A. Lachman, J. M. Leppänen, and M. J. Peltola. Associations between neonatal cry acoustics and visual attention during the first year. Frontiers in Psychology, September 30, 2020.

S. Zhao, T. Heittola, and T. Virtanen. Active Learning for Sound Event Detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, 2020.

E. Çakır, K. Drossos, and T. Virtanen. Multi-task Regularization Based on Infrequent Classes for Audio Captioning. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.

P. Pyykkönen, S. I. Mimilakis, K. Drossos, and T. Virtanen. Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation. In proc. IEEE International Workshop on Multimedia Signal Processing, 2020.

T. Heittola, A. Mesaros, and T. Virtanen. Acoustic scene classification in DCASE 2020 Challenge: generalization across devices and low complexity solutions. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.

K. Nguyen, K. Drossos, and T. Virtanen. Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.

A. Politis, S. Adavanne, and T. Virtanen. A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.

X. Favory, K. Drossos, T. Virtanen, and X. Serra. COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations. In proc. ICML 2020 Workshop on Self-supervision in Audio and Speech.

S. Djukanovic, J. Matas, and T. Virtanen. Robust Audio-Based Vehicle Counting in Low-to-Moderate Traffic Flow. In proc. IEEE Intelligent Vehicles Symposium, 2020.

K. Drossos, S. I. Mimilakis, S. Gharib, Y. Li, and T. Virtanen. Sound Event Detection with Depthwise Separable and Dilated Convolutions. In proc. International Joint Conference on Neural Networks, 2020.

N. Nicodemo, G. Naithani, K. Drossos, T. Virtanen, and R. Saletti. Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters. In proc. 28th European Signal Processing Conference, 2020.

P. Magron and T. Virtanen. Online Spectrogram Inversion for Low-Latency Audio Source Separation. IEEE Signal Processing Letters, volume 27, 2020.

K. Drossos, S. Lipping, and T. Virtanen. Clotho: An Audio Captioning Dataset. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2020.

Y. Li, M. Liu, K. Drossos, and T. Virtanen. Sound event detection via dilated convolutional recurrent neural networks. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2020.

H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath. Deep Learning for Audio Signal Processing. IEEE Journal of Selected Topics in Signal Processing, volume 13, issue 2, 2019.

A. Mesaros, T. Heittola, and T. Virtanen. Acoustic scene classification in DCASE 2019 Challenge: closed and open set classification and data mismatch setups. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.

S. Lipping, K. Drossos, and T. Virtanen. Crowdsourcing a dataset of audio captions. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.

S. Adavanne, A. Politis, and T. Virtanen. Localization, detection and tracking of multiple moving sound sources with a convolutional recurrent neural network. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.

K. Drossos, S. Gharib, P. Magron, and T. Virtanen. Language modelling for sound event detection with teacher forcing and scheduled sampling. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.

S. Adavanne, A. Politis, and T. Virtanen. A multi-room reverberant dataset for sound event localization and detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.

H. Xie and T. Virtanen. Zero-Shot Audio Classification Based on Class Label Embeddings. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.

K. Drossos, P. Magron, and T. Virtanen. Unsupervised Adversarial Domain Adaptation Based On The Wasserstein Distance For Acoustic Scene Classification. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.

A. Mesaros, S. Adavanne, A. Politis, T. Heittola, and T. Virtanen. Joint Measurement of Localization and Detection of Sound Events. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.

H. L. Bear, T. Heittola, A. Mesaros, E. Benetos, and T. Virtanen. City classification from multiple real-world sound scenes. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.

M. C. Green, D. Murphy, S. Adavanne, and T. Virtanen. Acoustic Scene Classification Using Higher-Order Ambisonic Features. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.

I. Ahsan, C. Kertesz, A. Mesaros, T. Heittola, A. Knight, and T. Virtanen. Audio-Based Epileptic Seizure Detection. In proc. European Signal Processing Conference, 2019.

A. Mesaros, A. Diment, B. Elizalde, T. Heittola, E. Vincent, B. Raj, and T. Virtanen. Sound event detection in the DCASE 2017 Challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, volume 27, issue 6, 2019.

S. Wang, G. Naithani, and T. Virtanen. Low-Latency Deep Clustering For Speech Separation. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2019.

I. Martín-Morató, A. Mesaros, T. Heittola, T. Virtanen, M. Cobos, and F. J. Ferri. Sound Event Envelope Estimation in Polyphonic Mixtures. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2019.

A. Diment, E. Fagerlund, A. Benfield, and T. Virtanen. Detection of Typical Pronunciation Errors in Non-native English Speech Using Convolutional Recurrent Neural Networks. In proc. International Joint Conference on Neural Networks, 2019.

V. M. Garcia-Molla, P. S. Juan, T. Virtanen, A. M. Vidal, and P. Alonso. Generalization of the K-SVD algorithm for minimization of β-divergence. Digital Signal Processing, volume 92, 2019.

P. Magron and T. Virtanen. Complex ISNMF: A Phase-Aware Model for Monaural Audio Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 27, Issue 1, 2019.

T. Virtanen, M. D. Plumbley, and D. Ellis (eds.). Computational Analysis of Sound Scenes and Events. Springer, 2018.

E. Vincent, T. Virtanen, and S. Gannot (eds.). Audio Source Separation and Speech Enhancement. Wiley, 2018.

S. Adavanne, A. Politis, J. Nikunen, and T. Virtanen. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks. IEEE Journal of Selected Topics in Signal Processing, volume 13, issue 1, 2019.

L. Bramsløw, G. Naithani, A. Hafez, T. Barker, N. H. Pontoppidan, and T. Virtanen. Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, Vol. 144, No. 1, 2018. Copyright 2018 Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America. The article as it appeared in the journal may be found here.

S. Gharib, K. Drossos, E. Cakir, D. Serdyuk, and T. Virtanen. Unsupervised adversarial domain adaptation for acoustic scene classification. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2018.

A. Mesaros, T. Heittola, and T. Virtanen. A multi-device dataset for urban acoustic scene classification. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2018.

A. Mesaros, T. Heittola, and T. Virtanen. Acoustic Scene Classification: an Overview of DCASE 2017 Challenge Entries. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

P. Magron and T. Virtanen. Towards complex nonnegative matrix factorization with the beta-divergence. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

G. Huang, T. Heittola, and T. Virtanen. Using Sequential Information in Polyphonic Sound Event Detection. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

M. Parviainen, P. Pertilä, T. Virtanen, and P. Grosche. Time-Frequency Masking Strategies for Single-Channel Low-latency Speech Enhancement Using Neural Networks. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

P. Magron and T. Virtanen. On modeling the STFT phase of audio signals with the von Mises distribution. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

K. Drossos, P. Magron, S. I. Mimilakis, and T. Virtanen. Harmonic-Percussive Source Separation with Deep Neural Networks and Phase Recovery. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

S. Zhao, T. Heittola, and T. Virtanen. An Active Learning Method Using Clustering and Committee-Based Sample Selection for Sound Event Classification. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

G. Naithani, J. Nikunen, L. Bramsløw, and T. Virtanen. Deep neural network based speech separation optimizing an objective estimator of intelligibility for low latency applications. In proc. International Workshop on Acoustic Signal Enhancement, 2018.

E. Cakir and T. Virtanen. Musical Instrument Synthesis and Morphing in Multidimensional Latent Space Using Variational, Convolutional Recurrent Autoencoders. In proc. AES 145th Convention.

S. Gharib, H. Derrar, D. Niizumi, T. Senttula, J. Tommola, T. Heittola, T. Virtanen, and H. Huttunen. Acoustic Scene Classification: A Competition Review. In proc. IEEE International Workshop on Machine Learning for Signal Processing, 2018.

K. Mahkonen, T. Virtanen, and J. Kämäräinen. Cascade of Boolean detector combinations. EURASIP Journal on Image and Video Processing, 2018:61, 2018.

J. J. Carabias-Orti, J. Nikunen, T. Virtanen, and P. Vera-Candeas. Multichannel Blind Sound Source Separation using Spatial Covariance Model with Level and Time Differences and Non-Negative Matrix Factorization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 26, Issue 9, 2018.

J. Nikunen, A. Diment, and T. Virtanen. Separation of Moving Sound Sources Using Multichannel NMF and Acoustic Tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, 2018.

A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley. Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, 2018.

P. Maijala, S. Zhao, T. Heittola, and T. Virtanen. Environmental noise monitoring using source classification in sensors. Applied Acoustics, Volume 129, 2018.

S. Adavanne, A. Politis, and T. Virtanen. Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network. In proc. European Signal Processing Conference, 2018.

P. San Juan Sebastián, T. Virtanen, V. M. Garcia-Molla, and A. M. Vidal. Analysis of an efficient parallel implementation of active-set Newton algorithm. The Journal of Supercomputing, May 2018.

P. Magron and T. Virtanen. Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization. In proc. Interspeech, 2018.

P. Magron, K. Drossos, S. I. Mimilakis, and T. Virtanen. Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation. In proc. Interspeech, 2018.

E. Cakir and T. Virtanen. End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input. In proc. International Joint Conference on Neural Networks, 2018.

S. Adavanne, A. Politis, and T. Virtanen. Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features. In proc. International Joint Conference on Neural Networks, 2018.

K. Drossos, S. I. Mimilakis, D. Serdyuk, G. Schuller, T. Virtanen, and Y. Bengio. MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation. In proc. International Joint Conference on Neural Networks, 2018.

S. I. Mimilakis, K. Drossos, J. F. Santos, G. Schuller, T. Virtanen, and Y. Bengio. Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018.

J. Nikunen and T. Virtanen. Estimation of Time-Varying Room Impulse Responses of Multiple Sound Sources from Observed Mixture and Isolated Source Signals. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018.

P. Magron and T. Virtanen. Bayesian Anisotropic Gaussian Model for Audio Source Separation. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018.

G. Naithani, J. Kivinummi, T. Virtanen, O. Tammela, M. J. Peltola, and J. M. Leppänen. Automatic segmentation of infant cry signals using hidden Markov models. EURASIP Journal on Audio, Speech, and Music Processing 2018, 2018:1.

E. Çakır, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 25, issue 6, 2017. TUT-SED Synthetic 2016 database used in the article.

S. Adavanne and T. Virtanen. Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.

E. Cakir and T. Virtanen. Convolutional Recurrent Neural Networks for Rare Sound Event Detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.

A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.

D. Caballero, R. Araya, H. Kronholm, J. Viiri, A. Mansikkaniemi, S. Lehesvuori, T. Virtanen, and M. Kurimo. ASR in Classroom Today: Automatic Visualization of Conceptual Network in Science Classrooms. European Conference on Technology Enhanced Learning, 2017.

A. Diment and T. Virtanen. Transfer Learning of Weakly Labelled Audio. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.

K. Drossos, S. Adavanne, and T. Virtanen. Automated Audio Captioning with Recurrent Neural Networks. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.

A. Mesaros, T. Heittola, and T. Virtanen. Assessment of Human and Machine Performance in Acoustic Scene Classification: DCASE 2016 Case Study. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.

S. Zhao, T. Heittola, and T. Virtanen. Learning Vocal Mode Classifiers from Heterogeneous Data Sources. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.

P. Magron, J. Le Roux, and T. Virtanen. Consistent Anisotropic Wiener Filtering for Audio Source Separation. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.

G. Naithani, T. Barker, G. Parascandolo, L. Bramsløw, N. H. Pontoppidan, and T. Virtanen. Low Latency Sound Source Separation Using Convolutional Recurrent Neural Networks. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.

E. Cakir, S. Adavanne, G. Parascandolo, K. Drossos, and T. Virtanen. Convolutional Recurrent Neural Networks for Bird Audio Detection. In proc. European Signal Processing Conference, 2017.

S. Adavanne, K. Drossos, E. Cakir and T. Virtanen. Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection. In proc. European Signal Processing Conference, 2017.

J. Nikunen and T. Virtanen. Time-difference of Arrival Model for Spherical Microphone Arrays and Application to Direction of Arrival Estimation. In proc. European Signal Processing Conference, 2017.

S. I. Mimilakis, K. Drossos, T. Virtanen, and G. Schuller. A Recurrent Encoder-Decoder Approach With Skip-Filtering Connections For Monaural Singing Voice Separation. In proc. IEEE International Workshop on Machine Learning for Signal Processing, 2017.

M. Malik, S. Adavanne, K. Drossos, T. Virtanen, D. Ticha, and R. Jarina. Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition. In proc. Sound and Music Computing Conference, 2017.

S. Drgas, T. Virtanen, J. Lücke, A. Hurmalainen. Binary non-negative matrix deconvolution for audio dictionary learning. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 25, issue 8, 2017.

M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen. A Convolutional Neural Network Approach for Acoustic Scene Classification. In proc. the International Joint Conference on Neural Networks, 2017.

S. Adavanne, P. Pertilä, and T. Virtanen. Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2017.

S.Y. Zhao, T. Heittola, and T. Virtanen. Active Learning for Sound Event Classification by Clustering Unlabeled Data. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2017.

A. Mesaros, T. Heittola, T. Virtanen. Metrics for Polyphonic Sound Event Detection. Applied Sciences, Volume 6, Issue 6, 2016.

T. Barker and T. Virtanen. Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorisation of Modulation Spectrograms. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 24, issue 12, 2016.

G. Naithani, G. Parascandolo, T. Barker, N. H. Pontoppidan, T. Virtanen. Low-Latency Sound Source Separation Using Deep Neural Networks. In proc. IEEE Global Conference on Signal and Information Processing, 2016.

K. Drossos, M. Kaliakatsos-Papakostas, A. Floros, T. Virtanen. On the Impact of The Semantic Content of Sound Events in Emotion Elicitation. Journal of the Audio Engineering Society, Vol. 64, No. 7/8, 2016.

E. Cakir, E. Ozan, T. Virtanen. Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection. In proc. the International Joint Conference on Neural Networks 2016.

S. I. Mimilakis, K. Drossos, T. Virtanen, G. Schuller. Deep Neural Networks for Dynamic Range Compression in Mastering Applications. In proc. AES 140th Convention, 2016.

A. Mesaros, T. Heittola, T. Virtanen. TUT Database for Acoustic Scene Classification and Sound Event Detection. In proc. European Signal Processing Conference (EUSIPCO), 2016.

M. Valenti, A. Diment, G. Parascandolo, S. Squartini, T. Virtanen. DCASE 2016 Acoustic Scene Classification Using Convolutional Neural Networks. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2016.

S. Adavanne, G. Parascandolo, P. Pertilä, T. Heittola, T. Virtanen. Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2016.

A. Diment, M. Parviainen, T. Virtanen, R. Zelov, A. Glasman. Noise-Robust Detection of Whispering in Telephone Calls Using Deep Neural Networks. In proc. European Signal Processing Conference (EUSIPCO), 2016.

H. Kronholm, D. Caballero, A. Mansikkaniemi, R. Araya, S. Lehesvuori, P. Pertilä, T. Virtanen, M. Kurimo, and J. Viiri. The Automatic Analysis of Classroom Talk. Proceedings of the annual FMSERA symposium, 2016.

K. Mahkonen, A. Hurmalainen, T. Virtanen, J.-K. Kämäräinen. Cascade processing for speeding up sliding window sparse classification. In proc. European Signal Processing Conference (EUSIPCO), 2016.

G. Parascandolo, H. Huttunen, T. Virtanen. Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings. In Proc. ICASSP 2016.

J. Nikunen, A. Diment, T. Virtanen, M. Vilermo. Binaural rendering of microphone array captures based on source separation. Speech Communication, Volume 76, 2016.

T. Virtanen, J. F. Gemmeke, B. Raj, and P. Smaragdis. Compositional Models for Audio Processing. IEEE Signal Processing Magazine, March 2015.

U. Simsekli, T. Virtanen, A. T. Cemgil. Non-negative Tensor Factorization Models for Bayesian Audio Processing. In Digital Signal Processing, volume 47, 2015.

E. Räsänen, O. Pulkkinen, T. Virtanen, M. Zollner, H. Hennig. Fluctuations of Hi-Hat Timing and Dynamics in a Virtuoso Drum Track of a Popular Music Recording. PLoS ONE 10(6), 2015.

D. Baby, T. Virtanen, J. F. Gemmeke, H. Van hamme. Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, November 2015.

A. Diment and T. Virtanen. Archetypal analysis for audio dictionary learning. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015.

A. Hurmalainen, R. Saeidi, and T. Virtanen. Noise Robust Speaker Recognition with Convolutive Sparse Coding. In proc. Interspeech 2015.

E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen. Multi-label vs. Combined Single-label Sound Event Detection With Deep Neural Networks. In proc. EUSIPCO 2015.

A. Diment, E. Cakir, T. Heittola, and T. Virtanen. Automatic recognition of environmental sound events using all-pole group delay features. In proc. EUSIPCO 2015.

E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen. Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks. In proc. the International Joint Conference on Neural Networks 2015.

S. Drgas and T. Virtanen. Speaker verification using adaptive dictionaries in non-negative spectrogram deconvolution. In proc. 12th International Conference on Latent Variable Analysis and Signal Separation, 2015.

T. Barker, T. Virtanen, and N. H. Pontoppidan. Low-latency sound-source-separation using non-negative matrix factorisation with coupled analysis and synthesis dictionaries. In proc. the IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.

A. Mesaros, T. Heittola, O. Dikmen, T. Virtanen. Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations. In proc. the IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.

D. Baby, J. F. Gemmeke, T. Virtanen, H. Van hamme. Exemplar-based speech enhancement for deep neural network based automatic speech recognition. In proc. the IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.

A. Hurmalainen, R. Saeidi, T. Virtanen. Similarity induced group sparsity for non-negative matrix factorisation. In proc. the IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.

J. Nikunen and T. Virtanen. Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 22, Issue 3, pp. 727 - 739, 2014.

O. Gencoglu, T. Virtanen, and H. Huttunen. Recognition of Acoustic Events Using Deep Neural Networks. In Proc. 22nd European Signal Processing Conference, 2014.

Z. Wu, T. Virtanen, E. S. Chng, and H. Li. Exemplar-based sparse representation with residual compensation for voice conversion. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 22, Issue 10, pp 1506 - 1521, 2014.

D. Baby, T. Virtanen, J. F. Gemmeke, T. Barker, H. Van hamme. Exemplar-based noise robust automatic speech recognition using modulation spectrogram features. In proc. IEEE Spoken Language Technology Workshop, 2014.

T. Virtanen, B. Raj, J. Gemmeke, H. Van hamme. Active-set Newton algorithm for non-negative sparse coding of audio. In Proc. International Conference on Acoustics, Speech, and Signal Processing, 2014.

A. Diment, R. Padmanabhan, T. Heittola, and T. Virtanen. Group delay function from all-pole models for musical instrument recognition. CMMR 2013 post-symposium proceedings, Lecture Notes in Computer Science, 2014.

T. Barker, T. Virtanen, O. Delhomme. Ultrasound-Coupled Semi-Supervised Nonnegative Matrix Factorisation for Speech Enhancement. In Proc. International Conference on Acoustics, Speech, and Signal Processing, 2014.

J. Nikunen, T. Virtanen. Multichannel Audio Separation by Direction of Arrival Based Spatial Covariance Model and Non-Negative Matrix Factorization. In Proc. International Conference on Acoustics, Speech, and Signal Processing, 2014.

D. Baby, T. Virtanen, T. Barker, H. Van hamme. Coupled Dictionary Training for Exemplar-based Speech Enhancement. In Proc. International Conference on Acoustics, Speech, and Signal Processing, 2014.

K. Mahkonen, J.-K. Kämäräinen, T. Virtanen. Lifelog Scene Change Detection Using Cascades of Audio and Video Detectors. In Proc. Asian Conf. on Computer Vision (ACCV) Workshop on Intelligent Mobile and Egocentric Vision, 2014.

T. Barker, H. Van hamme, T. Virtanen. Modelling Primitive Streaming of Simple Tone Sequences Through Factorisation of Modulation Pattern Tensors. In Proc. Interspeech, 2014.

T. Barker and T. Virtanen. Semi-Supervised Non-Negative Tensor Factorisation of Modulation Spectrograms for Monaural Speech Separation. In proc. the 2014 International Joint Conference on Neural Networks.

T. Heittola, A. Mesaros, D. Korpi, A. Eronen, and T. Virtanen. Method for creating location-specific audio textures. EURASIP Journal on Audio, Speech, and Music Processing, 2014:9.

T. Virtanen, J. F. Gemmeke, and B. Raj. Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio. IEEE Transactions on Audio, Speech, and Language Processing, volume 21, issue 11, 2013.

T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen. Context-dependent sound event detection. EURASIP Journal on Audio, Speech, and Music Processing 2013, 2013:1.

T. Barker and T. Virtanen. Non-negative Tensor Factorisation of Modulation Spectrograms for Monaural Sound Source Separation. In proc. Interspeech 2013.

A. Hurmalainen, J. F. Gemmeke, and T. Virtanen. Modelling Non-stationary Noise with Spectral Factorisation in Automatic Speech Recognition. Computer Speech and Language, volume 28, issue 3, 2013. (preprint)

A. Hurmalainen and T. Virtanen. Learning state labels for sparse classification of speech with matrix deconvolution. In proc. Automatic Speech Recognition and Understanding Workshop (ASRU) 2013.

J. Kauppinen, A. Klapuri, and T. Virtanen. Music Self-Similarity Modeling Using Augmented Nonnegative Matrix Factorization of Block and Stripe Patterns. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013.

A. Diment, P. Rajan, T. Heittola, and T. Virtanen. Modified Group Delay Feature for Musical Instrument Recognition. In proc. the 10th International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, 2013.

A. Diment, T. Heittola and T. Virtanen. Semi-supervised learning for musical instrument recognition. In proc. the 21st European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, 2013.

A. Hurmalainen and T. Virtanen. Acquiring Variable Length Speech Bases for Factorisation-Based Noise Robust Speech Recognition. In proc. the 21st European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, 2013.

Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li. Exemplar-based unit selection for voice conversion utilizing temporal information. In proc. Interspeech 2013.

Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li. Exemplar-based Voice Conversion using Non-negative Spectrogram Deconvolution. In proc. 8th ISCA Speech Synthesis Workshop, 2013.

F. Briggs, H. Yonghong, R. Raich, K. Eftaxias, L. Zhong, W. Cukierski, S. Hadley, A. Hadley, M. Betts, X. Fern, J. Irvine, L. Neal, A. Thomas, G. Fodor, G. Tsoumakas, W. Hong, N. Thi, H. Huttunen, P. Ruusuvuori, T. Manninen, A. Diment, T. Virtanen, J. Marzat, J. Defretin, D. Callender, C. Hurlburt, K. Larrey, M. Milakov. The 9th annual MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment. In Proc. IEEE International Workshop on Machine Learning for Signal Processing, Southampton, UK, September 2013.

K. Mahkonen, A. Eronen, T. Virtanen, E. Helander, V. Popa, J. Leppänen, and I. D. D. Curcio. Music dereverberation by spectral linear prediction in live recordings. In Proc. 16th International Conference on Digital Audio Effects, Maynooth, Ireland, 2013.

J. F. Gemmeke, T. Virtanen, and K. Demuynck. Exemplar-based joint channel and noise compensation. In Proc. International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, 2013.

T. Heittola, A. Mesaros, T. Virtanen, and M. Gabbouj. Supervised model training for overlapping sound events based on unsupervised source separation. In Proc. International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, 2013.

J. F. Gemmeke, T. Virtanen and A. Hurmalainen. HMM-regularization for NMF-based noise robust ASR. In proc. The 2nd International Workshop on Machine Listening in Multisource Environments, Vancouver, Canada, 2013.

J. T. Geiger, F. Weninger, A. Hurmalainen, J. F. Gemmeke, M. Wöllmer, B. Schuller, G. Rigoll, and T. Virtanen. The TUM+TUT+KUL approach to the 2nd CHiME Challenge: Multi-stream ASR exploiting BLSTM networks and sparse NMF. In proc. The 2nd International Workshop on Machine Listening in Multisource Environments, Vancouver, Canada, 2013.

A. Hurmalainen, J. F. Gemmeke, and T. Virtanen. Compact Long Context Spectral Factorisation Models for Noise Robust Recognition of Medium Vocabulary Speech. In proc. The 2nd International Workshop on Machine Listening in Multisource Environments, Vancouver, Canada, 2013.

T. Virtanen, R. Singh, and B. Raj (eds.). Techniques for Noise Robustness in Automatic Speech Recognition. Wiley, 2012.

D. Korpi, T. Heittola, T. Partala, A. Eronen, A. Mesaros, and T. Virtanen. On the human ability to discriminate audio ambiances from similar locations of an urban environment. Personal and Ubiquitous Computing, November 2012. (pdf preprint)

J. Nikunen, T. Virtanen, M. Vilermo. Multichannel Audio Upmixing by Time-Frequency Filtering Using Non-Negative Tensor Factorization. Journal of the Audio Engineering Society, Volume 60, Issue 10, pp. 794-806; October 2012.

E. Helander, H. Silen, T. Virtanen, M. Gabbouj. Voice Conversion Using Dynamic Kernel Partial Least Squares Regression. IEEE Transactions on Audio, Speech and Language Processing, Volume 20, Issue 3, 2012.

R. Saeidi, A. Hurmalainen, T. Virtanen, D.A. van Leeuwen. Exemplar-based Sparse Representation and Sparse Discrimination for Noise Robust Speaker Identification. In proc. Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, 2012.

A. Hurmalainen, R. Saeidi and T. Virtanen. Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition. In proc. Interspeech, 13th Annual Conference of the International Speech Communication Association, Portland, USA, 2012.

A. Hurmalainen, J. Gemmeke and T. Virtanen. Detection, Separation and Recognition of Speech From Continuous Signals Using Spectral Factorisation. In proc. 20th European Signal Processing Conference, Bucharest, Romania, 2012.

J. Nikunen, T. Virtanen, P. Pertilä and M. Vilermo. Permutation Alignment of Frequency-domain ICA by Maximization of Intra-source Envelope Correlations. In proc. 20th European Signal Processing Conference, Bucharest, Romania, 2012.

F. Weninger, M. Wöllmer, J. Geiger, B. Schuller, J. Gemmeke, A. Hurmalainen, T. Virtanen, and G. Rigoll. Non-Negative Matrix Factorization for Highly Noise-Robust ASR: to Enhance or to Recognize? In proc. 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, 2012.

A. Hurmalainen and T. Virtanen. Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition. In proc. 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, 2012.

F. Mazhar, T. Heittola, T. Virtanen, and J. Holm. Automatic Scoring of Guitar Chords. In Proc. AES 45th International Conference, Helsinki, Finland, 2012.

F. J. Rodriguez-Serrano, J. J. Carabias-Orti, P. Vera-Candeas, T. Virtanen, and N. Ruiz-Reyes. Multiple Instrument Mixtures Source Separation Evaluation Using Instrument-Dependent NMF Models. In proc. The 10th International Conference on Latent Variable Analysis and Signal Separation, Israel, 2012. Lecture Notes in Computer Science, Volume 7191, pp. 380-387, 2012.

A. Bahrami Rad and T. Virtanen. Phase spectrum prediction of audio signals. In proc. 5th International Symposium on Communications, Control and Signal Processing, Rome, Italy, 2012.

J. F. Gemmeke, T. Virtanen, and A. Hurmalainen. Exemplar-based sparse representations for noise robust automatic speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, volume 19, issue 7, 2011.

J.J. Carabias-Orti, T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes and F.J. Canadas-Quesada. Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization. IEEE Journal of Selected Topics in Signal Processing, volume 5, issue 6, 2011.

B. Raj, R. Singh, and T. Virtanen. Phoneme-dependent NMF for speech enhancement in monaural mixtures. 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011.

K. Mahkonen, A. Hurmalainen, T. Virtanen, and J. Gemmeke. Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition. 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011.

H. Kallasjoki, U. Remes, J. F. Gemmeke, T. Virtanen, and K. J. Palomäki. Uncertainty measures for improving exemplar-based source separation. 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011.

J. Nikunen, T. Virtanen, and M. Vilermo. Multichannel audio upmixing based on non-negative tensor factorization representation. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, 2011.

J. F. Gemmeke, A. Hurmalainen, T. Virtanen, and Y. Sun. Toward a Practical Implementation of Exemplar-Based Noise Robust ASR. In proc. the 19th European Signal Processing Conference 2011, Barcelona, Spain.

A. Hurmalainen, K. Mahkonen, J. F. Gemmeke, and T. Virtanen. Exemplar-Based Recognition of Speech in Highly Variable Noise. International Workshop on Machine Listening in Multisource Environments, Florence, Italy, 2011.

T. Heittola, A. Mesaros, T. Virtanen, and A. Eronen. Sound Event Detection in Multisource Environments Using Source Separation. International Workshop on Machine Listening in Multisource Environments, Florence, Italy, 2011.

J. F. Gemmeke, T. Virtanen, and A. Hurmalainen. Exemplar-Based Speech Enhancement and its Application to Noise-Robust Automatic Speech Recognition. International Workshop on Machine Listening in Multisource Environments, Florence, Italy, 2011. Demonstration signals.

A. Hurmalainen, J. Gemmeke, and T. Virtanen. Non-negative matrix deconvolution in noise robust speech recognition. In proc. ICASSP 2011.

T. Virtanen, J. Gemmeke, and A. Hurmalainen. State-based labelling for a sparse representation of speech and its application to robust speech recognition. In proc. Interspeech 2010.

B. Raj, T. Virtanen, S. Chaudhuri, and R. Singh. Non-negative matrix factorization based compensation of music for automatic speech recognition. Presented in Interspeech 2010.

J. Gemmeke and T. Virtanen. Artificial and online acquired noise dictionaries for noise robust ASR. Presented in Interspeech 2010.

A. Mesaros and T. Virtanen. Automatic recognition of lyrics in singing. EURASIP Journal on Audio, Speech and Music Processing, Volume 2010, Article ID 546047. (online version and pdf)

A. Klapuri and T. Virtanen. Representing Musical Sounds with an Interpolating State Model. IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, 2010.

E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj. Voice Conversion Using Partial Least Squares Regression. IEEE Transactions on Audio, Speech, and Language Processing, 18 (5), 2010.

M. Helen and T. Virtanen. Audio query by example using similarity measures between probability density functions of features. EURASIP Journal on Audio, Speech and Music Processing, Volume 2010, Article ID 179303. (online version and pdf)

T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen. Audio context recognition using audio event histograms. In proc. 2010 European Signal Processing Conference (EUSIPCO-2010).

A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen. Acoustic event detection in real life recordings. In proc. 2010 European Signal Processing Conference (EUSIPCO-2010).

S. Keronen, U. Remes, K. Palomäki, T. Virtanen, and M. Kurimo. Comparison of Noise Robust Methods in Large Vocabulary Speech Recognition. In proc. 2010 European Signal Processing Conference (EUSIPCO-2010).

J. Nikunen and T. Virtanen. Object-Based Audio Coding Using Non-Negative Matrix Factorization for the Spectrogram Representation. In proc. 128th Audio Engineering Society Convention, London, UK, 2010.

J. F. Gemmeke and T. Virtanen. Noise robust exemplar-based connected digit recognition. In proc. of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, 2010.

A. Klapuri, T. Virtanen, and T. Heittola. Sound source separation in monaural music signals using excitation-filter model and EM algorithm. In proc. of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, 2010.

A. Mesaros and T. Virtanen. Recognition of phonemes and words in singing. In proc. of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, 2010.

J. Nikunen and T. Virtanen. Noise-to-mask ratio minimization by weighted non-negative matrix factorization. In proc. of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, 2010.

T. Heittola, A. Klapuri, and T. Virtanen. Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation. In Proc. 10th Int. Society for Music Information Retrieval Conf. (ISMIR 2009), Kobe, Japan, 2009. The paper won the best paper award of the conference.

T. Virtanen and T. Heittola. Interpolating Hidden Markov Model and Its Application to Automatic Instrument Recognition. In proc. ICASSP 2009.

A. Mesaros and T. Virtanen. Adaptation of a speech recognizer for singing voice. In proc. EUSIPCO 2009.

T. Virtanen. Spectral Covariance in Prior Distributions of Non-Negative Matrix Factorization Based Speech Separation. In proc. EUSIPCO 2009.

M. Myllymäki and T. Virtanen. Non-Stationary Noise Model Compensation in Voice Activity Detection. In proc. EUSIPCO 2009.

T. Virtanen and A. T. Cemgil. Mixtures of Gamma Priors for Non-Negative Matrix Factorization Based Speech Separation. In proc. International Conference on Independent Component Analysis and Signal Separation, 2009. Copyright Springer-Verlag. The publication is also available at springerlink.com.

T. Virtanen, A. Mesaros, M. Ryynänen. Combining Pitch-Based Inference and Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic Music. In proc. SAPA 2008.

T. Virtanen, A. T. Cemgil, and S. J. Godsill. Bayesian Extensions to Non-negative Matrix Factorisation for Audio Signal Modelling. In proc. ICASSP 2008. This work was carried out at the University of Cambridge, Signal Processing and Communications Laboratory.

A. Mesaros and T. Virtanen. Automatic Alignment of Music Audio and Lyrics. In proc. DAFx-08, 2008.

M. Myllymäki and T. Virtanen. Voice Activity Detection in the Presence of Breathing Noise Using Neural Network and Hidden Markov Model. In proc. EUSIPCO 2008.

M. Ryynänen, T. Virtanen, J. Paulus, and A. Klapuri. Accompaniment Separation and Karaoke Application Based on Automatic Melody Transcription. In Proc. 2008 IEEE International Conference on Multimedia & Expo (ICME'08), Hannover, Germany, June 2008. (demonstrations)

A. Klapuri and T. Virtanen. Automatic music transcription. In Handbook of Signal Processing in Acoustics, David Havelock, Sonoko Kuwano, and Michael Vorländer (Eds.), Springer-Verlag, 2008.

T. Virtanen. Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria. IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, March 2007.

T. Virtanen and M. Helen. Probabilistic Model Based Similarity Measures for Audio Query-by-Example. In proc. WASPAA 2007.

A. Mesaros, T. Virtanen, and A. Klapuri. Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition Methods. In proc. International Conference on Music Information Retrieval, Vienna, Austria, 2007.

M. Helen and T. Virtanen. Query by Example of Audio Signals Using Euclidean Distance Between Gaussian Mixture Models. In proc. ICASSP 2007. Note: two small errors in equations (8)-(11) have been corrected in this version; the corrections do not appear in the ICASSP conference proceedings.

M. Helen and T. Virtanen. A Similarity Measure for Audio Query by Example Based on Perceptual Coding and Compression. In proc. 10th International Conference on Digital Audio Effects (DAFx-07), September 10-15, 2007.

T. Virtanen. Monaural Sound Source Separation by Perceptually Weighted Non-Negative Matrix Factorization. Technical report, Tampere University of Technology, Institute of Signal Processing, 2007.

T. Virtanen and A. Klapuri. Analysis of polyphonic audio using source-filter model and non-negative matrix factorization. In Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, 2006 (extended abstract).

T. Virtanen. Speech Recognition Using Factorial Hidden Markov Models for Separation in the Feature Space. In proc. Interspeech 2006, Pittsburgh, USA. (demonstrations) The paper achieved the second-best results among the papers presented in the Interspeech 2006 Speech Separation Challenge special session.

T. Virtanen. Unsupervised Learning Methods for Source Separation. In A. Klapuri and M. Davy (eds.), Signal Processing Methods for Music Transcription, Springer-Verlag, 2006.

M. Helen and T. Virtanen. Separation of Drums From Polyphonic Music Using Non-Negative Matrix Factorization and Support Vector Machine. In proc. 13th European Signal Processing Conference, Antalya, Turkey, 2005. (demonstrations)

A. Klapuri, T. Virtanen, and M. Helen. Modeling musical sounds with an interpolating state model. In proc. 13th European Signal Processing Conference, Antalya, Turkey, 2005.

J. Paulus and T. Virtanen. Drum Transcription with Non-negative Spectrogram Factorisation. In proc. 13th European Signal Processing Conference, Antalya, Turkey, 2005. (demonstrations)

T. Virtanen. Separation of Sound Sources by Convolutive Sparse Coding. In proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, SAPA 2004. (demonstrations)

M. Helen and T. Virtanen. Perceptually Motivated Parametric Representation for Harmonic Sounds for Data Compression Purposes. In proc. 6th International Conference on Digital Audio Effects (DAFx-03), London, UK, 2003.

T. Virtanen. Algorithm for the separation of harmonic sounds with time-frequency smoothness constraint. In proc. the 6th International Conference on Digital Audio Effects (DAFx-03), London, UK, 2003.

T. Virtanen. Sound Source Separation Using Sparse Coding with Temporal Continuity Objective. In proc. International Computer Music Conference, ICMC 2003. (demonstrations)

M. Parviainen and T. Virtanen. Two-channel separation of speech using direction-of-arrival estimation and sinusoids plus transients modeling. In proc. IEEE International Symposium on Intelligent Signal Processing and Communication Systems, ISPACS 2003.

T. Virtanen and A. Klapuri. Separation of Harmonic Sounds Using Linear Models for the Overtone Series. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2002. (demonstrations)

T. Virtanen. Accurate Sinusoidal Model Analysis and Parameter Reduction by Fusion of Components. In proc. 110th Audio Engineering Society Convention, Amsterdam, Netherlands, 2001.

A. Klapuri, T. Virtanen, A. Eronen, J. Seppänen. Automatic transcription of musical recordings. In proc. Consistent & Reliable Acoustic Cues Workshop, CRAC-01, Aalborg, Denmark, 2001.

T. Virtanen and A. Klapuri. Separation of Harmonic Sounds Using Multipitch Analysis and Iterative Parameter Estimation. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, 2001. (demonstrations)

A. Klapuri, T. Virtanen, and J.-M. Holm. Robust multipitch estimation for the analysis and manipulation of polyphonic musical signals. In Proc. COST-G6 Conference on Digital Audio Effects, DAFx-00, Verona, Italy, 2000.

J. Sillanpää, A. Klapuri, J. Seppänen, and T. Virtanen. Recognition of acoustic noise mixtures by combined bottom-up and top-down processing. In proc. European Signal Processing Conference (EUSIPCO), 2000.

T. Virtanen and A. Klapuri. Separation of Harmonic Sound Sources Using Sinusoidal Modeling. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2000. (demonstrations)

T. Virtanen. Sound Source Separation in Monaural Music Signals. PhD thesis, Tampere University of Technology, 2006.

T. Virtanen. Audio Signal Modeling with Sinusoids Plus Noise. MSc thesis, Tampere University of Technology, 2001. (demonstrations 1, demonstrations 2)

S. Jakob, I. Korhonen, E. Ruokonen, T. Virtanen, A. Kogan, and J. Takala. Detection of artifacts in monitored trends in intensive care, Computer Methods and Programs in Biomedicine, 63 (200), 2000.

IEEE-Copyrighted Material: Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: +Intl. 908-562-3966.

- Tuomas Virtanen, tuomas.virtanen@tuni.fi