Publications
- A. Triantafyllopoulos, I. Tsangko, A. Gebhard, A. Mesaros, T. Virtanen, and B. Schuller. Computer Audition: From Task-Specific Machine Learning to Foundation Models. Proceedings of the IEEE, accepted for publication.
- Y. Wang, A. Politis, K. Drossos, and T. Virtanen. Multi-Utterance Speech Separation and Association Trained on Short Segments. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2025, accepted for publication.
- P. Sudarsanam, I. Martin-Morato, A. Hakala, and T. Virtanen. AVCaps: An Audio-visual Dataset with Modality-specific Captions. IEEE Open Journal of Signal Processing, Volume 6, 2025.
- Y. Wang, A. Politis, K. Drossos, and T. Virtanen. Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers. In proc. Interspeech 2025, accepted for publication.
- W. Dai, A. Politis, and T. Virtanen. Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction. In proc. Interspeech 2025, accepted for publication.
- S. Zhang and T. Virtanen. Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection. In proc. European Signal Processing Conference 2025, accepted for publication.
- E. Tunturi, D. Diaz-Guerra, A. Politis, and T. Virtanen. Score-Informed Music Source Separation: Improving Synthetic-To-Real Generalization in Classical Music. In proc. European Signal Processing Conference 2025, accepted for publication.
- P. Sudarsanam, I. Martín-Morató, and T. Virtanen. Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities. In proc. European Signal Processing Conference 2025, accepted for publication.
- M. Neri and T. Virtanen. Impact of Microphone Array Mismatches to Learning-Based Replay Speech Detection. In proc. European Signal Processing Conference 2025, accepted for publication.
- M. Neri and T. Virtanen. Multi-channel Replay Speech Detection using an Adaptive Learnable Beamformer. IEEE Open Journal of Signal Processing, Volume 6, 2025.
- M. Heikkinen, A. Politis, K. Drossos, and T. Virtanen. Gen-A: Generalizing Ambisonics Neural Encoding to Unseen Microphone Arrays. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2025, accepted for publication.
- A. Mesaros, R. Serizel, T. Heittola, T. Virtanen, and M. Plumbley. A Decade of DCASE: Achievements, Practices, Evaluations and Future Challenges. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2025, accepted for publication.
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen. Text-based Audio Retrieval by Learning from Similarities between Audio Captions. IEEE Signal Processing Letters, Volume 32, 2025.
- J. Garcia-Martinez, D. Diaz-Guerra, A. Politis, T. Virtanen, J. J. Carabias-Orti, and P. Vera-Candeas. SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation. IEEE Open Journal of Signal Processing, Volume 6, 2025.
- M. Moritz, T. Olán, and T. Virtanen. Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking. In proc. IEEE International Symposium on the Internet of Sounds 2024.
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen. Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2024.
- D. Diaz-Guerra, A. Politis, P. Sudarsanam, K. Shimada, D. Krause, K. Uchida, Y. Koyama, N. Takahashi, S. Takahashi, T. Shibuya, Y. Mitsufuji, and T. Virtanen. Baseline models and evaluation of sound event localization and detection with distance estimation in DCASE 2024 Challenge. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2024.
- D. Dogan, H. Xie, T. Heittola, and T. Virtanen. Multi-Label Zero-Shot Audio Classification with Temporal Attention. In proc. 18th International Workshop on Acoustic Signal Enhancement, 2024.
- L. Hekanaho, M. Hirvonen, and T. Virtanen. Language-based machine perception: linguistic perspectives on the compilation of captioning datasets. Digital Scholarship in the Humanities, 2024.
- J. Martinsson, O. Mogren, M. Sandsten, and T. Virtanen. From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning. In proc. European Signal Processing Conference 2024.
- W. Dai, X. Li, A. Politis, and T. Virtanen. Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement. In proc. European Signal Processing Conference 2024.
- A. Hakala, T. Kincy, and T. Virtanen. Automatic Live Music Song Identification Using Multi-level Deep Sequence Similarity Learning. In proc. European Signal Processing Conference 2024.
- M. Neri, A. Politis, D. Krause, M. Carli, and T. Virtanen. Speaker Distance Estimation in Enclosures from Single-Channel Audio. IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32, 2024.
- S. Drgas, L. Bramsløw, A. Politis, G. Naithani, and T. Virtanen. Dynamic Processing Neural Network Architecture for Hearing Loss Compensation. IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 32, 2024.
- S. Gharib, M. Tran, D. Luong, K. Drossos, and T. Virtanen. Adversarial Representation Learning for Robust Privacy Preservation in Audio. IEEE Open Journal of Signal Processing, Volume 5, 2024.
- M. Heikkinen, A. Politis, and T. Virtanen. Neural Ambisonics Encoding for Compact Irregular Microphone Arrays. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024.
- Y. Wang, A. Politis, and T. Virtanen. Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios. In proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024.
- K. Shimada, A. Politis, P. Sudarsanam, D. Krause, K. Uchida, S. Adavanne, A. Hakala, Y. Koyama, N. Takahashi, S. Takahashi, T. Virtanen, and Y. Mitsufuji. STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events. In proc. NeurIPS 2023.
- M. Neri, A. Politis, D. Krause, M. Carli, and T. Virtanen. Single-Channel Speaker Distance Estimation in Reverberant Environments. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2023.
- D. Luong, M. Tran, S. Gharib, K. Drossos, and T. Virtanen. Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2023.
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen. Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2023.
- D. Diaz-Guerra, A. Politis, A. Miguel, J. R. Beltran, and T. Virtanen. Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications. In proc. 10th Convention of the European Acoustics Association – Forum Acusticum 2023.
- P. Ariyakulamsudarsanam and T. Virtanen. Attention-Based Methods for Audio Question Answering. In proc. 31st European Signal Processing Conference, 2023.
- D. Diaz-Guerra, A. Politis, and T. Virtanen. Position Tracking of a Varying Number of Sound Sources with Sliding Permutation Invariant Training. In proc. 31st European Signal Processing Conference, 2023.
- K. Khorrami, M. A. C. Blandón, T. Virtanen, and O. Räsänen. Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System. In proc. 31st European Signal Processing Conference, 2023.
- P. Magron and T. Virtanen. Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints. In proc. 31st European Signal Processing Conference, 2023.
- W. Xie, Y. Li, Q. He, W. Cao, and T. Virtanen. Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes. In proc. Interspeech 2023.
- H. Xie, O. Räsänen, and T. Virtanen. On negative sampling for contrastive audio-text retrieval. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2023.
- S. Wang, A. Politis, A. Mesaros, and T. Virtanen. Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment. IEEE Journal of Selected Topics in Signal Processing, Volume 16, Issue 6, 2022.
- H. Xie, S. Lipping, and T. Virtanen. Language-based Audio Retrieval Task in DCASE 2022 Challenge. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2022.
- A. Politis, K. Shimada, P. Sudarsanam, S. Adavanne, D. Krause, Y. Koyama, N. Takahashi, S. Takahashi, Y. Mitsufuji, and T. Virtanen. STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2022.
- I. Martin, F. Paissan, A. Ancilotto, T. Heittola, A. Mesaros, E. Farella, A. Brutti, and T. Virtanen. Low-complexity acoustic scene classification in DCASE 2022 Challenge. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2022.
- Y. Li, W. Cao, K. Drossos, and T. Virtanen. Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network. In proc. International Workshop on Multimedia Signal Processing, 2022.
- G. Naithani, K. Pietilä, R. Niemistö, E. Paajanen, T. Takala, and T. Virtanen. Subjective Evaluation of Deep Neural Network Based Speech Enhancement Systems in Real-World Conditions. In proc. International Workshop on Multimedia Signal Processing, 2022.
- S. Lipping, P. Sudarsanam, K. Drossos, and T. Virtanen. Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering. In proc. European Signal Processing Conference, 2022.
- D. Dogan, H. Xie, T. Heittola, and T. Virtanen. Zero-Shot Audio Classification using Image Embeddings. In proc. European Signal Processing Conference, 2022.
- V.-V. Eklund, A. Diment, and T. Virtanen. Noise, Device and Room Robustness Methods for Pronunciation Error Detection. In proc. European Signal Processing Conference, 2022.
- H. Xie, O. Räsänen, K. Drossos, and T. Virtanen. Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2022.
- B. W. Schuller, T. Virtanen, M. Riveiro, G. Rizos, J. Han, A. Mesaros, and K. Drossos. Towards Sonification in Multimodal and User-friendly Explainable Artificial Intelligence. In proc. the 2021 International Conference on Multimodal Interaction, 2021.
- S. Wang, T. Heittola, A. Mesaros, and T. Virtanen. Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2021.
- I. Martín-Morató, T. Heittola, A. Mesaros, and T. Virtanen. Low-complexity acoustic scene classification for multi-device audio: analysis of DCASE 2021 Challenge systems. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2021.
- A. Politis, S. Adavanne, D. Krause, A. Deleforge, P. Srivastava, and T. Virtanen. A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2021.
- S. Djukanović, J. Matas, and T. Virtanen. Acoustic vehicle speed estimation from single sensor measurements. IEEE Sensors Journal, Volume 21, Issue 20, 2021.
- S. Adavanne, A. Politis, and T. Virtanen. Differentiable Tracking-Based Training of Deep-Learning Sound Source Localizers. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021.
- A. Mesaros, T. Heittola, T. Virtanen, and M. D. Plumbley. Sound Event Detection: A Tutorial. In IEEE Signal Processing Magazine, Volume: 38, Issue 5, 2021.
- S. Wang, G. Naithani, A. Politis, and T. Virtanen. Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair. In proc. 29th European Signal Processing Conference, 2021.
- S. Djukanović, Y. Patel, J. Matas, and T. Virtanen. Neural network-based acoustic vehicle counting. In proc. 29th European Signal Processing Conference, 2021.
- A. Tran, K. Drossos, and T. Virtanen. WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information. In proc. 29th European Signal Processing Conference, 2021.
- P. Pertilä, E. Cakir, A. Hakala, E. Fagerlund, T. Virtanen, A. Politis, and A. Eronen. Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments. In proc. 29th European Signal Processing Conference, 2021.
- S. Drgas and T. Virtanen. Joint Speaker Separation And Recognition Using Non-Negative Matrix Deconvolution With Adaptive Dictionary. Computer Speech & Language, volume 70, 2021.
- H. Xie and T. Virtanen. Zero-Shot Audio Classification via Semantic Embeddings. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 29, 2021.
- S. Wang, A. Mesaros, T. Heittola, and T. Virtanen. A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.
- X. Favory, K. Drossos, T. Virtanen, and X. Serra. Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.
- H. Xie, O. Räsänen, and T. Virtanen. Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2021.
- A. Politis, A. Mesaros, S. Adavanne, T. Heittola, and T. Virtanen. Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol 29, 2021.
- A. Kivinummi, G. Naithani, O. Tammela, T. Virtanen, E. Kurkela, M. Alhainen, D. J. Niehaus, A. Lachman, J. M. Leppänen, and M. J. Peltola. Associations between neonatal cry acoustics and visual attention during the first year. Frontiers in Psychology, September 30, 2020.
- S. Zhao, T. Heittola, and T. Virtanen. Active Learning for Sound Event Detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, 2020.
- E. Çakır, K. Drossos, and T. Virtanen. Multi-task Regularization Based on Infrequent Classes for Audio Captioning. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.
- P. Pyykkönen, S. I. Mimilakis, K. Drossos, and T. Virtanen. Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation. In proc. IEEE International Workshop on Multimedia Signal Processing, 2020.
- T. Heittola, A. Mesaros, and T. Virtanen. Acoustic scene classification in DCASE 2020 Challenge: generalization across devices and low complexity solutions. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.
- K. Nguyen, K. Drossos, and T. Virtanen. Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.
- A. Politis, S. Adavanne, and T. Virtanen. A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2020.
- X. Favory, K. Drossos, T. Virtanen, X. Serra. COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations. In proc. ICML 2020 Workshop on Self-supervision in Audio and Speech.
- S. Djukanovic, J. Matas, and T. Virtanen. Robust Audio-Based Vehicle Counting in Low-to-Moderate Traffic Flow. In proc. IEEE Intelligent Vehicles Symposium, 2020.
- K. Drossos, S. I. Mimilakis, S. Gharib, Y. Li, and T. Virtanen. Sound Event Detection with Depthwise Separable and Dilated Convolutions. In proc. International Joint Conference on Neural Networks, 2020.
- N. Nicodemo, G. Naithani, K. Drossos, T. Virtanen, and R. Saletti. Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters. In proc. 28th European Signal Processing Conference, 2020.
- P. Magron and T. Virtanen. Online Spectrogram Inversion for Low-Latency Audio Source Separation. IEEE Signal Processing Letters, volume 27, 2020.
- K. Drossos, S. Lipping, and T. Virtanen. Clotho: An Audio Captioning Dataset. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2020.
- Y. Li, M. Liu, K. Drossos, and T. Virtanen. Sound event detection via dilated convolutional recurrent neural networks. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2020.
- H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath. Deep Learning for Audio Signal Processing. IEEE Journal of Selected Topics in Signal Processing, volume 13, issue 2, 2019.
- A. Mesaros, T. Heittola, and T. Virtanen. Acoustic scene classification in DCASE 2019 Challenge: closed and open set classification and data mismatch setups. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.
- S. Lipping, K. Drossos, and T. Virtanen. Crowdsourcing a dataset of audio captions. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.
- S. Adavanne, A. Politis, and T. Virtanen. Localization, detection and tracking of multiple moving sound sources with a convolutional recurrent neural network. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.
- K. Drossos, S. Gharib, P. Magron, and T. Virtanen. Language modelling for sound event detection with teacher forcing and scheduled sampling. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.
- S. Adavanne, A. Politis, and T. Virtanen. A multi-room reverberant dataset for sound event localization and detection. Workshop on Detection and Classification of Acoustic Scenes and Events, 2019.
- H. Xie and T. Virtanen. Zero-Shot Audio Classification Based on Class Label Embeddings. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.
- K. Drossos, P. Magron, and T. Virtanen. Unsupervised Adversarial Domain Adaptation Based On The Wasserstein Distance For Acoustic Scene Classification. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.
- A. Mesaros, S. Adavanne, A. Politis, T. Heittola, and T. Virtanen. Joint Measurement of Localization and Detection of Sound Events. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.
- H. L. Bear, T. Heittola, A. Mesaros, E. Benetos, and T. Virtanen. City classification from multiple real-world sound scenes. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.
- M. C. Green, D. Murphy, S. Adavanne, and T. Virtanen. Acoustic Scene Classification Using Higher-Order Ambisonic Features. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019.
- I. Ahsan, C. Kertesz, A. Mesaros, T. Heittola, A. Knight, and T. Virtanen. Audio-Based Epileptic Seizure Detection. In proc. European Signal Processing Conference, 2019.
- A. Mesaros, A. Diment, B. Elizalde, T. Heittola, E. Vincent, B. Raj, and T. Virtanen. Sound event detection in the DCASE 2017 Challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, volume 27, issue 6, 2019.
- S. Wang, G. Naithani, and T. Virtanen. Low-Latency Deep Clustering For Speech Separation. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2019.
- I. Martín-Morató, A. Mesaros, T. Heittola, T. Virtanen, M. Cobos, F. J. Ferri. Sound Event Envelope Estimation in Polyphonic Mixtures. In proc. International Conference on Acoustics, Speech, and Signal Processing, 2019.
- A. Diment, E. Fagerlund, A. Benfield and T. Virtanen. Detection of Typical Pronunciation Errors in Non-native English Speech Using Convolutional Recurrent Neural Networks. In proc. International Joint Conference on Neural Networks, 2019.
- V. M. Garcia-Molla, P. S. Juan, T. Virtanen, A. M. Vidal, and P. Alonso. Generalization of the K-SVD algorithm for minimization of β-divergence. In Digital Signal Processing, volume 92, 2019.
- P. Magron and T. Virtanen. Complex ISNMF: A Phase-Aware Model for Monaural Audio Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 27, Issue 1, 2019.
- T. Virtanen, M. D. Plumbley, D. Ellis (eds). Computational Analysis of Sound Scenes and Events. Springer, 2018.
- E. Vincent, T. Virtanen, and S. Gannot (eds). Audio Source Separation and Speech Enhancement. Wiley, 2018.
- S. Adavanne, A. Politis, J. Nikunen, T. Virtanen. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks. IEEE Journal of Selected Topics in Signal Processing, volume 13, issue 1, 2019.
- L. Bramsløw, G. Naithani, A. Hafez, T. Barker, N. H. Pontoppidan, and T. Virtanen. Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, Vol. 144, No. 1, 2018. Copyright 2018 Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America.
- S. Gharib, K. Drossos, E. Cakir, D. Serdyuk, and T. Virtanen. Unsupervised adversarial domain adaptation for acoustic scene classification. In proc. Detection and Classification of Acoustic Scenes and Events 2018 Workshop.
- A. Mesaros, T. Heittola, and T. Virtanen. A multi-device dataset for urban acoustic scene classification. In proc. Detection and Classification of Acoustic Scenes and Events 2018 Workshop.
- A. Mesaros, T. Heittola, and T. Virtanen. Acoustic Scene Classification: an Overview of DCASE 2017 Challenge Entries. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- P. Magron and T. Virtanen. Towards complex nonnegative matrix factorization with the beta-divergence. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- G. Huang, T. Heittola, and T. Virtanen. Using Sequential Information in Polyphonic Sound Event Detection. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- M. Parviainen, P. Pertilä, T. Virtanen, and P. Grosche. Time-Frequency Masking Strategies for Single-Channel Low-latency Speech Enhancement Using Neural Networks. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- P. Magron and T. Virtanen. On modeling the STFT phase of audio signals with the von Mises distribution. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- K. Drossos, P. Magron, S. I. Mimilakis, and T. Virtanen. Harmonic-Percussive Source Separation with Deep Neural Networks and Phase Recovery. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- S. Zhao, T. Heittola, and T. Virtanen. An Active Learning Method Using Clustering and Committee-Based Sample Selection for Sound Event Classification. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- G. Naithani, J. Nikunen, L. Bramsløw, and T. Virtanen. Deep neural network based speech separation optimizing an objective estimator of intelligibility for low latency applications. In proc. International Workshop on Acoustic Signal Enhancement, 2018.
- E. Cakir and T. Virtanen. Musical Instrument Synthesis and Morphing in Multidimensional Latent Space Using Variational, Convolutional Recurrent Autoencoders. In proc. AES 145th Convention.
- S. Gharib, H. Derrar, D. Niizumi, T. Senttula, J. Tommola, T. Heittola, T. Virtanen, and H. Huttunen. Acoustic Scene Classification: A Competition Review. In proc. IEEE International Workshop on Machine Learning for Signal Processing, 2018.
- K. Mahkonen, T. Virtanen, and J. Kämäräinen. Cascade of Boolean detector combinations. EURASIP Journal on Image and Video Processing, 2018:61, 2018.
- J.J. Carabias-Orti, J. Nikunen, T. Virtanen, and P. Vera-Candeas. Multichannel Blind Sound Source Separation using Spatial Covariance Model with Level and Time Differences and Non-Negative Matrix Factorization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume: 26, Issue: 9, 2018.
- J. Nikunen, A. Diment, and T. Virtanen. Separation of Moving Sound Sources Using Multichannel NMF and Acoustic Tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, 2018.
- A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley. Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, 2018.
- P. Maijala, S. Zhao, T. Heittola, T. Virtanen. Environmental noise monitoring using source classification in sensors. Applied Acoustics, Volume 129, 2018.
- S. Adavanne, A. Politis, and T. Virtanen. Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network. In proc. European Signal Processing Conference, 2018.
- Pablo San Juan Sebastián, Tuomas Virtanen, Victor M. Garcia-Molla, Antonio M. Vidal. Analysis of an efficient parallel implementation of active-set Newton algorithm. The Journal of Supercomputing, May, 2018.
- P. Magron and T. Virtanen. Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization. In proc. Interspeech, 2018.
- P. Magron, K. Drossos, S. Mimilakis, and T. Virtanen. Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation. In proc. Interspeech, 2018.
- E. Cakir and T. Virtanen. End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input. In proc. International Joint Conference on Neural Networks, 2018.
- S. Adavanne, A. Politis, T. Virtanen. Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features. In proc. International Joint Conference on Neural Networks, 2018.
- K. Drossos, S.I. Mimilakis, D. Serdyuk, G. Schuller, T. Virtanen, and Y. Bengio. MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation. In proc. International Joint Conference on Neural Networks, 2018.
- S.I. Mimilakis, K. Drossos, J.F. Santos, G. Schuller, T. Virtanen, and Y. Bengio. Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018.
- J. Nikunen and T. Virtanen. Estimation of Time-Varying Room Impulse Responses of Multiple Sound Sources from Observed Mixture and Isolated Source Signals. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018.
- P. Magron and T. Virtanen. Bayesian Anisotropic Gaussian Model for Audio Source Separation. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018.
- G. Naithani, J. Kivinummi, T. Virtanen, O. Tammela, M.J. Peltola, and J.M. Leppänen. Automatic segmentation of infant cry signals using hidden Markov models. EURASIP Journal on Audio, Speech, and Music Processing 2018, 2018:1.
- E. Çakır, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 25, issue 6, 2017. TUT-SED Synthetic 2016 database used in the article.
- S. Adavanne and T. Virtanen. Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.
- E. Cakir and T. Virtanen. Convolutional Recurrent Neural Networks for Rare Sound Event Detection. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.
- A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen. DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2017.
- D. Caballero, R. Araya, H. Kronholm, J. Viiri, A. Mansikkaniemi, S. Lehesvuori, T. Virtanen, and M. Kurimo. ASR in Classroom Today: Automatic Visualization of Conceptual Network in Science Classrooms. European Conference on Technology Enhanced Learning, 2017.
- A. Diment and T. Virtanen. Transfer Learning of Weakly Labelled Audio. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
- K. Drossos, S. Adavanne, and T. Virtanen. Automated Audio Captioning with Recurrent Neural Networks. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
- A. Mesaros, T. Heittola, and T. Virtanen. Assessment of Human and Machine Performance in Acoustic Scene Classification: DCASE 2016 Case Study. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
- S. Zhao, T. Heittola, and T. Virtanen. Learning Vocal Mode Classifiers from Heterogeneous Data Sources. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
- P. Magron, J. Le Roux, and T. Virtanen. Consistent Anisotropic Wiener Filtering for Audio Source Separation. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
- G. Naithani, T. Barker, G. Parascandolo, L. Bramsløw, N. H. Pontoppidan, and T. Virtanen. Low Latency Sound Source Separation Using Convolutional Recurrent Neural Networks. In proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017.
- E. Cakir, S. Adavanne, G. Parascandolo, K. Drossos, and T. Virtanen. Convolutional Recurrent Neural Networks for Bird Audio Detection. In proc. European Signal Processing Conference, 2017.
- S. Adavanne, K. Drossos, E. Cakir and T. Virtanen. Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection. In proc. European Signal Processing Conference, 2017.
- J. Nikunen and T. Virtanen. Time-difference of Arrival Model for Spherical Microphone Arrays and Application to Direction of Arrival Estimation. In proc. European Signal Processing Conference, 2017.
- S. I. Mimilakis, K. Drossos, T. Virtanen, G. Schuller. A Recurrent Encoder-Decoder Approach With Skip-Filtering Connections For Monaural Singing Voice Separation. In IEEE Machine Learning for Signal Processing Workshop.
- M. Malik, S. Adavanne, K. Drossos, T. Virtanen, D. Ticha, and R. Jarina. Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition. In proc. Sound and Music Computing Conference, 2017.
- S. Drgas, T. Virtanen, J. Lücke, A. Hurmalainen. Binary non-negative matrix deconvolution for audio dictionary learning. IEEE/ACM Transactions on Audio, Speech and Language Processing, volume 25, issue 8, 2017.
- M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen. A Convolutional Neural Network Approach for Acoustic Scene Classification. In proc. The Joint International Conference on Neural Networks 2017.
- S. Adavanne, P. Pertilä, and T. Virtanen. Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2017.
- S. Y. Zhao, T. Heittola, and T. Virtanen. Active Learning for Sound Event Classification by Clustering Unlabeled Data. In proc. IEEE International Conference on Acoustics, Speech and Signal Processing 2017.
- A. Mesaros, T. Heittola, T. Virtanen. Metrics for Polyphonic Sound Event Detection. Applied Sciences, Volume 6, Issue 6, 2016.
- T. Barker and T. Virtanen. Blind Separation of Audio Mixtures
Through Nonnegative Tensor Factorisation of Modulation Spectrograms. IEEE/ACM
Transactions on Audio, Speech and Language Processing, Volume 24, Issue 12, 2016.
- G. Naithani, G. Parascandolo, T. Barker, N. H. Pontoppidan, T. Virtanen. Low-Latency Sound Source Separation Using Deep
Neural Networks. In proc. IEEE Global Conference on Signal and Information Processing, 2016.
- K. Drossos, M. Kaliakatsos-Papakostas, A. Floros, T. Virtanen. On the Impact of The
Semantic Content of Sound Events in Emotion Elicitation. Journal of the Audio Engineering
Society, Vol. 64, No. 7/8, 2016.
- E. Cakir, E. Ozan, T. Virtanen. Filterbank Learning for
Deep Neural Network
Based Polyphonic Sound Event Detection. In proc. the International Joint Conference on Neural
Networks 2016.
- S. I. Mimilakis, K. Drossos, T. Virtanen, G. Schuller. Deep Neural Networks for Dynamic Range Compression in
Mastering Applications. In proc. AES 140th Convention, 2016.
- A. Mesaros, T. Heittola, T. Virtanen. TUT Database for Acoustic
Scene Classification and Sound Event
Detection. In proc. European Signal Processing Conference (EUSIPCO), 2016.
- M. Valenti, A. Diment, G. Parascandolo, S. Squartini, T. Virtanen. DCASE 2016 Acoustic Scene
Classification
Using Convolutional Neural Networks. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2016.
- S. Adavanne, G. Parascandolo, P. Pertilä, T. Heittola, T. Virtanen. Sound Event Detection in Multichannel Audio
Using Spatial and Harmonic Features. In proc. Workshop on Detection and Classification of Acoustic Scenes and Events, 2016.
- A. Diment, M. Parviainen, T. Virtanen, R. Zelov, A. Glasman. Noise-Robust
Detection of Whispering in
Telephone Calls Using Deep Neural Networks. In proc. European Signal Processing Conference (EUSIPCO),
2016.
- H. Kronholm, D. Caballero, A. Mansikkaniemi, R. Araya, S. Lehesvuori, P. Pertilä, T. Virtanen, M. Kurimo, and J. Viiri.
The Automatic Analysis of Classroom Talk. Proceedings of the annual FMSERA symposium, 2016.
- K. Mahkonen, A. Hurmalainen, T. Virtanen, J.-K. Kämäräinen. Cascade processing for speeding up
sliding window sparse classification. In proc. European Signal Processing Conference (EUSIPCO), 2016.
- G. Parascandolo, H. Huttunen, T. Virtanen. Recurrent Neural
Networks
for Polyphonic Sound Event Detection in Real Life Recordings. In Proc. ICASSP 2016.
- J. Nikunen, A. Diment, T. Virtanen, M. Vilermo. Binaural
rendering of microphone array captures based on source
separation, Speech Communication, Volume 76, 2016.
- T. Virtanen, J. F. Gemmeke, B. Raj, and
P. Smaragdis. Compositional
Models for Audio Processing. IEEE Signal Processing Magazine,
March 2015.
- U. Simsekli, T. Virtanen, A. T. Cemgil. Non-negative Tensor Factorization Models for
Bayesian Audio Processing. In Digital Signal Processing, volume 47, 2015.
- E. Räsänen, O. Pulkkinen, T. Virtanen, M. Zollner, H. Hennig. Fluctuations of
Hi-Hat
Timing and Dynamics in a Virtuoso Drum Track of a Popular Music Recording. PLoS ONE 10(6),
2015.
- D. Baby, T. Virtanen, J. F. Gemmeke, H. Van hamme. Coupled
Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech
Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, November 2015.
- A. Diment and T. Virtanen. Archetypal analysis
for audio dictionary learning. In proc. IEEE Workshop on Applications
of Signal Processing to Audio and Acoustics (WASPAA), 2015.
- A. Hurmalainen, R. Saeidi, and T. Virtanen. Noise Robust Speaker Recognition with
Convolutive Sparse Coding. In proc. Interspeech 2015.
- E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen. Multi-label vs. Combined Single-label
Sound Event Detection With Deep Neural Networks. In proc. EUSIPCO 2015.
- A. Diment, E. Cakir, T. Heittola, and T. Virtanen. Automatic recognition of
environmental sound events using all-pole group delay features. In proc. EUSIPCO 2015.
- E. Cakir, T. Heittola, H. Huttunen, and T. Virtanen. Polyphonic Sound Event
Detection Using Multi Label Deep Neural Networks.
In proc. the International Joint Conference on Neural Networks 2015.
- S. Drgas and T. Virtanen. Speaker
verification using adaptive dictionaries in non-negative spectrogram
deconvolution. In proc. 12th International Conference on Latent Variable Analysis and Signal Separation,
2015.
- T. Barker, T. Virtanen, N. H. Pontoppidan. Low-latency
sound-source-separation using
non-negative matrix factorisation with coupled analysis and synthesis dictionaries. In proc.
the
IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
- A. Mesaros, T. Heittola, O. Dikmen, T. Virtanen. Sound event detection in real
life
recordings using coupled matrix factorization of spectral representations and class activity
annotations. In proc. the IEEE International Conference on Acoustics, Speech and Signal
Processing, 2015.
- D. Baby, J. F. Gemmeke, T. Virtanen, H. Van hamme. Exemplar-based
speech enhancement for
deep neural network based automatic speech recognition. In proc. the IEEE International
Conference
on Acoustics, Speech and Signal Processing, 2015.
- A. Hurmalainen, R. Saeidi, T. Virtanen. Similarity induced
group sparsity for non-negative
matrix factorisation. In proc. the IEEE International Conference on Acoustics, Speech and
Signal
Processing, 2015.
- J. Nikunen and T. Virtanen. Direction of Arrival Based Spatial Covariance
Model for Blind Sound Source Separation. IEEE/ACM Transactions on
Audio, Speech, and Language Processing, Volume 22, Issue 3, pp. 727-739, 2014.
- O. Gencoglu, T. Virtanen, and H. Huttunen. Recognition of Acoustic
Events Using Deep Neural Networks. In Proc. 22nd European Signal Processing Conference, 2014.
- Z. Wu, T. Virtanen, E. S. Chng, and H. Li. Exemplar-based sparse
representation with residual compensation for voice conversion. IEEE/ACM Transactions on
Audio, Speech, and Language Processing, Volume 22, Issue 10, pp. 1506-1521, 2014.
- D. Baby, T. Virtanen, J. F. Gemmeke, T. Barker, H. Van
hamme. Exemplar-based noise robust automatic speech recognition
using modulation spectrogram features. In proc. IEEE Spoken Language
Technology Workshop, 2014.
- T. Virtanen, B. Raj, J. Gemmeke, H. Van hamme. Active-set Newton
algorithm for non-negative sparse coding of audio. In
Proc. International Conference on Acoustics, Speech, and Signal
Processing, 2014.
- A. Diment, R. Padmanabhan, T. Heittola, and T. Virtanen. Group delay function from all-pole models for musical instrument recognition. CMMR 2013 post-symposium proceedings, Lecture Notes in Computer Science, 2014.
- T. Barker, T. Virtanen, O. Delhomme. Ultrasound-Coupled
Semi-Supervised Nonnegative Matrix Factorisation for Speech
Enhancement. In
Proc. International Conference on Acoustics, Speech, and Signal
Processing, 2014.
- J. Nikunen, T. Virtanen. Multichannel Audio Separation by
Direction of Arrival Based Spatial Covariance Model and Non-Negative
Matrix Factorization. In
Proc. International Conference on Acoustics, Speech, and Signal
Processing, 2014.
- D. Baby, T. Virtanen, T. Barker, H. Van hamme. Coupled
Dictionary Training for Exemplar-based Speech Enhancement. In
Proc. International Conference on Acoustics, Speech, and Signal
Processing, 2014.
- K. Mahkonen, J.-K. Kämäräinen, T. Virtanen. Lifelog Scene Change
Detection Using Cascades of Audio and Video Detectors. In
Proc. Asian Conf. on Computer Vision (ACCV) Workshop on Intelligent Mobile and Egocentric Vision, 2014.
- T. Barker, H. Van hamme, T. Virtanen. Modelling Primitive
Streaming of Simple Tone Sequences Through Factorisation of
Modulation Pattern Tensors. In Proc. Interspeech, 2014.
- T. Barker and T. Virtanen. Semi-Supervised Non-Negative Tensor
Factorisation of Modulation Spectrograms for Monaural Speech
Separation. In proc. the 2014 International Joint Conference
on Neural Networks.
- T. Heittola, A. Mesaros, D. Korpi, A. Eronen, and T. Virtanen.
Method for creating location-specific audio textures. EURASIP Journal on Audio, Speech, and Music
Processing, 2014:9.
- T. Virtanen, J. F. Gemmeke, and
B. Raj. Active-Set
Newton Algorithm for Overcomplete Non-Negative Representations of
Audio. IEEE Transactions on Audio, Speech, and Language
Processing, volume 21 issue 11, 2013.
- T. Heittola, A. Mesaros, A. Eronen and
T. Virtanen. Context-dependent
sound event detection. EURASIP Journal on Audio, Speech, and Music Processing 2013, 2013:1
- T. Barker and T. Virtanen. Non-negative Tensor Factorisation of Modulation Spectrograms for Monaural
Sound Source Separation. In proc. Interspeech 2013.
- A. Hurmalainen, J. F. Gemmeke, and T. Virtanen. Modelling Non-stationary Noise with Spectral
Factorisation in Automatic Speech Recognition. Computer Speech and
Language, volume 28, issue 3,
2013. (preprint)
- A. Hurmalainen and T. Virtanen. Learning state labels for sparse
classification of speech with matrix deconvolution. In
proc. Automatic Speech Recognition and Understanding Workshop (ASRU)
2013.
- J. Kauppinen, A. Klapuri, and T. Virtanen. Music Self-Similarity Modeling
Using Augmented Nonnegative Matrix Factorization of Block and
Stripe Patterns. In proc. IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics (WASPAA), 2013.
- A. Diment, P. Rajan, T. Heittola, and
T. Virtanen. Modified
Group Delay Feature for Musical Instrument Recognition. In proc. the 10th International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, 2013.
- A. Diment, T. Heittola and
T. Virtanen.
Semi-supervised learning for musical instrument recognition. In
proc. the 21st European Signal Processing Conference
(EUSIPCO), Marrakech, Morocco, 2013.
- A. Hurmalainen and T. Virtanen. Acquiring Variable Length Speech
Bases for Factorisation-Based Noise Robust Speech Recognition.
In proc. the 21st European Signal Processing Conference
(EUSIPCO), Marrakech, Morocco, 2013.
- Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng,
Haizhou Li. Exemplar-based unit selection for voice conversion utilizing temporal
information. In proc. Interspeech 2013.
- Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng and Haizhou Li.
Exemplar-based Voice Conversion using Non-negative Spectrogram
Deconvolution, in proc. 8th ISCA Speech Synthesis Workshop, 2013.
- F. Briggs, H. Yonghong, R. Raich, K. Eftaxias, L. Zhong,
W. Cukierski, S. Hadley, A. Hadley, M. Betts, X. Fern, J. Irvine,
L. Neal, A. Thomas, G. Fodor, G. Tsoumakas, W. Hong, N. Thi,
H. Huttunen, P. Ruusuvuori, T. Manninen, A. Diment, T. Virtanen,
J. Marzat, J. Defretin, D. Callender, C. Hurlburt, K. Larrey,
M. Milakov. The 9th annual MLSP competition: New methods for
acoustic classification of multiple simultaneous bird species in a
noisy environment. In Proc. IEEE International Workshop on Machine Learning for Signal Processing, Southampton, UK, September 2013.
- K. Mahkonen, A. Eronen, T. Virtanen, E. Helander, V. Popa,
J. Leppänen, and I. D. D. Curcio. Music dereverberation by
spectral
linear prediction in live recordings. In Proc. 16th International
Conference on Digital Audio Effects, Maynooth, Ireland, 2013.
- J. F. Gemmeke, T. Virtanen, and K. Demuynck. Exemplar-based joint channel and noise compensation.
In Proc. International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, 2013.
- T. Heittola, A. Mesaros, T. Virtanen, and M. Gabbouj,
Supervised model
training for overlapping sound events based on unsupervised source separation.
In Proc. International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada, 2013.
- J. F. Gemmeke, T. Virtanen and A. Hurmalainen. HMM-regularization for NMF-based noise robust ASR.
In proc. The 2nd International Workshop on
Machine Listening in Multisource Environments, Vancouver, Canada, 2013.
- J. T. Geiger, F. Weninger, A. Hurmalainen, J. F. Gemmeke, M. Wöllmer, B. Schuller, G. Rigoll, and
T. Virtanen. The TUM+TUT+KUL approach to the 2nd CHiME Challenge: Multi-stream ASR exploiting BLSTM networks and sparse NMF.
In proc. The 2nd International Workshop on
Machine Listening in Multisource Environments, Vancouver, Canada, 2013.
- A. Hurmalainen, J. F. Gemmeke, and T. Virtanen.
Compact Long Context Spectral Factorisation Models for Noise
Robust Recognition of Medium Vocabulary Speech.
In proc. The 2nd International Workshop on
Machine Listening in Multisource Environments, Vancouver, Canada, 2013.
- T. Virtanen, R. Singh,
B. Raj. (eds). Techniques
for Noise Robustness in Automatic Speech Recognition, Wiley, 2012.
- D. Korpi, T. Heittola, T. Partala, A. Eronen, A. Mesaros, and
T. Virtanen.
On the human ability to discriminate audio ambiances
from similar locations of an urban environment. Personal and
Ubiquitous Computing, November
2012. (pdf preprint)
- J. Nikunen, T. Virtanen, M. Vilermo. Multichannel Audio Upmixing
by Time-Frequency Filtering Using Non-Negative Tensor
Factorization. Journal of the Audio Engineering Society, Volume 60, Issue 10, pp. 794-806; October 2012.
- E. Helander, H. Silen, T. Virtanen, M. Gabbouj.
Voice Conversion Using Dynamic Kernel Partial Least Squares
Regression. IEEE Transactions on Audio, Speech and Language
Processing, Volume 20, Issue 3, 2012.
-
R. Saeidi, A. Hurmalainen, T. Virtanen, D.A. van Leeuwen.
Exemplar-based Sparse Representation and Sparse Discrimination for Noise Robust Speaker Identification. In proc.
Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, 2012.
- A. Hurmalainen, R. Saeidi and T. Virtanen. Group Sparsity for
Speaker Identity Discrimination in Factorisation-based Speech
Recognition. In proc. Interspeech, 13th Annual Conference of the
International Speech Communication Association, Portland, USA, 2012.
- A. Hurmalainen, J. Gemmeke and T. Virtanen. Detection, Separation
and Recognition of Speech From Continuous Signals Using Spectral
Factorisation. In proc. 20th European Signal Processing Conference,
Bucharest, Romania. 2012.
-
J. Nikunen, T. Virtanen, P. Pertilä and M. Vilermo. Permutation
Alignment of Frequency-domain ICA by Maximization of Intra-source
Envelope Correlations. In proc. 20th European Signal Processing
Conference, Bucharest, Romania, 2012.
- F. Weninger, M. Wöllmer, J. Geiger, B. Schuller, J. Gemmeke,
A. Hurmalainen, T. Virtanen, and G. Rigoll. Non-Negative Matrix
Factorization for Highly Noise-Robust ASR: to Enhance or to
Recognize? In proc. 37th International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), Kyoto, Japan, 2012.
- A. Hurmalainen and T. Virtanen. Modelling spectro-temporal
dynamics in factorisation-based noise-robust automatic speech
recognition. In proc. 37th International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), Kyoto, Japan, 2012.
- F. Mazhar, T. Heittola, T. Virtanen, and J. Holm. Automatic Scoring
of Guitar Chords. In Proc. AES 45th International Conference,
Helsinki, Finland, 2012.
- F. J. Rodriguez-Serrano, J. J. Carabias-Orti, P. Vera-Candeas,
T. Virtanen, and
N. Ruiz-Reyes. Multiple
Instrument Mixtures Source Separation Evaluation Using
Instrument-Dependent NMF Models. In proc. The 10th International
Conference on Latent Variable Analysis and Source Separation,
Israel, 2012. Lecture Notes in Computer Science, 2012, Volume
7191/2012, pp. 380-387.
- Ali Bahrami Rad and Tuomas Virtanen. Phase spectrum prediction of
audio signals,
5th International Symposium on Communications, Control and Signal
Processing Rome, Italy, 2012.
- J. F. Gemmeke, T. Virtanen, and A. Hurmalainen. Exemplar-based
sparse representations for noise robust automatic speech recognition. IEEE
Trans. Audio, Speech and Language Processing, Volume 19, Issue 7, 2011.
- J.J. Carabias-Orti, T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes and
F.J. Canadas-Quesada. Musical Instrument Sound Multi-Excitation Model for
Non-Negative Spectrogram Factorization. IEEE Journal of Selected
Topics in Signal Processing, Volume: 5, Issue: 6, 2011.
- B. Raj, R. Singh, and T. Virtanen. Phoneme-dependent NMF for speech
enhancement in monaural mixtures. 12th Annual Conference of the International Speech
Communication Association, Florence, Italy, 2011.
- K. Mahkonen, A. Hurmalainen, T. Virtanen, and J. Gemmeke. Mapping Sparse
Representation to State Likelihoods in Noise-Robust Automatic Speech
Recognition, 12th Annual Conference of the International Speech
Communication Association, Florence, Italy, 2011.
- H. Kallasjoki, U. Remes, J. F. Gemmeke, T. Virtanen, and K. J. Palomäki.
Uncertainty measures for improving exemplar-based source separation.
12th Annual Conference of the International Speech
Communication Association, Florence, Italy, 2011.
- J. Nikunen, T. Virtanen, and M. Vilermo. Multichannel audio upmixing
based on non-negative tensor factorization representation. IEEE
Workshop on Applications of Signal Processing to Audio and Acoustics,
New Paltz, NY, 2011.
- J. F. Gemmeke, A. Hurmalainen, T. Virtanen, and Yang Sun.
Toward a Practical Implementation of Exemplar-Based Noise Robust
ASR. In proc. the 19th European Signal Processing
Conference 2011, Barcelona, Spain.
- A. Hurmalainen, K. Mahkonen, J. F. Gemmeke, and T. Virtanen.
Exemplar-Based Recognition of Speech in Highly Variable Noise.
International Workshop on Machine Listening in Multisource
Environments, Florence, Italy, 2011.
- T. Heittola, A. Mesaros, T. Virtanen, and A. Eronen.
Sound Event Detection in Multisource Environments Using Source
Separation.
International Workshop on Machine Listening in Multisource
Environments, Florence, Italy, 2011.
- J. F. Gemmeke, T. Virtanen, and A. Hurmalainen. Exemplar-Based Speech
Enhancement and its Application to Noise-Robust Automatic Speech
Recognition.
International Workshop on Machine Listening in Multisource
Environments, Florence, Italy,
2011. Demonstration signals.
-
A. Hurmalainen, J. Gemmeke, and T. Virtanen. Non-negative
matrix deconvolution in noise robust speech recognition, in proc. ICASSP 2011.
-
T. Virtanen, J. Gemmeke, and A. Hurmalainen. State-based
labelling for a sparse representation of speech and its application to
robust speech recognition, in proc. Interspeech 2010.
-
B. Raj, T. Virtanen, S. Chaudhuri, and R. Singh.
Non-negative matrix factorization based compensation of music for
automatic speech recognition, presented in Interspeech 2010.
-
J. Gemmeke and T. Virtanen.
Artificial and online acquired noise dictionaries for noise robust ASR, presented in Interspeech 2010.
-
A. Mesaros and T. Virtanen. Automatic recognition of lyrics in
singing, EURASIP Journal on Audio, Speech and Music Processing, Volume
2010 Article ID 546047. (online
version and pdf)
- A. Klapuri and T. Virtanen, Representing Musical Sounds with an
Interpolating State Model, IEEE Trans. Audio, Speech and Language
Processing, vol 18. no. 3, 2010.
- E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj.
Voice Conversion Using
Partial Least Squares Regression. IEEE Transactions on
Audio, Speech, and Language Processing, 18 (5), 2010.
- M. Helen and T. Virtanen, Audio query by example using similarity
measures between probability density functions of features, EURASIP
Journal on Audio, Speech and Music Processing, Volume 2010, Article ID
179303. (online
version and pdf)
- T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen. Audio
context recognition using audio event histograms, in proc.
2010 European Signal Processing Conference (EUSIPCO-2010)
- A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen. Acoustic
event detection in real life recordings, in proc. 2010 European Signal Processing Conference (EUSIPCO-2010)
- S. Keronen, U. Remes, K. Palomäki, T. Virtanen, and M. Kurimo.
Comparison of Noise Robust Methods in Large Vocabulary Speech
Recognition, in proc.
2010 European Signal Processing Conference (EUSIPCO-2010)
- J. Nikunen and T. Virtanen, Object-Based Audio Coding Using
Non-Negative Matrix Factorization for the Spectrogram
Representation, in proc. 128th Audio Engineering Society
Convention, London, UK, 2010.
- J. F. Gemmeke and T. Virtanen.
Noise robust exemplar-based connected digit recognition,
in proc. of the 35th International Conference on Acoustics, Speech, and
Signal Processing (ICASSP),
Dallas, USA, 2010.
- A. Klapuri, T. Virtanen, and T. Heittola. Sound source separation
in monaural music signals using excitation-filter model and EM
algorithm, in proc. of the 35th International Conference on
Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, 2010.
- A. Mesaros and T. Virtanen, Recognition of phonemes and words in
singing, in proc. of
the 35th International Conference on Acoustics, Speech, and
Signal Processing (ICASSP),
Dallas, USA, 2010.
- J. Nikunen and T. Virtanen, Noise-to-mask ratio minimization by
weighted non-negative matrix factorization, in proc. of
the 35th International Conference on Acoustics, Speech, and
Signal Processing (ICASSP),
Dallas, USA, 2010.
- T. Heittola, A. Klapuri, and T. Virtanen.
Musical Instrument Recognition in Polyphonic Audio Using Source-Filter
Model for Sound Separation, in Proc. 10th Int. Society for
Music Information Retrieval Conf. (ISMIR 2009), Kobe, Japan, 2009. The
paper won the best paper award of the conference.
- T. Virtanen and T. Heittola.
Interpolating Hidden Markov Model and Its Application to
Automatic Instrument Recognition, in proc. ICASSP 2009.
- A. Mesaros and T. Virtanen.
Adaptation of a speech recognizer for singing voice, in EUSIPCO 2009.
- T. Virtanen.
Spectral Covariance in Prior Distributions of Non-Negative Matrix
Factorization Based Speech Separation, in EUSIPCO 2009.
- M. Myllymäki and T. Virtanen.
Non-Stationary Noise Model Compensation in Voice Activity Detection, in EUSIPCO 2009.
- T. Virtanen and A. T. Cemgil.
Mixtures of Gamma Priors for Non-Negative Matrix
Factorization Based Speech Separation, in proc. International Conference on Independent Component Analysis and Signal Separation, 2009.
Copyright Springer-Verlag. The publication is also available at springerlink.com.
- T. Virtanen, A. Mesaros, M. Ryynänen. Combining
Pitch-Based Inference and
Non-Negative Spectrogram Factorization in Separating Vocals from Polyphonic
Music, SAPA 2008.
- T. Virtanen, A. T. Cemgil, and S. J. Godsill. Bayesian
Extensions to Non-negative Matrix Factorisation for Audio Signal
Modelling, ICASSP 2008. This work was carried out at University of Cambridge, Signal Processing and
Communications Laboratory.
- A. Mesaros and T. Virtanen. Automatic
Alignment of Music Audio and Lyrics, DAFX08.
- M. Myllymäki and T. Virtanen. Voice
Activity Detection
in the Presence of Breathing Noise Using Neural Network and Hidden Markov
Model, EUSIPCO
2008.
- M. Ryynänen, T. Virtanen, J. Paulus, and A. Klapuri, Accompaniment
Separation and Karaoke Application Based on Automatic Melody
Transcription, in Proc. 2008 IEEE International Conference on
Multimedia & Expo (ICME'08), Hannover, Germany, June 2008. (demonstrations)
- A. Klapuri and T. Virtanen, Automatic music
transcription, In Handbook of Signal Processing in Acoustics, David
Havelock, Sonoko Kuwano, and Michael Vorlander (Eds.), Springer-Verlag,
2008.
- Virtanen, Tuomas, Monaural Sound Source Separation by Nonnegative Matrix
Factorization with Temporal Continuity and Sparseness Criteria,
IEEE Transactions on Audio, Speech, and Language Processing, vol 15, no. 3, March 2007.
- Virtanen, T., Helen, M.,
Probabilistic Model Based Similarity Measures for Audio Query-by-Example, in proc. WASPAA 2007.
- Mesaros, A., Virtanen, T., Klapuri, A.
Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition
Methods,
International Conference on Music Information Retrieval, Vienna, Austria, 2007.
- Helen, M., Virtanen, T., Query
by Example of Audio signals Using Euclidean Distance Between Gaussian
Mixture Models, in proc. ICASSP 2007. Note: two small
errors in equations (8) - (11) have been corrected. The corrections do
not appear in the ICASSP conference proceedings.
- Helen, M., Virtanen, T.,
A Similarity Measure for Audio Query by Example Based on Perceptual Coding and Compression, in proc. 10th International Conference on Digital Audio Effects (DAFx-07), September 10-15, 2007.
- Virtanen, Tuomas, Monaural
Sound Source Separation by Perceptually Weighted Non-Negative Matrix
Factorization, Technical report, Tampere University of Technology,
Institute of Signal Processing, 2007.
- Virtanen, T., Klapuri, A., Analysis of polyphonic audio using source-filter model and non-negative matrix factorization, in Advances in Models for Acoustic Processing, Neural Information Processing Systems Workshop, 2006 (extended abstract).
- Virtanen, Tuomas, Speech Recognition Using Factorial Hidden Markov Models for Separation in the Feature Space, in proc. Interspeech 2006, Pittsburgh, USA. (demonstrations). This paper achieved the second-best results among the papers presented in the Interspeech 2006 Speech Separation Challenge special session.
- Virtanen, Tuomas. Unsupervised Learning Methods for Source Separation, in "Signal Processing Methods for Music Transcription", eds. Klapuri, A., Davy, M., Springer-Verlag, 2006.
- Helen, M., Virtanen, T., Separation of
Drums
From Polyphonic Music Using Non-Negative Matrix Factorization and Support Vector Machine, in proc. 13th European Signal Processing Conference, Antalya, Turkey, 2005.
(demonstrations)
- Klapuri, A., Virtanen, T., Helen, M., Modeling musical
sounds
with an interpolating state model, in proc. 13th European Signal Processing Conference, Antalya, Turkey, 2005.
- Paulus, J., Virtanen, T., Drum Transcription with Non-negative Spectrogram Factorisation, in proc. 13th European Signal Processing Conference, Antalya, Turkey, 2005.
(demonstrations)
- Virtanen, Tuomas,
Separation of Sound Sources by Convolutive Sparse Coding,
ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, SAPA 2004. (demonstrations)
-
M. Helen, T. Virtanen, Perceptually
Motivated Parametric Representation for Harmonic Sounds for Data Compression Purposes, 6th International conference on Digital Audio Effects (DAFx-03), 2003, London, UK.
- Virtanen, Tuomas,
Algorithm for the separation of harmonic sounds with
time-frequency smoothness constraint, in proc. the 6th
International Conference on Digital Audio Effects (DAFx-03), London, UK.
- Virtanen, Tuomas,
Sound Source Separation Using Sparse Coding with Temporal
Continuity Objective, International Computer Music
Conference, ICMC 2003.
(demonstrations)
- Parviainen, M., Virtanen, T.,
Two-channel separation of
speech
using direction-of-arrival estimation and sinusoids plus transients
modeling, IEEE International Symposium on Intelligent Signal
Processing and Communication Systems, ISPACS 2003.
- Virtanen, T., Klapuri A.,
Separation of Harmonic Sounds Using Linear Models for the Overtone
Series, IEEE International Conference on Acoustics, Speech and
Signal Processing, ICASSP 2002.
(demonstrations)
- Virtanen, Tuomas,
Accurate Sinusoidal Model Analysis and Parameter Reduction
by Fusion of Components, 110th Audio Engineering Society Convention,
Amsterdam, Netherlands 2001.
- A. Klapuri, T. Virtanen, A. Eronen, J. Seppänen.
Automatic
transcription of musical recordings. In proc. Consistent & Reliable
Acoustic Cues Workshop, CRAC-01, Aalborg, Denmark, 2001.
- Virtanen, T., Klapuri A.
Separation of Harmonic Sounds Using Multipitch Analysis and Iterative
Parameter Estimation, Proc. IEEE Workshop on Applications of
Signal Processing to Audio and Acoustics, New Paltz, New York, 2001.
(demonstrations)
- Klapuri, A., Virtanen, T., Holm, J.-M.,
Robust multipitch estimation for the analysis and manipulation of
polyphonic musical signals. In Proc. COST-G6 Conference
on Digital Audio Effects, DAFx-00, Verona, Italy, 2000.
- Sillanpää, J., Klapuri, A., Seppänen, J., Virtanen, T.,
Recognition of acoustic noise mixtures by combined bottom-up and
top-down processing. Proceedings of the European Signal
Processing Conference EUSIPCO, 2000.
- Virtanen, T., Klapuri, A.
Separation of Harmonic Sound Sources Using Sinusoidal Modeling,
IEEE International Conference on Acoustics, Speech and Signal Processing,
ICASSP 2000.
(demonstrations)
- Virtanen, Tuomas,
Sound Source Separation in Monaural Music Signals, PhD thesis, Tampere University of Technology, 2006.
- Virtanen, Tuomas,
Audio Signal Modeling with Sinusoids Plus Noise, MSc
thesis, Tampere University of Technology 2001.
(demonstrations 1,
demonstrations 2)
-
S. Jakob, I. Korhonen, E. Ruokonen, T. Virtanen, A. Kogan,
and J. Takala. Detection of artifacts in monitored trends in intensive
care, Computer Methods and Programs in Biomedicine, 63 (200), 2000.
IEEE-Copyrighted Material:
Personal use of this material is permitted. However, permission to
reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or
redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works, must be obtained from the
IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service
Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331,
USA. Telephone: +Intl. 908-562-3966.
- Tuomas Virtanen, tuomas.virtanen@tuni.fi