From Programs to Interpretable Deep Models and Back


An Interpretable Deep Learning Model for Automatic Sound Classification

1 Music Technology Group, Universitat Pompeu Fabra, 08018 Barcelona, Spain
2 Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay
* Author to whom correspondence should be addressed.

Academic Editors: Chiman Kwan, Alexander Lerch and Peter Knees

Received: 27 February 2021 / Revised: 29 March 2021 / Accepted: 31 March 2021 / Published: 2 April 2021

Abstract

Deep learning models have improved cutting-edge technologies in many research areas, but their black-box structure makes it difficult to understand their inner workings and the rationale behind their predictions. This may lead to unintended effects, such as susceptibility to adversarial attacks or the reinforcement of biases. Despite the increasing interest in developing deep learning models that provide explanations of their decisions, there is still a lack of research in the audio domain. To reduce this gap, we propose a novel interpretable deep learning model for automatic sound classification, which explains its predictions based on the similarity of the input to a set of learned prototypes in a latent space. We leverage domain knowledge by designing a frequency-dependent similarity measure and by considering different time-frequency resolutions in the feature space. The proposed model achieves results that are comparable to those of state-of-the-art methods in three different sound classification tasks involving speech, music, and environmental audio. In addition, we present two automatic methods to prune the proposed model that exploit its interpretability. Our system is open source and is accompanied by a web application for the manual editing of the model, which allows for a human-in-the-loop debugging approach.
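To make the prototype idea in the abstract concrete, below is a minimal sketch of how such a model could be wired up: an encoder maps the input spectrogram into a latent space, a frequency-dependent weighted distance compares the encoding to a set of learned prototypes, and the resulting similarities both drive the classification and serve as the explanation. This is not the authors' implementation; the encoder, tensor shapes, and names such as `PrototypeClassifier` and `freq_weights` are illustrative assumptions.

```python
# Illustrative sketch (PyTorch) of prototype-based classification with a
# frequency-dependent similarity measure. Not the authors' code: the encoder,
# tensor shapes, and parameter names are assumptions for illustration only.
import torch
import torch.nn as nn


class PrototypeClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, n_prototypes: int, n_classes: int,
                 latent_channels: int, latent_freq_bins: int):
        super().__init__()
        self.encoder = encoder  # e.g., a CNN that pools over time, giving (batch, channels, freq)
        # Prototypes are learned points living in the same latent space as encoded inputs.
        self.prototypes = nn.Parameter(
            torch.randn(n_prototypes, latent_channels, latent_freq_bins))
        # One learned weight per latent frequency bin makes the similarity frequency dependent.
        self.freq_weights = nn.Parameter(torch.ones(latent_freq_bins))
        # Class logits are a learned linear combination of prototype similarities.
        self.classifier = nn.Linear(n_prototypes, n_classes, bias=False)

    def forward(self, x):
        z = self.encoder(x)                        # (batch, channels, freq)
        diff = z.unsqueeze(1) - self.prototypes    # (batch, protos, channels, freq)
        # Frequency-weighted squared Euclidean distance to every prototype.
        dist = (self.freq_weights * diff.pow(2)).sum(dim=(2, 3))
        sim = torch.exp(-dist)                     # larger when the input is close to a prototype
        logits = self.classifier(sim)
        return logits, sim                         # sim doubles as the explanation
```

In a sketch like this, the per-prototype similarities can be shown to the user alongside the prediction, and pruning a prototype amounts to removing one row of the prototype tensor and the corresponding column of the final linear layer.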

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share and Cite

MDPI and ACS Style

Zinemanas, P.; Rocamora, M.; Miron, M.; Font, F.; Serra, X. An Interpretable Deep Learning Model for Automatic Sound Classification. Electronics 2021, 10, 850. https://doi.org/10.3390/electronics10070850

AMA Style

Zinemanas P, Rocamora M, Miron M, Font F, Serra X. An Interpretable Deep Learning Model for Automatic Sound Classification. Electronics. 2021; 10(7):850. https://doi.org/10.3390/electronics10070850

Chicago/Turabian Style

Zinemanas, Pablo, Martín Rocamora, Marius Miron, Frederic Font, and Xavier Serra. 2021. "An Interpretable Deep Learning Model for Automatic Sound Classification" Electronics 10, no. 7: 850. https://doi.org/10.3390/electronics10070850

Author Biographies

Pablo Zinemanas is a PhD student at the Music Technology Group of Universitat Pompeu Fabra, Barcelona. In 2019, he obtained a master's degree in Electrical Engineering from the School of Engineering, Universidad de la República, Uruguay, where he previously worked as a research and teaching assistant. His main research interests include Audio Signal Processing, Interpretable Machine Learning, and Deep Learning.

Martín Rocamora is an Assistant Professor in Signal Processing at Universidad de la República (UDELAR), Uruguay. He has previously worked as a Teaching Assistant in Music Technology at the School of Music, UDELAR. He holds B.Sc., M.Sc., and D.Sc. degrees in Electrical Engineering from the School of Engineering, UDELAR. His research focuses on machine learning and signal processing applied to audio, with applications in machine listening, music information retrieval, and computational musicology.

Marius Miron is a Senior Researcher at Music Technology Group, Pompeu Fabra University, Barcelona. His research areas are audio signal processing and machine learning. He has previously worked as a Post-doctoral researcher for the European Commission on fairness and interpretability in machine learning. He obtained his PhD from Pompeu Fabra University on source separation for orchestral music mixtures.

Frederic Font Corbera is a senior researcher at the Music Technology Group of the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona. In 2015, he obtained a PhD in Sound and Music Computing from Universitat Pompeu Fabra. His current research focuses on the understanding and analysis of large audio collections, including sound characterisation and classification, to improve sound retrieval techniques and, more generally, to facilitate the reuse of large audio collections in creative and scientific contexts. Frederic is the coordinator of the Freesound website and related research and development projects, and has recently coordinated the EU-funded Audio Commons Initiative.

Xavier Serra is a Professor in the Department of Information and Communication Technologies and Director of the Music Technology Group at Universitat Pompeu Fabra in Barcelona. After a multidisciplinary academic education, he obtained a PhD in Computer Music from Stanford University in 1989. His research interests cover the computational analysis, description, and synthesis of sound and music signals, with a balance between basic and applied research. Dr. Serra is very active in the fields of Audio Signal Processing, Sound and Music Computing, Music Information Retrieval, and Computational Musicology at the local and international levels, serving on the editorial boards of a number of journals and conferences and giving lectures on current and future challenges in these fields. He was awarded an Advanced Grant from the European Research Council to carry out the project CompMusic, aimed at promoting multicultural approaches in music information research.

Source: https://www.mdpi.com/2079-9292/10/7/850
