Paper presented at: ATMM 2014, The 3rd International Conference on Audio Technologies for Music and Media, held 12 to 14 November 2014 in Ankara and Istanbul, Turkey.
We compared six sound analysis/synthesis systems used for computer music. Each system analysed the same collection of twenty-seven varied input sounds, and output the results in Sound Description Interchange Format (SDIF). We describe each system individually, then compare the systems in terms of availability, the sound model(s) they use, interpolation models, noise modelling, the mutability of various sound models, the parameters that must be set to perform analysis, and characteristic artefacts. Although we have not directly compared the analysis results among the different systems, our work has made such a comparison possible.
This article brings forward the question of which acoustic features are the most adequate for identifying beats computationally in acoustic music pieces. We consider many different features computed on consecutive short portions of acoustic signal, among which those currently promoted in the literature on beat induction from acoustic signals and several original features, unmentioned in this literature. Evaluation of feature sets regarding their ability to provide reliable cues to the localization of beats is based on a machine learning methodology with a large corpus of beat-annotated music pieces, in audio format, covering distinctive music categories. Confirming common knowledge, energy is shown to be a very relevant cue to beat induction (especially the temporal variation of energy in various frequency bands, with the special relevance of frequency bands below 500 Hz and above 5 kHz). Some of the new features proposed in this paper are shown to outperform features currently prom...
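The band-limited energy-variation feature highlighted above can be sketched as a half-wave-rectified difference of band energy over short frames. This is a minimal illustration of the idea, not the paper's feature set; frame sizes and the toy click signal are our own choices.

```python
import numpy as np

def band_energy_flux(x, sr, band, n_fft=1024, hop=512):
    """Half-wave-rectified temporal difference of energy in one frequency band.

    Sketch of the kind of beat cue evaluated above: the variation of
    energy in a sub-band (e.g. below 500 Hz) over consecutive frames.
    """
    lo, hi = band
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    mask = (freqs >= lo) & (freqs < hi)
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop:i * hop + n_fft] * window
        spec = np.abs(np.fft.rfft(frame)) ** 2
        energy[i] = spec[mask].sum()
    # only positive energy changes hint at onsets/beats
    return np.maximum(np.diff(energy), 0.0)

# toy signal: a click every 0.5 s over silence
sr = 8000
x = np.zeros(sr * 2)
x[::sr // 2] = 1.0
flux = band_energy_flux(x, sr, band=(0, 500))
```

Peaks of `flux` line up with the frames containing the clicks; a beat-induction front end would feed such per-band flux curves to the learner.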
In this paper a method for extracting semantic information from online music discussion forums is proposed. The semantic relations are inferred from the co-occurrence of musical concepts in forum posts, using network analysis. The method starts by defining a dictionary of common music terms in an art music tradition. Then, it creates a complex network representation of the online forum by matching such dictionary against the forum posts. Once the complex network is built we can study different network measures, ...
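The core of the network construction step can be sketched as pairwise term counting: each post contributes one co-occurrence for every pair of matched dictionary terms, and the counts become weighted edges. The tiny dictionary and posts below are invented for illustration; the paper's dictionary is far larger.

```python
from collections import Counter
from itertools import combinations

# hypothetical dictionary of music terms (the paper builds a much larger one)
DICTIONARY = {"raga", "tala", "tanpura", "alap", "gharana"}

def cooccurrence_edges(posts):
    """Count how often two dictionary terms appear in the same forum post."""
    edges = Counter()
    for post in posts:
        tokens = (w.strip(".,?!") for w in post.lower().split())
        terms = sorted({w for w in tokens if w in DICTIONARY})
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges

posts = [
    "The alap in this raga recording is stunning",
    "Which tala goes with this raga?",
    "raga and tala again, plus a tanpura drone",
]
edges = cooccurrence_edges(posts)
```

With weighted edges in hand, standard complex-network measures (degree, clustering, community structure) can be computed over the term graph.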
Intonation is a fundamental music concept that has a special relevance in Indian art music. It is characteristic of the rāga and intrinsic to the musical expression of the performer. Describing intonation is of importance to several information retrieval tasks like the development of rāga and artist similarity measures. In our previous work, we proposed a compact representation of intonation based on the parametrization of the pitch histogram of a performance and demonstrated the usefulness of this representation through an ...
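The pitch-histogram representation mentioned above can be sketched as follows: convert pitch values to cents relative to the tonic, fold into one octave, and bin; the peak positions and heights then parametrize intonation. Names and the toy pitch track are ours, assuming a tonic of 220 Hz.

```python
import numpy as np

def folded_pitch_histogram(f0_hz, tonic_hz, n_bins=120):
    """Octave-folded pitch histogram in cents relative to the tonic.

    A sketch of the representation described above: unvoiced frames are
    dropped, pitches converted to cents, folded into one octave, and binned.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]                      # drop unvoiced frames
    cents = 1200.0 * np.log2(f0 / tonic_hz)
    cents = np.mod(cents, 1200.0)        # fold into one octave
    hist, edges = np.histogram(cents, bins=n_bins, range=(0, 1200))
    return hist, edges

# toy pitch track: tonic (Sa) at 220 Hz and a fifth (Pa) at 330 Hz
f0 = [220.0] * 50 + [330.0] * 30 + [0.0] * 10
hist, edges = folded_pitch_histogram(f0, tonic_hz=220.0)
```

In the paper's setting, each histogram peak is further parametrized (position, amplitude, width) to yield the compact intonation descriptor.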
In this paper we propose a new approach for tonic identification in Indian art music and present a proposal for a complete iterative system for the same. Our method splits the task of tonic pitch identification into two stages. In the first stage, which is applicable to both vocal and instrumental music, we perform a multi-pitch analysis of the audio signal to identify the tonic pitch-class. Multi-pitch analysis allows us to take advantage of the drone sound, which constantly reinforces the tonic. In the second stage we estimate the octave in ...
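The first stage described above can be loosely sketched as picking the most salient pitch class from multi-pitch estimates: because the drone constantly sounds the tonic, its pitch class tends to dominate a pitch-class histogram. The real system performs actual multi-pitch analysis; here a plain list of frequencies stands in for it.

```python
import numpy as np

def tonic_pitch_class(pitches_hz, ref_hz=55.0, bins_per_octave=12):
    """Most salient pitch class (semitones above ref_hz's class).

    Loose sketch of the first stage: build a pitch-class histogram from
    multi-pitch values and return the dominant class.
    """
    p = np.asarray(pitches_hz, dtype=float)
    p = p[p > 0]
    pc = np.mod(np.round(bins_per_octave * np.log2(p / ref_hz)),
                bins_per_octave)
    hist = np.bincount(pc.astype(int), minlength=bins_per_octave)
    return int(np.argmax(hist))

# drone at 146.8 Hz (D3) plus a sparser melody
pitches = [146.8] * 40 + [220.0] * 5 + [293.7] * 10
tonic = tonic_pitch_class(pitches)
```

The octave of the tonic is then resolved in a second stage, as the abstract notes.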
Xavier Amatriain, Jordi Bonada, Xavier Serra
Audiovisual Institute, Pompeu Fabra University, Rambla 31, 08002 Barcelona, Spain
{xamat,jboni,xserra}@iua.upf.es
http://www.iua.upf.es
[Published in the Proceedings of the Digital Audio Effects Workshop (DAFX98), 1998]
Sound content description is one of the aims of the MPEG-7 initiative. Although MPEG-7 focuses on indexing and retrieval of audio, there are other sound content-based processing applications waiting to be developed once we have a robust set of descriptors and structures for putting them into relation and for expressing semantic concerns about sound. Spectral Modeling techniques provide a valuable framework for extracting and organizing sound content descriptions. All our descriptors can be considered low- or mid- ...
Expressive performance characterization is traditionally based on the analysis of the main differences between performances, players, playing styles and emotional intentions. This work addresses the characterization of expressive bassoon ornaments by analyzing audio recordings played by a professional bassoonist. This characterization is then used to generate expressive ornaments from symbolic representations by means of Machine Learning.
Recently, we have seen an increased use and support of the Sound Description Interchange Format (SDIF), among which the integration of SDIF in widely used environments such as MAX/MSP (Wright, Dudas, Khoury, Wang, Zicarelli, 1999) and MPEG-4 (Wright, Scheirer, 1999). To follow and encourage this trend, we have added support for importing and exporting SDIF files in the latest version of the SMS applications, a group of applications for spectrum-modeling analysis and synthesis. In this paper we discuss the ...
We introduce two large open data collections of Indian Art Music, covering both its Carnatic and Hindustani traditions, comprising audio from vocal concerts, editorial metadata, and time-aligned melody, rhythm, and structure annotations. Shared under Creative Commons licenses, they currently form the largest annotated data collections available for computational analysis of Indian Art Music. The collections are intended to provide audio and ground truth for several music information research tasks and large-scale data-driven analysis in musicological studies. A part of the Saraga Carnatic collection also has multitrack recordings, making it a valuable collection for research on melody extraction, source separation, automatic mixing, and performance analysis. We describe the tenets and the process of collection, annotation, and organization of the data. We provide easy access to the audio, metadata, and the annotations in the collections through an API, along with a companion website that has...
We present here a pipeline for the automated discovery of repeated motifs in audio. Our approach relies on state-of-the-art source separation, predominant pitch extraction and time series motif detection via the matrix profile. Owing to the appropriateness of this approach for the task of motif recognition in the Carnatic musical style of South India, and with access to the recently released Saraga Dataset of Indian Art Music, we provide an example application on a recording of a performance in the Carnatic rāga Rītigauḷa, finding 56 distinct patterns of varying lengths that occur at least 3 times in the recording. The authors include a discussion of the potential musicological significance of this motif finding approach in relation to the particular tradition and beyond.
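The matrix-profile core of the pipeline above can be sketched brute-force: for every length-m subsequence of a (pitch) time series, compute the z-normalized distance to its nearest non-trivial match. Production systems use optimized algorithms such as STOMP (e.g. via the stumpy library); this O(n²) version only illustrates the idea on a toy contour of our own making.

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive self-join matrix profile (z-normalized Euclidean distances).

    prof[i] is the distance from subsequence i to its nearest match,
    with a trivial-match exclusion zone around i.
    """
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)], dtype=float)
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    z = (subs - mu) / sd
    prof = np.full(n, np.inf)
    excl = m // 2                       # trivial-match exclusion zone
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        prof[i] = d.min()
    return prof

# toy pitch contour with a motif repeated at indices 0 and 9
motif = [0, 2, 4, 2, 0]
ts = motif + [7, 1, 8, 3] + motif + [5]
prof = matrix_profile(np.array(ts, float), m=5)
```

Low values in `prof` mark repeated subsequences; thresholding and grouping them yields the motif candidates the pipeline reports.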
Music synthesis is one of the most essential features of music notation software and applications aimed at navigating digital music score libraries. Currently, the majority of music synthesis tools are designed for Eurogenetic musics, and they are not able to address the culture-specific aspects (such as tuning, intonation and timbre) of many music cultures. In this paper, we focus on the tuning dimension in musical score playback for Turkish Makam Music (TMM). Based on existing computational tuning analysis methodologies, we propose an automatic synthesis methodology, which allows the user to listen to a music score synthesized according to the tuning extracted from an audio recording. As a proof-of-concept, we also present a desktop application, which allows users to listen to playback of TMM music scores according to the theoretical temperament or a user-specified reference recording. The playback of the synthesis using the tuning extracted from the recordings may provide a b...
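The core mapping in such recording-driven playback can be sketched in a few lines: each scale degree's deviation in cents, measured from a reference recording, is converted to a synthesis frequency relative to the tonic. The degree names and cents values below are invented for illustration, not taken from any actual tuning analysis.

```python
def degree_to_hz(tonic_hz, cents_from_tonic):
    """Frequency of a scale degree given its deviation from the tonic in cents.

    Sketch of the mapping in the playback method above: instead of a fixed
    theoretical temperament, each degree's cents value comes from the
    tuning analysis of a reference recording.
    """
    return tonic_hz * 2.0 ** (cents_from_tonic / 1200.0)

# hypothetical measured tuning for a few degrees (cents from the tonic)
measured = {"tonic": 0.0, "second": 190.0, "third": 350.0}
freqs = {name: degree_to_hz(220.0, c) for name, c in measured.items()}
```

A synthesizer then renders each score note at the mapped frequency, so playback follows the recording's tuning rather than an equal-tempered grid.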
On the web, searching for sounds is usually limited to text queries. This requires adding textual descriptions to each audio file, which is indexed effectively as a text document. Recent developments in browser technologies allow developers to access the audio input or microphone of the computer, enabling Query by Example (QbE) applications. We present a demonstration system that allows users to make queries on Freesound.org by recording audio in the browser. A basic prototype is available online.
Ney is an end-blown flute mainly used for Makam music. Although a score representation based on extending Western notation has been in use since the beginning of the 20th century, actual Ney music cannot be fully represented by a written score because of its rich articulation repertoire. Ney is still taught and transmitted orally in Turkey, and for that reason the performance has a distinct and important role in Ney music. Signal analysis of Ney performances is therefore crucial for understanding the actual music. Another important aspect of the performance is the articulations that performers apply. In Makam music in Turkey the articulations are neither taught nor even named by teachers. Articulations on the Ney are valuable for understanding the real performance. Since articulations are not taught and their places are not marked in the score, the choice and character of articulation is unique to each performer, which also makes each performance unique. Our method anal...
DAFX is an established conference that has become a reference gathering for the researchers working on audio signal processing. In this presentation I will go back ten years to the beginning of this conference and to the ideas that promoted it. Then I will jump to the present, to the current context of our research field, different from the one ten years ago, and I will make some personal reflections on the current situation and the challenges that we are encountering.
Paper presented at: International Conference on Digital Audio Effects (DAFx), held 8 to 12 September 2020 in Vienna, Austria.
The current research in Music Information Retrieval (MIR) is showing the potential that information technologies can have in music-related applications. A major research challenge in that direction is how to automatically describe/annotate audio recordings and how to use the resulting descriptions to discover and appreciate music in new ways. But music is a complex phenomenon and the description of an audio recording has to deal with this complexity. For example, each music culture has specificities and emphasizes different musical and communication aspects, thus the musical recordings of each culture should be described differently. At the same time these cultural specificities give us the opportunity to pay attention to musical concepts and facets that, despite being present in most world musics, are not easily noticed by listeners. In this paper we present some of the work done in the CompMusic project, including ideas and specific examples on how to take advantage of the cul...
In this paper, we propose an efficient and reproducible deep learning model for musical onset detection (MOD). We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures. Taking the above issues into account, we experiment with seven deep learning architectures. The most efficient one achieves equivalent performance to our implementation of the state-of-the-art architecture. However, it has only 28.3% of the total number of trainable parameters compared to the state-of-the-art. Our experiments are conducted using two different datasets: one mainly consists of instrumental music excerpts, and another developed by ourselves includes only solo singing voice excerpts. Further, inter-dataset transfer learning experiments are conducted...
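The parameter-count comparison central to point (iii) can be sketched with the standard layer formulas. The layer sizes below are invented for illustration and are not the paper's architectures; they only show how a slimmer network can have a small fraction of the trainable parameters.

```python
def conv2d_params(c_in, c_out, kh, kw, bias=True):
    """Trainable parameters of a 2-D convolution layer."""
    return c_out * (c_in * kh * kw + (1 if bias else 0))

def dense_params(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer."""
    return n_out * (n_in + (1 if bias else 0))

# hypothetical wide vs slim onset-detection networks (sizes are ours)
wide = (conv2d_params(1, 32, 3, 3) + conv2d_params(32, 64, 3, 3)
        + dense_params(64 * 10, 256) + dense_params(256, 1))
slim = (conv2d_params(1, 8, 3, 3) + conv2d_params(8, 16, 3, 3)
        + dense_params(16 * 10, 64) + dense_params(64, 1))
ratio = slim / wide
```

Comparing architectures at matched capacity, as the paper argues, means reporting such counts alongside the detection scores.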
Paper presented at: First International Workshop on Semantic Music and Media (SMAM), held in Sydney, Australia, on 21 October 2013.
The Spotify Sequential Skip Prediction Challenge focuses on predicting if a track in a session will be skipped by the user or not. In this paper, we describe our approach to this problem and the final system that was submitted to the challenge by our team from the Music Technology Group (MTG) under the name "aferraro". This system consists of combining the predictions of multiple boosting trees models trained with features extracted from the sessions and the tracks. The proposed approach achieves good overall performance (MAA of 0.554), with our model ranked 14th out of more than 600 submissions in the final leaderboard.
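A single member of such a boosting-trees ensemble can be sketched with scikit-learn on synthetic data. The feature names and the toy skip rule below are invented stand-ins for the session/track features described above; the actual system combines several models over far richer features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# synthetic stand-in for session/track features (names are ours)
rng = np.random.default_rng(0)
n = 400
position_in_session = rng.integers(1, 21, n)   # later tracks skipped more
track_popularity = rng.random(n)
X = np.column_stack([position_in_session, track_popularity])
# toy rule: skip probability grows with position, drops with popularity
y = (position_in_session / 20 - track_popularity
     + rng.normal(0, 0.2, n) > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=50, max_depth=3,
                                   random_state=0)
model.fit(X, y)
accuracy = model.score(X, y)
```

The challenge system averages the predictions of several such models; the evaluation metric (Mean Average Accuracy) additionally weights correct predictions by their position in the session.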