At EPFL, the Montreux Jazz Festival recordings are regarded as an innovation platform: a research infrastructure that can be exploited in many fields of study, in particular digital signal processing, data analysis, and musicology, where recent advances enable the automatic recognition of musical structures and their linkage through metadata. However, researchers' access to the data must be strictly controlled because of the authors' rights, and these copyright restrictions hinder Open Science.
Through this project, the EPFL Open Science Fund allowed us to implement the first version of a secure IT infrastructure that will significantly increase the open use of this unique dataset across research communities worldwide. Designed in collaboration with the Swiss Data Science Center (SDSC) and the EPFL Digital and Cognitive Musicology Lab (DCML), the new platform lets the machine learning partners of the Montreux Jazz Digital Project remotely execute algorithms on large Montreux Jazz datasets without direct access to the audio/video media files. In parallel, a small, authorized subset of the data was prepared for public sharing; it can be used to tune algorithms, or to showcase and explain results to researchers, musicologists, and, more broadly, music lovers. The platform thus gives researchers a secure way to use the archives while respecting the authors' rights. It also makes it possible to monitor and compare the performance and versions of the algorithms developed by the partners.
Much of the project's scientific development was contributed by the Swiss Data Science Center, which for the first time combined its monitoring and versioning tool Renku with the Swiss Data Custodian, which controls data use and ownership. The platform is made up of the following modules:
- Front-end: lets users specify the data and the algorithms, and launch the execution
- Swiss Data Custodian: manages user access and permissions
- Renku: manages algorithm versioning and monitoring
- Data backend: runs queries on the data corpus and pre-computes the selected data
- Compute backend: executes algorithms on the data returned by the data backend
- Core backend: manages communication and synchronization between the modules
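The division of labour among these modules can be illustrated with a minimal Python sketch. It assumes a simplified model in which each protected recording is represented by a pre-computed feature vector; all class names, method names, dataset names, and recording IDs are hypothetical and do not reflect the actual interfaces of the Swiss Data Custodian, Renku, or the platform's backends. The front-end is reduced to a direct call to `submit`. What the sketch shows is the access pattern: the core backend checks permissions, passes the queried data to the compute backend, logs the run, and returns only the algorithm's results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical model: a dataset maps recording IDs to pre-computed feature vectors.
Features = List[float]
Dataset = Dict[str, Features]
Algorithm = Callable[[Features], float]


@dataclass
class Custodian:
    """Hypothetical stand-in for the Swiss Data Custodian: which user may use which dataset."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, user: str, dataset: str) -> bool:
        return dataset in self.grants.get(user, set())


@dataclass
class DataBackend:
    """Hypothetical stand-in for the data backend: holds the protected corpus."""
    datasets: Dict[str, Dataset]

    def query(self, name: str) -> Dataset:
        return self.datasets[name]


class ComputeBackend:
    """Hypothetical stand-in for the compute backend: applies an algorithm per recording."""

    def run(self, algorithm: Algorithm, data: Dataset) -> Dict[str, float]:
        return {rid: algorithm(features) for rid, features in data.items()}


@dataclass
class RunLog:
    """Hypothetical stand-in for Renku's role: records which algorithm version ran on which data."""
    entries: List[dict] = field(default_factory=list)

    def record(self, **entry) -> None:
        self.entries.append(entry)


@dataclass
class CoreBackend:
    """Orchestrates the modules; the caller only ever receives derived results."""
    custodian: Custodian
    data: DataBackend
    compute: ComputeBackend
    log: RunLog

    def submit(self, user: str, dataset: str, algorithm: Algorithm, version: str) -> Dict[str, float]:
        # Permission check happens before any data is touched.
        if not self.custodian.is_allowed(user, dataset):
            raise PermissionError(f"{user} may not use {dataset}")
        results = self.compute.run(algorithm, self.data.query(dataset))
        # Log the run so algorithm versions can be compared later.
        self.log.record(user=user, dataset=dataset, algorithm=algorithm.__name__, version=version)
        return results  # the queried features themselves are never returned


def mean_feature(features: Features) -> float:
    # Hypothetical partner algorithm: average of a recording's feature values.
    return sum(features) / len(features)


core = CoreBackend(
    custodian=Custodian(grants={"alice": {"full-archive"}}),
    data=DataBackend(datasets={"full-archive": {"rec-001": [0.25, 0.75], "rec-002": [1.0, 3.0]}}),
    compute=ComputeBackend(),
    log=RunLog(),
)
print(core.submit("alice", "full-archive", mean_feature, version="v1"))
# → {'rec-001': 0.5, 'rec-002': 2.0}
```

In this sketch, returning per-recording results rather than the queried data is what lets a partner run code on the protected corpus without ever handling it directly, while the run log gives a basis for comparing algorithm versions.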
The platform benefits from a modern user-interface design by Coteries, a start-up based at the EPFL Innovation Park.
By providing state-of-the-art algorithms to analyze and visualize musical parameters, the musicology researchers contributed significantly to the development and evaluation of the platform. Based on the Keyscape algorithm, Kirell Benzi developed an example visualization to help the public understand the platform and to promote it.
Several DCML partner institutions, such as Queen Mary University of London (Center for Digital Musicology), Durham University (Music Cognition), and the Fraunhofer Institute (Digital Media Technology), will now start using the infrastructure during a six-month testing phase. In the next stage of development, the platform will be improved to make it easier to select audio/video samples based on metadata. It will then be extended to any type of machine learning research, on both audio and video.
More information: EPFL Open Science Fund project presentation.