A Karaoke version of a song is a version from which the singer’s voice has been removed. Such a version is generally presented with the lyrics as subtitles, allowing the user to sing along to the rhythm of the “instrumental” piece.
Most of the time, these Karaoke versions are generated (“mastered”) by hand by a sound engineer. Entertainment companies already have large databases of this type of content. However, they cannot cope with the number of songs created every day, especially by amateur musicians, and must focus on the most famous songs. An automatic Karaoke generation tool would therefore give the general public access to a potentially unlimited database of Karaoke. Similarly, streamed content would require a tool that is both automatic and real-time.
Approach
This internship will focus on the automatic generation of subtitles for such content, with the subtitles being synchronized with the music pieces.
A first line of work would be to adapt a state-of-the-art speech-to-text method to the singing voice. A second line of work would be to use the lyrics that are available online as plain text. The task would then be to synchronize and display these lyrics by comparing them with the output of the speech-to-text algorithm.
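The second line of work can be illustrated with a minimal sketch. It assumes that the speech-to-text step returns a list of (word, start time in seconds) pairs, which is an assumption made for illustration and not a format specified in this offer. The function name `align_lyrics` and the use of Python's standard `difflib.SequenceMatcher` for the comparison are likewise illustrative choices, not the method the internship would necessarily adopt.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and drop punctuation so that 'Darkness,' matches 'darkness'."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align_lyrics(lyrics, asr_words):
    """Attach timestamps from a speech-to-text output to reference lyrics.

    lyrics    -- plain-text lyrics (str), e.g. as found online
    asr_words -- hypothetical recognizer output: list of (word, start_seconds)

    Returns a list of (lyric_word, start_seconds or None). A word gets None
    when the recognizer missed it or transcribed it differently.
    """
    lyric_words = lyrics.split()
    reference = [normalize(w) for w in lyric_words]
    hypothesis = [normalize(w) for w, _ in asr_words]

    timed = [None] * len(lyric_words)
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    # Each matching block is a run of identical words in both sequences.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            timed[block.a + k] = asr_words[block.b + k][1]
    return list(zip(lyric_words, timed))


if __name__ == "__main__":
    lyrics = "Hello darkness, my old friend"
    # Made-up recognizer output with one misrecognized word ("all" for "old").
    asr = [("hello", 0.5), ("darkness", 1.1), ("my", 2.0), ("all", 2.4), ("friend", 3.0)]
    for word, start in align_lyrics(lyrics, asr):
        print(word, start)
```

Words left without a timestamp could, for instance, be placed by interpolating between their timed neighbours. A real system would more likely rely on phoneme-level or forced alignment than on exact word matches.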
Who are we looking for?
You are preparing an engineering degree or a master’s degree, or are a PhD student (3-month visit), and you preferably have knowledge of the development and implementation of advanced algorithms for digital audio signal processing. Skills or experience in Natural Language Processing (NLP) or symbolic data processing would be a plus.
In addition, some knowledge of the following fields would be appreciated:
As well as experience in the following areas: