Deep Learning in Automatic Piano Transcription

Automatic Piano Transcription

Automatic piano transcription (APT) is a challenging task. Although many algorithms have been developed, most struggle with difficulties such as overlapping harmonics between simultaneous notes and long, sustained notes whose sound has largely decayed but must still be detected as active. Moreover, a significant proportion of the available labelled data consists of piano performances whose style does not match the music a system is later asked to transcribe. For these reasons, APT still falls far short of human expert performance.

However, recent breakthroughs in deep learning have given rise to new, promising approaches to this problem. This thesis explores the use of convolutional neural networks for automatic music transcription, compared with earlier recurrent models such as RNNs. In particular, we focus on a new convolutional layer, the harmonic layer, designed to exploit a basic acoustic property of music: the harmonics of a musical note. This model is analyzed and improved throughout the thesis. Its performance is compared with that of the Onsets and Frames algorithm and, more generally, with several other recent APT algorithms, both in a qualitative evaluation and in terms of pitch-wise F1 scores.
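The layer itself is not spelled out at this point, but a common way to encode note harmonics in a convolutional layer is to gather features at the log-frequency offsets where the harmonics of each fundamental fall. Below is a minimal PyTorch sketch of this idea, assuming a log-frequency (e.g. CQT) spectrogram input; the class name, parameters, and offset scheme are illustrative assumptions, not the thesis's exact layer.

```python
import math
import torch
import torch.nn as nn

class HarmonicLayer(nn.Module):
    """Sketch of a harmonic layer on a log-frequency spectrogram.

    For each time-frequency bin, features at the offsets of the first
    `n_harmonics` harmonics are stacked as extra channels and mixed with
    a 1x1 convolution. On a log-frequency axis, harmonic k of any note
    sits round(bins_per_octave * log2(k)) bins above its fundamental.
    All names and defaults here are hypothetical.
    """

    def __init__(self, in_ch, out_ch, n_harmonics=6, bins_per_octave=48):
        super().__init__()
        self.offsets = [round(bins_per_octave * math.log2(k))
                        for k in range(1, n_harmonics + 1)]
        self.mix = nn.Conv2d(in_ch * n_harmonics, out_ch, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, freq_bins, time)
        stacked = []
        for off in self.offsets:
            # Shift bins down by `off` so harmonic k aligns with the
            # fundamental; zero out the bins that wrapped around.
            shifted = torch.roll(x, shifts=-off, dims=2)
            if off > 0:
                shifted[:, :, -off:, :] = 0.0
            stacked.append(shifted)
        return self.mix(torch.cat(stacked, dim=1))

# Example: 2 clips, 1 input channel, 352 CQT bins, 128 frames.
y = HarmonicLayer(1, 16)(torch.randn(2, 1, 352, 128))
```

Because the offsets are fixed by the physics of harmonics rather than learned, the layer can relate a fundamental to its overtones with far fewer parameters than a large standard kernel would need.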


To overcome some of the limitations of the Onsets and Frames algorithm, we propose a model that takes as input a score vector in ℝ^(88×1), one entry per piano key, computed by combining all the frames in a piano sequence. Each element of this vector is weighted by the probability that the corresponding key is active, as estimated frame by frame. Each candidate sequence is then evaluated against this score vector and, if it fails the evaluation, it is discarded. The score vector also serves as the training input for a model that generates the next piano sequence, together with a model that maps from the piano music space to that of the other instruments.
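The exact weighting and acceptance test are not formalised above, so the NumPy sketch below shows one plausible reading: each key's score is a probability-weighted summary of its frame-wise activations, and a hypothetical gate discards sequences whose scores are too weak. Function names and thresholds are placeholders, not the thesis's formula.

```python
import numpy as np

def score_vector(frame_probs):
    """frame_probs: (n_frames, 88) array; entry [t, k] is the model's
    probability that piano key k is active in frame t. Each key's score
    is a self-weighted average of its frame evidence, so confidently
    active frames dominate. One plausible reading, not the exact method."""
    weights = frame_probs / (frame_probs.sum(axis=0, keepdims=True) + 1e-8)
    return (frame_probs * weights).sum(axis=0).reshape(88, 1)

def accept(score, min_conf=0.5, min_keys=1):
    """Hypothetical evaluation gate: keep the sequence only if at least
    `min_keys` keys score above `min_conf` (placeholder thresholds)."""
    return int((score >= min_conf).sum()) >= min_keys
```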


To make the multi-instrument generation model work, we combine a MelodyCNN, which learns a mapping from the previous piano sequence into the piano music space, with Conditional HarmonyCNNs, which map from the piano music space to the other instruments. The resulting model generates multi-instrument sequences that sound musically complementary.
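A minimal sketch of how the two parts compose, assuming both operate on fixed-size piano-roll tensors; the tiny stand-in architecture, instrument names, and shapes below are purely illustrative, not the actual MelodyCNN or HarmonyCNN designs.

```python
import torch
import torch.nn as nn

class PianoRollCNN(nn.Module):
    """Stand-in for both MelodyCNN and a Conditional HarmonyCNN: maps one
    piano-roll tensor (batch, 1, 88, n_frames) to another of the same
    shape. The real architectures are not specified here."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, roll):
        return self.net(roll)

melody_cnn = PianoRollCNN()                       # previous piano -> next piano
harmony_cnns = {name: PianoRollCNN()              # assumed instrument names
                for name in ["bass", "strings"]}

prev_piano = torch.rand(1, 1, 88, 64)             # dummy piano-roll segment
next_piano = melody_cnn(prev_piano)               # generate the next piano sequence
accompaniment = {name: net(next_piano)            # condition each instrument on it
                 for name, net in harmony_cnns.items()}
```

Conditioning every HarmonyCNN on the same generated piano sequence is what keeps the instruments harmonically aligned with one another.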

Furthermore, piano transcriptions offer a valuable avenue for pianists to develop their technical skills and musicianship. Many transcriptions present technical challenges that push pianists to hone their proficiency in areas such as finger dexterity, hand independence, and dynamic control. For instance, Liszt’s transcriptions of orchestral works often feature intricate passagework, rapid arpeggios, and wide leaps across the keyboard, demanding precision and agility from the performer. Likewise, jazz transcriptions may require mastery of complex syncopated rhythms, rapid chord changes, and improvisational techniques such as voicing and embellishment.

To improve the performance of this model, we experiment with various combinations of handcrafted features, such as acoustic features, harmonic features, and feature aggregation. A combination of global handcrafted features and deep learning on the piano-roll part yields the best performance, with more than a 10% improvement over either approach alone. This encourages the fusion of deep learning with handcrafted features in future applications.
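One standard way to realise such a fusion is late concatenation: a CNN embedding of the piano roll is joined with a vector of global handcrafted features before the final classifier. The PyTorch sketch below illustrates this under assumed, illustrative sizes; it is not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    """Late fusion of learned and handcrafted features: the CNN embedding
    of the piano roll is concatenated with a vector of global handcrafted
    features (e.g. acoustic/harmonic statistics) before the classifier.
    All sizes are illustrative assumptions."""

    def __init__(self, n_handcrafted=32, n_classes=88):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (batch, 16)
        )
        self.head = nn.Linear(16 + n_handcrafted, n_classes)

    def forward(self, piano_roll, handcrafted):
        deep = self.cnn(piano_roll)                    # learned embedding
        fused = torch.cat([deep, handcrafted], dim=1)  # feature fusion
        return self.head(fused)

# Example: one piano-roll segment plus a 32-dim handcrafted vector.
logits = FusionModel()(torch.rand(1, 1, 88, 64), torch.rand(1, 32))
```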
