Hi all,
Not a musician here, but an AI master's student. I am currently working on my master's project, and I was hoping we could help each other out.
To give you an idea of what I am working on: it can best be described as a sort of dynamic low- and high-pass filter. The purpose of this system is to separate piano recordings by playing hand. Simply put, an audio file of a piano recording goes in, and two audio files come out: one contains the left-hand performance, the other the right-hand performance.
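For the technically curious, here is a toy sketch of the naive, STATIC version of that idea (in Python, assuming the scipy and soundfile libraries, with a crossover frequency I picked arbitrarily). It is not what I am actually building, just an illustration of the input/output shape of the task:

# Toy illustration only: a static low/high-pass split at one fixed
# crossover frequency. The real goal is a dynamic, content-aware split.
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

def naive_hand_split(in_path, crossover_hz=261.6):  # middle C, arbitrary
    audio, sr = sf.read(in_path)
    if audio.ndim > 1:                      # mix down to mono for simplicity
        audio = audio.mean(axis=1)
    sos_lo = butter(8, crossover_hz, btype="lowpass", fs=sr, output="sos")
    sos_hi = butter(8, crossover_hz, btype="highpass", fs=sr, output="sos")
    sf.write("left_hand.wav", sosfiltfilt(sos_lo, audio), sr)   # below crossover
    sf.write("right_hand.wav", sosfiltfilt(sos_hi, audio), sr)  # above crossover

naive_hand_split("boogie_woogie.wav")  # placeholder file name

The whole challenge, of course, is that a fixed crossover breaks down as soon as the two hands overlap in pitch range, which is why the split has to be learned dynamically from the audio itself.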
Since this task (piano hand separation) has not yet been attempted on raw audio, I want to keep it as feasible as possible by focusing on Boogie Woogie first. My assumption is that Boogie Woogie would be the easiest genre to separate by playing hand, due to the strong contrast that already exists between the left- and right-hand parts.
Developing such a system does require data, however; piano recordings in this case. I would need isolated recordings of left-hand and right-hand performances (playing Boogie Woogie). These are essential for evaluation, but also for training a supervised machine learning approach (which basically learns patterns from examples). As you might imagine, such data does not exist yet, so a dataset has to be built.
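To show why the isolated files are so valuable: once both hands exist as separate recordings, each training example can be built by simply summing the two stems, so the model gets the mixture as input and the isolated hands as the answers to learn. A rough sketch, with placeholder file names:

# Hypothetical example of building one training pair from a contribution.
import soundfile as sf

left, sr = sf.read("take01_left_hand.wav")    # placeholder file names
right, _ = sf.read("take01_right_hand.wav")
n = min(len(left), len(right))                # trim to the shorter stem
mixture = left[:n] + right[:n]                # model input: both hands together
targets = (left[:n], right[:n])               # model outputs: the isolated hands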
This is where Wikiloops comes in!
So what exactly would I need? I am very aware that this is a lot to ask, but I am aiming for at least around 50 recordings where the left and right hands are recorded in separate files. The reason I need so many recordings is mainly training: supervised machine learning systems really do require many examples to generalize well from the data they are trained on. The timeline can easily be spread out across three months, though, as I can start with a smaller dataset for prototyping and scale up whenever new data becomes available.
I hope there are members who are willing to lend a helping hand, and not only out of sheer goodwill, because I want to propose something as well. The availability of good data is perhaps one of the greatest challenges in machine learning today, and that includes music processing and blind audio source separation. The positive impact additional data can have on performance can be seen, for instance, in the MUSDB18 4-track music source separation task. MUSDB18 is a dataset of 150 songs, each separated into 4 tracks: vocals, drums, bass, and other. The goal of the task is to separate these tracks as well as possible. Looking at the best-performing systems on this task (https://paperswithcode.com/sota/music-source-separation-on-musdb18), the top two were trained on additional data beyond the 150 tracks of the MUSDB18 dataset itself.
Now, considering that Wikiloops has roughly 151K backing tracks WITH EVERY INSTRUMENT SEPARATED INTO ITS OWN TRACK, this website is, in my opinion, a gold mine waiting to be discovered by AI music researchers. If we manage to build this dataset, I will make sure to credit Wikiloops appropriately in any academic work I publish.
I know that "promising exposure" is a much-mocked cliché for artists nowadays, but the fact is that increased exposure of Wikiloops to the scientific community could lead to more traffic on Wikiloops, and in turn to more paying members such as myself.
I hope that we are able to collaborate, but if that is not possible, no hard feelings either!
Cheers,
Haris