Machine learning-generated audio and live performance
2019
— — —
For Recurrent Legumes, I trained a SampleRNN on the audio files from a previous live performance set, Legumes. SampleRNN is a type of recurrent neural network that operates directly on the individual samples of the audio files, learning a probability distribution over them from which it can generate new audio. Legumes is a set of music performed at a number of venues over the past year; it comprises 6 tracks and approximately 20 minutes of experimental electronic sound.
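For anyone curious what "learning a probability distribution over samples" looks like in code, here is a minimal sketch in PyTorch. It is a toy, single-tier model assuming 8-bit quantised audio, not the hierarchical SampleRNN architecture or the Unisound code I actually used.

```python
# Toy sketch of sample-level autoregressive modelling (not the real SampleRNN):
# an RNN predicts a categorical distribution over the next quantised audio
# sample given all the samples before it.
import torch
import torch.nn as nn

class TinySampleLevelRNN(nn.Module):
    def __init__(self, n_levels=256, embed_dim=64, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(n_levels, embed_dim)   # quantised samples -> vectors
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_levels)       # logits over the next sample value

    def forward(self, samples, hidden=None):
        # samples: (batch, time) tensor of integer sample values in [0, n_levels)
        x = self.embed(samples)
        h, hidden = self.gru(x, hidden)
        return self.out(h), hidden                       # logits: (batch, time, n_levels)

# Training minimises cross-entropy between the predicted distribution
# and the actual next sample in the recording.
model = TinySampleLevelRNN()
batch = torch.randint(0, 256, (4, 1024))                 # stand-in for quantised audio
logits, _ = model(batch[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), batch[:, 1:].reshape(-1))
```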
The network trained for ~3 days of real time on an NVIDIA GTX 980 Ti graphics card. I tried a number of other architectures but wasn't able to get them working, either because of memory constraints or because I couldn't reproduce their training and audio generation.
The network generated 781 2-second samples during training. These showed its ability to reproduce textures and micro-changes in sound, e.g. sharp impulses such as drum beats. After training I generated another 22 10-second samples, and a further 55 2-minute samples, to test for larger structures. The larger structures did not work as well as I had hoped for this architecture; I believe this is largely because memory constraints forced me to reduce the amount of recurrence. The network often tends towards noise, or towards a particular texture of short, grinding sounds that appeared only briefly in the training data. The structures it produces are strange and quite different from anything I would have composed. I'm also particularly interested in how, because the network learns at the level of individual samples, it largely reproduces texture rather than individual instrumentation. This results in sounds that seem to be simultaneously drum patterns and extended synthesiser tones, or constant noise that merges with more distinct individual sounds. As such it produces a number of rich textural variations, albeit within a limited range.
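To give a sense of how clips of different lengths come about, here is a rough sketch of autoregressive generation using the toy model above: one sample is drawn at a time from the predicted distribution and fed back in. The 16 kHz sample rate and seed value are assumptions for illustration, not details of my setup.

```python
# Rough sketch of generation with the toy model above: a 10-second clip at
# 16 kHz means 160,000 samples drawn one after another.
import torch

@torch.no_grad()
def generate(model, seconds, sample_rate=16000, seed_value=128):
    generated = [seed_value]                      # start from an arbitrary mid-level sample
    hidden = None
    current = torch.tensor([[seed_value]])
    for _ in range(seconds * sample_rate - 1):
        logits, hidden = model(current, hidden)   # carry the RNN state between steps
        probs = torch.softmax(logits[:, -1], dim=-1)
        current = torch.multinomial(probs, 1)     # draw the next quantised sample
        generated.append(current.item())
    return generated                              # quantised audio values, e.g. 8-bit

clip = generate(model, seconds=10)
```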
After generating this audio, I did some cropping and re-organising of the sounds into what I considered an interesting structure. I'm particularly fascinated by how the material develops over the course of training, so I chose parts of the generated audio from across the training run that highlighted different ways of texturally developing recognisable sections. The audio from the initial performance is uploaded here.
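For illustration only, this kind of cropping and re-ordering could be scripted with something like pydub; the file names and timings below are hypothetical and not my actual edit.

```python
# Hypothetical example of cropping clips from different points in training
# and stitching them into one structure using pydub.
from pydub import AudioSegment

early = AudioSegment.from_wav("generated_early_training.wav")
late = AudioSegment.from_wav("generated_late_training.wav")

# pydub slices by milliseconds; crossfade the two sections together
section_a = early[5_000:35_000]
section_b = late[60_000:120_000]
arranged = section_a.append(section_b, crossfade=500)

arranged.export("recurrent_legumes_excerpt.wav", format="wav")
```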
Thanks to Unisound for the GitHub code. Further experiments using more recent architectures to follow.
—
Performed at Petersham Bowling Club, 14/07/2019
—