This project generates polyphonic, multi-track (multi-instrument) music using Generative Adversarial Networks (GANs). The model aims to generate 4 bars of coherent multi-track music from scratch for 5 instruments. We also aim to extend the model for Human-AI collaboration, where 4 instrument tracks are generated conditioned on a single human-input track. Check out our docs here
| Category | Technologies |
|---|---|
| Programming Languages | |
| Frameworks | |
| Libraries | |
| Deep Learning Models | |
| Datasets | |
| Tools | |
| Visualization & Analysis | |
The whole MuseGAN model is primarily split into two parts: the Multitrack Model and the Temporal Model.
It is further split into three variants: the Composer, Jamming, and Hybrid models.
It is responsible for creating uniformity across all instrument tracks by using a single generator and a single discriminator.
It is responsible for giving each instrument track its characteristic style by using 5 generator-discriminator pairs, one for each track.
The Hybrid Model merges the Composer and Jamming models into a single model using a global latent vector z and 5 track-dependent vectors z_i.
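As a rough sketch of the Hybrid Model's input (names and dimensions below are our illustrative assumptions, not the repo's actual code), each track's generator receives the shared vector z concatenated with its own track-dependent vector z_i:

```python
import numpy as np

N_TRACKS = 5   # assumed number of instrument tracks
Z_DIM = 32     # assumed latent dimensionality per vector

rng = np.random.default_rng(0)
z = rng.standard_normal(Z_DIM)                # global vector, shared by all tracks
z_i = rng.standard_normal((N_TRACKS, Z_DIM))  # one track-dependent vector per track

# Each track's generator G_i receives [z ; z_i]: the shared part
# encourages inter-track coherence (Composer-style), while the
# private part gives each instrument its own style (Jamming-style).
track_inputs = [np.concatenate([z, z_i[i]]) for i in range(N_TRACKS)]

print(len(track_inputs), track_inputs[0].shape)  # 5 (64,)
```

The design choice is that coherence comes from the shared half of the input, so the 5 generators never need to see each other's outputs.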
This model encodes bar-specific temporal information into the latent vectors. The Temporal Model also has two types:
A Temporal Generator (G_temp) is used when 5 coherent tracks are to be generated from scratch.
If a conditional track is provided as input, a Temporal Encoder is used to encode the temporal characteristics of the human-input track into the latent vectors.
This incorporates both the Temporal Generators and the Bar Generators, and its input consists of a global latent vector z, a global temporal vector z_t, track-dependent latent vectors z_i, and track-dependent temporal vectors z_it.
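Putting the pieces together, the input to the bar generator of track i at bar t is the concatenation of all four vectors. The sketch below is a minimal illustration (function name and dimensions are assumptions, not the repo's exact code):

```python
import numpy as np

N_TRACKS, N_BARS, Z_DIM = 5, 4, 32  # assumed sizes: 5 instruments, 4 bars

rng = np.random.default_rng(0)
z    = rng.standard_normal(Z_DIM)                      # global, time-independent
z_t  = rng.standard_normal((N_BARS, Z_DIM))            # global, per-bar (from G_temp or the Temporal Encoder)
z_i  = rng.standard_normal((N_TRACKS, Z_DIM))          # per-track, time-independent
z_it = rng.standard_normal((N_TRACKS, N_BARS, Z_DIM))  # per-track, per-bar

def bar_generator_input(i, t):
    """Assumed latent input for the bar generator of track i at bar t."""
    return np.concatenate([z, z_t[t], z_i[i], z_it[i, t]])

print(bar_generator_input(0, 0).shape)  # (128,)
```

Each of the 5 tracks at each of the 4 bars thus gets a distinct input vector, while the shared components z and z_t tie the tracks and bars together.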
The LPD-5 Cleansed dataset is a curated version of the original Lakh Pianoroll Dataset (LPD-5), which itself is derived from the Lakh MIDI Dataset (LMD) containing MIDI files from various sources. It consists of over 60,000 multi-track piano-rolls, each aligned to 4/4 time.
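As a hedged illustration of the preprocessing such a pianoroll dataset implies (the shapes, beat resolution, and threshold here are assumptions, not the repo's exact pipeline), a 4/4 piano-roll can be binarized and sliced into 4-bar phrases:

```python
import numpy as np

BEAT_RESOLUTION = 24           # assumed time steps per beat
BAR_LEN = 4 * BEAT_RESOLUTION  # 4/4 time: 96 steps per bar
N_PITCHES, N_TRACKS = 128, 5

# Stand-in for one song: (time_steps, pitches, tracks) with MIDI velocities
rng = np.random.default_rng(0)
song = rng.integers(0, 128, size=(16 * BAR_LEN, N_PITCHES, N_TRACKS))

binary = song > 0  # keep only note on/off, discarding velocity
n_bars = binary.shape[0] // BAR_LEN
phrases = binary[: n_bars * BAR_LEN].reshape(
    n_bars // 4, 4, BAR_LEN, N_PITCHES, N_TRACKS  # (phrases, bars, steps, pitches, tracks)
)
print(phrases.shape)  # (4, 4, 96, 128, 5)
```

Each 4-bar phrase is then one training example for the model described above.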
- Install the dependencies: `pip install -r requirements`
- Go to the folder of the particular version you want to train and download the `.ipynb` file.
- Run the notebook locally or in JupyterLab.
- To access the trained checkpoint for a particular model, check the `README.md` file in that version's folder.
- To access the output audio, check the `Audio` folder under that version's folder.
- Thanks to everyone at CoC and ProjectX for supporting the progress of this project.
- Special shoutout to our mentors Kavya Rambhia and Swayam Shah for their support and guidance throughout.
Made By Pratyush Rao and Yashasvi Choudhary