This project generates polyphonic, multi-track (multi-instrument) music using Generative Adversarial Networks (GANs). The model aims to generate 4 bars of coherent multi-track music from scratch for 5 instruments. We also aim to extend the model for Human-AI collaboration, where 4 instrument tracks are generated conditioned on a single human-input track. Check out our docs here
| Category | Technologies |
|---|---|
| Programming Languages | |
| Frameworks | |
| Libraries | |
| Deep Learning Models | |
| Datasets | |
| Tools | |
| Visualization & Analysis | |
The whole MuseGAN model is primarily split into two parts: the Multitrack Model and the Temporal Model.
It is further split into three variants: the Composer, Jamming, and Hybrid models.
It is responsible for creating uniformity across all instrument tracks by using a single generator and a single discriminator.
It is responsible for giving each instrument track its characteristic style by using 5 generator-discriminator pairs, one for each track.
The Hybrid Model merges the Composer and Jamming models into a single model using a global latent vector z and 5 track-dependent vectors z_i.
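As a rough sketch of the Hybrid Model's input (names and dimensions below are our illustrative assumptions, not the repo's actual code), each track's generator receives the shared vector z concatenated with its own track-dependent vector z_i:

```python
import numpy as np

N_TRACKS = 5   # assumed number of instrument tracks
Z_DIM = 32     # assumed latent dimensionality per vector

rng = np.random.default_rng(0)
z = rng.standard_normal(Z_DIM)                # global vector, shared by all tracks
z_i = rng.standard_normal((N_TRACKS, Z_DIM))  # one track-dependent vector per track

# Each track's generator G_i receives [z ; z_i]: the shared part
# encourages inter-track coherence (Composer-style), while the
# private part gives each instrument its own style (Jamming-style).
track_inputs = [np.concatenate([z, z_i[i]]) for i in range(N_TRACKS)]

print(len(track_inputs), track_inputs[0].shape)  # 5 (64,)
```

The design choice is that coherence comes from the shared half of the input, so the 5 generators never need to see each other's outputs.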
This model encodes bar-specific temporal information into the latent vectors. The Temporal Model also has two types:
A Temporal Generator (G_temp) is used when 5 coherent tracks are to be generated from scratch.
If a conditional track is provided as input, a Temporal Encoder is used to encode the temporal characteristics of the human-input track into the latent vectors.
This incorporates both the Temporal Generators and the Bar Generators, and its input consists of a global latent vector z, a global temporal vector z_t, track-dependent latent vectors z_i, and track-dependent temporal vectors z_it.
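Putting the pieces together, the input to the bar generator of track i at bar t is the concatenation of all four vectors. The sketch below is a minimal illustration (function name and dimensions are assumptions, not the repo's exact code):

```python
import numpy as np

N_TRACKS, N_BARS, Z_DIM = 5, 4, 32  # assumed sizes: 5 instruments, 4 bars

rng = np.random.default_rng(0)
z    = rng.standard_normal(Z_DIM)                      # global, time-independent
z_t  = rng.standard_normal((N_BARS, Z_DIM))            # global, per-bar (from G_temp or the Temporal Encoder)
z_i  = rng.standard_normal((N_TRACKS, Z_DIM))          # per-track, time-independent
z_it = rng.standard_normal((N_TRACKS, N_BARS, Z_DIM))  # per-track, per-bar

def bar_generator_input(i, t):
    """Assumed latent input for the bar generator of track i at bar t."""
    return np.concatenate([z, z_t[t], z_i[i], z_it[i, t]])

print(bar_generator_input(0, 0).shape)  # (128,)
```

Each of the 5 tracks at each of the 4 bars thus gets a distinct input vector, while the shared components z and z_t tie the tracks and bars together.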
The LPD-5 Cleansed dataset is a curated version of the original Lakh Pianoroll Dataset (LPD-5), which itself is derived from the Lakh MIDI Dataset (LMD) containing MIDI files from various sources. It consists of over 60,000 multi-track piano-rolls, each aligned to 4/4 time.
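As a hedged illustration of the preprocessing such a pianoroll dataset implies (the shapes, beat resolution, and threshold here are assumptions, not the repo's exact pipeline), a 4/4 piano-roll can be binarized and sliced into 4-bar phrases:

```python
import numpy as np

BEAT_RESOLUTION = 24           # assumed time steps per beat
BAR_LEN = 4 * BEAT_RESOLUTION  # 4/4 time: 96 steps per bar
N_PITCHES, N_TRACKS = 128, 5

# Stand-in for one song: (time_steps, pitches, tracks) with MIDI velocities
rng = np.random.default_rng(0)
song = rng.integers(0, 128, size=(16 * BAR_LEN, N_PITCHES, N_TRACKS))

binary = song > 0  # keep only note on/off, discarding velocity
n_bars = binary.shape[0] // BAR_LEN
phrases = binary[: n_bars * BAR_LEN].reshape(
    n_bars // 4, 4, BAR_LEN, N_PITCHES, N_TRACKS  # (phrases, bars, steps, pitches, tracks)
)
print(phrases.shape)  # (4, 4, 96, 128, 5)
```

Each 4-bar phrase is then one training example for the model described above.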
- Install the dependencies: `pip install -r requirements`
- Go to the folder of the particular version you want to train and download the `.ipynb` file.
- Run the notebook locally or in JupyterLab.
- To access the trained checkpoint for a particular model, check the `README.md` file in that version's folder.
- To access the output audio, check the `Audio` folder under that version's folder.
- Thanks to everyone at CoC and ProjectX for supporting the progress of this project.
- Special shoutout to our mentors Kavya Rambhia and Swayam Shah for their support and guidance throughout.
Made By Pratyush Rao and Yashasvi Choudhary