[MXNET-1210 ] Gluon Audio - Example#13325
[MXNET-1210 ] Gluon Audio - Example#13325sandeep-krishnamurthy merged 21 commits intoapache:masterfrom
Conversation
|
@mxnet-label-bot add [pr-awaiting-review] |
|
@sandeep-krishnamurthy @roywei Example created. |
roywei
left a comment
There was a problem hiding this comment.
generally looks good, few comments. Really looking forward to seeing this design can be extended on more complex models.
| try: | ||
| import librosa | ||
| except ImportError as e: | ||
| warnings.warn("gluon/contrib/data/audio/datasets.py : librosa dependency could not be resolved or \ |
There was a problem hiding this comment.
change warning message, not in contrib now
example/gluon/urban_sounds/README.md
Outdated
| @@ -0,0 +1,22 @@ | |||
| # Urban Sounds classification in MXNet | |||
| train_csv: str, default None | ||
| train_csv should be populated by the training csv filename | ||
| file_format: str, default '.wav' | ||
| The format of the audio files(.wav, .mp3) |
There was a problem hiding this comment.
mp3 is not supported now, right?
| label = self.items[idx][1] | ||
|
|
||
| if librosa is not None: | ||
| X1, _ = librosa.load(filename, res_type='kaiser_fast') |
There was a problem hiding this comment.
what is 'kaiser_fast' for 'res_type'
| return len(self.items) | ||
|
|
||
|
|
||
| def transform_first(self, fn, lazy=True): |
There was a problem hiding this comment.
why default for lazy is True, but False is passed in super?
Explain the reason and recommendation of choosing default values in doc string or comment
| try: | ||
| import librosa | ||
| except ImportError as e: | ||
| warnings.warn("gluon/contrib/data/audio/transforms.py : librosa dependency could not be resolved or \ |
stu1130
left a comment
There was a problem hiding this comment.
Great work! @gaurav-gireesh
| self._train_csv = train_csv | ||
| if file_format.lower() not in self._exts: | ||
| warnings.warn("format {} not supported currently.".format(file_format)) | ||
| return |
There was a problem hiding this comment.
I think raise is better here
if file_format.lower() not in self._exts:
raise NotImplementedError('format {} not supported currently.".format(file_format)')| for folder in sorted(os.listdir(root)): | ||
| path = os.path.join(root, folder) | ||
| if not os.path.isdir(path): | ||
| warnings.warn('Ignoring %s, which is not a directory.'%path, stacklevel=3) |
There was a problem hiding this comment.
nit: unify warning formatting style: either % or .foramt()
| with open("./synset.txt", "w") as synsets_file: | ||
| for item in self.synsets: | ||
| synsets_file.write(item+os.linesep) | ||
| print("Synsets is generated as synset.txt") |
There was a problem hiding this comment.
nit: one more space between generated and as
| self._label = nd.array(label_tmp) | ||
| for i, _ in enumerate(data_tmp): | ||
| if self._format not in data_tmp[i]: | ||
| self.items.append((data_tmp[i]+self._format, self._label[i])) |
There was a problem hiding this comment.
can we do the self.items.append in line 114 to reduce the # of for-loop?
There was a problem hiding this comment.
Yes. Thanks.
example/gluon/urban_sounds/train.py
Outdated
| def evaluate_accuracy(data_iterator, net): | ||
| """Function to evaluate accuracy of any data iterator passed to it as an argument""" | ||
| acc = mx.metric.Accuracy() | ||
| for _, (data, label) in enumerate(data_iterator): |
There was a problem hiding this comment.
if we don't use the index here why not just use regular for loop?
There was a problem hiding this comment.
Yes! Refactored the code.
example/gluon/urban_sounds/train.py
Outdated
|
|
||
| for e in range(epochs): | ||
| cumulative_loss = 0 | ||
| for _, (data, label) in enumerate(audio_train_loader): |
There was a problem hiding this comment.
Removed the variable.
| y = x.asnumpy() | ||
| else: | ||
| warnings.warn("MFCC - allowed datatypes mx.nd.NDArray and numpy.ndarray") | ||
| return x |
There was a problem hiding this comment.
the return value is not what user expects, raise exception would be better here IMO
There was a problem hiding this comment.
Yes. I agree. However, other frameworks also return the unmodified value in these cases instead of raising error notifying them with a warning. Users can then use some other libraries to extract mfcc. This is the rationale behind this implementation.
|
|
||
| def forward(self, x): | ||
| if isinstance(x, np.ndarray): | ||
| return nd.array(x/self.scale_factor) |
There was a problem hiding this comment.
check self.scale_factor is not zero
There was a problem hiding this comment.
Thanks. Added that test.
| root/drilling/26.wav | ||
| root/dog_barking/42.wav | ||
| OR | ||
| Files(wav) and a csv file that has filename and associated label |
| Parameters | ||
| ---------- | ||
| fn : callable | ||
| A transformer function that takes the first elemtn of a sample |
| max_len : int | ||
| Length to which the array will be padded or trimmed to. | ||
| fill_value: int or float | ||
| If there is a need of padding, what value to padd at the end of the input array |
| sampling_rate: int, default 22050 | ||
| sampling rate of the input audio signal | ||
| num_fft: int, default 2048 | ||
| length of the Fast fourier transform window |
sandeep-krishnamurthy
left a comment
There was a problem hiding this comment.
Thanks for your contributions.
- should this be under folder - example/gluon/audio/urban_sounds ?
- Please run lint checks.
- requirements.txt will be useful.
- prepare_data.py will be useful.
example/gluon/urban_sounds/README.md
Outdated
| To be able to run this example: | ||
|
|
||
| 1. Download the dataset(train.zip, test.zip) required for this example from the location: | ||
| **https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU** |
There was a problem hiding this comment.
nit: Why bold? Should be hyperlinks only?
example/gluon/urban_sounds/README.md
Outdated
| **https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU** | ||
|
|
||
|
|
||
| 2. Extract both the zip archives into the **current directory** - after unzipping you would get 2 new folders namely,\ |
There was a problem hiding this comment.
can we provide simple python script to download and prepare data? something like prepare_dataset.py that users can run as 1st step.
Less the manual step the better.
example/gluon/urban_sounds/README.md
Outdated
|
|
||
| 3. Apache MXNet is installed on the machine. For instructions, go to the link: **https://mxnet.incubator.apache.org/install/** | ||
|
|
||
| 4. Librosa is installed. To install, use the commands |
There was a problem hiding this comment.
Should we add a note to say, against which version of Librosa, this example is tested and working fine.
There was a problem hiding this comment.
Added the version in the README as well as in requirements.txt.
| try: | ||
| import librosa | ||
| except ImportError as e: | ||
| warnings.warn("librosa dependency could not be resolved or \ |
There was a problem hiding this comment.
This will continue execution after printing warning. Is this intended?
There was a problem hiding this comment.
Raise an ImportError here
There was a problem hiding this comment.
Raised an Import error with a warning.
| Attributes | ||
| ---------- | ||
| synsets : list | ||
| List of class names. `synsets[i]` is the name for the integer label `i` |
There was a problem hiding this comment.
Addressed this.
| """ | ||
| self.synsets = [] | ||
| self.items = [] | ||
| if self._train_csv is None: |
There was a problem hiding this comment.
Please add code comments for each of this section or better have separate functions to handled if csv is provided or folder.
There was a problem hiding this comment.
if not self._train_csv ?
There was a problem hiding this comment.
Refactored code to handle the logic in separate functions.
example/gluon/urban_sounds/model.py
Outdated
| net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes) | ||
| net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer | ||
| net.add(gluon.nn.Dense(num_labels)) | ||
| net.collect_params().initialize(mx.init.Normal(1.)) |
There was a problem hiding this comment.
xavier is better? or doesn't matter?
There was a problem hiding this comment.
Changed to Xavier.
| if __name__ == '__main__': | ||
| try: | ||
| import argparse | ||
| parser = argparse.ArgumentParser(description="Urban Sounds clsssification example - MXNet") |
There was a problem hiding this comment.
Thanks. Corrected this.
| pred_dir = args.pred | ||
|
|
||
| except ImportError: | ||
| warnings.warn("Argparse module not installed! passing default arguments.") |
There was a problem hiding this comment.
It would be good, if you can add requirements.txt file in this example folder. Have a step in readme to install pre-requisites using this requirements.txt
There was a problem hiding this comment.
you could also provide a setup script to install all dependencies
There was a problem hiding this comment.
Good point. I have added a requirements.txt file and a step in README for installing pre-requisites.
| return x / self.scale_factor | ||
|
|
||
|
|
||
| class PadTrim(Block): |
There was a problem hiding this comment.
This looks like a generally useful transforms. Post this PR, can you please see if this can be part of gluon transforms?
There was a problem hiding this comment.
Yes. Thanks.
| if skip_header: | ||
| skip_rows = 1 | ||
| else: | ||
| skip_rows = 0 |
There was a problem hiding this comment.
maybe
skip_rows = 0
if skip_header:
skip_rows = 1
?
| try: | ||
| import librosa | ||
| except ImportError as e: | ||
| warnings.warn("librosa dependency could not be resolved or \ |
There was a problem hiding this comment.
Raise an ImportError here
| """ | ||
| self.synsets = [] | ||
| self.items = [] | ||
| if self._train_csv is None: |
There was a problem hiding this comment.
if not self._train_csv ?
| for line in traincsv: | ||
| skipped_rows = skipped_rows + 1 | ||
| if skipped_rows <= skip_rows: | ||
| continue |
There was a problem hiding this comment.
for skipping multiple rows in csv, could you explore https://python-forum.io/Thread-How-to-Loop-CSV-File-Beginning-at-Specific-Row?pid=29676#pid29676 or https://stackoverflow.com/questions/40403971/skip-multiple-rows-in-python ?
There was a problem hiding this comment.
Yes. Good point Vandana. Made use of itertools and csv modules to start read at a particular row.
| def __getitem__(self, idx): | ||
| """Retrieve the item (data, label) stored at idx in items""" | ||
| filename = self.items[idx][0] | ||
| label = self.items[idx][1] |
There was a problem hiding this comment.
filename, label = self.items[idx]
| pred_dir = args.pred | ||
|
|
||
| except ImportError: | ||
| warnings.warn("Argparse module not installed! passing default arguments.") |
There was a problem hiding this comment.
you could also provide a setup script to install all dependencies
example/gluon/urban_sounds/train.py
Outdated
| if args.train: | ||
| training_dir = args.train | ||
| else: | ||
| training_dir = './Train' |
There was a problem hiding this comment.
training_dir = './Train'
...
...
try:
...
...
if args:
if args.train:
training_dir = args.train
similar for the ones below
example/gluon/urban_sounds/train.py
Outdated
| training_dir = './Train' | ||
| training_csv = './train.csv' | ||
| eps = 30 | ||
| batch_sz = 32 |
There was a problem hiding this comment.
move this initialization before try, then it wont be required inside try or except.
There was a problem hiding this comment.
Moved the initialization block above try catch blocks.
|
|
||
| def forward(self, x): | ||
| if isinstance(x, np.ndarray): | ||
| y = x |
There was a problem hiding this comment.
what is the default value for y?
There was a problem hiding this comment.
Since, the argument x could be of numpy.ndarray or mxnet.nd.NDArray types, I do a conversion to numpy types as librosa needs numpy argument. So y is unchanged if it is of type numpy else it is converted into numpy before being passed for MFCC feature extraction.
| specs = librosa.feature.melspectrogram(x, sr=self._sampling_rate,\ | ||
| n_fft=self._num_fft, n_mels=self._num_mels, hop_length=self._hop_length) | ||
| return nd.array(specs) | ||
|
No newline at end of file |
stu1130
left a comment
There was a problem hiding this comment.
some minor changes
The rest LGTM
| fn : callable | ||
| A transformer function that takes the first element of a sample | ||
| as input and returns the transformed element. | ||
| lazy : bool, default True |
There was a problem hiding this comment.
Thanks. corrected this.
| The transformed dataset. | ||
|
|
||
| """ | ||
| return super(AudioFolderDataset, self).transform_first(fn, lazy=False) |
There was a problem hiding this comment.
| return super(AudioFolderDataset, self).transform_first(fn, lazy=False) | |
| return super(AudioFolderDataset, self).transform_first(fn, lazy=lazy) |
There was a problem hiding this comment.
Passing lazy from the arguments now.
example/gluon/urban_sounds/train.py
Outdated
|
|
||
|
|
||
| def train(train_dir=None, train_csv=None, epochs=30, batch_size=32): | ||
| """The function responsible for running the training the model.""" |
There was a problem hiding this comment.
nit: unify the docstring style
| """The function responsible for running the training the model.""" | |
| """Function responsible for running the training the model.""" |
There was a problem hiding this comment.
Corrected this.
example/gluon/urban_sounds/train.py
Outdated
|
|
||
| if e%5 == 0: | ||
| train_accuracy = evaluate_accuracy(audio_train_loader, net) | ||
| print("Epoch %s. Loss: %s Train accuracy : %s " % (e, cumulative_loss/num_examples, train_accuracy)) |
There was a problem hiding this comment.
nit: unify the print style
| print("Epoch %s. Loss: %s Train accuracy : %s " % (e, cumulative_loss/num_examples, train_accuracy)) | |
| print("Epoch {}. Loss: {} Train accuracy : {} ".format(e, cumulative_loss/ num_examples, train_accuracy)) |
kalyc
left a comment
There was a problem hiding this comment.
Added minor comments inline - can you add details about where has this example been tested?
example/gluon/urban_sounds/README.md
Outdated
| The details of the dataset and the link to download it are given below: | ||
|
|
||
|
|
||
| Urban Sounds Dataset: |
| ## Description | ||
| The dataset contains 8732 wav files which are audio samples(<= 4s)) of street sounds like engine_idling, car_horn, children_playing, dog_barking and so on. | ||
| The task is to classify these audio samples into one of the 10 labels. | ||
|
|
There was a problem hiding this comment.
nit: please add list of available labels here as well
example/gluon/urban_sounds/README.md
Outdated
|
|
||
| To be able to run this example: | ||
|
|
||
| 1. `pip install -r ./requirements.txt` |
There was a problem hiding this comment.
Would mention that we need to go back to the directory: cd ../
would just leave it to pip install requirements.txt
| *https://librosa.github.io/librosa/install.html* | ||
|
|
||
| 2. Download the dataset(train.zip, test.zip) required for this example from the location: | ||
| https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU |
There was a problem hiding this comment.
Could you move this to a public S3 bucket instead?
There was a problem hiding this comment.
You can use https://registry.opendata.aws/ this page to onboard your dataset onto a public S3 bucket - I would highly recommend doing so as the 1. Google drive link is external and we don't use it to store data in production
2. S3 buckets are more reliable and the example would not have any issues related to data availability in the future
|
|
||
| 3. Extract both the zip archives into the **current directory** - after unzipping you would get 2 new folders namely,\ | ||
| **Train** and **Test** and two csv files - **train.csv**, **test.csv** | ||
|
|
There was a problem hiding this comment.
please add comment about how the folder structure should look like for more clarity
example/gluon/urban_sounds/train.py
Outdated
| if __name__ == '__main__': | ||
| training_dir = './Train' | ||
| training_csv = './train.csv' | ||
| eps = 30 |
example/gluon/urban_sounds/train.py
Outdated
| training_dir = './Train' | ||
| training_csv = './train.csv' | ||
| eps = 30 | ||
| batch_sz = 32 |
| x = x.asnumpy() | ||
| specs = librosa.feature.melspectrogram(x, sr=self._sampling_rate,\ | ||
| n_fft=self._num_fft, n_mels=self._num_mels, hop_length=self._hop_length) | ||
| return nd.array(specs) |
There was a problem hiding this comment.
add blank line at the end of every python file
| return x | ||
|
|
||
|
|
||
| class MEL(Block): |
There was a problem hiding this comment.
please use more verbose name here - what does MEL stand for?
There was a problem hiding this comment.
The name MEL refers to a MEL scale frequently used with audio data(pitches and frequencies of audio signal). This isnt an abbreviation. And also, the idea was to have similar names as they appear in other framework so that people can intuitively port.
|
|
||
| # coding: utf-8 | ||
| # pylint: disable= arguments-differ | ||
| "Audio transforms." |
roywei
left a comment
There was a problem hiding this comment.
Thanks for the contribution! LGTM!
example/gluon/urban_sounds/README.md
Outdated
| @@ -0,0 +1,101 @@ | |||
| # Urban Sounds classification in MXNet | |||
sandeep-krishnamurthy
left a comment
There was a problem hiding this comment.
Thank you for this example, and patiently responding to all feedback comments.
LGTM.
This will be merged after CI pipeline is Green.
|
@marcoabreu Hi Marco! Can this PR be merged, if the required tests are passing, even if the new test steps are running? Thanks! |
|
Hey, yeah feel free to move ahead. The new statuses are equivalent to the
old big pipeline :)
Am Fr., 30. Nov. 2018, 20:33 hat Gaurav Gireesh <notifications@github.com>
geschrieben:
… @marcoabreu <https://github.com/marcoabreu> Hi Marco! Can this PR be
merged, if the required tests are passing, even if the new test steps are
running? Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#13325 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ARxB6_LLj3mI2tWiothGhS0lxIBZIeueks5u0Yf_gaJpZM4YqOjQ>
.
|
|
@mxnet-label-bot remove [pr-awaiting-merge] |
…ile (#13478) * updated to v1.5.0 * Bumped minor version from 1.4.0 to 1.5.0 on master * added Anirudh as maintainer for R package ... adding something useful and re-trigger PR check * Updated license file for clojure, onnx-tensorrt, gtest, R-package * Get the correct include path in pip package (#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (#13431) * Skip flaky test #13446 (#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (#13463) * [MXNET-1158] JVM Memory Management Documentation (#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (#13414) Update row_sparse tutorial * Add resiliency to onnx export code (#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495) * clarify ops faq regarding docs strings (#13492) * Add graph_compact operator. (#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (#13474) * update github location for sampled_block.py (#13508) Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499) * ONNX export: Logical operators (#12852) * Fix cmake options parsing in dev_menu (#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (#12380)" (#13501) This reverts commit 7541021. * Feature/mkldnn static 2 (#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: #13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN
Description
Contribute an Example for performing a task with Audio data that demonstrates the following abilities:
Note : This example uses AudioFolderDataset and applies transforms to extract features from audio file and does a classification task. Current design of the AudioFolderDataset is below:
https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Comments
Testing