Conversation
zhreshold
left a comment
So it will support only 3D to 5D? What's the limitation here?
looks good
There is a bug when the input is 2-dimensional.
There seems to be a bug in SyncBatchNorm when spatial_shape is 1x1, 1xn, or nx1. I am checking it.
It seems that the bug has been addressed, although I do not know the specific reason yet. I will add a test for multi-output.
@zhreshold @szha Hi! I have updated the PR and added adequate unit tests.
If @szha has no complaints, I can merge it in 24 hours.
Thanks @wkcn, this is merged!
* support SyncBatchNorm5D
* fix
* update testcase and reformat code
* retrigger CI
* update test case
* test
* Retrigger CI
* disable cudnn for batchnorm
* fix BatchNorm(cudnn)
* fix build
* Remove a testcase
* Update sync_batch_norm-inl.h
* update unittest
* update unittest
* update test
* fix test
* change atol and rtol
* BN(cudnn) 5d
* update test
* test
* Testing
* Update batch_norm.cu
* test cudnnoff
* Update test_operator.py
* update BN! : )
input2grad.asnumpy(), atol=atol, rtol=rtol)

cfgs = [(1, False)]
num_gpus = mx.context.num_gpus()
This line requires a GPU when CUDA is installed; otherwise it throws this error:
======================================================================
ERROR: test_gluon.test_sync_batchnorm
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/travis/build/dmlc/mxnet-distro/mxnet-build/tests/python/unittest/common.py", line 177, in test_new
orig_test(*args, **kwargs)
File "/home/travis/build/dmlc/mxnet-distro/mxnet-build/tests/python/unittest/test_gluon.py", line 693, in test_sync_batchnorm
num_gpus = mx.context.num_gpus()
File "/home/travis/build/dmlc/mxnet-distro/mxnet/context.py", line 258, in num_gpus
check_call(_LIB.MXGetGPUCount(ctypes.byref(count)))
File "/home/travis/build/dmlc/mxnet-distro/mxnet/base.py", line 254, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [11:47:54] include/mxnet/base.h:427: Check failed: e == cudaSuccess (30 vs. 0) : CUDA: unknown error
Stack trace:
[bt] (0) /home/travis/build/dmlc/mxnet-distro/mxnet/libmxnet.so(+0x4b60fb) [0x7f8d608830fb]
[bt] (1) /home/travis/build/dmlc/mxnet-distro/mxnet/libmxnet.so(+0x2440eec) [0x7f8d6280deec]
[bt] (2) /home/travis/build/dmlc/mxnet-distro/mxnet/libmxnet.so(MXGetGPUCount+0x19) [0x7f8d6280df79]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f8d9a2e1c7c]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f8d9a2e15ac]
[bt] (5) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f8d9a4f85fe]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f8d9a4f9f9e]
[bt] (7) /usr/bin/python(PyEval_EvalFrameEx+0x965) [0x4c84a5]
[bt] (8) /usr/bin/python(PyEval_EvalCodeEx+0x2ac) [0x4cfedc]
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1179889124 to reproduce.
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Can you please move this test to tests/python/gpu/test_gluon_contrib_gpu.py? @wkcn @zhreshold
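Besides moving the test to the GPU suite, the failure above could also be avoided by guarding the GPU-count query itself. The following is a hypothetical sketch (not code from this PR): a wrapper that falls back to 0 GPUs when the query raises, so a CPU-only machine with CUDA installed skips the GPU cases instead of erroring out. The `safe_num_gpus` helper and `failing_query` stand-in are both illustrative names; with MXNet the real call would be `mx.context.num_gpus`.

```python
def safe_num_gpus(query):
    """Return query(), or 0 if the query raises (e.g. 'CUDA: unknown error')."""
    try:
        return query()
    except Exception:
        return 0

# Simulate the CI machine where the CUDA runtime errors out instead of
# reporting zero devices; with MXNet this would be
# num_gpus = safe_num_gpus(mx.context.num_gpus).
def failing_query():
    raise RuntimeError("CUDA: unknown error")

num_gpus = safe_num_gpus(failing_query)
print(num_gpus)  # 0, so GPU-only test configurations can be skipped
```

With this guard the test degrades to its CPU-only configurations rather than failing the whole suite, though keeping GPU tests under tests/python/gpu/ is still the cleaner separation.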
I don't know why an unknown CUDA error was raised.
https://github.com/apache/incubator-mxnet/blob/master/include/mxnet/base.h#L424
I was testing it on a platform without a GPU but with CUDA installed. In any case, the test seems misplaced.
Description
Hi there!
Currently, SyncBatchNorm doesn't support 5D or higher-dimensional input.
This PR fixes it.
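The core idea behind supporting higher-rank input can be sketched without MXNet. The NumPy snippet below is an assumption about the approach, not the PR's actual code: batch norm keeps one mean/variance pair per channel, so generalizing from 4D (N, C, H, W) to 5D (N, C, D, H, W) amounts to reducing over every axis except the channel axis. The function name `batchnorm_nd` is illustrative.

```python
import numpy as np

def batchnorm_nd(x, gamma, beta, eps=1e-5):
    """Normalize over every axis except axis 1 (channels); works for 2D-5D input."""
    axes = tuple(i for i in range(x.ndim) if i != 1)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Broadcast the per-channel scale and shift against any input rank.
    shape = [1] * x.ndim
    shape[1] = x.shape[1]
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 5, 6))        # 5D input: (N, C, D, H, W)
out = batchnorm_nd(x, np.ones(3), np.zeros(3))  # per-channel mean ~0, var ~1
```

Because the reduction axes are computed from `x.ndim`, the same code path covers 2D through 5D input, which is the shape of the fix this PR aims for.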
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
- Support 5D input for SyncBatchNorm
- Move test_sync_batchnorm to tests/python/gpu/test_gluon_gpu.py

Comments