[MXNET-1408] Adding test to verify Large Tensor Support for ravel and unravel #15048
apeforest merged 1 commit into apache:master
Conversation
@mxnet-label-bot add [pr-awaiting-review]

@apeforest : Please review
tests/nightly/test_large_array.py (Outdated)

    assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))

def test_ravel_and_unravel():
It's more readable and understandable this way. Also, I have split it into two functions inside.
tests/nightly/test_large_array.py (Outdated)

def test_unravel_index():
    original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
    assert (original_2d_idxs.asnumpy() == np.array(idxs_2d)).all() == True
Force-pushed from 7c915e3 to e2970e9.
tests/nightly/test_large_array.py (Outdated)

def test_ravel_and_unravel():
    idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]
Please use C++ tests with mocking of the memory allocation instead of actually allocating memory.
Although this is nightly, I don't think it's necessary for these kinds of tests to actually allocate real memory.
We can add a C++ unit test to mock the memory allocation. However, I thought the nightly test is used for end-to-end testing purposes, to mimic the real deployment situation, or is it not?
They are to some extent, but in some cases the cost-vs-return ratio is not that good. While smoke tests are certainly necessary in some cases, resource-intensive ones, on the other hand, can usually be abstracted away. If we assume that memory access on the system side is properly working and only mock this very last layer, the result should be pretty much as meaningful as running it end to end.
@marcoabreu : I didn't understand your last comment. What did you mean by "very last layer"? All of these are single-operator tests. I combined these tests because it's easier to test them together.
Hi @marcoabreu, thanks a lot for your suggestion; it's a very good learning for me too. Large tensor support is crucial to DGL (Deep Graph Library), which is using MXNet and planning a release with support for large tensors. Without this test, we cannot make this feature production ready.
While I appreciate the direction of testing you suggested and would definitely like to explore it further, I am also afraid that not adding tests before we implement your approach would block the release of DGL and of other MXNet projects that depend on large tensors as well.
Can we have a separate project to address test efficiency (we may need some expertise from the CI team for this) while moving on with the large tensor project? Please let me know if you have other concerns or advice. Thank you.
@marcoabreu : We have to test cases where large tensors are actually allocated in physical memory. That is the whole purpose of adding these tests; without them we cannot support, or even guarantee, large dense tensor support in MXNet.
The purpose of these tests is not at all to test only sparse tensors. Currently we are not addressing performance issues with large dense arrays, but trying to support them in the first place. From your reply it seems you are under the impression that everything should work fine once a tensor has more than 2^32 elements (please correct me if I am wrong). But that's not the case: the operator simply segfaults or gives unpredictable behavior. That is why it is necessary to test with such large arrays/tensors.
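A minimal sketch of the failure mode (assuming a flat position leaks through 32-bit index arithmetic somewhere inside an operator):

import numpy as np

# A flat position beyond 2**32 silently wraps if it ever passes through
# 32-bit index arithmetic, reading or writing the wrong element instead
# of failing loudly.
flat_idx = np.int64(2**32 + 7)
wrapped = flat_idx.astype(np.int32)  # what a 32-bit index computation sees
print(int(wrapped))                  # prints 7, not 4294967303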
@marcoabreu : Let me know if you have other unanswered queries; answering them will help you understand the use case better and unblock this PR.
Hey,
I understand the importance of large tensors and I'm really looking forward to them as well. Don't get me wrong, I'm not underestimating the work that comes with supporting them.
We already have a history of issues that were caused by too-large allocations. Thus, for the sake of keeping CI stable (independent of whether we're running per-PR or nightly) and to adhere to a high test standard, I'd prefer that we not delay addressing the test strategy.
I think there's a misunderstanding. With my proposed approach, you would still be able to fully test a single operator. But instead of randomly generating the input tensor, you would use a generator model that gives you a predictable, procedurally generatable input tensor instead of a large, randomly generated one.
The idea would then be to feed this procedurally generated input tensor to the operator, and since it's a predictable input, you also get a predictable output that you can verify. Along the way, the operator runs as usual; it's just that the source of the data is not physical memory but generated values.
The only case that wouldn't be supported is the chaining of operators, since that would require storing the intermediary values.
The idea here is that (in whatever fashion you'd like) we generate fake input and output tensors that return predictable and valid data but don't actually consume RAM. From MXNet's perspective, it shouldn't matter where the real values come from. For the sake of an operator, what matters is that it actually retrieves proper values, and that would be the case here.
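A rough Python sketch of that "fake tensor" idea, using NumPy's zero-stride broadcast views as a stand-in for the mocked allocation layer (note this does not exercise MXNet's own memory path, which is exactly the part under debate):

import numpy as np

# A 2**33-element "tensor" backed by a single 8-byte scalar: every read
# returns a valid, predictable value, yet essentially no RAM is consumed.
huge = np.broadcast_to(np.float64(1.0), (2**33,))
assert huge.shape == (2**33,)
assert huge[2**32 + 5] == 1.0  # indexing past 2**32 works on the view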
@marcoabreu : I still don't understand how I would know whether an actual large tensor, once memory is allocated, works as expected or not. Procedurally generating the values will indeed give me values at locations indexed beyond 2^32, but the input should come from RAM; it does matter. It needs to be tested on actual physical memory so there are no surprises. If nightly tests are not feasible, do we have a weekly test where we can run this?
tests/nightly/test_large_array.py (Outdated)

def test_ravel_and_unravel():
    idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]
nit: space between operators and operands
tests/nightly/test_large_array.py (Outdated)
    idx = mx.nd.array(1)

def test_ravel_multi_index():
    idx = mx.nd.ravel_multi_index(mx.nd.array(idxs_2d, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
    idx_numpy = np.ravel_multi_index(idxs_2d, (LARGE_X,SMALL_Y))
nit: space. btw, do you see any pylint warnings out of these?
tests/nightly/test_large_array.py (Outdated)
    assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3

def test_unravel_index():
    original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y))
tests/nightly/test_large_array.py (Outdated)

def test_ravel_and_unravel():
    idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]]
nit: do you mean indices_2d?
Yes, though make pylint gives this output:

ubuntu@ip-172-31-82-110 ~/mxnet3 (master) $ make pylint
Makefile:345: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
python3 -m pylint --rcfile=/home/ubuntu/mxnet3/ci/other/pylintrc --ignore-patterns=".*\.so$,.*\.dll$,.*\.dylib$" python/mxnet tools/caffe_converter/*.py
Using config file /home/ubuntu/mxnet3/ci/other/pylintrc
Your code has been rated at 10.00/10

I will still fix it. Dunno why pylint isn't catching it.
Got it... the mxnet Makefile doesn't run pylint on tests/nightly and tests/python/unittest, so I added them manually in my local repo. Thanks for the advice!
tests/nightly/test_large_array.py (Outdated)
import mxnet as mx
import numpy as np
import mxnet as mx

Suggested reordering of imports by pylint.
    assert np.sum(res[-1].asnumpy() == 1000) == a.shape[1]

def test_take():

This was a duplicate.
    idx_numpy = np.ravel_multi_index(indices_2d, (LARGE_X, SMALL_Y))
    assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3
marcoabreu left a comment:
I just talked to Lin and we agreed on merging the end-to-end tests for now. The condition is that they have to be replaced by the 15th of July or by the next minor release that includes large tensor support, whichever comes first.
tests/nightly/test_large_array.py (Outdated)
def test_diag():
    h = np.random.randint(2,9)
    w = np.random.randint(2,9)
    h = np.random.randint(2, 9)

I do not see the necessity of creating h and w variables here.
If you decide to move forward with them, please add the @with_seed annotation to the test.
@apeforest That's how the unit test for the diagonal operator is implemented.
@marcoabreu Good point, will add @with_seed().
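A minimal sketch of the suggested change (assuming the with_seed decorator from tests/python/unittest/common.py is importable here):

import numpy as np
from common import with_seed  # test helper; the import path is an assumption

@with_seed()
def test_diag():
    h = np.random.randint(2, 9)  # now reproducible: the seed is logged on failure
    w = np.random.randint(2, 9)
    ...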
tests/nightly/test_large_array.py (Outdated)

def test_ravel_multi_index():
    indices_2d = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]]

Can we generate the random indices instead?
tests/nightly/test_large_array.py (Outdated)

def test_unravel_index():
    original_2d_indices = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]]

Same here: can we generate random indices? You can pass in the 2d_array as input so you only keep one copy.
The input is not a 2d_array; we pass indices as input. I can do the following for random large indices:
low_large_value = 2**32
high_large_value = 2**34
a = nd.random.randint(low_large_value, high_large_value, dtype=np.int64)
Good point.
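A fuller sketch of that approach (LARGE_X and SMALL_Y stand in for the module constants; the values and shapes here are illustrative only):

import numpy as np
import mxnet as mx

LARGE_X, SMALL_Y = 100000000, 50  # placeholders for the test-file constants

# three random coordinates per axis, each within its dimension's bounds
x_coords = mx.nd.random.randint(0, LARGE_X, shape=(3,), dtype=np.int64)
y_coords = mx.nd.random.randint(0, SMALL_Y, shape=(3,), dtype=np.int64)
indices_2d = mx.nd.stack(x_coords, y_coords)  # shape (2, 3): row 0 = x, row 1 = y

# the same indices can feed both the MXNet op and the numpy reference
idx = mx.nd.ravel_multi_index(indices_2d, shape=(LARGE_X, SMALL_Y))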
tests/nightly/test_large_array.py (Outdated)
    a = mx.nd.ones(shape=(256*35, 1024*1024))
    b = mx.nd.ones(shape=(256*35,))
    res = mx.nd.pick(a,b)
    b = mx.nd.ones(shape=(256*35, ))

nit: space around the * operator
tests/nightly/test_large_array.py (Outdated)
    assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k))

def random_2d_coordinates(x_low, x_high, y_low, y_high):

It seems you can re-use the existing utility method: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L410
It is different: we need one value to be very large for the test case to be successful, not just any value between (1, very large value).
Will move this to test_utils.py.
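A rough sketch of what the relocated helper could look like in test_utils.py (the name comes from the diff above; the exact signature and the "force one extreme coordinate" behavior are assumptions based on this discussion):

import numpy as np

def random_2d_coordinates(x_low, x_high, y_low, y_high, num=3):
    # random (x, y) coordinate lists, with the upper bounds forced in so
    # at least one index actually exercises the very large dimension
    xs = np.random.randint(x_low, x_high, size=num).tolist()
    ys = np.random.randint(y_low, y_high, size=num).tolist()
    xs[0], ys[0] = x_high - 1, y_high - 1  # guarantee one extreme pair
    return [xs, ys]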
Description
New Ops: ravel_multi_index, unravel_index
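An illustrative round trip for the two ops with a deliberately small shape (the tests themselves use LARGE_X x SMALL_Y):

import numpy as np
import mxnet as mx

# coordinates as a (2, N) array: row 0 = x indices, row 1 = y indices
indices_2d = mx.nd.array([[2, 3], [4, 5]], dtype=np.int64)
flat = mx.nd.ravel_multi_index(indices_2d, shape=(10, 10))  # -> [24, 35]
back = mx.nd.unravel_index(flat, shape=(10, 10))            # -> [[2, 3], [4, 5]]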