Parquet writer dictionary encoding refactor#8476
Parquet writer dictionary encoding refactor#8476rapids-bot[bot] merged 76 commits intorapidsai:branch-21.10from
Conversation
cpp/src/io/parquet/chunk_dict.cu
Outdated
| auto t = threadIdx.x; | ||
|
|
||
| size_type start_row = block_x * 5000; | ||
| size_type end_row = min(start_row + 5000, num_rows); |
There was a problem hiding this comment.
Then we need to explicitly add the prefix cuda::min or something similar to that?
|
|
||
| void InitializeChunkHashMaps(device_span<EncColumnChunk> chunks, rmm::cuda_stream_view stream) | ||
| { | ||
| constexpr int block_size = 1024; |
There was a problem hiding this comment.
Use int32_t instead, for clarification.
There was a problem hiding this comment.
I believe kernel launch parameters are not sensitive to bit size.
|
How does this fix #8890? |
The problem was that we allocated 256 kb as scratch space for every chunk even if the chunk contained much less values. Now we only allocate any temp memory proportional to data size. There was one instance in this PR where we still allocated 256kb per chunk but that has been limited to min(256kb, chunk size). |
vuule
left a comment
There was a problem hiding this comment.
Partial review, still not understanding the kernels that well.
Causing a problem where early return would result in num_dict_entries == 65536 but that meant that the dict bit calc logic would think it's ok to use dict as 65536 fits in 16 bits
There was a problem hiding this comment.
Looks good. Note that there are still magic numbers which can be replaced by constexpr variables:
- Magic numbers should always be avoided
- Any
constexprvalue replacing a magic number should be associated with a full comment/doxygen clarifying why it has such value
vuule
left a comment
There was a problem hiding this comment.
Just a formatting error, otherwise good to go.
Co-authored-by: Vukasin Milovanovic <vukasin.milovanovic.87@gmail.com>
|
@gpucibot merge |
Replaces previous parquet dictionary encoding code with one that uses
cuCollections' static map.Adds
cuCollectionstolibcudfCloses #7873
Fixes #8890
Currently blocked on Pascal support for static_map in cuCollections
(More details to be added)