GH-47112: [Parquet][C++] Rle BitPacked parser (#47294)
Conversation
Work in progress: this is currently a split of the decoder into a parser and a decoder, but it is not plugged in yet.
Some benchmarks on a Linux x86_64 cloud instance with 8 CPUs, 16 GB of memory, and dependencies/compilers from Conda-Forge.
@pitrou this is ready for review
```cpp
while ((v & 0xFFFFFFFFFFFFFF80ULL) != 0ULL) {
  result &= PutAligned<uint8_t>(static_cast<uint8_t>((v & 0x7F) | 0x80), 1);
  v >>= 7;

constexpr auto kMaxBytes = bit_util::MaxLEB128ByteLenFor<decltype(v)>;
```

Suggested change:
```cpp
constexpr auto kMaxBytes = kMaxVlqByteLengthForInt64;
```
That's a member of the reader that was kept for compatibility.
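For context, the write loop quoted above is the standard LEB128/VLQ encoding. A minimal standalone sketch (the helper name is hypothetical, not the Arrow API):

```cpp
#include <cstddef>
#include <cstdint>

// Minimal LEB128 (VLQ) encoder sketch for uint64_t, mirroring the write loop
// quoted above: emit 7 payload bits per byte, with the continuation bit
// (0x80) set while higher bits remain. Helper name is illustrative only.
inline size_t EncodeLEB128U64(uint64_t v, uint8_t* out) {
  size_t n = 0;
  while ((v & 0xFFFFFFFFFFFFFF80ULL) != 0ULL) {
    out[n++] = static_cast<uint8_t>((v & 0x7F) | 0x80);
    v >>= 7;
  }
  out[n++] = static_cast<uint8_t>(v);  // last byte: continuation bit clear
  return n;
}
```

A uint64_t encodes to at most 10 bytes (ceil(64/7)), which is what a constant like `kMaxVlqByteLengthForInt64` denotes.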
```cpp
// Need to check if there are bits that would overflow the output.
// Also checks that there is no continuation.
if (ARROW_PREDICT_FALSE((byte & kHighForbiddenMask) != 0)) {
```
In my opinion, due to the right shift of uint64_t, the high bit remains 0 when the original BitWriter::PutVlqInt(uint64_t v) writes the last byte. In that case, should we treat the last byte specially in WriteLEB128 and ParseLeadingLEB128?
Take a uint32: that's four bytes, and it can be written in up to five bytes in LEB128.
But not all five-byte LEB128 sequences fit in a uint32.
On the last byte, a possible 01111111 would be shifted by 4*7=28, overflowing some of the bits.
I added a test for that.
Writing does not have this issue.
There is a fuzz regression failure which should probably be addressed @AntoinePrv: https://github.com/apache/arrow/actions/runs/17266367915/job/48999342089?pr=47294#step:7:6933
I ran the benchmarks locally with an AMD Zen 2 CPU on Ubuntu 24.04: there are a couple of significant speedups and a couple of significant regressions, but the overall picture is rather reassuring: this refactoring is globally neutral performance-wise.
pitrou left a comment:
This is a partial code review, I'll finish tomorrow.
```cpp
out += read;

// Stop reading and store remaining decoder
if (ARROW_PREDICT_FALSE(values_read == batch_size || read == 0)) {
```
Not sure about ARROW_PREDICT_FALSE here either. Its probability might depend on the RLE stream and on batching patterns in the upper layers.
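For readers unfamiliar with the macro: ARROW_PREDICT_FALSE is a branch-prediction hint, and the concern above is that such a hint only helps when the branch really is rarely taken. A minimal sketch of how such a macro is commonly defined (this mirrors the idea, not Arrow's definition verbatim):

```cpp
// Sketch of a "predict false" macro as commonly implemented with GCC/Clang's
// __builtin_expect; on other compilers it degrades to a plain boolean test.
// The macro name is ours; Arrow defines its own in its utility headers.
#if defined(__GNUC__) || defined(__clang__)
#  define MY_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#else
#  define MY_PREDICT_FALSE(x) (!!(x))
#endif
```

The hint does not change the condition's value, only the code layout the compiler chooses for the two branches.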
pitrou left a comment:
Sorry, we've been a bit carried away. This is looking excellent now. I've rebased on git main to get up-to-date CI results.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 79f9764. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.
### Rationale for this change

### What changes are included in this PR?

New independent abstractions:
- A `BitPackedRun` to describe the encoded bytes in a bit-packed run.
- A minimal `BitPackedDecoder` that can decode this type of run (no dict/spaced methods).
- An `RleRun` to describe the encoded value in an RLE run.
- A minimal `RleDecoder` that can decode this type of run (no dict/spaced methods).
- A `RleBitPackedParser` that reads the encoded headers and emits the different runs.

These new abstractions are then plugged into `RleBitPackedDecoder` (formerly `RleDecoder`) to keep compatibility with the rest of Arrow (improvements to using the parser independently can come in a follow-up PR).

Misc changes:
- Separation of LEB128 reading/writing from `BitReader` into free functions, and an added check for a special case to handle undefined-behavior overflow.

### Are these changes tested?

Yes, on top of the existing tests, many more unit tests have been added.

### Are there any user-facing changes?

API changes to internal classes.

* GitHub Issue: apache#47112

Authored-by: AntoinePrv <AntoinePrv@users.noreply.github.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
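For context on what the parser emits: in the Parquet RLE/bit-packed hybrid encoding, each run starts with a LEB128-encoded header whose low bit selects the run type. A conceptual sketch of header parsing (struct and function names are illustrative, not the new Arrow API):

```cpp
#include <cstdint>

// Parquet RLE/bit-packed hybrid run header, per the Parquet format spec:
// the low bit selects the run type, the remaining bits give the length.
struct RunHeader {
  bool bit_packed;  // true: bit-packed run, false: RLE run
  uint32_t count;   // RLE: number of repeats; bit-packed: number of 8-value groups
};

inline RunHeader ParseRunHeader(uint32_t header) {
  return RunHeader{(header & 1U) != 0, header >> 1};
}
```

Splitting this header parsing (the `RleBitPackedParser` role) from the per-run decoding (the `RleDecoder`/`BitPackedDecoder` roles) is what allows the two concerns to be tested and reused independently.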
We're experiencing a performance regression with the Parquet reader in our internal program. We are still locating the regression and suspect it's related to this PR. I checked the full Conbench report, which mentions several unstable performance issues, all seemingly related to the changes in this PR. However, the URLs in the report are no longer accessible.
@HuaHuaY it is possible this PR (and some follow-up changes found by fuzzing) introduces light regressions in some cases, but it should not be by much (let me know what you find).
We tested using a Parquet file with data generated from SSB Flat. We found that the fewer the distinct values, the larger the performance regression.
Both binaries are compiled in release mode.

```
❯ env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LLVM_ROOT/lib/x86_64-unknown-linux-gnu ARROW_RUNTIME_SIMD_LEVEL=AVX2 hyperfine -w 5 -r 20 --sort mean-time "cpp/out/build/ninja-release/release/parquet-scan-main --columns=6 /dev/shm/TableSink0" "cpp/out/build/ninja-release/release/parquet-scan-old --columns=6 /dev/shm/TableSink0"
Benchmark 1: cpp/out/build/ninja-release/release/parquet-scan-main --columns=6 /dev/shm/TableSink0
  Time (mean ± σ):     31.1 ms ±  0.3 ms    [User: 27.2 ms, System: 3.7 ms]
  Range (min … max):   30.7 ms … 31.8 ms    20 runs

Benchmark 2: cpp/out/build/ninja-release/release/parquet-scan-old --columns=6 /dev/shm/TableSink0
  Time (mean ± σ):     22.8 ms ±  0.4 ms    [User: 19.3 ms, System: 3.3 ms]
  Range (min … max):   22.2 ms … 23.3 ms    20 runs

Summary
  cpp/out/build/ninja-release/release/parquet-scan-old --columns=6 /dev/shm/TableSink0 ran
    1.37 ± 0.02 times faster than cpp/out/build/ninja-release/release/parquet-scan-main --columns=6 /dev/shm/TableSink0
```
```
❯ env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LLVM_ROOT/lib/x86_64-unknown-linux-gnu cpp/out/build/ninja-release/release/parquet-reader --only-metadata /dev/shm/TableSink0
File Name: /dev/shm/TableSink0
Version: 2.6
Created By: cz-cpp version BuildInfo:GitBranch:release/20240820_rc8,GitVersion:73a4383,BuildTime:1725298762,CloudEnv:ALIYUN
Total rows: 6250733
Number of RowGroups: 1
Number of Real Columns: 40
Number of Columns: 40
Number of Selected Columns: 40
......
Column 6: lo_orderpriority (BYTE_ARRAY / String / UTF8)
......
--- Row Group: 0 ---
--- Total Bytes: 1016215218 ---
--- Total Compressed Bytes: 552911018 ---
--- Sort Columns:
column_idx: 5, descending: 0, nulls_first: 1
column_idx: 0, descending: 0, nulls_first: 1
--- Rows: 6250733 ---
......
Column 6
  Values: 6250733, Null Values: 0, Distinct Values: 5
  Max (exact: unknown): 5-LOW, Min (exact: unknown): 1-URGENT
  Compression: LZ4_RAW, Encodings: PLAIN(DICT_PAGE) RLE_DICTIONARY
  Uncompressed Size: 2267694, Compressed Size: 2092132
......
```
Update from @HuaHuaY's comment: we just located it; the performance regression comes from the API change of the internal
This should not be the case; where do you see it being left to
That's in our internal codebase where the
I tested with arrow's