GH-47112: [Parquet][C++] Rle BitPacked parser #47294

Merged
pitrou merged 56 commits into apache:main from AntoinePrv:rle
Sep 24, 2025

AntoinePrv commented Aug 8, 2025

Rationale for this change

What changes are included in this PR?

New independent abstractions:

  • A BitPackedRun to describe the encoded bytes in a bit-packed run.
  • A minimal BitPackedDecoder that can decode this type of run (no dict/spaced methods).
  • A RleRun to describe the encoded value in an RLE run.
  • A minimal RleDecoder that can decode this type of run (no dict/spaced methods).
  • A RleBitPackedParser that reads the encoded headers and emits the different runs.

These new abstractions are then plugged into RleBitPackedDecoder (formerly RleDecoder) to keep compatibility with the rest of Arrow. Improvements that use the parser independently can come in follow-up PRs.
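
To illustrate the parser/decoder split described above, here is a minimal sketch of how one run header can be parsed, following the RLE/bit-packing hybrid layout of the Parquet format specification. It is not the code from this PR: SketchRleRun, SketchBitPackedRun, SketchRun and ParseOneRun are hypothetical names, and the real Arrow classes have different members and signatures. The sketch also assumes a 32-bit header and chooses to reject empty runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <variant>

// Hypothetical stand-ins for the run descriptions listed above.
struct SketchRleRun {
  uint32_t value;        // the single repeated value
  uint64_t value_count;  // how many times it repeats
};

struct SketchBitPackedRun {
  const uint8_t* data;   // start of the packed bytes (not decoded here)
  uint64_t value_count;  // always a multiple of 8
};

using SketchRun = std::variant<SketchRleRun, SketchBitPackedRun>;

// Parses one run starting at data[*pos] and advances *pos past it.
// Returns std::nullopt on truncated or malformed input; *pos is then
// left at an unspecified position inside the buffer.
inline std::optional<SketchRun> ParseOneRun(const uint8_t* data, size_t size,
                                            size_t* pos, int bit_width) {
  assert(pos != nullptr && bit_width >= 0 && bit_width <= 32);

  // The header is an unsigned LEB128 varint; this sketch caps it at 32 bits.
  uint32_t header = 0;
  for (int shift = 0;; shift += 7) {
    if (*pos >= size) return std::nullopt;
    const uint8_t byte = data[(*pos)++];
    // The fifth byte may only carry 4 payload bits and must end the varint.
    if (shift == 28 && (byte & 0xF0) != 0) return std::nullopt;
    header |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;
  }

  const uint32_t count = header >> 1;
  if (count == 0) return std::nullopt;  // choice of this sketch

  if (header & 1) {
    // Bit-packed run: `count` groups of 8 values, bit_width bytes per group.
    const uint64_t n_bytes = static_cast<uint64_t>(count) * bit_width;
    if (n_bytes > size - *pos) return std::nullopt;
    SketchBitPackedRun run{data + *pos, static_cast<uint64_t>(count) * 8};
    *pos += static_cast<size_t>(n_bytes);
    return SketchRun{run};
  }

  // RLE run: one value stored little-endian in ceil(bit_width / 8) bytes.
  const size_t value_bytes = static_cast<size_t>((bit_width + 7) / 8);
  if (value_bytes > size - *pos) return std::nullopt;
  uint32_t value = 0;
  for (size_t i = 0; i < value_bytes; ++i) {
    value |= static_cast<uint32_t>(data[*pos + i]) << (8 * i);
  }
  *pos += value_bytes;
  return SketchRun{SketchRleRun{value, count}};
}
```

A caller would loop on ParseOneRun and dispatch each returned run (for example with std::visit) to the matching decoder, which is the separation of concerns the list above describes.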

Misc changes:

  • Separation of LEB128 reading/writing from BitReader into free functions, with an added check for a special case where an overflow would otherwise trigger undefined behavior.
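
As an illustration of what LEB128 free functions with an overflow check can look like, here is a minimal sketch. WriteLeb128Sketch and ParseLeb128Sketch are hypothetical names, not the functions added by this PR, and the special case handled in the PR may differ. The sketch guards against two things: shifting by at least the width of the integer type (undefined behavior in C++) and payload bits that do not fit in the target type. It is stricter than necessary, since it also rejects redundant zero padding beyond the type width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Writes `value` as unsigned LEB128 into out[0..out_size).
// Returns the number of bytes written, or 0 if the buffer is too small.
template <typename UInt>
size_t WriteLeb128Sketch(UInt value, uint8_t* out, size_t out_size) {
  static_assert(std::is_unsigned_v<UInt>, "unsigned LEB128 only");
  assert(out != nullptr);
  size_t n = 0;
  do {
    if (n == out_size) return 0;
    uint8_t byte = static_cast<uint8_t>(value & 0x7F);
    value = static_cast<UInt>(value >> 7);
    if (value != 0) byte |= 0x80;  // continuation bit: more bytes follow
    out[n++] = byte;
  } while (value != 0);
  return n;
}

// Reads one unsigned LEB128 value from data[0..size) into *out.
// Returns the number of bytes consumed, or 0 on truncated input or
// when the encoded value does not fit in UInt.
template <typename UInt>
size_t ParseLeb128Sketch(const uint8_t* data, size_t size, UInt* out) {
  static_assert(std::is_unsigned_v<UInt>, "unsigned LEB128 only");
  assert(out != nullptr);
  constexpr int kBits = std::numeric_limits<UInt>::digits;
  UInt result = 0;
  int shift = 0;
  for (size_t i = 0; i < size; ++i) {
    const uint8_t byte = data[i];
    const UInt payload = static_cast<UInt>(byte & 0x7F);
    // Shifting by >= kBits would be undefined behavior, so stop first.
    if (shift >= kBits) return 0;
    // On the last partial group, reject payload bits that would be lost.
    if (kBits - shift < 7 && (payload >> (kBits - shift)) != 0) return 0;
    result = static_cast<UInt>(result | static_cast<UInt>(payload << shift));
    if ((byte & 0x80) == 0) {
      *out = result;
      return i + 1;
    }
    shift += 7;
  }
  return 0;  // ran out of input before the final byte
}
```

For example, the 32-bit value 300 encodes to the two bytes 0xAC 0x02, and parsing those bytes returns 300 with two bytes consumed.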

Are these changes tested?

Yes, on top of the existing tests, many more unit tests have been added.

Are there any user-facing changes?

API changes to internal classes.

github-actions bot commented Aug 8, 2025

⚠️ GitHub issue #47112 has been automatically assigned in GitHub to PR creator.

AntoinePrv commented Aug 8, 2025

Work in progress: this currently splits the decoder into a parser and a decoder, but the new pieces are not plugged in yet.

AntoinePrv commented Aug 26, 2025

Some benchmarks on a Linux x86_64 cloud instance with 8 CPUs and 16 GB of memory, using dependencies and compilers from Conda-Forge.

archery benchmark diff --benchmark-suite=parquet-arrow --benchmark-filter=Read --repetitions=3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (81)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                 benchmark         baseline        contender  change %                                                                                                                                                                                                                 counters
                                     BM_ReadColumn<false,BooleanType>/-1/0  120.497 MiB/sec  172.809 MiB/sec    43.414                                        {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<false,BooleanType>/-1/0', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 68}
                                            BM_ReadStructOfStructColumn/50    1.379 GiB/sec    1.650 GiB/sec    19.712                                              {'family_index': 19, 'per_family_instance_index': 2, 'run_name': 'BM_ReadStructOfStructColumn/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 41}
                                     BM_ReadColumn<false,BooleanType>/1/20   53.301 MiB/sec   60.698 MiB/sec    13.878                                        {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<false,BooleanType>/1/20', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 29}
                                                    BM_ReadStructColumn/50    1.251 GiB/sec    1.367 GiB/sec     9.297                                                      {'family_index': 18, 'per_family_instance_index': 2, 'run_name': 'BM_ReadStructColumn/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 75}
                                      BM_ReadColumn<true,BooleanType>/-1/1   29.265 MiB/sec   31.087 MiB/sec     6.227                                         {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<true,BooleanType>/-1/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 16}
                                       BM_ReadColumn<false,Int32Type>/-1/1   14.031 GiB/sec   14.775 GiB/sec     5.302                                         {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<false,Int32Type>/-1/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 258}
                                       BM_ReadColumn<true,DoubleType>/-1/0    1.945 GiB/sec    2.048 GiB/sec     5.261                                          {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<true,DoubleType>/-1/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 18}
                                        BM_ReadColumn<true,Int64Type>/-1/0    1.957 GiB/sec    2.046 GiB/sec     4.527                                           {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<true,Int64Type>/-1/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 18}
                     BM_ReadColumnPlain<true,Int32Type>/null_probability:0    3.356 GiB/sec    3.504 GiB/sec     4.401                        {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<true,Int32Type>/null_probability:0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 58}
                                                     BM_ReadStructColumn/0   10.081 GiB/sec   10.508 GiB/sec     4.236                                                      {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BM_ReadStructColumn/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 601}
                                                BM_ReadListOfListColumn/99    1.452 GiB/sec    1.513 GiB/sec     4.181                                                 {'family_index': 23, 'per_family_instance_index': 3, 'run_name': 'BM_ReadListOfListColumn/99', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 123}
                                      BM_ReadColumn<false,Int32Type>/-1/10   13.915 GiB/sec   14.484 GiB/sec     4.083                                        {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<false,Int32Type>/-1/10', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 255}
                                      BM_ReadColumn<false,Int64Type>/-1/50   11.318 GiB/sec   11.744 GiB/sec     3.761                                        {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumn<false,Int64Type>/-1/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 103}
          BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100    2.162 GiB/sec    2.229 GiB/sec     3.099            {'family_index': 11, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:100', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 79}
                                                BM_ReadIndividualRowGroups    3.753 GiB/sec    3.822 GiB/sec     1.818                                                  {'family_index': 24, 'per_family_instance_index': 0, 'run_name': 'BM_ReadIndividualRowGroups', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 34}
                                       BM_ReadColumn<false,Int64Type>/-1/1   11.464 GiB/sec   11.670 GiB/sec     1.799                                         {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<false,Int64Type>/-1/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 103}
               BM_ReadBinaryViewColumn/null_probability:0/unique_values:32    1.070 GiB/sec    1.089 GiB/sec     1.786                  {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:0/unique_values:32', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 3}
                                                      BM_ReadListColumn/99    1.734 GiB/sec    1.763 GiB/sec     1.703                                                       {'family_index': 21, 'per_family_instance_index': 3, 'run_name': 'BM_ReadListColumn/99', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 155}
               BM_ReadBinaryViewColumn/null_probability:0/unique_values:-1    1.935 GiB/sec    1.967 GiB/sec     1.654                  {'family_index': 15, 'per_family_instance_index': 1, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:0/unique_values:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 5}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:100    2.167 GiB/sec    2.203 GiB/sec     1.639  {'family_index': 13, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:100', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 78}
            BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0    1.976 GiB/sec    2.008 GiB/sec     1.606              {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 69}
                   BM_ReadColumnPlain<false,Int32Type>/null_probability:-1   14.423 GiB/sec   14.654 GiB/sec     1.605                     {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<false,Int32Type>/null_probability:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 263}
BM_ReadColumnByteStreamSplit<false,Float16LogicalType>/null_probability:-1   18.385 GiB/sec   18.650 GiB/sec     1.444 {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnByteStreamSplit<false,Float16LogicalType>/null_probability:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 662}
                                                  BM_ReadMultipleRowGroups    3.706 GiB/sec    3.752 GiB/sec     1.248                                                    {'family_index': 25, 'per_family_instance_index': 0, 'run_name': 'BM_ReadMultipleRowGroups', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 33}
                                             BM_ReadStructOfStructColumn/0    8.740 GiB/sec    8.838 GiB/sec     1.118                                              {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BM_ReadStructOfStructColumn/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 252}
                                         BM_ReadColumn<true,Int32Type>/0/1    3.461 GiB/sec    3.498 GiB/sec     1.072                                            {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<true,Int32Type>/0/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 60}
                                              BM_ReadStructOfListColumn/99    1.130 GiB/sec    1.141 GiB/sec     0.995                                                {'family_index': 20, 'per_family_instance_index': 3, 'run_name': 'BM_ReadStructOfListColumn/99', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 65}
                                         BM_ReadMultipleRowGroupsGenerator    3.724 GiB/sec    3.742 GiB/sec     0.482                                           {'family_index': 26, 'per_family_instance_index': 0, 'run_name': 'BM_ReadMultipleRowGroupsGenerator', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 33}
              BM_ReadBinaryViewColumn/null_probability:50/unique_values:-1    1.055 GiB/sec    1.059 GiB/sec     0.324                 {'family_index': 15, 'per_family_instance_index': 6, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:50/unique_values:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 5}
                                              BM_ReadStructOfListColumn/50  528.189 MiB/sec  529.803 MiB/sec     0.306                                                {'family_index': 20, 'per_family_instance_index': 2, 'run_name': 'BM_ReadStructOfListColumn/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 32}
    BM_ReadBinaryColumnDeltaByteArray/null_probability:50/unique_values:-1  778.888 MiB/sec  780.710 MiB/sec     0.234       {'family_index': 16, 'per_family_instance_index': 2, 'run_name': 'BM_ReadBinaryColumnDeltaByteArray/null_probability:50/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 4}
                                       BM_ReadColumn<true,Int64Type>/50/50    2.109 GiB/sec    2.113 GiB/sec     0.203                                          {'family_index': 3, 'per_family_instance_index': 8, 'run_name': 'BM_ReadColumn<true,Int64Type>/50/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 19}
BM_ReadBinaryViewColumnDeltaByteArray/null_probability:50/unique_values:-1    1.040 GiB/sec    1.041 GiB/sec     0.164   {'family_index': 17, 'per_family_instance_index': 2, 'run_name': 'BM_ReadBinaryViewColumnDeltaByteArray/null_probability:50/unique_values:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 4}
 BM_ReadBinaryViewColumnDeltaByteArray/null_probability:1/unique_values:-1    1.725 GiB/sec    1.724 GiB/sec    -0.069    {'family_index': 17, 'per_family_instance_index': 1, 'run_name': 'BM_ReadBinaryViewColumnDeltaByteArray/null_probability:1/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 5}
                                      BM_ReadColumn<false,Int32Type>/-1/50   14.614 GiB/sec   14.597 GiB/sec    -0.116                                        {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumn<false,Int32Type>/-1/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 264}
                                        BM_ReadColumn<true,Int64Type>/50/1    2.116 GiB/sec    2.112 GiB/sec    -0.186                                           {'family_index': 3, 'per_family_instance_index': 9, 'run_name': 'BM_ReadColumn<true,Int64Type>/50/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 19}
                   BM_ReadColumnPlain<true,Int32Type>/null_probability:100    4.013 GiB/sec    4.004 GiB/sec    -0.233                      {'family_index': 9, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumnPlain<true,Int32Type>/null_probability:100', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 69}
     BM_ReadBinaryColumnDeltaByteArray/null_probability:0/unique_values:-1    1.298 GiB/sec    1.294 GiB/sec    -0.247        {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BM_ReadBinaryColumnDeltaByteArray/null_probability:0/unique_values:-1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 4}
                                        BM_ReadColumn<true,Int32Type>/-1/0    1.086 GiB/sec    1.083 GiB/sec    -0.279                                           {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<true,Int32Type>/-1/0', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 19}
                                              BM_ReadListOfStructColumn/50  656.332 MiB/sec  654.402 MiB/sec    -0.294                                                {'family_index': 22, 'per_family_instance_index': 2, 'run_name': 'BM_ReadListOfStructColumn/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 38}
          BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1   18.674 GiB/sec   18.573 GiB/sec    -0.539           {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnPlain<false,Float16LogicalType>/null_probability:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 633}
                    BM_ReadColumnPlain<true,Int32Type>/null_probability:50    1.129 GiB/sec    1.123 GiB/sec    -0.566                       {'family_index': 9, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnPlain<true,Int32Type>/null_probability:50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 20}
                                       BM_ReadColumn<true,Int32Type>/50/50    1.128 GiB/sec    1.122 GiB/sec    -0.571                                          {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BM_ReadColumn<true,Int32Type>/50/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 20}
                                        BM_ReadColumn<true,Int32Type>/50/0    1.135 GiB/sec    1.127 GiB/sec    -0.710                                           {'family_index': 1, 'per_family_instance_index': 6, 'run_name': 'BM_ReadColumn<true,Int32Type>/50/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 20}
     BM_ReadBinaryColumnDeltaByteArray/null_probability:1/unique_values:-1    1.221 GiB/sec    1.212 GiB/sec    -0.749        {'family_index': 16, 'per_family_instance_index': 1, 'run_name': 'BM_ReadBinaryColumnDeltaByteArray/null_probability:1/unique_values:-1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 4}
  BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:0    2.104 GiB/sec    2.085 GiB/sec    -0.881    {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 74}
                  BM_ReadBinaryColumn/null_probability:50/unique_values:-1  794.836 MiB/sec  787.189 MiB/sec    -0.962                     {'family_index': 14, 'per_family_instance_index': 6, 'run_name': 'BM_ReadBinaryColumn/null_probability:50/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 4}
 BM_ReadBinaryViewColumnDeltaByteArray/null_probability:0/unique_values:-1    1.929 GiB/sec    1.910 GiB/sec    -0.969    {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BM_ReadBinaryViewColumnDeltaByteArray/null_probability:0/unique_values:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 5}
                                      BM_ReadColumn<true,BooleanType>/5/10   30.798 MiB/sec   30.487 MiB/sec    -1.010                                         {'family_index': 7, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<true,BooleanType>/5/10', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 17}
                   BM_ReadBinaryColumn/null_probability:1/unique_values:-1    1.216 GiB/sec    1.203 GiB/sec    -1.018                      {'family_index': 14, 'per_family_instance_index': 5, 'run_name': 'BM_ReadBinaryColumn/null_probability:1/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 4}
                                      BM_ReadColumn<false,Int64Type>/-1/10   11.593 GiB/sec   11.459 GiB/sec    -1.161                                        {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<false,Int64Type>/-1/10', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 105}
               BM_ReadBinaryViewColumn/null_probability:1/unique_values:-1    1.736 GiB/sec    1.716 GiB/sec    -1.174                  {'family_index': 15, 'per_family_instance_index': 5, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:1/unique_values:-1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 5}
               BM_ReadBinaryViewColumn/null_probability:1/unique_values:32 1023.246 MiB/sec 1009.122 MiB/sec    -1.380                  {'family_index': 15, 'per_family_instance_index': 2, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:1/unique_values:32', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 3}
              BM_ReadBinaryViewColumn/null_probability:50/unique_values:32  857.872 MiB/sec  845.888 MiB/sec    -1.397                 {'family_index': 15, 'per_family_instance_index': 3, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:50/unique_values:32', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 4}
                  BM_ReadBinaryColumn/null_probability:50/unique_values:32  567.955 MiB/sec  559.978 MiB/sec    -1.404                     {'family_index': 14, 'per_family_instance_index': 3, 'run_name': 'BM_ReadBinaryColumn/null_probability:50/unique_values:32', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 4}
           BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50  614.041 MiB/sec  603.984 MiB/sec    -1.638             {'family_index': 11, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 22}
                                       BM_ReadColumn<true,Int64Type>/45/25    2.107 GiB/sec    2.073 GiB/sec    -1.641                                          {'family_index': 3, 'per_family_instance_index': 7, 'run_name': 'BM_ReadColumn<true,Int64Type>/45/25', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 19}
              BM_ReadBinaryViewColumn/null_probability:99/unique_values:-1    1.397 GiB/sec    1.372 GiB/sec    -1.783                {'family_index': 15, 'per_family_instance_index': 7, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:99/unique_values:-1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 12}
                  BM_ReadBinaryColumn/null_probability:99/unique_values:-1  741.638 MiB/sec  728.032 MiB/sec    -1.835                    {'family_index': 14, 'per_family_instance_index': 7, 'run_name': 'BM_ReadBinaryColumn/null_probability:99/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 12}
                   BM_ReadBinaryColumn/null_probability:0/unique_values:32  754.669 MiB/sec  740.571 MiB/sec    -1.868                      {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'BM_ReadBinaryColumn/null_probability:0/unique_values:32', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 3}
                                      BM_ReadColumn<false,DoubleType>/-1/0   11.838 GiB/sec   11.607 GiB/sec    -1.946                                        {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'BM_ReadColumn<false,DoubleType>/-1/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 104}
                                                      BM_ReadListColumn/50  718.113 MiB/sec  704.088 MiB/sec    -1.953                                                        {'family_index': 21, 'per_family_instance_index': 2, 'run_name': 'BM_ReadListColumn/50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 62}
                                       BM_ReadColumn<true,Int64Type>/35/10    2.061 GiB/sec    2.020 GiB/sec    -1.983                                          {'family_index': 3, 'per_family_instance_index': 6, 'run_name': 'BM_ReadColumn<true,Int64Type>/35/10', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 18}
              BM_ReadBinaryViewColumn/null_probability:99/unique_values:32    1.370 GiB/sec    1.342 GiB/sec    -2.059                {'family_index': 15, 'per_family_instance_index': 4, 'run_name': 'BM_ReadBinaryViewColumn/null_probability:99/unique_values:32', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 12}
                                              BM_ReadListOfStructColumn/99    1.339 GiB/sec    1.311 GiB/sec    -2.086                                                {'family_index': 22, 'per_family_instance_index': 3, 'run_name': 'BM_ReadListOfStructColumn/99', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 74}
                                                BM_ReadListOfListColumn/50  631.471 MiB/sec  618.070 MiB/sec    -2.122                                                  {'family_index': 23, 'per_family_instance_index': 2, 'run_name': 'BM_ReadListOfListColumn/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 55}
    BM_ReadBinaryColumnDeltaByteArray/null_probability:99/unique_values:-1  743.722 MiB/sec  723.739 MiB/sec    -2.687      {'family_index': 16, 'per_family_instance_index': 3, 'run_name': 'BM_ReadBinaryColumnDeltaByteArray/null_probability:99/unique_values:-1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 12}
                   BM_ReadBinaryColumn/null_probability:1/unique_values:32  718.629 MiB/sec  698.738 MiB/sec    -2.768                      {'family_index': 14, 'per_family_instance_index': 2, 'run_name': 'BM_ReadBinaryColumn/null_probability:1/unique_values:32', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 3}
                                       BM_ReadColumn<true,Int64Type>/30/10    2.033 GiB/sec    1.974 GiB/sec    -2.888                                          {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BM_ReadColumn<true,Int64Type>/30/10', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 18}
                  BM_ReadBinaryColumn/null_probability:99/unique_values:32  729.705 MiB/sec  707.958 MiB/sec    -2.980                    {'family_index': 14, 'per_family_instance_index': 4, 'run_name': 'BM_ReadBinaryColumn/null_probability:99/unique_values:32', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 13}
                   BM_ReadBinaryColumn/null_probability:0/unique_values:-1    1.327 GiB/sec    1.285 GiB/sec    -3.171                      {'family_index': 14, 'per_family_instance_index': 1, 'run_name': 'BM_ReadBinaryColumn/null_probability:0/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 4}
BM_ReadBinaryViewColumnDeltaByteArray/null_probability:99/unique_values:-1    1.398 GiB/sec    1.353 GiB/sec    -3.218  {'family_index': 17, 'per_family_instance_index': 3, 'run_name': 'BM_ReadBinaryViewColumnDeltaByteArray/null_probability:99/unique_values:-1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 12}
                                      BM_ReadColumn<true,DoubleType>/25/25    1.972 GiB/sec    1.891 GiB/sec    -4.113                                         {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumn<true,DoubleType>/25/25', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 18}
                                        BM_ReadColumn<true,Int32Type>/25/5    1.060 GiB/sec    1.015 GiB/sec    -4.257                                           {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumn<true,Int32Type>/25/5', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 19}
 BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:50  615.394 MiB/sec  588.496 MiB/sec    -4.371   {'family_index': 13, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:50', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 21}
                                                 BM_ReadListOfListColumn/1  907.752 MiB/sec  866.414 MiB/sec    -4.554                                                   {'family_index': 23, 'per_family_instance_index': 1, 'run_name': 'BM_ReadListOfListColumn/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 79}
                    BM_ReadColumnPlain<true,Int32Type>/null_probability:99    2.424 GiB/sec    2.314 GiB/sec    -4.566                       {'family_index': 9, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnPlain<true,Int32Type>/null_probability:99', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 40}
                                       BM_ReadColumn<true,Int64Type>/25/10    1.986 GiB/sec    1.893 GiB/sec    -4.669                                          {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BM_ReadColumn<true,Int64Type>/25/10', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 18}
                                         BM_ReadColumn<true,Int64Type>/1/1    3.363 GiB/sec    3.201 GiB/sec    -4.798                                            {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<true,Int64Type>/1/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 31}
                                        BM_ReadColumn<true,Int64Type>/75/1    2.404 GiB/sec    2.287 GiB/sec    -4.859                                          {'family_index': 3, 'per_family_instance_index': 10, 'run_name': 'BM_ReadColumn<true,Int64Type>/75/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 22}
           BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99    1.297 GiB/sec    1.233 GiB/sec    -4.942             {'family_index': 11, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:99', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 47}

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (25)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                benchmark        baseline       contender  change %                                                                                                                                                                                                               counters
                                    BM_ReadColumn<false,DoubleType>/-1/20  12.151 GiB/sec  11.520 GiB/sec    -5.190                                      {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<false,DoubleType>/-1/20', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 99}
                                              BM_ReadStructOfListColumn/0 835.554 MiB/sec 790.684 MiB/sec    -5.370                                               {'family_index': 20, 'per_family_instance_index': 0, 'run_name': 'BM_ReadStructOfListColumn/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 50}
                                       BM_ReadColumn<true,Int64Type>/99/0   4.479 GiB/sec   4.226 GiB/sec    -5.661                                        {'family_index': 3, 'per_family_instance_index': 12, 'run_name': 'BM_ReadColumn<true,Int64Type>/99/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 40}
                    BM_ReadColumnPlain<true,Int32Type>/null_probability:1   2.104 GiB/sec   1.984 GiB/sec    -5.717                      {'family_index': 9, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnPlain<true,Int32Type>/null_probability:1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 37}
                                                    BM_ReadStructColumn/1   1.942 GiB/sec   1.830 GiB/sec    -5.763                                                    {'family_index': 18, 'per_family_instance_index': 1, 'run_name': 'BM_ReadStructColumn/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 115}
                                      BM_ReadColumn<true,Int64Type>/99/50   4.444 GiB/sec   4.183 GiB/sec    -5.878                                       {'family_index': 3, 'per_family_instance_index': 11, 'run_name': 'BM_ReadColumn<true,Int64Type>/99/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 40}
                                                      BM_ReadListColumn/1   1.087 GiB/sec   1.023 GiB/sec    -5.903                                                       {'family_index': 21, 'per_family_instance_index': 1, 'run_name': 'BM_ReadListColumn/1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 94}
                                                BM_ReadListOfListColumn/0   1.009 GiB/sec 968.501 MiB/sec    -6.252                                                 {'family_index': 23, 'per_family_instance_index': 0, 'run_name': 'BM_ReadListOfListColumn/0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 90}
 BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:1   1.184 GiB/sec   1.109 GiB/sec    -6.347  {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 42}
BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:99   1.293 GiB/sec   1.208 GiB/sec    -6.597 {'family_index': 13, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumnByteStreamSplit<true,Float16LogicalType>/null_probability:99', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 46}
                                              BM_ReadStructOfListColumn/1 694.703 MiB/sec 648.480 MiB/sec    -6.654                                               {'family_index': 20, 'per_family_instance_index': 1, 'run_name': 'BM_ReadStructOfListColumn/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 43}
                                       BM_ReadColumn<true,Int32Type>/99/0   2.536 GiB/sec   2.357 GiB/sec    -7.061                                         {'family_index': 1, 'per_family_instance_index': 8, 'run_name': 'BM_ReadColumn<true,Int32Type>/99/0', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 45}
           BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1   1.181 GiB/sec   1.097 GiB/sec    -7.096            {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumnPlain<true,Float16LogicalType>/null_probability:1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 42}
                                              BM_ReadListOfStructColumn/0 990.568 MiB/sec 920.118 MiB/sec    -7.112                                               {'family_index': 22, 'per_family_instance_index': 0, 'run_name': 'BM_ReadListOfStructColumn/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 58}
                                      BM_ReadColumn<true,Int32Type>/99/50   2.531 GiB/sec   2.337 GiB/sec    -7.683                                        {'family_index': 1, 'per_family_instance_index': 7, 'run_name': 'BM_ReadColumn<true,Int32Type>/99/50', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 45}
                                        BM_ReadColumn<true,Int32Type>/1/1   2.133 GiB/sec   1.967 GiB/sec    -7.780                                          {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumn<true,Int32Type>/1/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 38}
                                                   BM_ReadStructColumn/99   3.581 GiB/sec   3.262 GiB/sec    -8.906                                                   {'family_index': 18, 'per_family_instance_index': 3, 'run_name': 'BM_ReadStructColumn/99', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 216}
                                            BM_ReadStructOfStructColumn/1   1.621 GiB/sec   1.476 GiB/sec    -8.945                                             {'family_index': 19, 'per_family_instance_index': 1, 'run_name': 'BM_ReadStructOfStructColumn/1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 49}
                                              BM_ReadListOfStructColumn/1 804.476 MiB/sec 731.014 MiB/sec    -9.132                                               {'family_index': 22, 'per_family_instance_index': 1, 'run_name': 'BM_ReadListOfStructColumn/1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 47}
                                           BM_ReadStructOfStructColumn/99   3.142 GiB/sec   2.838 GiB/sec    -9.691                                            {'family_index': 19, 'per_family_instance_index': 3, 'run_name': 'BM_ReadStructOfStructColumn/99', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 93}
                                        BM_ReadColumn<true,Int64Type>/5/5   2.330 GiB/sec   2.073 GiB/sec   -11.041                                          {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BM_ReadColumn<true,Int64Type>/5/5', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 21}
                                                      BM_ReadListColumn/0   1.300 GiB/sec   1.149 GiB/sec   -11.568                                                      {'family_index': 21, 'per_family_instance_index': 0, 'run_name': 'BM_ReadListColumn/0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 112}
                                     BM_ReadColumn<true,DoubleType>/10/50   2.022 GiB/sec   1.787 GiB/sec   -11.608                                       {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'BM_ReadColumn<true,DoubleType>/10/50', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 18}
                                       BM_ReadColumn<true,Int64Type>/10/5   2.035 GiB/sec   1.775 GiB/sec   -12.771                                         {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumn<true,Int64Type>/10/5', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 18}
                                      BM_ReadColumn<true,Int32Type>/10/10   1.086 GiB/sec 970.014 MiB/sec   -12.778                                        {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BM_ReadColumn<true,Int32Type>/10/10', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 20}

archery benchmark diff --suite-filter='parquet-column-reader' --repetitions=3
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (30)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                        benchmark        baseline       contender  change %                                                                                                                                                                                                                          counters
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7   8.384 GiB/sec   9.193 GiB/sec     9.644    {'family_index': 6, 'per_family_instance_index': 6, 'run_name': 'ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 388671}
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1   8.388 GiB/sec   9.153 GiB/sec     9.126    {'family_index': 6, 'per_family_instance_index': 4, 'run_name': 'ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 389485}
   ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1   8.549 GiB/sec   9.315 GiB/sec     8.955    {'family_index': 6, 'per_family_instance_index': 5, 'run_name': 'ReadLevels_BitPack/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 396627}
    ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024  13.626 GiB/sec  14.825 GiB/sec     8.801     {'family_index': 5, 'per_family_instance_index': 2, 'run_name': 'ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 636201}
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1   8.247 GiB/sec   8.848 GiB/sec     7.280    {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 377260}
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7   8.245 GiB/sec   8.836 GiB/sec     7.174    {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 382577}
ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024   8.236 GiB/sec   8.813 GiB/sec     7.005 {'family_index': 6, 'per_family_instance_index': 2, 'run_name': 'ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1024', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 382123}
   ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1   8.576 GiB/sec   9.031 GiB/sec     5.304    {'family_index': 6, 'per_family_instance_index': 3, 'run_name': 'ReadLevels_BitPack/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 398009}
     RecordReaderReadAndSkipRecords/Repetition:0/BatchSize:10/LevelsPerPage:80000   1.332 GiB/sec   1.355 GiB/sec     1.755         {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:0/BatchSize:10/LevelsPerPage:80000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 187}
   RecordReaderReadAndSkipRecords/Repetition:0/BatchSize:1000/LevelsPerPage:80000  45.353 GiB/sec  46.100 GiB/sec     1.647      {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:0/BatchSize:1000/LevelsPerPage:80000', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 6961}
       ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1   6.379 GiB/sec   6.482 GiB/sec     1.605        {'family_index': 5, 'per_family_instance_index': 4, 'run_name': 'ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 296390}
RecordReaderReadAndSkipRecords/Repetition:0/BatchSize:10000/LevelsPerPage:1000000  23.205 GiB/sec  23.371 GiB/sec     0.716    {'family_index': 4, 'per_family_instance_index': 2, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:0/BatchSize:10000/LevelsPerPage:1000000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 284}
                  RecordReaderReadRecords/Repetition:0/BatchSize:1000/ReadDense:0  38.378 GiB/sec  38.560 GiB/sec     0.474                     {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'RecordReaderReadRecords/Repetition:0/BatchSize:1000/ReadDense:0', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 5823}
     RecordReaderReadAndSkipRecords/Repetition:1/BatchSize:10/LevelsPerPage:80000 178.626 MiB/sec 179.468 MiB/sec     0.471          {'family_index': 4, 'per_family_instance_index': 3, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:1/BatchSize:10/LevelsPerPage:80000', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 49}
                              RecordReaderSkipRecords/Repetition:0/BatchSize:1000  56.072 GiB/sec  56.315 GiB/sec     0.434                                 {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'RecordReaderSkipRecords/Repetition:0/BatchSize:1000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 8144}
       ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1   6.118 GiB/sec   6.140 GiB/sec     0.366        {'family_index': 5, 'per_family_instance_index': 3, 'run_name': 'ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 287642}
RecordReaderReadAndSkipRecords/Repetition:1/BatchSize:10000/LevelsPerPage:1000000   1.079 GiB/sec   1.082 GiB/sec     0.244     {'family_index': 4, 'per_family_instance_index': 5, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:1/BatchSize:10000/LevelsPerPage:1000000', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 24}
                  RecordReaderReadRecords/Repetition:1/BatchSize:1000/ReadDense:0 595.972 MiB/sec 596.755 MiB/sec     0.131                      {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'RecordReaderReadRecords/Repetition:1/BatchSize:1000/ReadDense:0', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 160}
   RecordReaderReadAndSkipRecords/Repetition:1/BatchSize:1000/LevelsPerPage:80000   1.039 GiB/sec   1.040 GiB/sec     0.086       {'family_index': 4, 'per_family_instance_index': 4, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:1/BatchSize:1000/LevelsPerPage:80000', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 282}
                  RecordReaderReadRecords/Repetition:1/BatchSize:1000/ReadDense:1   3.190 GiB/sec   3.176 GiB/sec    -0.452                      {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'RecordReaderReadRecords/Repetition:1/BatchSize:1000/ReadDense:1', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 880}
                           ColumnReaderReadBatchInt32/Repetition:1/BatchSize:1000   4.001 GiB/sec   3.956 GiB/sec    -1.119                              {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'ColumnReaderReadBatchInt32/Repetition:1/BatchSize:1000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1102}
                  RecordReaderReadRecords/Repetition:0/BatchSize:1000/ReadDense:1  39.466 GiB/sec  38.949 GiB/sec    -1.308                     {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'RecordReaderReadRecords/Repetition:0/BatchSize:1000/ReadDense:1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 5826}
                                ColumnReaderSkipInt32/Repetition:0/BatchSize:1000  60.157 GiB/sec  59.317 GiB/sec    -1.396                                   {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'ColumnReaderSkipInt32/Repetition:0/BatchSize:1000', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 8911}
     RecordReaderReadAndSkipRecords/Repetition:2/BatchSize:10/LevelsPerPage:80000 164.960 MiB/sec 162.160 MiB/sec    -1.697          {'family_index': 4, 'per_family_instance_index': 6, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:2/BatchSize:10/LevelsPerPage:80000', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 42}
RecordReaderReadAndSkipRecords/Repetition:2/BatchSize:10000/LevelsPerPage:1000000 686.955 MiB/sec 673.895 MiB/sec    -1.901     {'family_index': 4, 'per_family_instance_index': 8, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:2/BatchSize:10000/LevelsPerPage:1000000', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 12}
    RecordReaderReadAndSkipRecords/Repetition:2/BatchSize:100/LevelsPerPage:80000 480.692 MiB/sec 468.740 MiB/sec    -2.486        {'family_index': 4, 'per_family_instance_index': 7, 'run_name': 'RecordReaderReadAndSkipRecords/Repetition:2/BatchSize:100/LevelsPerPage:80000', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 121}
                  RecordReaderReadRecords/Repetition:2/BatchSize:1000/ReadDense:0 418.474 MiB/sec 407.546 MiB/sec    -2.612                      {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'RecordReaderReadRecords/Repetition:2/BatchSize:1000/ReadDense:0', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 106}
       ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1   5.648 GiB/sec   5.497 GiB/sec    -2.679        {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 262791}
                              RecordReaderSkipRecords/Repetition:1/BatchSize:1000   4.059 GiB/sec   3.943 GiB/sec    -2.862                                 {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'RecordReaderSkipRecords/Repetition:1/BatchSize:1000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1119}
                                ColumnReaderSkipInt32/Repetition:1/BatchSize:1000   4.129 GiB/sec   4.010 GiB/sec    -2.872                                   {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'ColumnReaderSkipInt32/Repetition:1/BatchSize:1000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 1139}

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (8)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                 benchmark       baseline      contender  change %                                                                                                                                                                                                                   counters
ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1  7.188 GiB/sec  6.754 GiB/sec    -6.044 {'family_index': 5, 'per_family_instance_index': 5, 'run_name': 'ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:2048/LevelRepeatCount:1', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 331733}
                    ColumnReaderReadBatchInt32/Repetition:0/BatchSize:1000 64.035 GiB/sec 57.609 GiB/sec   -10.034                       {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'ColumnReaderReadBatchInt32/Repetition:0/BatchSize:1000', 'repetitions': 3, 'repetition_index': 2, 'threads': 1, 'iterations': 9709}
                       RecordReaderSkipRecords/Repetition:2/BatchSize:1000  1.416 GiB/sec  1.273 GiB/sec   -10.080                           {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'RecordReaderSkipRecords/Repetition:2/BatchSize:1000', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 362}
           RecordReaderReadRecords/Repetition:2/BatchSize:1000/ReadDense:1  1.381 GiB/sec  1.235 GiB/sec   -10.580               {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'RecordReaderReadRecords/Repetition:2/BatchSize:1000/ReadDense:1', 'repetitions': 3, 'repetition_index': 1, 'threads': 1, 'iterations': 357}
                    ColumnReaderReadBatchInt32/Repetition:2/BatchSize:1000  1.771 GiB/sec  1.563 GiB/sec   -11.750                        {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'ColumnReaderReadBatchInt32/Repetition:2/BatchSize:1000', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 450}
                         ColumnReaderSkipInt32/Repetition:2/BatchSize:1000  1.818 GiB/sec  1.592 GiB/sec   -12.410                             {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'ColumnReaderSkipInt32/Repetition:2/BatchSize:1000', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 469}
ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7  2.313 GiB/sec  1.613 GiB/sec   -30.257 {'family_index': 5, 'per_family_instance_index': 6, 'run_name': 'ReadLevels_Rle/MaxLevel:3/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 107433}
ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7  2.250 GiB/sec  1.438 GiB/sec   -36.066 {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'ReadLevels_Rle/MaxLevel:1/NumLevels:8096/BatchSize:1024/LevelRepeatCount:7', 'repetitions': 3, 'repetition_index': 0, 'threads': 1, 'iterations': 104425}

The new implementation is on par with the previous one on the parquet-arrow-reader benchmark: sometimes better, sometimes worse, possibly slightly worse on average.
For the parquet-column-reader benchmark, the worst-case regressions reach -30% to -36% (ReadLevels_Rle with LevelRepeatCount:7).

@AntoinePrv
Contributor Author

@pitrou this is ready for review

@AntoinePrv AntoinePrv marked this pull request as ready for review August 26, 2025 15:56
@AntoinePrv AntoinePrv requested a review from wgtmac as a code owner August 26, 2025 15:56
@AntoinePrv AntoinePrv force-pushed the rle branch 2 times, most recently from 026be9a to 0344842 Compare August 27, 2025 12:14
while ((v & 0xFFFFFFFFFFFFFF80ULL) != 0ULL) {
result &= PutAligned<uint8_t>(static_cast<uint8_t>((v & 0x7F) | 0x80), 1);
v >>= 7;
constexpr auto kMaxBytes = bit_util::MaxLEB128ByteLenFor<decltype(v)>;
Contributor

Suggested change
- constexpr auto kMaxBytes = bit_util::MaxLEB128ByteLenFor<decltype(v)>;
+ constexpr auto kMaxBytes = kMaxVlqByteLengthForInt64;

Contributor Author

That's a member of the reader that was kept for compatibility.


// Need to check if there are bits that would overflow the output.
// Also checks that there is no continuation.
if (ARROW_PREDICT_FALSE((byte & kHighForbiddenMask) != 0)) {
Contributor

@HuaHuaY HuaHuaY Aug 28, 2025

As far as I can tell, because the uint64_t is right-shifted, the high bit of the last byte is always 0 when the original BitWriter::PutVlqInt(uint64_t v) writes it. In which case would we need to treat the last byte specially in WriteLEB128 and ParseLeadingLEB128?

Contributor Author

Take a uint32: that is four bytes, and it can take up to five bytes in LEB128.
But not every five-byte LEB128 sequence fits in a uint32.
On the fifth byte, a possible 01111111 would be shifted left by 4*7=28 bits, so its top three payload bits would overflow the 32-bit output.
I added a test for that.

Writing does not have this issue.
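
The overflow case described above can be sketched as follows. This is a minimal illustration for `uint32_t` only; `EncodeLeb128U32` and `DecodeLeb128U32` are hypothetical names, not Arrow's `WriteLEB128` / `ParseLeadingLEB128`, and the mask value is derived here from the 4*7=28 arithmetic rather than taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helpers for illustration; these are not Arrow's actual
// WriteLEB128 / ParseLeadingLEB128 signatures.

// Writes `value` as unsigned LEB128 into `out` (room for 5 bytes) and returns
// the number of bytes written. The last byte only holds the bits left over
// after the shifts, so the writer can never emit out-of-range bits.
std::size_t EncodeLeb128U32(uint32_t value, uint8_t* out) {
  std::size_t n = 0;
  while (value >= 0x80) {
    out[n++] = static_cast<uint8_t>((value & 0x7F) | 0x80);
    value >>= 7;
  }
  out[n++] = static_cast<uint8_t>(value);
  return n;
}

// Reads an unsigned LEB128 value that must fit in 32 bits. Returns nullopt on
// truncated input or when the fifth byte carries bits that would overflow.
std::optional<uint32_t> DecodeLeb128U32(const uint8_t* data, std::size_t size) {
  constexpr std::size_t kMaxBytes = 5;  // ceil(32 / 7)
  // Fifth byte: only the low 32 - 4*7 = 4 bits fit. The three upper payload
  // bits and the continuation bit must all be zero.
  constexpr uint8_t kHighForbiddenMask = 0xF0;
  uint32_t value = 0;
  for (std::size_t i = 0; i < kMaxBytes && i < size; ++i) {
    const uint8_t byte = data[i];
    if (i == kMaxBytes - 1 && (byte & kHighForbiddenMask) != 0) {
      return std::nullopt;  // overflow, or continuation past the fifth byte
    }
    value |= static_cast<uint32_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) return value;  // last byte of the encoding
  }
  return std::nullopt;  // input ended while the continuation bit was set
}
```

A single mask test on the last byte covers both failure modes, which matches the "also checks that there is no continuation" remark in the reviewed code.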

@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Aug 28, 2025
@pitrou
Member

pitrou commented Sep 2, 2025

There is a fuzz regression failure which should probably be addressed @AntoinePrv : https://github.com/apache/arrow/actions/runs/17266367915/job/48999342089?pr=47294#step:7:6933

@pitrou
Member

pitrou commented Sep 2, 2025

I ran the benchmarks locally with an AMD Zen 2 CPU on Ubuntu 24.04:
https://gist.github.com/pitrou/9539520b5f7850b9cb6ffd25341fdeb0

There are a couple significant speedups and a couple significant regressions, but the overall picture is rather reassuring: this refactoring is globally neutral performance-wise.

Member

@pitrou pitrou left a comment

This is a partial code review, I'll finish tomorrow.

out += read;

// Stop reading and store remaining decoder
if (ARROW_PREDICT_FALSE(values_read == batch_size || read == 0)) {
Member

Not sure about ARROW_PREDICT_FALSE either here. Its probability might depend on RLE stream and batching patterns in the upper layers.
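
To make the dependence on run and batch patterns concrete, here is a toy model of the loop. It is an assumption-laden sketch, not the decoder's actual code: `CountStops` and `StopStats` are hypothetical, and the model only counts how often a condition of the same shape as `values_read == batch_size || read == 0` fires for a given sequence of run lengths.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical counters for the toy model below.
struct StopStats {
  int64_t iterations = 0;  // times the per-run loop body executed
  int64_t stops = 0;       // times the "batch full or nothing read" branch fired
};

// Consumes runs of the given lengths in batches of `batch_size` (> 0) and
// counts how often the stop condition is true.
StopStats CountStops(const std::vector<int64_t>& run_lengths, int64_t batch_size) {
  StopStats stats;
  int64_t values_read = 0;
  for (int64_t remaining : run_lengths) {
    while (remaining > 0) {
      // Read as much of the current run as still fits in the batch.
      const int64_t read = std::min(remaining, batch_size - values_read);
      remaining -= read;
      values_read += read;
      ++stats.iterations;
      if (values_read == batch_size || read == 0) {
        ++stats.stops;
        values_read = 0;  // the caller starts a new batch
      }
    }
  }
  return stats;
}
```

With eight runs of 1024 values and a batch size of 1024, the branch is taken on every iteration in this model; with 1024 runs of 7 values it is taken 7 times out of 1030 iterations. A static hint is therefore right for one pattern and wrong for the other.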

Member

@pitrou pitrou left a comment

Sorry, we've been a bit carried away. This is looking excellent now. I've rebased on git main to get up-to-date CI results.

@pitrou pitrou merged commit 79f9764 into apache:main Sep 24, 2025
38 of 39 checks passed
@pitrou pitrou removed the awaiting committer review label Sep 24, 2025
@AntoinePrv AntoinePrv deleted the rle branch September 24, 2025 13:23
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 79f9764.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Oct 15, 2025
### Rationale for this change

### What changes are included in this PR?
New independent abstractions:
- A `BitPackedRun` to describe the encoded bytes in a bit packed run.
- A minimal `BitPackedDecoder` that can decode this type of run (no dict/spaced methods).
- A `RleRun` to describe the encoded value in a RLE run.
- A minimal `RleDecoder` that can decode this type of run (no dict/spaced methods).
- A `RleBitPackedParser` that read the encoded headers and emits different runs.

These new abstractions are then plugged into `RleBitPackedDecoder` (formerly `RleDecode`) to keep the compatibility with the rest of Arrow (improvements to using the parser independently can come in follow-up PR).

Misc changes:
- Separation of LEB128 reading/writing from `BitReader` into a free functions, and add check for a special case for handling undefined behavior overflow.

### Are these changes tested?
Yes, on top of the existing tests, many more unit tests have been added.

### Are there any user-facing changes?
API changes to internal classes.

* GitHub Issue: apache#47112

Authored-by: AntoinePrv <AntoinePrv@users.noreply.github.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
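
For readers unfamiliar with the two run kinds named in the commit message, the sketch below shows how a run header is interpreted in Parquet's RLE/bit-packed hybrid encoding: the header is a LEB128 varint whose lowest bit selects the run kind. `RunHeader` and `ParseRunHeader` are hypothetical names and do not reflect the actual interfaces of `RleBitPackedParser`, `RleRun`, or `BitPackedRun`; `bit_width` is assumed to be non-negative.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical description of one run; not Arrow's RleRun / BitPackedRun.
struct RunHeader {
  bool is_bit_packed;       // true: bit-packed run, false: RLE run
  uint64_t value_count;     // number of values the run decodes to
  std::size_t header_size;  // bytes taken by the LEB128 header
  uint64_t payload_size;    // bytes of run data following the header
};

// Parses the header at the start of `data` and checks that the payload it
// announces fits in the remaining bytes. Returns nullopt on malformed input.
std::optional<RunHeader> ParseRunHeader(const uint8_t* data, std::size_t size,
                                        int bit_width) {
  uint32_t header = 0;
  std::size_t n = 0;
  while (true) {
    if (n == size) return std::nullopt;  // truncated header
    const uint8_t byte = data[n];
    // On the fifth byte, only the low 4 bits fit in 32 bits.
    if (n == 4 && (byte & 0xF0) != 0) return std::nullopt;
    header |= static_cast<uint32_t>(byte & 0x7F) << (7 * n);
    ++n;
    if ((byte & 0x80) == 0) break;
  }
  RunHeader run{};
  run.header_size = n;
  run.is_bit_packed = (header & 1) != 0;
  const uint64_t count = header >> 1;
  if (run.is_bit_packed) {
    // `count` groups of 8 values, each value `bit_width` bits wide.
    run.value_count = count * 8;
    run.payload_size = count * static_cast<uint64_t>(bit_width);
  } else {
    // One repeated value stored in ceil(bit_width / 8) bytes.
    run.value_count = count;
    run.payload_size = static_cast<uint64_t>((bit_width + 7) / 8);
  }
  if (run.payload_size > size - n) return std::nullopt;  // truncated payload
  return run;
}
```

A parser of this shape only classifies and bounds-checks runs, leaving decoding to a separate step; that separation is the split between parser and decoders the commit message describes.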
@HuaHuaY
Contributor

HuaHuaY commented Mar 4, 2026

We're experiencing a performance regression with the Parquet reader in our internal program. We are still tracking down the regression and suspect it is related to this PR. I checked the full Conbench report, which mentions several unstable benchmarks, all seemingly related to the changes in this PR. However, the URLs in the report are no longer accessible.

@AntoinePrv
Contributor Author

@HuaHuaY it is possible that this PR (and some follow-up changes prompted by fuzzing) introduced light regressions in some cases, but it should not be much (let me know what you find).
The goal of this PR is a longer-term gain: we will be able to completely short-circuit the decoding in some cases, which should more than offset it.
More recently, GH-47994 should also be a gain.

@HuaHuaY
Contributor

HuaHuaY commented Mar 5, 2026

We tested using a Parquet file with data generated from SSB Flat. We found that the fewer distinct values a column has, the larger the performance regression.

  • parquet-scan-main is compiled from the latest commit of the main branch.
  • parquet-scan-old is compiled from 64f2055ffb68e5077420f4253e76d78952438cab which is the previous commit of this PR on the main branch.

Both are compiled in release mode with -DARROW_RUNTIME_SIMD_LEVEL=AVX2 cmake flag.

❯ env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LLVM_ROOT/lib/x86_64-unknown-linux-gnu ARROW_RUNTIME_SIMD_LEVEL=AVX2 hyperfine -w 5 -r 20 --sort mean-time "cpp/out/build/ninja-release/release/parquet-scan-main --columns=6 /dev/shm/TableSink0" "cpp/out/build/ninja-release/release/parquet-scan-old --columns=6 /dev/shm/TableSink0" 
Benchmark 1: cpp/out/build/ninja-release/release/parquet-scan-main --columns=6 /dev/shm/TableSink0
  Time (mean ± σ):      31.1 ms ±   0.3 ms    [User: 27.2 ms, System: 3.7 ms]
  Range (min … max):    30.7 ms …  31.8 ms    20 runs
 
Benchmark 2: cpp/out/build/ninja-release/release/parquet-scan-old --columns=6 /dev/shm/TableSink0
  Time (mean ± σ):      22.8 ms ±   0.4 ms    [User: 19.3 ms, System: 3.3 ms]
  Range (min … max):    22.2 ms …  23.3 ms    20 runs
 
Summary
  cpp/out/build/ninja-release/release/parquet-scan-old --columns=6 /dev/shm/TableSink0 ran
    1.37 ± 0.02 times faster than cpp/out/build/ninja-release/release/parquet-scan-main --columns=6 /dev/shm/TableSink0

❯ env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LLVM_ROOT/lib/x86_64-unknown-linux-gnu cpp/out/build/ninja-release/release/parquet-reader --only-metadata /dev/shm/TableSink0 
File Name: /dev/shm/TableSink0
Version: 2.6
Created By: cz-cpp version BuildInfo:GitBranch:release/20240820_rc8,GitVersion:73a4383,BuildTime:1725298762,CloudEnv:ALIYUN
Total rows: 6250733
Number of RowGroups: 1
Number of Real Columns: 40
Number of Columns: 40
Number of Selected Columns: 40
......
Column 6: lo_orderpriority (BYTE_ARRAY / String / UTF8)
......
--- Row Group: 0 ---
--- Total Bytes: 1016215218 ---
--- Total Compressed Bytes: 552911018 ---
--- Sort Columns:
column_idx: 5, descending: 0, nulls_first: 1
column_idx: 0, descending: 0, nulls_first: 1
--- Rows: 6250733 ---
......
Column 6
  Values: 6250733, Null Values: 0, Distinct Values: 5
  Max (exact: unknown): 5-LOW, Min (exact: unknown): 1-URGENT
  Compression: LZ4_RAW, Encodings: PLAIN(DICT_PAGE) RLE_DICTIONARY
  Uncompressed Size: 2267694, Compressed Size: 2092132
......
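For context on why the distinct-value count matters here: with RLE_DICTIONARY, each value is stored as a dictionary index, and the number of bits per index grows with the dictionary size. The sketch below is only an illustration, assuming the writer picks the minimal width; `DictIndexBitWidth` is a hypothetical helper, not an Arrow or Parquet API, and real writers may choose a wider width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Arrow): smallest bit width able to
// represent every dictionary index in [0, num_distinct - 1].
// Assumption: the writer uses the minimal width.
inline int DictIndexBitWidth(std::uint64_t num_distinct) {
  assert(num_distinct > 0);  // an empty dictionary has no valid index
  int width = 0;
  std::uint64_t max_index = num_distinct - 1;
  // Count how many bits are needed to write max_index in binary.
  while (max_index != 0) {
    ++width;
    max_index >>= 1;
  }
  return width;
}
```

Under that assumption, the 5 distinct values of Column 6 give `DictIndexBitWidth(5) == 3`, so the indices are decoded at a 3-bit width.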

@wgtmac
Member

wgtmac commented Mar 12, 2026

Update from @HuaHuaY's comment: we traced the performance regression to the API change of the internal unpack function introduced by #47994. The quick fix is to set UnpackOptions::max_read_bytes explicitly instead of relying on its default value of -1. @AntoinePrv @pitrou

@AntoinePrv
Contributor Author

The quick fix is to set UnpackOptions::max_read_bytes explicitly instead of relying on its default value of -1.

This should not be the case. Where do you see it being left at -1? Or do you mean in the unpack benchmarks themselves?

@wgtmac
Member

wgtmac commented Mar 12, 2026

That's in our internal codebase where the unpack function is called somewhere else. I made a mistake by not setting it explicitly :)

@HuaHuaY
Contributor

HuaHuaY commented Mar 13, 2026

Update from @HuaHuaY's comment: we traced the performance regression to the API change of the internal unpack function introduced by #47994. The quick fix is to set UnpackOptions::max_read_bytes explicitly instead of relying on its default value of -1. @AntoinePrv @pitrou

I tested with Arrow's parquet-scan target, and this argument (UnpackOptions::max_read_bytes) is already set to max_read_bytes_ - bytes_fully_read in cpp/src/arrow/util/rle_encoding_internal.h.
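To make the expression above concrete, here is a rough sketch of passing an explicit bound instead of the -1 default. Only the field name `max_read_bytes`, its -1 default, and the expression `max_read_bytes_ - bytes_fully_read` come from this thread; the struct and function names are hypothetical stand-ins, not Arrow's actual definitions, and reading the difference as "bytes still readable" is my interpretation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the options struct discussed above; only the
// field name and its -1 default are taken from the thread.
struct SketchUnpackOptions {
  std::int64_t max_read_bytes = -1;  // -1: the caller supplied no bound
};

// Hypothetical decoder state: total bytes available in the input buffer
// and how many of them have already been fully consumed.
struct SketchDecoderState {
  std::int64_t max_read_bytes_ = 0;
  std::int64_t bytes_fully_read = 0;
};

// Build options with an explicit bound: the bytes that remain readable.
inline SketchUnpackOptions MakeBoundedOptions(const SketchDecoderState& state) {
  assert(state.bytes_fully_read >= 0);
  assert(state.bytes_fully_read <= state.max_read_bytes_);
  SketchUnpackOptions options;
  options.max_read_bytes = state.max_read_bytes_ - state.bytes_fully_read;
  return options;
}
```

A caller that default-constructs the options without going through something like `MakeBoundedOptions` keeps the -1 default, which appears to be the situation described for the internal codebase above.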
