faster duplicate_overlapping #69

Merged
PSeitz merged 1 commit into main from the_real_main
Jan 31, 2023

Conversation

@PSeitz PSeitz commented Jan 31, 2023

Improve the unsafe version of duplicate_overlapping. The compiler generates unfavourable assembly for the simple version.
Now we copy 4 bytes per iteration instead of one.
Without that, the compiler unrolls/auto-vectorizes the copy with a lot of branches.
This is not what we want, as large overlapping copies are not that common.
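
For readers unfamiliar with what an overlapping duplicate has to do, here is a minimal safe sketch of the "simple version" semantics. It is not the crate's code: the name `duplicate_overlapping_simple`, the `Vec<u8>` output and the `offset`/`match_len` parameters are assumptions made for illustration. The point is that each byte must be written before a later byte reads it, which is why the copy cannot be replaced by one wide block copy.

```rust
/// Hypothetical helper (not the actual function from this PR).
/// Appends `match_len` bytes to `output`, reading from `offset` bytes before
/// the current end. When `offset < match_len` the source and destination
/// ranges overlap, so already-written bytes are read again and the pattern
/// repeats.
fn duplicate_overlapping_simple(output: &mut Vec<u8>, offset: usize, match_len: usize) {
    // The offset must point at data that already exists in the output.
    assert!(offset >= 1 && offset <= output.len(), "offset out of range");
    output.reserve(match_len);

    let mut src = output.len() - offset;
    for _ in 0..match_len {
        // One byte per iteration: the byte just pushed may be the source of
        // a later iteration.
        let byte = output[src];
        output.push(byte);
        src += 1;
    }
}
```

Under these assumptions, starting from `ab` with `offset = 2` and `match_len = 5` yields `abababa`. Copying several bytes per iteration, as this PR does, has to preserve exactly this result and must also stay inside the destination buffer.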
@PSeitz PSeitz merged commit febf558 into main Jan 31, 2023
PSeitz added a commit that referenced this pull request Apr 30, 2023
fixes checked decode checks
revert #69 as this leads to out of bounds writes
@PSeitz PSeitz mentioned this pull request Apr 30, 2023
PSeitz added a commit that referenced this pull request May 27, 2023
This is another attempt to replace the compiler's aggressive unrolling after the
failed attempt #69 (which wrote out of bounds in some cases).

The compiler's unrolling is avoided by manually unrolling less aggressively.
Decompression performance improves slightly, by ca. 4%, except in the
smallest test case.
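
A rough sketch of what "manually unrolling less aggressively" can look like, written in safe Rust for illustration only. The function name, the `Vec<u8>` output and the unroll factor of two are assumptions, not taken from the referenced commit, and the safe form makes no claim about the generated assembly. It only shows the shape: two single-byte steps per iteration plus one trailing step, so the loop never writes more than `match_len` bytes.

```rust
/// Hypothetical helper (not the actual code from the referenced commit).
/// Same result as a byte-by-byte overlapping copy, with the loop body
/// unrolled by hand to two single-byte steps.
fn duplicate_overlapping_unrolled(output: &mut Vec<u8>, offset: usize, match_len: usize) {
    // The offset must point at data that already exists in the output.
    assert!(offset >= 1 && offset <= output.len(), "offset out of range");
    output.reserve(match_len);

    let mut src = output.len() - offset;
    let mut remaining = match_len;

    // Two bytes per iteration, each still copied individually so that an
    // offset of 1 can read the byte written by the previous step.
    while remaining >= 2 {
        let first = output[src];
        output.push(first);
        let second = output[src + 1];
        output.push(second);
        src += 2;
        remaining -= 2;
    }

    // Odd length: copy the final byte on its own instead of overshooting.
    if remaining == 1 {
        let last = output[src];
        output.push(last);
    }
}
```

The trailing single-byte step is what keeps an odd `match_len` from writing one byte too many, which is the kind of out-of-bounds write a wider fixed-size copy risks.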