Add support for zstd-compression #1786
Merged
danielmitterdorfer merged 1 commit into elastic:master on Sep 27, 2023
With this commit we add support for zstd-compressed corpora. In our tests, the zstd format produced compressed files that were roughly 40% smaller than bzip and took around a third of the time to decompress.
Closes elastic#1781
pquentin (Member) approved these changes on Sep 27, 2023 and left a comment:
I haven't tested it but I assume you have and the code looks good to me.
I also checked that you're using the fastest python-zstandard API and the only one that supports reading across ZSTD frames. (This was an issue in urllib3 which had to use the standard library abstraction which is slower and where we need to pass read_across_frames=True explicitly!)
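To illustrate what "reading across ZSTD frames" means, here is a minimal sketch using python-zstandard's `stream_reader` with `read_across_frames=True`. It is not necessarily the API this PR uses; the helper name `read_all_frames` is hypothetical and exists only for this example. Without the flag, a reader of this kind stops at the end of the first frame, so a file made of several concatenated frames would be read only partially.

```python
import io


def read_all_frames(compressed: bytes) -> bytes:
    """Decompress a buffer that may hold several concatenated zstd frames.

    Hypothetical helper for illustration; requires the third-party
    python-zstandard package (imported lazily here).
    """
    import zstandard

    dctx = zstandard.ZstdDecompressor()
    # read_across_frames=True keeps reading after the first frame ends.
    with dctx.stream_reader(io.BytesIO(compressed), read_across_frames=True) as reader:
        return reader.read()
```

For example, two independently compressed chunks concatenated into one buffer should come back as a single payload.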
Author (Member) replied:
Thanks! Yes, I've tested both cases (with the library and the native binary). Also, the library I've picked seems to be the most mature of them.
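The two cases mentioned here (native binary and Python library) could be handled roughly as in the sketch below. This is an assumption-laden illustration, not the code from this PR: the helper names `pick_zstd_strategy` and `decompress_zst` are hypothetical, and preferring the binary over the library is an assumed order.

```python
import importlib.util
import shutil
import subprocess
from typing import Callable, Optional


def pick_zstd_strategy(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_library: Optional[Callable[[], bool]] = None,
) -> Optional[str]:
    """Return "binary", "library", or None (hypothetical helper)."""
    if has_library is None:
        has_library = lambda: importlib.util.find_spec("zstandard") is not None
    # Assumed preference: use the native zstd binary when it is on PATH.
    if which("zstd") is not None:
        return "binary"
    if has_library():
        return "library"
    return None


def decompress_zst(src: str, dst: str) -> str:
    """Decompress src into dst and report which strategy was used."""
    strategy = pick_zstd_strategy()
    if strategy == "binary":
        # -d: decompress, -f: overwrite existing output, -o: output path
        subprocess.run(["zstd", "-d", "-f", src, "-o", dst], check=True)
    elif strategy == "library":
        import zstandard  # third-party python-zstandard package

        with open(src, "rb") as ifh, open(dst, "wb") as ofh:
            zstandard.ZstdDecompressor().copy_stream(ifh, ofh)
    else:
        raise RuntimeError("neither the zstd binary nor python-zstandard is available")
    return strategy
```

The lookups are passed in as parameters so that both paths can be exercised in tests without installing or removing anything.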