feat(transform): Add FormatFromZip Transform#221

Merged
jshlbrd merged 2 commits into main from jshlbrd/feat/tf-fmt-from-zip
Jul 30, 2024
@jshlbrd commented Jul 27, 2024

Description

  • Adds support for non-text (binary) files to all file reading apps
  • Adds FormatFromZip transform for unarchiving Zip files

Motivation and Context

This adds basic support for unarchiving Zip files (mentioned in #219). Most data processing systems don't work on archive files, so this doesn't add a complementary FormatToZip transform (that would require much more design work).
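For readers unfamiliar with what unarchiving involves, here is a minimal sketch using only Go's standard library. It is not the transform's actual code: `unzipAll` and `makeZip` are hypothetical helpers written for this illustration, and the sketch assumes the whole archive fits in memory.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// unzipAll is a hypothetical helper (not the FormatFromZip implementation).
// It reads a Zip archive held in memory and returns the contents of each
// regular file, keyed by the file's path inside the archive.
func unzipAll(data []byte) (map[string][]byte, error) {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}

	out := make(map[string][]byte, len(r.File))
	for _, f := range r.File {
		// Directory entries carry no data, so skip them.
		if f.FileInfo().IsDir() {
			continue
		}

		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out[f.Name] = b
	}

	return out, nil
}

// makeZip is a hypothetical helper that builds a small in-memory archive
// so the example can run without touching the filesystem.
func makeZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, body := range files {
		fw, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := fw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; the archive is invalid without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := makeZip(map[string]string{"a.txt": "hello", "b.bin": "\x00\x01"})
	if err != nil {
		panic(err)
	}

	files, err := unzipAll(data)
	if err != nil {
		panic(err)
	}
	for name, b := range files {
		fmt.Printf("%s: %d bytes\n", name, len(b))
	}
}
```

Note that one archive fans out into many files, each of which may be text or binary. That is part of why the reverse direction (many messages into one archive) would need more design work.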

The more important addition in this PR is support for non-text files -- in pre-v1.0 this behavior was configurable using an environment variable, but now it's dynamic based on media (file) type. This could go in two directions in the future:

  • Add more transforms like FormatFromZip (these can be configurable)
  • Add support for dynamic unarchiving (similar to existing decompression) (these cannot be configurable)

I'm inclined to keep the existing text support as-is (with decompression) and lean into adding more transforms -- the use cases for reading binary files are limited (most users are working with text files) and recursively unarchiving / decompressing files may become a challenge over time.
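To illustrate what "dynamic based on media type" can look like, here is a rough sketch using `http.DetectContentType` from the standard library. This is an assumption for illustration only; the file reading apps may detect media types differently, and `isText` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// isText is a hypothetical check (not the project's implementation).
// http.DetectContentType sniffs at most the first 512 bytes and falls
// back to "application/octet-stream" when nothing matches, so any
// result outside "text/*" is treated as binary here.
func isText(data []byte) bool {
	return strings.HasPrefix(http.DetectContentType(data), "text/")
}

func main() {
	// Plain text is sniffed as "text/plain; charset=utf-8".
	fmt.Println(isText([]byte("hello, world\n"))) // true

	// The Zip local file header signature is sniffed as "application/zip".
	fmt.Println(isText([]byte("PK\x03\x04rest of archive"))) // false
}
```

A reader built on a check like this could split text files into lines as before and pass binary files through whole, leaving it to a transform such as FormatFromZip to unpack them.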

How Has This Been Tested?

  • Added unit tests for the transform.
  • Added integration test (examples/config/transform/format/zip/)
  • This has been tested end to end in several production pipelines.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@jshlbrd marked this pull request as ready for review July 30, 2024 19:33
@jshlbrd requested a review from a team as a code owner July 30, 2024 19:33
@jshlbrd merged commit d9304ca into main Jul 30, 2024
@jshlbrd deleted the jshlbrd/feat/tf-fmt-from-zip branch July 30, 2024 20:44