Skip to content

feat: Customizable Sink Files#93

Merged
jshlbrd merged 18 commits intomainfrom
jshlbrd/s3-path
Apr 18, 2023
Merged

feat: Customizable Sink Files#93
jshlbrd merged 18 commits intomainfrom
jshlbrd/s3-path

Conversation

@jshlbrd
Copy link
Contributor

@jshlbrd jshlbrd commented Apr 13, 2023

Description

  • Adds support for dynamic file formats, compression, and paths
  • Adds a file sink to write data to local disk
  • Updates the AWS S3 sink to support new settings (backcompat with current version)
  • Adds file and http sink support to the force-sink flag in the dev app

Motivation and Context

Motivation and context is described in #91. Here's the impact of this PR:

  • Backwards compatible with current version (non-breaking change), but replaces existing configurations
    • Existing file path configs in the AWS S3 sink will be removed in v1.0.0 (whenever that is)
  • By default the behavior of the current version is preserved in this change
    • Default file paths are [year]/[month]/[day]/[uuid].gz, file format is JSON, file compression is Gzip
  • Every configuration in the custom file path is optional, which will likely lead to unintentional outcomes if users aren't careful
    • Most likely outcome is accidentally overwriting previously written data
    • Not clear how much responsibility the code should take to keep users safe from themselves -- the safest thing to do is always use the default settings (an empty file path config)
  • Multiple settings that were hard-coded in the AWS S3 sink were abstracted into a private struct called fw (file wrapper), this provides support for additional custom file format and compression

The file sink was added to test the usefulness of abstracting settings into a reusable struct versus keeping it only for the AWS S3 sink. In my opinion it works fine, and can be reused for other cloud service providers in the future.

How Has This Been Tested?

Locally integration tested all filePath config options using AWS S3 and local files.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@jshlbrd jshlbrd changed the title feat: Customizable Sink File Paths feat: Customizable Sink Files Apr 13, 2023
@jshlbrd jshlbrd marked this pull request as ready for review April 13, 2023 18:41
@jshlbrd jshlbrd requested a review from a team as a code owner April 13, 2023 18:41
Copy link
Contributor

@julieagnessparks julieagnessparks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢

//
// If the struct is empty, then this returns an empty string. The caller is
// responsible for creating a default file path if needed.
func (p filePath) New() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test: could we add a test case for this logic that shows, verifies it can match selective data processing patterns/requirements? (like the previous default, AWS Glue, etc)

Copy link
Contributor Author

@jshlbrd jshlbrd Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a test that covers some functionality. It's worth mentioning that it can't catch every variation of the struct due to the use of UUID and time.

@jshlbrd jshlbrd requested a review from shellcromancer April 17, 2023 20:21
@jshlbrd jshlbrd merged commit bee2463 into main Apr 18, 2023
@jshlbrd jshlbrd deleted the jshlbrd/s3-path branch April 18, 2023 15:42
@jshlbrd jshlbrd mentioned this pull request Apr 22, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants