Create a model for binary data #4096

@dlvenable

Is your feature request related to a problem? Please describe.

Data Prepper has sources that can pull binary data (mostly in base64 format), and we are adding some new processors that can decompress binary data. It would be good to handle binary data consistently so that we don't have too much encoding and decoding code spread across the project, which could result in some processor combinations breaking a pipeline.

I'd like Data Prepper's sources and sinks to know their own encodings as much as possible.

Describe the solution you'd like

Create a new BinaryData model in data-prepper-api. Allow this to be set and retrieved from the Event model. This model can also be designed to avoid unnecessary encoding/decoding.

When a Data Prepper source gets binary data, it wraps it in the BinaryData model. Similarly, when writing to a sink, use that same model.
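To make the source-to-sink flow concrete, here is a rough sketch under some loud assumptions: a plain `Map` stands in for Data Prepper's Event, and `BinaryValue`, `BinaryFlowSketch`, `sourceEvent`, and `renderForTextSink` are hypothetical names invented for illustration. None of them exist in data-prepper-api.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed BinaryData model.
final class BinaryValue {
    private final byte[] bytes;

    BinaryValue(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    byte[] getBinaryData() {
        return bytes.clone();
    }
}

// Hypothetical helpers; a plain Map stands in for the Event model.
final class BinaryFlowSketch {
    // Source side: the source knows its payload arrived as base64,
    // so it decodes once and wraps the raw bytes.
    static Map<String, Object> sourceEvent(String base64Payload) {
        Map<String, Object> event = new LinkedHashMap<>();
        byte[] raw = Base64.getDecoder().decode(base64Payload);
        event.put("payload", new BinaryValue(raw));
        return event;
    }

    // Sink side: a text-oriented sink chooses its own encoding
    // when it encounters a binary value.
    static String renderForTextSink(Object value) {
        if (value instanceof BinaryValue) {
            byte[] raw = ((BinaryValue) value).getBinaryData();
            return Base64.getEncoder().encodeToString(raw);
        }
        return String.valueOf(value);
    }
}
```

Processors in between would only ever see the wrapped bytes, so none of them would need to guess at the encoding.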

There are some situations where the source cannot know the encoding. For example, JSON could have binary data encoded as base64 or in some other text encoding. In such cases, the pipeline author will need to know the encoding and convert it accordingly.

class BinaryData {
  public byte[] getBinaryData() { ... }

  public static BinaryData fromBase64Data(String base64) { ... }
}
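For illustration, a minimal runnable sketch of the class above. It assumes the model simply holds raw bytes and lazily caches the base64 form so that repeated reads avoid re-encoding. The `fromBytes` factory and `toBase64Data` accessor are hypothetical additions and not part of the proposal.

```java
import java.util.Arrays;
import java.util.Base64;

// Illustrative sketch only; not the data-prepper-api implementation.
final class BinaryData {
    private final byte[] bytes;
    // Cached base64 text; null until supplied or first requested
    // (assumption: caching is how unnecessary encoding is avoided).
    private String base64;

    private BinaryData(byte[] bytes, String base64) {
        this.bytes = bytes;
        this.base64 = base64;
    }

    // Hypothetical factory for sources that already hold raw bytes.
    public static BinaryData fromBytes(byte[] bytes) {
        return new BinaryData(Arrays.copyOf(bytes, bytes.length), null);
    }

    public static BinaryData fromBase64Data(String base64) {
        // Throws IllegalArgumentException if the input is not valid base64.
        byte[] decoded = Base64.getDecoder().decode(base64);
        // Keep the supplied text so it never has to be re-encoded.
        return new BinaryData(decoded, base64);
    }

    public byte[] getBinaryData() {
        // Defensive copy so callers cannot mutate the stored bytes.
        return Arrays.copyOf(bytes, bytes.length);
    }

    // Hypothetical accessor: encodes at most once, then reuses the cached string.
    public String toBase64Data() {
        if (base64 == null) {
            base64 = Base64.getEncoder().encodeToString(bytes);
        }
        return base64;
    }
}
```

With this shape, a value that arrives as base64 and leaves as base64 is decoded once and never re-encoded.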

There may also be a good way to decouple the binary data from the encoding itself.
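One possible way to do that decoupling, sketched here purely as an assumption: keep the bytes in one type and describe each text encoding in a separate strategy type. `BinaryEncoding`, `RawBinaryData`, `decode`, and `encodeAs` are hypothetical names; only the `java.util.Base64` calls are real JDK API.

```java
import java.util.Base64;

// Hypothetical encoding strategy, kept separate from the data holder.
enum BinaryEncoding {
    BASE64(Base64.getDecoder(), Base64.getEncoder()),
    BASE64_URL(Base64.getUrlDecoder(), Base64.getUrlEncoder());

    private final Base64.Decoder decoder;
    private final Base64.Encoder encoder;

    BinaryEncoding(Base64.Decoder decoder, Base64.Encoder encoder) {
        this.decoder = decoder;
        this.encoder = encoder;
    }

    byte[] decode(String text) {
        return decoder.decode(text);
    }

    String encode(byte[] bytes) {
        return encoder.encodeToString(bytes);
    }
}

// Hypothetical holder that knows nothing about any particular encoding.
final class RawBinaryData {
    private final byte[] bytes;

    private RawBinaryData(byte[] bytes) {
        this.bytes = bytes;
    }

    // The caller (a source, or a pipeline author's configuration) names the encoding.
    static RawBinaryData decode(String text, BinaryEncoding encoding) {
        return new RawBinaryData(encoding.decode(text));
    }

    String encodeAs(BinaryEncoding encoding) {
        return encoding.encode(bytes);
    }

    byte[] getBinaryData() {
        return bytes.clone();
    }
}
```

Supporting another encoding would then mean adding a strategy rather than changing the holder, and a pipeline author who knows the encoding of a JSON field could name it explicitly.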

Describe alternatives you've considered (Optional)

There may be some useful third-party libraries with a similar solution that we could make use of. Even so, I'd still propose that we keep our own interface and use the library only for the internals.

Additional context

Coming from this comment: #4016 (comment)


Labels: enhancement (New feature or request)
