Is your feature request related to a problem? Please describe.
The s3 source includes two codecs in 1.5, and a new codec for CSV processing is coming in 2.0. These populate Events somewhat differently.
newline-delimited -> Each line is saved to the message key of the Event as a single string.
json -> The JSON is expanded into message. So, if the JSON has a key named sourceIp, it is populated in /message/sourceIp.
csv -> Each key is expanded directly into the root of the Event (/). Thus, if the CSV has a key named sourceIp, it is populated in /sourceIp.
Also, the s3 source adds two special keys to all Events: bucket and key. These indicate the S3 bucket and key, respectively, for the object. The s3 source populates these, not the codecs.
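The three current Event shapes can be sketched as follows. This is an illustration only; the parse_* function names are hypothetical and not Data Prepper's actual API.

```python
def parse_newline(line, bucket, key):
    # newline-delimited: the whole line is a single string under "message"
    return {"message": line, "bucket": bucket, "key": key}

def parse_json(record, bucket, key):
    # json: the parsed object is nested under "message",
    # so a JSON key sourceIp lands at /message/sourceIp
    return {"message": record, "bucket": bucket, "key": key}

def parse_csv(row, bucket, key):
    # csv: each column is expanded directly into the Event root,
    # so a CSV column sourceIp lands at /sourceIp
    return {**row, "bucket": bucket, "key": key}

json_event = parse_json({"sourceIp": "10.0.0.1"}, "my-bucket", "logs/a.json")
csv_event = parse_csv({"sourceIp": "10.0.0.1"}, "my-bucket", "logs/a.csv")
```

Note that the csv shape already risks a collision: a CSV column named bucket or key would clash with the keys the s3 source writes.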
Describe the solution you'd like
First, all codecs should put the data in the same place. Second, we should decide where that data should reside (/message or /). Third, the chosen location should avoid conflicting with the bucket and key keys.
One possible solution is to change the s3 source to save the bucket and key to a top-level object named s3. Then the codecs save to the root (/). This could lead to conflicts if the actual data has a column or field named s3. But, if we make this key configurable, then pipeline authors could potentially avoid this.
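A minimal sketch of the proposed layout, assuming a configurable top-level key with a default of s3 (the option and helper names here are hypothetical):

```python
def wrap_event(parsed, bucket, key, metadata_key="s3"):
    # Codec output goes to the Event root; the s3 source groups
    # bucket and key under a single configurable top-level object.
    event = dict(parsed)
    event[metadata_key] = {"bucket": bucket, "key": key}
    return event

event = wrap_event({"sourceIp": "10.0.0.1"}, "my-bucket", "logs/a.csv")
# /sourceIp and /s3/bucket now coexist

# A pipeline author whose data already contains an s3 field
# could pick a different key to avoid the clash:
safe = wrap_event({"s3": "data value"}, "my-bucket", "logs/a.csv",
                  metadata_key="aws_s3")
```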
Describe alternatives you've considered (Optional)
An alternative would be more robust support for Event metadata. The bucket and key could be saved as metadata. However, Data Prepper's conditional routing and processors don't support Event metadata presently.
Additional context