feat(transform): Add FormatFromParquet Transform#302
feat(transform): Add FormatFromParquet Transform#302andrew-kline merged 1 commit intobrexhq:mainfrom
Conversation
| rows := make([]any, reader.NumRows()) | ||
| for { | ||
| if n, err := reader.Read(rows); err != nil { | ||
| if err.Error() == "EOF" || n == 0 { |
There was a problem hiding this comment.
If err is not nil and not EOF, should it break or return an error?
There was a problem hiding this comment.
Currently this returns the error (line 66) if the error is not EOF or skips the file if it's empty (mimics the behavior here).
| reader := parquet.NewGenericReader[any](fi) | ||
| defer reader.Close() | ||
|
|
||
| rows := make([]any, reader.NumRows()) |
There was a problem hiding this comment.
Does it make sense to add a batchSize here to be careful with larger files and memory issues (or with scalability in mind)?
The Read() function seems like it can accept a chunk size rather than processing all rows together 🤔
https://deepwiki.com/parquet-go/parquet-go/2-reading-parquet-files?utm_source=chatgpt.com#:~:text=Person%2C%20100)-,//%20Read%20in%20batches,-for%20%7B%0A%20%20%20%20n
There was a problem hiding this comment.
It's the same here as other file readers -- there's no protection from reading large files. I haven't observed this as an issue in production deployments but that issue could exist in multiple places throughout the source code, so it may be worth a refactor later if there's interest in it.
Mallika05
left a comment
There was a problem hiding this comment.
Added a couple questions to clarify!
Description
Motivation and Context
This is similar to the formatFromZip transform in that a Parquet file is made up of rows which are converted to Substation's internal message format. This works out of the box with existing applications since Parquet is not a text-based file format.
How Has This Been Tested?
Added unit tests and an example config.
Types of changes
Checklist: