feat(transform): Add FormatFromParquet Transform by jshlbrd · Pull Request #302 · brexhq/substation

jshlbrd · 2025-06-17T14:20:47Z

Description

Adds a transform for reading Parquet files into messages.

Motivation and Context

This is similar to the formatFromZip transform in that a Parquet file is made up of rows, which are converted to Substation's internal message format. This works out of the box with existing applications since Parquet is not a text-based file format.

How Has This Been Tested?

Added unit tests and an example config.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.

Mallika05 · 2025-06-30T17:28:03Z

transform/format_from_parquet.go

+	rows := make([]any, reader.NumRows())
+	for {
+		if n, err := reader.Read(rows); err != nil {
+			if err.Error() == "EOF" || n == 0 {


If err is not nil and not EOF, should it break or return an error?

Currently, this returns the error (line 66) if it is not EOF, or skips the file if it is empty (this mimics the behavior here).
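
For readers following the thread, here is a minimal sketch of the behavior described in the reply; it is not the code from this PR. It assumes a reader whose Read returns io.EOF at the end of input, and it swaps the string comparison from the diff for errors.Is, which also matches wrapped errors. The names rowReader, sliceReader, errCorrupt, and readRows are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errCorrupt is a hypothetical non-EOF failure used to exercise the error path.
var errCorrupt = errors.New("corrupt row group")

// rowReader is a hypothetical stand-in for the Parquet reader in the diff.
type rowReader interface {
	Read(rows []any) (int, error)
}

// sliceReader is an in-memory rowReader used only for this sketch.
// Once its data is exhausted it returns err, or io.EOF if err is nil.
type sliceReader struct {
	data []any
	err  error
}

func (s *sliceReader) Read(rows []any) (int, error) {
	n := copy(rows, s.data)
	s.data = s.data[n:]
	if len(s.data) == 0 {
		if s.err != nil {
			return n, s.err
		}
		return n, io.EOF
	}
	return n, nil
}

// readRows collects rows until the reader is exhausted. io.EOF ends the loop
// normally (an empty input yields zero rows and no error); any other error
// is returned to the caller.
func readRows(r rowReader, size int) ([]any, error) {
	buf := make([]any, size)
	var out []any
	for {
		n, err := r.Read(buf)
		out = append(out, buf[:n]...)
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if n == 0 {
			// Defensive: stop instead of spinning if no progress is made.
			return out, nil
		}
	}
}

func main() {
	rows, err := readRows(&sliceReader{data: []any{"a", "b"}}, 2)
	fmt.Println(len(rows), err)

	_, err = readRows(&sliceReader{data: []any{"a"}, err: errCorrupt}, 1)
	fmt.Println(err)
}
```

Comparing against the io.EOF sentinel instead of the string "EOF" keeps the check working if the underlying library ever wraps the error.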

Mallika05 · 2025-06-30T17:51:08Z

transform/format_from_parquet.go

+	reader := parquet.NewGenericReader[any](fi)
+	defer reader.Close()
+
+	rows := make([]any, reader.NumRows())


Does it make sense to add a batchSize here to be careful with larger files and memory issues (or with scalability in mind)?
The Read() function seems like it can accept a chunk size rather than processing all rows together 🤔
https://deepwiki.com/parquet-go/parquet-go/2-reading-parquet-files?utm_source=chatgpt.com#:~:text=Person%2C%20100)-,//%20Read%20in%20batches,-for%20%7B%0A%20%20%20%20n

It's the same here as other file readers -- there's no protection from reading large files. I haven't observed this as an issue in production deployments but that issue could exist in multiple places throughout the source code, so it may be worth a refactor later if there's interest in it.
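
As a rough sketch of the batching idea raised in this thread (not something this PR implements), a fixed-size buffer can be reused across Read calls so that memory is bounded by the batch size and not by the number of rows in the file. The sketch assumes a reader with a Read(rows) (int, error) method that returns io.EOF when no rows remain; batchReader, memReader, and forEachBatch are hypothetical names, and the batch size is an arbitrary illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// batchReader is a hypothetical stand-in for a Parquet row reader.
type batchReader interface {
	Read(rows []any) (int, error)
}

// memReader is an in-memory batchReader used only for this sketch.
type memReader struct{ rows []any }

func (m *memReader) Read(dst []any) (int, error) {
	n := copy(dst, m.rows)
	m.rows = m.rows[n:]
	if len(m.rows) == 0 {
		return n, io.EOF
	}
	return n, nil
}

// forEachBatch reads at most batchSize rows per call and passes each batch
// to fn. The buffer is reused between calls, so fn must not retain the slice.
func forEachBatch(r batchReader, batchSize int, fn func(batch []any) error) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}
	buf := make([]any, batchSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			// Process rows before checking err: a reader may return
			// the final rows together with io.EOF.
			if ferr := fn(buf[:n]); ferr != nil {
				return ferr
			}
		}
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		if n == 0 {
			return io.ErrNoProgress
		}
	}
}

func main() {
	r := &memReader{rows: []any{"a", "b", "c", "d", "e"}}
	err := forEachBatch(r, 2, func(batch []any) error {
		fmt.Println(len(batch), batch)
		return nil
	})
	fmt.Println("err:", err)
}
```

The trade-off is that downstream code has to accept rows incrementally, which is part of why the reply above frames this as a possible later refactor across all file readers.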

Mallika05

Added a couple questions to clarify!

feat(transform): Add FormatFromParquet Transform

a2a5327

jshlbrd marked this pull request as ready for review June 23, 2025 15:44

jshlbrd requested a review from a team as a code owner June 23, 2025 15:44

Mallika05 reviewed Jun 30, 2025

Mallika05 reviewed Jul 2, 2025

andrew-kline approved these changes Jan 3, 2026

andrew-kline merged commit 64928e1 into brexhq:main Jan 3, 2026
4 checks passed

github-actions bot mentioned this pull request Jan 3, 2026

chore(main): release 2.8.0 #314

Merged

andrew-kline merged 1 commit into brexhq:main from jshlbrd:jshlbrd/feat/fmt-from-parquet

jshlbrd commented Aug 3, 2025