
Valid metadata results in "Invalid metadata$r" warning from read_feather() #32729

@asfimport

Description


I have some C# code that uses the Arrow 9.0.0 NuGet package to create record batches like this:

    Dictionary<string, string> metadata = new()
    {
        { "resourceUnit", "foo" },
        // other keys...
    };
    Schema schema = new(fields, metadata);

For some reason, using the key "resourceUnit" causes arrow::read_feather() in R to fail in .deserialize_arrow_r_metadata(), which triggers the warning

    Warning message:
    Invalid metadata$r

There are at least five issues here:

  1. .deserialize_arrow_r_metadata()'s error handler swallows the actual error, leaving the caller with no information about what is breaking.

  2. The error handler converts the error into a warning without giving the caller any control over that.

  3. It's unclear why there is a separate R metadata deserialization path when read_feather(as_data_frame = FALSE) deserializes the metadata into $metadata without issue.

  4. The warning is confusing because the deserialized fragment goes in $r_metadata, not $r.

  5. "resourceUnit" is a perfectly valid UTF-8 string and should deserialize without issue. Probing shows the "resource" part is the problem: if I change it to something like "_esourceUnit", no error or warning occurs on deserialization. I also have C# code generating other feather files with "resourceUnit" as the first metadata key, and those files deserialize in R without the error or warning. This suggests the root issue might be something like alignment fragility in the schema portion of the stream.
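Regarding point 5, one base-R behavior may be worth ruling out before alignment: the `$` operator partially matches names on lists, while `[[` matches exactly by default. The warning text suggests the R metadata is read from a key named "r". If that lookup is written as `metadata$r` (an assumption here, not checked against the arrow source), a metadata list with no exact "r" key but exactly one key starting with "r", such as "resourceUnit", would pass "foo" to the R metadata deserializer. The sketch below uses only base R and hypothetical stand-in lists; it does not call arrow.

```r
# Base-R illustration only: how `$` and `[[` look up a name in a list.
# These lists are hypothetical stand-ins for schema metadata; arrow is not used.

# No exact "r" key, and exactly one key that starts with "r".
metadata <- list(resourceUnit = "foo", other = "bar")

# `$` partially matches, so this returns the "resourceUnit" value, "foo".
via_dollar <- metadata$r

# `[[` uses exact = TRUE by default, so a missing "r" key gives NULL.
via_brackets <- metadata[["r"]]

# Renaming the key so it no longer starts with "r" removes the partial match.
renamed <- list(`_esourceUnit` = "foo", other = "bar")
via_dollar_renamed <- renamed$r

# Two keys starting with "r" make the partial match ambiguous, so `$` gives NULL.
ambiguous <- list(resourceUnit = "foo", region = "baz")
via_dollar_ambiguous <- ambiguous$r

print(via_dollar)
print(via_brackets)
print(via_dollar_renamed)
print(via_dollar_ambiguous)
```

If the lookup does work this way, it would fit the "_esourceUnit" observation, and it could also explain the other files that read cleanly if they contain a second key starting with "r" (or an exact "r" key). Setting `options(warnPartialMatchDollar = TRUE)` before calling read_feather() should report any partial `$` match that happens during the read.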

I can't share the file publicly, and the code hasn't been pushed to GitHub yet, but both should be available by the time someone is ready to look at this. Just bump the issue and let me know.

(I think this is a normal priority issue but normal isn't available in the priority dropdown.)

Reporter: Todd West

Note: This issue was originally created as ARROW-17466. Please see the migration documentation for further details.
