Fix: decode JSON type before to_list or to_dict is called #8137

Open
ItsTania wants to merge 2 commits into huggingface:main from ItsTania:fix/accessing-json-types-from-list
ItsTania commented Apr 16, 2026

Motivation:
Commit d560b58 changed how JSON types are decoded. Since 4.7.0, to_list() returns raw JSON strings for columns stored as Json(), while direct access via `__getitem__` returns dicts.

In versions prior to 4.7.0, to_list() returned dicts.

Added context:

This PR contains:

  • An update to the to_list and to_dict functions of the Arrow dataset. It is inspired by a similar fix for another problem caused by the addition of the JSON type, and it reuses the utilities developed for that fix.
  • Unit tests specifying the intended outcome, which is how dictionaries were handled prior to v4.7.0.
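
For illustration, the intended behaviour can be sketched in plain Python without any `datasets` internals. Everything named here (`decode_json_columns`, `to_list_decoded`, the `json_columns` argument) is a hypothetical stand-in and not the utilities this PR actually uses:

```python
import json
from typing import Any


def decode_json_columns(row: dict[str, Any], json_columns: set[str]) -> dict[str, Any]:
    """Return a copy of `row` with JSON-string columns parsed into Python objects."""
    decoded = dict(row)  # shallow copy so the raw row is left untouched
    for name in json_columns:
        value = decoded.get(name)
        # Only raw strings are parsed; None and already-decoded values pass through.
        if isinstance(value, str):
            decoded[name] = json.loads(value)
    return decoded


def to_list_decoded(rows: list[dict[str, Any]], json_columns: set[str]) -> list[dict[str, Any]]:
    """Hypothetical to_list-style helper: decode JSON columns before returning rows."""
    return [decode_json_columns(row, json_columns) for row in rows]


# Rows as they might look when the JSON column is stored as raw strings.
raw_rows = [
    {"id": 0, "meta": '{"lang": "en", "tags": ["a", "b"]}'},
    {"id": 1, "meta": None},
]
decoded_rows = to_list_decoded(raw_rows, json_columns={"meta"})
```

With this sketch, `decoded_rows[0]["meta"]` is a dict rather than a string, which is the shape the unit tests expect from to_list().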

Additional questions/considerations:

  • This PR restores the behaviour from before the breaking change introduced by the addition of the JSON datatype in v4.7.0. However, it does mean we now decode one data type (JSON) and not others. We could decide to decode elements of the list in the same way _get_item does, but this would be a big change (e.g. memory concerns with images and audio).
  • I implemented the PR with a lazy import. Is that in line with the repo's design/vibe?
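
For reference, the lazy-import pattern in question looks roughly like the sketch below. The standard-library `json` module stands in for the real helper module, and `decode_cell` is a hypothetical name, not a function in this PR:

```python
def decode_cell(value):
    """Parse `value` as JSON if it is a string; otherwise return it unchanged."""
    # The import statement sits inside the function, so it only runs when the
    # function is called. This keeps module import time low and can sidestep
    # circular imports between modules.
    import json

    return json.loads(value) if isinstance(value, str) else value


result = decode_cell('{"a": 1}')
```

The trade-off is that a missing or broken dependency only surfaces at call time rather than at import time.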

ItsTania changed the title from "Fix/accessing json types from list" to "Fix: decode JSON type before to_list or to_dict is called" Apr 16, 2026