Skip to content

WIP - Response to Issue #292: Pandas Pickle data reader and writer#303

Merged
skrawcz merged 14 commits intoapache:mainfrom
benhhack:bh-pandas-pickle-292
Sep 8, 2023
Merged

WIP - Response to Issue #292: Pandas Pickle data reader and writer#303
skrawcz merged 14 commits intoapache:mainfrom
benhhack:bh-pandas-pickle-292

Conversation

@benhhack
Copy link
Copy Markdown
Contributor

@benhhack benhhack commented Aug 26, 2023

Response to Issue #292 which required the implementation of classes for reading and writing Pickle files with Pandas

Changes

Implemented the PandasPickleReader and PandasPickleWriter classes in pandas_extensions.py

How I tested this

I looked for the polars tests and couldn't find them and am unsure how to test these classes, please provide some assistance.

Notes

Implementation closely followed the Pandas Pickle documentation and the example Polars classes provided

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented Aug 26, 2023

I looked for the polars tests and couldn't find them and am unsure how to test these classes, please provide some assistance.

Yep so good point 😅 .

Here's what I think:

  1. create a folder plugins under tests/.
  2. create a file test_pandas_extensions.py
  3. create a function (we use pytest) def test_pandas_pickle(tmp_path):, that uses the writer to write to a file in tmp_path (see pytest docs for tmp_path), and the the reader to read it. The test will exercise the code, and then validate that things are equivalent.
  4. create a folder pandas/materialization under examples. Write a simple example that uses the materializers here. My thought here is you can take the hello_world example's my_notebook.ipynb and adjust it to be more like this notebook that uses your new writer to pickle a dataframe.

How does that sound?

@benhhack
Copy link
Copy Markdown
Contributor Author

How robust would you like the testing?

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented Aug 27, 2023

How robust would you like the testing?

I think the test you have is fine; the PR is looking great!

How about point (4) above? I think all that we need now is an example of usage.

@benhhack
Copy link
Copy Markdown
Contributor Author

Should have the full request completed tomorrow/Wednesday. I'm a little more confused about point 4 so would appreciate feedback.

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented Aug 29, 2023

Should have the full request completed tomorrow/Wednesday. I'm a little more confused about point 4 so would appreciate feedback.

Sounds good. Yeah once you have something up, happy to comment.

@benhhack
Copy link
Copy Markdown
Contributor Author

benhhack commented Sep 1, 2023

Should have the full request completed tomorrow/Wednesday. I'm a little more confused about point 4 so would appreciate feedback.

Sounds good. Yeah once you have something up, happy to comment.

Sorry for the delay, I've put up a first attempt at the script for pandas/materializers. Was quite confused about what was going on in the materialization module so did it in a very basic way. Let me know if you have any pointers and I'll see if I can fix.

Copy link
Copy Markdown
Contributor

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!
For examples, we're trying to also include a notebook version of how to run things. So after my_script.py is done, if you could create notebook file that's equivalent to it, that would be great please!

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented Sep 7, 2023

@benhhack https://github.com/benhhack/hamilton/pull/1 shows the fix. If you merge it I think this PR should also update.

benhhack and others added 3 commits September 7, 2023 10:19
Fixes example to call things appropriately
Was missing an import, and removed duplicate code.
Copy link
Copy Markdown
Contributor

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm calling this complete! Good job @benhhack 🚀 . Do you have a twitter or linkedin we can use for social media purposes when we talk about the release this will go out in?

@skrawcz skrawcz merged commit 3488d5a into apache:main Sep 8, 2023
@benhhack
Copy link
Copy Markdown
Contributor Author

benhhack commented Sep 8, 2023

I'm calling this complete! Good job @benhhack 🚀 . Do you have a twitter or linkedin we can use for social media purposes when we talk about the release this will go out in?

Thanks @skrawcz ! I really appreciate the support and active communication. Do you have any feedback on my work style/code/anything you can think of that could help me improve?

For social media purposes, my LinkedIn is https://www.linkedin.com/in/benjaminhhack.

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented Sep 8, 2023

I really appreciate the support and active communication. Do you have any feedback on my work style/code/anything you can think of that could help me improve?

You're welcome! -- Nothing really of note. You did well with little instruction! Code passed our formatter, looked like our other code. You added a test for it. I tested it, it worked. I omitted asking you for a nice commit message/tidying up the commit history. So I can ask for that next time :)

Want to pick up more materializers? They should look pretty similar to what you just did. I can also think of a harder task.... 😀

@skrawcz
Copy link
Copy Markdown
Contributor

skrawcz commented Sep 8, 2023

You can see your contribution now showed up in our docs too --https://hamilton.dagworks.io/en/latest/reference/io/available-data-adapters/#data-savers

@benhhack
Copy link
Copy Markdown
Contributor Author

benhhack commented Sep 8, 2023

You're welcome! -- Nothing really of note. You did well with little instruction! Code passed our formatter, looked like our other code. You added a test for it. I tested it, it worked. I omitted asking you for a nice commit message/tidying up the commit history. So I can ask for that next time :)

Want to pick up more materializers? They should look pretty similar to what you just did. I can also think of a harder task.... 😀

Thanks for the feedback, always good to learn and grow. I'll get back to you re picking up other materialisers, am currently deep in the job hunt so I don't have too much spare time. Hopefully will chat again soon.

@bryangalindo
Copy link
Copy Markdown
Contributor

@benhhack hi. how did you decide which kwargs (e.g., compression, storage_options) to include in the Pandas pickle data reader/writer?

@benhhack
Copy link
Copy Markdown
Contributor Author

@benhhack hi. how did you decide which kwargs (e.g., compression, storage_options) to include in the Pandas pickle data reader/writer?

Hey, @bryangalindo , I used all of them! Or, at least, I gave the option for the user to use them all while setting defaults. I'm not really sure how this translates to practical use, you'd have to ask @skrawcz about that.

My process was to give a type hint for each of the kwargs depending on what was written in the pandas io docs. The typing library came in very handy here. I'd then just set the variables in the _get_saving_kwargs function which is called from save_data in the class.

Hope this helps, let me know if I can clarify anything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants