Skip to content

Conversation

@PavelCz
Copy link
Member

@PavelCz PavelCz commented Aug 25, 2022

Implement a reward learning algorithm that trains with supervised learning on a dataset. This simple CNN learns to predict rewards from next_states. For simplicity we train on trajectories from expert policies.

Also some formatting changes

@PavelCz
Copy link
Member Author

PavelCz commented Aug 25, 2022

Hi @Rocamonde , Adam mentioned that you could help out with the engineering side on this project.
This repo is a fork of reward-preprocessing which is heavily based on imitation.
When I'm developing things I'm also trying to keep things as compatible as possible with imitation.

I'm looking for feedback on the general engineering and whether there are any things that jump out as sub-optimal.

@PavelCz PavelCz requested a review from AdamGleave August 25, 2022 02:04
@PavelCz PavelCz requested a review from Rocamonde August 25, 2022 02:04
@Rocamonde
Copy link
Member

Hi @PavelCz, thanks for your message. I am not familiar at all with this project, and there is some initial fixed cost for me before I am able to give detailed advice on PRs for it. (not even sure what the project does). I am still more than happy to take a look at your PR and point out any potential improvements, and will do so now, but you probably want someone who is an active contributor to take a second look.

Copy link
Member

@Rocamonde Rocamonde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments.

@PavelCz PavelCz requested a review from Rocamonde September 2, 2022 20:20
@PavelCz
Copy link
Member Author

PavelCz commented Sep 2, 2022

Hi @Rocamonde , I think I addressed all of your comments.
The idea is for the project to most likely make use of Erik's reward-preprocessing in the future, that's why it is a fork of his project. However, at this point there is basically no interaction with the already existing code, so anything that I am interested in for now is really in this PR and this PR is basically the initial commit.
I did use both imitation and this repo as inspiration for how to engineer this code

Copy link
Member

@Rocamonde Rocamonde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor suggestion. Also, seems like there's no CI checks in place, you might want to consider adding tests for these changes and running the CI on every commit, I haven't done this locally so don't know if tests are passing. Conditional on CI being okay, I think it should be fine to merge.

)

generator = None
if seed is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could optionally set the generator directly in your if-else above, but it's also fine this way.

@PavelCz
Copy link
Member Author

PavelCz commented Sep 4, 2022

Thanks a lot!
Todos for this PR:

  • Add tests for new code
  • Run tests

I will add automatic CI in a separate PR.

@PavelCz PavelCz mentioned this pull request Sep 15, 2022
@PavelCz
Copy link
Member Author

PavelCz commented Sep 15, 2022

I ran the existing tests.
I'll go ahead and merge so we're not blocked too long on this PR. I'll add tests in a separate PR (see issue #6).

@PavelCz PavelCz merged commit 96aa262 into main Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants