-
Notifications
You must be signed in to change notification settings - Fork 0
Implement reward model trained with supervised learning #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
by using flatten_trajectories_with_rew()
|
Hi @Rocamonde , Adam mentioned that you could help out with the engineering side on this project. I'm looking for feedback on the general engineering and whether there are any things that jump out as sub-optimal. |
|
Hi @PavelCz, thanks for your message. I am not familiar at all with this project, and there is some initial fixed cost for me before I am able to give detailed advice on PRs for it. (not even sure what the project does). I am still more than happy to take a look at your PR and point out any potential improvements, and will do so now, but you probably want someone who is an active contributor to take a second look. |
Rocamonde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments.
Co-authored-by: Juan Rocamonde <[email protected]>
Co-authored-by: Juan Rocamonde <[email protected]>
|
Hi @Rocamonde , I think I addressed all of your comments. |
Rocamonde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Minor suggestion. Also, seems like there's no CI checks in place, you might want to consider adding tests for these changes and running the CI on every commit, I haven't done this locally so don't know if tests are passing. Conditional on CI being okay, I think it should be fine to merge.
| ) | ||
|
|
||
| generator = None | ||
| if seed is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could optionally set the generator directly in your if-else above, but it's also fine this way.
|
Thanks a lot!
I will add automatic CI in a separate PR. |
|
I ran the existing tests. |
Implement a reward learning algorithm that trains with supervised learning on a dataset. This simple CNN learns to predict rewards from next_states. For simplicity we train on trajectories from expert policies.
Also some formatting changes