Implement reward model trained with supervised learning #5

PavelCz · 2022-08-25T01:56:30Z

Implement a reward learning algorithm that trains with supervised learning on a dataset. This simple CNN learns to predict rewards from next_states. For simplicity we train on trajectories from expert policies.

Also some formatting changes
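The description can be made concrete with a minimal sketch of the data flow. Each reward is paired with the observation that follows it, which is the `next_obs`/`rews` pairing that imitation's `flatten_trajectories_with_rew()` provides. A regressor is then fit with a mean-squared-error loss. Assumptions: the `Trajectory` class and both helper functions are hypothetical, observations are flat float lists, and a linear model trained by full-batch gradient descent stands in for the PR's CNN so the sketch runs with only the standard library.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Obs = List[float]


@dataclass
class Trajectory:
    """Hypothetical stand-in for an imitation trajectory with rewards.

    ``obs`` has one more entry than ``rews`` because it includes the final observation.
    """

    obs: List[Obs]
    rews: List[float]


def next_obs_reward_pairs(trajs: Sequence[Trajectory]) -> List[Tuple[Obs, float]]:
    """Flatten trajectories into (next_obs, reward) regression examples."""
    pairs: List[Tuple[Obs, float]] = []
    for traj in trajs:
        assert len(traj.obs) == len(traj.rews) + 1
        for t, rew in enumerate(traj.rews):
            # rews[t] is received on the transition into obs[t + 1].
            pairs.append((traj.obs[t + 1], rew))
    return pairs


def fit_linear_reward(
    pairs: Sequence[Tuple[Obs, float]], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float, List[float]]:
    """Full-batch gradient descent on mean squared error (linear model, not a CNN)."""
    dim = len(pairs[0][0])
    n = len(pairs)
    w, b = [0.0] * dim, 0.0
    losses: List[float] = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for x, y in pairs:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            loss += err * err
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i]
            grad_b += 2.0 * err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        # Record the epoch's mean loss as a plain Python float.
        losses.append(loss / n)
    return w, b, losses


# Toy data where reward = 2 * next_obs[0] + 1.
traj = Trajectory(obs=[[0.0], [0.5], [1.0], [0.25]], rews=[2.0, 3.0, 1.5])
w, b, losses = fit_linear_reward(next_obs_reward_pairs([traj]))
```

Note that the first observation of each trajectory never appears as an input, because the model only ever sees states that a reward was received on entering.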


PavelCz · 2022-08-25T02:04:40Z

Hi @Rocamonde, Adam mentioned that you could help out with the engineering side of this project.
This repo is a fork of reward-preprocessing, which is heavily based on imitation.
When developing, I'm also trying to keep things as compatible as possible with imitation.

I'm looking for feedback on the general engineering and whether there are any things that jump out as sub-optimal.

Rocamonde · 2022-08-25T09:48:11Z

Hi @PavelCz, thanks for your message. I am not familiar at all with this project, and there is some initial fixed cost for me before I can give detailed advice on PRs for it (I'm not even sure what the project does). I am still more than happy to take a look at your PR and point out any potential improvements, and will do so now, but you probably want an active contributor to take a second look.

Rocamonde

Some minor comments.

src/reward_preprocessing/models.py

src/reward_preprocessing/trainers/supervised.py

src/reward_preprocessing/train_regression.py

src/reward_preprocessing/trainers/supervised.py


PavelCz · 2022-09-02T20:22:20Z

Hi @Rocamonde, I think I addressed all of your comments.
The project will most likely make use of Erik's reward-preprocessing in the future, which is why it is a fork of his project. At this point, however, there is basically no interaction with the existing code, so everything I'm interested in for now is in this PR; it is essentially the initial commit.
I used both imitation and this repo as inspiration for how to engineer this code.

Rocamonde

LGTM. Minor suggestion. Also, it seems like there are no CI checks in place. You might want to consider adding tests for these changes and running CI on every commit. I haven't run the tests locally, so I don't know whether they pass. Conditional on CI being okay, I think it should be fine to merge.

Rocamonde · 2022-09-03T17:36:45Z

src/reward_preprocessing/trainers/supervised.py

```diff
+            )
+
+        generator = None
+        if seed is not None:
```


You could optionally set the generator directly in your if-else above, but it's also fine this way.
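The pattern under review creates a generator only when a seed is given and otherwise leaves it as `None`. In a PyTorch project it is presumably a `torch.Generator` passed on to data utilities such as `DataLoader` or `random_split`, both of which accept a `generator` argument. The sketch below reproduces the pattern with the standard library's `random.Random` as a stand-in. The `split_dataset` function and its parameters are hypothetical, not the PR's code.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    data: Sequence[T], train_frac: float, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Shuffle ``data`` and split it into train/test lists.

    With a seed, a dedicated RNG makes the split reproducible. Without one,
    the module-level RNG is used, analogous to passing ``generator=None``.
    """
    # Same shape as the reviewed diff: no generator unless a seed is supplied.
    generator: Optional[random.Random] = None
    if seed is not None:
        generator = random.Random(seed)

    indices = list(range(len(data)))
    if generator is not None:
        generator.shuffle(indices)
    else:
        random.shuffle(indices)

    n_train = int(len(data) * train_frac)
    train = [data[i] for i in indices[:n_train]]
    test = [data[i] for i in indices[n_train:]]
    return train, test


data = list(range(10))
seeded_a = split_dataset(data, 0.8, seed=0)
seeded_b = split_dataset(data, 0.8, seed=0)
unseeded = split_dataset(data, 0.8)
```

Moving the assignment into an existing branch, as the comment suggests, should not change behaviour. It only removes the separate `None` initialisation, so the choice is stylistic.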

PavelCz · 2022-09-04T04:14:31Z

Thanks a lot!
Todos for this PR:

- Add tests for new code
- Run tests

I will add automatic CI in a separate PR.

PavelCz · 2022-09-15T21:33:59Z

I ran the existing tests.
I'll go ahead and merge so we're not blocked too long on this PR. I'll add tests in a separate PR (see issue #6).

PavelCz added 28 commits August 12, 2022 18:59

Merge branch 'imitation-requirement' into manual-reward-model

27a2062

Start implementing CNN regression rewards

459b50b

Merge branch 'imitation-requirement' into manual-reward-model

af4e720

Continue implementing CNN regression

d09d711

Reformat using black

a6cca33

[WIP] Add CNN reward net using SB3 default features extractor

434f86e

Add ObsRewDataset

1e1a1e0

Implement CNN regression model

1e06bdb

Add train_regression to train using ObsRewDataset

27949c2

Update .gitignore

86c1903

Rename types to rfi_types to fix problems with debugging

27d9fba

Instead of using ObsRewDataset use Dataset built into imitation

8c0a6e4

by using flatten_trajectories_with_rew()

Transpose observations for CNN and fix minor problems with model

c03584d

Add test step to supervised training

809f421

Remove empty file?

49419e4

Use correct collate_fn and activate testing

7d472a7

Use trainer class instead of functions for training supervised model

5156173

Add logging using imitation logging capabilities

d981ff6

Use imitation built-in cnn (from fork) instead of my own

2e2932d

Use Daniel's fork with CNN

7d0d231

Implement saving checkpoints

596de5e

Clean up unused ingredients

6a9c6b8

Remove unused code

8514803

Clean up imports

eda07c1

Add procgenAISC requirements to poetry

d25e967

Add necessary requirements for procgen to Dockerfile

4d09878

Merge branch 'main' into manual-reward-model

fa88745

Remove util function that is not necessary anymore

5d8ee38

PavelCz requested a review from AdamGleave August 25, 2022 02:04

PavelCz requested a review from Rocamonde August 25, 2022 02:04

Rocamonde reviewed Aug 25, 2022


PavelCz and others added 10 commits August 25, 2022 11:11

Fix flake8 problems, use isort to sort imports

5cd9c09

Refer to torch as th instead of torch in code

1936e83

Fix comment

702bfd9

Co-authored-by: Juan Rocamonde <[email protected]>

Remove superfluous cast

ba9c159

Add assert to ensure obs is of right type

f18372e

Improve clarity of comment

4660590

Co-authored-by: Juan Rocamonde <[email protected]>

Change import naming

776403a

Rename main

e3cf3e1

Simplify data loading

bd028d3

Lint and format

624b062

PavelCz requested a review from Rocamonde September 2, 2022 20:20

PavelCz added 2 commits September 2, 2022 13:23

Use float instead of 1D tensor to record loss

3436dfa

Add seeding for datasets

1694337

Rocamonde reviewed Sep 3, 2022


Minor refactor to generator position

3253e17

PavelCz mentioned this pull request Sep 15, 2022

Add tests for PR #5 #6

Open

PavelCz merged commit 96aa262 into main Sep 15, 2022

4 participants
