Add basic eval table logging for WandbCallback#31050
andrewtruong wants to merge 9 commits into huggingface:main
Conversation
I'm not sure how to get the failing tests to pass. The failing tests are:
This PR also makes two changes to the trainer internals that could be split into their own PRs:
Hi @amyeroberts! Sorry to bother -- are you the right person to take a look at this?
Hi @andrewtruong, thanks for opening this PR! I'm going to bring in @muellerzr here, who knows far more about Trainer and its interactions with W&B. One thing to note is that integrations like W&B aren't actively maintained by the Hugging Face team. Looking at the PR, I'm not sure we want to add attributes to core objects like Trainer to enable feature integrations, unless it would unlock other things (here @muellerzr can definitely advise!)
Thanks! I'm hoping these are small and reasonable changes:
Hey @muellerzr, any chance you can take a look this week? I'd love to get this in :)
muellerzr
left a comment
On paper this seems okay; let's try our best to come up with a solution that doesn't involve self-referencing the trainer if possible, please.
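One way to avoid self-referencing the trainer is to pass the callback only the narrow objects it needs. This is an illustrative sketch, not the PR's implementation; the EvalRecorder name and its arguments are hypothetical:

```python
class EvalRecorder:
    """Callback that receives only the objects it needs (hypothetical sketch)."""

    def __init__(self, tokenizer, eval_dataset):
        # Store narrow dependencies instead of a back-reference to the trainer,
        # so serializing the callback never drags the whole trainer along.
        self.tokenizer = tokenizer
        self.eval_dataset = eval_dataset

    def on_evaluate(self, metrics):
        # Only the explicitly passed state is available here.
        return {"num_eval_rows": len(self.eval_dataset), **metrics}


recorder = EvalRecorder(tokenizer="tok", eval_dataset=[1, 2, 3])
print(recorder.on_evaluate({"loss": 0.5}))  # {'num_eval_rows': 3, 'loss': 0.5}
```

The trade-off is that the caller must wire up each dependency explicitly, which is exactly the ergonomics question discussed below.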
src/transformers/trainer.py (outdated)
Some things to be very careful with here: I'd appreciate checking the memory usage before/after this change, to make sure we don't have a memory leak and don't increase the VRAM used by the user.
Any profiling tools you recommend / would want to see?
wandb logs work just fine :)
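For a quick local check, Python's built-in tracemalloc can compare CPU-side memory before and after retaining a new attribute (for VRAM, torch.cuda.max_memory_allocated would be the analogue). This is a sketch with a stand-in object, not a measurement of the actual PR:

```python
import tracemalloc

tracemalloc.start()

before, _ = tracemalloc.get_traced_memory()
# Stand-in for the object we're worried about retaining (e.g. a stored eval table).
table = [list(range(1000)) for _ in range(100)]
after, peak = tracemalloc.get_traced_memory()

tracemalloc.stop()

# The delta approximates the extra memory retained by the new attribute.
delta = after - before
print(delta > 0, len(table))
```

Running the training loop twice and diffing peaks the same way is a cheap smoke test for a leak.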
What's the update here?
Where is eval_loop_output coming from exactly?
Thanks for catching, I missed this commit. It's added here:
b8d5c6e
src/transformers/trainer.py (outdated)
Is there a better way we can do this rather than adding the trainer to self here? I'm not a fan of this because otherwise if users are saving states, they have an entire reference to the trainer in there, not good.
I didn't love this either, but it's a tradeoff.
- The current Trainer reporting interface looks like report_to="wandb", which is fine if your callback doesn't need any args/kwargs, but in this case we do (the tokenizer, dataset, etc.) and the one object that has all of these is the Trainer.
- The alternative is also implemented in this PR, but then you can't pass report_to="wandb". Counter-intuitively, you have to NOT report to wandb. Instead, you need to instantiate a callback and manually pass it in -- not the end of the world, but it didn't seem idiomatic.

trainer = Trainer(...)
wandb_callback = WandbCallback(..., tokenizer=..., dataset=...)
trainer.add_callback(wandb_callback)
I'd much prefer the non-idiomatic way please
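As an aside: if a callback ever does need to see the trainer, holding it through a weakref avoids pinning it in saved state. This is an illustrative sketch, not what this PR does; the classes are minimal stand-ins:

```python
import weakref


class Trainer:  # minimal stand-in for transformers.Trainer
    def __init__(self):
        self.tokenizer = "tok"


class Callback:
    def setup(self, trainer):
        # Hold only a weak reference; the trainer can still be garbage-collected,
        # and pickling the callback won't serialize the whole trainer.
        self._trainer_ref = weakref.ref(trainer)

    @property
    def trainer(self):
        # Dereference; in CPython this becomes None once the trainer's
        # refcount hits zero.
        return self._trainer_ref()


trainer = Trainer()
cb = Callback()
cb.setup(trainer)
print(cb.trainer.tokenizer)  # tok
del trainer
print(cb.trainer)  # None in CPython once the trainer is collected
```

The downside is that every access must handle the None case, which is why explicit narrow dependencies are usually simpler.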
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
andrewtruong
left a comment
Whoops, just realized this was stuck in "pending review" instead of actually posting.
Force-pushed from 4235289 to a80db7a.
Hey @muellerzr, any feedback?
muellerzr
left a comment
Thanks! This looks much better :)
@andrewtruong can you fix the conflicts? Then @amyeroberts can give it a final review!
amyeroberts
left a comment
Thanks for working on this!
At the moment, there are too many assumptions hard-coded about the model being a text-based model. I'd suggest a new, text-specific callback class in which we can safely make these assumptions.
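One shape this suggestion could take (class and method names here are hypothetical, not from the PR): keep the base callback modality-agnostic and move the tokenizer/decoding assumptions into a text-only subclass, where a tokenizer stand-in is represented by a simple id-to-string dict:

```python
class WandbCallback:
    """Modality-agnostic base: makes no assumptions about tokenizers (stand-in)."""

    def on_evaluate(self, outputs):
        return outputs


class WandbTextEvalCallback(WandbCallback):
    """Text-specific subclass: safe to assume a tokenizer exists (hypothetical)."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def on_evaluate(self, outputs):
        # Decoding token ids to strings is only valid for text models,
        # so this logic lives in the text-specific subclass.
        return [self.tokenizer.get(t, "<unk>") for t in outputs]


vocab = {0: "hello", 1: "world"}
cb = WandbTextEvalCallback(tokenizer=vocab)
print(cb.on_evaluate([0, 1, 2]))  # ['hello', 'world', '<unk>']
```

Vision or audio variants could subclass the same base without inheriting any text assumptions.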
src/transformers/trainer.py (outdated)
What's the update here?
self.trainer = trainer
...
if tokenizer is None:
    tokenizer = self.trainer.tokenizer
This assumes trainer is not None
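A defensive version of that fallback would guard the None case explicitly. This is an illustrative sketch only; resolve_tokenizer and FakeTrainer are hypothetical names, not code from the PR:

```python
def resolve_tokenizer(tokenizer, trainer):
    # Fall back to the trainer's tokenizer only when a trainer was given,
    # and fail loudly instead of raising AttributeError on None.
    if tokenizer is None:
        if trainer is None:
            raise ValueError("Pass a tokenizer, or a trainer that has one.")
        tokenizer = trainer.tokenizer
    return tokenizer


class FakeTrainer:
    tokenizer = "tok"


print(resolve_tokenizer(None, FakeTrainer()))  # tok
print(resolve_tokenizer("explicit", None))  # explicit
```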
if tokenizer is None:
    tokenizer = self.trainer.tokenizer
self.tokenizer = tokenizer
Do we want to set this even if tokenizer is None?
self.tokenizer = tokenizer
...
if dataset is None:
    dataset = self.trainer.eval_dataset
Same here - this assumes self.trainer and self.trainer.eval_dataset are not None.
| """ | ||
|
|
||
| def __init__(self): | ||
| def __init__( |
There should be a docstring for all these args, in particular num_samples and freq, which aren't obvious.
try:
    sampled_dataset = dataset.select(range(num_samples))
except IndexError as e:
    print(f"WARNING: Could not get those indices: {e=}")
Could we make this a bit clearer - the user never specifies indices. so it's a bit weird to refer to them as "those indices". Maybe something along the lines of "Could not select {num_sample=} rows from the dataset"
dataset = self.trainer.eval_dataset
...
try:
    sampled_dataset = dataset.select(range(num_samples))
This assumes dataset is not None
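A version of the sampling that handles both a missing dataset and a dataset shorter than num_samples might look like the sketch below. The names are hypothetical, and list slicing stands in for datasets' select(range(n)):

```python
def sample_dataset(dataset, num_samples):
    # Guard the None case up front instead of failing inside select().
    if dataset is None:
        raise ValueError("No eval dataset available to sample from.")
    # Fewer rows than requested: keep everything rather than raising IndexError.
    if num_samples >= len(dataset):
        return dataset
    return dataset[:num_samples]


print(sample_dataset([1, 2, 3, 4], 2))  # [1, 2]
print(sample_dataset([1, 2], 5))  # [1, 2]
```

Checking len() first also removes the need for the try/except around select.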
| print(f"WARNING: Could not get those indices: {e=}") | ||
| sampled_dataset = dataset | ||
|
|
||
| self.sample_dataset = sampled_dataset |
Do we want to store both the full and sampled dataset?
if ignore_tokens is None:
    ignore_tokens = [-100]
...
padding_token_id = self.tokenizer.pad_token_id
Assumes tokenizer is not None. Confusingly, the tokenizer for Trainer may not be a tokenizer at all, c.f. #32385.
    a[mask] = padding_token_id
    return a
...
self._replace_ignored_tokens_func = replace_ignored_tokens
Why do we need this functionality in the callback?
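For context, the functionality in question masks out ignore tokens (the -100 labels the loss skips) before decoding, since tokenizers can't decode negative ids. A pure-Python sketch of the idea (the PR's version operates on arrays; this function name is only illustrative):

```python
def replace_ignored_tokens(ids, ignore_tokens=(-100,), padding_token_id=0):
    # Labels use -100 for positions the loss ignores; a tokenizer can't decode
    # those, so swap them for the pad token before calling tokenizer.decode.
    return [padding_token_id if t in ignore_tokens else t for t in ids]


print(replace_ignored_tokens([5, -100, 7, -100]))  # [5, 0, 7, 0]
```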
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR adds basic support for logging raw evals using WandbCallback:
- trainer.evaluate() will log an eval table with inputs and outputs; and
- the behavior can be toggled with the WANDB_LOG_EVALS env var.

This also adds some changes to trainer internals to support this, including:
- extending EvalLoopOutput to include inputs

Here's an example table that gets logged when the user calls trainer.evaluate():
[screenshot: example eval table]

Fixes # (issue)
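Assuming the WANDB_LOG_EVALS env var gates the feature as described above (a sketch; the exact flag parsing in the PR may differ, and should_log_eval_table is a hypothetical helper):

```python
import os

# Hypothetical gate mirroring the WANDB_LOG_EVALS env var described above.
os.environ["WANDB_LOG_EVALS"] = "true"


def should_log_eval_table():
    # Treat "true"/"1" (case-insensitive) as enabled, anything else as disabled.
    return os.environ.get("WANDB_LOG_EVALS", "false").lower() in {"true", "1"}


print(should_log_eval_table())  # True
```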
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts are you the right person to review?