Inspect training data without data indices #593
Merged
2015aroras merged 10 commits into main on May 24, 2024
epwalsh
reviewed
May 23, 2024
os.environ["FS_LOCAL_RANK"] = "1"

for step in steps:
    dataloader = build_train_dataloader(cfg, world_size=world_size)
Member
This would have to rebuild the indices file every time, right? That could be slow, but we could probably avoid rebuilding for every rank.
Collaborator
Author
Setting FS_LOCAL_RANK=1 avoids rebuilding the indices file every time, since that is only done for local FS rank 0.
# Set FS_LOCAL_RANK to a non-zero number so that global data indices are not rewritten
Member
Ah, right. Could you just add a comment explaining that for future reference?
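For future readers, the guard discussed in this thread can be illustrated with a minimal sketch. It is not the repository's code: `get_fs_local_rank`, `maybe_write_indices`, and the file name `global_indices.txt` are hypothetical names chosen for illustration, and the only thing taken from the thread is the idea that the indices file is written solely when `FS_LOCAL_RANK` is 0.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def get_fs_local_rank() -> int:
    """Hypothetical helper: read the filesystem-local rank (defaults to 0)."""
    return int(os.environ.get("FS_LOCAL_RANK", "0"))


def maybe_write_indices(path: Path, indices: List[int]) -> bool:
    """Hypothetical helper: write the indices file only on FS-local rank 0.

    Returns True if the file was written, False if the write was skipped.
    """
    if get_fs_local_rank() != 0:
        return False
    path.write_text("\n".join(str(i) for i in indices))
    return True


with tempfile.TemporaryDirectory() as tmp:
    indices_path = Path(tmp) / "global_indices.txt"  # illustrative file name

    # Rank 0 writes the file once.
    os.environ["FS_LOCAL_RANK"] = "0"
    wrote_as_rank0 = maybe_write_indices(indices_path, [3, 1, 2])

    # A non-zero rank skips the write, so the existing file is left untouched.
    os.environ["FS_LOCAL_RANK"] = "1"
    wrote_as_rank1 = maybe_write_indices(indices_path, [9, 9, 9])

    final_contents = indices_path.read_text()
```

Under these assumptions, setting the variable to any non-zero value before the loop means repeated dataloader builds would not rewrite the file.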
epwalsh
approved these changes
May 24, 2024
This PR updates the inspect_train_data.py script to enable inspecting training data when the device data indices are not present. Our runs save these indices locally but not in remote storage. The implementation has the following advantages:
- The script defaults to the original behavior when data indices are present.
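The fallback described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the script's actual logic: the file layout (one line per step, tab-separated indices, 0-based step numbering) and the names `read_saved_indices`, `indices_for_step`, and `toy_rebuild` are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def read_saved_indices(indices_file: Path, step: int) -> List[int]:
    """Assumed layout: line `step` holds that step's tab-separated indices."""
    lines = indices_file.read_text().splitlines()
    return [int(x) for x in lines[step].split("\t")]


def indices_for_step(
    indices_file: Path, step: int, rebuild: Callable[[int], List[int]]
) -> List[int]:
    """Use saved indices when the file exists; otherwise recompute them."""
    if indices_file.exists():
        return read_saved_indices(indices_file, step)  # original behavior
    return rebuild(step)  # fallback when no indices were saved


def toy_rebuild(step: int, batch_size: int = 2) -> List[int]:
    """Stand-in for rebuilding from a dataloader: sequential batches."""
    start = step * batch_size
    return list(range(start, start + batch_size))


with tempfile.TemporaryDirectory() as tmp:
    indices_file = Path(tmp) / "device_indices.tsv"  # illustrative file name

    # No file yet, so the fallback path recomputes the indices.
    from_rebuild = indices_for_step(indices_file, 1, toy_rebuild)

    # Once a file exists, it takes precedence over recomputation.
    indices_file.write_text("10\t11\n12\t13\n")
    from_file = indices_for_step(indices_file, 1, toy_rebuild)
```

Checking for the file first keeps the existing code path untouched for runs that do have their indices available locally.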