Make the training data properly public #527

Merged
dirkgr merged 6 commits into main from PublicTrainingData on Apr 3, 2024

Conversation

@dirkgr (Member) commented Mar 26, 2024

It is not fast like this, because we make a lot of small range requests per batch, but it works.
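
For readers unfamiliar with the access pattern mentioned above, here is a minimal sketch of one small HTTP range request that reads a single training instance out of a remote file. It is not taken from this PR: the function names, the assumption that the file is a flat, header-less array of fixed-width token IDs, and the `seq_len` and `item_size` parameters are all hypothetical and chosen only for illustration.

```python
import urllib.request


def byte_range(index: int, seq_len: int, item_size: int) -> tuple[int, int]:
    """Return the inclusive (start, end) byte offsets of instance `index`.

    Assumes (hypothetically) a flat array of fixed-width tokens with no header.
    """
    start = index * seq_len * item_size
    end = start + seq_len * item_size - 1  # HTTP byte ranges are inclusive
    return start, end


def range_header(index: int, seq_len: int, item_size: int) -> dict[str, str]:
    """Build the Range header for one instance."""
    start, end = byte_range(index, seq_len, item_size)
    return {"Range": f"bytes={start}-{end}"}


def fetch_instance(url: str, index: int, seq_len: int, item_size: int,
                   timeout: float = 30.0) -> bytes:
    """Fetch the raw bytes of one instance with a single range request."""
    request = urllib.request.Request(
        url, headers=range_header(index, seq_len, item_size)
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content means the server honored the Range header.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

Each instance costs one round trip in this sketch, which is why a batch made of many small reads is slow compared with reading from local disk.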

@dirkgr requested review from 2015aroras and epwalsh on March 26, 2024 18:09

@2015aroras (Collaborator) left a comment

LGTM! It's nice to see OLMo even easier to use.

nit: Can you remove/update the "Once you've updated the data paths in the config..." part of the README?

  datasets:
    v3-small-c4_en-validation:
-     - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/c4_en/val/part-0-00000.npy
+     - http://olmo-data.org/eval-data/perplexity/v3_small_gptneox20b/c4_en/val/part-0-00000.npy

Just curious, any reason for http instead of https?

@epwalsh (Member) left a comment

LGTM, also just curious why HTTP and not HTTPS.

@dirkgr merged commit 9a0a84a into main on Apr 3, 2024
@dirkgr deleted the PublicTrainingData branch on April 3, 2024 17:31
3 participants