Document the usage of temp rather than $HOME for keeping temporary data files #218

@jnywong


Message from Yuvi on Slack:

... The issue here is that writing data to $HOME is very slow, and $HOME is also shared across all users. The cell that got stuck was trying to write ~1GB of data to $HOME, and when that was spread across the 100+ users, it made everything super slow! This is one of the reasons 'cloud native' workflows that read and write object storage directly are faster: they don't have to touch possibly slow local disks. $HOME is designed to store code, rather than data.

The solution here is to use the temporary directory to keep temporary data files. Files there are cleared each time the user's server restarts, and writes are also much faster. Plus, the temporary directory is not shared across all users. This approach also works on local machines and on any cloud provider. The Python tempfile standard library module is probably very helpful here.
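As a rough illustration of the temporary-directory approach, here is a minimal sketch using tempfile.TemporaryDirectory from the standard library. The function name write_scratch_file and the file name scratch.bin are hypothetical, chosen only for this example; where the temp directory actually lives depends on the system and is not specified in the message above.

```python
import tempfile
from pathlib import Path


def write_scratch_file(payload: bytes, name: str = "scratch.bin") -> int:
    """Write payload to a throwaway temp directory and return the file size.

    Hypothetical helper for illustration only.
    """
    # TemporaryDirectory creates a fresh directory in the system temp
    # location and removes it, with its contents, when the block exits.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / name
        path.write_bytes(payload)
        # Do any processing of the file here, before the directory is removed.
        return path.stat().st_size
```

Calling write_scratch_file(b"x" * 1024) writes a 1 KiB file outside $HOME and leaves nothing behind once the function returns.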

So the upshot here is: don't use $HOME to store data. Data left in $HOME also doesn't get cleaned up, so it will keep costing money more or less indefinitely. Plus, it leads to issues when running workshops. Use tempfile if you need to download data locally.

I hope this was helpful! I think it would also be helpful for this to be written up in some sort of external documentation, but I'm not sure where.
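For the "download data locally" case mentioned in the message, one possible sketch is below. It streams a URL into a file created by tempfile.NamedTemporaryFile instead of a path under $HOME. The helper name download_to_temp is hypothetical, and using urllib.request is an assumption made for this example; the message itself only recommends tempfile.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_temp(url: str, suffix: str = "") -> Path:
    """Stream the contents of url into a new temp file and return its path.

    Hypothetical helper for illustration only.
    """
    # delete=False keeps the file after it is closed, so later notebook
    # cells can still read it. The caller removes it when finished.
    with urllib.request.urlopen(url) as response, \
            tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        # copyfileobj copies in chunks, so the whole download is never
        # held in memory at once.
        shutil.copyfileobj(response, tmp)
        return Path(tmp.name)
```

The returned path points into the system temp location, so the file does not add to the shared $HOME load. Because delete=False is used, call unlink() on the returned path once the data is no longer needed.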
