Prefetch DIII-D plasmas rundb table#536

Merged: gtrevisan merged 7 commits into dev from glt/rundb on Apr 2, 2026
Conversation

@gtrevisan
Member

@gtrevisan gtrevisan commented Mar 30, 2026

DIII-D EFIT tree assignments are not static, but rather dynamic as stored in the code_rundb.dbo.plasmas table.

previously, all processes were executing individual queries for each shot.

now we prefetch the whole table with respect to a given runtag -- caching it to disk into the "daily" default temporary folder (/local-scratch/$USER/disruption-py/YYYY-MM-DD).

downstream forked processes, and further repeat executions, should be hitting the cache and skipping queries altogether.
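A minimal sketch of the prefetch-then-cache pattern described above, not the code merged in this PR. The function names, the `fetch_rows` callable, and the `shot`/`tree` column names are assumptions made for illustration. Only the folder layout and the `rundb_<runtag>.csv` file naming come from this thread.

```python
import csv
import datetime
import os
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def daily_cache_dir(root: str = "/local-scratch") -> Path:
    """Build the per-day folder: <root>/$USER/disruption-py/YYYY-MM-DD."""
    user = os.environ.get("USER", "")
    today = datetime.date.today().isoformat()
    return Path(root) / user / "disruption-py" / today


def load_or_fetch_trees(
    runtag: str,
    fetch_rows: Callable[[str], Iterable[Tuple[int, str]]],
    cache_dir: Path,
) -> Dict[int, str]:
    """Return a shot -> tree mapping, reading the CSV cache when it exists.

    `fetch_rows` is a hypothetical stand-in for the single SQL query
    against code_rundb.dbo.plasmas for the given runtag.
    """
    cache_file = cache_dir / f"rundb_{runtag}.csv"
    if cache_file.exists():
        # Cache hit: no database access at all.
        with cache_file.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            return {int(shot): tree for shot, tree in reader}

    # Cache miss: run the one bulk query, then persist the result.
    rows = list(fetch_rows(runtag))
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp_file = cache_file.with_suffix(".tmp")
    with tmp_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["shot", "tree"])
        writer.writerows(rows)
    # Rename last so readers never see a half-written cache file.
    tmp_file.replace(cache_file)
    return {int(shot): tree for shot, tree in rows}
```

Calling `load_or_fetch_trees("DIS", fetch_rows, daily_cache_dir())` twice on the same day would, under these assumptions, query the database only the first time.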

suggested reviewers:

  • @samc24 hopefully a quick look to tell me whether the implementation makes sense (and to be aware of any possible conflict with your backend abstraction);
  • @yumouwei @ZanderKeith as DIII-D users, you should be able to quickly test functionality and usability.

we can change runtag from the default DIS as follows:

DISPY_EFIT__RUNTAG=DISPY uv run disruption-py -l debug
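For illustration only, the precedence implied by that command (environment variable first, `DIS` otherwise) could look like the sketch below. The thread does not show how disruption-py actually loads its configuration, so `resolve_runtag` is a hypothetical helper, not the package's API.

```python
import os
from typing import Mapping, Optional


def resolve_runtag(
    environ: Optional[Mapping[str, str]] = None,
    default: str = "DIS",
) -> str:
    """Pick the EFIT runtag: DISPY_EFIT__RUNTAG if set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    # An empty or whitespace-only value falls back to the default.
    return env.get("DISPY_EFIT__RUNTAG", "").strip() or default
```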

@gtrevisan gtrevisan added the machine: DIII-D Related to the DIII-D tokamak label Mar 30, 2026
@gtrevisan gtrevisan linked an issue Mar 30, 2026 that may be closed by this pull request
@gtrevisan gtrevisan marked this pull request as ready for review March 30, 2026 15:46
Contributor

Copilot AI left a comment

Pull request overview

This PR reduces SQL load for DIII-D EFIT tree resolution by prefetching code_rundb.dbo.plasmas for the configured runtag once per run and caching the result to a per-day temp folder so forked workers (and repeat executions) can avoid per-shot queries.

Changes:

  • Trigger EFIT tree prefetch once at workflow start, before spawning the multiprocessing pool.
  • Add DIII-D-specific prefetch + CSV cache logic to DisruptionNicknameSetting, with an in-memory shot→tree lookup fast-path.
  • Tighten ShotDatabase.query typing and adjust DummyDatabase.query return behavior for use_pandas=False.
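The in-memory shot→tree fast path from the list above can be sketched roughly as follows. This is an assumption-laden illustration: `TreeLookup` and `query_one` are hypothetical names, and the thread does not say whether the real `DisruptionNicknameSetting` falls back to a per-shot query when a shot is missing from the prefetched table.

```python
from typing import Callable, Dict, Optional


class TreeLookup:
    """Resolve a shot's EFIT tree from a prefetched mapping, with an optional fallback."""

    def __init__(
        self,
        prefetched: Dict[int, str],
        query_one: Callable[[int], Optional[str]],
    ) -> None:
        self._trees = dict(prefetched)  # copy so callers cannot mutate our state
        self._query_one = query_one     # hypothetical per-shot query, used only on a miss

    def tree_for(self, shot: int) -> Optional[str]:
        tree = self._trees.get(shot)
        if tree is not None:
            return tree  # fast path: no database round trip
        tree = self._query_one(shot)
        if tree is not None:
            self._trees[shot] = tree  # remember the answer for later lookups
        return tree
```

If the lookup object is built in the parent before the multiprocessing pool starts, forked workers inherit the populated dictionary and never need the fallback for shots already in the table.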

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| disruption_py/workflow.py | Calls nickname-setting DB prefetch before multiprocessing begins. |
| disruption_py/settings/nickname_setting.py | Implements DIII-D plasmas table prefetch, daily CSV caching, and lookup path. |
| disruption_py/inout/sql.py | Adds type hint to query and updates DummyDatabase.query signature/behavior. |

@yumouwei
Contributor

The code does work. However, do we need to check if the code_rundb.dbo.plasmas table has been modified when there's already a cached rundb_DIS.csv or rundb_DISPY.csv file in the scratch folder?

@gtrevisan
Member Author

yes, I thought about that but decided against it for simplicity, at least for now.

I'm currently using a "daily" temporary folder, so there is a very slim chance that something cool happens to the table after we cached it but before we rerun a workflow.

in the specific case of our DisruptionEFIT runs, those happen in the wee hours of the morning, so even by our eastern 8am everything is completed and archived.

if someone was running EFIT for DIII-D and then expecting to find the tree right away, then yes, they should remove the cached CSV.
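A small sketch of that manual invalidation, assuming the cache lives at the daily path and file name mentioned in this thread (`/local-scratch/$USER/disruption-py/YYYY-MM-DD/rundb_<runtag>.csv`). `drop_rundb_cache` is a hypothetical helper, not something this PR adds.

```python
import datetime
import os
from pathlib import Path
from typing import Optional


def drop_rundb_cache(
    runtag: str,
    root: str = "/local-scratch",
    day: Optional[datetime.date] = None,
) -> bool:
    """Delete the cached CSV for one runtag; return True if a file was removed."""
    day = day or datetime.date.today()
    user = os.environ.get("USER", "")
    cache_file = (
        Path(root) / user / "disruption-py" / day.isoformat() / f"rundb_{runtag}.csv"
    )
    try:
        cache_file.unlink()
    except FileNotFoundError:
        return False  # nothing cached yet, so the next run will query anyway
    return True
```

After `drop_rundb_cache("DIS")`, the next workflow run would find no cache file and should prefetch the table again.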

(unrelated to the db, but) if someone was running EFIT for C-MOD they would probably still have to wait overnight in this big mess of servers and trees (archive vs test vs new, etc), or carefully craft their default_tree_path lookup env var.

bottom line, let's see how this works for now!

@samc24
Collaborator

samc24 commented Apr 2, 2026

LGTM: a clean, sensible optimization with no conflicts with the backend abstraction or with potential work on extracting shared fields from Params classes. I added two minor nits, but they are non-blocking, so feel free to ignore them and merge.

@gtrevisan
Member Author

ah, for the record, this is the scale and speed of our two EFIT runtags on DIII-D:

# efit18-equivalent
[ DEBUG ] Fetched EFIT 'DIS' trees in 0.325s: 17,277 rows, 17,157 unique shots.

# efit21-equivalent
[ DEBUG ] Fetched EFIT 'DISPY' trees in 0.406s: 55,603 rows, 55,603 unique shots.

# cache sizes
372K	rundb_DIS.csv
1.2M	rundb_DISPY.csv

so I challenge anyone to break this! 🐇

@gtrevisan gtrevisan removed the request for review from ZanderKeith April 2, 2026 14:18
@gtrevisan gtrevisan merged commit facc545 into dev Apr 2, 2026
26 checks passed
@gtrevisan gtrevisan deleted the glt/rundb branch April 2, 2026 14:18
@gtrevisan gtrevisan mentioned this pull request Apr 2, 2026
Labels

machine: DIII-D Related to the DIII-D tokamak

Development

Successfully merging this pull request may close these issues.

Optimize DIII-D code rundb queries

4 participants