Conversation
There was a problem hiding this comment.
Pull request overview
This PR reduces SQL load for DIII-D EFIT tree resolution by prefetching code_rundb.dbo.plasmas for the configured runtag once per run and caching the result to a per-day temp folder so forked workers (and repeat executions) can avoid per-shot queries.
Changes:
- Trigger EFIT tree prefetch once at workflow start, before spawning the multiprocessing pool.
- Add DIII-D-specific prefetch + CSV cache logic to
DisruptionNicknameSetting, with an in-memory shot→tree lookup fast-path. - Tighten
ShotDatabase.querytyping and adjustDummyDatabase.queryreturn behavior foruse_pandas=False.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| disruption_py/workflow.py | Calls nickname-setting DB prefetch before multiprocessing begins. |
| disruption_py/settings/nickname_setting.py | Implements DIII-D plasmas table prefetch, daily CSV caching, and lookup path. |
| disruption_py/inout/sql.py | Adds type hint to query and updates DummyDatabase.query signature/behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This reverts commit 09165c8.
|
The code does work. However, do we need to check if the |
|
yes, I thought about that but decided against it for simplicity, at least for now. I'm currently using a "daily" temporary folder, so there is a very slim chance that something cool happens to the table after we cached it but before we rerun a workflow. in the specific case of our DisruptionEFIT runs, those happen in the wee hours of the morning, so even by our eastern 8am everything is completed and archived. if someone was running EFIT for DIII-D and then expecting to find the tree right away, then yes, they should remove the cached CSV. (unrelated to the db, but) if someone was running EFIT for C-MOD they would probably still have to wait overnight in this big mess of servers and trees (archive vs test vs new, etc), or carefully craft their bottom line, let's see how this works for now! |
|
LGTM, clean sensible optimization, no conflicts with the backend abstraction, or with potential work on extracting shared fields from Params classes. Two minor nits added but non-blocking so feel free to ignore and merge . |
|
ah, for the record, this is the scale and speed of our two EFIT runtags on DIII-D: so I challenge anyone to break this! 🐇 |
DIII-D EFIT tree assignments are not static, but rather dynamic as stored in the
code_rundb.dbo.plasmastable.previously, all processes where executing individual queries for each shot.
now we prefetch the whole table with respect to a given runtag -- caching it to disk into the "daily" default temporary folder (
/local-scratch/$USER/disruption-py/YYYY-MM-DD).downstream forked processes, and further repeat executions, should be hitting the cache and skipping queries altogether.
suggested reviewers:
we can change runtag from the default
DISas follows: