Conversation
I have run some A/B tests with the dev/xarray branches. So far everything looks good. However, I noticed this new behavior: when you rerun `get_shots_data()` and the output file was not removed in advance, you get the following error:

```
File ".../disruption_py/workflow.py", line 178, in get_shots_data
    output_setting.to_disk()
File ".../disruption_py/settings/output_setting.py", line 282, in to_disk
    raise FileExistsError(f"File already exists! {self.path}")
```

Is this to be expected? In dev the output file is always overwritten. If this is the new behavior, you could hit this error only after retrieving a large amount of data. So it would be nice to test whether the file exists before running the queries; if not, the overwrite needs to be forced at the end. I cannot wait to have datasets. Adding profiles or spectral data is going to be great! :-)
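a minimal sketch of the fail-fast check being suggested here (the path name and the placement are illustrative, not the actual disruption_py API):

```python
from pathlib import Path

output_path = Path("disruption_py_output_example.nc")  # illustrative file name

# fail fast: check before the (potentially long) queries run,
# instead of erroring only when to_disk() is finally called
if output_path.exists():
    raise FileExistsError(f"File already exists! {output_path}")

# ...run get_shots_data() and write the output here...
```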
that is kind of the intended behavior, so far, because every run should have a different temporary folder. are you running
changes
executive summary:

- `OutputSetting` is still the abstract base class.
- `OutputSettingList` is kept for testing purposes; it might be deleted in the future.
- `DictOutputSetting = Dict[int, xr.Dataset]` is the new under-the-hood format.
- `SingleOutputSetting` is a new semi-abstract class for single-file output.
- `DatasetOutputSetting` will be the new default in the future; it concatenates on `idx`.
- `DataTreeOutputSetting` groups by shot, rather than concatenating.
- `DataFrameOutputSetting` is the usual dataframe, still used by testing.

tests:
to do:
supersedes:
closes:
index
our new index is a simple row-number-like variable named `idx`, while `shot` and `time` are available as coordinates.

I thought about making our `idx` a MultiIndex of both `shot` and `time`, but apparently it cannot be serialized to disk just yet; it can be done in-memory, though.

to re-obtain "native" dimensions, one can then unstack, but this will create a humongous dataset, which is the reason we had to close #407, so be mindful of your memory constraints.
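a minimal sketch of the in-memory MultiIndex and the unstack step, using toy data (the variable names below are illustrative, not the actual disruption_py schema):

```python
import xarray as xr

# toy stand-in for the real output: a flat "idx" dimension with
# shot and time attached as coordinates (illustrative data only)
ds = xr.Dataset(
    {"ip": ("idx", [1.0, 1.1, 2.0, 2.1])},
    coords={
        "shot": ("idx", [100, 100, 200, 200]),
        "time": ("idx", [0.0, 0.5, 0.0, 0.5]),
    },
)

# in-memory MultiIndex over (shot, time)
stacked = ds.set_index(idx=["shot", "time"])
# stacked.to_netcdf("out.nc")  # would fail: MultiIndex is not serializable yet

# label-based selection per shot now works
per_shot = stacked.sel(shot=100)

# unstacking recovers "native" (shot, time) dimensions, but the result
# is dense: it grows as n_shots * n_times, hence the memory warning
native = stacked.unstack("idx")
```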
attributes
we are then definitely ready to archive attributes for each physics method!
the first two that come to mind would be units of measure and an IMAS path reference.
we could also store richer metadata in our datasets, e.g. the full settings for reproducibility.
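a sketch of what that could look like with plain xarray attrs (the attribute keys and the IMAS path below are made up for illustration):

```python
import xarray as xr

ds = xr.Dataset({"ip": ("idx", [1.0, 1.1])})

# per-variable attributes: units and a (made-up) IMAS path reference
ds["ip"].attrs["units"] = "A"
ds["ip"].attrs["imas_path"] = "magnetics/ip"  # illustrative only

# dataset-level metadata, e.g. the settings used, for reproducibility;
# stored as a string so it serializes cleanly to netCDF
ds.attrs["settings"] = "{'shotlist': [100, 200]}"
```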
dimensions
furthermore, we should already be prepared for multidimensional outputs! 🎉 this very simple example works:
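the original snippet is not preserved here; a stand-in along these lines (variable and coordinate names are illustrative) shows the idea of a 2-D output living alongside the usual 1-D signals:

```python
import numpy as np
import xarray as xr

# a 2-D variable: one profile per idx row, resolved over a second
# dimension ("rho" here is an illustrative radial coordinate)
ds = xr.Dataset(
    {
        "ip": ("idx", [1.0, 1.1]),                        # usual 1-D signal
        "te_profile": (("idx", "rho"), np.ones((2, 3))),  # 2-D output
    },
    coords={
        "shot": ("idx", [100, 100]),
        "time": ("idx", [0.0, 0.5]),
        "rho": [0.0, 0.5, 1.0],
    },
)
```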
as a side-note, we lose the option of having lower-dimensional columns, since now both `shot` and `time` are "reserved" coordinates for the `idx` dimension. oh, well.

output
- `DictOutputSetting`: `Dict[int, xr.Dataset]`
- `DatasetOutputSetting`: `xr.Dataset`
- `DataTreeOutputSetting`: `xr.DataTree`
- `DataFrameOutputSetting`: `pd.DataFrame`
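a rough sketch of how these formats relate, using toy data (the conversion calls shown are plain xarray/pandas, not the actual OutputSetting code):

```python
import xarray as xr

def make_shot_ds(shot, times):
    # toy per-shot dataset with shot/time attached along "idx"
    return xr.Dataset(
        {"ip": ("idx", [1.0] * len(times))},
        coords={"shot": ("idx", [shot] * len(times)),
                "time": ("idx", times)},
    )

# Dict[int, xr.Dataset]: the under-the-hood format, one dataset per shot
shots = {100: make_shot_ds(100, [0.0, 0.5]),
         200: make_shot_ds(200, [0.0, 0.5])}

# xr.Dataset: concatenate everything along "idx"
combined = xr.concat(shots.values(), dim="idx")

# xr.DataTree would instead keep one group per shot (recent xarray only)

# pd.DataFrame: the classic tabular view
df = combined.to_dataframe()
```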