
Koch group runthrough dec2025#146

Closed
mkuehbach wants to merge 74 commits into main from
koch_group_runthrough_dec2025

Conversation

@mkuehbach
Collaborator

No description provided.

…overview of which projects have a payload larger than the main memory of the host system
…gnored; hopefully also a bugfix for the issue that yaml emits complex-formatted payload
…st modified, author from the microscope directory as user, atom types
… 2020ish nionswift project files, add a bypass to first go through small datasets, as the system is currently busy
… or data to the NeXus file, i.e., when the nionswift project parser is used in so-called analysis mode, metadata from data.npy and hfive files is read from the respective file headers only instead of loading the entire content; this should consume substantially less main memory than before and thus also be faster
…t least the analysis mode should work fine; documented a potential memory leak that needs addressing soon, likely related to switching logs and log state variables not being garbage-collected; one should check carefully when running pynxtools in production mode on the actual datasets.
…a few files had nionswift project files missing; this commit configures the script to generate a complete nsproj_to_eln file that identifies which hashed result logs and yaml files belong to which project
…respective subsequent function call already, implement a fix for cases where individual ndata files were corrupted, causing an uncaught IOError that stopped parsing; right now we warn about this issue but continue parsing. The idea is that the pynxtools parser should always warn when specific portions cannot be parsed but should not stop in an uncontrolled manner, even if that results in a NeXus file that ends up empty or not sufficiently filled with instance data to qualify for matching with NXem; otherwise such uncontrolled exceptions would affect automated processing pipelines like those in RDM systems, e.g., NOMAD, which is undesired
… different NXdata types, each dataset with no more than 8 GiB payload; getting all 35 different types would demand allowing the processing of even the largest datasets with 130 GiB payload, which we do not wish to pursue after the Dec 11th talk, so as to stay focused while still sampling broadly
@mkuehbach
Collaborator Author

Included and functionally superseded by #157

@mkuehbach mkuehbach closed this Mar 22, 2026
@mkuehbach mkuehbach deleted the koch_group_runthrough_dec2025 branch March 23, 2026 00:12