
Koch group runthrough dec2025#146

Closed
mkuehbach wants to merge 74 commits into main from
koch_group_runthrough_dec2025

Conversation

@mkuehbach
Collaborator

No description provided.

…overview of which projects have a payload larger than the main memory of the host system
…gnored; hopefully also a bugfix for the issue that yaml emits complex-formatted payload
…st modified, author from the microscope directory as user, atom types
… 2020ish nionswift project files, add a bypass to first go through small datasets, as the system is currently busy
… or data to the NeXus file, i.e., when the nionswift project parser is used in so-called analysis mode, metadata from data.npy and hfive files is read from the respective file headers only instead of loading the entire content; this should consume substantially less main memory than before and thus also be faster
…t least the analysis mode should work fine; documented a potential memory leak that needs addressing soon, likely related to switching logs and log state variables not being garbage-collected; one should check carefully when running pynxtools in production mode on the actual datasets.
…a few files had nionswift project files missing; this commit configures the script to generate a complete nsproj_to_eln file that identifies which hashed result logs and yaml files belong to which project
…respective subsequent function call already, implement a fix for cases where individual ndata files were corrupted, causing an uncaught IOError that stopped parsing; right now we warn about this issue but continue parsing. The idea is that the pynxtools parser should always warn when specific portions cannot be parsed but should not stop in an uncontrolled manner, even if that results in a NeXus file that ends up empty or not sufficiently filled with instance data to qualify for matching with NXem; otherwise such uncontrolled exceptions would affect automated processing pipelines like those in RDM systems, e.g., NOMAD, which is undesired
… different NXdata types, each dataset with no more than 8 GiB payload; getting all 35 different types would demand allowing the processing of even the largest datasets with 130 GiB payload, which we do not wish to pursue after the Dec 11th talk, so as to stay focused while still sampling broadly
@mkuehbach
Collaborator Author

Included and functionally superseded by #157

@mkuehbach mkuehbach closed this Mar 22, 2026
@mkuehbach mkuehbach deleted the koch_group_runthrough_dec2025 branch March 23, 2026 00:12