
feat: add Pruefidentifikator mapping from AHB database #341

Merged
hf-kklein merged 10 commits into main from feature/add-pruefidentifikatoren
Apr 2, 2026

Conversation

@hf-kklein
Contributor

Summary

  • Bumps rebdhuhn>=1.2.0 to use the new pruefidentifikatoren field and EbdPruefidentifikator model
  • Adds fundamend[sqlmodels] and py7zr dependencies
  • New ahb_pruefi module: downloads AHB SQLite DB from xml-migs-and-ahbs releases and queries the v_ahbtabellen view for EBD↔Prüfidentifikator mappings
  • Integrates the mapping into the pipeline — populates pruefidentifikatoren on both EbdNoTableSection and DocxTableConverter paths
  • Passes pruefidentifikatoren through to SVG generation for clickable links in the footer
  • New env vars: AHB_DB_PATH, GITHUB_TOKEN, FORMAT_VERSION

Context

Relates to Hochfrequenz/entscheidungsbaumdiagramme#637
Depends on Hochfrequenz/rebdhuhn v1.2.0

Test plan

  • Verify rebdhuhn>=1.2.0 is available on PyPI
  • Run pipeline with AHB_DB_PATH pointing to a local AHB database
  • Verify JSON output contains pruefidentifikatoren field
  • Verify SVG output contains clickable PI links in bottom-left footer

🤖 Generated with Claude Code

hf-kklein and others added 6 commits April 2, 2026 11:31
- Bump rebdhuhn>=1.2.0, add fundamend[sqlmodels] and py7zr dependencies
- Add ahb_pruefi module to download AHB DB and query EBD-to-Pruefi mapping
- Integrate mapping into the pipeline: populate pruefidentifikatoren on
  EbdTableMetaData and pass through to SVG generation
- Add AHB_DB_PATH, GITHUB_TOKEN, FORMAT_VERSION settings

Relates to Hochfrequenz/entscheidungsbaumdiagramme#637

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove backports-zstd, librt, mypy, greenlet etc. which are
platform/version-specific transitive deps that fail on Python 3.14.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix dict type annotation to use EbdPruefidentifikator instead of str
- Use EdifactFormatVersion enum properly in ahb_pruefi.py
- Add types-requests to type_check dependencies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Empty AHB_DB_PATH= in .env was being interpreted as current directory,
causing SQLite errors. Now empty strings are coerced to None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
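
The root cause of the empty `AHB_DB_PATH=` problem can be reproduced with the standard library alone: `Path("")` normalizes to the current directory. Below is a minimal sketch of the coercion, assuming a hypothetical helper named `parse_optional_path`. The PR's actual fix presumably lives in the settings class and may look different.

```python
from pathlib import Path
from typing import Optional

# An empty string is not "no path": pathlib turns it into the current directory.
assert Path("") == Path(".")


def parse_optional_path(raw: Optional[str]) -> Optional[Path]:
    """Hypothetical helper: treat a missing, empty, or whitespace-only value as 'not configured'."""
    if raw is None or raw.strip() == "":
        return None
    return Path(raw)


print(parse_optional_path(""))        # value from an empty AHB_DB_PATH= line
print(parse_optional_path("ahb.db"))  # value from AHB_DB_PATH=ahb.db
```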
- Use .get(ebd_key) instead of .get(ebd_key, []) so missing keys return
  None (= didn't check) rather than [] (= checked, none found)
- Make pruefidentifikatoren assignment consistent (always post-assignment)
- Push EBD qualifier filter into SQL WHERE clause (LIKE 'E_%')
- Delete .7z archive after extraction to free disk space

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
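
The difference between the two `.get` variants is easy to miss, so here is a small sketch of it. The mapping contents and the function names `lookup` and `lookup_with_default` are made up for illustration and are not taken from the PR.

```python
from typing import Optional

# Hypothetical mapping: EBD key -> Prüfidentifikatoren found in the AHB database.
ebd_to_pruefis: dict[str, list[str]] = {"E_0401": ["11001", "11002"]}


def lookup(ebd_key: str) -> Optional[list[str]]:
    """None means 'this key was not in the mapping', i.e. nothing was checked."""
    return ebd_to_pruefis.get(ebd_key)


def lookup_with_default(ebd_key: str) -> list[str]:
    """[] would claim 'checked, none found', even for keys that were never looked at."""
    return ebd_to_pruefis.get(ebd_key, [])


print(lookup("E_9999"))               # missing key, no default
print(lookup_with_default("E_9999"))  # missing key, list default
```

Downstream code can then distinguish `None` from `[]` with an explicit `is None` check instead of relying on truthiness.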
Contributor Author

hf-kklein left a comment

please update the readme of the repository and explain how to use it together with xml-migs-and-ahbs

if qualifier is not None and _EBD_QUALIFIER_PATTERN.match(qualifier):
    if qualifier not in seen:
        seen[qualifier] = set()
    seen[qualifier].add((EdifactFormatVersion(fv), pruefidentifikator))
Contributor Author

Why don't you use the SQL I suggested, where the regex is evaluated directly in SQLite instead of in Python? Wouldn't that dramatically reduce the number of rows to read from the DB?
Hochfrequenz/entscheidungsbaumdiagramme#637 (comment)

SELECT v_ahbtabellen.format_version,
       qualifier AS ebd_key,
       JSON_GROUP_ARRAY(DISTINCT pruefidentifikator) AS pruefidentifikatoren
FROM v_ahbtabellen
WHERE qualifier REGEXP 'E_[0-9]+'
GROUP BY v_ahbtabellen.format_version, qualifier
ORDER BY v_ahbtabellen.format_version DESC, ebd_key;

Contributor Author

Done in cfd3999 — replaced SQLModel query with the raw SQL from the issue comment. REGEXP + JSON_GROUP_ARRAY + GROUP BY all run in SQLite now.

Contributor Author

Note: replaced REGEXP with GLOB 'E_[0-9][0-9][0-9][0-9]' in 6001b3f because SQLite does not ship a built-in REGEXP implementation: it requires a user-defined function to be registered on the connection, which neither sqlmodel nor sqlalchemy do by default. GLOB supports character classes natively. Unlike REGEXP it matches the whole string, so this pattern accepts exactly E_ followed by four digits, which covers the EBD keys here.
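
To make the REGEXP/GLOB point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table is a hypothetical three-column stand-in for the `v_ahbtabellen` view, and the rows are invented. It assumes the SQLite build bundled with Python includes the JSON functions, which is the case for typical builds.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the v_ahbtabellen view, reduced to the columns the query uses.
conn.execute(
    "CREATE TABLE v_ahbtabellen (format_version TEXT, qualifier TEXT, pruefidentifikator TEXT)"
)
conn.executemany(
    "INSERT INTO v_ahbtabellen VALUES (?, ?, ?)",
    [
        ("FV2504", "E_0401", "11001"),
        ("FV2504", "E_0401", "11002"),
        ("FV2504", "E_0401", "11001"),  # duplicate, removed by DISTINCT
        ("FV2504", "Z01", "11003"),     # not an EBD key
        ("FV2504", "EX0401", "11004"),  # LIKE 'E_%' would match this; GLOB treats '_' literally
    ],
)

# Without a registered user-defined function, REGEXP typically fails when the statement is prepared.
try:
    conn.execute("SELECT 1 FROM v_ahbtabellen WHERE qualifier REGEXP 'E_[0-9]+'").fetchall()
    regexp_available = True
except sqlite3.OperationalError:
    regexp_available = False

# GLOB needs no extension; filtering, de-duplication and grouping all run inside SQLite.
rows = conn.execute(
    """
    SELECT format_version,
           qualifier AS ebd_key,
           JSON_GROUP_ARRAY(DISTINCT pruefidentifikator) AS pruefidentifikatoren
    FROM v_ahbtabellen
    WHERE qualifier GLOB 'E_[0-9][0-9][0-9][0-9]'
    GROUP BY format_version, qualifier
    ORDER BY format_version DESC, ebd_key
    """
).fetchall()

# Each row carries a JSON array string; sort after decoding because aggregate order is not guaranteed.
mapping = {(fv, key): sorted(json.loads(pis)) for fv, key, pis in rows}
print(regexp_available, mapping)
```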

Comment on lines +158 to +163
db_path = settings.ahb_db_path
if db_path is None and settings.github_token is not None:
    click.secho("Downloading AHB database from xml-migs-and-ahbs...", fg="cyan")
    db_path = download_ahb_db(settings.github_token)
if db_path is not None:
    click.secho(f"Loading EBD-to-Prüfi mapping from {db_path} (format_version={settings.format_version})")
Contributor Author

We don't need settings.ahb_db_path to be configurable. Just assume it's never given and is always None when the script starts; as a consequence, you always download it from the release artifact on the first run. YAGNI.

Contributor Author

Done in b1e8c62 — removed ahb_db_path, always downloads from release when GITHUB_TOKEN is set.


kroki_port: int = Field(alias="KROKI_PORT")
kroki_host: str = Field(alias="KROKI_HOST")
ahb_db_path: Optional[Path] = Field(default=None, alias="AHB_DB_PATH")
Contributor Author

we don't need that.

Contributor Author

Removed in b1e8c62.

kroki_port: int = Field(alias="KROKI_PORT")
kroki_host: str = Field(alias="KROKI_HOST")
ahb_db_path: Optional[Path] = Field(default=None, alias="AHB_DB_PATH")
github_token: Optional[str] = Field(default=None, alias="GITHUB_TOKEN")
Contributor Author

add a description what it's used for

Contributor Author

Done in b1e8c62 — added description to the github_token field.

hf-kklein and others added 4 commits April 2, 2026 12:30
Use the suggested SQL query from the issue comment — REGEXP filtering
and JSON_GROUP_ARRAY aggregation happen entirely in SQLite, dramatically
reducing rows transferred to Python.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
YAGNI — always download the AHB DB from the xml-migs-and-ahbs release
when GITHUB_TOKEN is set. Add description to github_token field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…te assignment

- REGEXP is not natively supported in SQLite — use GLOB instead
- Merge pruefidentifikatoren across format versions instead of silently
  overwriting when format_version filter is not set
- Consolidate duplicated pruefidentifikatoren assignment into one line
  after the if/else block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
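
The "merge instead of silently overwriting" change can be illustrated with a short sketch. The row shape, the sample values, and the function names `overwrite_by_key` and `merge_by_key` are assumptions for illustration; the PR's real code may be structured differently.

```python
from collections import defaultdict

# Hypothetical query result without a format_version filter:
# (format_version, ebd_key, pruefidentifikatoren)
rows = [
    ("FV2504", "E_0401", ["11001", "11002"]),
    ("FV2410", "E_0401", ["11001", "11005"]),
]


def overwrite_by_key(rows):
    """Naive approach: a later format version silently replaces the earlier one."""
    result = {}
    for _format_version, ebd_key, pruefis in rows:
        result[ebd_key] = list(pruefis)
    return result


def merge_by_key(rows):
    """Union the Prüfidentifikatoren of all format versions per EBD key."""
    merged = defaultdict(set)
    for _format_version, ebd_key, pruefis in rows:
        merged[ebd_key].update(pruefis)
    return {ebd_key: sorted(pruefis) for ebd_key, pruefis in merged.items()}


print(overwrite_by_key(rows))
print(merge_by_key(rows))
```

Using a set per key keeps the merge idempotent, so the same Prüfidentifikator appearing in several format versions is stored once.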
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>