Skip to content

Add true picaxml handler #530#761

Open
TobiasNx wants to merge 7 commits into
masterfrom
530-addPicaXmlHandler
Open

Add true picaxml handler #530#761
TobiasNx wants to merge 7 commits into
masterfrom
530-addPicaXmlHandler

Conversation

@TobiasNx
Copy link
Copy Markdown
Contributor

@TobiasNx TobiasNx commented Apr 10, 2026

Rename old PicaXmlHandler to PpXmlHandler. Then handler names follow the naming convention as stated here: https://format.gbv.de/pica

Resolves #530

This was a very quick addition since picaxml follows the convention of marcxml minus a record type and indicators.

The old picaxml opener just opened ppxml and not picaxml. I renamed the old PicaXmlHandler opener to PpXmlHandler.

Rename old PicaXmlHandler to PpXmlHandler. Then handler names follow the naming convention as stated here: https://format.gbv.de/pica
@TobiasNx TobiasNx requested review from dr0i April 10, 2026 15:56
@TobiasNx TobiasNx assigned dr0i and unassigned dr0i Apr 10, 2026
@TobiasNx
Copy link
Copy Markdown
Contributor Author

Unfortunatly the old pica handler had no tests and I added no new java tests, but I tested it with a local distribution and the following test workflows:

Input playground.data:

<?xml version="1.0" encoding="UTF-8"?>
  <record xmlns="info:srw/schema/5/picaXML-v1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:srw/schema/5/picaXML-v1.0 http://www.oclcpica.org/xml/picaplus.xsd">
      <datafield tag="001@">
      <subfield code="0">0917:14-03-05</subfield>
    </datafield>
    <datafield tag="001B">
      <subfield code="0">0917:23-03-05</subfield>
      <subfield code="t">16:15:13.000</subfield>
    </datafield>
    <datafield tag="001D">
      <subfield code="0">0917:23-03-05</subfield>
    </datafield>
    <datafield tag="001X">
      <subfield code="0">0</subfield>
    </datafield>
    <datafield tag="002@">
      <subfield code="0">Aau</subfield>
    </datafield>
    <datafield tag="003@">
      <subfield code="0">481592954</subfield>
    </datafield>
    <datafield tag="004A">
      <subfield code="0">3774250936</subfield>
    </datafield>
    <datafield tag="011@">
      <subfield code="a">2004</subfield>
    </datafield>
    <datafield tag="021A">
      <subfield code="a">Der Hamster</subfield>
      <subfield code="d">artgerecht halten, gesund ernähren, richtig verstehen</subfield>
      <subfield code="h">Peter Hollmann</subfield>
    </datafield>
    <datafield tag="028A">
      <subfield code="d">Peter</subfield>
      <subfield code="a">Hollmann</subfield>
    </datafield>
    <datafield tag="032@">
      <subfield code="a">5. Aufl</subfield>
    </datafield>
    <datafield tag="033A">
      <subfield code="p">München</subfield>
      <subfield code="n">Gräfe und Unzer</subfield>
    </datafield>
    <datafield tag="034D">
      <subfield code="a">127 S</subfield>
    </datafield>
    <datafield tag="034M">
      <subfield code="a">zahlr. Ill</subfield>
    </datafield>
    <datafield tag="036E">
      <subfield code="a">Mein Heimtier</subfield>
    </datafield>
    <datafield tag="044K">
      <subfield code="a">Ratgeber</subfield>
    </datafield>
    <datafield tag="044L">
      <subfield code="S"> </subfield>
      <subfield code="a">Ratgeber</subfield>
    </datafield>
    <datafield tag="044L" occurrence="01">
      <subfield code="S"> </subfield>
      <subfield code="a">Hamsterhaltung</subfield>
    </datafield>
    <datafield tag="045B">
      <subfield code="a">Xbp 3</subfield>
    </datafield>
</record>

Flux 1:

default inputFile = FLUX_DIR + "playground.data";
default transformationFile = FLUX_DIR + "playground.fix";
inputFile
| open-file
| decode-xml
| handle-picaxml
| change-id(keepidliteral="true", idliteral="003@.0")
| encode-pica
| print
;

Output:

001@ 00917:14-03-05001B 00917:23-03-05t16:15:13.000001D 00917:23-03-05001X 00002@ 0Aau003@ 0481592954004A 03774250936011@ a2004021A aDer Hamsterdartgerecht halten, gesund ernähren, richtig verstehenhPeter Hollmann028A dPeteraHollmann032@ a5. Aufl033A pMünchennGräfe und Unzer034D a127 S034M azahlr. Ill036E aMein Heimtier044K aRatgeber044L SaRatgeber044L SaHamsterhaltung045B aXbp 3

and:

Flux 2:

default inputFile = FLUX_DIR + "playground.data";
default transformationFile = FLUX_DIR + "playground.fix";
inputFile
| open-file
| decode-xml
| handle-picaxml
| encode-yaml
| print
;

Output:

---
"001@":
  "0": "0917:14-03-05"
"001B":
  "0": "0917:23-03-05"
  t: "16:15:13.000"
"001D":
  "0": "0917:23-03-05"
"001X":
  "0": "0"
"002@":
  "0": "Aau"
"003@":
  "0": "481592954"
"004A":
  "0": "3774250936"
"011@":
  a: "2004"
"021A":
  a: "Der Hamster"
  d: "artgerecht halten, gesund ernähren, richtig verstehen"
  h: "Peter Hollmann"
"028A":
  d: "Peter"
  a: "Hollmann"
"032@":
  a: "5. Aufl"
"033A":
  p: "München"
  "n": "Gräfe und Unzer"
"034D":
  a: "127 S"
"034M":
  a: "zahlr. Ill"
"036E":
  a: "Mein Heimtier"
"044K":
  a: "Ratgeber"
"044L":
  S: ""
  a: "Ratgeber"
"044L":
  S: ""
  a: "Hamsterhaltung"
"045B":
  a: "Xbp 3"
``

Copy link
Copy Markdown
Member

@dr0i dr0i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tests in other modules should not be an excuse for dismissing tests in new modules. As more as I think in the code atm are bugs - which would reveal themselves if properly tested.
Ideall, the rename PpxXmlHandler would also get a test (but that would be nice to have and is up to you).
Also, we shouldn't forget to mark this as a breaking change!

/*
* Copyright 2014 hbz, Pascal Christoph
* Copyright 2013, 2014 Deutsche Nationalbibliothek
*
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not hbz? Didin't you/we at hbz implement it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted as suggested

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you added M. Geipel as author. As he worked at the dnb we should than not delete dnb here, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. This is a reuse of the existing marcXmlHandler btw. this is also already heavily modified by hbz workers. I am not sure. Perhaps I should just refer to the other class? And delete Geipel?

*
* @author Pascal Christoph (dr0i)
* A Pica xml reader. To read marc data without namespace specification set option `namespace=""` or to null when using JAVA code.
* @author Tobias Bülte
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ,as not little of the code was just a reuse of the former code, you just add you name as author , not deleting the first one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are still the creator of PpXmlHandler, PicaXmlHandler is a reuse of MarcXmlHandler which was created by Markus Michael Geipel

*
*/
@Description("A pica xml reader")
@Description("A Pica XML reader. To read pica data without namespace specification set option `namespace=\"\"`. To ignore namespace specification set option `ignorenamespace=\"true\".")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here we should link to the new PpXmlHandler and hint when to use it?

* A pica xml handler.
*
* @author Pascal Christoph (dr0i)
* A Pica xml reader. To read marc data without namespace specification set option `namespace=""` or to null when using JAVA code.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use capital letters "XML" like below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be done.

* @param namespace the namespace. Set to null if namespace shouldn't be checked. Set to empty string
* if the namespace is missing in the data.
*/
public void setNamespace(final String namespace) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one namespace valid (see below checkNamespacelogic). Everything not being tat namespace, i.e. also other namespaces, will be ignored. This should be documented.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reuse of the mechanism from MarcXmlHandler

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reused the code and the documentation

currentTag = attributes.getValue("tag");
}
else if (RECORD.equals(localName) && checkNamespace(uri)) {
getReceiver().startRecord("");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not tested - but what if the the namespace ist set to be ignored? It won't work then, won't start the record, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the tests.

@dr0i
Copy link
Copy Markdown
Member

dr0i commented Apr 13, 2026

I've not yet reviewed the code thoroughly because I think you first have to provide some tests so we can assure this code is working as proposed.

@dr0i dr0i assigned TobiasNx and unassigned TobiasNx Apr 13, 2026
Copy link
Copy Markdown
Member

@dr0i dr0i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline.

@dr0i dr0i assigned TobiasNx and unassigned dr0i Apr 13, 2026
@TobiasNx TobiasNx force-pushed the 530-addPicaXmlHandler branch from 974a033 to 563f8b0 Compare April 15, 2026 16:17
@TobiasNx TobiasNx requested a review from dr0i April 16, 2026 14:44
@TobiasNx
Copy link
Copy Markdown
Contributor Author

TobiasNx commented Apr 16, 2026

I added tests and adjusted the code. Also the initial implementation ignored the occurence these are now added as part of the element label if they exist.

@TobiasNx TobiasNx assigned dr0i and unassigned TobiasNx and dr0i Apr 17, 2026
Comment thread metafacture-biblio/src/main/java/org/metafacture/biblio/pica/PicaXmlHandler.java Outdated
/*
* Copyright 2014 hbz, Pascal Christoph
* Copyright 2013, 2014 Deutsche Nationalbibliothek
*
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you added M. Geipel as author. As he worked at the dnb we should than not delete dnb here, right?

In order to encode pica data the handling of occurence must be adjusted and add an slash.
@TobiasNx
Copy link
Copy Markdown
Contributor Author

TobiasNx commented May 5, 2026

The latest commit daa9e5f conforms the handling of occurrence by handle-ppxml as well as handle-pica-xml to decode-pica as well as encode-pica:

Field 201U and occurence 1: was 201U1/201U01 now it is 201U/01

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable handle-picaxml handler to handle pica/xml not just ppxml

2 participants