Skip to content

Use ContextGenerator for JSON serializer#23

Merged
DiegoPino merged 6 commits into
Islandora:8.x-1.xfrom
whikloj:issue-661
Sep 19, 2017
Merged

Use ContextGenerator for JSON serializer#23
DiegoPino merged 6 commits into
Islandora:8.x-1.xfrom
whikloj:issue-661

Conversation

@whikloj
Copy link
Copy Markdown
Member

@whikloj whikloj commented Jul 12, 2017

GitHub Issue: Islandora/documentation#661

What does this Pull Request do?

Uses Json-ld Context Generator for normalizer.

What's new?

  • Does this change require documentation to be updated? no
  • Does this change add any new dependencies? no
  • Does this change require any other modifications to be made to the repository
    (ie. Regeneration activity, etc.)? no
  • Could this change impact execution of existing code? no

How should this be tested?

  1. Bring up CLAW
  2. Add an Islandora Image with some metadata (image not required)
  3. View the Json-ld version (ie. http://localhost:8000/node/1?_format=jsonld)
  4. Then pull down this PR over jsonld
  5. In terminal go to Drupal home, cd /var/www/html/drupal/web (for vagrant)
  6. Run drupal router:rebuild
  7. Run drupal cache:rebuild all
  8. Reload the above web page (http://localhost:8000/node/1?_format=jsonld)
  9. See new @type properties.
http://purl.org/dc/terms/title: [
    {
        @value: "Test Image",
        @type: "http://www.w3.org/2001/XMLSchema#string"
    }
],

Additional Notes:

Any additional information that you think would be helpful when reviewing this
PR.

Interested parties

@DiegoPino @dannylamb @ajs6f

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Jul 12, 2017

🤦‍♂️ tests!

public function __construct(LinkManagerInterface $link_manager,
EntityResolverInterface $entity_Resolver,
JsonldContextGeneratorInterface $jsonld_context) {
parent::__construct($jsonld_context);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the parent::__construct($jsonld_context)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the $this->jsonldContextGenerator is initialized in the FieldItemNormalizer which this class extends. So that variable is already handled in the parent class constructor.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whikloj got it! thanks

$field->getFieldDefinition(),
$context['namespaces']
);
$values_clean = $values_clean + $field_context[$field_keys[0]];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this key always populated?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm good point.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj This looks good from what I can make of it. I'm going to give it a go with a new vagrant and see what happens.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj I'm getting this when attempting to rebuild the router after pulling in your changes:

TypeError: Argument 3 passed to Drupal\jsonld\Normalizer\EntityReferenceItemNormalizer::__construct() must be an instance of Drupal\jsonld\ContextGenerator\JsonldContextGeneratorInterface, none given, called in /var/www/html/drupal/web/core/lib/Drupal/Component/DependencyInjection/Container.php on line 272 in Drupal\jsonld\Normalizer\EntityReferenceItemNormalizer->__construct() (line 47 of /var/www/html/drupal/web/modules/contrib/jsonld/src/Normalizer/EntityReferenceItemNormalizer.php).

@DiegoPino
Copy link
Copy Markdown
Contributor

DiegoPino commented Jul 20, 2017

@dannylamb did you clear caches? Service YAML file gets cached in Database https://www.drupal.org/node/2540430, so probably after you got this code it could be still using the old one. Looking through the rest of the code anyway now, but first look seems fine, https://github.com/Islandora-CLAW/jsonld/pull/23/files#diff-59b468be4869b82b42122d98a064f7bbR4 is referencing the right Service, well named, and that service is of a class that extends Drupal\jsonld\ContextGenerator\JsonldContextGeneratorInterface

@dannylamb
Copy link
Copy Markdown
Contributor

@DiegoPino I was getting that error when trying to either rebuild the router or clear caches. Did you manage to get it running when reviewing previously?

If all else fails maybe I have to do a full uninstall of our modules, then pull in the code and reinstall all the modules.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj So there's issues uninstalling the main islandora module, but that's totally independent of this PR. Nonetheless, I did manage to uninstall and re-install everything and that obliterated whatever was getting me in a "cache 22".

Unfortunately, now I'm getting this when serializing out to jsonld:

Error: Unsupported operand types in Drupal\jsonld\Normalizer\EntityReferenceItemNormalizer->normalize() (line 114 of /var/www/html/drupal/web/modules/contrib/jsonld/src/Normalizer/EntityReferenceItemNormalizer.php)

Copy link
Copy Markdown
Contributor

@DiegoPino DiegoPino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure all is of type Array before "+" them

$context['namespaces']
);
if (isset($field_context[$field_keys[0]])) {
$values_clean = $values_clean + $field_context[$field_keys[0]];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one of the vars at line 114 here is not an array it will fail via
PHP Fatal error: Unsupported operand types. This is what @dannylamb is seeing. Which means we need to make sure that everything is A) and array or cast. Not sure about the cast because it can lead to undesired results.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that initializing $values_clean to an empty array could work here. or moving $values_clean['@id'] = key($embedded['@graph']); before all conditions are applied (just a few lines up)

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Jul 28, 2017

Adding $values_clean = []; as it appears to just assume its an array anyways.

@dannylamb Can you explain what type of object you were getting your error on? I can add it as a test case.

@DiegoPino
Copy link
Copy Markdown
Contributor

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Jul 28, 2017

Oops, didn't a) run the tests and b) missed that array is already set in one.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Jul 28, 2017

😡 I spend more time destroying vagrant machines because Drupal gets stuck unable to run with a module and at the same time unable to uninstall that module.

SERENITY NOW!!!

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Jul 28, 2017

@Islandora-CLAW/committers Ok, I pulled and built a new vagrant and then immediately did this.

ubuntu@claw:/var/www/html/drupal/web$ drupal module:uninstall islandora_image islandora_collection islandora jsonld

                                                                                                                        
 [ERROR] Attempt to create a field field_height that does not exist on entity type media.                               
                                                                                                                        

Gaa?

ubuntu@claw:/var/www/html/drupal/web/modules/contrib$ drupal module:uninstall islandora_image 
 Nothing to do. All modules are already uninstalled
ubuntu@claw:/var/www/html/drupal/web/modules/contrib$ drupal module:uninstall islandora_collection
 Nothing to do. All modules are already uninstalled
ubuntu@claw:/var/www/html/drupal/web/modules/contrib$ drupal module:uninstall islandora

                                                                                                                        
 [ERROR] Attempt to create a field field_image that does not exist on entity type media.                                
                                                                                                                        

Ideas?

@dannylamb
Copy link
Copy Markdown
Contributor

I've had this happen before. Sometimes our configs in config/install aren't getting deleted in the right order when uninstalling. I've manually fixed it at times by clobbering the fields on the thumbnail media bundle manually with drupal:config delete.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Jul 28, 2017

Ok, I manually deleted the TN field configs and then reinstalled this PR along with the same islandora, islandora_collection and islandora_image. It worked. So perhaps Islandora/documentation#685 needs to be a priority

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj I can verify that this is working as intended, but when checking out Fedora, the metadata never makes its way in. Fedora is giving 400s. Looks like it doesn't like the @type entries.

I'll investigate some more and get back to you on it.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj So the document we're sending to fedora looks like this:

[
    {
        "@id":"",
        "@type":[
            "http:\/\/pcdm.org\/models#Object",
            "http:\/\/schema.org\/ImageObject"
        ],
        "http:\/\/purl.org\/dc\/terms\/title":[
            {
                "@value":"Ooo-be-doo",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
        ],
        "http:\/\/schema.org\/author":[
            {
                "@type":"@id",
                "@id":"http:\/\/localhost:8000\/user\/1?_format=jsonld"
            }
        ],
        "http:\/\/schema.org\/dateCreated":[
            {
                "@value":"2017-08-04T17:38:05+00:00",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
        ],
        "http:\/\/schema.org\/dateModified":[
            {
                "@value":"2017-08-04T17:38:46+00:00",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
        ],
        "http:\/\/purl.org\/dc\/terms\/description":[
            {
                "@value":"I wanna be like you-oo-oo.",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
        ],
        "http:\/\/pcdm.org\/models#hasFile":[
            {
                "@type":"@id",
                "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld"
            },
            {
                "@type":"@id",
                "@id":"http:\/\/localhost:8000\/media\/2?_format=jsonld"
            }
        ]
    }
]

It's this segment that's causing the problem:

        "http:\/\/schema.org\/author":[
            {
                "@type":"@id",
                "@id":"http:\/\/localhost:8000\/user\/1?_format=jsonld"
            }
        ],

Fedora is returning 400 with RDF Stream contains subject(s) (http://localhost:8000/user/1?_format=jsonld) not in the domain of this repository.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj So I'm guessing Fedora is interpreting everything with a @type of @id as a resource inside the repository, and is trying to enforce referential integrity. But I don't know enough about Fedora's jsonld particulars to be 100% on that.

@dannylamb
Copy link
Copy Markdown
Contributor

Hrm... and stripping out the author bit makes it blow up with the same error on the hasFile triples: RDF Stream contains subject(s) (http://localhost:8000/media/2?_format=jsonld) not in the domain of this repository.

@whikloj I'm thinking we may have to drop the @type: "@id" for entity references.

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj Bump, we can't have @type: "@id" or this won't work with the community impl of Fedora. Does that basically hose the plan to use the context generator for this?

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 7, 2017

Umm no I think entity references can be different. I'll have to get back into it and have another look.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 8, 2017

@dannylamb oh wait, I didn't fully think through your statement...
While I think I can remove the @type: @id from the JSON-LD. We are getting into the area of jury rigging our JSON-LD to squeeze past Fedora's restriction.

I'm just wondering if (and I know this is more work) we should investigate this on the Fedora side as well.

It seems weird that it requires the object to exist in the repository if we say the type is an id. Especially if this is the object of a triple and so therefore could be anywhere in the world.

@ajs6f
Copy link
Copy Markdown

ajs6f commented Sep 8, 2017

It seems weird that it requires the object to exist in the repository if we say the type is an id. Especially if this is the object of a triple and so therefore could be anywhere in the world.

This. Six hundred and twelve times this. The "integrity constraint" here is a pure artifact of the entanglement between RDF types and Modeshape node types in the Fedora/LDP impl under discussion. It's hard to imagine that any other system would show it.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 11, 2017

I just did a test PUT into Fedora built off master with the following and it worked.

[
    {
        "@id":"",
        "@type":[
            "http:\/\/pcdm.org\/models#Object",
            "http:\/\/schema.org\/ImageObject"
        ],
        "http:\/\/purl.org\/dc\/terms\/title":[
            {
                "@value":"Ooo-be-doo",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
        ],
        "http:\/\/schema.org\/author":[
            {
                "@value":"http:\/\/localhost:8000\/user\/1?_format=jsonld"
            }
        ],
        "http:\/\/schema.org\/dateCreated":[
            {
                "@value":"2017-08-04T17:38:05+00:00",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
        ],
        "http:\/\/schema.org\/dateModified":[
            {
                "@value":"2017-08-04T17:38:46+00:00",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
        ],
        "http:\/\/purl.org\/dc\/terms\/description":[
            {
                "@value":"I wanna be like you-oo-oo.",
                "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
        ],
        "http:\/\/pcdm.org\/models#hasFile":[
            {
                "@value":"http:\/\/localhost:8000\/media\/1?_format=jsonld"
            },
            {
                "@value":"http:\/\/localhost:8000\/media\/2?_format=jsonld"
            }
        ]
    }
]

@ajs6f
Copy link
Copy Markdown

ajs6f commented Sep 11, 2017

I share @DiegoPino's concern about whether @value is actually correct here. @type is RDF type, no? That would normally be an URI...

@acoburn
Copy link
Copy Markdown

acoburn commented Sep 11, 2017

Just to stick my nose in here, but you should be using @id in this case, not @value. @value is for RDF Literals, which is not what you want in this case. If the problem is with Fedora, then it should be addressed there.

@acoburn
Copy link
Copy Markdown

acoburn commented Sep 11, 2017

@ajs6f the use of @type in JSON-LD is entirely context-related: that is, it means different things depending on exactly where it is used. Sometimes it refers to rdf:type, but in this case it's used to distinguish between a literal and an IRI.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 11, 2017

I was able to PUT the following, so perhaps it is the format we are sending to Fedora?

{
  "@id": "",
  "@type": [
    "http://pcdm.org/models#Object",
    "http://schema.org/ImageObject"
  ],
  "http://pcdm.org/models#hasFile": [
    {
      "@id": "http://localhost:8000/media/1?_format=jsonld"
    },
    {
      "@id": "http://localhost:8000/media/2?_format=jsonld"
    }
  ],
  "http://purl.org/dc/terms/description": {
    "@type": "http://www.w3.org/2001/XMLSchema#string",
    "@value": "I wanna be like you-oo-oo."
  },
  "http://purl.org/dc/terms/title": {
    "@type": "http://www.w3.org/2001/XMLSchema#string",
    "@value": "Ooo-be-doo"
  },
  "http://schema.org/author": {
    "@id": "http://localhost:8000/user/1?_format=jsonld"
  },
  "http://schema.org/dateCreated": {
    "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
    "@value": "2017-08-04T17:38:05+00:00"
  },
  "http://schema.org/dateModified": {
    "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
    "@value": "2017-08-04T17:38:46+00:00"
  }
}

@acoburn
Copy link
Copy Markdown

acoburn commented Sep 11, 2017

@whikloj: it is likely due to the fact that Fedora is using really old RDF processing libraries. The 4.7.x branch uses a patched/forked (old) version of Jena, and so updating that will be pretty painful, but updating that is also out of the question b/c of the RDF 1.0 vs RDF 1.1 issue from a year ago (yuck!). The master branch has a slightly more modern Jena (and it doesn't use a forked version of the library), so it's not surprising that it would do better.

@ajs6f
Copy link
Copy Markdown

ajs6f commented Sep 11, 2017

@acoburn You sure? Are you sure that the Modeshape impl just doesn't just fail to deal properly with non-repo RDF type URIs?

@ajs6f
Copy link
Copy Markdown

ajs6f commented Sep 11, 2017

Oh, this isn't connected to RDF type? If not then I am throwing dirt in the air, sorry.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 11, 2017

Ok, here are the 5 tests I tried against Fedora.
https://gist.github.com/whikloj/e63ce22074e0079548b44292b3ef2d5a

2 failed, 3 succeeded but one of those was using @value instead of @id.

@acoburn
Copy link
Copy Markdown

acoburn commented Sep 11, 2017

@whikloj were those tests applied to Fedora 4.7 or against master?

@DiegoPino
Copy link
Copy Markdown
Contributor

@whikloj which ones are alternative/by spec transformations via the JSON-LD playground and which ones are modified manually. Thanks a lot Jared, I feel that is what we needed

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 11, 2017

@acoburn this was against master (specifically fcrepo/fcrepo@26c64ae)

@DiegoPino I think 3, 4, & 5 were alternatives, but I validated them all using the JSON-LD playground site.

@acoburn
Copy link
Copy Markdown

acoburn commented Sep 11, 2017

@whikloj: I say all of this just by looking at the JSON-LD (without doing any validation) so take this w/ a grain of salt.

Failed_1.json: this one failed because of the "single-subject restriction": You're sending a graph of resources with three subjects: <>, <http://localhost:8000/user/1?_format=jsonld, http://localhost:8000/media/1?_format=jsonld. The last two subjects are the ones causing the errors.

Failed_5.json: this failed b/c you shouldn't have the "@type": ["@id"] sections (e.g. in pcdm:hasFile). This is expanded JSON-LD, and those constructs are part of compact JSON-LD.

Succeeded_3.json: this is valid JSON-LD, there's nothing to say

Succeeded_4.json: this is valid JSON-LD, there's nothing to say

Succeeded_but_wrong_2.json: this is also valid JSON-LD. The issue here is you're sending the IRI as a literal, which Fedora will accept, since it accepts any literal values (and performs no integrity checks or anything).

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 11, 2017

@acoburn w/r/t Failed_5.json I got that syntax from the JSON-LD playground app. So I can't speak to its correctness (really I can't speak to any of its correctness 😄 ) but I still see it (ie. the json-ld playground website) providing that syntax (ie. the @type: [@id]) for expanded... but you are saying that should be used for compacted?

@acoburn
Copy link
Copy Markdown

acoburn commented Sep 11, 2017

@whikloj I'm sure it's correct on the JSON-LD playground -- it's just a syntax that I'm not familiar with. In fact, I generally try to avoid expanded JSON-LD; compact JSON-LD is much nicer to deal with.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 15, 2017

Ok, so I'm going to take Succeeded_4.json from the above Gist and make that my test's expected outcome and work towards that.

If there is any reason you can see for not doing that, please let me know.

@DiegoPino
Copy link
Copy Markdown
Contributor

DiegoPino commented Sep 15, 2017

@whikloj go for it. I'm too deep into release process today but i promise to do research in the next days on possible issues and edge cases and come back and comment on any changes you make. Thanks a lot for this.

Update: Seems like Succeeded_4.json requires just the removal of "@type":"@id" for internal drupal entities right? the use of "@type":"@id" on @id:IRI is just a reaffirmation/strict conformance to the 1.0 specs and keeping only the @id: blablaIRI is good enough for any smart enough parser, so nothing is lost there. Good, you did not go for @value:!!

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 15, 2017

Ok so question. @dannylamb you said here that the document being sent to Fedora had that format. How did you determine that?

Because the format in my tests included the referenced entity (like the Author or File).

I was going to remove that as that causes a problem for Fedora (trying to dereference those out of repo URIs), but your comment doesn't have the extra referenced entity graphs?

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj Author is referenced in that resource. Not sure what you mean.

I tested by going to the ?_format=jsonld version of the resource and applied the same processing Milliner does (e.g. removing every other resource and dropping @graph). This produces documents that look like Succeeded_4.json.

@whikloj
Copy link
Copy Markdown
Member Author

whikloj commented Sep 18, 2017

@dannylamb So by default we embed the other resources in the same JSON-LD document.

So if you have a node with a JPG in it the JSON-LD contains both the graph for the node (which references the media entity), but also a graph entry for the media entry.

That is what I see from the output in the tests (ie. this exists in the output to dereference this).

But your example doesn't have those extra bits in the JSON-LD, so are you removing them? I ask because I was wondering if we should add an option to allow us to not embed the referenced entity in the output JSON-LD if we so choose.

All we need to do (I think) is change this line to

$normalized_in_context = ['@graph' => [$context['current_entity_id'] => $normalized_prop]];

We could do that based on either:

  • a configuration option
  • a context switch

@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj Yes, I'm stripping out the @graph and other embedded entities in Milliner. I like the idea of being able to customize jsonld output, we just need to be careful how we do it because a user could totally wreck Fedora syncing.

I think ideally it'd accept a header or query param and the user can set the default value of that in configuration. That way Milliner can always request it the same way and we can program against just one of the options while still giving people flexibility.

For this issue, I say pick whatever output you think is best and I can help adapt Milliner if need be. Then go ahead and make another issue for making the serializer and REST resource work with headers/params and be user configurable because we're scope creeping pretty hard right now.

@dannylamb
Copy link
Copy Markdown
Contributor

FYI, there's already a mechanism to do this sort of thing through for requesting expanded/compacted foms through the Accept header (not that Drupal even respects it): https://www.w3.org/TR/json-ld/#application-ld-json

Copy link
Copy Markdown
Contributor

@dannylamb dannylamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whikloj++

I've tested the most recent changes and they don't break Fedora sync'ing.

@DiegoPino Can you release your block / merge this if all your feedback has been addressed?

@DiegoPino DiegoPino merged commit 36f37cb into Islandora:8.x-1.x Sep 19, 2017
@dannylamb
Copy link
Copy Markdown
Contributor

@whikloj++ @DiegoPino++

@DiegoPino
Copy link
Copy Markdown
Contributor

All credits to @whikloj. Good work =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants