Rdf Flushing Microservice #29

Merged on Apr 21, 2017 (9 commits)

Conversation

dannylamb
Contributor

GitHub Issue: Part of Islandora/documentation#597

What does this Pull Request do?

Adds a microservice that retrieves Drupal entities as JSON-LD and persists them in a Fedora repository.

What's new?

An entirely new Silex microservice.

How should this be tested?

  • Run it with the PHP built-in web server: `php -S localhost:8088 -t src`
  • Create a FedoraResource entity in Drupal. Let's pretend it's at `http://localhost:8000/fedora_resource/1`.
  • `curl -X POST -H "Authorization: Bearer SOME_CRAZY_TOKEN" "http://localhost:8088/fedora_resource/1"` should return the URI of the created resource. I got `http://localhost:8080/fcrepo/rest/14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd`.
  • Link the two using Gemini (Milliner purposefully does not do this itself, as the asynchronous workflow should handle that; PR to come soon): `curl -X POST -H "Authorization: Bearer SOME_CRAZY_TOKEN" -H "Content-Type: application/json" -d '{"drupal": "fedora_resource/1", "fedora": "14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd"}' "http://localhost:8000/gemini/"` You'll have to provide your own Fedora path, taken from the URI you got in the previous step.
  • Update the resource in Drupal (e.g. change the title or description).
  • `curl -X PUT -H "Authorization: Bearer SOME_CRAZY_TOKEN" "http://localhost:8088/fedora_resource/1"` should return 204 No Content.
  • GET it from Fedora to verify your changes took effect: `curl -H "Authorization: Bearer SOME_CRAZY_TOKEN" "http://localhost:8080/fcrepo/rest/14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd"`
  • `curl -X DELETE -H "Authorization: Bearer SOME_CRAZY_TOKEN" "http://localhost:8088/fedora_resource/1"` should return 204 No Content.
  • Try to GET it from Fedora again, and you should get a tombstone: `curl -H "Authorization: Bearer SOME_CRAZY_TOKEN" "http://localhost:8080/fcrepo/rest/14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd"`
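
The Gemini step needs the Fedora path relative to the REST base URL rather than the full URI that Milliner returns. A minimal shell sketch of that conversion, assuming the base URL `http://localhost:8080/fcrepo/rest` used in these steps; `fedora_base` and `fedora_path` are names made up here for illustration and are not part of Milliner or Gemini:

```shell
# Assumption: Fedora's REST base URL, as used in the test steps.
fedora_base="http://localhost:8080/fcrepo/rest"

# Hypothetical helper: strip the base URL (and the slash after it) from a
# resource URI, leaving the relative path for the Gemini request body.
fedora_path() {
  printf '%s\n' "${1#"$fedora_base"/}"
}

# prints 14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd
fedora_path "http://localhost:8080/fcrepo/rest/14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd"
```

The printed value is what would go in the `"fedora"` field of the JSON sent to Gemini.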

Interested parties

@Islandora-CLAW/committers

@codecov

codecov bot commented Apr 19, 2017

Codecov Report

Merging #29 into master will decrease coverage by 29.15%.
The diff coverage is 45.85%.

@@              Coverage Diff              @@
##             master      #29       +/-   ##
=============================================
- Coverage     98.41%   69.25%   -29.16%     
- Complexity       35       56       +21     
=============================================
  Files             3        6        +3     
  Lines           126      283      +157     
=============================================
+ Hits            124      196       +72     
- Misses            2       87       +85

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| Hypercube/src/Controller/HypercubeController.php | 100% <ø> (ø) | 4 <0> (ø) ⬇️ |
| Milliner/src/Converter/DrupalEntityConverter.php | 0% <0%> (ø) | 3 <3> (?) |
| Milliner/src/Service/MillinerService.php | 0% <0%> (ø) | 9 <9> (?) |
| Milliner/src/Controller/MillinerController.php | 100% <100%> (ø) | 9 <9> (?) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 78e5d2f...9b5f4b3.

@ruebot
Member

ruebot commented Apr 19, 2017

Paging @manez; Avatar time! 🎉

@manez
Member

manez commented Apr 19, 2017

Aw, c'mon man. flushing microservice?

@ruebot
Member

ruebot commented Apr 19, 2017

wobster in a toilet?

@manez
Member

manez commented Apr 19, 2017

image
I can't even...

- Install `composer`. [Install instructions here.][4]
- `$ cd /path/to/Milliner` and run `$ composer install`
- Then either
- For production, configure your web server appropriately (e.g. add a VirtualHost for Houdini in Apache) OR
Contributor

Should these references to Houdini refer to Milliner?

- `$ cd /path/to/Milliner` and run `$ composer install`
- Then either
- For production, configure your web server appropriately (e.g. add a VirtualHost for Houdini in Apache) OR
- For development, run the PHP built-in webserver `$ php -S localhost:8888 -t src` from Houdini root.
Contributor

Ditto.


For example, suppose you create an entity at `http://localhost:8000/fedora_resource/`. If running the PHP built-in server command described in the Installation section:
```
$ curl -X "POST" "localhost:8888/metadata/fedora_resource/1"
```
Contributor

This may be a really dumb question, but why is there a 1 on the end of that URI? What does it mean?

Member

1 is the first one. It'll increment from there basically like drupalsite.foo/node/1.

Contributor

The first what? The first time you did the sync (i.e. the first "push")?

Member

First "Fedora Resource" created in Drupal. I don't have a VM fired up right now, but iirc, go to localhost:8000, then add content, then select "Fedora Resource" (that's our content type), select what kind you want, and go from there.

Contributor

Hm, that's weird. When I fired up the VM, it just offered the choices "Page" or "Article" to create. Maybe I did something wrong.

Anyways, I'm sorry to be so thick about this, but what is the difference between localhost:8888/metadata/fedora_resource/1 and localhost:8888/metadata/other_fedora_resource/1 and localhost:8888/metadata/fedora_resource/2? If I do five POSTs to localhost:8888/metadata/fedora_resource/ I'll be at localhost:8888/metadata/fedora_resource/5, right? Will I have created five separate resources in Fedora, all of them copied from the same Drupal resource http://localhost:8000/fedora_resource/?

Or is it that localhost:8888/metadata/fedora_resource/1 is the endpoint associated with a Drupal resource http://localhost:8000/fedora_resource/1? In that case, should it say above "suppose you create an entity at http://localhost:8000/fedora_resource/1" (note that I added the 1)

Member

@ajs6f

As I read it the difference between http://localhost:8888/metadata/fedora_resource/1 and http://localhost:8888/metadata/fedora_resource/2 is that the first refers to the Drupal entity at http://localhost:8000/fedora_resource/1 and the second refers to the other Drupal entity at http://localhost:8000/fedora_resource/2.

Or POSTing to <Milliner hostname/port>/metadata/<path> pushes the entity at <Drupal hostname/port>/<path> to Fedora.

Contributor

Okay, cool, then I think I finally get it, but as I wrote above

In that case, should it say above "suppose you create an entity at http://localhost:8000/fedora_resource/1" (note that I added the 1)"

(Sidebar: This is why in any doc-ish situation where I need to use a random ID for a resource, I never use 1. I always pick some obviously random int like 88 or 1234 or something. 1 confuses really simpleminded people like me who suppose it to have some actual semantic.)

Contributor Author

@whikloj has it right

POSTing to <Milliner hostname/port>/metadata/<path> pushes the entity at <Drupal hostname/port>/<path> to Fedora.
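
The mapping settled on in this thread can be sketched in shell. This is only an illustration of the path rule, assuming the `/metadata/` prefix from the README excerpt and the Drupal base URL `http://localhost:8000` used above; `drupal_base` and `drupal_url_for` are hypothetical names, not Milliner code:

```shell
# Assumption: Drupal's base URL, as used elsewhere in this thread.
drupal_base="http://localhost:8000"

# Hypothetical helper: map a Milliner request path to the Drupal entity URL
# by dropping the leading /metadata/ prefix and keeping the rest of the path.
drupal_url_for() {
  printf '%s/%s\n' "$drupal_base" "${1#/metadata/}"
}

drupal_url_for "/metadata/fedora_resource/1"   # prints http://localhost:8000/fedora_resource/1
drupal_url_for "/metadata/fedora_resource/2"   # prints http://localhost:8000/fedora_resource/2
```

So the trailing number is part of the Drupal path and identifies the entity; it is not a counter of how many times something was pushed.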

@dannylamb
Contributor Author

I give you a microservice with the name of a hat maker. I receive a toilet avatar. THANKS INTERNET


This retrieves a jsonld representation of the specified Drupal entity and inserts it in Fedora.

For example, suppose you create an entity at `http://localhost:8000/fedora_resource/`. If running the PHP built-in server command described in the Installation section:
Contributor

ajs6f, Apr 20, 2017

In light of the discussion below, shouldn't this be "For example, suppose you create an entity at http://localhost:8000/fedora_resource/1."?

Contributor Author

yes it should

@ruebot
Member

ruebot commented Apr 20, 2017

@dannylamb can you update the main Crayfish README too? https://github.com/dannylamb/Crayfish/tree/milliner#services

@dannylamb
Contributor Author

@ruebot yeah np. and i'm not sure if i should update config now to use yaml as per #27 or if i should just wait until @acoburn's work lands.

@manez
Member

manez commented Apr 20, 2017

image
Hats?

@acoburn
Contributor

acoburn commented Apr 20, 2017

@dannylamb don't feel like you should wait for me. I think you should just proceed, and I'll integrate whatever you do into my eventual PR. I am still figuring out the best way to structure all of this (thanks for the suggestions @whikloj!) and I don't want to hold you up.


return [
'fedora base url' => 'http://localhost:8080/fcrepo/rest',
'drupal base url' => 'http://localhost:8000',
Contributor

would it be possible to avoid spaces in these array keys? (just thinking about the future YAML-based config...)

Contributor Author

yeah no prawb

@whikloj
Member

whikloj commented Apr 21, 2017

@manez I'm not sure the toilet handle is required, but I do love the hats. Perhaps you could include the artists palette from http://daviddunkley.me/queens-plate/??

@manez
Member

manez commented Apr 21, 2017

Ok no toilet hardware. Just hats. @dannylamb you have options 😄

image

if ($fedora_path !== null) {
throw new \RuntimeException(
"$drupal_path already exists in Fedora at $fedora_path",
200
Member

This should probably not be a 200 response code. Perhaps 409 Conflict?

Contributor Author

yeah, 409 for sure

Member

whikloj left a comment

Just the 200 OK -> 409 Conflict change (or some other code). But otherwise this works as expected.

I feel like Milliner or MillinerService should be adding the Gemini record so you don't end up generating more Fedora resources than Drupal resources?

@dannylamb
Contributor Author

@whikloj Milliner very purposefully does not create the link using gemini. That will be a separate operation in the async workflow. Suppose Fedora is up, and after you create the resource with Milliner but before you link it, the db goes down. If you retried the operation, you would create a duplicate resource until the db came up. If linking is a separate operation, then you can just keep retrying that until you succeed without creating duplicates.

@whikloj
Member

whikloj commented Apr 21, 2017

@dannylamb I'm not sure I understand that workflow. Here is what I did that raised the issue for me.

  1. I created a resource in Drupal (http://localhost:8000/fedora_resource/999)
  2. I POSTed that to Milliner and it responded with the new Fedora URL.
  3. I forgot to link them in Gemini (or perhaps the database was down)
  4. I tried to POST again to Milliner and it created another copy in Fedora.

So it seems that if the link never gets made in Gemini, a GET from Gemini keeps returning a 404, and you POST to Milliner again and again (because as far as you know it hasn't been posted yet).

I'm not really sure how to mitigate this issue...

@ajs6f
Contributor

ajs6f commented Apr 21, 2017

@whikloj Is the problem here that Milliner doesn't create the link, or that the action against Milliner isn't idempotent? For example, if you could use PUT to create as well as update, and you tried the same workflow, would you avoid the problem?

port: 3306
dbname: gemini
user: changeme
password: changeme
Member

Does this section need to be here for Milliner?

Contributor Author

Yeah, it's for the PathMapper that gets injected into the MillinerService.

Member

Oh I guess it does, so Gemini and Milliner both need a PathMapper DB connection.

@dannylamb
Contributor Author

dannylamb commented Apr 21, 2017

@whikloj If you combine the operations, then it fails to be idempotent which is going to be a problem. Separating them leaves us with two operations that can be retried independently of each other without introducing duplicates or making weird state.

The solution is that we use an alpaca pipeline that pokes milliner first, then pokes gemini.
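
A rough shell sketch of the "two independently retried steps" idea, not the actual Alpaca pipeline. The `retry` helper and the two stand-in functions are hypothetical; real steps would call Milliner and Gemini over HTTP:

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
retry() {
  local attempts="$1"
  shift
  local i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
  done
  return 1
}

# Stand-ins for the two operations (assumptions for illustration only).
push_to_fedora() { echo "pushed"; }
link_in_gemini() { echo "linked"; }

# Each step is retried on its own: once the push has succeeded, a failing
# link step is retried without running the push again.
retry 3 push_to_fedora && retry 3 link_in_gemini
```

Because the second step only starts after the first has succeeded, an outage of the Gemini database affects only the linking retries.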

@dannylamb
Contributor Author

@ajs6f The problem with PUT is that then I need to know where I want to put it. For now I'm just POSTing and letting the default Fedora path minter kick in. I check to see if it's already been created and abort the POST to Milliner with 409 to keep it from introducing duplicates.
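
The duplicate check described here can be reduced to a small decision, sketched in shell below. The function name `post_status` is hypothetical, and the 201 status for a successful create is an assumption; the thread only states that an existing mapping should produce a 409:

```shell
# Hypothetical helper: choose the response status for a POST to Milliner,
# given the Fedora path that the Gemini lookup returned for the Drupal path
# (an empty string when no mapping exists).
post_status() {
  if [ -n "$1" ]; then
    echo 409   # already mapped: abort rather than create a duplicate
  else
    echo 201   # assumption: a successful create is reported as 201 Created
  fi
}

post_status ""                                                   # prints 201
post_status "14/e2/18/ab/14e218ab-aec4-4262-a71a-36d0466353dd"   # prints 409
```

The check depends on the Gemini record: until the link exists, a repeated POST sees an empty lookup and creates another resource, which is the case @whikloj describes above.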

@ajs6f
Contributor

ajs6f commented Apr 21, 2017

@dannylamb Makes sense, but from a µservice POV, would it be better to factor the default id mapping into Milliner as a service?

@whikloj
Member

whikloj commented Apr 21, 2017

@ajs6f I like the idea of mapping our own paths, I'm always just unclear on what would be a good method to generate a consistent path.

@whikloj
Member

whikloj commented Apr 21, 2017

I'm good to merge this. Comments/concerns @Islandora-CLAW/committers? Speak now or...it'll be too late.

@ajs6f
Contributor

ajs6f commented Apr 21, 2017

Just leaving a note here to connect to some relevant IRC discussion: http://irclogs.islandora.ca/2017-04-21.html
