Updates existing user docs and adds one more #616
Also feel free to suggest new resources to add to the list, or recommend cutting certain things that actually aren't good resources for linked data newbies.
Nothing wrong here, just some suggestions.
docs/user-documentation/CLAWfor1x.md
Outdated
* `Resource`: Roughly equivalent to a Fedora 3 object - a conceptual representation of a thing that can contain files or other containers.
* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `resource` to act as a collection of other `resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource). |
Last sentence is missing a capital letter: `resources` -> `Resources`... unless you were trying to keep a common feel for that word.
Great point. I was trying to keep it the same as how it was before, but I feel like capitalizing it really drives home the point that this is a Specially Named Thing. Fixing.
docs/user-documentation/CLAWfor1x.md
Outdated
In Islandora CLAW, RDF datastreams (RELS-EXT and RELS-INT) are stored as RDF in Fedora. Binary datastreams are files or `nonRdfResources` (see [PCDM](https://github.com/duraspace/pcdm/wiki)). Descriptive metadata datastreams (MODS, DC, DwC, PBCore, etc) are stored as RDF; [`RDFSource`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source). | ||
In Islandora 7.x-1.x, every object has a specific content model which defined what datastreams it could have and which were absolutely required. Some of these Islandora 7.x-1.x datastreams contained metadata about the object (RELS-EXT, RELS-INT, DC, MODS, PREMIS, etc) while others contained binary files (JPG, PDF, MP3, PNG, TIFF, etc). In Islandora CLAW, all metadata about a resource is stored as RDF attributes directly on the resource itself, whether that resource is a `pcdm:Collection`, `pcdm:Object` or a `pcdm:File`, so we no longer need to separate metadata by type (MODS, DC, PREMIS, etc) and store it in binary files as we did in Islandora 7.x-1.x. | ||
Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`. |
I like this, it's totally correct... I'm wondering if a small diagram might help. Once you start with the whole "... `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of..." I feel it might start to get hazy to some, but I'm not sure. Consider this an idea and not a requirement.
Great point. We'll probably need a few more diagrams in general. I'll make a separate PR for that when I get some time to draw it out.
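In the meantime, the relationship in that paragraph can be roughed out as a handful of triples. This is only a sketch in Python using plain tuples: `pcdm:hasFile` comes from PCDM, but every URL and both `ex:` predicates are hypothetical placeholders, since the paragraph does not name the predicate that links a `pcdm:File` to its binary.

```python
# Hypothetical URLs, used only for illustration.
OBJ = "http://example.org/object/1"
FILE = "http://example.org/object/1/files/page1"
BINARY = "http://example.org/binaries/page1.tiff"

# Each tuple is (subject, predicate, object).
# The ex:* predicates are assumptions, not names taken from CLAW or PCDM.
triples = [
    (OBJ, "rdf:type", "pcdm:Object"),
    (OBJ, "pcdm:hasFile", FILE),          # the object contains the file
    (FILE, "rdf:type", "pcdm:File"),
    (FILE, "ex:binaryLocation", BINARY),  # link to the Non-RDF Source
    (FILE, "ex:mimeType", "image/tiff"),  # technical metadata on the file
]


def describe(subject, graph):
    """Collect the predicate/object pairs stated about one subject."""
    return {p: o for s, p, o in graph if s == subject}


file_description = describe(FILE, triples)
```

The point of the sketch is that the `pcdm:File` carries its own statements, including the pointer to the binary, while the parent `pcdm:Object` only states that it has the file.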
- [Wikipedia article on Serialization](https://en.wikipedia.org/wiki/Serialization)
- [W3C’s RDF/XML Syntax Specification](https://www.w3.org/TR/REC-rdf-syntax/)
- [W3C’s RDF 1.1 Turtle](https://www.w3.org/TR/turtle/)
- [W3C’s JSON-LD 1.0](https://www.w3.org/TR/json-ld/)
http://json-ld.org and http://json-ld.org/playground/ are great JSON-LD resources.
Awesome, totally forgot about those! Adding those now.
docs/user-documentation/CLAWfor1x.md
Outdated
@@ -27,13 +27,16 @@ Fedora 3 objects are FOXML (Fedora Object eXtensible Markup Language) documents,
* `System Properties`: A set of system-defined descriptive properties that is necessary to manage and track the object in the repository.
* `Datastream(s)`: The element in a Fedora digital object that represents a content item.
In Fedora 4 , what we would have called `objects` are now referred to as `resources` and are not composed of XML; instead, they are stored in ModeShape as nodes with RDF properties. They can contain the following elements: | |||
In Fedora 4 , what we would have called `objects` are now referred to as [`Resources`](https://www.w3.org/TR/ld-glossary/#resource) (and *everything* in Fedora 4 is a `Resource`). Instead of being composed of XML as they were in Fedora 3, they are stored in [ModeShape](http://modeshape.jboss.org/) as nodes with RDF properties. A `Resource` in Islandora CLAW may [contain](https://www.w3.org/TR/ldp/#dfn-containment) RDF data or binary files, similar to the way Islandora 7.x-1.x objects stored descriptive metadata and binary files in datastreams. |
> A `Resource` in Islandora CLAW may contain RDF data or binary files, similar to the way Islandora 7.x-1.x objects stored descriptive metadata and binary files in datastreams.

`NonRdfSource`s are `Resources` and do not contain anything. Containment is limited to containers.
docs/user-documentation/CLAWfor1x.md
Outdated
* `Resource`: Roughly equivalent to a Fedora 3 object - a conceptual representation of a thing that can contain files or other containers.
* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `Resource` to act as a collection of other `Resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `Resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `Resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource). |
Maybe move the clarification between RdfSources and NonRdfSources up to the paragraph above? Then you could cut out some of the preceding paragraph.
docs/user-documentation/CLAWfor1x.md
Outdated
* `Non-RDF Source`: Roughly equivalent to a datastream. A Non-RDF Source (or binary) is simply a bitstream (e.g. JPG, PDF, XML, MP3, etc.).
Unlike Islandora 7.x-1.x objects that store metadata and binary files in a predefined way depending on the content model, Islandora CLAW uses [Linked Data Platform Containers](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-container), or LDPCs, to allow resources to contain each other in a flexible way. LDPCs allow one `Resource` to act as a collection of other `Resources` similar to the way an Islandora 7.x-1.x collection contains objects, or objects contain datastreams. When part of a `Resource`, binary files (such as JPG, PDF, MP3, etc) are referred to as [`Non-RDF Sources`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-non-rdf-source) because their content is not RDF data. `Resources` that contain only RDF data are called [`RDF Sources`](https://www.w3.org/TR/ldp/#ldpr-resource). | ||
CLAW makes use of the [Portland Common Data Model (PCDM)](https://github.com/duraspace/pcdm/wiki) as a layer of abstraction over LDPCs to make containment simpler to understand for users; a `pcdm:Collection` may contain other `pcdm:Collections` or `pcdm:Objects` (similar to an Islandora 7.x-1.x collection content model), and a `pcdm:Object` may contain other `pcdm:Objects` (similar to the way an Islandora 7.x-1.x compound object has child objects) or `pcdm:Files` (similar to the way Islandora 7.x-1.x objects have datastreams). |
👍 for bringing up that basically all our turn-key datatypes can be compound objects.
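For readers mapping this onto 7.x-1.x, the containment rules in the PCDM paragraph above can be sketched as a small lookup table. This is a minimal illustration in Python that assumes only the rules stated in that paragraph; `ALLOWED_MEMBERS` and `can_contain` are hypothetical names, not part of PCDM or CLAW.

```python
# Which PCDM types may contain which, as described in the paragraph above.
# Hypothetical helper for illustration only; not a CLAW or PCDM API.
ALLOWED_MEMBERS = {
    "pcdm:Collection": {"pcdm:Collection", "pcdm:Object"},
    "pcdm:Object": {"pcdm:Object", "pcdm:File"},
    "pcdm:File": set(),  # files do not contain anything
}


def can_contain(parent_type: str, child_type: str) -> bool:
    """Return True if the parent type may contain the child type."""
    return child_type in ALLOWED_MEMBERS.get(parent_type, set())
```

Under these rules a `pcdm:Object` holding other `pcdm:Objects` plays the role of a 7.x-1.x compound object, which is why any object type can act as one.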
docs/user-documentation/CLAWfor1x.md
Outdated
### Datastreams
In Islandora CLAW, RDF datastreams (RELS-EXT and RELS-INT) are stored as RDF in Fedora. Binary datastreams are files or `nonRdfResources` (see [PCDM](https://github.com/duraspace/pcdm/wiki)). Descriptive metadata datastreams (MODS, DC, DwC, PBCore, etc) are stored as RDF; [`RDFSource`](https://www.w3.org/TR/ldp/#dfn-linked-data-platform-rdf-source). | ||
In Islandora 7.x-1.x, every object has a specific content model which defined what datastreams it could have and which were absolutely required. Some of these Islandora 7.x-1.x datastreams contained metadata about the object (RELS-EXT, RELS-INT, DC, MODS, PREMIS, etc) while others contained binary files (JPG, PDF, MP3, PNG, TIFF, etc). In Islandora CLAW, all metadata about a resource is stored as RDF attributes directly on the resource itself, whether that resource is a `pcdm:Collection`, `pcdm:Object` or a `pcdm:File`, so we no longer need to separate metadata by type (MODS, DC, PREMIS, etc) and store it in binary files as we did in Islandora 7.x-1.x. |
You should add RELS-EXT and RELS-INT to the list of no longer needed metadata datastreams. And perhaps add that putting RDF on the NonRdfSources serves the same purpose as RELS-INT in the paragraph below.
## Islandora 7.x-1.x (with Fedora 3)
Islandora 7.x-1.x is "middleware" for Drupal 7.x and Fedora 3, meaning that it fits as a layer in between these two systems and acts as a bridge allowing them to talk to each other. This is sometimes expressed as a hamburger: |
I'm more inclined to describe 7.x as Drupal modules that talk to a Fedora server instead of Drupal's database (although that is used in places as well).
![image](../assets/claw-chimera.png)

Or, for a diagram that doesn't involve food or animals:
Islandora CLAW does more than simply replace that base layer with Fedora 4. It is a total re-architecting of the interaction between the various pieces. Rather than a hamburger, Islandora CLAW is a [chimera](https://en.wikipedia.org/wiki/Chimera_(mythology)): |
CLAW is where it really becomes middleware. I'd maybe add that here.
This new structure has several advantages:
Like Islandora 7.x-1.x, Islandora CLAW uses Drupal modules to extend Drupal's native functionality to handle new types of content (Fedora Resources), but unlike Islandora 7.x-1.x, Islandora CLAW contains a completely new layer of "plumbing" between Drupal, Fedora, Blazegraph (CLAW's default triplestore), Solr and any other [microservices](https://en.wikipedia.org/wiki/Microservices) to allow all of these systems to pass messages to each other and stay in sync. This new structure has several advantages: |
Replace 'other microservices' with just 'microservices'. Otherwise this would imply Solr and Blazegraph are microservices.
Thanks for all the great feedback, @dannylamb! New commit should address all of these issues. I rewrote some of these parts, so there may be fresh 'bugs' to address...
@bryjbrown Thanks so much for this. I'm going to let this sit for a bit to give folks from the CLAW call a chance to comment, and then I'll merge it.
docs/user-documentation/CLAWfor1x.md
Outdated
Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`. | ||
Binary files, such as JPGs, PNGs, MP3s, and PDFs, are handled via `pcdm:Files` which are contained by a parent `pcdm:Object`, similar to how an Islandora 7.x-1.x cmodel may hold a PDF or JPG as a datastream. Unlike Islandora 7.x-1.x, these binary files can actually have their own technical metadata attached them. This is because `pcdm:Collections`, `pcdm:Objects` and even `pcdm:Files` are all `RDF Sources` containing only RDF data, with `pcdm:Files` having links to the URL of the `Non-RDF Source` (binary file) they represent as part of their RDF data in addition to whatever other metadata you may want about the file. Using this system, a `pcdm:Object` can contain as many `pcdm:Files` as necessary, and each `pcdm:File` can have separate metadata about itself and its relationship to other `pcdm:Files` attached to the parent `pcdm:Object`, serving the same purpose RELS-INT datastreams served in Islandora 7.x-1.x. |
Should we mention that people can still have their MODS/PBCORE/DC XML datastreams as a NonRdfSource (pcdm:File) if they really want to store an XML representation of RDF?
Yeah, that might be a good idea. But, with the caveat that I don't see us supporting editing or indexing that anytime soon.
@ruebot @whikloj Even though you can do this, why would you want to? I feel like putting this in the docs would encourage people to do it, and then they might get wrong ideas like the MODS auto-updates when the RDF changes and other misunderstandings.
I'm 100% willing to add this bit to the docs, I just want to make sure I understand why first.
@bryjbrown IMO you don't want to be doing anything like that. But you can if you want to move into f4 and migrate iteratively into something that will work with CLAW.
If we mention it, we need to mention that by doing it you miss out on pretty much everything.
@dannylamb Fair enough, updating this now.
New section about 'Note that you can use a |
GitHub Issue: #510
What does this Pull Request do?
`into-to-ld-for-claw.md`, a guided reading list for RDF newbies to get up to speed
How should this be tested?
Read it and make sure I'm not spreading misinformation
Interested parties
@Islandora-CLAW/committers