Data Package "profiles" #87
Comments
_Somewhat_ related: [JSON-LD](http://json-ld.org/) (and adding `@context` to a resource) |
+1 on this. This is rather important for the work I'm currently doing, where a lot of the metadata I need to provide doesn't apply to the general data package. |
I've updated the profiles proposal to use the plural, since a given data package can implement multiple profiles. |
Has there been any further discussion or work on this? It seems pretty necessary in order to be able to have pluggability between data packages and different tools. As things stand, there doesn't seem to be anything to indicate that a data package is a tabular data package, rather than any other kind - other than inspecting all the fields. I'd suggest that the reference to the schema should be a URI though, and perhaps a dereferenceable schema URL. |
@stevage I agree we should progress this - probably as a small separate spec similar to http://dataprotocols.org/data-package-identifier/ - would you be up for drafting something? Personally I'd like to have a single name rather than a URI, as for most users this will be easier, but with a way for that name to be dereferenceable, e.g. a convention that profiles.dataprotocols.org or something exists and profile X goes to profiles.dataprotocols.org/X |
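A minimal sketch of the name-to-URL convention suggested in the comment above, assuming a registry host at `profiles.dataprotocols.org`. The host is only floated in the thread ("or something"), and the function name `profile_url` is made up for illustration.

```python
from urllib.parse import quote

# Assumption: this host is only proposed in the thread; no such service
# is confirmed to exist.
PROFILE_BASE_URL = "http://profiles.dataprotocols.org/"


def profile_url(name: str, base_url: str = PROFILE_BASE_URL) -> str:
    """Map a short profile name to a dereferenceable URL by convention."""
    if not name or "/" in name:
        raise ValueError(f"not a simple profile name: {name!r}")
    # Percent-encode the name so it is safe as a single path segment.
    return base_url.rstrip("/") + "/" + quote(name, safe="")
```

With this convention a publisher writes only the short name, while tools that want the full definition can still resolve it, e.g. `profile_url("tabular")`.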
yeah that works. You can actually have the best of both worlds:
Give me a chance to learn the rest of the ecosystem first, and see #183. Might be good to chat if you have time? |
@stevage definitely good to chat. Ping on #okfn on freenode or rufuspollock on skype. |
OK, I think we are going to introduce this and probably merge "microschemas" - #183 - with it. My sense is this will be a separate spec referenced from Data Package rather than inlined into that spec. Thoughts anyone? |
@pwalsh could you update on the status of the profiles / schemas registry. Is this now live? |
Need to clarify inheritance model for profiles: For Data Packages themselves and the use of the profiles field we go for explicitness. You write out all the profiles you expect to conform to even if some imply the other e.g. you'd have both "tabular" and "budget" listed even if the budget data package is a type of tabular data package. |
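To illustrate the explicit model described above, here is a rough sketch assuming `profiles` is a hash of profile names to versions, as in the original proposal. The helper names and the sample descriptors are hypothetical.

```python
def declares_profile(descriptor: dict, name: str) -> bool:
    """Return True if the descriptor explicitly lists the given profile."""
    return name in descriptor.get("profiles", {})


def filter_by_profile(descriptors: list, name: str) -> list:
    """Keep only the descriptors that explicitly list the given profile."""
    return [d for d in descriptors if declares_profile(d, name)]


# A budget package lists "tabular" too, even though "budget" implies it.
budget_package = {
    "name": "city-budget",
    "profiles": {"tabular": "*", "budget": "*"},
}
generic_package = {"name": "misc-files"}

tabular_names = [
    d["name"] for d in filter_by_profile([budget_package, generic_package], "tabular")
]
```

Because every profile is written out, a consumer looking for all tabular packages needs no knowledge of how profiles relate to each other.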
I propose that this will become a separate protocol that extends Data Packages but is not part of the base specification. @jpmckinney @pwalsh @paulfitz @stevage wdyt? |
I'm not sure I understand. Profiles are a mechanism for extending DP. Are you saying that the mechanism itself should be an extension of DP, and not part of the base spec? |
https://github.com/dataprotocols/registry Registry is live in the sense that we have a javascript library that works with it, using rawgit.com to serve the registry file straight from github: https://github.com/okfn/datapackage-registry-js/blob/master/index.js We are using this in DataPackagist with success. I'd be happy to formalise it, and make an announcement about it, by:
|
Ok, just to check that I understand what this proposal currently is:
(I wonder if it would be better to also have direct links to schemas, for simplicity. It would also better support use cases where for some reason whoever is curating the registry doesn't want to accept a proposed profile. That situation would be a real road block with this proposal because there's no other way to link to it...).
Are informal profiles (that is, an agreement to add additional fields to data packages, without actually writing a schema) still allowed/encouraged? |
@stevage that's a great summary. Right now, I don't think we do want to link schemas in the datapackage.json itself as I think that isn't yet quite resolved and is also not necessarily essential for a lot of what people may use this for (e.g. i'm just looking for all tabular data packages - i'm not looking to validate against the schema) |
Sure, fair enough. |
I'm also wondering about pure JSON Table Schemas vs full data package profiles. Should we support listing pure JSON Table Schemas? |
Just a quick comment. I understand why you would want to have profiles as a hash of all profiles the datapackage follows. It makes it simple to parse, but it does make it more tedious to generate, since you'd need to know a lot more than just the datapackage profile you're trying to follow. Made up example: I have a "historical budget data package" which might combine "budget data package" and "historical data package" (made up) profiles. Budget data package might include "openspending data package" and these are "tabular data packages" and the historical data package might be something like a "function data package" (made up). Now if I were just reading the "historical budget data package" profile page I would possibly have to read a lot of background profiles or, which is what I'd probably do, just copy things from one package to another without any thought. I also do not like versioning, mostly because I don't think that versioning is handled in a good way and it becomes confusing quickly. It would make sense to me if it was only a single profile I'd be inheriting from, but I might be inheriting the budget data package profile version 2.0 which supports multiple versions of the "tabular data package". I'm also slightly afraid of using names like I think we should try to go for simplicity and I think we're going in the wrong direction with this. Why not just a link to the specification or the schema? Sorry for being a naysayer here, but I'm just very afraid that I might be less inclined to use data package profiles if this gets too complicated. |
@tryggvib so:
I'm still very open-minded and the proof of this stuff is in the implementation so trying this out is what matters :-) Aside: "*" or simply "" means that you can have any version of the schema you like. |
If we already have a canonical name in the registry, couldn't we then just have the registry manage the inheritance? That does make the use case of "I want all tabular data packages which should also pick up budget data packages and other derived profiles" slightly more difficult, but at least we won't have to rely on package maintainers to remember to add all the inheritance stuff (which would mean we might miss out on packages in the use case because of the assumption that everyone will add all profiles). Also I'm sceptical whether the "" and "" are a good thing. If the profile updates and becomes backwards incompatible, those packages that used "" or "" will no longer adhere to the profile and we'll end up with inconsistencies. This puts a lot of restrictions on profile creators about how they can develop their profiles in the future. |
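A sketch of what registry-managed inheritance, as suggested above, might look like. The registry contents, the parent mapping, and the function names are all hypothetical; the thread does not define a registry format.

```python
# Hypothetical registry data: each profile name maps to the profiles it
# derives from. This is not the format of any real registry.
REGISTRY_PARENTS = {
    "tabular": [],
    "budget": ["tabular"],
    "historical-budget": ["budget"],
}


def expand_profiles(name: str, parents: dict = REGISTRY_PARENTS) -> list:
    """Return the profile plus every profile it derives from."""
    seen = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        if current not in parents:
            raise KeyError(f"profile not in registry: {current!r}")
        seen.append(current)
        stack.extend(parents[current])
    return seen


def conforms_to(declared: str, wanted: str, parents: dict = REGISTRY_PARENTS) -> bool:
    """True if a package declaring `declared` also counts as `wanted`."""
    return wanted in expand_profiles(declared, parents)
```

Here the publisher declares a single profile and the consumer asks the registry whether it implies another, which moves the inheritance knowledge from every package maintainer to one shared place.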
Just thinking of the way other kinds of systems manage this stuff, I imagine a bit of XSL and NPM. XSL documents tend to start by explicitly listing all the namespaces they refer to, and you just copy this chunk of text from one similar document to the next. I imagine this working out the same - there'll just be a chunk of three lines that you include in every OpenSpending Data Package, for instance. The |
@stevage i borrowed the |
Heh, maybe that was an older npm mechanism - deprecated now if it's even supported. The current mechanism works very well - "^1.0.0" means "any 1.x.x". |
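The caret rule mentioned above ("^1.0.0" means "any 1.x.x") can be sketched as follows. This is a simplified illustration for plain `MAJOR.MINOR.PATCH` versions only, not npm's implementation; pre-release tags and partial versions are not handled, and the function names are made up.

```python
def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return tuple(int(p) for p in parts)


def satisfies_caret(version: str, caret_range: str) -> bool:
    """Check a version against a caret range such as '^1.0.0'."""
    if not caret_range.startswith("^"):
        raise ValueError(f"not a caret range: {caret_range!r}")
    low = parse_version(caret_range[1:])
    major, minor, patch = low
    # The leftmost non-zero component is held fixed; everything to its
    # right may increase.
    if major > 0:
        high = (major + 1, 0, 0)
    elif minor > 0:
        high = (0, minor + 1, 0)
    else:
        high = (0, 0, patch + 1)
    return low <= parse_version(version) < high
```

Under this rule a package pinned to "^1.0.0" keeps matching 1.x.x releases of a profile but stops matching at 2.0.0, which addresses the backwards-incompatibility worry raised about "any version" wildcards.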
@rgrp @pwalsh So with our working implementation of profiles, any given Data Package has only one unversioned profile. And all inheritance from any other profile (e.g. And so, would a simple
Should we then also specify alternate registry URLs (as already supported in
|
I don't follow. What do you mean by:
And, what problem are you solving by adding |
@pwalsh earlier in the thread @rgrp says this:
But in describing how a Data Package publisher can specify a profile for their Data Package, we can just keep it simple and suggest adding a single property (e.g. I suggested "profileRegistryURL" based on @stevage suggestion to allow for a mechanism to allow for profiles that don't exist on the core registry.
|
Ok, so on point 1, no, the JSON Schema for a profile (which is really just an implementation detail, or, if you like, a representation of the spec) does not implicitly or explicitly declare any inheritance, so I still don't understand what you mean there. On point 2, well, I get the idea, but there is not actually any dependency between the spec and the core registry (the specs do not say, for example, that |
|
@pwalsh @rgrp @stevage @jpmckinney drafting a description of profiles here: https://github.com/dataprotocols/dataprotocols/blob/add-profiles-documentation/data-package-profiles/index.md I think the next section in this doc could be a template for writing such a profile (#196) |
A few comments on what you have there:
|
Thanks for the comments, @stevage. To your points:
|
I implemented this in f1ccbee. It applies to Data Package descriptors, and also to the new Data Resource descriptors. It is not implemented as an object, nor with hierarchy - those suggestions are way too complex and require a publisher to know details about subclassing and inheritance that a publisher should frankly never need to know. Instead, we have:
|
@pwalsh ok. BTW this is one example where i think we could do the PR separately and distinctly - but ok if not. |
From the draft commit: f1ccbee:
I am lost here - what is a profile? Datapackage itself or a resource? Does it mean that any datapackage MUST have |
@Fak3 using a single commit is probably not the best way to get the context you need. The profile concept is explained in narrative form in the base data package spec, and is not further explained in the reference information for each profile. I'll push a built site for @rufuspollock to review and we'll take it from there. |
…parate mini-spec.
* /profiles/ is a mini-spec explaining and defining meaning and syntax of profile property
* [dr]: add profile property
* [dp]: add profile property
It is clear people will want to extend base Data Package spec for particular formats or structures. This could be both for types of data (e.g. tabular as we do with Simple Data Format) and topical areas (e.g. financial data).
This proposal is about:
Proposal
Introduction of a `profiles` attribute whose value is a hash consisting of profile names and a version of that profile. Structure is the same as for `dependencies`.
Material from #183 on microschemas and profiles
Idea: allow people to start registering simple little schemas for their data (esp tabular data). This would be in the form of a JSON Table Schema. For example, we have Budget Data Package for public finances, we could have a simple schema for crime reports or restaurant inspections.
Notes: