Are primary and foreign keys on the right level of abstraction? #297
Comments
I like the current JTS Foreign Keys specification, and I would expect foreign keys to be part of a resource or table definition. That looks logical. As for implementation, I think libraries that work at the single-resource level should ignore foreign keys, because they reference other resources, which is outside such a library's scope. Higher-level libraries implementing the Data Package specification could then depend on those lower-level, single-resource libraries and add foreign key support. |
@sirex |
I think it is critical to the spec: PK and FK are critical to the context of a table. |
@pwalsh |
@roll honestly, I'm not sold on there being a practical distinction between |
@pwalsh
For example, resources have names, but tables (atomic CSV files) just can't have names (not filenames, of course =) because there is no namespace for them. As an example:

# datapackage
name: datapackage
resources:
  - name: resource1
    # resource
  - name: resource2
    # table
    schema:
      fields: ...
      foreignKeys: <to resource1>

Extracting the JTS of resource2 (we want to describe the CSV file separately):

# JTS of resource 2
fields: ...
foreignKeys: <to resource1>

Now the foreignKey points to nowhere, because in this case a foreignKey just doesn't make sense: there is no namespace. So the question is simple: why aren't referential entities on the resource level, like this:

# datapackage
name: datapackage
resources:
  - name: resource1
  - name: resource2
    foreignKeys: <to resource1>
    schema:
      fields: ...
|
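The namespace problem described in the comment above can be illustrated with a minimal Python sketch. It is only an illustration, not code from any existing library: the `find_resource` helper and the field names `id` and `ref` are hypothetical, and the `reference` layout follows the `foreignKeys` example quoted elsewhere in this thread.

```python
from typing import Optional

# Hypothetical, simplified descriptor; names mirror the YAML example above.
package = {
    "name": "datapackage",
    "resources": [
        {"name": "resource1", "schema": {"fields": [{"name": "id"}]}},
        {
            "name": "resource2",
            "schema": {
                "fields": [{"name": "ref"}],
                "foreignKeys": [
                    {
                        "fields": "ref",
                        "reference": {"resource": "resource1", "fields": "id"},
                    }
                ],
            },
        },
    ],
}


def find_resource(package: Optional[dict], name: str) -> Optional[dict]:
    """Look a resource up by name; without a package there is no namespace."""
    if package is None:
        return None
    for resource in package["resources"]:
        if resource["name"] == name:
            return resource
    return None


# The schema of resource2, as it would look when extracted on its own.
schema = package["resources"][1]["schema"]
target = schema["foreignKeys"][0]["reference"]["resource"]

# Inside the package, the referenced name resolves to a resource.
in_package = find_resource(package, target)

# With only the extracted schema, there is nothing to resolve the name against.
standalone = find_resource(None, target)
```

The same `foreignKeys` entry is meaningful or meaningless depending only on whether a surrounding package is available, which is the point the comment makes.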
@roll I see resource and table as the same thing, and my idea is to do the separation in the implementation, not in the specification, by just ignoring Since a resource can be anything, not just tabular data, then Also, it is possible to have the schema outside of the resource:

{
  "resources": [{"schema": "xyz-schema"}],
  "schemas": {
    "xyz-schema": {
      schema goes here ...
    }
  }
}

Foreign keys are directly tied to the schema, because they refer to fields defined in a schema, so I think it is not a good idea to move foreign keys outside of the schema definition. |
@sirex |
@roll, in that case, maybe foreign keys should point to JTS schemas, not to data packages? |
Well, the JTS lib doesn't need to depend on the DP lib. The JTS lib just needs "something" to tell it which table/resource is being referenced. If it has a defined API for this, then DP just needs to follow that API, but that does not mean the JTS lib depends on the DP lib. |
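One way to read the "defined API" idea from the comment above is as a small resolver interface owned by the table-level library, which a package-level library then implements. The sketch below is an assumption about how that could look, not the API of any real JTS or DP library: `ReferenceResolver`, `check_foreign_key`, and `PackageResolver` are hypothetical names, and the state codes are made-up sample data.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class ReferenceResolver(Protocol):
    """Hypothetical hook that a table-level library could define."""

    def resolve(self, resource: str, field: str) -> Optional[Iterable[object]]:
        """Return the values of `field` in `resource`, or None if unknown."""
        ...


def check_foreign_key(
    values: Iterable[object],
    resource: str,
    field: str,
    resolver: ReferenceResolver,
) -> List[object]:
    """Table-level side: return the values with no match in the referenced field."""
    referenced = resolver.resolve(resource, field)
    if referenced is None:
        raise LookupError(f"cannot resolve resource {resource!r}")
    allowed = set(referenced)
    return [value for value in values if value not in allowed]


class PackageResolver:
    """Package-level side: satisfies the hook; the table-level code never imports it."""

    def __init__(self, tables: Dict[str, Dict[str, list]]) -> None:
        self._tables = tables

    def resolve(self, resource: str, field: str) -> Optional[Iterable[object]]:
        table = self._tables.get(resource)
        return None if table is None else table.get(field)


# Made-up sample data for the referenced resource.
resolver = PackageResolver({"the-resource": {"state_id": ["CA", "NY"]}})
missing = check_foreign_key(["CA", "TX"], "the-resource", "state_id", resolver)
```

In this arrangement the dependency points one way only: the package-level code knows about the resolver interface, while the table-level check knows nothing about packages.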
@sirex @pwalsh |
@roll I guess we should talk about this more in the coming days. For me, it is not a specification issue at all, but rather an implementation issue. However, I agree that the specs need work in this area anyway. |
@pwalsh
|
In #297 (comment) I was saying that, for example, here:

"foreignKeys": [
  {
    "fields": "state",
    "reference": {
      "datapackage": "http://data.okfn.org/data/mydatapackage/",
      "resource": "the-resource",
      "fields": "state_id"
    }
  }
]

the reference points to an external data package, which creates a circular dependency between the two specs. To fix that, the reference could point directly to a JTS schema instead of a data package resource. For example,
If two specifications depend on each other, they should either be merged into one, or the dependencies should be removed. So I sort of agree with @roll, but as I understand it, the specs can't be changed that freely. |
@sirex To clarify my point, imagine I start the specs in this area from scratch:
In this case foreign keys are used to provide datapackage integrity, exactly like in SQL. For me it makes sense, because data packages are about data containerization. When we have a container, we can do references. One thing that I suppose should be considered anyway is removing cross-datapackage referencing for v1. It even solves the circular dependency between JTS and DP: On the JTS level, saying that
On the DP level, adding that |
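The idea of foreign keys guaranteeing integrity inside a single data package, as in SQL, could be sketched roughly as follows. This is an illustration under stated assumptions rather than a proposed implementation: `foreignKeys` sits on the resource level as suggested earlier in the thread, references stay within one package (no cross-datapackage lookups), and the inline `rows` key and the `check_package_integrity` function are hypothetical.

```python
from typing import List, Tuple


def check_package_integrity(package: dict) -> List[Tuple[str, str, object]]:
    """Return (resource, field, value) triples that violate a foreign key."""
    # Hypothetical: each resource carries its data inline under "rows".
    rows_by_name = {r["name"]: r["rows"] for r in package["resources"]}
    violations = []
    for resource in package["resources"]:
        for fk in resource.get("foreignKeys", []):
            ref = fk["reference"]
            # Values that exist in the referenced field of the referenced resource.
            allowed = {row[ref["fields"]] for row in rows_by_name[ref["resource"]]}
            for row in resource["rows"]:
                value = row[fk["fields"]]
                if value not in allowed:
                    violations.append((resource["name"], fk["fields"], value))
    return violations


# Made-up sample package: resource2.ref must match some resource1.id.
package = {
    "name": "datapackage",
    "resources": [
        {"name": "resource1", "rows": [{"id": 1}, {"id": 2}]},
        {
            "name": "resource2",
            "foreignKeys": [
                {"fields": "ref", "reference": {"resource": "resource1", "fields": "id"}}
            ],
            "rows": [{"ref": 1}, {"ref": 3}],
        },
    ],
}

violations = check_package_integrity(package)
```

Because every reference is resolved against the package's own resource names, the check needs nothing outside the container, which is what makes it comparable to a constraint inside one SQL database.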
I would like to move to close this issue. This was a valuable discussion, but if I read it correctly, I do not think there is now anything outstanding in terms of a specific proposed change or an immediate bug in the specs. Let me know if there are any objections. |
INVALID / WONTFIX. See previous comment. See #314 for specific suggestion. |
Overview
For now, primary and foreign keys are part of the JSON Table Schema spec. I suppose there is strong reasoning behind this, such as self-referencing and the ability to create just one CSV file with an FK to "somewhere". Maybe it was just easier to put them there.
But for implementations, honestly, it badly breaks the normal layering of abstractions. For example, to follow this spec perfectly, a JTS lib would need a circular dependency with a DP lib in order to download and check the referenced datapackage. Or goodtables, when checking some atomic tables, would not know what to do with an FK definition, because there is no cross-naming mechanism between atomic tables.
Thoughts
To me this looks like a situation where the idea is great but it just won't be supported in the real world. Maybe we will be able to hack it in Python somehow, but other implementations..
Maybe it is better to reduce the scope of this to make it really workable? A JSON file can't have comments, but JSON has implementations for 100 languages. I see one great concept here that could guide things:
Instead of having PKs and FKs as a side part of the JTS spec, move them to DP, to be a core part of the DP spec that ensures datapackage integrity like in SQL. It would be much easier to support across all implementations. In this case it would also be a better separation of concerns:
@akariv
@pwalsh
@sirex
@Stephen-Gates
et al WDYT?