Are primary and foreign keys on the right level of abstraction? #297

roll · 2016-09-20T09:13:49Z

Overview

Currently, primary and foreign keys are part of the JSON Table Schema spec. I suppose there is strong reasoning behind this, like self-referencing, the ability to create just one CSV file with an FK pointing "somewhere", etc. Maybe it was also just easier to put them here.

But for implementations, honestly, it badly breaks the normal layering of abstraction levels. For example, to follow this spec perfectly, a JTS lib would need a circular dependency on the DP lib in order to download and check a referenced datapackage. Or, when goodtables checks standalone atomic tables, I don't know what it should do with an FK definition, because there is no cross-naming mechanism between atomic tables.

Thoughts

To me it looks like a situation where the idea is great but it just won't be supported in the real world. Maybe we will be able to hack it in Python somehow, but other implementations...

Maybe it would be better to reduce the scope to make it really workable? For example, JSON can't have comments, but it has implementations in 100 languages. I see one great concept here that could guide things:

datapackage integrity

Instead of having PKs and FKs as a side part of the JTS spec, move them to DP as a core part of the DP spec, ensuring datapackage integrity as in SQL. This would be much easier to support across all implementations. It would also give a better separation of concerns:

JTS is for types and constraints
DP is for data containerization
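The SQL analogy can be illustrated with Python's standard `sqlite3` module. In SQL it is the database, not the individual table, that gives tables their names and enforces references between them, which is the role this proposal would give the datapackage. The `states` and `cities` tables below are made up for illustration; this is a sketch of SQL behaviour only, not of any Data Package library.

```python
import sqlite3

# The in-memory database plays the role of the data package: it is the
# namespace in which "states" and "cities" have names an FK can refer to.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE states (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE cities (name TEXT, state TEXT REFERENCES states(id))")
conn.execute("INSERT INTO states VALUES ('CA')")
conn.execute("INSERT INTO cities VALUES ('Los Angeles', 'CA')")  # accepted

try:
    # 'ZZ' does not exist in states, so the container rejects the row.
    conn.execute("INSERT INTO cities VALUES ('Nowhere', 'ZZ')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```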

@akariv
@pwalsh
@sirex
@Stephen-Gates
et al WDYT?

sirex · 2016-09-20T09:36:45Z

I like the current JTS foreign keys specification, and I would expect foreign keys to be part of a resource or table definition. That looks logical.

Regarding implementation, I think libraries that work at the single-resource level should ignore foreign keys, because they reference other resources, which is outside the library's scope.

Higher-level libraries implementing the data package specification could then depend on those lower-level, single-resource libraries and add foreign key support.
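A minimal sketch of this layering in Python, assuming resources are already loaded as lists of row dicts. `validate_table` and `validate_package` are hypothetical names: the table-level function checks only what a single table can check (here just primary-key uniqueness) and deliberately ignores `foreignKeys`, while the package-level function reuses it unchanged and adds the cross-resource check.

```python
from typing import Any


def _fields(value):
    # Field lists may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def validate_table(schema: dict, rows: list[dict[str, Any]]) -> list[str]:
    """Single-table checks only. foreignKeys is deliberately ignored:
    it names other resources, which are out of scope at this level."""
    errors = []
    if "primaryKey" in schema:
        pk = _fields(schema["primaryKey"])
        seen = set()
        for i, row in enumerate(rows):
            key = tuple(row[f] for f in pk)
            if key in seen:
                errors.append(f"row {i}: duplicate primary key {key}")
            seen.add(key)
    return errors


def validate_package(resources: dict[str, dict]) -> list[str]:
    """resources maps a resource name to {"schema": ..., "rows": [...]}."""
    errors = []
    for name, res in resources.items():
        # Reuse the lower-level, single-table validator unchanged...
        errors += [f"{name}: {e}" for e in validate_table(res["schema"], res["rows"])]
        # ...and add the cross-resource check it cannot do by itself.
        for fk in res["schema"].get("foreignKeys", []):
            ref = fk["reference"]
            target = resources[ref["resource"]]  # KeyError if the name is unknown
            keys = {tuple(r[f] for f in _fields(ref["fields"])) for r in target["rows"]}
            for i, row in enumerate(res["rows"]):
                value = tuple(row[f] for f in _fields(fk["fields"]))
                if value not in keys:
                    errors.append(f"{name}: row {i}: {value} not in {ref['resource']}")
    return errors
```

With a `states` resource whose primary key repeats and a `cities` resource referencing a missing state, `validate_table` on `cities` alone reports nothing, while `validate_package` reports both problems.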

roll · 2016-09-20T09:41:52Z

@sirex
But you're saying almost the same thing I said: we ignore it at the table level (Stream->Table->Resource->Datapackage) and use it at the resource level. So why is it an attribute of the table rather than the resource?

pwalsh · 2016-09-20T09:42:23Z

I think it is critical to the spec: PK and FK are critical to the context of a table.

roll · 2016-09-20T09:47:00Z

@pwalsh
Table (e.g. a lone CSV file) or resource (part of a datapackage)? For example, suppose we have just one CSV file with a defined FK. It's like having a single SQL table without a database as its context.

pwalsh · 2016-09-20T09:53:35Z

@roll honestly, I'm not sold on there being a practical distinction between Table and Resource, for our needs.

roll · 2016-09-20T10:24:12Z

@pwalsh
During the last iteration of work on the Python libraries, I found that this distinction is key to solving some problems that seemed unsolvable without it:

# jts level
Stream - headers+rows
Table - Stream+schema

# dp level
Resource - Table/Image/Document/etc + metadata in the context of a data container (datapackage)
Datapackage - container of Resources + metadata
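To make the layering concrete, here is one way to express it as Python dataclasses. The class and attribute names are hypothetical and simply mirror the outline above; the point is that only `Resource` has a `name`, and lookup by name exists only on the container.

```python
from dataclasses import dataclass, field
from typing import Any


# JTS level: no names, because a standalone table has no namespace.
@dataclass
class Stream:
    headers: list[str]
    rows: list[list[Any]]


@dataclass
class Table:
    stream: Stream
    schema: dict  # a JSON Table Schema descriptor


# DP level: names only exist inside a container.
@dataclass
class Resource:
    name: str
    data: Any  # a Table, or an image, document, etc.
    metadata: dict = field(default_factory=dict)


@dataclass
class DataPackage:
    name: str
    resources: list[Resource] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def get_resource(self, name: str) -> Resource:
        # Lookup by name is only possible here, at the container level.
        for resource in self.resources:
            if resource.name == name:
                return resource
        raise KeyError(name)
```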

For example, resources have names, but tables (atomic CSV files) just can't have names (other than filenames, of course =) because there is no namespace for them.

For example:

# datapackage
name: datapackage
resources:
  - name: resource1
  # resource
  - name: resource2
    # table
    schema:
      fields: ...
      foreignKeys: <to resource1>

Extracting resource2's JTS (because we want to describe the CSV file separately):

# JTS of resource 2
fields: ...
foreignKeys: <to resource1>

Now the foreign key points nowhere, because in this context it simply doesn't make sense: there is no namespace.

So the question is simple: why aren't referential entities at the resource level, like this:
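The problem can be sketched in Python. The descriptor mirrors the example above, with made-up field names (`id`, `ref`) standing in for the `<to resource1>` placeholder, and `resolve` is a hypothetical helper: it works when given the package's name-to-schema mapping, but has nothing to resolve against once the JTS is extracted on its own.

```python
from typing import Optional

# Descriptor from the example above, as a Python dict (field names are made up).
package = {
    "name": "datapackage",
    "resources": [
        {"name": "resource1", "schema": {"fields": [{"name": "id"}]}},
        {"name": "resource2", "schema": {
            "fields": [{"name": "ref"}],
            "foreignKeys": [{"fields": "ref",
                             "reference": {"resource": "resource1", "fields": "id"}}],
        }},
    ],
}


def resolve(fk: dict, namespace: Optional[dict[str, dict]]) -> dict:
    """Find the schema an FK points to, given a name -> schema namespace."""
    target = fk["reference"]["resource"]
    if namespace is None or target not in namespace:
        raise LookupError(f"{target!r} cannot be resolved without a namespace")
    return namespace[target]


# Inside the package, resource names form a namespace...
namespace = {r["name"]: r["schema"] for r in package["resources"]}
# ...but the extracted JTS of resource2 carries no such context.
jts = package["resources"][1]["schema"]
fk = jts["foreignKeys"][0]
```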

# datapackage
name: datapackage
resources:
  - name: resource1
  - name: resource2
    foreignKeys: <to resource1>
    schema:
      fields: ...

sirex · 2016-09-20T10:39:38Z

@roll I see resource and table as the same thing, and my idea is to do the separation in the implementation, not in the specification, by simply having JTS-level libraries ignore foreignKeys.

Since a resource can be anything, not just tabular data, foreignKeys would not make sense for resource types other than tabular data.

Also, it is possible to have the schema outside the resource:

{
  "resources": [{"schema": "xyz-schema"}],
  "schemas": {
    "xyz-schema": {
      schema goes here ...
    }
  }
}

Foreign keys are directly tied to the schema because they refer to fields defined in it, so I think it is not a good idea to move foreign keys outside the schema definition.
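With a layout like the one above, a consumer first has to resolve a string `schema` through the package-level mapping before it can see any foreign keys. The top-level `schemas` key is taken from this illustration rather than from a published spec, and `resource_schema` is a hypothetical helper; a minimal sketch:

```python
def resource_schema(package: dict, resource: dict) -> dict:
    """Return a resource's schema, following a string reference into the
    package-level "schemas" mapping when the schema is not inline."""
    schema = resource["schema"]
    if isinstance(schema, str):
        return package["schemas"][schema]  # KeyError if the name is unknown
    return schema  # schema embedded inline in the resource


package = {
    "resources": [{"name": "cities", "schema": "xyz-schema"}],
    "schemas": {"xyz-schema": {"fields": [{"name": "state"}]}},
}
```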

roll · 2016-09-20T10:50:17Z

@sirex
You're right that it's tied to the schema. But when lower-level things depend on higher-level things (JTS referencing DP), that's also bad. There seems to be no perfect solution here. The SQL spec doesn't have this problem because it has no separate specification for a table that must make sense by itself.

sirex · 2016-09-20T10:53:39Z

@roll, in that case, maybe foreign keys should point to JTS schemas, not to data packages?

pwalsh · 2016-09-20T10:54:28Z

Well, the JTS lib doesn't need to depend on the DP lib.

The JTS lib just needs "something" to tell it which table/resource is being referenced.

If it has a defined API for this, then DP just needs to follow that API; that does not mean the JTS lib depends on the DP lib.
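One way to read this suggestion is as dependency inversion: the JTS library defines a small interface for fetching referenced rows, and the DP library implements it. The sketch below uses Python's `typing.Protocol`; `ReferenceResolver`, `check_foreign_keys` and `PackageResolver` are hypothetical names, not the API of any existing library.

```python
from typing import Iterable, Protocol


class ReferenceResolver(Protocol):
    """Interface owned by the JTS side: 'give me the rows of this resource'."""

    def rows(self, resource: str) -> Iterable[dict]: ...


def _fields(value):
    # Field lists may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def check_foreign_keys(schema: dict, rows: list[dict],
                       resolver: ReferenceResolver) -> list[int]:
    """JTS side: return indexes of rows whose FK values are not referenced."""
    bad = []
    for fk in schema.get("foreignKeys", []):
        ref = fk["reference"]
        keys = {tuple(r[f] for f in _fields(ref["fields"]))
                for r in resolver.rows(ref["resource"])}
        for i, row in enumerate(rows):
            if tuple(row[f] for f in _fields(fk["fields"])) not in keys:
                bad.append(i)
    return bad


class PackageResolver:
    """DP side: satisfies ReferenceResolver structurally, no subclassing."""

    def __init__(self, data: dict[str, list[dict]]):
        self.data = data

    def rows(self, resource: str) -> Iterable[dict]:
        return self.data[resource]
```

Because `Protocol` uses structural typing, `PackageResolver` does not need to subclass anything from the JTS side, so the dependency points only one way.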

roll · 2016-09-20T10:59:58Z

@sirex
What do you mean? My main point about the difference between table and resource (aside from them being two different specs) is that a resource has a name but a table doesn't. So we can only reference resources in a datapackage context.

@pwalsh
So should this be explained in the specification somehow? Something like: without a datapackage context, this, this and this will be ignored completely? Right now it's not clear.

pwalsh · 2016-09-20T11:01:11Z

@roll I guess we should talk about this more in the coming days. For me, it is not a specification issue at all, but rather an implementation issue. However, I agree that the specs need work in this area anyway.

roll · 2016-09-20T11:07:25Z

@pwalsh
Yeah, great. I just wanted to raise it because, in my experience, there are two main weak points in the specs:

this abstraction-level break (maybe it's OK, as you said, with a proper explanation in the spec and implementations)
a tabular data package instead of a mechanism to define tabular resources

sirex · 2016-09-20T12:58:11Z

In my earlier #297 (comment), I meant that, for example, here:

  "foreignKeys": [
    {
      "fields": "state",
      "reference": {
        "datapackage": "http://data.okfn.org/data/mydatapackage/",
        "resource": "the-resource",
        "fields": "state_id"
      }
    }
  ]

the reference points to an external data package, which creates a circular dependency between the two specs.

To fix that, the reference could point directly to a JTS schema instead of a data package resource.

For example, datapackage and resource could be replaced with something like this:

"schema": "the-resource" - points to the "current" space of schemas when JTS is embedded somewhere. A mapping of all schemas would be provided to the library from outside; for example, if a datapackage embeds a JTS schema, it would provide the schemas of all other resources to the library so that the JTS library could validate foreign keys.

"schema": "dp+http://data.okfn.org/data/mydatapackage/?resource=the-resource" - this would point to an external schema, which means the JTS library would still have to support the datapackage spec, but only a small subset of it, to get schemas from resources.

"schema": "http://data.okfn.org/data/mydatapackage/a-schema.json" - this could point directly to another schema.
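To show how a library might tell these three forms apart, here is a minimal Python sketch using only `urllib.parse`. The `dp+` prefix and the `?resource=` query parameter come from the proposal above and are not part of any published spec; `classify_schema_ref` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def classify_schema_ref(ref: str) -> tuple:
    """Classify the three proposed forms of a "schema" reference."""
    if ref.startswith("dp+"):
        # External data package: strip the prefix, split off ?resource=...
        url = urlparse(ref[len("dp+"):])
        package_url = url._replace(query="").geturl()
        resource = parse_qs(url.query)["resource"][0]
        return ("datapackage", package_url, resource)
    if urlparse(ref).scheme in ("http", "https"):
        # Direct link to a standalone schema file.
        return ("url", ref)
    # A name in the schema mapping supplied by the embedding context.
    return ("local", ref)
```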

If two specifications depend on each other, they should either be merged into one or the dependencies should be removed. So I sort of agree with @roll, but as I understand it, the specs can't be changed that freely.

roll · 2016-10-10T19:42:16Z

@sirex
This makes sense, but it's kind of complex, and it points to a schema (a description) instead of a resource (a data unit).

To clarify my point, imagine I were designing the specs in this area from scratch:

I use jsontableschema for types and constraints
I use datapackage as it is for data packaging
I add a TabularResource section to the datapackage spec saying:

A TabularResource is a resource that:

MUST have a schema attribute pointing to a JTS
SHOULD have a primaryKey attribute
MAY have a foreignKeys attribute pointing to other TabularResources in this datapackage
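These rules are easy to check mechanically. A minimal Python sketch, assuming (as in this proposal) that `primaryKey` and `foreignKeys` sit directly on the resource rather than inside its schema; `check_tabular_resource` is a hypothetical helper that returns errors for MUST violations and warnings for SHOULD violations.

```python
def check_tabular_resource(resource: dict, package: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the proposed TabularResource rules."""
    errors, warnings = [], []
    if "schema" not in resource:
        errors.append("MUST have a schema")
    if "primaryKey" not in resource:
        warnings.append("SHOULD have a primaryKey")
    # foreignKeys is optional, but every reference must name a resource
    # in this same datapackage (no cross-package references).
    names = {r.get("name") for r in package.get("resources", [])}
    for fk in resource.get("foreignKeys", []):
        target = fk["reference"]["resource"]
        if target not in names:
            errors.append(f"foreignKeys: no resource named {target!r} in this datapackage")
    return errors, warnings
```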

In this case foreign keys are used to provide datapackage integrity exactly as in SQL. That makes sense to me because data packages are about data containerization: once we have a container, we can do references.

One thing I think should be considered regardless: removing cross-datapackage referencing for v1. It even solves the circular dependency between JTS and DP:

At the JTS level, say that resource can mean anything defined by higher levels, or the table itself for self-referencing (or that no resource means self-referencing).

foreignKeys
  - fields
    reference
      resource
      fields

At the DP level, add that resource means a datapackage resource. So it's a kind of "higher levels extend lower levels" approach. Other specifications could use JTS foreign keys while giving resource a different meaning.
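A minimal Python sketch of this split, assuming that an omitted `resource` means self-reference. At the JTS level any other value is opaque and is handed to a `lookup` callable supplied by whatever higher level is in use (a DP library would map it to a resource name); `referenced_rows` and `lookup` are hypothetical names.

```python
from typing import Callable, Optional

# Supplied by a higher-level spec/library; JTS does not know what it means.
Lookup = Callable[[str], list[dict]]


def referenced_rows(fk: dict, own_rows: list[dict],
                    lookup: Optional[Lookup]) -> list[dict]:
    """Return the rows an FK refers to.

    No "resource" key: self-reference, resolved at the JTS level.
    Any other value: delegated to the higher level via `lookup`.
    """
    resource = fk["reference"].get("resource")
    if resource is None:
        return own_rows
    if lookup is None:
        raise LookupError(f"no higher-level context to interpret {resource!r}")
    return lookup(resource)
```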

rufuspollock · 2016-10-19T15:44:43Z

I would like to move to close this issue.

Valuable discussion, but if I read it correctly, I don't think there is anything outstanding now in terms of a specific proposed change or an immediate bug in the specs. Let me know if there are any objections.

roll · 2016-10-19T16:48:23Z

@rgrp
I've extracted a specific idea from this discussion - #314

rufuspollock · 2016-10-20T08:46:59Z

INVALID / WONTFIX. See previous comment. See #314 for specific suggestion.

roll mentioned this issue Oct 19, 2016

Remove datapackage reference from jsontableschema foreign keys for v1? #314

Closed

rufuspollock closed this as completed Oct 20, 2016

roll mentioned this issue May 30, 2017

TableSchema v1: foreignKeys don't belong here #450

Closed

roll added this to Open Knowledge Jun 27, 2023
