Add Primary and Foreign ID fields #278
@botanize Thanks for putting this together! Here are my suggestions and feedback.

Style

Currently in the proposal, the Primary ID is indicated at the top of each file and then again as a field type. This creates confusing instances where we are defining a Primary ID for the file, and then again as a related but separate concept per field. In simple cases these match, such as for routes.txt and

My suggestion is to only indicate what the Primary ID is at the file level, and do away with defining "Primary" or "Foreign" at the individual field level (i.e., keep the fields as they were originally defined, as "ID" or "ID referencing"). "Foreign" IDs are already clear IMHO with the phrase "ID referencing". I'm a fan of keeping the definitions of "Primary ID" and "Foreign ID" in the Field Types section of the documentation.

Technical notes

Let me know what you think and if I overlooked anything! Thanks.
@scmcca thank you for the feedback!
I intended the file-level and field-level Primary ID concepts to be identical. The file-level Primary ID uniquely identifies a row in the file; if only one field is required to do that, that field is a Primary ID. Seeing that While the existing language for foreign IDs ( Thank you for the technical notes; I was unsure of the primary keys for many of these files, which was the motivation for the issue and pull request!
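To make the row-identification rule concrete, here is a minimal sketch assuming the files are read with Python's standard `csv` module. `is_unique_key` and the sample rows are hypothetical illustrations, not part of the spec or of any GTFS tool; the (trip_id, stop_sequence) pair is used only as an example of a key that needs more than one field.

```python
import csv
import io


def is_unique_key(rows, key_fields):
    """Return True if no two rows share the same values for key_fields."""
    seen = set()
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical sample data; real feeds would be read from the .txt files.
ROUTES = "route_id,route_short_name,route_type\n10,10,3\n20,20,3\n"
STOP_TIMES = "trip_id,stop_sequence,stop_id\nt1,1,A\nt1,2,B\nt2,1,A\n"

routes = list(csv.DictReader(io.StringIO(ROUTES)))
stop_times = list(csv.DictReader(io.StringIO(STOP_TIMES)))

# One field suffices for these routes.txt rows, so that field alone is the key.
print(is_unique_key(routes, ("route_id",)))  # True
# These stop_times.txt rows need two fields: trip_id alone repeats, the pair does not.
print(is_unique_key(stop_times, ("trip_id",)))  # False
print(is_unique_key(stop_times, ("trip_id", "stop_sequence")))  # True
```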
I think we could use
That was also my understanding.
Makes sense to me. Do we know what any consumers of pathways.txt are doing?
I label
I agree; my understanding is that this is a one-line file. I could add something like
I don't know how consumers interpret this; OneBusAway, OpenTripPlanner, and TransitClock don't seem to consume it. I used the "ID named after the file" heuristic to select the Primary ID, but now I see that it's not even a required field. I don't see anything to gain from requiring
I definitely support clearly stating the primary keys of each table in GTFS. Overall the direction of this proposal seems good, but I share some of the same concerns as @scmcca, who says "This creates confusing instances where we are defining a Primary ID for the file, and then again as a related but separate concept per field."

This confusion arises because this PR conflates two distinct concepts: type and key. The primary key is a set of attributes, and every attribute has a type whether or not it is in the primary key. Whether an attribute is a member of the primary key is independent of its type. It so happens that in GTFS we have also (retroactively) defined a type called ID. Primary keys often include fields of type ID, but when they are included in the primary key their type remains ID.

I like that the primary key is shown as a tuple at the top of each table definition. This will be immediately recognizable and understandable to anyone familiar with the relational model of data, including many people implementing GTFS consumer systems. I don't think it's contradictory or problematic to also flag which fields are members of the primary key or are foreign keys, but these should not be treated as types; doing so is likely to create subtle confusion. Either they should not be in the type column, or, if they are placed in one of the existing columns for brevity, it should be clear that they are just annotations and do not replace the values in that column. I think the discussion of primary and foreign keys should also be moved out of the types list into a different section.
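One way to read this point is as a data model in which each field keeps a single type and key membership lives in separate, table-level metadata. The sketch below assumes that model; the class names, the type strings, and the abridged stop_times.txt description are illustrative only, and the roles reported by `annotations` are computed from the table definition rather than stored as a type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "ID"; key membership is deliberately not encoded here


@dataclass(frozen=True)
class ForeignKey:
    fields: tuple
    ref_table: str
    ref_fields: tuple


@dataclass(frozen=True)
class TableSchema:
    name: str
    fields: tuple
    primary_key: tuple  # the key as a tuple of field names
    foreign_keys: tuple = ()

    def field_type(self, name):
        return next(f.type for f in self.fields if f.name == name)

    def annotations(self, name):
        """Key roles are annotations computed from table metadata, not types."""
        notes = []
        if name in self.primary_key:
            notes.append("primary key")
        if any(name in fk.fields for fk in self.foreign_keys):
            notes.append("foreign key")
        return notes


# Hypothetical, abridged description of stop_times.txt.
stop_times = TableSchema(
    name="stop_times.txt",
    fields=(
        Field("trip_id", "ID"),
        Field("stop_sequence", "Non-negative integer"),
        Field("stop_id", "ID"),
    ),
    primary_key=("trip_id", "stop_sequence"),
    foreign_keys=(
        ForeignKey(("trip_id",), "trips.txt", ("trip_id",)),
        ForeignKey(("stop_id",), "stops.txt", ("stop_id",)),
    ),
)

print(stop_times.field_type("trip_id"))   # ID
print(stop_times.annotations("trip_id"))  # ['primary key', 'foreign key']
```

Here trip_id is both a key member and a foreign key, yet its type stays ID, which is the distinction the comment draws.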
Note that primary and foreign keys have been defined in the canonical GTFS validator that MobilityData has been working on with input from the community. See the table schemas at: Primary keys are fields annotated with I'd suggest cross-checking this PR with that tool to see if there are any discrepancies.
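Whatever tool ends up defining the keys, a foreign key boils down to a referential-integrity check. The following is a generic sketch of that check, not the validator's implementation; `missing_references` and the sample rows are hypothetical, and empty values are skipped on the assumption that the referencing field may be optional.

```python
def missing_references(child_rows, child_field, parent_rows, parent_field):
    """Return sorted child values that do not appear in the parent table.

    Empty values are skipped, assuming the referencing field may be optional.
    """
    parent_values = {row[parent_field] for row in parent_rows}
    child_values = {row[child_field] for row in child_rows if row.get(child_field)}
    return sorted(child_values - parent_values)


# Hypothetical trips.txt and routes.txt rows.
routes = [{"route_id": "10"}, {"route_id": "20"}]
trips = [
    {"trip_id": "t1", "route_id": "10"},
    {"trip_id": "t2", "route_id": "99"},  # dangling reference
]

print(missing_references(trips, "route_id", routes, "route_id"))  # ['99']
```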
I believe representing the specification in a standard schema format (e.g., Frictionless) is part of the MobilityData work plan for this quarter. As such, it would be great to:
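For reference, the Frictionless Table Schema format declares keys next to, not inside, the field types: `primaryKey` lists field names, and each `foreignKeys` entry pairs local `fields` with a `reference` to another resource. The descriptor below is a hypothetical, abridged trips.txt schema written as a Python dict; the resource name `routes` and the `key_problems` helper are assumptions for illustration, not part of any published GTFS schema.

```python
def as_list(value):
    """Table Schema allows either a single field name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def key_problems(schema):
    """Return key fields that are not declared in schema["fields"]."""
    names = {f["name"] for f in schema["fields"]}
    declared = as_list(schema.get("primaryKey", []))
    for fk in schema.get("foreignKeys", []):
        declared += as_list(fk["fields"])
    return [name for name in declared if name not in names]


# Hypothetical, abridged Table Schema descriptor for trips.txt.
trips_schema = {
    "fields": [
        {"name": "route_id", "type": "string"},
        {"name": "service_id", "type": "string"},
        {"name": "trip_id", "type": "string"},
    ],
    "primaryKey": ["trip_id"],
    "foreignKeys": [
        {"fields": ["route_id"], "reference": {"resource": "routes", "fields": ["route_id"]}},
    ],
}

print(key_problems(trips_schema))  # []
```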
A couple of thoughts after looking through the validator and the frictionless spec.
Based on the comments in the linked issue, the key on
I can reiterate the suggestion I made earlier to have the primary key for frequencies.txt be ( Also, it's a picky detail, but for consistency, should we be treating "primary key", "unique ID", and "foreign ID" as common nouns, i.e., in sentence case? For example, "Primary Key" would be "Primary key" at the top of each file and "unique ID" would be "Unique ID" in the Type column. Otherwise LGTM.
I think it's actually the opposite. A primary key of (
@botanize That makes sense. Would that mean that the primary key would be equally correct if it were (
Given that headways for the same trip must not overlap (see the description of
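The non-overlap rule has a direct consequence for the key: if every period has start_time before end_time, two rows for the same trip_id with the same start_time would necessarily overlap, so the rule already forbids duplicate (trip_id, start_time) pairs. Below is a minimal sketch of the overlap check, assuming periods that merely touch (one ends exactly when the next starts) are allowed; `overlapping_trips` and the sample rows are hypothetical.

```python
from collections import defaultdict


def gtfs_seconds(hms):
    """Convert a GTFS HH:MM:SS time to seconds; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def overlapping_trips(rows):
    """Return trip_ids whose frequency periods overlap.

    Periods are treated as half-open [start, end), so touching periods are allowed.
    """
    periods = defaultdict(list)
    for row in rows:
        periods[row["trip_id"]].append(
            (gtfs_seconds(row["start_time"]), gtfs_seconds(row["end_time"]))
        )
    bad = set()
    for trip_id, spans in periods.items():
        spans.sort()
        # After sorting by start, any overlap shows up between neighbors.
        for (_, end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < end:
                bad.add(trip_id)
    return sorted(bad)


# Hypothetical frequencies.txt rows.
frequencies = [
    {"trip_id": "t1", "start_time": "06:00:00", "end_time": "09:00:00"},
    {"trip_id": "t1", "start_time": "09:00:00", "end_time": "24:30:00"},
    {"trip_id": "t2", "start_time": "07:00:00", "end_time": "08:00:00"},
    {"trip_id": "t2", "start_time": "07:30:00", "end_time": "10:00:00"},
]

print(overlapping_trips(frequencies))  # ['t2']
```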
Regarding transfers.txt: This PR would define (
@Bertware internally, we use (
We use (
I think this PR should not address it, but the trip-to-trip extension should.
I just checked the repository, and there is already an open PR for the trip-to-trip extension (#32) that I was unaware of. Whichever is merged second should take the other into account, but merging the trip-to-trip transfer proposal first would prevent backward-compatibility issues: this PR would declare (from_stop_id, to_stop_id) to be unique, and that would be undone if #32 were merged after it.
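To illustrate the compatibility concern: once transfers can be defined per trip, two rows may share the same stop pair, so (from_stop_id, to_stop_id) is no longer unique while a wider key still is. The sketch assumes trip-level columns named from_trip_id and to_trip_id, which do not appear in this thread; the rows and the `duplicate_keys` helper are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row.get(f, "") for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical transfers.txt rows using trip-level columns.
transfers = [
    {"from_stop_id": "A", "to_stop_id": "B", "from_trip_id": "t1", "to_trip_id": "t9"},
    {"from_stop_id": "A", "to_stop_id": "B", "from_trip_id": "t2", "to_trip_id": "t9"},
]

stop_pair = ("from_stop_id", "to_stop_id")
with_trips = stop_pair + ("from_trip_id", "to_trip_id")

print(duplicate_keys(transfers, stop_pair))   # [('A', 'B')]
print(duplicate_keys(transfers, with_trips))  # []
```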
Related: I closed PR #32. I think it makes sense to open a separate, cleaner PR covering transfer rules for routes and trips, as well as in-seat transfers.
I think the bigger issue is: what are you supposed to do if you develop a GTFS extension that violates a primary key described in this PR? If the extension becomes part of the spec, the primary key would be updated, but while it's a proprietary extension, there would be a conflict between the extended feed and the spec. Could we just add something to the new Dataset Attributes subheading saying that unofficial extensions may change these relationships by adding new fields to the end of the table's primary key? That would prevent someone from extending
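A small sketch of what "adding new fields to the end of the table's primary key" would mean in practice, using hypothetical helper and field names. Adding fields to a key can never make previously unique rows collide; appending them at the end additionally keeps the spec's key as a positional prefix of the extended key.

```python
def is_unique(rows, key):
    """True if no two rows share the same values for the key fields."""
    values = [tuple(row.get(f, "") for f in key) for row in rows]
    return len(values) == len(set(values))


def extends_key(base_key, extended_key):
    """True if extended_key adds fields after base_key's fields, in the same order."""
    n = len(base_key)
    return len(extended_key) > n and tuple(extended_key[:n]) == tuple(base_key)


# Key discussed in this thread for transfers.txt.
SPEC_KEY = ("from_stop_id", "to_stop_id")
# Hypothetical extension that appends trip-level fields.
EXTENSION_KEY = SPEC_KEY + ("from_trip_id", "to_trip_id")

# A feed that satisfies the spec key also satisfies the extended key.
spec_rows = [
    {"from_stop_id": "A", "to_stop_id": "B"},
    {"from_stop_id": "A", "to_stop_id": "C"},
]
print(is_unique(spec_rows, SPEC_KEY), is_unique(spec_rows, EXTENSION_KEY))  # True True
print(extends_key(SPEC_KEY, EXTENSION_KEY))  # True
print(extends_key(SPEC_KEY, ("from_trip_id", "from_stop_id", "to_stop_id")))  # False
```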
@botanize Do you have examples of other extensions where this problem presents itself? If so, it would be useful to note them here. Otherwise, it seems that there is demand for
I don't have any examples, but it seems bound to come up again at some point. How long do we sit on this PR to try to get #32 or a substitute PR approved?
Agreed that this is a foreseeable issue. I can open a substitute PR tomorrow, at which point we'll have to let 7 discussion days pass, followed by 7 days for voting. So we should be able to resume with #278 by early October.
I ordered the fields in the transfers.txt key to keep the existing keys for producers of the extension compatible: ( @Bertware @paulswartz does this meet your needs?
@botanize looks good to me!
@botanize if there is no more discussion, can we call for a vote on this proposal?
It looks like we're ready for a vote. The vote is for adding a primary key attribute to each table's description and labeling "unique" and "foreign" IDs where applicable. Voting ends on 2021-10-14T23:59:59Z. I look forward to any final feedback and wrapping this up!
+1 OpenGeo
+1 Samtrafiken/Trafiklab
fare_rules.txt is a problem IMHO, though mostly not because of the changes in this PR. However, Primary key ( Side note: However, the bigger problem is that the current GTFS doesn't allow me to do this:
I'm extending voting until 2021-10-21T23:59:59Z so that I can address comments made between now (when I go on vacation) and the close of voting.
Indeed! Can you please start a separate issue so we can elaborate on the problem(s) and solutions?
+1
The vote ended on 2021-10-21 at 23:59:59 UTC with 3 votes in favor and 0 opposed. As per the Specification Amendment Process, this vote passes! Thanks, everyone!
Re #266
Primary ID (ID, ...) to file introduction to clarify requirements to uniquely identify rows
https://groups.google.com/g/gtfs-changes/c/YIOx6JYADMk/m/Zd0clk5lAAAJ