Validation rules
The validation rules relevant to the Greenlight NeTEx validator fall into two major categories:
- The rules based on XML schema
- The rules beyond XML schema
The rules based on XML schema are relevant to data quality dimensions such as uniqueness, consistency and completeness.
The NeTEx XML schema rules can be applied automatically by any XML validator. They cover syntactic checks, XML schema conformance checks, and integrity cross-checks. Examples of these rules are shown in the following table.
Rules category | Examples of rules |
---|---|
Syntactic checks | Well-formed XML: syntactically correct, e.g. `<tag attribute="xx">data value</tag>` |
XML schema conformance checks | Valid tags, in valid order, no empty tags; valid cardinality: required, optional, 0, 1, n; encoding of data types: date, time, text, number, currency value, etc.; enumerated values are valid, e.g. Mode: bus, rail, tram… |
Integrity cross-checks | Uniqueness constraints: identifiers are unique in the document; referential integrity constraints: any referenced entity must also be present in the same file. |
NeTEx profiles make it possible to check the completeness of a dataset against a particular profile. Because the full NeTEx schema is very broad, covering all elements that concern public transport, profiles are often used to limit the scope and to address national and local specificities and needs. The Greenlight validator lets you upload a custom profile (e.g. a national or local profile) or choose the European Passenger Information Profile (EPIP), the European minimum profile, from the predefined list.
Integrity cross-checks require a lot of memory, which can cause performance issues when checking multiple files simultaneously or very large files (of many GB). For this reason, the tool offers the option to perform such checks "outside" XML schema validation, using script-coded rules.
In particular, the relevant rules are netexUniqueConstraints (validates NeTEx element uniqueness) and netexKeyRefConstraints (makes sure NeTEx references have matching keys).
The scripts of these rules are available in the builtin folder. See also the Source codes inventory wiki page.
To apply these rules, choose the NeTEx schemas without constraints (called NeTEx Fast and EPIP Fast in the web interface, and NeTEx@1.2-nc and epip@1.1.2-nc on the command line).
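To make the idea of running these checks outside schema validation concrete, here is a minimal Python sketch. It is not the code of the builtin scripts: it assumes that keys live in an `id` attribute and references in a `ref` attribute, and it ignores the element type and version that the schema's own constraints may take into account. The function name and the sample document are made up for illustration. It streams the document so that only identifiers, not the whole tree, need to be remembered.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: one duplicated id (SP:1) and one dangling reference (SP:2).
SAMPLE = b"""<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <StopPlace id="SP:1" version="1"/>
  <StopPlace id="SP:1" version="1"/>
  <Quay id="Q:1" version="1"/>
  <QuayRef ref="Q:1" version="1"/>
  <StopPlaceRef ref="SP:2" version="1"/>
</PublicationDelivery>"""


def check_ids_and_refs(xml_bytes):
    """Return (duplicate_ids, dangling_refs) for a single document."""
    ids = Counter()
    refs = set()
    # Stream the document; each element is inspected once it is complete.
    for _event, element in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if element.get("id") is not None:
            ids[element.get("id")] += 1
        if element.get("ref") is not None:
            refs.add(element.get("ref"))
        element.clear()  # drop children and attributes we no longer need
    duplicates = sorted(key for key, count in ids.items() if count > 1)
    dangling = sorted(refs.difference(ids))  # refs with no matching id
    return duplicates, dangling


duplicates, dangling = check_ids_and_refs(SAMPLE)
print("duplicate ids:", duplicates)
print("dangling refs:", dangling)
```

A real rule would also have to decide how to treat references to entities that are legitimately defined in another file.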
How to perform such checks using the Web Interface
In the Configuration page:
- Select Packages --> NeTEx Fast (v.1.2), all rules package.

OR

- Select Custom --> Profile --> NeTEx Fast OR EPIP light
- Select Rules --> Validate NeTEx element uniqueness, Make sure NeTEx references have matching keys
How to perform such checks using the Command Line Interface
When running the Docker setup, add the -schema parameter set to NeTEx@1.2-nc or epip@1.1.2-nc. By default, this applies all the available rules.
docker run -it itxpt/greenlight validate -schema epip@1.1.2-nc -i testdata
Or restrict the run to these rules only with the parameter: -r netexUniqueConstraints,netexKeyRefConstraints
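When the validator is driven from a script, the command line above can be assembled programmatically. The following Python sketch only builds the argument list. It uses the flags exactly as they appear on this page (`-schema`, `-i`, `-r`) and assumes that `-r` can simply be appended to the same command. The helper name `build_validate_command` is hypothetical.

```python
import shlex


def build_validate_command(schema, input_path, rules=None):
    """Build the docker command shown above as a list of arguments."""
    command = [
        "docker", "run", "-it", "itxpt/greenlight", "validate",
        "-schema", schema,
        "-i", input_path,
    ]
    if rules:
        # Rules are passed as one comma-separated value (assumption: same command).
        command += ["-r", ",".join(rules)]
    return command


command = build_validate_command(
    "epip@1.1.2-nc",
    "testdata",
    ["netexUniqueConstraints", "netexKeyRefConstraints"],
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` on a machine where Docker and the image are available.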
The rules that cannot be expressed in XML schema, and that have been developed and incorporated in the current tool, mostly concern validity, consistency, and accuracy.
Rules category | Examples of rules |
---|---|
Complex cross-checks | E.g. Validity dates of elements fall within validity dates of frame. E.g. Stop spatial coordinates lie within their Tariff Zone spatial coordinates |
Conditional rules that only apply in some cases | E.g. Point-to-point Tariff should have a Distance Matrix but a Zonal Tariff should have Tariff Zones, etc. |
Parameterised rules with configurable values | E.g. Appropriate distances between stops for transport mode. E.g. Appropriate transfer distances to interchange. |
Checks against external data sets/databases | E.g. Operator codes, spatial coordinates. |
Data modularized into multiple XML documents with cross-references | E.g. Large national data sets. |
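As an illustration of a parameterised rule with configurable values, the following Python sketch checks whether two stops are plausibly spaced for a given transport mode. It is only a sketch: the per-mode limits in `MAX_STOP_SPACING_M` are invented placeholder values, the function names are hypothetical, and the great-circle (haversine) distance on a spherical Earth is one possible way to measure distance, not necessarily the one used by the tool.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

# Hypothetical, configurable limits in metres per transport mode.
MAX_STOP_SPACING_M = {"bus": 5_000, "tram": 3_000, "rail": 100_000}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spacing_is_plausible(mode, stop_a, stop_b, limits=MAX_STOP_SPACING_M):
    """True if the two (lat, lon) stops are within the limit for the mode."""
    return haversine_m(*stop_a, *stop_b) <= limits[mode]


# Two points half a degree of latitude apart (roughly 55.6 km).
print(spacing_is_plausible("rail", (0.0, 0.0), (0.5, 0.0)))
print(spacing_is_plausible("bus", (0.0, 0.0), (0.5, 0.0)))
```

Passing the limits as an argument keeps the threshold configurable, which is the point of this rule category.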
In total, 138 potential rules beyond the XML schema basics have been identified. These rules have been evaluated in terms of priority and cover both general rules and profile-specific rules. It is envisaged that this list will be the basis for further developments.
Priority level | Number of rules | Topics | Specific to |
---|---|---|---|
High | 5 | Header – versions; Identifiers – Codespaces; Date ranges – Frame; Journey Parts | General schema |
Mid-high | 31 | Journey Parts; Frequencies; Timings... | General schema |
Medium | 52 | Stop point; Stop place; Line... | EPIP and General schema |
Mid-low | 47 | Hierarchy; Topographic; Display... | Profile specific and General schema |
Low | 3 | Unused data; Stop Point; Journey | Profile specific and EPIP |
We currently handle 15 rules, listed in the following table of currently integrated rules. For each rule, the table gives the details together with the link to the corresponding script.
These rules have been chosen according to several criteria (such as importance, implementation effort, relevance to the full NeTEx profile and more).
# | Script name | Description | Functional Area | Aspect | Severity (10 = high, 50 = low) | Development details |
---|---|---|---|---|---|---|
1 | passingTimesIsNotDecreasing | On a VERSION FRAME, ToDate must not be earlier than FromDate on any date range. | Common Content | Date ranges - Frame | 10 | Check that the from date is before the to date on the VERSION FRAME |
2 | everyStopPointHaveArrivalAndDepartureTime | Every POINT IN JOURNEY PATTERN in a JOURNEY PATTERN used by a JOURNEY must have a PASSING TIME with arrival and departure time (except for the first and last stop) | Timetable | Timings | 20 | Check that an appropriate ArrivalTime and DepartureTime exist for each PASSING TIME (TimetabledPassingTime) in a SERVICE JOURNEY. |
3 | everyStopPlaceHasAName | Every STOP PLACE has a Name or ShortName attribute | Stop | Stop Place | 20 | Name attribute should be filled in for all STOP PLACEs |
4 | passingTimesHaveIncreasingTimes | Successive DayOffset+PassingTimes for the POINTs IN JOURNEY PATTERN or CALLs of a JOURNEY must not decrease. | Timetable | Timings | 20 | Successive DayOffset+PassingTimes for the POINTs IN JOURNEY PATTERN or CALLs of a JOURNEY must not decrease |
5 | frameDefaultsHaveALocaleAndTimeZone | The FrameDefaults of a VERSION FRAME should have values appropriate to the content | Common | Frame | 30 | Depends on the frame type. - For all frames, check that DefaultLanguage exists in DefaultLocale in FrameDefaults. - For frames that contain spatial coordinates, check that a default LocationSystem is specified (usually WGS84). - For frames that contain elements with time zones (e.g. STOP PLACEs etc. in a SITE FRAME, SCHEDULED STOP POINTs in a SERVICE FRAME), check that the time zone is specified. - For frames that hold monetary values, e.g. FARE FRAMEs, or if an amount is specified. NB: can be specified on the outermost COMPOSITE FRAME if common to all. |
6 | everyStopPlaceHasACorrectStopPlaceType | Every STOP PLACE has a StopPlaceType attribute with correct value | Stop | Stop Place | 30 | Each STOP PLACE should have a StopPlaceType attribute. This should match any type on the QUAYs. |
7 | netexKeyRefConstraints.js | All stop identifiers (QUAYs, all STOP PLACEs, GROUPs OF STOP PLACEs and ACCESS) must comply with the profile codification | Stop | Stop Place | 30 | For stops specifically: [COUNTRY code]: [INSEE common code]: [Type of object]: [Specific stop code]: [Issuer code of the technical code or LOC]. |
8 | locationsAreReferencingTheSamePoint | SCHEDULED STOP POINT must have similar spatial coordinates to those of the assigned STOP PLACE | Stop | Stop Point | 30 | Should be within a certain distance tolerance, varying by mode. Will not necessarily be the same centroid. |
9 | stopPlaceQuayDistanceIsReasonable | Distance between QUAY and STOP PLACE too long | Stop | Spatial | 30 | QUAY and STOP PLACE should not be too far apart. STOP PLACE location is the centroid of the station. QUAY location is the centroid of the QUAY. |
10 | locationsAreReferencingTheSamePoint | The location of QUAY and SCHEDULED STOP POINT should be within reasonable distance of the location or surface of STOP PLACE | Stop | Stop place | 30 | Take the positions from the QUAY and SCHEDULED STOP POINT and calculate the distance in meters. Hard-coded to 500 m in the first version. Later, add a parameter to set the distance. |
11 | everyScheduledStopPointHasAName | A SCHEDULED STOP POINT must have an instantiated Name field | Stop | Stop Point | 30 | The name of a stop should be given. Applies to both STOP PLACE and SCHEDULED STOP POINT |
12 | everyLineIsReferenced | A LINE must have one or more ROUTE instances | Timetable | Line | 40 | Check that a LINE is referenced in at least one ROUTE |
13 | everyStopPointIsReferenced | Any SCHEDULED STOP POINT that is declared should be used. i.e. referenced by an assignment or POINT IN PATTERN etc. | Timetable | Unused data | 40 | Check that each SCHEDULED STOP POINT is used in one or more JOURNEY PATTERNs. |
14 | everyStopPlaceIsReferenced | Any STOP PLACE that is declared should be referenced by a STOP ASSIGNMENT | Stop | Unused data | 50 | Every STOP PLACE should be referenced in at least one STOP ASSIGNMENT. Depends on the profile. |
15 | locationsAreReferencingTheSamePoint | SCHEDULED STOP POINT must be assigned to a STOP PLACE | Stop | Stop Point | 50 | Every SCHEDULED STOP POINT should be referenced in at least one STOP ASSIGNMENT. Depends on the profile. |
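To illustrate the kind of logic behind a timing rule such as the one on non-decreasing passing times, here is a minimal Python sketch. It is not the builtin script: it assumes the passing times of one journey have already been extracted, in journey order, as hypothetical `(day_offset, time)` pairs, and it simply reports the positions where the combined DayOffset and time go backwards.

```python
from datetime import time

SECONDS_PER_DAY = 86_400


def to_seconds(day_offset, clock):
    """Combine a DayOffset and a time of day into seconds since day 0."""
    return (day_offset * SECONDS_PER_DAY
            + clock.hour * 3600 + clock.minute * 60 + clock.second)


def find_decreasing_passing_times(passing_times):
    """Return the indices whose passing time is earlier than the previous one.

    passing_times: list of (day_offset, datetime.time) in journey order.
    """
    seconds = [to_seconds(offset, clock) for offset, clock in passing_times]
    return [i for i in range(1, len(seconds)) if seconds[i] < seconds[i - 1]]


# A journey crossing midnight: the third entry goes back by five minutes.
journey = [(0, time(23, 50)), (1, time(0, 10)), (1, time(0, 5))]
print(find_decreasing_passing_times(journey))
```

Without the day offset, the step from 23:50 to 00:10 would be flagged as a decrease, which is why the rule combines DayOffset with the passing time.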
You can change or add your own rules by cloning the Greenlight repository from GitHub and modifying one of the scripts in the builtin directory. Save it under a new name, then map the builtin folder into the Docker container with the Docker parameter -v.
-v c:\code\greenlight\builtin:/usr/local/greenlight/builtin
Use the script in the same way as the standard scripts, with the flag -r followed by the name of the script.
Example: -r mymodifiedrule
See also the Manual for the command line interface.