
Validation rules

Eliot edited this page Nov 9, 2023 · 3 revisions

The validation rules used by the Greenlight NeTEx validator fall into two main categories:

  • The rules based on XML schema
  • The rules beyond XML schema

Rules based on NeTEx XML schema

Rules of this kind address data quality dimensions such as uniqueness, consistency, and completeness.


The NeTEx XML schema rules can be applied automatically by any XML validator. They cover syntactic checks, XML schema conformance checks, and integrity cross-checks. Examples of these rules are shown in the following table.

| Rules category | Examples of rules |
| --- | --- |
| Syntactic checks | Well-formed XML: syntactically correct, i.e. <tag attribute="xx">data value</tag> |
| XML schema conformance checks | Valid tags, in valid order; no empty tags. Valid cardinality: required, optional, 0, 1, n. Encoding of data types: date, time, text, number, currency value, etc. Enumerated values are valid, e.g. Mode: bus, rail, tram… |
| Integrity cross-checks | Uniqueness constraints: identifiers are unique in the document. Referential integrity constraints: any referenced entity must also be present in the same file. |
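
To make the integrity cross-checks in the table more concrete, here is a minimal Python sketch. It is not how Greenlight or an XML schema validator implements them; it only illustrates the idea on a toy document. The sample XML, the function name `integrity_report`, and the simplification that every `id` attribute is a key and every `ref` attribute is a reference are assumptions made for illustration. The constraints declared in the actual schema are more specific than this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy document (hypothetical, not schema-valid NeTEx): one duplicated id
# and one reference that points to nothing.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <StopPlace id="SP:1" version="1"/>
  <StopPlace id="SP:1" version="1"/>
  <Quay id="Q:1" version="1"/>
  <QuayRef ref="Q:1"/>
  <QuayRef ref="Q:404"/>
</PublicationDelivery>"""


def integrity_report(xml_text):
    """Return (duplicate ids, references without a matching id)."""
    # Syntactic check: fromstring raises ET.ParseError if the text
    # is not well-formed XML.
    root = ET.fromstring(xml_text)

    id_counts = Counter()
    refs = set()
    for elem in root.iter():
        if "id" in elem.attrib:
            id_counts[elem.attrib["id"]] += 1
        if "ref" in elem.attrib:
            refs.add(elem.attrib["ref"])

    # Uniqueness: an id that occurs more than once.
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    # Referential integrity: a ref with no element carrying that id.
    dangling = sorted(r for r in refs if r not in id_counts)
    return duplicates, dangling


print(integrity_report(SAMPLE))  # (['SP:1'], ['Q:404'])
```

Note that the whole document is held in memory here, which is exactly the cost that becomes a problem for very large files.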

Completeness checks

NeTEx profiles make it possible to check the completeness of a dataset against a particular profile. Because the full NeTEx schema is very broad, covering all elements that concern public transport, profiles are often used to limit the scope and to address national and local specificities and needs. The Greenlight validator lets you upload your own custom profile (e.g. a national or local profile) or choose the European Passenger Information Profile (EPIP), the European minimum profile, from the predefined list.


Improving performance by separating integrity cross-checks from XML schema validation

Integrity cross-checks require a lot of memory, which can cause performance problems when checking several files simultaneously or very large files (many GB). The tool therefore offers the option to perform these checks outside XML schema validation, using script-coded rules.

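In particular, the relevant rules are:

  • netexUniqueConstraints (Validate NeTEx element uniqueness)
  • netexKeyRefConstraints (Make sure NeTEx references have matching keys)

The scripts for these rules are available in the builtin folder. See also the Source codes inventory wiki page.
To apply these rules, choose the NeTEx schemas without constraints (called NeTEx Fast / EPIP Fast in the web interface, and NeTEx@1.2-nc / epip@1.1.2-nc on the command line).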
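
As a rough illustration of why running these checks outside schema validation can be lighter on memory, the following Python sketch streams through a document and keeps only the identifiers and references it has seen, not the full tree. This is an assumption-laden sketch, not the logic of the scripts in the builtin folder: the function name `stream_integrity_check` is hypothetical, and it treats every `id` attribute as a key and every `ref` attribute as a reference.

```python
import io
import xml.etree.ElementTree as ET


def stream_integrity_check(source):
    """Stream an XML file (path or binary file object) and return
    (duplicate ids, references without a matching id)."""
    seen_ids = set()
    duplicate_ids = set()
    refs = set()

    # iterparse yields each element once its end tag has been read,
    # so the document never has to be fully loaded before checking.
    for _event, elem in ET.iterparse(source, events=("end",)):
        elem_id = elem.get("id")
        if elem_id is not None:
            if elem_id in seen_ids:
                duplicate_ids.add(elem_id)
            seen_ids.add(elem_id)

        ref = elem.get("ref")
        if ref is not None:
            refs.add(ref)

        # Drop the element's children, text and attributes. An empty
        # shell stays attached to its parent, but the bulk is released.
        elem.clear()

    # References are resolved at the end, so forward references are fine.
    return sorted(duplicate_ids), sorted(refs - seen_ids)


data = io.BytesIO(b'<r><a id="1"/><a id="1"/><b ref="1"/><b ref="2"/></r>')
print(stream_integrity_check(data))  # (['1'], ['2'])
```

The design choice is to trade the parsed tree for two sets of strings, so memory grows with the number of distinct identifiers instead of with the size of the file.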

How to perform such checks using the Web Interface

On the Configuration page:

  • Select Packages --> NeTEx Fast (v.1.2), all rules package.


OR

  • Select Custom --> Profile --> NeTEx Fast OR EPIP light
  • Select Rules --> Validate NeTEx element uniqueness, Make sure NeTEx references have matching keys


How to perform such checks using the Command Line Interface

When running the Docker setup, add the --schema parameter with NeTEx@1.2-nc or epip@1.1.2-nc. By default, this applies all the available rules.

docker run -it itxpt/greenlight validate --schema epip@1.1.2-nc -i testdata

OR apply only these rules by adding the parameter: -r netexUniqueConstraints,netexKeyRefConstraints


Rules beyond NeTEx XML schema

The rules that cannot be expressed in XML schema, and that have been developed and incorporated into the current tool, relate mostly to validity, consistency, and accuracy.

| Rules category | Examples of rules |
| --- | --- |
| Complex cross-checks | E.g. validity dates of elements fall within the validity dates of the frame. E.g. stop spatial coordinates lie within their Tariff Zone spatial coordinates. |
| Conditional rules that only apply in some cases | E.g. a point-to-point Tariff should have a Distance Matrix, but a zonal Tariff should have Tariff Zones, etc. |
| Parameterised rules with configurable values | E.g. appropriate distances between stops for the transport mode. E.g. appropriate transfer distances to interchange. |
| Checks against external data sets/databases | E.g. operator codes, spatial coordinates. |
| Data modularized into multiple XML documents with cross-references | E.g. large national data sets. |

In total, we have identified 138 potential rules beyond the XML schema basics. These rules have been evaluated in terms of priority and cover both general rules and profile-specific rules. It is envisaged that this list will be the basis for further developments.

| Priority level | Number of rules | Topics | Specific to |
| --- | --- | --- | --- |
| High | 5 | Header - versions; Identifiers - Codespaces; Date ranges - Frame; Journey Parts | General schema |
| Mid-high | 31 | Journey Parts; Frequencies; Timings... | General schema |
| Medium | 52 | Stop point; Stop place; Line... | EPIP and general schema |
| Mid-low | 47 | Hierarchy; Topographic; Display... | Profile specific and general schema |
| Low | 3 | Unused data; Stop Point; Journey | Profile specific and EPIP |

We currently handle 15 rules, listed in the following table of currently integrated rules. The table gives the details of each rule together with the link to the corresponding script.

These rules were chosen on the basis of several criteria, such as importance, implementation effort, and relevance to the full NeTEx profile.

| # | Script name | Description | Functional Area | Aspect | Severity (10 = high, 50 = low) | Development details |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | passingTimesIsNotDecreasing | On a VERSION FRAME, ToDate must not be earlier than FromDate on any date range. | Common Content | Date ranges - Frame | 10 | Check that the from date is before the to date on the VERSION FRAME. |
| 2 | everyStopPointHaveArrivalAndDepartureTime | Every POINT IN JOURNEY PATTERN in a JOURNEY PATTERN used by a JOURNEY must have a PASSING TIME with arrival and departure time (except for the first and last stop). | Timetable | Timings | 20 | Check that an appropriate ArrivalTime and DepartureTime exist for each PASSING TIME (TimetabledPassingTime) in a SERVICE JOURNEY. |
| 3 | everyStopPlaceHasAName | Every STOP PLACE has a Name or ShortName attribute. | Stop | Stop Place | 20 | The Name attribute should be filled in for all STOP PLACEs. |
| 4 | passingTimesHaveIncreasingTimes | Successive DayOffset+PassingTimes for the POINTs IN JOURNEY PATTERN or CALLs of a Journey must not decrease. | Timetable | Timings | 20 | Successive DayOffset+PassingTimes for the POINTs IN JOURNEY PATTERN or CALLs of a Journey must not decrease. |
| 5 | frameDefaultsHaveALocaleAndTimeZone | The FrameDefaults of a VERSION FRAME should have values appropriate to the content. | Common | Frame | 30 | Depends on frame type. For all frames, check that DefaultLanguage exists in DefaultLocale in FrameDefaults. For frames that contain spatial coordinates, check that the default LocationSystem is specified (usually WGS84). For frames that contain elements with time zones (e.g. STOP PLACEs in a SITE FRAME, SCHEDULED STOP POINTs in a SERVICE FRAME), check that the time zone is specified. For frames that hold monetary values, e.g. FARE FRAMEs, or if an amount is specified. NB: can be specified on the outermost COMPOSITE FRAME if common to all. |
| 6 | everyStopPlaceHasACorrectStopPlaceType | Every STOP PLACE has a StopPlaceType attribute with a correct value. | Stop | Stop Place | 30 | Each STOP PLACE should have a StopPlaceType attribute. This should match any type on the QUAYs. |
| 7 | netexKeyRefConstraints.js | All stop identifiers (QUAYs, all STOP PLACEs, GROUPs OF STOP PLACEs and ACCESS) must comply with the profile codification. | Stop | Stop Place | 30 | For stops specifically: [COUNTRY code]:[INSEE common code]:[Type of object]:[Specific stop code]:[Issuer code of the technical code or LOC]. |
| 8 | locationsAreReferencingTheSamePoint | A SCHEDULED STOP POINT must have similar spatial coordinates to those of the assigned STOP PLACE. | Stop | Stop Point | 30 | Should be within a certain distance tolerance, varying by mode. Will not necessarily be the same centroid. |
| 9 | stopPlaceQuayDistanceIsReasonable | Distance between QUAY and STOP PLACE is too long. | Stop | Spatial | 30 | QUAY and STOP PLACE should not be too far apart. The STOP PLACE location is the centroid of the station; the QUAY location is the centroid of the QUAY. |
| 10 | locationsAreReferencingTheSamePoint | The location of a QUAY and SCHEDULED STOP POINT should be within a reasonable distance of the location or surface of the STOP PLACE. | Stop | Stop place | 30 | Take the positions from the QUAY and SCHEDULED STOP POINT and calculate the distance in metres. Hard-code 500 m in the first version; later add a parameter to set the distance. |
| 11 | everyScheduledStopPointHasAName | A SCHEDULED STOP POINT must have an instantiated Name field. | Stop | Stop Point | 30 | The name of a stop should be given. Applies to both STOP PLACE and SCHEDULED STOP POINT. |
| 12 | everyLineIsReferenced | A LINE must have one or more ROUTE instances. | Timetable | Line | 40 | Check that a LINE is referenced in at least one ROUTE. |
| 13 | everyStopPointIsReferenced | Any SCHEDULED STOP POINT that is declared should be used, i.e. referenced by an assignment or POINT IN PATTERN etc. | Timetable | Unused data | 40 | Check that each SCHEDULED STOP POINT is used in one or more JOURNEY PATTERNs. |
| 14 | everyStopPlaceIsReferenced | Any STOP PLACE that is declared should be referenced by a STOP ASSIGNMENT. | Stop | Unused data | 50 | Every STOP PLACE should be referenced in at least one STOP ASSIGNMENT. Depends on the profile. |
| 15 | locationsAreReferencingTheSamePoint | A SCHEDULED STOP POINT must be assigned to a STOP PLACE. | Stop | Stop Point | 50 | Every SCHEDULED STOP POINT should be referenced in at least one STOP ASSIGNMENT. Depends on the profile. |
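
To show what a rule such as "successive DayOffset+PassingTimes must not decrease" amounts to, here is a small Python sketch. It is not the script shipped with the tool: the function names and the input shape (a list of day offset and time-of-day pairs, already extracted from the data in journey order) are assumptions made for illustration.

```python
from datetime import time

SECONDS_PER_DAY = 86_400


def to_seconds(day_offset, time_of_day):
    """Combine a day offset and a time of day into seconds since the
    start of the journey's first operating day."""
    return (day_offset * SECONDS_PER_DAY
            + time_of_day.hour * 3600
            + time_of_day.minute * 60
            + time_of_day.second)


def first_decreasing_index(passing_times):
    """Return the index of the first passing time that is earlier than
    its predecessor, or None if the sequence never decreases."""
    previous = None
    for index, (day_offset, time_of_day) in enumerate(passing_times):
        current = to_seconds(day_offset, time_of_day)
        if previous is not None and current < previous:
            return index
        previous = current
    return None


# A journey crossing midnight is fine as long as the day offset increases.
overnight = [(0, time(23, 50)), (1, time(0, 5))]
print(first_decreasing_index(overnight))  # None

# Without the day offset, the same clock times look like a step back.
broken = [(0, time(23, 50)), (0, time(0, 5))]
print(first_decreasing_index(broken))  # 1
```

The example shows why the day offset has to be part of the comparison: clock times alone cannot distinguish an overnight journey from a data error.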

Build your own rules

You can change or add your own rules by cloning the Greenlight repository from GitHub and modifying one of the scripts in the builtin directory. Save it under a new name, then map the builtin folder into the Docker container with the Docker parameter -v.

-v c:\code\greenlight\builtin:/usr/local/greenlight/builtin

Use the script in the same way as the standard scripts, with the -r flag followed by the name of the script. Example: -r mymodifiedrule

See also the Manual for command line interface.