
Review the unit test approach #226

Open
javagl opened this issue Oct 4, 2022 · 1 comment

javagl commented Oct 4, 2022

For each aspect that is tested by the validator, I created a tileset JSON file that was supposed to cause that exact issue. The result is a set of several hundred JSON files in the specs/data directory, for tilesets, metadata, and subtrees.

The current approach of most of the unit tests (e.g. in TilesetValidatorSpec) is to read each of these files and check whether the validator generates the expected issue.

For some tests, it would probably make more sense not to read the data from a file, but to perform the validation call on an object that is declared directly in the test code itself. But having the set of files (each of which can easily be checked individually at the command line) turned out to be useful occasionally, even though it may be considered more of an "integration test" than a "unit test".

I also considered the option to summarize these test cases in their own JSON file, roughly like in

[
  {
    "inputFile": "boundingVolumeRegionArrayInvalidElementType.json",
    "description": "One element of the region array is not a number",
    "expectedIssues": [ "ARRAY_ELEMENT_TYPE_MISMATCH" ]
  },
  ...
]

This could simplify testing in some ways, and allow documenting the test via the description. But it might turn out to be an unnecessary maintenance burden.
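As a sketch of how such a summary file could drive the specs: the `TestCase` shape below mirrors the example above, but the `validate` callback is an assumed stand-in for the real validator (which would read the input file and return the codes of the issues that it found), not the actual API.

```typescript
// Hypothetical sketch of driving the specs from a summary JSON file.
interface TestCase {
  inputFile: string;
  description: string;
  expectedIssues: string[];
}

// Assumed stand-in for the real validator call.
type ValidateFn = (inputFile: string) => string[];

// Runs all cases and returns a human-readable message for each failure.
function runCases(cases: TestCase[], validate: ValidateFn): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = validate(c.inputFile);
    const missing = c.expectedIssues.filter((issue) => !actual.includes(issue));
    if (missing.length > 0) {
      failures.push(
        `${c.inputFile} (${c.description}): missing ${missing.join(", ")}`
      );
    }
  }
  return failures;
}
```

A spec could then report the `description` of each failing case directly, which is where the self-documenting aspect would pay off.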

If somebody has preferences, thoughts, or alternative approaches, feel free to discuss them here.

javagl commented Nov 9, 2024

This question came up more prominently in #319

There are a few points to consider.


First of all, the number of spec files is huge. This was not much of a problem for the 3D Tiles validation, but the NGA_gpm schema has a different structure: it is nested very deeply. Note that in 3D Tiles, the JSON can also be arbitrarily deep - but that's not the point: all the validation still happens on properties of the tileset and its tiles, and their properties. In NGA_gpm, there literally is a property
root.extensions["NGA_gpm"].masterRecord.collectionRecordList[0].sensorRecords[0].collectionUnitRecords[0].lsrAxisUnitVectors[0]
which is supposed to be a 3D unit vector. In order to have a complete (valid) tileset where only this property is wrong, there have to be files where ...

  • this unit vector has only 2 components
  • this unit vector has 4 components
  • this unit vector contains a "NOT_A_NUMBER" entry
  • this unit vector contains a value like -2.0
  • ...

and each of these cases implies a ~5-6KB tileset file.

It would/could make more sense to have each of these cases in one file that contains only that unit vector, and directly call the NgaGpmValidator.validateUnitVector function on the objects that are read from these files. The difficulty there would be to ensure proper coverage: Is this validation function really called for the root.extensions["NGA_gpm"].masterRecord.collectionRecordList[0].sensorRecords[0].collectionUnitRecords[0].lsrAxisUnitVectors[0], or is it only called for other unit vectors?

Alternatively, the specs could read a valid tileset, and then "destroy" it for the test, as in

const tileset = read(...);
tileset.extensions["NGA_gpm"].masterRecord.collectionRecordList[0].sensorRecords[0].collectionUnitRecords[0].lsrAxisUnitVectors[0] = "NOT_A_NUMBER!";
expect(validationOf(tileset)).toCause("TYPE_MISMATCH"); // Not a number!

One could argue that it can come in handy to have an actual file that contains a specific issue, but when in doubt, one could just add something like
writeFile(tileset, "brokenTileset.json");
to dump out the "broken" tileset on demand.
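A runnable sketch of this "read a valid structure, then break one property" pattern. Here, `findTypeMismatch` is a hypothetical stand-in for the real validator that only inspects the single nested unit vector discussed above, and the issue code is made up for illustration:

```typescript
// Hypothetical stand-in for the validator: checks that every element of
// the single nested unit vector is a number.
function findTypeMismatch(tileset: any): string[] {
  const vector =
    tileset.extensions["NGA_gpm"].masterRecord.collectionRecordList[0]
      .sensorRecords[0].collectionUnitRecords[0].lsrAxisUnitVectors[0];
  const allNumbers =
    Array.isArray(vector) &&
    vector.every((e: unknown) => typeof e === "number");
  return allNumbers ? [] : ["TYPE_MISMATCH"];
}

// A minimal valid structure (in a real spec, this would be read from a
// known-valid tileset file) ...
const tileset: any = {
  extensions: {
    NGA_gpm: {
      masterRecord: {
        collectionRecordList: [
          {
            sensorRecords: [
              { collectionUnitRecords: [{ lsrAxisUnitVectors: [[1, 0, 0]] }] },
            ],
          },
        ],
      },
    },
  },
};

// ... that is then "destroyed" for the test:
tileset.extensions["NGA_gpm"].masterRecord.collectionRecordList[0]
  .sensorRecords[0].collectionUnitRecords[0].lsrAxisUnitVectors[0] =
  [1, "NOT_A_NUMBER!", 0];
```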


Another point is how to generate good coverage in the first place (regardless of whether the specs are stored in files or not). There are some tools out there that can generate test data. But they often only generate valid test data (which is the opposite of what we need), they don't offer fine-grained control over what is broken in the files, or they are outdated/unmaintained. (For example, https://github.com/bojand/json-schema-test-data-generator - but at least, this looks like it could be a good starting point...)

Some degrees of freedom are obvious: When a number has a minimum/maximum of 0.1/0.9, respectively, then there should be test cases for 0.099 and 0.901, for example. For strings and arrays, the tests should also include cases where the value is "slightly off".
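A minimal sketch of deriving such "slightly off" values from the schema bounds, assuming inclusive minimum/maximum and an arbitrarily chosen epsilon:

```typescript
// Generate boundary test values for a number with inclusive
// minimum/maximum bounds. The epsilon is an arbitrary choice.
function boundaryValues(
  minimum: number,
  maximum: number,
  eps = 0.001
): number[] {
  return [
    minimum - eps, // just below the lower bound: should be rejected
    minimum, // exactly on the lower bound: should be accepted
    maximum, // exactly on the upper bound: should be accepted
    maximum + eps, // just above the upper bound: should be rejected
  ];
}
```

For bounds of 0.1 and 0.9, this yields (up to floating-point noise) 0.099, 0.1, 0.9, and 0.901.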

But some aspects make this automated generation difficult. For example, the GPM schema defines several structures like

      "oneOf": [
        {
          "properties": {
            "interpolationMode": {
              "const": "nearestNeighbor"
            }
          },
          "required": [
            "interpolationMode"
          ],
          "not": {
            "required": [
              "interpNumPosts",
              "dampeningParam"
            ]
          }
        },
        {
          "properties": {
            "interpolationMode": {
              "const": "IDW"
            }
          },
          "required": [
            "interpolationMode",
            "interpNumPosts",
            "dampeningParam"
          ]
        }
      ]

This means that the whole object structure depends on the value of a certain (enum-valued) property. I noticed that the "type constraints" of JSON schema (oneOf/anyOf/not) are challenging for many libraries that otherwise handle the schemas pretty well.
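Checking such a mode-dependent constraint directly in validator code could look roughly like this; the function name and the issue codes are made up for illustration:

```typescript
// Sketch of checking the interpolationMode-dependent constraints from
// the schema snippet above.
function checkInterpolation(obj: Record<string, unknown>): string[] {
  const issues: string[] = [];
  const mode = obj["interpolationMode"];
  if (mode === "nearestNeighbor") {
    // The "not"/"required" combination: these properties must be absent
    if ("interpNumPosts" in obj || "dampeningParam" in obj) {
      issues.push("DISALLOWED_PROPERTY");
    }
  } else if (mode === "IDW") {
    // Here, the same properties are required
    if (!("interpNumPosts" in obj) || !("dampeningParam" in obj)) {
      issues.push("MISSING_REQUIRED_PROPERTY");
    }
  } else {
    issues.push("VALUE_NOT_IN_LIST");
  }
  return issues;
}
```

A generic schema-based generator would have to "understand" this oneOf structure to produce exactly these broken combinations, which is where many libraries fall short.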

On top of that, there are some constraints that cannot be derived from the schema itself. One is the aforementioned unit vector: it should have unit length, but that's not part of the schema.
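Such a constraint has to live in validator code rather than in the schema; a minimal sketch, with an assumed tolerance:

```typescript
// Check whether a vector has unit length, up to an assumed tolerance.
// This cannot be expressed in JSON schema.
function isUnitVector(v: number[], epsilon = 1e-6): boolean {
  const length = Math.sqrt(v.reduce((sum, e) => sum + e * e, 0));
  return Math.abs(length - 1.0) <= epsilon;
}
```

And of course, a tool that generates test data purely from the schema could never know that it should emit a case like [0.5, 0.5, 0.5] here.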


I might try out some further approaches here. I'd probably do that in my spare time, and therefore based on https://github.com/javagl/JsonModelGen or (more likely) https://github.com/networknt/json-schema-validator . What's frustrating here is that if we had such a tool, the many hours of work that went into the specs until now could be accomplished in a few minutes. But I'm too aware of https://xkcd.com/1319/ to try and casually develop something like this...
