Revamp how STIX content customization is detected and enforced #503

chisholm · 2021-04-09T16:27:58Z

This PR shows work done to improve the way customization is handled. One of the main changes is that STIX objects no longer maintain an internal _allow_custom flag. In order to enforce that there is no customization (when allow_custom=False), one needs to know if customization is present. Just because customizations were deemed okay at construction time (allow_custom=True) does not mean they actually exist. So instead, objects are given a has_custom property, which directly addresses the question and makes customization detection in sub-objects possible.

The responsibility for both enforcing customization preferences and detecting the presence of customizations lies with the property clean() methods. Before, those methods could not easily do their enforcement job because they were not told what the allow_custom preference was. There was some very ugly hackage in places where that became a problem, which is now removed. In this PR, the clean() methods are now passed the allow_custom preference. They also now return not only the cleaned value, but also a boolean flag indicating whether customization was detected (which of course needs to be False if customization was not allowed!).

Enforcement has been tightened up in certain places. This will catch customization in places where it didn't before, and in fact revealed some unit test bugs which have been fixed. For example, open-vocabs are now properly enforced. There is a new OpenVocabProperty class designed specifically for this, and STIX object classes have been changed to use it. All of the enum and vocab values from the STIX spec have been added to special vocab modules for both 2.0 and 2.1; they are the enforcement criteria now. The difference between an enum and open vocab is that no customization is ever allowed with enums, as that would result in non-compliant content; open vocabs admit values not listed in the spec, which are considered a customization and will be enforced accordingly. Another example is that a customization in an extension will in all cases be caught and cause an error if allow_custom=False in the parent object. The state of "customization" ought to always propagate upward. An object has been customized if custom content occurs anywhere in the object.

Code dealing with hashes has been moved to a new module to remove clutter from the properties module. Hash types are now properly enforced for each spec version, which are different in 2.0 and 2.1. It supports some flexibility in how hash algorithms are named in user content, while ensuring that serialized output uses the spec-defined spelling/casing (which also differs between STIX 2.0 and 2.1).

Public API has mostly not changed, but user code which uses non-public APIs might break (e.g. if it called a clean() method). Some property constructor args changed (e.g. allow_custom was removed since it doesn't make sense there). Customization enforcement behavior has changed, so code/content which worked before might now trigger errors about custom content.

This PR includes changes which would address issue #501 . The way in which ReferenceProperty does customization enforcement is a bit more sophisticated. When a whitelist constraint on generic STIX type categories is given for this property and allow_custom=True during cleaning, the whitelist is automatically converted to a blacklist on the complementary categories. This effectively weakens the criteria with respect to unregistered custom types: it ensures that types known to be incorrect are not given, as opposed to ensuring that types known to be correct are given. An unregistered custom type is not known to be incorrect (the library knows nothing about it if it has not been registered), so it will satisfy the weakened criteria and be allowed.

Fixes #333.
Fixes #433.
Fixes #498.
Fixes #501.

thought it should, but forgot to fix Report to use ReferenceProperty in the way I thought it should! Oops. Added some tests to ensure Report is working property with custom ID types in object_refs.

work when a whitelist of generic category types is used. Disallow hybrid constraints (both generic and specific at the same time). Add more unit tests.

to satisfy type constraint: empty whitelist. It would be silly for anyone to do that, but I should check just in case I guess.

assert prop.clean(...) doesn't test well enough since clean() methods on this branch produce 2-tuples, and you should test what's in the tuple, not just that it returned something non-empty. So I fixed it in several places to test the tuple contents.

constraints (i.e. both generic type categories and specific types). Also: - more expansion/refinement of reference property unit tests - bugfix: SROs are in OBJ_MAP too, it's not just SDOs! Oops... - pre-commit stylistic fixes

slipped through due to a rather complex rebase... not too surprising I missed it.

a new OpenVocabProperty class. It also requires a redesign of HashesProperty and redoes general library support for hash algorithms.

to check key lengths. Added some unit tests for hash keys. Also added a library hash support test module I'd forgotten to add before.

various STIX objects. Also pre-commit stylistic fixes...

glitch? Something else I overlooked? Not sure.

available after the rebase. This simplifies the implementation. Also made utils.to_enum() a module public function, since I needed to use it outside that module. Misc pre-commit stylistic fixes.

objects, to structure object_refs type requirements as an empty blacklist, instead of a whitelist. I think it was originally necessary due to the older implementation of ReferenceProperty which was in place at the time. With the implementation change to invert whitelists with generics to blacklists when allow_custom=True, a "full" whitelist is internally converted to an empty blacklist anyway, so it winds up being the same thing. But I think the full whitelist looks better in the code, so I prefer that to the empty blacklist.

the git action uses a more recent version of pre-commit than travis did, and the newer pre-commit requires different things than the old one did.

stix2/v20/vocab.py

stix2/base.py

clenk · 2021-04-14T15:06:34Z

As we discussed elsewhere, let's change this PR to allow custom open vocab values again for now. In a future version we'll work out more granular customization settings to support checking them.

properties are now always accepted and not considered customizations (has_custom will be False). Just commented out the enforcement code and xfail'd the unit tests, which makes it easy to reinstate these behaviors later. It was decided this is too likely to break user code, and we don't currently have a way to disallow customizations in other places while still allowing custom open vocab values. Safest to always allow custom open vocab values, for now.

emmanvg · 2021-04-15T14:15:03Z

The other general comment was to inspect the documentation/ipynb guides to check for any content that needs updating based on these changes.

adulau · 2021-04-15T14:24:58Z

Customization enforcement behavior has changed, so code/content which worked before might now trigger errors about custom content. What's the background for such change in the reference implementation? Can you point me to the standard where this is a required change?

clenk · 2021-04-15T16:31:18Z

@adulau the reason for these changes is not anything in the standard. There have been inconsistencies with this library's handling of custom content that has led to several issues, such as #501, #498, #433, #333. That's what we're trying to address here. But as part of that, some things have changed, such as the parameters and return values of the clean() method. Some content that gave custom content errors before in a nested object or reference properties are now allowed.

chisholm · 2021-04-15T19:27:43Z

Having an allow_custom config setting was a software design decision, not a requirement from the specification. It was thought to be desirable to allow users to cause the library to produce errors when custom content is encountered, when they want to prevent it. It can also be useful for catching typos and mistakes. That design decision is much older than this PR, and has been in the library since the beginning as far as I know. This PR is about improving how that system works.

I mentioned in the PR summary that it could cause new errors, which sounds alarming; maybe I should also have been clearer that it can resolve errors too :) The last paragraph describes one such case.

codecov-commenter · 2021-06-11T18:21:07Z

Codecov Report

Merging #503 (5cce864) into master (2743b90) will decrease coverage by 2.34%.
The diff coverage is 58.04%.

@@            Coverage Diff             @@
##           master     #503      +/-   ##
==========================================
- Coverage   89.47%   87.12%   -2.35%     
==========================================
  Files         147      151       +4     
  Lines       16599    17702    +1103     
==========================================
+ Hits        14852    15423     +571     
- Misses       1747     2279     +532

Impacted Files	Coverage Δ
stix2/test/test_workbench.py	`100.00% <ø> (ø)`
stix2/test/v20/conftest.py	`100.00% <ø> (ø)`
stix2/test/v20/test_datastore_filters.py	`100.00% <ø> (ø)`
stix2/test/v20/test_datastore_memory.py	`100.00% <ø> (ø)`
stix2/test/v20/test_pickle.py	`100.00% <ø> (ø)`
stix2/test/v20/test_versioning.py	`100.00% <ø> (ø)`
stix2/test/v21/conftest.py	`100.00% <ø> (ø)`
stix2/test/v21/test_datastore_filters.py	`100.00% <ø> (ø)`
stix2/test/v21/test_datastore_memory.py	`100.00% <ø> (ø)`
stix2/test/v21/test_indicator.py	`100.00% <ø> (ø)`
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2743b90...5cce864. Read the comment docs.

chisholm added 16 commits March 31, 2021 15:21

Initial revamp of customization enforcement and detection.

d2f960f

Fix Report SDO: I'd fixed ReferenceProperty to work the way I

387ce7e

thought it should, but forgot to fix Report to use ReferenceProperty in the way I thought it should! Oops. Added some tests to ensure Report is working property with custom ID types in object_refs.

Further ReferenceProperty refinements: make allow_custom=True

c7dd58e

work when a whitelist of generic category types is used. Disallow hybrid constraints (both generic and specific at the same time). Add more unit tests.

Add a check in ReferenceProperty constructor for an impossible

03c265c

to satisfy type constraint: empty whitelist. It would be silly for anyone to do that, but I should check just in case I guess.

Add support in ReferenceProperty for hybrid object type

414694d

constraints (i.e. both generic type categories and specific types). Also: - more expansion/refinement of reference property unit tests - bugfix: SROs are in OBJ_MAP too, it's not just SDOs! Oops... - pre-commit stylistic fixes

Pre-commit changes. Also remove a dupe test function which

62ee2b2

slipped through due to a rather complex rebase... not too surprising I missed it.

Add proper customization enforcement for open vocabs. This adds

c590de8

a new OpenVocabProperty class. It also requires a redesign of HashesProperty and redoes general library support for hash algorithms.

Add back spec_version kwarg to HashesProperty, since it was used

a8a6559

to check key lengths. Added some unit tests for hash keys. Also added a library hash support test module I'd forgotten to add before.

Add back usage of spec_version where HashesProperty is used for

c8c4e89

various STIX objects. Also pre-commit stylistic fixes...

Make a minor post-rebase fix

6f60bed

Fixed some more silliness in properties.py... was it a rebase

c8b9eab

glitch? Something else I overlooked? Not sure.

Clean up ReferenceProperty.clean(), to use new APIs which are

0ddf0d2

available after the rebase. This simplifies the implementation. Also made utils.to_enum() a module public function, since I needed to use it outside that module. Misc pre-commit stylistic fixes.

Pre-commit stylistic fix

82148aa

Another pre-commit stylistic fix. Necessary since apparently

3312890

the git action uses a more recent version of pre-commit than travis did, and the newer pre-commit requires different things than the old one did.

emmanvg reviewed Apr 14, 2021

View reviewed changes

stix2/v20/vocab.py Outdated Show resolved Hide resolved

emmanvg reviewed Apr 14, 2021

View reviewed changes

stix2/base.py Show resolved Hide resolved

chisholm added 2 commits April 14, 2021 12:48

pre-commit stylistic fix

4dc8e49

emmanvg added this to the 3.0.0 milestone Apr 15, 2021

emmanvg added the enhancement label Apr 15, 2021

chisholm and others added 3 commits April 15, 2021 22:06

Add variables for each individual enum value in the vocab modules.

2b06487

Improve error message for invalid reference types

7209346

Test invalid_types ReferenceProperty with custom

5cce864

clenk merged commit 646aaa3 into oasis-open:master Jun 11, 2021

This was referenced Jun 11, 2021

Note SDOs do not work on custom STIX types #501

Closed

A Custom Object in object_refs of a Report Object. #333

Closed

allow_custom is not inherited to common properties #498

Closed

chisholm deleted the custom_revamp branch June 12, 2021 01:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamp how STIX content customization is detected and enforced #503

Revamp how STIX content customization is detected and enforced #503

chisholm commented Apr 9, 2021 •

edited by clenk

Loading

clenk commented Apr 14, 2021

emmanvg commented Apr 15, 2021 •

edited

Loading

adulau commented Apr 15, 2021

clenk commented Apr 15, 2021

chisholm commented Apr 15, 2021

codecov-commenter commented Jun 11, 2021 •

edited

Loading

Revamp how STIX content customization is detected and enforced #503

Revamp how STIX content customization is detected and enforced #503

Conversation

chisholm commented Apr 9, 2021 • edited by clenk Loading

clenk commented Apr 14, 2021

emmanvg commented Apr 15, 2021 • edited Loading

adulau commented Apr 15, 2021

clenk commented Apr 15, 2021

chisholm commented Apr 15, 2021

codecov-commenter commented Jun 11, 2021 • edited Loading

Codecov Report

chisholm commented Apr 9, 2021 •

edited by clenk

Loading

emmanvg commented Apr 15, 2021 •

edited

Loading

codecov-commenter commented Jun 11, 2021 •

edited

Loading