Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp how STIX content customization is detected and enforced #503

Merged
merged 21 commits into from
Jun 11, 2021

Conversation

chisholm
Copy link
Contributor

@chisholm chisholm commented Apr 9, 2021

This PR shows work done to improve the way customization is handled. One of the main changes is that STIX objects no longer maintain an internal _allow_custom flag. In order to enforce that there is no customization (when allow_custom=False), one needs to know if customization is present. Just because customizations were deemed okay at construction time (allow_custom=True) does not mean they actually exist. So instead, objects are given a has_custom property, which directly addresses the question and makes customization detection in sub-objects possible.

The responsibility for both enforcing customization preferences and detecting the presence of customizations lies with the property clean() methods. Before, those methods could not easily do their enforcement job because they were not told what the allow_custom preference was. There was some very ugly hackage in places where that became a problem, which is now removed. In this PR, the clean() methods are now passed the allow_custom preference. They also now return not only the cleaned value, but also a boolean flag indicating whether customization was detected (which of course needs to be False if customization was not allowed!).

Enforcement has been tightened up in certain places. This will catch customization in places where it didn't before, and in fact revealed some unit test bugs which have been fixed. For example, open-vocabs are now properly enforced. There is a new OpenVocabProperty class designed specifically for this, and STIX object classes have been changed to use it. All of the enum and vocab values from the STIX spec have been added to special vocab modules for both 2.0 and 2.1; they are the enforcement criteria now. The difference between an enum and open vocab is that no customization is ever allowed with enums, as that would result in non-compliant content; open vocabs admit values not listed in the spec, which are considered a customization and will be enforced accordingly. Another example is that a customization in an extension will in all cases be caught and cause an error if allow_custom=False in the parent object. The state of "customization" ought to always propagate upward. An object has been customized if custom content occurs anywhere in the object.

Code dealing with hashes has been moved to a new module to remove clutter from the properties module. Hash types are now properly enforced for each spec version, which are different in 2.0 and 2.1. It supports some flexibility in how hash algorithms are named in user content, while ensuring that serialized output uses the spec-defined spelling/casing (which also differs between STIX 2.0 and 2.1).

Public API has mostly not changed, but user code which uses non-public APIs might break (e.g. if it called a clean() method). Some property constructor args changed (e.g. allow_custom was removed since it doesn't make sense there). Customization enforcement behavior has changed, so code/content which worked before might now trigger errors about custom content.

This PR includes changes which would address issue #501 . The way in which ReferenceProperty does customization enforcement is a bit more sophisticated. When a whitelist constraint on generic STIX type categories is given for this property and allow_custom=True during cleaning, the whitelist is automatically converted to a blacklist on the complementary categories. This effectively weakens the criteria with respect to unregistered custom types: it ensures that types known to be incorrect are not given, as opposed to ensuring that types known to be correct are given. An unregistered custom type is not known to be incorrect (the library knows nothing about it if it has not been registered), so it will satisfy the weakened criteria and be allowed.

Fixes #333.
Fixes #433.
Fixes #498.
Fixes #501.

chisholm added 16 commits March 31, 2021 15:21
thought it should, but forgot to fix Report to use
ReferenceProperty in the way I thought it should!  Oops.
Added some tests to ensure Report is working property with
custom ID types in object_refs.
work when a whitelist of generic category types is used.
Disallow hybrid constraints (both generic and specific at the
same time).  Add more unit tests.
to satisfy type constraint: empty whitelist.  It would be silly
for anyone to do that, but I should check just in case I guess.
    assert prop.clean(...)

doesn't test well enough since clean() methods on this branch
produce 2-tuples, and you should test what's in the tuple, not
just that it returned something non-empty.  So I fixed it in
several places to test the tuple contents.
constraints (i.e. both generic type categories and specific
types).  Also:
- more expansion/refinement of reference property unit tests
- bugfix: SROs are in OBJ_MAP too, it's not just SDOs!  Oops...
- pre-commit stylistic fixes
slipped through due to a rather complex rebase... not too
surprising I missed it.
a new OpenVocabProperty class.  It also requires a redesign of
HashesProperty and redoes general library support for hash
algorithms.
to check key lengths.  Added some unit tests for hash keys.
Also added a library hash support test module I'd forgotten to
add before.
various STIX objects.  Also pre-commit stylistic fixes...
glitch?  Something else I overlooked?  Not sure.
available after the rebase.  This simplifies the implementation.
Also made utils.to_enum() a module public function, since I
needed to use it outside that module.  Misc pre-commit stylistic
fixes.
objects, to structure object_refs type requirements as an empty
blacklist, instead of a whitelist.  I think it was originally
necessary due to the older implementation of ReferenceProperty
which was in place at the time.  With the implementation change
to invert whitelists with generics to blacklists when
allow_custom=True, a "full" whitelist is internally converted to
an empty blacklist anyway, so it winds up being the same thing.
But I think the full whitelist looks better in the code, so I
prefer that to the empty blacklist.
the git action uses a more recent version of pre-commit than
travis did, and the newer pre-commit requires different things
than the old one did.
stix2/v20/vocab.py Outdated Show resolved Hide resolved
stix2/base.py Show resolved Hide resolved
@clenk
Copy link
Contributor

clenk commented Apr 14, 2021

As we discussed elsewhere, let's change this PR to allow custom open vocab values again for now. In a future version we'll work out more granular customization settings to support checking them.

properties are now always accepted and not considered
customizations (has_custom will be False).  Just commented out
the enforcement code and xfail'd the unit tests, which makes it
easy to reinstate these behaviors later.  It was decided this
is too likely to break user code, and we don't currently have
a way to disallow customizations in other places while still
allowing custom open vocab values.  Safest to always allow
custom open vocab values, for now.
@emmanvg
Copy link
Contributor

emmanvg commented Apr 15, 2021

The other general comment was to inspect the documentation/ipynb guides to check for any content that needs updating based on these changes.

@adulau
Copy link
Contributor

adulau commented Apr 15, 2021

Customization enforcement behavior has changed, so code/content which worked before might now trigger errors about custom content. What's the background for such change in the reference implementation? Can you point me to the standard where this is a required change?

@emmanvg emmanvg added this to the 3.0.0 milestone Apr 15, 2021
@clenk
Copy link
Contributor

clenk commented Apr 15, 2021

@adulau the reason for these changes is not anything in the standard. There have been inconsistencies with this library's handling of custom content that has led to several issues, such as #501, #498, #433, #333. That's what we're trying to address here. But as part of that, some things have changed, such as the parameters and return values of the clean() method. Some content that gave custom content errors before in a nested object or reference properties are now allowed.

@chisholm
Copy link
Contributor Author

Having an allow_custom config setting was a software design decision, not a requirement from the specification. It was thought to be desirable to allow users to cause the library to produce errors when custom content is encountered, when they want to prevent it. It can also be useful for catching typos and mistakes. That design decision is much older than this PR, and has been in the library since the beginning as far as I know. This PR is about improving how that system works.

I mentioned in the PR summary that it could cause new errors, which sounds alarming; maybe I should also have been clearer that it can resolve errors too :) The last paragraph describes one such case.

@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #503 (5cce864) into master (2743b90) will decrease coverage by 2.34%.
The diff coverage is 58.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #503      +/-   ##
==========================================
- Coverage   89.47%   87.12%   -2.35%     
==========================================
  Files         147      151       +4     
  Lines       16599    17702    +1103     
==========================================
+ Hits        14852    15423     +571     
- Misses       1747     2279     +532     
Impacted Files Coverage Δ
stix2/test/test_workbench.py 100.00% <ø> (ø)
stix2/test/v20/conftest.py 100.00% <ø> (ø)
stix2/test/v20/test_datastore_filters.py 100.00% <ø> (ø)
stix2/test/v20/test_datastore_memory.py 100.00% <ø> (ø)
stix2/test/v20/test_pickle.py 100.00% <ø> (ø)
stix2/test/v20/test_versioning.py 100.00% <ø> (ø)
stix2/test/v21/conftest.py 100.00% <ø> (ø)
stix2/test/v21/test_datastore_filters.py 100.00% <ø> (ø)
stix2/test/v21/test_datastore_memory.py 100.00% <ø> (ø)
stix2/test/v21/test_indicator.py 100.00% <ø> (ø)
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2743b90...5cce864. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants