-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YAML loading and dumping improvements #62932
Conversation
This PR is marked as draft until I complete the following tasks:
|
3943125
to
c0bc1a7
Compare
I believe this is ready to be reviewed now. |
9951e47
to
de0e1cc
Compare
assert isinstance(dataset["spam"], collections.OrderedDict) | ||
assert isinstance(dataset["spam"]["foo"], collections.OrderedDict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to trigger a full test run on this PR, simply to find out if there are any other tests that may fail due to these changes.
re-run full all |
FWIW I'm on board with this idea! I don't yet know if I'm on board with the implementation - as @MKLeb implied with the full test run, this is a pretty intensive change with (potentially) broad repercussions... hopefully all of them good! But I'm sure someone out there is relying on some backwards funky behavior 🙃 I would love if we could bridge the existing behavior and this with like... a feature flag and deprecation warning, like...
That way folks could opt-in to the new behavior, and if they have a bunch of problems then they can file some bugs and still have a better fallback than just downgrading their salt installation. In any case, I'm sure a lot of us would feel better with an in-depth code review! |
I can do that. One thing: There is already an option for legacy YAML behavior: I'm tempted to make the option a version selector rather than a boolean so that the YAML behavior can be changed again in the future without much fanfare. Maybe something like from salt.version import SaltStackVersion
want_v = __opts__.get("override_yaml_parser_version")
if want_v:
want_v = SaltStackVersion.parse(want_v)
else:
want_v = SaltStackVersion.current_release()
if want_v < SaltStackVersion(3006):
do_old_stuff()
else:
do_this_new_stuff() Note that the new behavior would be the default behavior, so if it breaks a user they would have to take action to switch to the old behavior. Thoughts? |
If we think this change (or a future change) is too risky to make the default right away, we could predate the version and add a warning: if want_v < SaltStackVersion(3007):
if not __opts__.get("override_yaml_parser_version"):
salt.utils.versions.warn_until(3007, "Salt's YAML parsing behavior will change in v3007.0; see xyz for details")
do_old_stuff()
else:
do_this_new_stuff() |
re-run full all |
I set this to draft until all of the tasks in the PR description are done. |
@waynew I'm trying to figure out how to coax the loader to set |
@MKLeb Thanks, though it looks like |
@s0undt3ch Any ideas here? |
Not sure if that's helpful, but you can always manually load Lines 291 to 303 in 327664a
import salt.config
import salt.syspaths
from pathlib import Path
__opts__ = None
def my_func():
global __opts__
if __opts__ is None:
__opts__ = salt.config.minion_config(Path(salt.syspaths.CONFIG_DIR) / "minion")
# in case you need grains in __opts__, see the linked example
# ... do stuff |
This does not change the behavior, but it does simplify the code.
The `salt.loader` module is intentially imported late in `salt.config.call_id_function()` to avoid a circular import error. However, `salt.loader` is also used in `salt.config.mminion_config()`, so import it there as well. This avoids an error if `mminion_config()` is called before `call_id_function()`, or if the import statement in `call_id_function()` is removed in the future.
The same YAML value can be represented in different ways, so tests that compare a generated YAML string with a manually typed string are fragile. Use `salt.utils.yaml` to generate the reference YAML so that the implementation of the YAML dumper can change without breaking the tests.
The delayed `import` statements match `import` statements at the top of the file so they are effectively no-ops.
This commit is a prerequisite to adding version-selected behavior changes.
This will make it possible for users to switch to an older YAML loader/dumper behavior in case a change breaks something. Note that several top-of-file imports were moved into functions to avoid circular import errors.
I believe this was the original intention. Even if it was not, the symmetry with `salt.utils.yaml.save_dump()` makes `dump()` less surprising now, and it makes it easier to introduce representer changes that affect both `dump()` and `safe_dump()`.
Before, the indentation would only be increased if the compiled module `yaml.CSafeDumper` did not exist. Now it is indented even if it does. Behavior before (assuming `yaml.CSafeDumper` exists): ```pycon >>> import salt.utils.yaml as y >>> print(y.dump({"foo": ["bar"]}, Dumper=y.IndentedSafeOrderedDumper, default_flow_style=False)) foo: - bar ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> print(y.dump({"foo": ["bar"]}, Dumper=y.IndentedSafeOrderedDumper, default_flow_style=False)) foo: - bar ```
The `SaltYamlSafeLoader.add_constructor()` method is a class method, not an instance method. Therefore, any call to that method in an instance method (such as `__init__`) affects all future instances, not just the current instance (`self`). That's not a problem if the registration calls simply re-register the same constructor functions over and over, but that was not the case here: different functions were registered depending on the value of the `dictclass` parameter to `__init__`. The net effect was that an `!!omap` node was correctly processed as a sequence of single-entry maps (and a list of (key, value) tuples returned) until the first time a SaltYamlSafeLoader was constructed with a non-`dict` class. After that, every `!!omap` node was always incorrectly processed as a map node regardless of `dictclass`. Now `!!omap` processing is performed as originally intended: * If the `dictclass` parameter is `dict` (the default), the behavior when loading a `!!map` or `!!omap` node is unchanged from the base class's behavior. * If the `dictclass` parameter is not `dict`: * `!!map` nodes are loaded like they are in the base class except the custom class is used instead of `dict`. * `!!omap` nodes are loaded the same as `!!map` nodes. (This is a bug because an `!!omap` node is a sequence node of single-valued map nodes, not an ordinary map node. A future commit will fix this.) Behavior before: ```pycon >>> import salt.utils.yaml as y >>> import collections >>> y.load("!!omap [{foo: bar}, {baz: bif}]") [('foo', 'bar'), ('baz', 'bif')] >>> y.SaltYamlSafeLoader("", dictclass=collections.OrderedDict) # created for side-effect only <salt.utils.yamlloader.SaltYamlSafeLoader object at 0xc31a10> >>> y.load("!!omap [{foo: bar}, {baz: bif}]") # exact same as before Traceback (most recent call last): File "<stdin>", line 1, in <module> File "salt/utils/yamlloader.py", line 159, in load return yaml.load(stream, Loader=Loader) File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load return loader.get_single_data() File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data return self.construct_document(node) File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 60, in construct_document for dummy in generator: File "salt/utils/yamlloader.py", line 45, in construct_yaml_map value = self.construct_mapping(node) File "salt/utils/yamlloader.py", line 56, in construct_mapping raise ConstructorError( yaml.constructor.ConstructorError: expected a mapping node, but found sequence in "<unicode string>", line 1, column 1 ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> import collections >>> y.load("!!omap [{foo: bar}, {baz: bif}]") [('foo', 'bar'), ('baz', 'bif')] >>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data return self.construct_document(node) File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 60, in construct_document for dummy in generator: File "salt/utils/yamlloader.py", line 42, in construct_yaml_omap return (yield from self.construct_yaml_map(node)) File "salt/utils/yamlloader.py", line 36, in construct_yaml_map value = self.construct_mapping(node) File "salt/utils/yamlloader.py", line 52, in construct_mapping raise ConstructorError( yaml.constructor.ConstructorError: expected a mapping node, but found sequence in "<unicode string>", line 1, column 1 >>> y.load("!!omap [{foo: bar}, {baz: bif}]") [('foo', 'bar'), ('baz', 'bif')] ``` This commit also adds a unit test for the `dictclass` parameter, though it's important to note that the new test is not a regression test for the bug fixed by this commit. (I don't think it would be worthwhile to write a regression test because the test code would be complicated and unreadable.)
YAML nodes tagged with `!!timestamp` nodes now become Python `datetime` objects rather than strings. To preserve compatibility with existing YAML files, timestamp-like strings without `!!timestamp` are still loaded as strings. Behavior before: ```pycon >>> import datetime >>> import salt.utils.yaml as y >>> y.load("'2022-10-21T18:16:03.1-04:00'") '2022-10-21T18:16:03.1-04:00' >>> y.load("!!timestamp '2022-10-21T18:16:03.1-04:00'") '2022-10-21T18:16:03.1-04:00' ``` Behavior after: ```pycon >>> import datetime >>> import salt.utils.yaml as y >>> y.load("'2022-10-21T18:16:03.1-04:00'") '2022-10-21T18:16:03.1-04:00' >>> y.load("!!timestamp '2022-10-21T18:16:03.1-04:00'") datetime.datetime(2022, 10, 21, 18, 16, 3, 100000, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))) ```
Before, `!!omap` nodes were incorrectly assumed to be mapping nodes when `dictclass` was not `dict`. Now they are correctly processed as sequences of single-entry mappings. The resulting Python object has type `dictclass` (usually `collections.OrderedDict` or a subclass). This commit does not change the behavior when `dictclass` is `dict`; loading an `!!omap` node still returns a list of (key, value) tuples (PyYAML's default behavior). I consider that to be a bug in PyYAML, so a future commit may change the behavior in the `dict` case to match the non-`dict` behavior. (This commit uses `dictclass` for the non-`dict` case to match what appears to be the original intention.) Behavior before: ```pycon >>> import salt.utils.yaml as y >>> import collections >>> y.load("!!omap [{foo: bar}, {baz: bif}]") [('foo', 'bar'), ('baz', 'bif')] >>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data return self.construct_document(node) File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 60, in construct_document for dummy in generator: File "salt/utils/yamlloader.py", line 42, in construct_yaml_omap return (yield from self.construct_yaml_map(node)) File "salt/utils/yamlloader.py", line 36, in construct_yaml_map value = self.construct_mapping(node) File "salt/utils/yamlloader.py", line 52, in construct_mapping raise ConstructorError( yaml.constructor.ConstructorError: expected a mapping node, but found sequence in "<unicode string>", line 1, column 1 ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> import collections >>> y.load("!!omap [{foo: bar}, {baz: bif}]") [('foo', 'bar'), ('baz', 'bif')] >>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data() OrderedDict([('foo', 'bar'), ('baz', 'bif')]) ``` Relevant bug: saltstack#12161
Now the behavior is consistent regardless of the value of the Loader's `dictclass` constructor parameter. Starting with Python 3.6, Python `dict` objects always iterate in insertion order, so iteration order was already guaranteed when using an ordinary YAML map. However, `!!omap` provides a stronger guarantee to users (plain map iteration order can be thought of as a Salt implementation detail that might change in the future), and it allows them to make their `.sls` files self-documenting. For example, instead of: ```yaml my_pillar_data: key1: val1 key2: val2 ``` users can now do: ```yaml my_pillar_data: !!omap - key1: val1 - key2: val2 ``` to make it clear to readers that the entries are intended to be processed in order. Behavior before: ```pycon >>> import salt.utils.yaml as y >>> import collections >>> y.load("!!omap [{foo: bar}, {baz: bif}]") [('foo', 'bar'), ('baz', 'bif')] >>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data() OrderedDict([('foo', 'bar'), ('baz', 'bif')]) ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> import collections >>> y.load("!!omap [{foo: bar}, {baz: bif}]") OrderedDict([('foo', 'bar'), ('baz', 'bif')]) >>> y.SaltYamlSafeLoader("!!omap [{foo: bar}, {baz: bif}]", dictclass=collections.OrderedDict).get_single_data() OrderedDict([('foo', 'bar'), ('baz', 'bif')]) ``` Relevant bug: saltstack#12161
My main motivation for adding this is to facilitate testing, though it also gives module authors greater flexibility (in particular: a `tuple` can be a `dict` key, but a `list` cannot because it is not hashable), and I don't see a strong reason why this shouldn't be added. Behavior before: ```pycon >>> import salt.utils.yaml as y >>> y.load("!!python/tuple [foo, bar]") Traceback (most recent call last): File "<stdin>", line 1, in <module> File "salt/utils/yamlloader.py", line 159, in load return yaml.load(stream, Loader=Loader) File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load return loader.get_single_data() File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data return self.construct_document(node) File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 55, in construct_document data = self.construct_object(node) File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 100, in construct_object data = constructor(self, node) File "venv/lib/python3.8/site-packages/yaml/constructor.py", line 427, in construct_undefined raise ConstructorError(None, None, yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple' in "<unicode string>", line 1, column 1 ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> y.load("!!python/tuple [foo, bar]") ('foo', 'bar') ```
Override the implicit resolver for timestamp strings so that `datetime.datetime` objects are always written with an accompaying `!!timestamp` tag. This is a behavior change: Any users that depend on a `datetime` becoming a plain YAML string (which would be read in as a `str` object) must convert the object to string themselves. This change preserves the semantics of the object, and it preserves round-trip identity. Behavior before: ```pycon >>> import datetime >>> import salt.utils.yaml as y >>> print(y.dump(datetime.datetime(2022, 10, 21, 18, 16, 3, 100000, tzinfo=datetime.timezone(datetime.timedelta(hours=-4))))) 2022-10-21 18:16:03.100000-04:00 ``` Behavior after: ```pycon >>> import datetime >>> import salt.utils.yaml as y >>> print(y.dump(datetime.datetime(2022, 10, 21, 18, 16, 3, 100000, tzinfo=datetime.timezone(datetime.timedelta(hours=-4))))) !!timestamp 2022-10-21 18:16:03.100000-04:00 ```
Behavior before: ```pycon >>> from salt.utils.odict import OrderedDict >>> import salt.utils.yaml as y >>> import collections >>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) !!python/object/apply:collections.OrderedDict - - - foo - bar >>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) NULL ``` Behavior after: ```pycon >>> from salt.utils.odict import OrderedDict >>> import salt.utils.yaml as y >>> import collections >>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar ```
Before, an `OrderedDict` object was represented as a plain YAML mapping. Now it is represented as an `!!omap` node, which is a sequence of single-valued mappings. This is a behavior change: Any users that depend on an `OrderedDict` becoming a plain YAML mapping (which would be read in as a `dict` object) must first convert the `OrderedDict` to `dict`. This change preserves the semantics of the object, and it preserves round-trip identity. Behavior before: ```pycon >>> from salt.utils.odict import OrderedDict >>> import salt.utils.yaml as y >>> import collections >>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar >>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) foo: bar ``` Behavior after: ```pycon >>> from salt.utils.odict import OrderedDict >>> import salt.utils.yaml as y >>> import collections >>> print(y.dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) !!omap - foo: bar >>> print(y.safe_dump(OrderedDict([("foo", "bar")]), default_flow_style=False)) !!omap - foo: bar >>> print(y.dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) !!omap - foo: bar >>> print(y.safe_dump(collections.OrderedDict([("foo", "bar")]), default_flow_style=False)) !!omap - foo: bar ```
Before, `!!python/tuple` was only used for `OrderedDumper`. Now it is also used for `SafeOrderedDumper` and `IndentedSafeOrderedDumper`. This is a behavior change: Any users that depended on a `tuple` becoming a plain YAML sequence (which would be read in as a `list` object) must first convert the `tuple` to `list`. This change preserves the semantics of the object, and it preserves round-trip identity. Preserving round-trip identity is particularly important if the tuple is used as a key in a `dict` because `list` objects are not hashable. Behavior before: ```pycon >>> import salt.utils.yaml as y >>> print(y.dump(("foo", "bar"), default_flow_style=False)) !!python/tuple - foo - bar >>> print(y.safe_dump(("foo", "bar"), default_flow_style=False)) - foo - bar ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> print(y.dump(("foo", "bar"), default_flow_style=False)) !!python/tuple - foo - bar >>> print(y.safe_dump(("foo", "bar"), default_flow_style=False)) !!python/tuple - foo - bar ```
Failure to represent an object is a bug so it should be noisy, not quietly produce `NULL`. The representer this commit disables was added in commit 2e2ce5f. Behavior before: ```pycon >>> import salt.utils.yaml as y >>> y.safe_dump(type("GeneratedObjectSubclass", (), {})) 'NULL\n' ``` Behavior after: ```pycon >>> import salt.utils.yaml as y >>> y.safe_dump(type("GeneratedObjectSubclass", (), {})) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "salt/utils/yamldumper.py", line 270, in safe_dump return dump(data, stream, Dumper=SafeOrderedDumper, **kwargs) File "salt/utils/yamldumper.py", line 261, in dump return yaml.dump(data, stream, **kwargs) File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 253, in dump return dump_all([data], stream, Dumper=Dumper, **kwds) File "venv/lib/python3.8/site-packages/yaml/__init__.py", line 241, in dump_all dumper.represent(data) File "venv/lib/python3.8/site-packages/yaml/representer.py", line 27, in represent node = self.represent_data(data) File "venv/lib/python3.8/site-packages/yaml/representer.py", line 58, in represent_data node = self.yaml_representers[None](self, data) File "venv/lib/python3.8/site-packages/yaml/representer.py", line 231, in represent_undefined raise RepresenterError("cannot represent an object", data) yaml.representer.RepresenterError: ('cannot represent an object', <class '__main__.GeneratedObjectSubclass'>) ```
Closing this due to inactivity. Anyone should feel free to re-open it if they want to see it through to the end in one release cycle. |
What does this PR do?
Introduces a new
yaml_compatibility
option that controls the behavior of YAML loading/dumping. If a user wants to revert to the YAML behavior from a previous version of Salt, or preview upcoming changes to YAML behavior, they can set it to the desired version of Salt. For example, settingyaml_compatibility
to 3006 will pin the current behavior, and setting it to 3007 allows the user to preview the behavior that will become the default in Salt v3007.In addition, the following behavior changes take effect immediately:
dict
class to thesalt.utils.yaml.SaltYamlSafeLoader
constructor no longer causes all future!!omap
nodes to raise an exception when loaded.!!python/tuple
is now supported, and produces a Pythontuple
object. Before, such nodes would raise an exception. (This change does not break backwards compatibility, which is why it takes effect immediately.)After v3006 is released (
salt.version.SaltVersionsInfo.SULFUR.released
is changed toTrue
), the following behavior changes will automatically take effect. These can be previewed now by settingyaml_compatibility
to 3007.salt.utils.yaml.dump()
will default tosalt.utils.yaml.OrderedDumper
rather than PyYAML'syaml.Dumper
class. (I believe this was the original intention, and nowdump()
's behavior is symmetric withsafe_dump()
).salt.utils.yaml.IndentedSafeOrderedDumper
list indentation (was broken only whenyaml.CSafeDumper
exists).!!timestamp
node will produces a Pythondatetime.datetime
object (currently they are always loaded as strings).!!omap
nodes (currently they are sometimes incorrectly parsed as mapping nodes rather than sequence nodes containing mapping nodes).!!omap
node will always produce acollections.OrderedDict
object. Currently, it sometimes produces a list of (key, value) tuples and sometimes raises an exception.datetime.datetime
object to YAML now explicitly tags the node with!!timestamp
.collections.OrderedDict
object to YAML now consistently produces a YAML!!omap
node, which is a sequence of single-valued mappings. Before, the behavior varied depending on the specific object type (collections.OrderedDict
vs.salt.utils.odict.OrderedDict
) and the dump function (salt.utils.yaml.dump()
vs.salt.utils.yaml.safe_dump()
).tuple
object to YAML now consistently produces a YAML sequence node tagged with!!python/tuple
. Before, onlysalt.utils.yaml.dump()
would tag the node with!!python/tuple
;salt.utils.yaml.safe_dump()
did not.NULL
node.Clean-ups and code health:
salt.utils.yaml
unit tests to pytest.What issues does this PR fix or reference?
#12161 (can be closed even without this PR because the bug no longer exists, although this PR adds a regression test in the form of an integration test)
Behavior Changes
IndentedSafeOrderedDumper
:Before (assuming
yaml.CSafeDumper
exists):After:
This fix required a switch to the pure Python
yaml.SafeDumper
class (forIndentedSafeOrderedDumper
only), which will be slower thanyaml.CSafeDumper
.Loading YAML
!!omap
nodes to Python:Before:
After:
Loading YAML
!!timestamp
nodes to Python:Before:
After:
Loading YAML
!!python/tuple
nodes to Python:Before:
After:
Dumping Python
datetime.datetime
objects to YAML:Before:
After:
Dumping Python
OrderedDict
objects to YAML:Before:
After:
Dumping Python
tuple
objects to YAML:Before:
After:
Dumping unsupported objects:
Behavior before:
Behavior after:
Merge requirements satisfied?
[NOTICE] Bug fixes or features added to Salt require tests.
Commits signed with GPG?
No
Tasks