Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyYAML 4.1 changes "safe" in more ways than immediately obvious #187

Closed
cdent opened this issue Jun 28, 2018 · 6 comments
Closed

PyYAML 4.1 changes "safe" in more ways than immediately obvious #187

cdent opened this issue Jun 28, 2018 · 6 comments

Comments

@cdent
Copy link

cdent commented Jun 28, 2018

It's clear from the changelog that pyyaml 4.x now defaults to a style of safe_load for loading. This is probably good. However, it also appears that the definition of "safe" has changed somewhat.

In cdent/gabbi#252 the 'safe' related tests work differently depending on whether >4 or <4 is used. As currently written they pass with 4 and fail with 3. The difference seems to be that "safe" in 3 and 4 mean different things:

  • in 4, safe will load custom tags that are defined in the same process, but not python/object, and unsafe will not load custom tags, but will load python/object
  • in 3, custom tags only load in unsafe, and python/object, neither in safe

The branch on that pull request can demonstrate the problem with different PyYAML versions. And master in the same repo will as well.

However, I have no confidence that I'm parsing what's going on properly at all, so I need to come up with a minimal test case, which I'll try to do real soon, but I first wanted to get this written down in case there is something obviously wrong in either my code or in PyYAML.

I will followup to this with the MTC, ASAP. Sorry for dropping noise like this, but needed to dump state.

@cdent
Copy link
Author

cdent commented Jun 28, 2018

Here's a gist which demonstrates the different behaviors: https://gist.github.com/cdent/130181f02f7cb2737f81fef2ebffc8d3

@cdent
Copy link
Author

cdent commented Jun 28, 2018

To clarify the issues here there are basically two separate problems:

  • In 4.1 danger_load (or python_load in the referenced ingy path) should not be failing when trying to use !IsNAN and !NanChecker. It should load everything. This is a pretty clear cut bug, it seems.

  • In 4.1 safe_load (and now load which is the same) the definition of safe has changed to "things which I have defined myself are safe". This is a) different from 3.12, b) potentially harder to reason about than the behavior in 3.12 which translated to something like "don't load stuff that is code". One could argue that the new behavior is more mature, and more in keeping what someone wants to be able to achieve with "safe" but it is a change that needs to be highlighted.

@prisaparoy
Copy link

This is a problem for me as well
safe load working for custom tags, but not python objects:
eg:

>>> s="""
... redshift-snapshot: !cl_pyspark.connector.redshift
...     host:     redshift-prod.myapp.net
...     user:     readonly
...     password: password
...     database: snowplow
...     s3_temp:  s3a://temp-space
... """
>>> yaml.load(s)
{'redshift-snapshot': RedshiftConnector[readonly@redshift-prod.myapp.net:5439/snowplow, s3_temp=s3a://temp-space/]}
>>>

s="""
... schema: !!python/object:pyspark.sql.types.StructType
...         fields:
...           - !!python/object:pyspark.sql.types.StructField { name: from_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
...           - !!python/object:pyspark.sql.types.StructField { name: to_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
... """
>>> yaml.load(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/__init__.py", line 74, in load
    return loader.get_single_data()
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pyspark.sql.types.StructType'
  in "<string>", line 2, column 9:
    schema: !!python/object:pyspark.sql.type ...
^


danger load works for python objects, but not for custom tags

s="""
... schema: !!python/object:pyspark.sql.types.StructType
...         fields:
...           - !!python/object:pyspark.sql.types.StructField { name: from_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
...           - !!python/object:pyspark.sql.types.StructField { name: to_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
... """

>>> yaml.danger_load(s)
{'schema': StructType(List(StructField(from_id,StringType,false),StructField(to_id,StringType,false)))}
>>>


>>> s="""
... redshift-snapshot: !cl_pyspark.connector.redshift
...     host:     redshift-prod.myapp.net
...     user:     readonly
...     password: password
...     database: snowplow
...     s3_temp:  s3a://temp-space/
... """
>>> yaml.danger_load(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/__init__.py", line 101, in danger_load
    return load(stream, DangerLoader)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/__init__.py", line 74, in load
    return loader.get_single_data()
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!cl_pyspark.connector.redshift'
  in "<string>", line 2, column 20:
    redshift-snapshot: !cl_pyspark.connector.redshift
                       ^

@perlpunk
Copy link
Member

So I have been experimenting a bit with PythonLoader to adress the first issue, @ingydotnet joined.

To make this work with the current code, one would need to set the loader explicitly. @cdent, I took your gist, and if I change it like this, everything works like expected:

class NanChecker(YAMLObject):
    yaml_tag = u'!NanChecker'
    yaml_loader = PythonLoader
...
add_constructor(u'!IsNAN', lambda loader, node: NanChecker(), PythonLoader)

Working on release/4.2 branch.

@perlpunk
Copy link
Member

perlpunk commented Jun 30, 2018

I wrote up an example of the different behaviour of safe_load and SafeLoader.
This needs to be fixed, because safe_load can be quite unsafe now.

# examples/safe.py
import yaml
import os

# we add a constructor that is potentially dangerous
def myconstructor(loader, node):
    value = loader.construct_scalar(node)
    os.system("echo hello " + value)
    return value

yaml.add_constructor(u'!hello', myconstructor)

string = "msg: !hello world"

print("============ Loading trusted YAML")

data = yaml.load(string)
print(data)

# somewhere else in your application
unsafe_yaml = "msg: !hello evil"

print("============ Loading untrusted YAML")
try:
    safe_data = yaml.safe_load(unsafe_yaml)
    print(safe_data)
except yaml.constructor.ConstructorError as err:
    print("Could not load unsafe_yaml:")
    print(format(err))

% git co d856c20 # before #74
% python examples/safe.py
hello world
============ Loading trusted YAML
{'msg': u'world'}
============ Loading untrusted YAML
Could not load unsafe_yaml:
could not determine a constructor for the tag '!hello'
  in "<string>", line 1, column 6:
    msg: !hello evil
         ^

% git checkout master
% python examples/safe.py
hello world
hello evil
============ Loading trusted YAML
{'msg': u'world'}
============ Loading untrusted YAML
{'msg': u'evil'}

@ingydotnet
Copy link
Member

tanaypf9 pushed a commit to tanaypf9/pf9-requirements that referenced this issue May 20, 2024
Patch Set 1: Code-Review-1

There are some other issues with 4.x as well, see yaml/pyyaml#187

So I reckon we need to hold on this for at least a while.

Patch-set: 1
Reviewer: Gerrit User 11564 <11564@4a232e18-c5a9-48ee-94c0-e04e7cca6543>
Label: Code-Review=-1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants