Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/xml: round-trip safe mode [freeze exception] #43168

Closed
FiloSottile opened this issue Dec 14, 2020 · 26 comments
Closed

proposal: encoding/xml: round-trip safe mode [freeze exception] #43168

FiloSottile opened this issue Dec 14, 2020 · 26 comments

Comments

@FiloSottile
Copy link
Contributor

FiloSottile commented Dec 14, 2020

Background

Juho Nurminen of Mattermost reported multiple parsing issues in encoding/xml that lead to round-trip mismatches: parsing the output of encoding a list of tokens does not produce the same list of tokens.

Some of those issues are fixed by CLs linked to this issue, but encoding/xml does not provide such a security property, was not designed to, and we don’t believe that fixing the issues we know about is sufficient to guarantee it (nor that the round-trip guarantee is necessarily sufficient for all applications).

Indeed, the investigation kept surfacing different issues impacting round-trip stability, and we are aware of at least one variant affecting the Token API that is neither fixed by an open CL nor detected by the researchers’ validator (which we are told only aims to protect RawToken uses).

Unfortunately, XML-DSig and SAML implementations came to rely on round-trip stability, if not on perfect spec alignment, for their security. This is not entirely surprising, because XML-DSig and SAML are fundamentally fragile designs, but they are also critical enough in the ecosystem that we should try to find a way to let them operate safely.

Regardless of the outcome of this proposal, applications should avoid relying on complex parsers agreeing on how to interpret inputs for their security when possible, and should minimize transfers and modifications between the time of data validation and the time of use.

We’d like to thank Mattermost for reporting these issues, for working with us through the investigation, and for coordinating the disclosure of the issues to the downstream libraries so they could mitigate in advance.

If you use an affected SAML library, please refer to the following security advisories:

Proposal

Hopefully, applications need this kind of security property only when doing limited modifications of a document (for example, extracting a part to forward). If that’s the case, we can try to find a limited subset of the encoding/xml functionality for which we can effectively test that the outputs parse unambiguously.

(We tried fuzzing the full featureset, but the fuzzer found a lot of round-trip changes which are semantically correct, but hard to validate automatically, and changing those behaviors would probably be a breaking change.)

As a first suggestion, I propose adding an IgnoreNamespaces field to Decoder, which when set disables all namespace processing. Names with colons are not split and are copied whole into Name.Local.

We have a prototype of this and it survived extensive fuzzing (with the only caveat that CDATA sections end up merged with surrounding text, as the encoding/xml data structures don’t preserve escaping information).

Open questions

Is this API actually usable by the applications that need it?

Is round-trip stability sufficient, or do all those applications actually need perfect alignment with the spec on how the output is parsed? In other words, if the output is generated by Go but parsed by libxml and they disagree on the list of tokens, is that an issue? If yes, this is a much harder property to provide. HTTP/1 parsers have been struggling with it for decades, as shown by the constant stream of request smuggling vulnerabilities across the ecosystem, and HTTP/1 is arguably a much simpler protocol.

@golang golang locked and limited conversation to collaborators Dec 14, 2020
@FiloSottile FiloSottile changed the title [reserved] proposal: encoding/xml: round-trip safe mode [freeze exception] Dec 14, 2020
@gopherbot gopherbot added this to the Proposal milestone Dec 14, 2020
@golang golang unlocked this conversation Dec 14, 2020
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/277893 mentions this issue: encoding/xml: replace comments inside directives with a space

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/277892 mentions this issue: encoding/xml: handle leading, trailing, or double colons in names

@josharian
Copy link
Contributor

If the goal is extracting a subset of the larger message, that could be obtained with complete confidence by exposing a way to get the byte range corresponding to an element. Then you’d parse, discover the slice that contains the part you need, and slice the original doc for the new content.

@DemiMarie
Copy link
Contributor

Is round-trip stability sufficient, or do all those applications actually need perfect alignment with the spec on how the output is parsed? In other words, if the output is generated by Go but parsed by libxml and they disagree on the list of tokens, is that an issue? If yes, this is a much harder property to provide. HTTP/1 parsers have been struggling with it for decades, as shown by the constant stream of request smuggling vulnerabilities across the ecosystem, and HTTP/1 is arguably a much simpler protocol.

At the very least, Go needs to preserve round-trip stability while processing namespaces correctly. Mattermost’s blog post agrees with me on this: SAML requires correct namespace processing. Furthermore, at least some of the vulnerabilities are due to accepting invalid XML, which encoding/xml simply should not do. If this cannot be changed because of the Go 1 stability guarantee, then the old functions should be deprecated.

We can guarantee round-tripping of arbitrarily complex XML by having encoding/xml produce tokens that include the bytes from which they were parsed. Serializing these tokens would therefore be merely a matter of concatenation. This is a variant of what @josharian suggested.

@keshlam
Copy link

keshlam commented Dec 15, 2020

Re "invalid XML" -- Please be careful about terminology. XML validity is defined against schemas, and there are many legitimate use cases for processing XML which has not been validated. If the library is accepting ill-formed XML without complaint, or not detecting invalid XML when validation has been requested and a schema provided, that's a problem.

I would recommend trying to avoid adding cost to the process except when actually needed. Offering the option of byte-for-byte roundtripping may make sense; making every user pay the overhead for that probably does not.

Note that XML-DSig 2.0 explicitly references XML Canonicalization as a way to resolve some of the fragilities. I believe that even when XML-DSig 1.0 was announced, the community was very aware that it was stable only for strictly canonicalized documents.

@DemiMarie
Copy link
Contributor

Re "invalid XML" -- Please be careful about terminology. XML validity is defined against schemas, and there are many legitimate use cases for processing XML which has not been validated. If the library is accepting ill-formed XML without complaint, or not detecting invalid XML when validation has been requested and a schema provided, that's a problem.

“invalid” was the terminology used in the CL that changed handling of incorrectly placed colons. I believe that XML is in fact ill-formed.

I would recommend trying to avoid adding cost to the process except when actually needed. Offering the option of byte-for-byte roundtripping may make sense; making every user pay the overhead for that probably does not.

This needs to be clear in the documentation, with explicit examples that can be used for test-cases. I believe we should guarantee that if XML is serialized and then parsed, it produces the same token stream.

Note that XML-DSig 2.0 explicitly references XML Canonicalization as a way to resolve some of the fragilities. I believe that
even when XML-DSig 1.0 was announced, the community was very aware that it was stable only for strictly canonicalized documents.

Can encoding/xml guarantee that canonicalized XML does round-trip successfully? encoding/xml should also include canonicalization routines.

@keshlam
Copy link

keshlam commented Dec 15, 2020

In SAX, at least, same token string is explicitly not guaranteed. Outputting text content, for example, can occur in chunks of whatever size is convenient for the program; reading them back in will occur in chunks of whatever sizes happen to naturally fit the parser's buffer size and the data's alignment with that. If you want token-level standardization from SAX, you need a canonicalizer layer as part of the parse operation.

Similarly, a DOM parser may or may not retain the distinction between text nodes and CDATA nodes. That's a syntactic feature, not a semantic one. And while a parser typically does deliver contiguous text as a single text node, the DOM model is perfectly happy with consecutive text nodes and that detail is not retained during serialization.

XML is defined in terms of the XML Data Model, not a token stream. A difference which makes no semantic difference should make no semantic difference.

@DemiMarie
Copy link
Contributor

In that case, we should include canonicalization in encoding/xml.

@FiloSottile
Copy link
Contributor Author

What we are trying to figure out is if a subset of the encoding/xml functionality (for example, just a namespace-naive tokenizer) would be useful for SAML implementations (and other applications that need more security properties than we provide), so we can upgrade the security guarantees for that subset.

If SAML and XML-DSig need a superset of not only the security properties, but also of the features of encoding/xml, then it sounds like they definitely can't and shouldn't be implemented on top of encoding/xml.

The feedback I am getting from other discussion venues is that XML-DSig might be complex enough to deserve a dedicated XML tokenizer implementation. Of the three affected libraries, one is discussing deprecating SAML support (dexidp/dex#1884), and the maintainer of the other is saying that they don't have confidence in the protocol safety.

If the conclusion is that a safe SAML implementation needs a new library that is either a substantial extension (as opposed to a subset) or a breaking change of encoding/xml, that should be (or at least start as) a third-party module. Implementing canonicalization and offering strict security guarantees for it is a major project, not a security fix.

/cc the maintainers of some major involved downstreams, although we want to hear from anyone who has code that relies on security properties that encoding/xml does not provide: @ericchiang @russellhaering @crewjam @beevik

@timbray
Copy link

timbray commented Dec 15, 2020

Hi, I've written a successful XML parser (in Java) and am also the idiot who someone convinced (20+ years ago) that Namespaces were a good idea and edited that spec. I'm also somewhat Go-competent. I'm pretty sure that something that took a blob of bytes that claimed to be XML, puked on anything not well-formed, and got the namespace semantics right, would make SAML & DSIG happy - although I don't know their Go implementations & thus their API requirements.

It seems plausible that fixing this would, per Go culture, require new APIs. BTW the same thing happened in Ruby, there was a native implementation that was severely broken and then someone wrapped one of the C parsers in a new set of APIs which is what everyone uses now.

Anyhow, purpose of this note is to volunteer to help. Not sure I'm up for doing another XML parser from scratch unless someone's willing to pay for it, but I think I understand the issues.

@tqbf
Copy link

tqbf commented Dec 15, 2020

Just to chime in, as another Golang SAML implementor and assessor:

I'm alarmed at the idea of trying to bend encoding/xml into part of a SAML/DSIG stack. It was quickly apparent to me when I implemented my SAML IdP that encoding/xml's namespacing support wasn't suitable for DSIG, which is heavily dependent on namespacing, and that its output didn't give me enough control to do canonicalized output.

I've reached the conclusion that for SAML, you really want a defensively-written purpose-built minimal (as minimal as you can get with DSIG) XML; as Advent of Code is proving, I'm not an especially excellent programmer, but the XML parsing required to do SAML was just a few hundred lines of code, and it's intimately woven into DSIG c14n, which is code you have to write anyways to do DSIG.

DSIG is quite probably the scariest cryptography standard in common use, scarier even than X.509, and if the Go standard library provides tools for DSIG like c14n (a source of security bugs in other systems!), the standard library maintainers own the security implications for that.

This makes much more sense as a third-party package, one that doesn't have to make any compromises with the other uses XML is put to.

@DemiMarie
Copy link
Contributor

@tqbf (a) love your username, (b) is your code open source?

@keshlam
Copy link

keshlam commented Dec 16, 2020 via email

@rsc
Copy link
Contributor

rsc commented Dec 16, 2020

Thanks for jumping in, @timbray. Can you say a little more about:

I'm pretty sure that something that took a blob of bytes that claimed to be XML, puked on anything not well-formed, and got the namespace semantics right, would make SAML & DSIG happy - although I don't know their Go implementations & thus their API requirements.

What does "got the namespace semantics right" mean to you? The current encoding/xml tokenizer tried to do this, making it so that readers would see the "actual namespace semantics" and not the raw syntax details like which shortening prefixes were being used. To take the W3C example, given <bk:book xmlns:bk='urn:loc.gov:books'>, the current tokenizer passes up an element with xml.Name{Space: "urn:loc.gov:books", Local: "book"}. You can't even see the bk, which of course can be written using any other variable name and is supposed to have the same meaning. Is this what you meant by "got the namespace semantics right", or something else?

I made two mistakes in the tokenizer, both fixable without new API:

  1. The tokenizer parses <:foo> the same as <foo>, making it roundtrip to <foo>. That's easy to reject or preserve instead.
  2. The tokenizer doesn't reject elements with multiple namespaces, like <book xmlns="something" xmlns="something-else">. I think we can reject those.

The IgnoreNamespaces proposal in this issue goes much further, passing up bk:book instead. I am honestly not convinced that's right, because programs will then hard-code bk and end up mishandling namespaces. It seems like if a SAML library wanted to use this, it would then have to reimplement the namespace tracking to get the semantics right. And I think many use cases would use IgnoreNamespaces rather than do the right thing with the Name.Space URLs, leading to more incorrect processing of XML than correct processing.

But again, I'm interested to hear what the experts think about what "right" means. Thanks.

I am also fully on board with the comments above that putting encoding/xml inside your security perimeter is a mistake, so I'm a bit reluctant to make changes that are only useful for SAML libraries. But again it seems like even for SAML, IgnoreNamespaces is a mistake.

@rogpeppe
Copy link
Contributor

FWIW a few years ago I fixed a bunch of namespace issues in the encoding/xml package. There are indeed many broken things - at least there were when I last looked. The fixes were merged into the standard library but broke some tests inside Google, so the changes were reverted pending a proper proposal.

If I'd ever had cause to use XML since then, I'd probably have made the effort to go through the whole thing again, but I haven't, I'm afraid :)

A fork of the fixed package still lives at github.com/juju/xml if anyone is interested to dig it out again. The main motivation I originally had for the fixes was to be able to correctly round-trip XML.

@timbray
Copy link

timbray commented Dec 16, 2020

@rsc - Your description of the semantics with "bk" and urn:loc.gov:books sounds right to me. IgnoreNamespaces sounds completely crazy to me. You might not like the namespace design (lots of people don't) but you can't ignore them if an XML doc has them.

I'd think that a namespace-sensitive API being used to process an XML doc should return every element & attribute name as a (namespace, name) pair, with the namespace either being the URI or nil.

But I actually haven't yet understood what the bug is, they just say "instability".

@keshlam
Copy link

keshlam commented Dec 17, 2020 via email

@rogpeppe
Copy link
Contributor

For the record, here are some of the CLs that fixed some of the problems that are in encoding/xml (they were all reverted later):

https://go-review.googlesource.com/c/go/+/2660/
https://go-review.googlesource.com/c/go/+/5910/
https://go-review.googlesource.com/c/go/+/6927/
https://go-review.googlesource.com/c/go/+/11635/

With those changes applied, namespace prefixes were not preserved on round-tripping (I think that would be very hard to do while preserving backward compatibility), but namespace semantics were.

@keshlam
Copy link

keshlam commented Dec 29, 2020

  1. The tokenizer parses <:foo> the same as <foo>, making it roundtrip to <foo>. That's easy to reject or preserve instead.
  2. The tokenizer doesn't reject elements with multiple namespaces, like <book xmlns="something" xmlns="something-else">.
    I think we can reject those.

Anything claiming to be an XML parser should reject both.

The core XML syntax says you can't have multiple attributes on an element with the same name, and syntactically XML namespace declarations are attributes. This would be an ill-formed document, and the author should be made to fix it. (https://www.w3.org/TR/xml/#uniqattspec)

XML reserved the colon character in attribute and element names for use by namespaces. A qname may not start with a colon, since colon can't appear unless there is a prefix, the prefix is a NCName, and an NCName can't be empty (it must at least have the NameStartChar). (https://www.w3.org/TR/xml-names/#ns-qualnames and drill down from there.)

@rsc
Copy link
Contributor

rsc commented Jan 6, 2021

Anything claiming to be an XML parser should reject both.

We agree with you on that. We can try that for Go 1.17. (But that's not this issue.)

For this issue, it sounds like the general consensus is that we should not add IgnoreNamespaces.

That is, this seems like a likely decline.

@rsc
Copy link
Contributor

rsc commented Jan 6, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@FiloSottile
Copy link
Contributor Author

@rsc what about @rogpeppe’s CLs? With those added, encoding/xml should round-trip.

Those too would have to go in Go 1.17 because they significantly change the output.

@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Jan 13, 2021
gopherbot pushed a commit that referenced this issue Mar 15, 2021
Before this change, <:name> would parse as <name>, which could cause
issues in applications that rely on the parse-encode cycle to
round-trip. Similarly, <x name:=""> would parse as expected but then
have the attribute dropped when serializing because its name was empty.
Finally, <a:b:c> would parse and get serialized incorrectly. All these
values are invalid XML, but to minimize the impact of this change, we
parse them whole into Name.Local.

This issue was reported by Juho Nurminen of Mattermost as it leads to
round-trip mismatches. See #43168. It's not being fixed in a security
release because round-trip stability is not a currently supported
security property of encoding/xml, and we don't believe these fixes
would be sufficient to reliably guarantee it in the future.

Fixes CVE-2020-29509
Fixes CVE-2020-29511
Updates #43168

Change-Id: I68321c4d867305046f664347192948a889af3c7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/277892
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
gopherbot pushed a commit that referenced this issue Mar 15, 2021
A Directive (like <!ENTITY xxx []>) can't have other nodes nested inside
it (in our data structure representation), so there is no way to
preserve comments. The previous behavior was to just elide them, which
however might change the semantic meaning of the surrounding markup.
Instead, replace them with a space which hopefully has the same semantic
effect of the comment.

Directives are not actually a node type in the XML spec, which instead
specifies each of them separately (<!ENTITY, <!DOCTYPE, etc.), each with
its own grammar. The rules for where and when the comments are allowed
are not straightforward, and can't be implemented without implementing
custom logic for each of the directives.

Simply preserving the comments in the body of the directive would be
problematic, as there can be unmatched quotes inside the comment.
Whether those quotes are considered meaningful semantically or not,
other parsers might disagree and interpret the output differently.

This issue was reported by Juho Nurminen of Mattermost as it leads to
round-trip mismatches. See #43168. It's not being fixed in a security
release because round-trip stability is not a currently supported
security property of encoding/xml, and we don't believe these fixes
would be sufficient to reliably guarantee it in the future.

Fixes CVE-2020-29510
Updates #43168

Change-Id: Icd86c75beff3e1e0689543efebdad10ed5178ce3
Reviewed-on: https://go-review.googlesource.com/c/go/+/277893
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
ydnar added a commit to ydnar/go that referenced this issue Sep 27, 2021
ydnar added a commit to ydnar/go that referenced this issue Sep 27, 2021
I’m not sure what the right fix is for this. Malformed namespaces
(leading or trailing colons, more than 1 colon) should result in an error.
ydnar added a commit to ydnar/go that referenced this issue Sep 27, 2021
ydnar added a commit to ydnar/go that referenced this issue Sep 27, 2021
I’m not sure what the right fix is for this. Malformed namespaces
(leading or trailing colons, more than 1 colon) should result in an error.
ydnar added a commit to ydnar/go that referenced this issue Oct 12, 2021
All issues of golang#13400 which are not new functionalities have fixes.
There are minor incompatibilities between them due to the handling of
prefixes. Duplicating a prefix or an namespace is invalid XML.
This is now avoided. XML produced is always valid.

Tests have been added for each fix and example and previous tests
fixed as output is already more compact is some cases.

encoding/xml: fix duplication of namespace tags by encoder (golang#7535)

A tag prefix identifies the name space of the tag and not the default name
space like xmlns="...". Writing the prefix is incorrect when it is bound
to a name space using the standard xmlns:prefix="..." attribute.
This fix skips this print.

Consequences are
 - duplication is avoided in line with name space standard in reference.
 - the _xmlns declaration does not appear anymore. To keep the previous
behaviour, the prefix is printed in all other cases. Token now always
produces well-formed XML except when the explicit name space collides
with the prefix.

Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has
be removed in all wants of tests. In some cases, useless declarations like
xmlns:x="x" are still added in line with previous behavior.

encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431)

MarshalIndent and Marshal share code. When prefix and indent are
empty, the behavior is like Marshal when it should have a minimal
indent, i.e. new line as in documentation.

A boolean is added to the local printer struct which defaults to false.
It is set to true only when MarshalIndent is used and prefix and indent
are empty.

encoding/xml: fix overriding by empty namespace (golang#7113)

The namespace defined by xmlns="value" can be overridden in
every included tag including by the empty namespace xmlns="".
Empty namespace is not authorized with a prefix (xmlns:ns="").

Method to calculate indent of XML handles depth of tag and
its associated namespace. The fix leaves the method active even
when no indent is required.

An XMLName field in a struct means that namespace must be
enforced even if empty. This occurs only on an inner tag as an
overwrite of any non-empty namespace of its outer tag.

To obtain the xmlns="" required, an attribute is added.

encoding/xml: fix panic on embedded unexported XMLName (golang#10538)

By laws of reflection, unexported fields are unreachable by .Value.
XMLName are allowed at any level of an inner struct but the struct
may not be reachable. If XMLName field is found without name,
it cannot be exported and the value is discarded like other fields.
Some comments have been to underline where the issue arises.
Another XMLName test was incorrectly set up and is fixed.
Various cases added in a specific test.

encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497)

Encoding/xml is using type "typinfo" which provides its own "value" method
using the reflection package. It fails to check for the documented possible
panics of the reflection methods. It is impossible to fix the method as the
parameter is also using reflection.

The fix is to discard anonymous structs which have no value anyway.
Encoder/xml documentation already mentions that fields are accessed using
the reflection package. A relevant test is added.

encoding/xml: fix closing tag failure (golang#20685)

Push/pop of elements name must be done using the eventual prefix together
with the tag name. The operation is moved before the translation of prefix
into its URI. One end element of a test is fixed as expecting the last used
namespace is incorrect.
After closing a tag using a namespace, the valid namespace must be taken
from the opening tag.

encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724)

Comparing namespaces of fields was missing and false conflicts were detected.

encoding/xml: fix invalid empty namespace without prefix (golang#8068)

Empty namespace is allowed only if it has no prefix. An error message is now
returned if a prefix exists. A similar case when no prefix is provided,
thus with syntax xmlns:="..." is also rejected.

encoding/xml: fix normalization of attributes values (golang#20614)

The attribute value was read as text. The existing attribute reader logic is
fixed as an attribute may have a namespace or only a prefix. Other
possibilities have been removed.

To keep the behavior of raw token which allows many faults in attributes list,
error handling is heavily using the Strict parameter of the decoder.
Specific tests have been added including list of attributes.

To keep the behavior or unmarshal, escaped characters handling has been
added but it is not symmetrical to Marshal for quotes but follows XML
specification.

encoding/xml: fix absence of detection of another : in qualified names (golang#20396)

The occurrence of second : in space and tag name is rejected.

Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685

Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165

Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com>

encoding/xml: fix printing of namespace prefix in tag names

Prefix displays in XML when defined in tag names.
The URL of the name space is also returned for an End Token as
documentation requires to improve idempotency of Marshal/Unmarshal.

Translating the prefix is popping the NS which was unavailable
for translation. Translate for an End Token has been moved inside
the pop element part.

Fixes golang#9519

Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60

encoding/xml: restore changes lost in merge

See golang#43168.

encoding/xml: gofmt

encoding/xml: minimalIndent -> minIndent to shrink diff

encoding/xml: gofmt

encoding/xml: revert changes to (*Decoder).attrval from master

encoding/xml: restore (*printer).writeIndent to master

encoding/xml: remove comments that don’t change functionality of tests
ydnar added a commit to ydnar/go that referenced this issue Oct 12, 2021
encoding/xml: disable 3/4 tests for golang#43168

I’m not sure what the right fix is for this. Malformed namespaces
(leading or trailing colons, more than 1 colon) should result in an error.
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/355353 mentions this issue: encoding/xml: support xmlns prefixes

ydnar added a commit to ydnar/go that referenced this issue Oct 12, 2021
All issues of golang#13400 which are not new functionalities have fixes.
There are minor incompatibilities between them due to the handling of
prefixes. Duplicating a prefix or an namespace is invalid XML.
This is now avoided. XML produced is always valid.

Tests have been added for each fix and example and previous tests
fixed as output is already more compact is some cases.

encoding/xml: fix duplication of namespace tags by encoder (golang#7535)

A tag prefix identifies the name space of the tag and not the default name
space like xmlns="...". Writing the prefix is incorrect when it is bound
to a name space using the standard xmlns:prefix="..." attribute.
This fix skips this print.

Consequences are
 - duplication is avoided in line with name space standard in reference.
 - the _xmlns declaration does not appear anymore. To keep the previous
behaviour, the prefix is printed in all other cases. Token now always
produces well-formed XML except when the explicit name space collides
with the prefix.

Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has
be removed in all wants of tests. In some cases, useless declarations like
xmlns:x="x" are still added in line with previous behavior.

encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431)

MarshalIndent and Marshal share code. When prefix and indent are
empty, the behavior is like Marshal when it should have a minimal
indent, i.e. new line as in documentation.

A boolean is added to the local printer struct which defaults to false.
It is set to true only when MarshalIndent is used and prefix and indent
are empty.

encoding/xml: fix overriding by empty namespace (golang#7113)

The namespace defined by xmlns="value" can be overridden in
every included tag including by the empty namespace xmlns="".
Empty namespace is not authorized with a prefix (xmlns:ns="").

Method to calculate indent of XML handles depth of tag and
its associated namespace. The fix leaves the method active even
when no indent is required.

An XMLName field in a struct means that namespace must be
enforced even if empty. This occurs only on an inner tag as an
overwrite of any non-empty namespace of its outer tag.

To obtain the xmlns="" required, an attribute is added.

encoding/xml: fix panic on embedded unexported XMLName (golang#10538)

By laws of reflection, unexported fields are unreachable by .Value.
XMLName are allowed at any level of an inner struct but the struct
may not be reachable. If XMLName field is found without name,
it cannot be exported and the value is discarded like other fields.
Some comments have been to underline where the issue arises.
Another XMLName test was incorrectly set up and is fixed.
Various cases added in a specific test.

encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497)

Encoding/xml is using type "typinfo" which provides its own "value" method
using the reflection package. It fails to check for the documented possible
panics of the reflection methods. It is impossible to fix the method as the
parameter is also using reflection.

The fix is to discard anonymous structs which have no value anyway.
Encoder/xml documentation already mentions that fields are accessed using
the reflection package. A relevant test is added.

encoding/xml: fix closing tag failure (golang#20685)

Push/pop of elements name must be done using the eventual prefix together
with the tag name. The operation is moved before the translation of prefix
into its URI. One end element of a test is fixed as expecting the last used
namespace is incorrect.
After closing a tag using a namespace, the valid namespace must be taken
from the opening tag.

encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724)

Comparing namespaces of fields was missing and false conflicts were detected.

encoding/xml: fix invalid empty namespace without prefix (golang#8068)

Empty namespace is allowed only if it has no prefix. An error message is now
returned if a prefix exists. A similar case when no prefix is provided,
thus with syntax xmlns:="..." is also rejected.

encoding/xml: fix normalization of attributes values (golang#20614)

The attribute value was read as text. The existing attribute reader logic is
fixed as an attribute may have a namespace or only a prefix. Other
possibilities have been removed.

To keep the behavior of raw token which allows many faults in attributes list,
error handling is heavily using the Strict parameter of the decoder.
Specific tests have been added including list of attributes.

To keep the behavior or unmarshal, escaped characters handling has been
added but it is not symmetrical to Marshal for quotes but follows XML
specification.

encoding/xml: fix absence of detection of another : in qualified names (golang#20396)

The occurrence of second : in space and tag name is rejected.

Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685

Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165

Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com>

encoding/xml: fix printing of namespace prefix in tag names

Prefix displays in XML when defined in tag names.
The URL of the name space is also returned for an End Token as
documentation requires to improve idempotency of Marshal/Unmarshal.

Translating the prefix is popping the NS which was unavailable
for translation. Translate for an End Token has been moved inside
the pop element part.

Fixes golang#9519

Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60

encoding/xml: restore changes lost in merge

See golang#43168.

encoding/xml: gofmt

encoding/xml: minimalIndent -> minIndent to shrink diff

encoding/xml: gofmt

encoding/xml: revert changes to (*Decoder).attrval from master

encoding/xml: restore (*printer).writeIndent to master

encoding/xml: remove comments that don’t change functionality of tests

encoding/xml: use t.Errorf instead of fmt.Errorf

encoding/xml: fix test case for golang#11496
ydnar added a commit to ydnar/go that referenced this issue Oct 12, 2021
encoding/xml: disable 3/4 tests for golang#43168

I’m not sure what the right fix is for this. Malformed namespaces
(leading or trailing colons, more than 1 colon) should result in an error.
ydnar added a commit to ydnar/go that referenced this issue Oct 14, 2021
All issues of golang#13400 which are not new functionalities have fixes.
There are minor incompatibilities between them due to the handling of
prefixes. Duplicating a prefix or an namespace is invalid XML.
This is now avoided. XML produced is always valid.

Tests have been added for each fix and example and previous tests
fixed as output is already more compact is some cases.

encoding/xml: fix duplication of namespace tags by encoder (golang#7535)

A tag prefix identifies the name space of the tag and not the default name
space like xmlns="...". Writing the prefix is incorrect when it is bound
to a name space using the standard xmlns:prefix="..." attribute.
This fix skips this print.

Consequences are
 - duplication is avoided in line with name space standard in reference.
 - the _xmlns declaration does not appear anymore. To keep the previous
behaviour, the prefix is printed in all other cases. Token now always
produces well-formed XML except when the explicit name space collides
with the prefix.

Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has
be removed in all wants of tests. In some cases, useless declarations like
xmlns:x="x" are still added in line with previous behavior.

encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431)

MarshalIndent and Marshal share code. When prefix and indent are
empty, the behavior is like Marshal when it should have a minimal
indent, i.e. new line as in documentation.

A boolean is added to the local printer struct which defaults to false.
It is set to true only when MarshalIndent is used and prefix and indent
are empty.

encoding/xml: fix overriding by empty namespace (golang#7113)

The namespace defined by xmlns="value" can be overridden in
every included tag including by the empty namespace xmlns="".
Empty namespace is not authorized with a prefix (xmlns:ns="").

Method to calculate indent of XML handles depth of tag and
its associated namespace. The fix leaves the method active even
when no indent is required.

An XMLName field in a struct means that namespace must be
enforced even if empty. This occurs only on an inner tag as an
overwrite of any non-empty namespace of its outer tag.

To obtain the xmlns="" required, an attribute is added.

encoding/xml: fix panic on embedded unexported XMLName (golang#10538)

By laws of reflection, unexported fields are unreachable by .Value.
XMLName are allowed at any level of an inner struct but the struct
may not be reachable. If XMLName field is found without name,
it cannot be exported and the value is discarded like other fields.
Some comments have been to underline where the issue arises.
Another XMLName test was incorrectly set up and is fixed.
Various cases added in a specific test.

encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497)

Encoding/xml is using type "typinfo" which provides its own "value" method
using the reflection package. It fails to check for the documented possible
panics of the reflection methods. It is impossible to fix the method as the
parameter is also using reflection.

The fix is to discard anonymous structs which have no value anyway.
Encoder/xml documentation already mentions that fields are accessed using
the reflection package. A relevant test is added.

encoding/xml: fix closing tag failure (golang#20685)

Push/pop of elements name must be done using the eventual prefix together
with the tag name. The operation is moved before the translation of prefix
into its URI. One end element of a test is fixed as expecting the last used
namespace is incorrect.
After closing a tag using a namespace, the valid namespace must be taken
from the opening tag.

encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724)

Comparing namespaces of fields was missing and false conflicts were detected.

encoding/xml: fix invalid empty namespace without prefix (golang#8068)

Empty namespace is allowed only if it has no prefix. An error message is now
returned if a prefix exists. A similar case when no prefix is provided,
thus with syntax xmlns:="..." is also rejected.

encoding/xml: fix normalization of attributes values (golang#20614)

The attribute value was read as text. The existing attribute reader logic is
fixed as an attribute may have a namespace or only a prefix. Other
possibilities have been removed.

To keep the behavior of raw token which allows many faults in attributes list,
error handling is heavily using the Strict parameter of the decoder.
Specific tests have been added including list of attributes.

To keep the behavior or unmarshal, escaped characters handling has been
added but it is not symmetrical to Marshal for quotes but follows XML
specification.

encoding/xml: fix absence of detection of another : in qualified names (golang#20396)

The occurrence of second : in space and tag name is rejected.

Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685

Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165

Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com>

encoding/xml: fix printing of namespace prefix in tag names

Prefix displays in XML when defined in tag names.
The URL of the name space is also returned for an End Token as
documentation requires to improve idempotency of Marshal/Unmarshal.

Translating the prefix is popping the NS which was unavailable
for translation. Translate for an End Token has been moved inside
the pop element part.

Fixes golang#9519

Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60

encoding/xml: restore changes lost in merge

See golang#43168.

encoding/xml: gofmt

encoding/xml: minimalIndent -> minIndent to shrink diff

encoding/xml: gofmt

encoding/xml: revert changes to (*Decoder).attrval from master

encoding/xml: restore (*printer).writeIndent to master

encoding/xml: remove comments that don’t change functionality of tests

encoding/xml: use t.Errorf instead of fmt.Errorf

encoding/xml: fix test case for golang#11496

encoding/xml: revert unrelated change in (*Decoder).readName

encoding/xml: remove comments
ydnar added a commit to ydnar/go that referenced this issue Oct 14, 2021
encoding/xml: disable 3/4 tests for golang#43168

I’m not sure what the right fix is for this. Malformed namespaces
(leading or trailing colons, more than 1 colon) should result in an error.
ydnar added a commit to ydnar/go that referenced this issue Oct 14, 2021
All issues of golang#13400 which are not new functionalities have fixes.
There are minor incompatibilities between them due to the handling of
prefixes. Duplicating a prefix or an namespace is invalid XML.
This is now avoided. XML produced is always valid.

Tests have been added for each fix and example and previous tests
fixed as output is already more compact is some cases.

encoding/xml: fix duplication of namespace tags by encoder (golang#7535)

A tag prefix identifies the name space of the tag and not the default name
space like xmlns="...". Writing the prefix is incorrect when it is bound
to a name space using the standard xmlns:prefix="..." attribute.
This fix skips this print.

Consequences are
 - duplication is avoided in line with name space standard in reference.
 - the _xmlns declaration does not appear anymore. To keep the previous
behaviour, the prefix is printed in all other cases. Token now always
produces well-formed XML except when the explicit name space collides
with the prefix.

Made prefix handling "xmlns="space" xmlns:_xmlns="xmlns" _..." has
be removed in all wants of tests. In some cases, useless declarations like
xmlns:x="x" are still added in line with previous behavior.

encoding/xml: fix unexpected behavior of encoder.Indent("", "") (golang#13185, golang#11431)

MarshalIndent and Marshal share code. When prefix and indent are
empty, the behavior is like Marshal when it should have a minimal
indent, i.e. new line as in documentation.

A boolean is added to the local printer struct which defaults to false.
It is set to true only when MarshalIndent is used and prefix and indent
are empty.

encoding/xml: fix overriding by empty namespace (golang#7113)

The namespace defined by xmlns="value" can be overridden in
every included tag including by the empty namespace xmlns="".
Empty namespace is not authorized with a prefix (xmlns:ns="").

Method to calculate indent of XML handles depth of tag and
its associated namespace. The fix leaves the method active even
when no indent is required.

An XMLName field in a struct means that namespace must be
enforced even if empty. This occurs only on an inner tag as an
overwrite of any non-empty namespace of its outer tag.

To obtain the xmlns="" required, an attribute is added.

encoding/xml: fix panic on embedded unexported XMLName (golang#10538)

By laws of reflection, unexported fields are unreachable by .Value.
XMLName are allowed at any level of an inner struct but the struct
may not be reachable. If XMLName field is found without name,
it cannot be exported and the value is discarded like other fields.
Some comments have been to underline where the issue arises.
Another XMLName test was incorrectly set up and is fixed.
Various cases added in a specific test.

encoding/xml: fix panic of unmarshaling of anonymous structs (golang#16497)

Encoding/xml is using type "typinfo" which provides its own "value" method
using the reflection package. It fails to check for the documented possible
panics of the reflection methods. It is impossible to fix the method as the
parameter is also using reflection.

The fix is to discard anonymous structs which have no value anyway.
Encoder/xml documentation already mentions that fields are accessed using
the reflection package. A relevant test is added.

encoding/xml: fix closing tag failure (golang#20685)

Push/pop of elements name must be done using the eventual prefix together
with the tag name. The operation is moved before the translation of prefix
into its URI. One end element of a test is fixed as expecting the last used
namespace is incorrect.
After closing a tag using a namespace, the valid namespace must be taken
from the opening tag.

encoding/xml: add check of namespaces to detect field names conflicts (golang#8535, golang#11724)

Comparing namespaces of fields was missing and false conflicts were detected.

encoding/xml: fix invalid empty namespace without prefix (golang#8068)

Empty namespace is allowed only if it has no prefix. An error message is now
returned if a prefix exists. A similar case when no prefix is provided,
thus with syntax xmlns:="..." is also rejected.

encoding/xml: fix normalization of attributes values (golang#20614)

The attribute value was read as text. The existing attribute reader logic is
fixed as an attribute may have a namespace or only a prefix. Other
possibilities have been removed.

To keep the behavior of raw token which allows many faults in attributes list,
error handling is heavily using the Strict parameter of the decoder.
Specific tests have been added including list of attributes.

To keep the behavior or unmarshal, escaped characters handling has been
added but it is not symmetrical to Marshal for quotes but follows XML
specification.

encoding/xml: fix absence of detection of another : in qualified names (golang#20396)

The occurrence of second : in space and tag name is rejected.

Fixes: golang#7113, golang#7535, golang#8068, golang#8535, golang#10538, golang#11431, golang#13185, golang#16497, golang#20396, golang#20614, golang#20685

Change-Id: Ib4a60347a47d23ff59b63307cebb83b71c7c9165

Commit originally authored by: Constantin Konstantinidis <constantinkonstantinidis@gmail.com>

encoding/xml: fix printing of namespace prefix in tag names

Prefix displays in XML when defined in tag names.
The URL of the name space is also returned for an End Token as
documentation requires to improve idempotency of Marshal/Unmarshal.

Translating the prefix is popping the NS which was unavailable
for translation. Translate for an End Token has been moved inside
the pop element part.

Fixes golang#9519

Change-Id: Id7a14d07c106a76a487b5c4e28d9d563fe061c60

encoding/xml: restore changes lost in merge

See golang#43168.

encoding/xml: gofmt

encoding/xml: minimalIndent -> minIndent to shrink diff

encoding/xml: gofmt

encoding/xml: revert changes to (*Decoder).attrval from master

encoding/xml: restore (*printer).writeIndent to master

encoding/xml: remove comments that don’t change functionality of tests

encoding/xml: use t.Errorf instead of fmt.Errorf

encoding/xml: fix test case for golang#11496

encoding/xml: revert unrelated change in (*Decoder).readName

encoding/xml: remove comments

encoding/xml: remove comments

encoding/xml: edit comments
ydnar added a commit to ydnar/go that referenced this issue Oct 14, 2021
encoding/xml: disable 3/4 tests for golang#43168

I’m not sure what the right fix is for this. Malformed namespaces
(leading or trailing colons, more than 1 colon) should result in an error.
@rsc rsc moved this to Declined in Proposals Aug 10, 2022
@rsc rsc added this to Proposals Aug 10, 2022
@golang golang locked and limited conversation to collaborators Oct 12, 2022
@rsc rsc removed this from Proposals Oct 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants