Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider defaulting narrative text to the xhtml namespace #290

Closed
lmsurpre opened this issue Jan 17, 2017 · 4 comments
Closed

Consider defaulting narrative text to the xhtml namespace #290

lmsurpre opened this issue Jan 17, 2017 · 4 comments

Comments

@lmsurpre
Copy link

We are getting a lot of xmlns="" attributes in the narrative text of our output.

There is a related issue where it was recommend to set the namespace of the xhtml when you call the api:
#193

This is not ideal for us because we build up the narrative text from xhtml snippets that we store in our db and we'd prefer to avoid adding xmlns everywhere in our views.

It seems like the narrative text should always be in the xhtml namespace and so maybe the library could assume that? Or maybe if you add a snippet of narrative text with no namespace, it could automatically make that http://www.w3.org/1999/xhtml ?

@wmrutten
Copy link
Contributor

Hi Lee, thank you for reporting this. We have to be very careful with namespaces, as invalid namespaces will prevent correct XML (de-)serialization. So I'm not sure if the library can "assume" anything.

Can you provide a small code example of how you build up the narrative using the API? Maybe we can come up with a workaround.

@lmsurpre
Copy link
Author

I'll try to provide and example soon. In the meantime, a quick question: are there any cases where the elements under narrative text would have a namespace other than http://www.w3.org/1999/xhtml ?
It seems to me like that's not even allowed by the spec, so why support it?

@wmrutten
Copy link
Contributor

Right, other namespaces are not allowed and (IIRC) should be rejected by the API.

@ewoutkramer
Copy link
Member

The behaviour of the SDK here is dubious. The intentions are right, but the implementation is wrong (this is the serializer):

 // The value *should* be valid xhtml - however if people just provide a plain
 // string, lets add a <div> around it.
 if (!value.TrimStart().StartsWith("<div"))
        value = $"<div xmlns='http://www.w3.org/1999/xhtml'>{value}</div>";

 var sanitized = SerializationUtil.SanitizeXml(value);
 XElement xe = XElement.Parse(sanitized);

 // The div should be in the XHTMLNS namespace, correct if it 
 // is not the case.
 xe.Name = XmlNs.XHTMLNS + xe.Name.LocalName;
 parent.Add(xe);

As is visible, this code tries to correct for developer mistakes: if the developer supplies a value that does not start with a div, we'll add one (e.g. the common mistake of starting with <p>, or, when the data comes in from json, just contain text). This works fine.

However when the developer supplies xml content starting with a div but not in the xhtml namespace, we will add the xhtml namespace to the root. And this is a mistake: this is to the root only. So all nested elements in the xml will still be outside the xhtml namespace, and all of them will therefore have the xmlns='' attribute added to them (as Lee reported).

We could fix this by going down all children and applying the xhtml namespace (if there is no namespace).

Should we correct this bug? What we could have done is throw an exception. But the design filosophy of the serializer is to just go on and not do any kind of validation of the inputs. It would clearly be a breaking change (stuff that serialized before now won't serialize anymore).

Can we leave it as is? I think so. It has been there for a while, and there are few complaints. I guess this is because the current sea of xmlns='' attributes trigger people to look at their code - and so they realize they forgot to apply the xhtml namespace. So it functions nicely as a red flag that something is wrong. (and not changing the root namespace at all means we would generate incorrect XHTML - so we can't do that either).

To conclude: if we don't mind breaking peoples current code, we can fix it. But it is likely that the current situation is "good enough". I suggest we leave it as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants