encode special characters when serializing entity to XML #9

Merged
merged 2 commits into from
Aug 2, 2012
@@ -76,7 +76,8 @@ public void write(XMLStreamWriter writer) throws XMLStreamException {
writer.writeAttribute("m:type", edmType);
}

- String value = edmValueConverter.serialize(edmType, entry.getValue().getValue());
+ String value = encodeNumericCharacterReference(edmValueConverter.serialize(edmType, entry
+ .getValue().getValue()));
if (value != null) {
writer.writeCharacters(value);
}
@@ -327,4 +328,21 @@ private void expect(XMLStreamReader xmlr, int eventType, String localName) throw
xmlr.require(eventType, null, localName);
nextSignificant(xmlr);
}

private String encodeNumericCharacterReference(String value) {

I would expect the XML library itself to be able to do this, for example through an option that tells it to encode characters when writing out the XML string.

If the XML library we use can neither encode nor decode (see my comment below about &#5), then we need to consider using a different XML library.

Owner Author

Based on my research, numeric character reference encoding does not come for free with the Jersey library.
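To make the question of what the XML layer already escapes concrete, here is a minimal sketch using the standard StAX `XMLStreamWriter` (the writer type that appears in the diff). The class and method names are hypothetical and not part of this PR. The StAX documentation only promises that `writeCharacters` escapes `&`, `<` and `>`; how control characters are treated is left to the implementation, so the sketch does not rely on it.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, for illustration only.
final class StaxEscapingSketch {

    // Writes a single <value> element containing the given text and returns the XML.
    static String writeValue(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("value");
            writer.writeCharacters(text); // markup characters are escaped here
            writer.writeEndElement();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Markup characters are escaped by the writer itself.
        System.out.println(writeValue("a < b & c"));
        // A string that already contains a character reference has its '&' escaped too.
        System.out.println(writeValue("&#5;"));
    }
}
```

Because the writer escapes `&`, a value that was turned into `&#5;` before being passed to `writeCharacters` is likely to go over the wire as `&amp;#5;`. That is one plausible explanation (an assumption, not verified against the service) for the test further down reading back the literal text `&#5;`.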

if (value == null) {
return null;
}
else {
char[] charArray = value.toCharArray();
StringBuffer stringBuffer = new StringBuffer();
for (int index = 0; index < charArray.length; index++) {
if (charArray[index] < 0x20 || charArray[index] > 0x7f)

Why not leave \t, \r, and \n un-encoded? They are allowed as is, and help maintain readability.

Owner Author

Encoding all of them keeps things consistent.
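For comparison, the alternative suggested in the comment above could look roughly like the following. This is a sketch only, with a hypothetical class name, and it emits decimal references as an assumption of the sketch rather than the form used in the PR.

```java
// Hypothetical variant of the encoder that leaves tab, LF and CR untouched.
final class WhitespacePreservingEncoderSketch {

    static String encode(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean xmlWhitespace = c == '\t' || c == '\n' || c == '\r';
            if ((c < 0x20 && !xmlWhitespace) || c > 0x7f) {
                // Decimal numeric character reference, e.g. U+0005 -> &#5;
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

One trade-off to keep in mind: XML parsers normalize a literal CR or CRLF to LF on input, so a literal `\r` does not round-trip unless it is written as a character reference.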

stringBuffer.append("&#").append(Integer.toHexString(charArray[index])).append(";");

I'd expect that the hex string be padded with leading zeros if the length is 1, or not a power of 2.

Owner Author

Per http://en.wikipedia.org/wiki/Numeric_character_reference, both forms are valid; see the examples in the article.
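As a reference point for the padding discussion, the sketch below shows the two numeric character reference forms side by side. The helper names are hypothetical. In the XML syntax, `&#` introduces decimal digits and `&#x` introduces hexadecimal digits, and leading zeros are permitted but not required. It follows that hex digits appended directly after `&#`, as on the diff line above, are read as decimal: `&#11;` denotes U+000B, not U+0011.

```java
// Hypothetical helpers showing the decimal and hexadecimal reference forms.
final class NumericCharacterReferenceForms {

    // Decimal form: "&#" followed by decimal digits.
    static String decimal(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal form: "&#x" followed by hex digits.
    static String hex(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        int codePoint = 0x11; // U+0011, the value used for "test2" in the test below
        System.out.println(decimal(codePoint)); // &#17;
        System.out.println(hex(codePoint));     // &#x11;
    }
}
```

Separately, XML 1.0 does not allow most C0 control characters such as U+0005 in a document, even as character references; XML 1.1 allows them only in reference form.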

else
stringBuffer.append(charArray[index]);
}
return stringBuffer.toString();
}
}
}
@@ -348,6 +348,44 @@ public void insertEntityWorks() throws Exception {
assertEquals(uuid.toString(), result.getEntity().getProperty("test7").getValue().toString());
}

@Test
public void insertEntityEscapeCharactersWorks() throws Exception {
// Arrange
Configuration config = createConfiguration();
TableContract service = TableService.create(config);
//Entity entity = new Entity().setPartitionKey("001").setRowKey("insertEntityEscapeCharactersWorks")
// .setProperty("test", EdmType.STRING, "t1").setProperty("test2", EdmType.STRING, "t2")

What is this commented code for?

Owner Author

Additional potential test cases; removed for now.

// .setProperty("test3", EdmType.STRING, "t3");
Entity entity = new Entity().setPartitionKey("001").setRowKey("insertEntityEscapeCharactersWorks")
.setProperty("test", EdmType.STRING, "\u0005").setProperty("test2", EdmType.STRING, "\u0011")

These are the "easy" characters. You should also include troublesome ones. For example, "\uB2E4\uB974\uB2E4\uB294\u0625 \u064A\u062F\u064A\u0648"

Owner Author

Added a new test case with larger code points to validate the correctness of the logic.
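For strings like the one suggested above, a sketch that iterates by Unicode code point rather than by `char` may be useful. The class name is hypothetical and the hexadecimal `&#x` form is an assumption of this sketch, not what the PR emits. The reviewer's sample string is entirely within the Basic Multilingual Plane, but a `char`-based loop would split any supplementary character into two surrogate halves, and surrogates are not legal XML characters on their own.

```java
// Hypothetical encoder that works on code points, so surrogate pairs stay together.
final class CodePointEncoderSketch {

    static String encode(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            int codePoint = value.codePointAt(i);
            if (codePoint < 0x20 || codePoint > 0x7f) {
                // Hexadecimal numeric character reference, e.g. U+B2E4 -> &#xb2e4;
                sb.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            } else {
                sb.appendCodePoint(codePoint);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }
}
```

With this approach a supplementary character such as U+1F600 becomes a single reference rather than two.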

.setProperty("test3", EdmType.STRING, "\u0025");

// Act
InsertEntityResult result = service.insertEntity(TEST_TABLE_2, entity);

// Assert
assertNotNull(result);
assertNotNull(result.getEntity());

assertEquals("001", result.getEntity().getPartitionKey());
assertEquals("insertEntityEscapeCharactersWorks", result.getEntity().getRowKey());
assertNotNull(result.getEntity().getTimestamp());
assertNotNull(result.getEntity().getEtag());

assertNotNull(result.getEntity().getProperty("test"));
String actualTest1 = (String) result.getEntity().getProperty("test").getValue();
assertEquals("&#5;", actualTest1);

What's the relation between \u0005 and &#5?

This is not correct. actualTest1 should be equal to "\u0005". The XML should be un-encoded when read back from the server. If it is not, then you have a bug and the test should be failing.

Owner Author

\u0005 is Java's way of encoding a special character, which cannot be accepted by the web service. In the web service world, Numeric Character References (http://en.wikipedia.org/wiki/Numeric_character_reference) are widely used instead. Unfortunately, decoding numeric character references is not provided by the standard Java library. We can consider doing that for the customer in the convenience layer.
A new issue has been filed: Azure#142
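A decoding step of the kind mentioned above could be sketched as follows. This is an illustration only: the class name is hypothetical, it is not the convenience layer's API, and it handles only numeric references (not named entities such as `&amp;`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for decimal (&#17;) and hexadecimal (&#x11;) references.
final class NumericCharacterReferenceDecoderSketch {

    // Digit counts are bounded so that Integer.parseInt cannot overflow.
    private static final Pattern NCR = Pattern.compile("&#(x[0-9a-fA-F]{1,6}|[0-9]{1,7});");

    static String decode(String value) {
        if (value == null) {
            return null;
        }
        Matcher matcher = NCR.matcher(value);
        StringBuilder out = new StringBuilder(value.length());
        int last = 0;
        while (matcher.find()) {
            out.append(value, last, matcher.start());
            String body = matcher.group(1);
            int codePoint = body.charAt(0) == 'x'
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Leave references to non-existent code points as they are.
                out.append(matcher.group());
            }
            last = matcher.end();
        }
        out.append(value, last, value.length());
        return out.toString();
    }
}
```

Under this sketch, `decode("&#5;")` yields the single character U+0005, which is the value the reviewer expects the test to read back.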


assertNotNull(result.getEntity().getProperty("test2"));
String actualTest2 = (String) result.getEntity().getProperty("test2").getValue();
assertEquals("&#11;", actualTest2);

assertNotNull(result.getEntity().getProperty("test3"));
String actualTest3 = (String) result.getEntity().getProperty("test3").getValue();
assertEquals("%", actualTest3);

}

@Test
public void updateEntityWorks() throws Exception {
System.out.println("updateEntityWorks()");