Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel Escape Sequences #61

Open
rossgibb opened this issue Jan 26, 2017 · 2 comments
Open

Excel Escape Sequences #61

rossgibb opened this issue Jan 26, 2017 · 2 comments

Comments

@rossgibb
Copy link
Contributor

For posterity:

There is a known issue with Excel escape sequences of the form _x[0-9a-fA-F]{4}_

https://bz.apache.org/bugzilla/show_bug.cgi?id=57008

Demonstrated with this:

user> (use 'dk.ative.docjure.spreadsheet)

user> (let [wb (create-workbook "Test"
                                [["foo_x1070_bar"]])]
        (save-workbook! "/tmp/spreadsheet.xlsx" wb))

The _x1070_ gets replaced by something unprintable:

image

To prevent this the escape sequence _x005F must be prepended:

user> (let [wb (create-workbook "Test"
                                [["foo_x005F_x1070_bar"]])]
        (save-workbook! "/tmp/spreadsheet.xlsx" wb))

image

The function does the escaping, should be passed a sequence of sequences.

user> (defn spreadsheet-escape [data]
        (for [row data]
          (map #(if (string? %)
                  (clojure.string/replace
                   %
                   #"(?i)(_x[0-9a-f]{4}_)"
                   "_x005F$1")
                  %)
               row)))
#'user/spreadsheet-escape
user> (spreadsheet-escape [["foo_x1070_bar"]])
(("foo_x005F_x1070_bar"))
@mjul
Copy link
Owner

mjul commented Feb 19, 2017 via email

@rossgibb
Copy link
Contributor Author

This issue slipped my mind. I have been using the escape function with no trouble for a number of months, and I wanted to contribute it back.

I did some more digging.

Spreadsheets that are in the xls format do not seem to be affected:

dk.ative.docjure.spreadsheet-test> (let [sheet-name "Sheet 1"
	                                 sheet-data [["A1"]]
	                                 workbook (create-xls-workbook sheet-name sheet-data)
                                         a1 (-> workbook (.getSheetAt 0) (.getRow 0) (.getCell 0))]

                                     (set-cell! a1 "foo_x220F_bar")
                                     (.getStringCellValue a1))

"foo_x220F_bar"

The sequence is also case sensitive, this is demonstrates the replacement:

dk.ative.docjure.spreadsheet-test> (let [sheet-name "Sheet 1"
	                                 sheet-data [["A1"]]
	                                 workbook (create-workbook sheet-name sheet-data)
                                         a1 (-> workbook (.getSheetAt 0) (.getRow 0) (.getCell 0))]
                                     (set-cell! a1 "foo_x220F_bar")
                                     (.getStringCellValue a1))
"foo∏bar"

The 16-bit hex sequence must use upper case letters or replacement doesn't happen. For example, if using _x220f:

dk.ative.docjure.spreadsheet-test> (let [sheet-name "Sheet 1"
	                                 sheet-data [["A1"]]
	                                 workbook (create-workbook sheet-name sheet-data)
                                         a1 (-> workbook (.getSheetAt 0) (.getRow 0) (.getCell 0))]
                                     (set-cell! a1 "foo_x220f_bar")
                                     (.getStringCellValue a1))
"foo_x220f_bar"

The leading _x must be lowercase as well:

dk.ative.docjure.spreadsheet-test> (let [sheet-name "Sheet 1"
	                                 sheet-data [["A1"]]
	                                 workbook (create-workbook sheet-name sheet-data)
                                         a1 (-> workbook (.getSheetAt 0) (.getRow 0) (.getCell 0))]
                                     (set-cell! a1 "foo_X220F_bar")
                                     (.getStringCellValue a1))
"foo_X220F_bar"

I created this PR: #65

It adds the escape-cell function and some unit tests. I didn't see the need to alter any existing behaviour of the docjure code. A user running into this issue should be able to find escape-cell and the docstring there explains the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants