UTF-8 in/out #118
…acters are possible involved
Codecov Report
@@ Coverage Diff @@
## master #118 +/- ##
==========================================
+ Coverage 59.68% 62.33% +2.64%
==========================================
Files 30 30
Lines 8729 8740 +11
==========================================
+ Hits 5210 5448 +238
+ Misses 3519 3292 -227
I think the function would also fit into the helper.R file
I am currently planning to replace all the `readLines` calls with an equivalent in C++. That should boost performance, and the UTF-8 encoding can be solved at the same time.
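As a rough illustration of the idea above, here is a minimal sketch of a C++ line reader that returns the file's bytes untouched, on the assumption that the XML is stored as UTF-8. The name `read_lines_utf8` is hypothetical and not an existing openxlsx function; on the R side the returned strings would still have to be marked as UTF-8.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of openxlsx): read a file line by line as raw
// bytes, assuming the file content is UTF-8 encoded XML.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> read_lines_utf8(const std::string& path) {
  std::vector<std::string> lines;
  // Binary mode avoids newline translation; bytes are returned as stored.
  std::ifstream in(path, std::ios::binary);
  std::string line;
  bool first = true;
  while (std::getline(in, line)) {
    // Drop a leading UTF-8 byte order mark (EF BB BF) if present.
    if (first && line.compare(0, 3, "\xEF\xBB\xBF") == 0) line.erase(0, 3);
    first = false;
    // Drop the '\r' left over from a Windows line ending.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    lines.push_back(line);
  }
  return lines;
}
```

Because nothing is decoded or re-encoded, the result does not depend on the native encoding of the machine that reads the file.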
Thank you for your work. This will improve the package a lot.
# development openxlsx 4.2.4

## Fixes

* `write.xlsx()` now successfully passes `withFilter` ([#151](ycphs/openxlsx#151))
* code clean up PR [#168](ycphs/openxlsx#168)
* removal of unused variables PR [#168](ycphs/openxlsx#168)

## New features

* adds `buildWorkbook()` to generate a `Workbook` object from a (named) list or a data.frame ([#192](ycphs/openxlsx#192), [#187](ycphs/openxlsx#187))
  * this is now recommended rather than the `write.xlsx(x, file) ; wb <- read.xlsx(file)` functionality before
* `write.xlsx()` is now a wrapper for `wb <- buildWorkbook(x); saveWorkbook(wb, file)`
* parameter checking from `write.xlsx()` >> `buildWorkbook()` is now held off until passed to `writeData()`, `writeDataTable()`, etc
* `row.names` is now deprecated for `writeData()` and `writeDataTable()`; please use `rowNames` instead
* `read.xlsx()` now checks for the file extension `.xlsx`; previously it would throw an error when the file was `.xls` or `.xlm`
* memory allocation improvements
* global options added for `minWidth` and `maxWidth`
* `write.xlsx()` >> `buildWorkbook()` can now handle `colWidths` passed as either a single element or a `list()`
* Added ability to change positioning of summary columns and rows.
  * These can be set with the `summaryCol` and `summaryRow` arguments in `pageSetup()`.
* `activeSheet` allows setting and getting the active (displayed) sheet of a workbook.
* Adds new global options for workbook formatting ([#165](ycphs/openxlsx#165); see `?op.openxlsx`)

# openxlsx 4.2.3

## New Features

* Most functions in openxlsx now support non-ASCII arguments better. More specifically, we can use non-ASCII strings as names or contents for `createNamedRegion()` ([#103](ycphs/openxlsx#103)), `writeComment()`, `writeData()`, `writeDataTable()` and `writeFormula()`. In addition, openxlsx now reads comments and region names that contain non-ASCII strings correctly on Windows. Thanks to @shrektan for the PR [#118](ycphs/openxlsx#118).
* `setColWidths()` now supports zero-length `cols`, which is convenient when `cols` is dynamically provided [#128](ycphs/openxlsx#128). Thanks to @shrektan for the feature request and the PR.

## Fixes for Check issues

* Fix to pass the tests for link-time optimization type mismatches
* Fix to pass the checks of native code (C/C++) based on static code analysis

## Bug Fixes

* Grouping columns after setting widths no longer throws an error ([#100](ycphs/openxlsx#100))
* Fix inability to save workbook more than once ([#106](ycphs/openxlsx#106))
* Fix `loadWorkbook()` sometimes importing incorrect column attributes

# openxlsx 4.2.2

## New Features

* Added features for `conditionalFormatting` to also support 'contains not', 'begins with' and 'ends with'
* Added a return value for `saveWorkbook()`; the default value for `returnValue` is `FALSE` ([#71](ycphs/openxlsx#71))
* Added Tests for new parameter of `saveWorkbook()`

## Bug Fixes

* Solved CRAN check errors based on the change discussed in [PR#17277](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17277)

# openxlsx 4.2.0

## New Features

* Added `groupColumns()`, `groupRows()`, `ungroupColumns()`, and `ungroupRows()` to group/ungroup columns/rows ([#32](ycphs/openxlsx#32))

## Bug Fixes

* Allow xml-sensitive characters in sheetnames ([#78](ycphs/openxlsx#78))

## Internal

* Updated roxygen2 to 7.1.1

# openxlsx 4.1.5.1

## Bug Fixes

* fixed issue [#68](ycphs/openxlsx#68)

# openxlsx 4.1.5

## New Features

* Add functions to get and set the creator of the xlsx file
* add function to set the name of the user who last modified the xlsx file

## Bug Fixes

* Fixed NEWS hyperlink
* Fixed writing of mixed EST/EDT datetimes
* Added description for `writeFormula()` to use only english function names
* Fixed validateSheet for special characters

## Internal

* applied the tidyverse-style to the package `styler::style_pkg()`
* include tests for `cloneWorksheet`

# openxlsx 4.1.4

## New Features

* Added `getCellRefs()` as function. [#7](ycphs/openxlsx#7)
* Added parameter for customizing na.strings

## Bug Fixes

* Use `zip::zipr()` instead of `zip::zip()`.
* Keep correct visibility option for loadWorkbook. [#12](ycphs/openxlsx#12)
* Add space surrounding "wrapText" [#17](ycphs/openxlsx#17)
* Corrected Percentage, Accounting, Comma, Currency class on column level
* update to roxygen2 7.0.0

# openxlsx 4.1.3

## New Features

* Added a `NEWS.md` file to track changes to the package.
* Added `pkgdown` to create site.

## Bug Fixes

* Return values for cpp changed to R_NilValue for r-devel tests
* Added empty lines at the end of files

# openxlsx 4.1.2

* Changed maintainer

# openxlsx 4.1.1

## New Features

* `sep.names` allows choosing a separator other than '.' for variable names that contain a blank
* Improve handling of non-region names in `getNamedRegions` and add related test
Fixes #103
Hi, first of all, thanks for writing this fantastic package.
As a Chinese user, I often need to read and write Excel sheets that contain Chinese characters on Windows, where encoding issues are a huge headache. According to my tests, there are at least two known issues:
These two issues are difficult to avoid because, by default, Excel creates names in the locale's language for comments (the author name) and for Excel tables.
They cause even more trouble when I use such an Excel file as a template. Resaving a workbook object modified by openxlsx produces a malformed Excel file: when I open a file written by openxlsx, Excel warns that some of its contents are invalid.
The root cause is that R can't use UTF-8 as the native encoding on Windows at the time of writing (luckily, this may become available in the near future). So, the only reliable cure for this headache for now is to use UTF-8 encoded strings wherever we can.
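To make the failure mode concrete, here is a small sketch (an illustration, not code from this PR) of a UTF-8 validity check. The XML inside an .xlsx file is normally declared as UTF-8, so bytes written in a native encoding such as GBK can form invalid UTF-8 sequences. The function name `is_valid_utf8` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is a well-formed UTF-8 byte sequence.
bool is_valid_utf8(const std::string& s) {
  // Smallest code point that legitimately needs a sequence of the given length.
  static const unsigned int min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned int cp = 0;
    if (c < 0x80) { ++i; continue; }                          // ASCII
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }  // 110xxxxx
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }  // 1110xxxx
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }  // 11110xxx
    else return false;                                        // stray byte
    if (i + len > n) return false;                            // truncated
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;                  // not 10xxxxxx
      cp = (cp << 6) | (cc & 0x3F);
    }
    // Reject overlong forms, UTF-16 surrogates, and out-of-range values.
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    i += len;
  }
  return true;
}
```

For example, the character 中 is the bytes `E4 B8 AD` in UTF-8, which pass this check, but `D6 D0` in GBK, which fail it because `D0` is not a continuation byte.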
This PR tries to do two things:
`readUTF8()` to ensure that we read the XML as UTF-8 encoded.

Although it looks as if many lines have changed, I believe the PR is simple if you read the commits in sequence.
Thanks!
TODO
Think & Future
We should check all the C++ functions that are used in R code and change the argument type from `std::string` to `SEXP`. Only then can we convert the strings to UTF-8 later on the C++ side, and thus guarantee our UTF-8 in/out policy everywhere. The reason is that the encoding information is available in a `SEXP` but not in a `std::string`. Another possible solution is to see whether Rcpp has a configuration that makes the automatic `SEXP`-to-`std::string` conversion end up with a UTF-8 encoded string.

Anyway, I think we'd better not complicate this PR further and leave this task to future PRs.
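The point about encoding information can be sketched in plain C++ without R. The `MarkedString` type below is a hypothetical stand-in for a string that carries an encoding mark, as an R character element does; a bare `std::string` has only the bytes. Latin-1 is used as the non-UTF-8 encoding purely because its conversion fits in a few lines; in real package code the conversion would more likely go through R's own C API (for instance `Rf_translateCharUTF8()`).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the encoding mark attached to an R string.
enum class Encoding { Latin1, Utf8 };

// Bytes plus their encoding: the information a bare std::string lacks.
struct MarkedString {
  std::string bytes;
  Encoding enc;
};

// Convert Latin-1 bytes to UTF-8: each byte >= 0x80 becomes two bytes.
std::string latin1_to_utf8(const std::string& in) {
  std::string out;
  out.reserve(in.size() * 2);
  for (unsigned char c : in) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}

// The conversion is only possible because the mark says what the bytes mean.
std::string to_utf8(const MarkedString& s) {
  return s.enc == Encoding::Utf8 ? s.bytes : latin1_to_utf8(s.bytes);
}
```

Given only the `bytes` member, a function cannot tell whether `E9` is the Latin-1 letter é or part of some other encoding, which is why the conversion has to happen before the mark is dropped.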