Control character warning for weird column name in tibble in list #574

oloverm · 2019-02-07T17:27:35Z

I've got a tibble where one of the column names is ComparedtoPHECentres(2015)valueorpercentiles (not my choice). If I print it within a list and my console isn't wide enough to fit that column, I get a warning from fansi::strwrap_ctl(). It also prints the column name as �ÿþComparedtoPHECentres(2015)valueorpercentiles�ÿþ. I don't know whether the length of the column name or the parentheses trigger this, or why it only happens when the tibble is an element of a list.

library(pacman)

p_load(dplyr, fingertipsR)

df <- fingertips_data(91523, ParentAreaTypeID = 104) %>% 
    as_tibble()

list(df)
# [[1]]
# # A tibble: 972 x 26
#    IndicatorID IndicatorName ParentCode ParentName AreaCode AreaName
#          <int> <chr>         <chr>      <chr>      <chr>    <chr>   
#  1       91523 All new STI ~ NA         NA         E920000~ England 
#  2       91523 All new STI ~ E92000001  England    E450000~ London ~
#  3       91523 All new STI ~ E92000001  England    E450000~ West Mi~
#  4       91523 All new STI ~ E92000001  England    E450000~ North E~
#  5       91523 All new STI ~ E92000001  England    E450000~ Yorkshi~
#  6       91523 All new STI ~ E92000001  England    E450000~ East Mi~
#  7       91523 All new STI ~ E92000001  England    E450000~ East of~
#  8       91523 All new STI ~ E92000001  England    E450000~ North W~
#  9       91523 All new STI ~ E92000001  England    E450000~ South E~
# 10       91523 All new STI ~ E92000001  England    E450000~ South W~
# # ... with 962 more rows, and 20 more variables: AreaType <chr>,
# #   Sex <chr>, Age <chr>, CategoryType <chr>, Category <chr>,
# #   Timeperiod <chr>, Value <dbl>, LowerCI95.0limit <dbl>,
# #   UpperCI95.0limit <dbl>, LowerCI99.8limit <dbl>,
# #   UpperCI99.8limit <dbl>, Count <dbl>, Denominator <dbl>,
# #   Valuenote <chr>, RecentTrend <chr>,
# #   ComparedtoEnglandvalueorpercentiles <chr>,
# #   `�ÿþComparedtoPHECentres(2015)valueorpercentiles�ÿþ` <chr>,
# #   TimeperiodSortable <int>, Newdata <chr>, Comparedtogoal <chr>
# 
# Warning message:
# In fansi::strwrap_ctl(x, width = max(width, 0), indent = indent,  :
#   Encountered a C0 control character, see `?unhandled_ctl`; you can use `warn=FALSE` to turn off these warnings.
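To check whether a column name like this carries hidden bytes, one option is to scan the names for ASCII control bytes and for strings that are not valid UTF-8. Below is a minimal base-R sketch. suspect_names() is a hypothetical helper, not part of tibble, fansi, or any other package, and it only looks for C0 control bytes (below 0x20) and DEL (0x7F).

```r
# Hypothetical helper (not from tibble/fansi): return the names of `x`
# that contain C0 control bytes (< 0x20) or DEL (0x7F), or that are not
# valid UTF-8. Base R only; works on data frames, tibbles and named lists.
suspect_names <- function(x) {
  nms <- names(x)
  has_c0 <- vapply(nms, function(s) {
    bytes <- as.integer(charToRaw(s))
    any(bytes < 0x20L | bytes == 0x7FL)
  }, logical(1), USE.NAMES = FALSE)
  nms[has_c0 | !validUTF8(nms)]
}
```

For example, suspect_names(df) on the data above would list any offending column names, and charToRaw() on a single name shows its exact bytes.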


oloverm · 2019-02-11T17:06:22Z

Also, if there are multiple tibbles in the list with the same column names, the warning only shows up once, and that column name is printed normally for all but the first one.

hadley · 2019-03-21T20:59:48Z

I think this is the same problem as tidyverse/dbplyr#223, and it is probably a bug in base R.

hlynurhallgrims · 2019-04-30T15:43:53Z

I think this may be connected to an issue I came across on Stack Overflow, which I could only recreate using readxl.

When I try to recreate the SO issue by building the tibble with tribble(), I don't get the SO error (I only get that with tibbles read in through readxl). Instead, I get the same warning as @cucumberry here, but only if the console is too narrow to print all the columns.

my_tibble <- tibble::tribble(~good_column, ~'very bad\ncolumn', ~'terribly\nlong column name here', ~'more', ~'and then even', ~'more than that',
                             1, 2, 3, 4, 5, 6,
                             7, 8, 9, 10, 11, 12)
my_tibble
#> # A tibble: 2 x 6
#>   good_column `very bad\ncolu~ `terribly\nlong~  more `and then even`
#>         <dbl>            <dbl>            <dbl> <dbl>           <dbl>
#> 1           1                2                3     4               5
#> 2           7                8                9    10              11
#> # ... with 1 more variable: `more than that` <dbl>

list(my_tibble, my_tibble)
#> Warning message:
#> In fansi::strwrap_ctl(x, width = max(width, 0), indent = indent,  :
#> Encountered a C0 control character, see `?unhandled_ctl`; you can use `warn=FALSE` to turn off these warnings.

I should add that the above is not a reprex, because rendering it with reprex didn't reproduce the warning, no matter how wide or narrow the console and viewer panes were.

Much like in the screenshot above from @cucumberry, the ÿþ mark also shows up in one of the column names the first time the tibble is printed, but not the second (see picture).

But on to the SO example I linked to above.

The same bad characters result in a different error when the tibble is read in from Excel through readxl::read_excel(). Here's the link to the Excel file in question, if anyone is interested.

This is the error I get:

library(magrittr)  # provides %>%

all_sheets <- readxl::excel_sheets(path = here::here("data", "Posti-Letto-Istat.xls"))

all_sheets %>% 
  purrr::map(.x = .,
             .f = ~readxl::read_excel(path  = here::here("data", "Posti-Letto-Istat.xls"),
                                      sheet = .x,
                                      skip  = 4))
#>[[1]]
#>Error in nchar(x[is_na], type = "width") : 
#>  invalid multibyte string, element 1

Of course, anything that gets rid of the bad characters before printing the list of tibbles fixes this:

all_sheets <- readxl::excel_sheets(path = here::here("data", "Posti-Letto-Istat.xls"))

all_sheets %>% 
  purrr::map(.x = .,
             .f = ~readxl::read_excel(path  = here::here("data", "Posti-Letto-Istat.xls"),
                                      sheet = .x,
                                      skip  = 4)) %>% 
  purrr::map(janitor::clean_names)
# This prints just fine, obviously
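If you'd rather not pull in janitor, or want to keep the names otherwise unchanged, a base-R alternative might look like the sketch below. sanitize_names() is a hypothetical helper; it assumes the names are meant to be UTF-8, drops any bytes that are not valid UTF-8 via iconv(..., sub = ""), and then removes control characters with gsub(). Unlike janitor::clean_names(), it does not convert the names to snake_case.

```r
# Hypothetical helper: clean the names of a data frame / tibble (or any
# named list) before printing. Assumes the names are meant to be UTF-8.
sanitize_names <- function(x) {
  nms <- names(x)
  # With sub = "", iconv() drops input bytes that are not valid UTF-8.
  nms <- iconv(nms, from = "UTF-8", to = "UTF-8", sub = "")
  # Remove remaining control characters (e.g. C0 controls such as \x01).
  nms <- gsub("[[:cntrl:]]", "", nms)
  names(x) <- nms
  x
}
```

It could replace the last step of the pipeline above, e.g. purrr::map(sanitize_names).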

Maybe I'm mistaken and it's not connected, but I figured I'd mention it if there's a chance that it is.

krlmlr · 2020-03-21T06:33:47Z

@brodieG: What's the best way to deal with unsanitized user input (in the form of borked column names) for display? I'm happy to print a demangled version and mention in the output that some of the names were originally mangled. Can I safely call strip_sgr(warn = FALSE) and then check whether the names changed?

Also, when reviewing the wrapping, we need to look at why column names with spaces distort the output (at least in RStudio) when they appear in the footer of a tibble with too many columns.

brodieG · 2020-03-21T13:36:15Z

You probably want strip_ctl(warn=FALSE), since strip_sgr only strips the formatting (SGR) sequences; otherwise, it should be relatively safe to do as you suggest. This will not address the invalid multibyte sequences mentioned above.

Regarding warn=FALSE: keep in mind that the warning is there because fansi does not understand the semantics of the control sequences in the context in question. So strip_ctl won't even warn for C0 control characters, because it knows they are one byte long and can strip them without caring what they do to the cursor position, etc. It will, however, warn for malformed (or correctly formed but unsupported) CSI sequences, because it doesn't necessarily know where they end, and so it might strip things it should not, or fail to strip things it should.

So, in short, strip_ctl(., warn=FALSE) and comparing the output is probably fine, or even strip_ctl(., warn=FALSE, ctl=c('all', 'sgr', 'nl')) if you want to keep the known controls¹.
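The strip-and-compare approach discussed here could be sketched roughly as follows. This assumes the fansi package is installed; demangle_names() and its keep_known argument are hypothetical, not part of tibble or fansi. As noted above, it does not handle invalid multibyte sequences, so the names are assumed to be valid UTF-8.

```r
# Hypothetical helper sketching strip-and-compare for display.
# Requires the fansi package; assumes `nms` is valid UTF-8.
demangle_names <- function(nms, keep_known = FALSE) {
  # "all" strips every control type fansi knows; combined with other
  # values it means "all except these", so c("all", "sgr", "nl") keeps
  # SGR sequences and newlines.
  ctl <- if (keep_known) c("all", "sgr", "nl") else "all"
  clean <- fansi::strip_ctl(nms, ctl = ctl, warn = FALSE)
  changed <- clean != nms
  if (any(changed)) {
    message(
      "Some column names contained control characters and were cleaned: ",
      paste(clean[changed], collapse = ", ")
    )
  }
  clean
}
```

A caller could then print names(df) <- demangle_names(names(df)) and rely on the message to mention that the originals were mangled.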

Regarding spaces in the footer: is this something fansi-related? If so, could you give me an example? I skimmed the thread and couldn't quite tell what you were referring to.

¹ This will leave in unknown but syntactically valid SGR sequences, which may later cause other functions to emit warnings.

krlmlr · 2020-03-21T15:34:26Z

Thanks. I don't think that column names should contain any controls -- will proceed.

Related to names with spaces, here is an example where the tibble is too wide to fit on one line and "with ... more variables" is shown in the footer. The names are wrapped at their internal spaces, so a single name can be split across two footer rows, and the first name in each continuation row is printed badly. Not sure whose responsibility this is. (The SGR codes are stripped in the reprex; I can reproduce this in a terminal and in RStudio.)

library(tidyverse)

N <- 16
data <- tibble(letter = letters[1:N], i = 1:N)

cross <- crossing(data, j = 1:N)

row <-
  cross %>%
  filter(i >= j) %>%
  group_by(j) %>%
  summarize(name = paste(letter, collapse = " ")) %>%
  ungroup() %>%
  select(name, j) %>%
  deframe()

tbl <- tibble(!!!row)

options(crayon = TRUE)
fmt <- format(tbl)
fmt
#> [1] "# A tibble: 1 x 16"                                                                                                                                                                                                                                                                                             
#> [2] "  `a b c d e f g … `b c d e f g h … `c d e f g h i … `d e f g h i j …"                                                                                                                                                                                                                                          
#> [3] "             <int>            <int>            <int>            <int>"                                                                                                                                                                                                                                          
#> [4] "1                1                2                3                4"                                                                                                                                                                                                                                          
#> [5] "# … with 12 more variables: `e f g h i j k l m n o p` <int>, `f g h i j k l m n\n#   o p` <int>, `g h i j k l m n o p` <int>, `h i j k l m n o p` <int>, `i j k\n#   l m n o p` <int>, `j k l m n o p` <int>, `k l m n o p` <int>, `l m n o\n#   p` <int>, `m n o p` <int>, `n o p` <int>, `o p` <int>, p <int>"
cat(fmt, sep = "\n")
#> # A tibble: 1 x 16
#>   `a b c d e f g … `b c d e f g h … `c d e f g h i … `d e f g h i j …
#>              <int>            <int>            <int>            <int>
#> 1                1                2                3                4
#> # … with 12 more variables: `e f g h i j k l m n o p` <int>, `f g h i j k l m n
#> #   o p` <int>, `g h i j k l m n o p` <int>, `h i j k l m n o p` <int>, `i j k
#> #   l m n o p` <int>, `j k l m n o p` <int>, `k l m n o p` <int>, `l m n o
#> #   p` <int>, `m n o p` <int>, `n o p` <int>, `o p` <int>, p <int>

Created on 2020-03-21 by the reprex package (v0.3.0)

brodieG · 2020-03-21T21:38:37Z

Ah, I see. strwrap_ctl has no concept of words beyond whitespace-delimited tokens; it does not parse strings to detect quoted tokens or anything of the sort. The same is true of strwrap. One way to solve this might be to replace the column names with equal-length, space-less strings, wrap that, compute the lengths of the resulting lines, and take substrings of the original based on those lengths.
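A minimal base-R sketch of that idea, assuming each footer entry (a backticked name plus its type) is one non-empty token that must never be split. wrap_keep_tokens() is hypothetical; it counts characters rather than display width and uses plain strwrap() rather than fansi::strwrap_ctl(), so it does not account for SGR sequences.

```r
# Hypothetical helper: wrap tokens to `width` without ever breaking inside
# a token, even when the token itself contains spaces.
# Assumes all tokens are non-empty; counts characters, not display width.
wrap_keep_tokens <- function(tokens, width) {
  original <- paste(tokens, collapse = " ")
  # Stand-ins with the same number of characters as each token but no
  # spaces, so strwrap() can only break between tokens.
  stand_ins <- strrep("x", nchar(tokens))
  wrapped <- strwrap(paste(stand_ins, collapse = " "), width = width)
  # Each wrapped line is as long as the matching stretch of the original;
  # strwrap() drops exactly one space at each line break.
  lens <- nchar(wrapped)
  starts <- cumsum(c(1, lens[-length(lens)] + 1))
  substring(original, starts, starts + lens - 1)
}
```

For instance, wrap_keep_tokens(c("`e f g h i j k l m n o p` <int>", "`f g h i j k l m n o p` <int>", "p <int>"), width = 40) keeps each backticked name on a single line; a real fix would still need to add the "#   " prefix to continuation lines.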

krlmlr · 2020-03-28T07:23:38Z

I can't replicate the original problem in R 3.6.3.

github-actions · 2021-03-29T00:15:23Z

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

krlmlr added this to the 3.0.0 milestone Mar 21, 2020

krlmlr mentioned this issue in "Restyling wrapped strings" (brodieG/fansi#64, closed) Mar 28, 2020

krlmlr closed this as completed Mar 28, 2020

github-actions bot locked and limited conversation to collaborators Mar 29, 2021
