Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor summary() to separate compute and print #677

Merged
merged 6 commits into from
Jan 17, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ Imports:
dplyr (>= 0.8.0),
knitr (>= 1.2),
magrittr (>= 1.5),
pillar,
purrr,
repr,
rlang (>= 0.3.1),
Expand All @@ -145,7 +146,7 @@ VignetteBuilder:
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
RoxygenNote: 7.1.2
Collate:
'deprecated.R'
'dplyr.R'
Expand Down
7 changes: 6 additions & 1 deletion NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Generated by roxygen2: do not edit by hand

S3method(ctl_new_pillar,one_skim_df)
S3method(format,summary_skim_df)
S3method(get_skimmers,AsIs)
S3method(get_skimmers,Date)
S3method(get_skimmers,POSIXct)
Expand All @@ -18,7 +20,6 @@ S3method(knit_print,skim_df)
S3method(knit_print,skim_list)
S3method(knit_print,summary_skim_df)
S3method(mutate,skim_df)
S3method(print,one_skim_df)
S3method(print,skim_df)
S3method(print,skim_list)
S3method(print,summary_skim_df)
Expand All @@ -29,6 +30,7 @@ S3method(skim_by_type,data.frame)
S3method(skim_by_type,data.table)
S3method(skim_by_type,grouped_df)
S3method(summary,skim_df)
S3method(tbl_format_header,one_skim_df)
S3method(to_long,default)
S3method(to_long,skim_df)
S3method(vec_cast,skim_df.skim_df)
Expand Down Expand Up @@ -62,6 +64,7 @@ export(get_sfl)
export(get_skimmers)
export(group_names)
export(has_skim_type_attribute)
export(has_skimmers)
export(has_skimr_attributes)
export(has_type_column)
export(has_variable_column)
Expand Down Expand Up @@ -107,6 +110,8 @@ export(yank)
importFrom(dplyr,mutate)
importFrom(knitr,knit_print)
importFrom(magrittr,"%>%")
importFrom(pillar,ctl_new_pillar)
importFrom(pillar,tbl_format_header)
importFrom(repr,repr_text)
importFrom(rlang,"%||%")
importFrom(rlang,.data)
Expand Down
120 changes: 48 additions & 72 deletions R/skim_print.R
Original file line number Diff line number Diff line change
Expand Up @@ -19,23 +19,11 @@
#' using [dplyr::select()] or [dplyr::summarize()] on a `skim_df`. In those
#' cases, this method falls back to [tibble::print.tbl()].
#'
#' @section Controlling metadata behavior:
#'
#' On POSIX systems, `skimr` removes the tibble metadata when generating output.
#' On some platforms, this can lead to all output getting removed. To disable
#' that behavior, set either `strip_metadata = FALSE` when calling print or use
#' `options(skimr_strip_metadata = FALSE)`. The `crayon` package and the color
#' support within `tibble` is also a factor. If your `skimr` results tables are
#' empty you may need to run the following `options(crayon.enabled = FALSE)`.
#'
#' @inheritParams tibble:::print.tbl
#' @seealso [tibble::trunc_mat()] For a list of global options for customizing
#' print formatting. [crayon::has_color()] for the variety of issues that
#' affect tibble's color support.
#' @param include_summary Whether a summary of the data frame should be printed
#' @param strip_metadata Whether tibble metadata should be removed.
#' @param rule_width Width of the cli rules in printed skim object. Defaults
#' to base::options()$width
#' @param summary_rule_width Width of Data Summary cli rule, defaults to 40.
#' @name print
NULL
Expand All @@ -46,87 +34,78 @@ print.skim_df <- function(x,
include_summary = TRUE,
n = Inf,
width = Inf,
n_extra = NULL,
strip_metadata = getOption(
"skimr_strip_metadata", FALSE
),
rule_width = base::options()$width,
summary_rule_width = 40,
...) {
withr::local_options(list(crayon.enabled = FALSE))
if (is_skim_df(x) && nrow(x) > 0) {
if (include_summary) {
print(summary(x), .summary_rule_width = summary_rule_width, ...)
}
by_type <- partition(x)
purrr::map(
by_type, print, n, width, n_extra, strip_metadata,
.rule_width = rule_width, ...
by_type,
print,
width = width,
n = n,
...
)
invisible(NULL)
} else {
NextMethod("print")
}
}

#' @describeIn print Print an entry within a partitioned `skim_df`.
#' @param .rule_width Width for the rule above the skim results for each type.
#' @param .width Width for the tibble for each type.

# Methods for correctly formatting a a `one_skim_df`. We leverage the
# customiztion options in `pillar` for this. It divides the results into: a
# header, which we customize; a body, where we strip some values; and a footer,
# which we drop. For more details, see
# https://pillar.r-lib.org/articles/extending.html

#' @importFrom pillar tbl_format_header
#' @export
print.one_skim_df <- function(x,
n = Inf,
.width = Inf,
n_extra = NULL,
strip_metadata = getOption(
"skimr_strip_metadata", FALSE
),
.rule_width = base::options()$width,
...) {
tbl_format_header.one_skim_df <- function(x, setup, ...) {
variable_type <- paste("Variable type:", attr(x, "skim_type"))
top_line <- cli::rule(line = 1, left = variable_type, width = .rule_width)
out <- format(x, ..., n = n, width = .width, n_extra = n_extra)
if (is.null(strip_metadata)) {
strip_metadata <- TRUE
}
if (strip_metadata) {
metadata <- -1 * grab_tibble_metadata(out)
} else {
metadata <- seq_along(out)
}
render_skim_body(top_line, out, metadata)
}

grab_tibble_metadata <- function(x) {
if (crayon::has_color()) {
grep("^\\s*\\\033\\[38;5;\\d{3}m[#\\*]", x)
} else {
grep("^\\s*[#\\*]", x)
}
rule <- cli::rule(
line = 1,
left = variable_type,
width = getOption("width", 80)
)
# Add an empty line before the rule
c("", rule)
}

render_skim_body <- function(top_line, out, metadata_to_remove) {
cat(paste0("\n", top_line), out[metadata_to_remove], sep = "\n")
#' @importFrom pillar ctl_new_pillar
#' @export
ctl_new_pillar.one_skim_df <- function(controller,
x,
width,
...,
title = NULL) {
out <- NextMethod()
out$type <- NULL
out
}

#' @describeIn print Print a `skim_list`, a list of `skim_df` objects.
#' @export
print.skim_list <- function(x, n = Inf, width = Inf, n_extra = NULL,
.rule_width = base::options()$width, ...) {
print.skim_list <- function(x,
n = Inf,
width = Inf,
...) {
nms <- names(x)
attributes(x) <- NULL
print(rlang::set_names(x, nms), rule_width = .rule_width)
print(rlang::set_names(x, nms))
}


#' @describeIn print Print method for a `summary_skim_df` object.
#' @param .summary_rule_width the width for the main rule above the summary.
#' @export
print.summary_skim_df <- function(x, .summary_rule_width = 40, ...) {
cat(paste0(cli::rule(
line = 1, left = "Data Summary",
width = .summary_rule_width
), "\n"))
print.table(x)
with_title <- c(
cli::rule(line = 1, left = "Data Summary", width = .summary_rule_width),
format(x)
)
writeLines(with_title)
}

#' Provide a default printing method for knitr.
Expand Down Expand Up @@ -160,14 +139,7 @@ knit_print.skim_df <- function(x, options = NULL, ...) {
if (is_skim_df(x) && nrow(x) > 0) {
if (options$skimr_include_summary %||% TRUE) {
summary_stats <- summary(x)

kabled <- knitr::kable(
summary_stats,
table.attr = "style='width: auto;'
class='table table-condensed'",
col.names = c(" "),
caption = "Data summary"
)
kabled <- knit_print(summary_stats)
} else {
kabled <- c()
}
Expand Down Expand Up @@ -214,11 +186,15 @@ knit_print.one_skim_df <- function(x, options = NULL, ...) {
#' @describeIn knit_print Default `knitr` print for `skim_df` summaries.
#' @export
knit_print.summary_skim_df <- function(x, options = NULL, ...) {
summary_mat <- cbind(
get_summary_dnames(x),
get_summary_values(x),
deparse.level = 0
)
kabled <- knitr::kable(
x,
summary_mat,
table.attr = "style='width: auto;'
class='table table-condensed'",
col.names = c(" "),
caption = "Data summary"
)

Expand Down
4 changes: 0 additions & 4 deletions R/skimr-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,6 @@
#' @docType package
NULL

.onLoad <- function(libname, pkgname) {
options(skimr_strip_metadata = .Platform$OS.type != "windows")
}


# Imports -----------------------------------------------------------------

Expand Down
86 changes: 58 additions & 28 deletions R/summary.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,47 +13,77 @@ summary.skim_df <- function(object, ...) {
if (is.null(object)) {
stop("dataframe is null.")
}
data_name <- df_name(object)
data_name <- ifelse(data_name %in% c("`.`", ".data"), "Piped data", data_name)
data_name <- gsub("`", "", data_name)
data_name <- ifelse(nchar(data_name) > 25,
paste0(substring(data_name, 1, 25), "..."),
data_name
)

duplicated <- duplicated(object$skim_variable)
counts <- table(type = object$skim_type[!duplicated])
types <- dimnames(counts)[[1]]
types <- paste0(" ", types)

possible_names <- group_names(object)
possible_groups <- if (length(possible_names) > 0) {
paste(possible_names, collapse = ", ")
} else {
"None"
}

summary_object <- c(
data_name,
data_rows(object),
data_cols(object),
if (!is.na(dt_key(object))) dt_key(object),
" ",
" ",
unname(counts),
" ",
possible_groups
structure(
list(
data_name = process_data_name(object),
counts = counts,
types = types,
possible_groups = possible_groups,
dt_key = dt_key(object),
data_rows = data_rows(object),
data_cols = data_cols(object)
),
class = "summary_skim_df"
)
}

process_data_name <- function(object) {
raw_name <- df_name(object)
no_ticks <- gsub("`", "", raw_name)
if (no_ticks %in% c(".", ".data")) {
"Piped data"
} else if (nchar(no_ticks) > 25) {
paste0(substring(no_ticks, 1, 25), "...")
} else {
no_ticks
}
}

#' @export
format.summary_skim_df <- function(x, ...) {
dnames <- c("", get_summary_dnames(x))
summary_values <- c("Values", get_summary_values(x))
paste(
format(dnames),
format(summary_values)
)
}

summary_object <- array(summary_object, dim = c(length(summary_object), 1))
dnames <- c(
"Name", "Number of rows ", "Number of columns ",
if (!is.na(dt_key(object))) "Key", "_______________________ ",
"Column type frequency: ", types, "________________________ ",
get_summary_dnames <- function(summary_object) {
c(
"Name",
"Number of rows ",
"Number of columns ",
if (!is.na(summary_object$dt_key)) "Key",
"_______________________ ",
"Column type frequency: ",
paste0(" ", summary_object$types),
"________________________ ",
"Group variables"
)
}

summary_object <- as.table(summary_object)
dimnames(summary_object) <- list(dnames, c("Values"))
class(summary_object) <- c("summary_skim_df", "table")
summary_object
get_summary_values <- function(summary_object) {
c(
summary_object$data_name,
summary_object$data_rows,
summary_object$data_cols,
if (!is.na(summary_object$dt_key)) summary_object$dt_key,
" ",
" ",
unname(summary_object$counts),
" ",
summary_object$possible_groups
)
}
25 changes: 0 additions & 25 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -254,31 +254,6 @@ Displays in documents of different types will vary. For example, one user found
that the font "Yu Gothic UI Semilight" produced consistent results for
Microsoft Word and Libre Office Write.

### Stripping metadata and empty results tables

In POSIX systems, `skimr` tries to remove the tibble metadata when producing
the results. A complicating factor is tibble's color support, which depends
on environment settings. In particular, not all Windows terminals support
colors in the way that tibble expects.

So, by default, we disable removing metadata on windows. You can turn this
feature on with an option. Either set it when calling print or globally.

```{r, eval = FALSE}
skimmed <- skim(chickwts)
print(skimmed, strip_metadata = TRUE)
options(skimr_strip_metadata = TRUE)
```

Separately, you might need to check the option `crayon.enabled`. Similarly, if
your skimr results tables are empty you may need to run the following

```{r, eval = FALSE}
options(crayon.enabled = FALSE)
```

You need to do this one time per session.

## Inspirations

* [TextPlots](https://github.com/sunetos/TextPlots.jl) for use of Braille
Expand Down
Loading