When using fct_reorder() in the presence of missing values, you often do not get the expected result.
For instance, in the following code, the "blue" level gets an NA summary and is therefore sent to the last level of the result. In larger datasets, where missing values happen everywhere, this results in fct_reorder() doing frustratingly nothing.
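The original reprex did not survive on this page. As a minimal sketch of the behaviour described, the base R snippet below mimics the mechanism: a per-level summary followed by `order()`. The data (`colour`, `value`) are hypothetical, and the assumption that fct_reorder() orders levels by such a summary is taken from the description in this issue, not from the package source.

```r
# Hypothetical data: the "blue" group contains one missing value.
colour <- factor(c("red", "red", "blue", "blue", "green", "green"))
value  <- c(5, 6, 1, NA, 3, 4)

# Per-level summary with median()'s default na.rm = FALSE:
# the "blue" summary becomes NA.
summary_default <- tapply(value, colour, median)

# order() places NA last, so "blue" ends up as the last level.
levels_default <- levels(colour)[order(summary_default)]
print(levels_default)

# Passing na.rm = TRUE through to median() gives "blue" a real summary.
summary_na_rm <- tapply(value, colour, median, na.rm = TRUE)
levels_na_rm <- levels(colour)[order(summary_na_rm)]
print(levels_na_rm)

# With forcats, the equivalent opt-in is to pass na.rm through `...`:
# forcats::fct_reorder(colour, value, na.rm = TRUE)
```

Here the only difference between the two orderings is where "blue" lands, which is the effect reported above.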
This is especially unexpected as the default function, median(), has na.rm=FALSE by default. Using other common summary functions like min() and max() has the same problem.
There is a mention of this in the documentation (... Other arguments passed on to .fun. A common argument is na.rm = TRUE.), but I don't think this is explicit enough.
Could there be some kind of warning to suggest we add na.rm=TRUE? For instance if(any(is.na(summary))) warn("missing").
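To make the suggestion concrete, here is a rough sketch of what such a warning could look like. `reorder_with_warning()` is a hypothetical base R helper written for illustration, not forcats code, and it uses base `warning()` in place of the `warn()` call mentioned above.

```r
# Hypothetical helper, not part of forcats: reorder levels by a per-level
# summary and warn when any of the summaries is missing.
reorder_with_warning <- function(f, x, fun = median, ...) {
  stopifnot(is.factor(f), length(f) == length(x))
  level_summary <- tapply(x, f, fun, ...)
  if (any(is.na(level_summary))) {
    warning(
      "Summary is NA for level(s): ",
      paste(names(level_summary)[is.na(level_summary)], collapse = ", "),
      ". Did you mean to pass na.rm = TRUE?",
      call. = FALSE
    )
  }
  # Levels with an NA summary are sorted last by order().
  factor(f, levels = levels(f)[order(level_summary)])
}

# Hypothetical data: the "blue" group contains one missing value.
colour <- factor(c("red", "red", "blue", "blue", "green", "green"))
value  <- c(5, 6, 1, NA, 3, 4)

levels(reorder_with_warning(colour, value))                # warns, "blue" is last
levels(reorder_with_warning(colour, value, na.rm = TRUE))  # no warning, "blue" is first
```

Naming the affected levels in the message is a design choice of this sketch; a shorter message would serve the same purpose.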
Otherwise, maybe this should be mentioned up in the description, for instance something like "Any missing value returned by the summary function for a level will cause this level to be sent to the end." (ok that's not well written but you get the point)
You might even want the user to explicitly opt in to na.rm=FALSE, and by default inject na.rm=TRUE into the summary function if na.rm is in formals(.fun). This is a bit invasive, I'll give you that, but I cannot see any real use case where na.rm=FALSE could be wanted.
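A possible shape for that injection, as a sketch only: `has_na_rm()` and `summarise_levels()` are hypothetical helpers, and `na_rm` is an illustrative argument name, not an existing forcats argument. One caveat: `formals()` returns NULL for primitives such as min() and max(), so the check below goes through `args()` first.

```r
# Hypothetical helper: does `fun` accept an na.rm argument?
# args() also covers primitives like min() and max(), for which
# formals() on its own returns NULL.
has_na_rm <- function(fun) {
  "na.rm" %in% names(formals(args(fun)))
}

# Hypothetical helper: per-level summary that injects na.rm = TRUE unless
# the caller opted out (na_rm = FALSE) or already supplied na.rm in `...`.
summarise_levels <- function(f, x, fun = median, ..., na_rm = TRUE) {
  dots <- list(...)
  if (na_rm && has_na_rm(fun) && !"na.rm" %in% names(dots)) {
    dots$na.rm <- TRUE
  }
  do.call(tapply, c(list(x, f, fun), dots))
}

# Hypothetical data: the "blue" group contains one missing value.
colour <- factor(c("red", "red", "blue", "blue", "green", "green"))
value  <- c(5, 6, 1, NA, 3, 4)

summarise_levels(colour, value)                 # na.rm = TRUE injected
summarise_levels(colour, value, na_rm = FALSE)  # explicit opt-out, "blue" is NA
summarise_levels(colour, value, fun = max)      # also works for primitives
```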
hadley changed the title to "fct_reorder() yields unexpected result in presence of missing values" on Jan 3, 2023
Created on 2022-08-10 by the reprex package (v2.0.1)