
check_outlier improvement (easystats/datawizard#177) #443

Merged
merged 21 commits into from
Aug 25, 2022

Conversation

rempsyc
Copy link
Member

@rempsyc rempsyc commented Jun 30, 2022

(Closes: #466, closes #469)

Context

This is a pull request aiming to improve the printing method of check_outliers, based on easystats/datawizard#177.

Specifically, it aims to accomplish the following in the print output: (a) state the methods used; (b) state the thresholds used; and (c) state the variables tested. Additionally, it aims to (d) report outliers per variable (for univariate methods), (e) report whether any observation comes up as an outlier for several variables (when that is the case), and (f) include an optional ID variable alongside the row information. The changes were inspired by rempsyc::find_mad.

This is a prototype/proof of concept. (a) to (c) were implemented for all methods, but (d) to (f) were only implemented for method "zscore" for now. Before working on this further, I would like to get feedback to know whether it is worth implementing for other methods, and if modifications are needed before proceeding (as I would need to adapt the code to each method individually).

Reprex

Reprex demo of the changes below:

# Setup data
data <- datawizard::rownames_as_column(mtcars, var = "car")

# Basic test
performance::check_outliers(data, method = c("mahalanobis", "mcd", "zscore"))
#> 4 outliers detected: cases 9, 19, 30, 31.
#> - Based on the following methods: mahalanobis, mcd, zscore.
#> - Using the following thresholds: 21.92, 21.92, 1.96.
#> - For variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> Note: Outliers were classified as such by at least half of the selected methods. 
#> 
#> ------------------------------------------------------------------------
#> The following observations were considered outliers for more than one variable by the univariate methods: 
#> 
#>   Row n_Zscore
#> 9  31        2
#> 
#> ------------------------------------------------------------------------
#> Outliers per variable (univariate methods): 
#> 
#> $mpg
#>   Row Distance_Zscore
#> 1  18        2.042389
#> 2  20        2.291272
#> 
#> $hp
#>   Row Distance_Zscore
#> 1  31        2.746567
#> 
#> $drat
#>   Row Distance_Zscore
#> 1  19        2.493904
#> 
#> $wt
#>   Row Distance_Zscore
#> 1  15        2.077505
#> 2  16        2.255336
#> 3  17        2.174596
#> 
#> $qsec
#>   Row Distance_Zscore
#> 1   9        2.826755
#> 
#> $carb
#>   Row Distance_Zscore
#> 1  30        1.973440
#> 2  31        3.211677

# Add ID information
outliers_list <- performance::check_outliers(
  data, method = c("mahalanobis", "mcd", "zscore"), ID = "car")
outliers_list
#> 4 outliers detected: cases 9, 19, 30, 31.
#> - Based on the following methods: mahalanobis, mcd, zscore.
#> - Using the following thresholds: 21.92, 21.92, 1.96.
#> - For variables: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#> 
#> Note: Outliers were classified as such by at least half of the selected methods. 
#> 
#> ------------------------------------------------------------------------
#> The following observations were considered outliers for more than one variable by the univariate methods: 
#> 
#>   Row           car n_Zscore
#> 9  31 Maserati Bora        2
#> 
#> ------------------------------------------------------------------------
#> Outliers per variable (univariate methods): 
#> 
#> $mpg
#>   Row            car Distance_Zscore
#> 1  18       Fiat 128        2.042389
#> 2  20 Toyota Corolla        2.291272
#> 
#> $hp
#>   Row           car Distance_Zscore
#> 1  31 Maserati Bora        2.746567
#> 
#> $drat
#>   Row         car Distance_Zscore
#> 1  19 Honda Civic        2.493904
#> 
#> $wt
#>   Row                 car Distance_Zscore
#> 1  15  Cadillac Fleetwood        2.077505
#> 2  16 Lincoln Continental        2.255336
#> 3  17   Chrysler Imperial        2.174596
#> 
#> $qsec
#>   Row      car Distance_Zscore
#> 1   9 Merc 230        2.826755
#> 
#> $carb
#>   Row           car Distance_Zscore
#> 1  30  Ferrari Dino        1.973440
#> 2  31 Maserati Bora        3.211677

# Since only the printing method is modified, old features still work:

# The object is a binary vector...
filtered_data <- data[!outliers_list, ] # And can be used to filter a dataframe
nrow(filtered_data) # New size, 28 (4 outliers removed)
#> [1] 28

# Using `as.data.frame()`, we can access more details!
outliers_info <- as.data.frame(outliers_list)
head(outliers_info)
#>   Distance_Zscore Outlier_Zscore Distance_Mahalanobis Outlier_Mahalanobis
#> 1        1.189901              0             8.946673                   0
#> 2        1.189901              0             8.287933                   0
#> 3        1.224858              0             8.937150                   0
#> 4        1.122152              0             6.096726                   0
#> 5        1.043081              0             5.429061                   0
#> 6        1.564608              0             8.877558                   0
#>   Distance_MCD Outlier_MCD Outlier
#> 1    11.508353           0       0
#> 2     8.618865           0       0
#> 3    12.265382           0       0
#> 4    14.351997           0       0
#> 5     8.639128           0       0
#> 6    12.003840           0       0
outliers_info$Outlier # Including the probability of being an outlier
#>  [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333
#>  [8] 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.3333333 0.3333333 0.3333333 0.3333333 0.6666667 0.3333333 0.3333333
#> [22] 0.0000000 0.0000000 0.3333333 0.0000000 0.0000000 0.3333333 0.3333333
#> [29] 0.3333333 0.6666667 0.6666667 0.0000000
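
For reference, the composite Outlier column can be reproduced by hand. This is a minimal sketch assuming the column names shown in outliers_info above; that the composite is the proportion of selected methods flagging each observation is an inference from the printed output, not the package's internal code:

```r
# Row-wise proportion of the three binary method flags.
flags <- outliers_info[, c("Outlier_Zscore", "Outlier_Mahalanobis", "Outlier_MCD")]
composite <- rowMeans(flags)

# Observations above the 0.5 consensus threshold match the four
# outliers reported at the top of the print output.
which(composite > 0.5)
#> [1]  9 19 30 31
```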

# For statistical models ---------------------------------------------
model <- lm(disp ~ mpg + hp, data = mtcars)
mod.outliers <- check_outliers(model)
mod.outliers
#> 1 outliers detected: cases 31.
#> - Based on the following methods: cook, pareto.
#> - Using the following thresholds: 0.81, 0.7.
#> - For variables: (Whole model)
#> 
#> Note: Outliers were classified as such by at least half of the selected methods.

# Check plots
plot(mod.outliers)

check_model(model)

# However, there seems to be a presentation issue when using a
# vector instead of a dataframe because then it is not possible to
# obtain the column name (since it has none), so it appears as x instead.

# Find all observations beyond +/- 2 SD
check_outliers(data$mpg, method = "zscore", threshold = 2)
#> 2 outliers detected: cases 18, 20.
#> - Based on the following methods: zscore.
#> - Using the following thresholds: 2.
#> - For variables: x

Created on 2022-07-01 by the reprex package (v2.0.1)

Observations

  • I got used to programming with dplyr, so it was a nice challenge attempting to convert everything to base R and datawizard. Feel free to make suggestions to improve the code.
  • There was a method called robust in thresholds, referring to mahalanobis_robust; I renamed it accordingly to avoid confusion with, e.g., zscore_robust and to stay consistent with the method names so it can be referred back to later. There was also a threshold called zscore but none called zscore_robust, as the former was used in both cases. Again, for clarity and compatibility with later code, I gave zscore_robust its own threshold.
  • Personally, I don’t like the output printing in red; it’s difficult to read (I’m using a dark theme, so the contrast isn’t good). The green is OK, though. I tried changing the red to a softer shade, but it seems only a few colours are allowed with insight::print_color (a limitation of cat()? see below). In any case, I think it shouldn’t be red anyway (that would be more appropriate for errors), so I picked yellow for now, as it is functionally close and much more readable.
In .colour(colour = color, x = text) : `color` #FF4040 not yet supported.
  • I got rid of the warning at the beginning of the output. It seems overkill since detecting outliers is the goal of the function, so it’s almost confusing ("is there something wrong with the outlier detection process?", one might wonder), and it adds text without conveying any real information.
  • Currently, information about which variables each outlier belongs to is not easily accessible. Thus, I had to apply, e.g., .check_outliers_zscore again on individual columns with lapply.
  • As seen in the reprex, there is a problem when using a vector instead of a data frame: it is not possible to obtain the column name (since it has none), so it appears as x instead. Perhaps printing the variables line could be made contingent on providing a data frame.
  • I also corrected some minor typos.
  • The output formatting can be modified if there is a particular formatting convention in the easyverse that I have missed. Open to suggestions for improvement.
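
On the vector naming point above, one possible direction (a hypothetical sketch, not what this PR implements) would be to capture the caller's expression instead of printing a hard-coded x:

```r
# Hypothetical helper: recover a printable label for a bare vector
# argument via non-standard evaluation.
label_for <- function(x) deparse(substitute(x))
label_for(mtcars$mpg)
#> [1] "mtcars$mpg"
```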

What's next

  • Implement per-variable output for each of the other univariate methods:
    • "zscore"
    • "zscore_robust"
    • "iqr"
    • "ci"
    • "eti"
    • "hdi"
    • "bci"
  • Integrate all univariate methods in the outlier frequency table.
  • Also include multivariate detections in the outlier frequency table (maybe?) since column names don't need to be specified so that should make them compatible. That would mean adding support for all multivariate/model-specific methods:
    • "cook"
    • "pareto"
    • "mahalanobis"
    • "mahalanobis_robust"
    • "mcd"
    • "ics"
    • "optics"
    • "iforest"
    • "lof"
  • Add support for grouped data frames
  • Add support for check_outliers.BFBayesFactor
  • Add support for check_outliers.gls

Questions

  1. Right now, the thresholds are displayed on a separate line. I was wondering if it would make sense to save one line by doing it instead like this:
    - Based on the following methods and thresholds: mahalanobis (21.92), iqr (1.5), zscore (1.96).
  2. At first, I was tempted to add (s) to all places where words could be either plural or singular, like so:
9 outlier(s) detected: ...
- Based on the following method(s): ...
- Using the following threshold(s): ...
- For variable(s): ...
  • But I felt it impacted readability because cases were already given in parentheses on the first line (so I switched the parentheses for a colon). Other possibilities would be to report the cases on its own line (and use (s)) or yet again adapt the function to print a different message depending on the number of cases/methods/variables. Might be overthinking and not at all necessary though.
  3. Since using multiple methods aims to reach a consensus (composite scores > 0.5), the number of outliers reported at the top can differ from the number of outliers per variable as reported at the bottom (for the univariate methods).
    • I think it might be a bit less interesting to get the detailed output when using several methods since there is already a decision protocol in place. Would it make more sense to only print that part when a single, univariate method is selected?
  4. Right now, outliers per variable are computed separately, but we could add the row and ID columns in the utilities section (.check_outliers_zscore, etc.) so that this info is also part of the outlier info data frame (outliers_info in the examples). Only if useful though.
  5. One of the challenges of adapting rempsyc::find_mad to check_outliers is that the former only uses one method (zscore_robust), whereas the latter needs to support multiple methods, which complicates the output formatting, especially for the per-variable section.
    • For example, it makes sense to have a by-column output for univariate methods, but, by definition, not for multivariate ones.
    • Another downside to the current approach is when using method = "all", because then the output will be very long. Perhaps we could only print (and compute) the second per-variable part with an optional argument, detailed = TRUE (or the like) passed to check_outliers.
    • Another possibility would be to print the long output only if a single method is selected like suggested in point 3.

Looking forward to your comments and feedback.

@rempsyc rempsyc changed the title check_outlier improvement (easystats/datawizard#177) check_outlier improvement (easystats/datawizard#177) Jun 30, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jul 1, 2022

Codecov Report

Merging #443 (2f04137) into main (23d81d0) will decrease coverage by 1.09%.
The diff coverage is 6.55%.

@@            Coverage Diff             @@
##             main     #443      +/-   ##
==========================================
- Coverage   32.67%   31.58%   -1.10%     
==========================================
  Files          80       80              
  Lines        4682     5047     +365     
==========================================
+ Hits         1530     1594      +64     
- Misses       3152     3453     +301     
Impacted Files Coverage Δ
R/item_intercor.R 83.33% <ø> (ø)
R/check_outliers.R 9.30% <6.55%> (+9.30%) ⬆️


@strengejacke
Copy link
Member

Thanks a lot, impressive PR! Indeed, it takes some time to read your explanation and look at the changes, but I'll try to do this in the next few days.

@strengejacke
Copy link
Member

One point: please check whether check_model() resp. plot(check_outliers()) still works as expected.

@rempsyc
Copy link
Member Author

rempsyc commented Jul 1, 2022

Thanks so much. And no rush. We got time. And yes, I forgot to add the plot demo! Thank you for pointing that out. I have updated my reprex accordingly. 👍

@rempsyc rempsyc closed this Jul 15, 2022
@rempsyc rempsyc deleted the check_model branch July 15, 2022 14:18
@rempsyc rempsyc restored the check_model branch July 15, 2022 22:06
@bwiernik
Copy link
Contributor

Is there a reason you closed and deleted? @rempsyc

@rempsyc
Copy link
Member Author

rempsyc commented Jul 15, 2022

I'm sorry I realized I named my branch check_model instead of check_outliers, so I renamed it on my end thinking it would update here without breaking anything. Somehow it closed the PR! I'm not sure how to fix this... I just tried restoring the branch but I'm not sure it worked correctly.

@rempsyc
Copy link
Member Author

rempsyc commented Jul 15, 2022

The correct branch is here: https://github.com/rempsyc/performance/tree/check_outliers

However, I don't see how I can merge them back to this PR or whether I should open a new PR with the correct name. I'm afraid opening a new PR will lose the existing discussion here.

Maybe it's not a big deal to keep the wrong branch name after all. I'm sorry for this unexpected extra trouble!

@rempsyc rempsyc reopened this Jul 15, 2022
@bwiernik
Copy link
Contributor

Maybe use orange as the color, rather than red? Or let's pick a better red.

@strengejacke where are the insight colors specified?

@rempsyc
Copy link
Member Author

rempsyc commented Jul 25, 2022

?insight::print_color provides the following information:

colour	
Character vector, indicating the colour for printing. May be one of "red", "yellow", "green",
"blue", "violet", "cyan" or "grey".

So there is no orange (yet). It would indeed be nice to know how to add more colours. insight::print_color is basically just defined as cat(.colour(colour = color, x = text)), but I can find no information with ?.colour. Personally, though, I would probably go with orange; otherwise, for a better red, I would choose a lighter, almost pinkish one.

@bwiernik
Copy link
Contributor

I like the default R plotting palette col = 2. That's the red I picked out when R updated its colors in 4.0

@rempsyc
Copy link
Member Author

rempsyc commented Jul 25, 2022

Ok colours are defined here: https://github.com/easystats/insight/blob/18b5aaee8735ff97022b045cd4d81ffe7e207ff2/R/colour_tools.R

E.g. .colour is:

.colour <- function(colour = "red", x) {
  switch(colour,
    red = .red(x),
    yellow = .yellow(x),
    green = .green(x),
    [...]
  )
}

And individual colours each have their own function, e.g.:

.red <- function(x) {
  if (.supports_color()) {
    x[!is.na(x)] <- paste0("\033[31m", x[!is.na(x)], "\033[39m")
  }
  x
}

So definitely seems possible to add more choices. Should I attempt a PR to insight to add colour col = 2?
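
For instance, mirroring the .red() pattern above, an orange variant could look like the following sketch (the escape code is an assumption, and insight's internal .supports_color() check is omitted for brevity):

```r
# 38;5;208 selects an orange from the 256-colour ANSI palette on
# terminals that support it; \033[39m resets the foreground colour.
.orange <- function(x) {
  x[!is.na(x)] <- paste0("\033[38;5;208m", x[!is.na(x)], "\033[39m")
  x
}
cat(.orange("This would print in orange on supporting terminals"), "\n")
```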

@bwiernik
Copy link
Contributor

Maybe let's just swap the current color palette (which I think is the default palette used by {cli}) for one of the other, more accessible palettes? Maybe cli::ansi_palette_show("vscode")?


See cli::ansi_palettes["vscode",]

I think the bright colors there look pretty good on both light and dark backgrounds.

Another option that might be even better would be to support {cli}'s option getOption("cli.palette"). If that is specified, we could call the relevant {cli} function and use the user-specified palette rather than the default.
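
A rough sketch of that fallback (the function name and structure here are assumptions; only getOption("cli.palette") is {cli}'s actual option):

```r
# Prefer {cli}'s styling, which honours getOption("cli.palette"),
# and fall back to insight's fixed ANSI colours otherwise.
print_note <- function(text) {
  if (!is.null(getOption("cli.palette")) &&
      requireNamespace("cli", quietly = TRUE)) {
    cat(cli::col_yellow(text), "\n")
  } else {
    insight::print_color(text, "yellow")
  }
}
```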

@bwiernik
Copy link
Contributor

In any event, the color printing discussion can move to {insight}. Let's get this one merged.

Another great enhancement in the future would be for this function to pull loo results for Stan based models

@rempsyc
Copy link
Member Author

rempsyc commented Jul 29, 2022

Should I convert this PR to a draft to avoid an accidental merge? Given that there are still questions/issues to be resolved. Perhaps it would be helpful if I were to rephrase my earlier questions as simplified suggestions instead? Here’s an attempt:

  • 1. Yes (include thresholds on same line as methods)
  • 2. OK to leave like this (always use plural)
  • 3. Yes (don’t print detailed output when > 1 method selected)
  • 4. Yes (add row and ID to outlier info data frame)
  • 5. Yes (don’t print detailed output when > 1 method selected)

Also note that lists/dataframes can’t be printed with insight::print_color so they print white instead of yellow/red like the non-detailed output.

Still, I would like to receive the “green light” before moving forward with the rest of the changes in case this is not the outcome you desire. Thoughts?

@bwiernik
Copy link
Contributor

Also note that lists/dataframes can’t be printed with insight::print_color so they print white instead of yellow/red like the non-detailed output.

That's good. A whole data frame would be too much in color I think

@bwiernik
Copy link
Contributor

Sorry I didn't see the questions at the bottom of the first post.

  1. Same line like you suggest is fine
  2. Fine to leave plural for now. We should make an insight function for pluralizing words that we can use here and elsewhere @strengejacke
  3. Sure I think omitting detailed output is fine then. What all would show and not show? If we do that, do we want to default to a single method like Cook's D/LOO?
  4. I'm not sure what this question means.
  5. Agreed let's reduce output per (3)
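
The pluralization helper suggested in point 2 could be as simple as the following sketch (hypothetical; no such function exists in insight yet, and this handles only regular English plurals):

```r
# Naive English pluralization: append "s" unless the count is 1.
pluralize <- function(word, n) if (n == 1) word else paste0(word, "s")

sprintf("%d %s detected", 1, pluralize("outlier", 1))
#> [1] "1 outlier detected"
sprintf("%d %s detected", 4, pluralize("outlier", 4))
#> [1] "4 outliers detected"
```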

@rempsyc
Copy link
Member Author

rempsyc commented Jul 29, 2022

Great! I'll start working on that. For 3, sorry that it wasn't clear. I'm referring to the demo from the example in the help file:

library(performance)

# Setup data
data <- datawizard::rownames_as_column(mtcars, var = "car")

# Add ID information
outliers_list <- performance::check_outliers(
  data, method = c("mahalanobis", "mcd", "zscore"), ID = "car")

# Using `as.data.frame()`, we can access more details!
outliers_info <- as.data.frame(outliers_list)
head(outliers_info)
#>   Distance_Zscore Outlier_Zscore Distance_Mahalanobis Outlier_Mahalanobis
#> 1        1.189901              0             8.946673                   0
#> 2        1.189901              0             8.287933                   0
#> 3        1.224858              0             8.937150                   0
#> 4        1.122152              0             6.096726                   0
#> 5        1.043081              0             5.429061                   0
#> 6        1.564608              0             8.877558                   0
#>   Distance_MCD Outlier_MCD Outlier
#> 1    11.508353           0       0
#> 2     8.618865           0       0
#> 3    12.265382           0       0
#> 4    14.351997           0       0
#> 5     8.639128           0       0
#> 6    12.003840           0       0

Created on 2022-07-28 by the reprex package (v2.0.1)

So here we see that the data frame resulting from as.data.frame(outliers_list) does not contain the ID information, although it was requested. Should I add it there as well, i.e., as a unique column at the very beginning? Furthermore, the row number is contained as the row names, but sometimes it is useful to have it as a column as well.

Ultimately, that data frame is similar to my detailed list output, except that it prints the information for all observations, not only outliers. That is why I was thinking of adding ID/row there as well, for consistency.

Sure I think omitting detailed output is fine then. What all would show and not show? If we do that, do we want to default to a single method like Cook's D/LOO?

If we were to print the detailed output for method = "all", then the Outliers per variable (univariate methods) section would be repeated as many times as there are univariate methods (and renamed accordingly for each method, e.g., Outliers per variable (z-score)). However, if we only print detailed output when a single method is selected, then the detailed section will simply not be visible in those cases.

I don't think we need to change the default methods. Sure, most people using multiple methods might never see the changes and detailed output, but I think that's OK since it would mostly be useful to those using single methods anyway. Plus, the default method for class numeric is already a single univariate method (zscore_robust).
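
To illustrate point 4 concretely, here is a sketch reusing the objects from the reprex above (the exact column layout is open to discussion):

```r
# Prepend Row and ID columns so the as.data.frame() output carries the
# same identifiers as the printed per-variable tables.
outliers_info <- as.data.frame(outliers_list)
outliers_info <- cbind(Row = seq_len(nrow(outliers_info)),
                       car = data$car,
                       outliers_info)
```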

@bwiernik
Copy link
Contributor

I think reduced output with "all" is good. And let's add the id as a column

@strengejacke
Copy link
Member

Thanks a lot, that's really impressive! I'll try to look at this PR asap.

@bwiernik
Copy link
Contributor

bwiernik commented Aug 14, 2022

This is a lot of detail and work! Wow! If anyone wants me to look at something, can you @ me with a pointer to the spot?

@strengejacke
Copy link
Member

There is a test failing (there is only one test) but I can’t figure out why since they seem to be correctly outputting the same error message.

The message is formatted using insight::format_message(). In the test environment, you have a certain line length, so it's likely that the output which is tested against has multiple lines. I shortened the string, should work now.

@strengejacke
Copy link
Member

I think the warnings need to be addressed, in particular the usage of data_filter() probably needs to be replaced by base R, since we don't want to define global variables.

@strengejacke
Copy link
Member

btw, I don't understand the meaning of the error argument. If FALSE, the function does not stop, but proceeds even if NAs are included, which should not be allowed if I understand correctly?

strengejacke referenced this pull request in easystats/datawizard Aug 24, 2022
@rempsyc
Copy link
Member Author

rempsyc commented Aug 24, 2022

I think the warnings need to be addressed, in particular the usage of data_filter() probably needs to be replaced by base R, since we don't want to define global variables.

I'm sorry, I was working on yet another version with the select helpers fixed, insight::format_message integrated, new error message for Mahalanobis, etc. Instead of using base R, I decided to opt for our newly merged changes in data_remove and friends since the regex argument should now work in this new version. [edit: whoops, see what you mean now, for data_filter it still won't work because no nse/regex argument, so I'm switching to data_find like you indicated in your other comment]

I don't understand the meaning of the error argument

The error argument was purely to show both behaviours in the long reprex, I've removed it already.

(And thanks for fixing the test!)

Merge branch 'check_model' of https://github.com/rempsyc/performance into check_model

# Conflicts:
#	DESCRIPTION
#	R/check_outliers.R
#	man/check_outliers.Rd
@rempsyc
Copy link
Member Author

rempsyc commented Aug 24, 2022

OK! We’re almost there.

I've hidden my long comments post, because all of the points have either been resolved or moved to their respective issue. Except for one thing, the tryCatch call. Essentially, I was tempted to remove it to have cleaner code if it wasn’t needed anymore, but I guess there's no harm in keeping it there even though I’m not sure of its relevance. The long comment can still be consulted if one would like to see the reasoning behind all my changes.

We should definitely add more tests for check_outliers in the future, but it will have to wait some time (or for someone else to do it). I’ve spent a LOT more time on this PR (and others) than expected when Dom initially invited me to contribute this PR 😂 (also this was only my second PR ever, the first one a week before only).

Also, Dom mentioned

perhaps also adding a vignette on outliers detection would be something to look for

Would transforming my long reprex above in a vignette be useful at all?

@rempsyc
Copy link
Member Author

rempsyc commented Aug 24, 2022

R CMD check is failing because I imported datawizard (>= 0.5.1.1) to use the newly added regex arguments:

Error: Error: <callr_remote_error: Cannot install packages:
  * deps::.: Can't install dependency datawizard>

But current version is 0.5.1.2 so it should work? Is it because it takes 24h to update on r-universe so we have to wait? Or is it because it downloads only the latest CRAN version, so I'm not allowed to program using the latest features?

Note: no R CMD check error/warning/note locally.

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Aug 25, 2022

Would transforming my long reprex above in a vignette be useful at all?

We can leave that for the future. Rather than a reprex of possible options, I had in mind more of a walkthrough of why/how to do outlier detection, but there's no hurry, we can come back to that later. It's a lot of work already ;)

@strengejacke
Copy link
Member

Is it because it takes 24h to update on r-universe so we have to wait?

You can use the Remotes field in the DESCRIPTION file to include the latest master/main branch of GitHub versions of packages for compiling/testing.
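
For example, in DESCRIPTION (a sketch; this pulls the development version of datawizard from its GitHub main branch during checks):

```
Remotes:
    easystats/datawizard
```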

@strengejacke
Copy link
Member

All checks pass, so is there anything that still needs to be done, or is it ready for merging?

@rempsyc
Copy link
Member Author

rempsyc commented Aug 25, 2022

@strengejacke strengejacke merged commit 6b68edd into easystats:main Aug 25, 2022
@strengejacke
Copy link
Member

Cool, thanks again for the work!
Can you please check which issue can now be closed?

@rempsyc rempsyc deleted the check_model branch August 25, 2022 17:24
@rempsyc
Copy link
Member Author

rempsyc commented Aug 25, 2022

I was hoping to close all issues I opened with this PR, but I could only realistically close #466 and #469 (which was done automatically when you merged). #467, #468, and #470 are not resolved yet, but I can make a new PR once they are. For easystats/datawizard#216, should commit 2c58497 have closed it? Or is there something more you were trying to implement? Because as far as I'm concerned, it works now.

@rempsyc
Copy link
Member Author

rempsyc commented Aug 25, 2022

For #468 and #470, I'm not planning to work on them because I don't know enough about the topic, so I labelled them as low priority for now. But would it be better to close them? They would still be archived and findable if someone eventually finds the motivation to address them, but they wouldn't occupy precious space in the processing tray (open issues)... To be honest, they are not a big priority for me, but given that iforest is listed as a method in the check_outliers documentation, I thought it may be important to either add it back quickly or remove it from the docs.

I expect #467 to resolve within half a month (or less).
