An attempt at rewriting simple imputation methods as iterators #60

rofinn · 2020-03-06T17:45:25Z

I'll be immediately closing this PR, but I figured I'd document why rewriting even the simple imputation methods as iterators didn't seem reasonable in the end.

Many methods require looking ahead or behind (e.g., locf, nocb, interpolation), which isn't guaranteed to work consistently for all iterators.
Generalizing univariate iterators to multivariate datasets is challenging to do in a performant way (e.g., iterating at different rates for each variable may result in multiple copies of an observation).
Preserving type information may be challenging when splitting and combining observations.
In some cases we need to pass significant information around in each iteration state, particularly if we're nesting many iterators.
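
To make the look-ahead problem concrete, here is a minimal sketch of nocb (next observation carried backward) over a one-shot stream. It is written in Python purely for illustration and is not code from this PR; the name `nocb_stream` and the use of `None` for a missing value are assumptions. The point is that nothing can be emitted for a gap until the next observation arrives, so the iterator has to carry a backlog in its state and its output lags its input by an unbounded amount.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def nocb_stream(values: Iterable[Optional[T]]) -> Iterator[Optional[T]]:
    """Hypothetical single-pass nocb over any iterable; None marks a missing value."""
    pending = 0  # missing values seen so far that are still waiting for a fill value
    for v in values:
        if v is None:
            # The fill value lies in the future, so nothing can be emitted yet.
            pending += 1
            continue
        # Emit the observation once for each waiting gap, then once for itself.
        for _ in range(pending + 1):
            yield v
        pending = 0
    # Trailing gaps have no later observation to carry backward.
    for _ in range(pending):
        yield None


# Works on a one-shot iterator, but only by buffering across the gap.
filled = list(nocb_stream(iter([None, 1, None, None, 4, None])))
```

Here the state is a single counter, but with several variables each column would lag by a different amount, which is where the multivariate and nested-state concerns above come from.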

Overall, I think a better approach moving forward will be to provide a handful of methods/utilities on a small selection of types that can be applied out of the box at the cost of making multiple passes. We can also focus on making the Imputor API more extensible to help get folks up and running if they think they can impute all of their data in a single pass.
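
As a rough illustration of the multi-pass alternative, the sketch below applies locf and then nocb to a concrete, indexable collection, one full pass each. It is Python for illustration only, not the Imputor API; the function names and the `None` missing marker are assumptions. With random access, looking behind or ahead is a plain index lookup and no per-iteration state needs to be threaded through.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def locf(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Hypothetical last-observation-carried-forward: one forward pass over a copy."""
    out = list(values)
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    return out


def nocb(values: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Hypothetical next-observation-carried-backward: one backward pass over a copy."""
    out = list(values)
    for i in range(len(out) - 2, -1, -1):
        if out[i] is None:
            out[i] = out[i + 1]
    return out


# Two passes: locf fills interior and trailing gaps, nocb fills the leading gap.
data = [None, 1, None, 3, None]
result = nocb(locf(data))
```

The cost is a copy and a full traversal per method, which is the trade-off described above.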

An attempt at rewriting simple imputation methods as iterators.

b226f04

rofinn closed this Mar 6, 2020

rofinn mentioned this pull request Mar 6, 2020

Refactoring #17

Closed

8 tasks

rofinn mentioned this pull request Sep 25, 2020

API simplification #66

Open
