Switch to Tables.jl API #20

Merged 34 commits on Jul 15, 2019
Changes from 12 commits
Commits
1d85605
Started work on using Tables API.
rofinn Jul 2, 2019
78ba17f
Fixed up Context code to better fit with Tables interface changes.
rofinn Jul 3, 2019
37b7ef2
Tests and bug fixes for working with Context types directly.
rofinn Jul 3, 2019
45dfea9
Simplify exports deprecation.
rofinn Jul 4, 2019
7c6ceed
API simplification.
rofinn Jul 5, 2019
f54e2e2
Fix automerge on Project.toml
rofinn Jul 5, 2019
a0ab2ea
Drop 0.7 tests and add the deprecated file.
rofinn Jul 5, 2019
0b2bbe7
Added a deprecation for switching to the column-major convention.
rofinn Jul 5, 2019
1f99bbd
Updated tests to new API and moved existing deprecated tests to a dif…
rofinn Jul 5, 2019
20f084e
Added some more tests for Chain and mutating methods.
rofinn Jul 7, 2019
aedd1ab
Introduce dropobs and dropvars and deprecate Drop.
rofinn Jul 8, 2019
5f1f4d8
Add a test for broadcasted imputation over a groupby.
rofinn Jul 8, 2019
e512171
Review changes.
rofinn Jul 9, 2019
83a4bf5
Introduce a vardim kwarg to make the column-major convention easier t…
rofinn Jul 9, 2019
7f90aad
Cleanup docstrings and add jldoctests.
rofinn Jul 10, 2019
0d97c08
Remove test REQUIRE file.
rofinn Jul 10, 2019
eecc2d4
Cleanup docs in README and index page.
rofinn Jul 10, 2019
3edef07
More PR review cleanup.
rofinn Jul 11, 2019
9254ebf
Switched impute!(imp, data) -> impute!(data, imp)
rofinn Jul 11, 2019
2fece06
Remove matrix orientation deprecation.
rofinn Jul 11, 2019
f77421f
Update test/runtests.jl
rofinn Jul 11, 2019
e3ddd08
Update src/imputors.jl
rofinn Jul 11, 2019
cadd28d
Update src/imputors.jl
rofinn Jul 11, 2019
81fc7f8
Missed PR review fixes.
rofinn Jul 11, 2019
4b18a0d
Update src/imputors.jl
rofinn Jul 12, 2019
fafe219
Update src/context.jl
rofinn Jul 12, 2019
8f0f4b6
Throw MethodErrors in fallback table methods.
rofinn Jul 12, 2019
d8b51d4
Update src/imputors/fill.jl
rofinn Jul 15, 2019
7c70227
Update src/context.jl
rofinn Jul 15, 2019
d5ff2c5
Use selectdim for obswise and varwise.
rofinn Jul 15, 2019
5591076
Use ∘ in tests to compose imputor pipelines.
rofinn Jul 15, 2019
ec902fe
Change !any(ismissing, ...) tests to all(!ismissing, ...)
rofinn Jul 15, 2019
e823cc2
Restrict RDatasets to >=0.6.2
rofinn Jul 15, 2019
051d6ce
Don't pipe to materializer.
rofinn Jul 15, 2019
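
Two of the commits above describe small conventions: composing imputor pipelines with `∘`, and asserting `all(!ismissing, ...)` instead of `!any(ismissing, ...)`. A minimal sketch of both in plain Julia follows. The functions `fill_value` and `carry_forward` are hypothetical stand-ins written for this illustration; they are not part of Impute.jl and only approximate what a fill or LOCF step might do.

```julia
# Hypothetical stand-ins for imputation steps; these are NOT Impute.jl functions.

# Replace each missing value with a constant.
fill_value(x; value=0.0) = [ismissing(v) ? value : v for v in x]

# Carry the last observed value forward; leading missings are left untouched.
function carry_forward(x)
    out = Vector{Union{Missing, Float64}}(x)  # copy, so the input is not mutated
    for i in 2:length(out)
        if ismissing(out[i]) && !ismissing(out[i - 1])
            out[i] = out[i - 1]
        end
    end
    return out
end

data = [missing, 1.0, missing, missing, 4.0]

# `f ∘ g` applies `g` first, then `f`.
pipeline = fill_value ∘ carry_forward
result = pipeline(data)

# The two predicates are logically equivalent; the second states the intent directly.
@assert !any(ismissing, result) == all(!ismissing, result)
```

Because `∘` applies right to left, `carry_forward` runs before `fill_value`, so only the leading `missing` ends up replaced by the fill constant.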
1 change: 0 additions & 1 deletion .appveyor.yml
@@ -1,6 +1,5 @@
environment:
matrix:
- julia_version: 0.7
- julia_version: 1.0
- julia_version: nightly

1 change: 0 additions & 1 deletion .travis.yml
@@ -4,7 +4,6 @@ os:
- linux
- osx
julia:
- 0.7
- 1.0
- nightly
notifications:
10 changes: 8 additions & 2 deletions Project.toml
@@ -4,16 +4,22 @@ authors = ["Invenia Technical Computing"]
version = "0.2.0"

[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
IterTools = "c8e1da08-722c-5040-9ed9-7db0dc04731e"
Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
DataFrames = "0.17, 0.18"
IterTools = "1.2"
Tables = "0.2"
julia = "1"

[extras]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["RDatasets", "Test"]
test = ["DataFrames", "RDatasets", "Test"]
157 changes: 47 additions & 110 deletions src/Impute.jl
@@ -1,14 +1,38 @@
module Impute

using DataFrames
using IterTools
using Statistics
using StatsBase
using Tables: Tables, materializer, istable

import DataFrames: DataFrameRow
import Base.Iterators
import Base.Iterators: drop

export impute, impute!, chain, chain!, drop, drop!, interp, interp!, ImputeError

const Dataset = Union{AbstractArray, DataFrame}
function __init__()
sym = join(["chain", "chain!", "drop", "drop!", "interp", "interp!"], ", ", " and ")

@warn(
"""
The following symbols will not be exported in future releases: $sym.
Please qualify your calls with `Impute.<method>(...)` or explicitly import the symbol.
"""
)

@warn(
"""
The default limit for all impute functions will be 1.0 going forward.
If you depend on a specific threshold please pass in an appropriate `AbstractContext`.
"""
)

@warn(
"""
All matrix imputation methods will be switching to the JuliaStats column-major convention
(e.g., each column corresponds to an observation, and each row corresponds to a variable).
"""
)
end

"""
ImputeError{T} <: Exception
@@ -28,118 +52,31 @@ include("context.jl")
include("imputors.jl")

const global imputation_methods = Dict{Symbol, Type}(
:drop => Drop,
:drop => DropObs,
:dropobs => DropObs,
:dropvars => DropVars,
:interp => Interpolate,
:fill => Fill,
:locf => LOCF,
:nocb => NOCB,
)

"""
impute!(data::Dataset, method::Symbol=:interp, args...; limit::Float64=0.1)

Looks up the `Imputor` type for the `method`, creates it and calls
`impute!(imputor::Imputor, data::Dataset, limit::Float64)` with it.

# Arguments
* `data::Dataset`: the datset containing missing elements we should impute.
* `method::Symbol`: the imputation method to use
(options: [`:drop`, `:fill`, `:interp`, `:locf`, `:nocb`])
* `args::Any...`: any arguments you should pass to the `Imputor` constructor.
* `limit::Float64`: missing data ratio limit/threshold (default: 0.1)
"""
function impute!(data::Dataset, method::Symbol, args...; limit::Float64=0.1)
imputor_type = imputation_methods[method]
imputor = length(args) > 0 ? imputor_type(args...) : imputor_type()
return impute!(imputor, data, limit)
end

"""
impute!(data::Dataset, missing::Function, method::Symbol=:interp, args...; limit::Float64=0.1)

Creates the appropriate `Imputor` type and `Context` (using `missing` function) in order to call
`impute!(imputor::Imputor, ctx::Context, data::Dataset)` with them.

# Arguments
* `data::Dataset`: the datset containing missing elements we should impute.
* `missing::Function`: the missing data function to use
* `method::Symbol`: the imputation method to use
(options: [`:drop`, `:fill`, `:interp`, `:locf`, `:nocb`])
* `args::Any...`: any arguments you should pass to the `Imputor` constructor.
* `limit::Float64`: missing data ratio limit/threshold (default: 0.1)
"""
function impute!(data::Dataset, missing::Function, method::Symbol, args...; limit::Float64=0.1)
imputor_type = imputation_methods[method]
imputor = length(args) > 0 ? imputor_type(args...) : imputor_type()
ctx = Context(*(size(data)...), 0, limit, missing)
return impute!(imputor, ctx, data)
include("deprecated.jl")

let
for (k, v) in imputation_methods
local typename = nameof(v)
local f = k
local f! = Symbol(k, :!)

# NOTE: The
@eval begin
$f(data; kwargs...) = impute($typename(; _extract_context_kwargs(kwargs...)...), data)
$f!(data; kwargs...) = impute!($typename(; _extract_context_kwargs(kwargs...)...), data)
$f(; kwargs...) = data -> impute($typename(; _extract_context_kwargs(kwargs...)...), data)
$f!(; kwargs...) = data -> impute!($typename(; _extract_context_kwargs(kwargs...)...), data)
end
end
end

"""
impute(data::Dataset, args...; kwargs...)

Copies the `data` before calling `impute!(new_data, args...; kwargs...)`
"""
function impute(data::Dataset, args...; kwargs...)
return impute!(deepcopy(data), args...; kwargs...)
end

"""
chain!(data::Dataset, missing::Function, imputors::Imputor...; kwargs...)

Creates a `Chain` with `imputors` and calls `impute!(imputor, missing, data; kwargs...)`
"""
function chain!(data::Dataset, missing::Function, imputors::Imputor...; kwargs...)
imputor = Chain(imputors...)
return impute!(imputor, missing, data; kwargs...)
end

"""
chain!(data::Dataset, imputors::Imputor...; kwargs...)

Creates a `Chain` with `imputors` and calls `impute!(imputor, data; kwargs...)`
"""
function chain!(data::Dataset, imputors::Imputor...; kwargs...)
imputor = Chain(imputors...)
return impute!(imputor, data; kwargs...)
end

"""
chain(data::Dataset, args...; kwargs...)

Copies the `data` before calling `chain!(data, args...; kwargs...)`
"""
function chain(data::Dataset, args...; kwargs...)
result = deepcopy(data)
return chain!(data, args...; kwargs...)
end

"""
drop!(data::Dataset; limit=1.0)

Utility method for `impute!(data, :drop; limit=limit)`
"""
drop!(data::Dataset; limit=1.0) = impute!(data, :drop; limit=limit)

"""
drop(data::Dataset; limit=1.0)

Utility method for `impute(data, :drop; limit=limit)`
"""
Iterators.drop(data::Dataset; limit=1.0) = impute(data, :drop; limit=limit)

"""
interp!(data::Dataset; limit=1.0)

Utility method for `impute!(data, :interp; limit=limit)`
"""
interp!(data::Dataset; limit=1.0) = impute!(data, :interp; limit=limit)

"""
interp(data::Dataset; limit=1.0)

Utility method for `impute(data, :interp; limit=limit)`
"""
interp(data::Dataset; limit=1.0) = impute(data, :interp; limit=limit)

end # module
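
The `@eval` loop in the new `src/Impute.jl` appears to generate four methods per entry in `imputation_methods`: a direct and a mutating form that take `data`, plus zero-argument forms that return a closure. The sketch below reproduces that code-generation pattern in isolation. Everything in it is an assumption made for illustration: the types `FillZero` and `DropMissing`, the helpers `apply` and `apply!`, and the table `methods_table` are hypothetical, and the keyword handling done by `_extract_context_kwargs` in the diff is omitted.

```julia
# Hypothetical, simplified imputor types; the real Impute.jl types differ.
struct FillZero end
struct DropMissing end

# Non-mutating entry point: copy, then delegate to the mutating version.
apply(imp, data) = apply!(imp, copy(data))

function apply!(::FillZero, data::AbstractVector)
    for i in eachindex(data)
        ismissing(data[i]) && (data[i] = 0)
    end
    return data
end

# Deleting in place keeps the element type, so `Missing` stays in the eltype.
apply!(::DropMissing, data::AbstractVector) = filter!(!ismissing, data)

# Map short names to types, mirroring the shape of `imputation_methods`.
const methods_table = Dict{Symbol, Type}(
    :fillzero => FillZero,
    :dropmissing => DropMissing,
)

for (name, T) in methods_table
    typename = nameof(T)        # e.g. :FillZero
    name! = Symbol(name, :!)    # e.g. :fillzero!

    @eval begin
        # Direct forms, e.g. fillzero(data) and fillzero!(data).
        $name(data) = apply($typename(), data)
        $name!(data) = apply!($typename(), data)
        # Curried forms, e.g. fillzero() returns a closure for use in pipelines.
        $name() = data -> apply($typename(), data)
        $name!() = data -> apply!($typename(), data)
    end
end

x = [1, missing, 3]
y = x |> fillzero() |> dropmissing()
```

The curried forms are what make the generated names usable with `|>` or `∘`, since each call with no positional argument yields a one-argument function of `data`.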