Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update impute/impute! docstrings #145

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ElOceanografo
Copy link

Documentation changes to resolve my confusion in #144. I combined the docstrings for impute and impute!, since their interfaces are basically identical, the only difference being whether or not they mutate their argument. I also and added some more examples/doctests to show explicitly how the dims keyword arg works. Happy to make more changes if there are subtleties or special cases of the interface I'm missing!

Copy link
Member

@rofinn rofinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, for the late reply. I get that you're coming at this from a DataFrames perspective, but some of the changes are misleading. We don't actually care about DataFrames specifically, but rather tabular data in general (ie: Tables.jl). Since columns typically represent variables while rows represent observations we want to fill in missing observations based on an existing value for the same variable (not other variables). There are cases where you might want to impute across different or multiple dimension (e.g., svd, knn), but typically you're better off using an n-dimensional array for those use cases.

impute(data::T, imp; kwargs...) -> T
impute_docstring = """
impute(data::T, imp; dims=:, kwargs...) -> T
impute!(data::A, imp; dims=:, kwargs...) -> A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also use T here.

Suggested change
impute!(data::A, imp; dims=:, kwargs...) -> A
impute!(data::T, imp; dims=:, kwargs...) -> T

Returns a new copy of the `data` with the missing data imputed by the imputor `imp`.
For matrices and tables, data is imputed one variable/column at a time.
If this is not the desired behaviour then you should overload this method or specify a different `dims` value.
Returns a new copy of the `data` with the missing data imputed by the imputor `imp`. If the mutating version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave the newlines as they were unless it's necessary? It makes it harder to review the diff.

Returns a new copy of the `data` with the missing data imputed by the imputor `imp`. If the mutating version
`impute!` is used, it will also update the missing values in-place.

By default, `data` is assumed to be laid out like a `DataFrame`, with each column representing a variable and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat misleading. Impute assumes it's working on either an array or a table. Perhaps this wording would satisfy what you're trying to achieve?

By default, data is imputed along the first dimension of the input data (i.e., columns for matrices or tables). Other orientations can be handled via the `dims` keyword argument.


# Arguments
* `data`: the data to be impute
* `imp::Imputor`: the Imputor method to use

# Keyword arguments
* `dims = :`: The dimensions to impute along, either `:cols` or `:rows`. If data are in `DataFrame` format,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually have whitespace around the = for kwargs. Again, we don't focus on DataFrames here, and I think just saying columns is more representative of the behaviour for both tables and arrays.

# Returns
* the input `data` with values imputed
* `AbstractArray{Union{T, Missing}}`: the input `data` with values imputed. (Mutation isn't guaranteed for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type isn't this specific. It could be an array, but it could also be a table type.

@@ -71,46 +87,49 @@ julia> impute(v, Interpolate())
3.0
4.0
5.0
```
"""
function impute(data, imp::Imputor; kwargs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, splitting up with merging of the docstrings and your additions makes it hard to follow what is new vs merged.

@@ -38,24 +38,40 @@ function Base.:(==)(a::T, b::T) where T <: Imputor
return result
end

"""
impute(data::T, imp; kwargs...) -> T
impute_docstring = """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than assigning this to a variable, you can just do:

"""
...
""" impute, impute!

This will declare the docstring for those function names and then we can define the actual methods below.

1.0 2.0
1.0 2.0

# In-place imputation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of examples for one docstring. I'm okay with two examples to differentiate :rows from :cols, but NamedDims exists, and the dims keyword seems pretty ubiquitous throughout the Julia ecosystem at this point.


julia> M = [1.0 2.0 missing missing 5.0; 1.1 2.2 3.3 missing 5.5]
2×5 Matrix{Union{Missing, Float64}}:
1.0 2.0 missing missing 5.0
1.1 2.2 3.3 missing 5.5

julia> impute!(M, Interpolate(); dims=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think showing that dims can still be an integer dimension is important, as that's how it works in Base. For example, you can impute along dims=3 for a n-dimensional array.

https://github.com/invenia/Impute.jl/blob/master/test/testutils.jl#L224

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants