Update impute/impute! docstrings #145

ElOceanografo · 2024-07-14T22:15:11Z

Documentation changes to resolve my confusion in #144. I combined the docstrings for impute and impute!, since their interfaces are basically identical, the only difference being whether or not they mutate their argument. I also added some more examples/doctests to show explicitly how the dims keyword arg works. Happy to make more changes if there are subtleties or special cases of the interface I'm missing!

rofinn

Sorry for the late reply. I get that you're coming at this from a DataFrames perspective, but some of the changes are misleading. We don't actually care about DataFrames specifically, but rather tabular data in general (i.e., Tables.jl). Since columns typically represent variables while rows represent observations, we want to fill in missing observations based on an existing value for the same variable (not other variables). There are cases where you might want to impute across different or multiple dimensions (e.g., svd, knn), but typically you're better off using an n-dimensional array for those use cases.

rofinn · 2024-07-15T04:23:26Z

src/imputors.jl

-    impute(data::T, imp; kwargs...) -> T
+impute_docstring = """
+    impute(data::T, imp; dims=:, kwargs...) -> T
+    impute!(data::A, imp; dims=:, kwargs...) -> A


I think we can also use T here.

Suggested change

-impute!(data::A, imp; dims=:, kwargs...) -> A
+impute!(data::T, imp; dims=:, kwargs...) -> T

rofinn · 2024-07-15T04:24:37Z

src/imputors.jl

-Returns a new copy of the `data` with the missing data imputed by the imputor `imp`.
-For matrices and tables, data is imputed one variable/column at a time.
-If this is not the desired behaviour then you should overload this method or specify a different `dims` value.
+Returns a new copy of the `data` with the missing data imputed by the imputor `imp`. If the mutating version


Could you leave the newlines as they were unless it's necessary? It makes it harder to review the diff.

rofinn · 2024-07-15T04:35:21Z

src/imputors.jl

+Returns a new copy of the `data` with the missing data imputed by the imputor `imp`. If the mutating version
+`impute!` is used, it will also update the missing values in-place.
+
+By default, `data` is assumed to be laid out like a `DataFrame`, with each column representing a variable and


This is somewhat misleading. Impute assumes it's working on either an array or a table. Perhaps this wording would satisfy what you're trying to achieve?

By default, data is imputed along the first dimension of the input data (i.e., columns for matrices or tables). Other orientations can be handled via the `dims` keyword argument.

rofinn · 2024-07-15T04:37:54Z

src/imputors.jl


 # Arguments
 * `data`: the data to be impute
 * `imp::Imputor`: the Imputor method to use

+# Keyword arguments
+* `dims = :`: The dimensions to impute along, either `:cols` or `:rows`. If data are in `DataFrame` format,


We don't usually have whitespace around the = for kwargs. Again, we don't focus on DataFrames here, and I think just saying columns is more representative of the behaviour for both tables and arrays.

rofinn · 2024-07-15T04:39:15Z

src/imputors.jl

 # Returns
-* the input `data` with values imputed
+* `AbstractArray{Union{T, Missing}}`: the input `data` with values imputed. (Mutation isn't guaranteed for 


The return type isn't this specific. It could be an array, but it could also be a table type.

rofinn · 2024-07-15T04:39:56Z

src/imputors.jl

@@ -71,46 +87,49 @@ julia> impute(v, Interpolate())
 3.0
 4.0
 5.0
-```
-"""
-function impute(data, imp::Imputor; kwargs...)


For future reference, combining the merging of the docstrings with your additions in a single change makes it hard to follow what is new vs. merged.

rofinn · 2024-07-20T21:44:28Z

src/imputors.jl

@@ -38,24 +38,40 @@ function Base.:(==)(a::T, b::T) where T <: Imputor
    return result
 end

-"""
-    impute(data::T, imp; kwargs...) -> T
+impute_docstring = """


Rather than assigning this to a variable, you can just do:

""" ... """ impute, impute!

This will declare the docstring for those function names and then we can define the actual methods below.
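
A minimal sketch of the pattern described above, written in Julia. The names `fill_missing` and `fill_missing!` are hypothetical stand-ins chosen for illustration; they are not part of Impute.jl. One docstring is attached to both names, and the methods are defined afterwards:

```julia
# Hypothetical stand-ins for illustration; not the Impute.jl implementations.
"""
    fill_missing(data, value)
    fill_missing!(data, value)

Replace each `missing` entry in `data` with `value`.
The `!` variant updates `data` in place; the other works on a copy.
"""
fill_missing, fill_missing!

# The actual methods are defined after the shared docstring.
function fill_missing!(data, value)
    replace!(data, missing => value)
    return data
end

fill_missing(data, value) = fill_missing!(copy(data), value)

v = [1.0, missing, 3.0]
w = fill_missing(v, 0.0)   # `v` is left untouched
fill_missing!(v, 0.0)      # `v` is updated in place
```

This avoids the intermediate `impute_docstring` variable while still keeping a single source of documentation for the mutating and non-mutating names.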

rofinn · 2024-07-20T21:56:04Z

src/imputors.jl

+ 1.0  2.0
+ 1.0  2.0
+
+# In-place imputation


This is a lot of examples for one docstring. I'm okay with two examples to differentiate :rows from :cols, but NamedDims exists, and the dims keyword seems pretty ubiquitous throughout the Julia ecosystem at this point.

rofinn · 2024-07-20T22:01:05Z

src/imputors.jl


 julia> M = [1.0 2.0 missing missing 5.0; 1.1 2.2 3.3 missing 5.5]
 2×5 Matrix{Union{Missing, Float64}}:
 1.0  2.0   missing  missing  5.0
 1.1  2.2  3.3       missing  5.5

-julia> impute!(M, Interpolate(); dims=1)


I think showing that dims can still be an integer dimension is important, as that's how it works in Base. For example, you can impute along dims=3 for an n-dimensional array.

https://github.com/invenia/Impute.jl/blob/master/test/testutils.jl#L224
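
As a rough illustration of the Base convention referred to above, the sketch below uses an integer `dims` with `sum` and `mapslices` on a 3-dimensional array. The helper `carry_forward` is hypothetical and written only for this example; it is not how Impute.jl performs imputation.

```julia
# Hypothetical helper for illustration only; not Impute.jl's implementation.
# Copies the previous value into each `missing` entry of a vector.
function carry_forward(v::AbstractVector)
    out = collect(v)  # work on a copy of the slice
    for i in 2:length(out)
        if ismissing(out[i])
            out[i] = out[i - 1]
        end
    end
    return out
end

# A 2×2×3 array whose middle slice along the third dimension is missing.
A = Array{Union{Missing, Float64}}(undef, 2, 2, 3)
A[:, :, 1] .= 1.0
A[:, :, 2] .= missing
A[:, :, 3] .= 3.0

# Base functions accept an integer `dims`; here the reduction runs along dimension 3.
S = sum(A[:, :, [1, 3]]; dims=3)

# Apply the helper to every vector that runs along dimension 3.
B = mapslices(carry_forward, A; dims=3)
```

Here each slice along the third dimension is treated as one series, so the missing middle slice is filled from the slice before it.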

Update impute/impute! docstring

850d5bb

rofinn reviewed Jul 20, 2024
