Continue adding Metadata to dataframes #1458
Closed
88 commits
ec32311
Make `describe` return a DataFrame
pdeffebach 3ac8af1
Delete REQUIRE
pdeffebach c0227e9
Add files via upload
pdeffebach a6fa29e
fix rowvectors
pdeffebach d9621b4
Update abstractdataframe.jl
pdeffebach fc6ef8c
Update abstractdataframe.jl
pdeffebach c477684
Update abstractdataframe.jl
pdeffebach 215233f
Get rid of describe tests
pdeffebach 21936e1
delete tests
pdeffebach ab120a4
Merge remote-tracking branch 'origin/master'
pdeffebach 9f80ddf
Add improved described
pdeffebach bc3f2fa
Improve time for describe
pdeffebach e29de8b
Fix colstats vector in kw
pdeffebach 935d2f8
Fix kw again
pdeffebach 6561d89
Improve kwargs closure
pdeffebach f9b5e77
Fix NUnique error
pdeffebach 01d9f86
Add docstring
pdeffebach 01d836d
edit docstring
pdeffebach 9f69461
Merge pull request #1 from pdeffebach/describe_to_dataframe
pdeffebach c891344
finall fix noNunique error
pdeffebach 5ef7a16
more fixes
pdeffebach d9db833
fuck it more changes
pdeffebach 4c54110
Respond to nalimilan's comments
pdeffebach ebfa988
fix stats
pdeffebach fb3ca98
fix stats again
pdeffebach 5d410e6
fix test
pdeffebach e18be15
fix REQUIRE
pdeffebach fe02fbe
Respond to nalimilan's comments 2
pdeffebach 707fea7
fix indentation on test
pdeffebach d163c5e
added bad stuff
pdeffebach 36f18a9
fix test
pdeffebach c81bde3
Undo all the stupid things I had earlier
pdeffebach bf6b97a
Update tests and comments
pdeffebach 6537afc
Fix indentation
pdeffebach 820df36
Add back in description in docstring
pdeffebach 451e474
fix space
pdeffebach a6e8f3b
Respond to Milan's comments 3
pdeffebach a1b1d6e
Respond to Milan 4
pdeffebach eb48ec7
Merge branch 'master' into master
nalimilan b5777a4
Add type-agnistic get_stats functions
pdeffebach 8c2808e
Merge branch 'master' of https://github.com/JuliaData/DataFrames.jl
pdeffebach f61db94
Merge pull request #2 from pdeffebach/MetaData
pdeffebach e22d392
Add nomissing for new try...catch arguments
pdeffebach 29d8a61
Merge branch 'MetaData'
pdeffebach 5ae1eed
Merge remote-tracking branch 'origin/master'
pdeffebach 22f2c3a
Add nunique to default, added optional last and first
pdeffebach 22767a8
Add nunique to default, added optional last and first
pdeffebach 56ece14
Merge remote-tracking branch 'origin/MetaData' into MetaData
pdeffebach a8c8fbb
Merge branch 'MetaData'
pdeffebach 1dfc0a1
Add deprecation warning, change docstring
pdeffebach ba07f02
add deprecation warning
pdeffebach 2f4f8e6
Add metadata without touching Index
pdeffebach efb5ee4
:
pdeffebach 97b50e9
fix isequal use in tests.
pdeffebach 53f6ad7
Respond to comments about deprecations and :all
pdeffebach d638068
Fix eltype call and some comments
pdeffebach 33859c6
Make there be only one describe definition
pdeffebach d93e537
change :all to symbol argument
pdeffebach 708a4fb
Fix docs for `describe`
pdeffebach 98d082a
Small fixes
nalimilan cdc6e3b
trim whitespace in docstring
pdeffebach fc780da
Change error handling for symbol kw
pdeffebach da3bb6b
Merge branch 'master' of https://github.com/pdeffebach/DataFrames.jl …
pdeffebach 3f68f09
Merge remote-tracking branch 'origin/master' into describe_changes
pdeffebach de73135
Progress with metadata, add test
pdeffebach 06354fe
More fixes
pdeffebach 4cec218
Change to isa, only generate error message on error.
pdeffebach e82dded
Merge branch 'master' of https://github.com/pdeffebach/DataFrames.jl …
pdeffebach 6b73913
Addded merge operation to dataframes.jl
pdeffebach 8dc227e
Merge remote-tracking branch 'JuliaData/master'
pdeffebach 4f09f74
Merge remote-tracking branch 'JuliaData/master'
pdeffebach 7478daa
respond to milan's comments 1: use Any etc.
pdeffebach e11ec8b
Merge remote-tracking branch 'JuliaData/master'
pdeffebach 174b924
Respond to Milan 2
pdeffebach 6a8f706
Merge remote-tracking branch 'JuliaData/master'
pdeffebach 8a43666
Update docs for new `describe` (#1442)
pdeffebach 0c15f20
make REPL printing of `nothing` an `empty string` (#1444)
pdeffebach c9a8a3b
Move Missing to it's own page in the Docs (#1415)
oxinabox 12d9835
Add section headings and row-by-row construction example (#1416)
oxinabox 8a8efd1
make nothing printing not expand size
pdeffebach 40b0021
respond to milan 3
pdeffebach 8ea657c
add docs...
pdeffebach 94445ba
add docs for missing
pdeffebach d48132c
remove missings.md
pdeffebach 75cdb25
respond to milan 3
pdeffebach a223ece
just metadata changes
pdeffebach 962afa0
just metadat changes 2
pdeffebach e160375
Merge remote-tracking branch 'origin/metadata' into metadata
pdeffebach
@@ -0,0 +1,95 @@
# Defining behavior for DataFrames metadata
struct MetaData
    dict::Dict{Symbol, Vector}
end

MetaData() = MetaData(Dict{Symbol,Vector}())

Base.isequal(x::MetaData, y::MetaData) = isequal(x.dict, y.dict)
Base.:(==)(x::MetaData, y::MetaData) = isequal(x, y)

Base.copy(x::MetaData) = MetaData(copy(x.dict))
Base.deepcopy(x::MetaData) = MetaData(copy(x.dict)) # field is immutable

function Base.getindex(x::MetaData, col_inds::AbstractVector)
    new_dict = copy(x.dict)
    for key in keys(new_dict)
        new_dict[key] = new_dict[key][col_inds]
    end
    MetaData(new_dict)
end

function Base.permute!(x::MetaData, p::AbstractVector)
    for key in keys(x.dict)
        x.dict[key] = permute!(x.dict[key], p)
    end
    nothing
end

function Base.permute(x::MetaData, p::AbstractVector)
    new_metadata = copy(x)
    permute!(new_metadata, p)
end


function newfield!(x::MetaData, ncol::Int, field::Symbol, info)
    x.dict[field] = Union{typeof(info), Nothing}[nothing for i in 1:ncol]
end

function addmeta!(x::MetaData, col_ind::Int, ncol::Int, field::Symbol, info)
    if !haskey(x.dict, field)
        newfield!(x, ncol, field, info)
    end
    x.dict[field][col_ind] = info
end

# For creating a new column in the dataframe
function Base.push!(x::MetaData, info)
    for key in keys(x.dict)
        push!(x.dict[key], info)
    end
end

function Base.insert!(x::MetaData, col_ind::Int, item)
    for key in keys(x.dict)
        insert!(x.dict[key], col_ind, item)
    end
end

function Base.merge!(leftmeta::MetaData, rightmeta::MetaData, leftindex::Index, rightindex::Index)
    # Find the unique columns on the right
    right_and_not_left_names = setdiff(names(rightindex), names(leftindex))
    right_and_not_left_cols = rightindex[right_and_not_left_names]
    # this imitates what's going on with the parent dataframes in merge!
    rightmeta = rightmeta[right_and_not_left_cols]
    rightindex = rightindex[right_and_not_left_names]
    # Find the difference in the keys and allocate if needed
    notonleft = setdiff(keys(rightmeta.dict), keys(leftmeta.dict))
    notonright = setdiff(keys(leftmeta.dict), keys(rightmeta.dict))

    for field in notonleft
        newfield!(leftmeta, length(leftindex), field, nothing)
    end

    for field in notonright
        newfield!(rightmeta, length(rightindex), field, nothing)
    end

    for key in keys(leftmeta.dict)
        leftmeta.dict[key] =
            vcat(leftmeta.dict[key], rightmeta.dict[key])
    end
end

function append(leftmeta::MetaData, rightmeta::MetaData)
    append!(copy(leftmeta), rightmeta)
end

# deleting columns is handled by get_index?
function getmeta(x::MetaData, col_ind::Int, field::Symbol)
    if haskey(x.dict, field)
        return x.dict[field][col_ind]
    else
        error("The field does not exist")
    end
end
@@ -0,0 +1,29 @@
module TestMetaData
using Compat, Compat.Test, DataFrames, StatsBase, Compat.Random
using Suppressor
using Compat: @warn

df1 = DataFrame(a = [1, 2], b = [3, 4])
df2 = DataFrame(c = [3, 4], d = [5, 6])

# Just used to add metadata easily for testing.
metadata!(df1, :a, :label, "A label for variable a")

testdata = DataFrame(variable = names(df1), label =
                     ["A label for variable a",
                      nothing])

@test showmeta(df1) == testdata

mergeddata = merge!(df1, df2)
# df1 has columns a and b; merge! adds c and d, so four labels are expected.
testmergeddata = DataFrame(variable = names(mergeddata),
                           label =
                           ["A label for variable a",
                            nothing,
                            nothing,
                            nothing])

@test showmeta(mergeddata) == testmergeddata

end # module TestMetaData
As you noted in the issue description, this approach is not very efficient, and it doesn't work for `rightmeta` since it shouldn't be modified. Another way of doing this is, in `for key in keys(leftmeta.dict)`, to check whether the key exists in the left and right data frames. If it exists in both, call `vcat` as you currently do. If it exists only in one of the data frames, allocate a `Vector{Union{Nothing, eltype(key_vec)}}` and call `copyto!` to fill the corresponding entries.
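
For illustration, the suggestion above could look roughly like the sketch below. It works on plain `Dict{Symbol, Vector}` values rather than on `MetaData` and `Index`, so it runs on its own; the function name `merge_fields` and the arguments `nleft` and `nright` (the number of columns on each side) are hypothetical and not part of the PR. Unlike the comment, it loops over the union of the keys so that fields present only on the right are also covered, and it builds a new dictionary instead of mutating either input.

```julia
# Hypothetical helper, not part of the PR: merges per-column metadata fields.
# `left` and `right` map a field name to a vector with one entry per column.
function merge_fields(left::Dict{Symbol, Vector}, nleft::Int,
                      right::Dict{Symbol, Vector}, nright::Int)
    merged = Dict{Symbol, Vector}()
    for key in union(keys(left), keys(right))
        if haskey(left, key) && haskey(right, key)
            # Field exists on both sides: concatenate, as the PR does now.
            merged[key] = vcat(left[key], right[key])
        else
            # Field exists on one side only: allocate the full-length vector
            # once, filled with `nothing`, then copy the known entries in.
            onleft = haskey(left, key)
            key_vec = onleft ? left[key] : right[key]
            out = Vector{Union{Nothing, eltype(key_vec)}}(nothing, nleft + nright)
            offset = onleft ? 1 : nleft + 1
            copyto!(out, offset, key_vec, 1, length(key_vec))
            merged[key] = out
        end
    end
    return merged
end

# Usage: `left` has a :label field, `right` has a :unit field, two columns each.
left = Dict{Symbol, Vector}(:label => Union{String, Nothing}["A label for variable a", nothing])
right = Dict{Symbol, Vector}(:unit => Union{String, Nothing}["kg", nothing])
merged = merge_fields(left, 2, right, 2)
```

Because neither input dictionary is written to, `rightmeta` would stay unmodified, and each missing field costs a single allocation instead of a `newfield!` call followed by `vcat`.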