Enhance joining and grouping #850

Closed
wants to merge 110 commits into from
110 commits
968e980
RFC: Add compatibility with pre-contrasts ModelFrame constructor (#1042)
Sep 14, 2016
d4ad15b
Reindex transposed sparse contrast matrix into modelmat_cols column-w…
Sep 19, 2016
2931693
Fill existing arrays with scalars (#1057)
TotalVerb Sep 20, 2016
e4662fd
Port to NullableArrays and CategoricalArrays
nalimilan May 6, 2016
9de5c08
Get rid of custom Nullable operators and functions
nalimilan Aug 28, 2016
6ac7549
Fix grouping
nalimilan Aug 29, 2016
653fc1d
Remove custom isnull() definition
nalimilan Aug 29, 2016
a17f264
Remove optimized sorting methods
nalimilan Aug 29, 2016
9a71705
Remove inscrutable FIXME
nalimilan Aug 29, 2016
9f1e5e6
More Julia 0.4 compatibility
nalimilan Aug 29, 2016
a75a4a4
Remove another FIXME
nalimilan Aug 29, 2016
1b44ffe
Remove FIXME about insert!()
nalimilan Aug 29, 2016
0ff4dc8
Remove FIXME about +(::NullableArray{Int}, ::Int)
nalimilan Aug 29, 2016
110deac
Remove FIXME about test/indexing.jl
nalimilan Aug 29, 2016
0ff6373
Remove FIXME about map()
nalimilan Aug 29, 2016
431d135
Fix sortperm() tests
nalimilan Aug 29, 2016
cc87f46
Remove FIXME about predict()
nalimilan Aug 30, 2016
ec9b706
Remove FIXME about head() and tail()
nalimilan Aug 30, 2016
e9a1c8c
Remove FIXME about PooledDataVecs
nalimilan Aug 30, 2016
bf16c5f
Remove unused NominalArray methods
nalimilan Aug 30, 2016
3bb1323
Mention Julia bug in FIXME
nalimilan Aug 31, 2016
95789bf
Bump dependencies on NullableArrays and CategoricalArrays
nalimilan Aug 31, 2016
5c33249
Require NullableArrays 0.0.8
nalimilan Aug 31, 2016
e1df391
Bump CategoricalArrays requirement
quinnj Sep 2, 2016
63c1d96
Fix tests on Julia 0.4
nalimilan Sep 2, 2016
2ec131e
Use CategoricalArray instead of NominalArray
nalimilan Sep 13, 2016
ad75f67
Remove DataArrays benchmarks
nalimilan Sep 13, 2016
492351c
Update docs
nalimilan Sep 22, 2016
d48d7f8
Fix failures introduced when rebasing
nalimilan Sep 22, 2016
f8dc8c6
Update docs to remove references to DataArrays and fully qualify a fe…
quinnj Sep 22, 2016
fde1c96
Cleanup a few more examples
quinnj Sep 22, 2016
a2ae0ca
Update docs to work with NullableArray interiors
quinnj Sep 23, 2016
eddb824
Deprecate pool/pool-bang in favor of categorize/categorize-bang
quinnj Sep 23, 2016
c27e45a
Remove reference to DataArray
quinnj Sep 23, 2016
f158285
Return a Bool for == instead of Nullable{Bool}
quinnj Sep 23, 2016
d082a8d
Fix a bug in hcat with CategoricalArrays
quinnj Sep 23, 2016
07c46d8
Bump DataFrames to julia 0.5-
quinnj Sep 23, 2016
ec0126d
Still compare columns using isequal for now
quinnj Sep 24, 2016
f53aeaf
Remove 2 redundant definitions causing override warnings
quinnj Sep 24, 2016
854ce92
Fix failing tests
quinnj Sep 24, 2016
166475a
Change julia REQUIRE to 0.5 and remove 0.4 testing from travis and ap…
quinnj Sep 24, 2016
044537f
Avoid introducing new loadiris() function
nalimilan Sep 24, 2016
1864086
Make == fall back to isequal() for now
nalimilan Sep 24, 2016
1fe210d
Rename categorize() to categorical()
nalimilan Sep 24, 2016
333ce3e
Fix failure in I/O test
nalimilan Sep 25, 2016
698ba0f
Deprecate pool properly
quinnj Sep 26, 2016
18fd664
Merge pull request #1008 from JuliaStats/nl/nullable
quinnj Sep 26, 2016
6590ac1
Only sort duplicated columns once (#1072)
gustafsson Sep 26, 2016
5998148
collecting with brackets is deprecated (#939)
gustafsson Sep 27, 2016
6c760d9
Fix test failures on master (#1075)
nalimilan Sep 27, 2016
115bb5e
Update Documenter syntax (#966)
MichaelHatherly Sep 27, 2016
576b26b
test empty frames joins
alyst Aug 30, 2015
2b932f9
test empty frames groupby()
alyst Sep 27, 2016
bd3e2b2
more DataFrame assignment tests
alyst Jun 17, 2015
c544b91
readonly AbstractVector interface for Cols
alyst Sep 27, 2016
f21be25
simplify eltypes()
alyst Sep 27, 2016
d2245ef
small cleanups to stack/unstack
alyst Sep 27, 2016
43e4393
immutable GroupApplied, enhance combine()
alyst Sep 27, 2016
06e71f1
aggregate() optimizations
alyst Aug 30, 2016
da389d9
fix groupby() doc
alyst Sep 28, 2016
de55a0e
Add querying section with links to other packages to documentation (#…
davidanthoff Oct 1, 2016
23ec690
Merge pull request #1076 from alyst/misc_fixes
ararslan Oct 1, 2016
1658c35
Add output to LaTeX (useful for IJulia notebook export to PDF) (#845)
maximerischard Oct 3, 2016
400da84
handle `A ~ B - 1` and add tests (#1086)
kleinschmidt Oct 3, 2016
e1c5014
Fix join when mixing NullableArray and Array{Nullable} (#1089)
nalimilan Oct 4, 2016
725a226
Better display of Nullables (#1084)
amellnik Oct 5, 2016
e4ab277
Update StatsBase.df to dof (#1097)
ararslan Oct 7, 2016
203b50f
limit attribute of IOContext is used for html generation (#1099)
jw3126 Oct 11, 2016
b6de65a
Fix docstring example (#1107)
femtotrader Oct 16, 2016
10c4423
Loosen constructor for a DataFrame (#1108)
andyferris Oct 18, 2016
9cae226
Use the tagged version of Documenter (#1109)
davidanthoff Oct 18, 2016
e7ea227
fix typo in Nullable holding 1 example (#1112)
bkamins Oct 22, 2016
3706704
Small docs fixes (#1077)
nalimilan Oct 22, 2016
e418174
Enable doctests (#1110)
davidanthoff Oct 24, 2016
4c47ed3
Add documentation for Query.jl (#1105)
davidanthoff Nov 12, 2016
cd6d749
Juno display (#1125)
MikeInnes Nov 13, 2016
c81d57e
Add querying chapter to table of content (#1129)
davidanthoff Nov 16, 2016
dd2772b
Update joins doc to include rename! (#1131)
dx034 Nov 21, 2016
e205b70
Avoid closing IO unless responsible for opening (#1138)
omus Dec 28, 2016
b44ca70
remove 0.3 @compat
alyst Oct 4, 2016
d441c95
crossjoin: update for 0.5 repeat()
alyst Oct 4, 2016
72f578c
better compat
alyst Oct 1, 2016
590e1e1
fix whitespace
alyst Oct 4, 2016
22e71c2
add get() methods for Index
alyst Jun 16, 2015
be9f57a
simplify column names construction
alyst Oct 4, 2016
07785c9
method to convert SubDataFrame to DataFrame
alyst Jun 17, 2015
e83180e
permute!() and ipermute!() for rows
alyst Oct 4, 2016
10b5ad8
simplify Sort.lt()
alyst Aug 26, 2015
975dd1c
fixup FastPerm missing
alyst Sep 29, 2016
c4460e1
throw ArgumentError for ordering() kwarg check
alyst Sep 29, 2016
58afed2
join(): save two lines
alyst Oct 4, 2016
5a27605
join test for different order of "on" columns
alyst Aug 6, 2015
fe9bba3
join(): enable test for missing on=
alyst Aug 2, 2015
9813e8e
_isnull(A, i) helper methods
alyst Oct 4, 2016
1fe9704
cleanup RepeatedVector indexing
alyst Oct 4, 2016
daa3035
RepeatedVector: NullableArrays support
alyst Oct 4, 2016
fca5c38
DFRowIterator: cache nrow
alyst Sep 28, 2016
2e97974
DataFrameRow: rearrange methods a bit
alyst Sep 29, 2016
01e5b9f
DataFrameRow: comparing rows from different frames
alyst Aug 5, 2015
a7d6215
add isless(DataFrameRow, DataFrameRow)
alyst Aug 26, 2015
8a966d8
hash() and isequal() that require DF and row ix
alyst Aug 26, 2015
2340c92
add helper functions for grouping and joining
alyst Oct 4, 2016
1f9dda0
use RowGroupDict for nonunique()
alyst Aug 24, 2015
9a760dc
more stable join()
alyst Oct 4, 2016
822384f
groupby(): use group_rows()
alyst Aug 24, 2015
a7598a6
sort groups without temporary data frame creation
alyst Aug 26, 2015
5c5f4de
make sorting of row groups optional
alyst Oct 4, 2016
66c986e
group tests with @testset
alyst Oct 4, 2016
a9f0be6
replace padnull!() with resize!()
alyst Oct 5, 2016
76947d4
unsafe_hashindex() -> hash_colel() and @prop_inbnd
alyst Jan 18, 2017
4 changes: 1 addition & 3 deletions .travis.yml
@@ -1,7 +1,6 @@

 language: julia
 julia:
-  - 0.4
   - 0.5
   - nightly
 os:
@@ -15,6 +14,5 @@ script:
   - if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
   - julia --check-bounds=yes -e 'Pkg.clone(pwd()); Pkg.build("DataFrames"); Pkg.test("DataFrames"; coverage=true)'
 after_success:
-  - julia -e 'cd(Pkg.dir("DataFrames")); Pkg.clone("https://github.com/MichaelHatherly/Documenter.jl"); include(joinpath("docs", "make.jl"))'
+  - julia -e 'cd(Pkg.dir("DataFrames")); Pkg.add("Documenter"); Pkg.add("Query"); include(joinpath("docs", "make.jl"))'
   - julia -e 'cd(Pkg.dir("DataFrames")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())'

8 changes: 5 additions & 3 deletions REQUIRE
@@ -1,8 +1,10 @@
-julia 0.4
-DataArrays 0.3.4
-StatsBase 0.8.3
+julia 0.5
+NullableArrays 0.0.8
+CategoricalArrays 0.0.6
+StatsBase 0.11.0
 GZip
 SortingAlgorithms
 Reexport
 Compat 0.8.4
 FileIO 0.1.2
+Juno 0.2.4
2 changes: 0 additions & 2 deletions appveyor.yml
@@ -1,7 +1,5 @@
 environment:
   matrix:
-  - JULIAVERSION: "julialang/bin/winnt/x86/0.4/julia-0.4-latest-win32.exe"
-  - JULIAVERSION: "julialang/bin/winnt/x64/0.4/julia-0.4-latest-win64.exe"
   - JULIAVERSION: "julialang/bin/winnt/x86/0.5/julia-0.5-latest-win32.exe"
   - JULIAVERSION: "julialang/bin/winnt/x64/0.5/julia-0.5-latest-win64.exe"
   - JULIAVERSION: "julianightlies/bin/winnt/x86/julia-latest-win32.exe"
37 changes: 0 additions & 37 deletions benchmark/datamatrix.jl

This file was deleted.

56 changes: 0 additions & 56 deletions benchmark/datavector.jl

This file was deleted.

69 changes: 0 additions & 69 deletions benchmark/results.csv

Large diffs are not rendered by default.

4 changes: 1 addition & 3 deletions benchmark/runbenchmarks.jl
@@ -5,9 +5,7 @@
 using DataFrames
 using Benchmark

-benchmarks = ["datavector.jl",
-              "datamatrix.jl",
-              "io.jl"]
+benchmarks = [ "io.jl"]

 # TODO: Print summary to stdout_stream, while printing results
 # to file with appends.
40 changes: 32 additions & 8 deletions docs/make.jl
@@ -1,23 +1,47 @@
-using Documenter, DataFrames, DataArrays
+using Documenter, DataFrames

 # Build documentation.
 # ====================

 makedocs(
     # options
     modules = [DataFrames],
-    doctest = false,
-    clean = false
+    doctest = true,
+    clean = false,
+    sitename = "DataFrames.jl",
+    format = Documenter.Formats.HTML,
+    pages = Any[
+        "Introduction" => "index.md",
+        "User Guide" => Any[
+            "Getting Started" => "man/getting_started.md",
+            "IO" => "man/io.md",
+            "Joins" => "man/joins.md",
+            "Split-apply-combine" => "man/split_apply_combine.md",
+            "Reshaping" => "man/reshaping_and_pivoting.md",
+            "Sorting" => "man/sorting.md",
+            "Formulas" => "man/formulas.md",
+            "Pooling" => "man/pooling.md",
+            "Querying frameworks" => "man/querying_frameworks.md",
+        ],
+        "API" => Any[
+            "Main types" => "lib/maintypes.md",
+            "Utilities" => "lib/utilities.md",
+            "Data manipulation" => "lib/manipulation.md",
+        ],
+        "About" => Any[
+            "Release Notes" => "NEWS.md",
+            "License" => "LICENSE.md",
+        ]
+    ]
 )

 # Deploy built documentation from Travis.
 # =======================================

-# Needs to install an additional dep, mkdocs-material, so provide a custom `deps`.
-custom_deps() = run(`pip install --user pygments mkdocs mkdocs-material`)

 deploydocs(
     # options
-    deps = custom_deps,
-    repo = "github.com/JuliaStats/DataFrames.jl.git"
+    repo = "github.com/JuliaStats/DataFrames.jl.git",
+    target = "build",
+    deps = nothing,
+    make = nothing,
 )
43 changes: 0 additions & 43 deletions docs/mkdocs.yml

This file was deleted.

19 changes: 11 additions & 8 deletions docs/src/index.md
@@ -2,17 +2,20 @@

 ## Package Manual

-{contents}
-Pages = ["man/getting_started.md", "man/io.md", "man/joins.md", "man/split_apply_combine.md", "man/reshaping_and_pivoting.md", "man/sorting.md", "man/formulas.md", "man/pooling.md"]
-Depth = 2
+```@contents
+Pages = ["man/getting_started.md", "man/io.md", "man/joins.md", "man/split_apply_combine.md", "man/reshaping_and_pivoting.md", "man/sorting.md", "man/formulas.md", "man/pooling.md", "man/querying_frameworks.md"]
+Depth = 2
+```

 ## API

-{contents}
-Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md"]
-Depth = 2
+```@contents
+Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md"]
+Depth = 2
+```

 ## Documentation Index

-{index}
-Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md", "man/io.md"]
+```@index
+Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md", "man/io.md"]
+```
22 changes: 11 additions & 11 deletions docs/src/lib/maintypes.md
@@ -1,16 +1,16 @@

-{meta}
-CurrentModule = DataFrames
+```@meta
+CurrentModule = DataFrames
+```

 # Main Types

-{index}
-Pages = ["maintypes.md"]

-...

-{docs}
-AbstractDataFrame
-DataFrame
-SubDataFrame
+```@index
+Pages = ["maintypes.md"]
+```

+```@docs
+AbstractDataFrame
+DataFrame
+SubDataFrame
+```
33 changes: 18 additions & 15 deletions docs/src/lib/manipulation.md
@@ -1,22 +1,25 @@

-{meta}
-CurrentModule = DataFrames
+```@meta
+CurrentModule = DataFrames
+```

 # Data Manipulation

-{index}
-Pages = ["manipulation.md"]

+```@index
+Pages = ["manipulation.md"]
+```

 ## Joins

-{docs}
-join

+```@docs
+join
+```

 ## Reshaping

-{docs}
-melt
-stack
-unstack
-stackdf
-meltdf
+```@docs
+melt
+stack
+unstack
+stackdf
+meltdf
+```
44 changes: 21 additions & 23 deletions docs/src/lib/utilities.md
@@ -1,27 +1,25 @@

-{meta}
-CurrentModule = DataFrames
+```@meta
+CurrentModule = DataFrames
+```

 # Utilities

-{index}
-Pages = ["utilities.md"]
+```@index
+Pages = ["utilities.md"]
+```

-...

-{docs}
-eltypes
-head
-complete_cases
-complete_cases!
-describe
-dump
-names!
-nonunique
-rename
-rename!
-tail
-unique
-unique!


+```@docs
+eltypes
+head
+complete_cases
+complete_cases!
+describe
+dump
+names!
+nonunique
+rename
+rename!
+tail
+unique
+unique!
+```