Merge pull request #12 from tk3369/issue-3
Issue 3 implemented
tk3369 authored Dec 30, 2017
2 parents b2ccb3a + 71c1ad1 commit a865410
Showing 8 changed files with 469 additions and 125 deletions.
117 changes: 74 additions & 43 deletions README.md
@@ -20,44 +20,48 @@ Use the `readsas` function to read the file. The result is a dictionary of vari
```julia
julia> using SASLib

julia> x = readsas("test1.sas7bdat")
Read data set of size 10 x 100 in 0.019 seconds
julia> x = readsas("productsales.sas7bdat")
Read data set of size 1440 x 10 in 2.0 seconds
Dict{Symbol,Any} with 16 entries:
:filename => "test1.sas7bdat"
:page_length => 65536
:file_encoding => "wlatin1"
:filename => "productsales.sas7bdat"
:page_length => 8192
:file_encoding => "US-ASCII"
:system_endianness => :LittleEndian
:ncols => 100
:column_types => DataType[Float64, String, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64 Float64, Float64
:data => Dict{Any,Any}(Pair{Any,Any}(:Column60, [2987.0, 8194.0, 9820.0, 8252.0, 9640.0, 9168.0, 7547.0, 1419.0, 4884.0, NaN])
:perf_type_conversion => 0.0052096
:page_count => 1
:column_names => String["Column60", "Column42", "Column68", "Column35", "Column33", "Column1", "Column41", "Column16", "Column72", "Co…
:column_symbols => Symbol[:Column60, :Column42, :Column68, :Column35, :Column33, :Column1, :Column41, :Column16, :Column72, :Column19 ……
:column_lengths => [8, 9, 8, 8, 8, 9, 8, 8, 8, 9 … 8, 8, 8, 5, 8, 8, 8, 9, 8, 8]
:ncols => 10
:column_types => Type[Float64, Float64, Union{AbstractString, Missings.Missing}, Union{AbstractString, Missings.Missing}, Union{AbstractString,
:data => Dict{Any,Any}(Pair{Any,Any}(:QUARTER, [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0,
:perf_type_conversion => 0.0262305
:page_count => 18
:column_names => String["QUARTER", "YEAR", "COUNTRY", "DIVISION", "REGION", "MONTH", "PREDICT", "ACTUAL", "PRODTYPE", "PRODUCT"]
:column_symbols => Symbol[:QUARTER, :YEAR, :COUNTRY, :DIVISION, :REGION, :MONTH, :PREDICT, :ACTUAL, :PRODTYPE, :PRODUCT]
:column_lengths => [8, 8, 10, 10, 10, 10, 10, 8, 8, 8]
:file_endianness => :LittleEndian
:nrows => 10
:perf_read_data => 0.00612195
:column_offsets => [0, 600, 8, 16, 24, 609, 32, 40, 48, 618 … 536, 544, 552, 795, 560, 568, 576, 800, 584, 592]
:nrows => 1440
:perf_read_data => 0.00639309
:column_offsets => [0, 8, 40, 50, 60, 70, 80, 16, 24, 32]
```
The number of columns and rows are returned under the `:ncols` and `:nrows` keys, respectively.
The data, referenced by the `:data` key, is represented as a Dict object with the column symbol as the key.
```julia
julia> x[:data][:Column1]
10-element Array{Float64,1}:
0.636
0.283
0.452
0.557
0.138
0.948
0.162
0.148
NaN
0.663
julia> x[:data][:ACTUAL]
1440-element Array{Float64,1}:
925.0
999.0
608.0
642.0
656.0
948.0
612.0
114.0
685.0
657.0
608.0
353.0
107.0
```
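As a rough illustration of working with the result, the sketch below walks the columns by name. The `x` here is a hand-built stand-in that mimics only a few of the keys shown above (`:nrows`, `:ncols`, `:column_symbols`, `:data`), with the first three rows of two columns copied from the output; it is an assumption made so the snippet runs without a SAS file, not the actual return value of `readsas`.

```julia
# Stand-in for the Dict returned by `readsas` (assumption: only a few keys
# and the first three rows of two columns are mocked here).
x = Dict{Symbol,Any}(
    :nrows => 3,
    :ncols => 2,
    :column_symbols => [:QUARTER, :ACTUAL],
    :data => Dict{Any,Any}(
        :QUARTER => [1.0, 1.0, 1.0],
        :ACTUAL  => [925.0, 999.0, 608.0],
    ),
)

# Iterate in the order given by :column_symbols instead of relying on
# the (unspecified) iteration order of the :data Dict.
for name in x[:column_symbols]
    col = x[:data][name]
    println(name, ": ", eltype(col), ", ", length(col), " values")
end

# Each column is an ordinary Julia array, so the usual functions apply.
total_actual = sum(x[:data][:ACTUAL])
```

Looping over `:column_symbols` rather than over the keys of `:data` is a deliberate choice here, since a `Dict` does not promise any particular key order.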
If you prefer working with a DataFrame, you can easily convert the data:
@@ -67,26 +71,53 @@ julia> using DataFrames

julia> df = DataFrame(x[:data]);

julia> df[:, 1:5]
10×5 DataFrames.DataFrame
│ Row │ Column1 │ Column10 │ Column100 │ Column11 │ Column12 │
├─────┼─────────┼─────────────┼───────────┼──────────┼────────────┤
│ 1 │ 0.636 │ "apple" │ 3230.0 │ NaN │ 1986-07-20 │
│ 2 │ 0.283 │ "apple" │ 4904.0 │ 22.0 │ 1983-07-15 │
│ 3 │ 0.452 │ "apple" │ NaN │ 7.0 │ 1973-11-27 │
│ 4 │ 0.557 │ "dog" │ 8566.0 │ 26.0 │ 1967-01-20 │
│ 5 │ 0.138 │ "crocodile" │ 894.0 │ 11.0 │ 1970-11-29 │
│ 6 │ 0.948 │ "crocodile" │ 6088.0 │ 27.0 │ 1963-01-09 │
│ 7 │ 0.162 │ "" │ 6122.0 │ NaN │ 1979-10-18 │
│ 8 │ 0.148 │ "crocodile" │ 2570.0 │ 5.0 │ 1961-03-15 │
│ 9 │ NaN │ "pear" │ 2709.0 │ 12.0 │ 1964-06-15 │
│ 10 │ 0.663 │ "pear" │ NaN │ 16.0 │ 1985-01-28 │
julia> head(df, 5)
5×10 DataFrames.DataFrame
│ Row │ ACTUAL │ COUNTRY │ DIVISION │ MONTH │ PREDICT │ PRODTYPE │ PRODUCT │ QUARTER │ REGION │ YEAR │
├─────┼────────┼─────────┼───────────┼────────────┼─────────┼───────────┼─────────┼─────────┼────────┼────────┤
│ 1 │ 925.0 │ CANADA │ EDUCATION │ 1993-01-01 │ 850.0 │ FURNITURE │ SOFA │ 1.0 │ EAST │ 1993.0 │
│ 2 │ 999.0 │ CANADA │ EDUCATION │ 1993-02-01 │ 297.0 │ FURNITURE │ SOFA │ 1.0 │ EAST │ 1993.0 │
│ 3 │ 608.0 │ CANADA │ EDUCATION │ 1993-03-01 │ 846.0 │ FURNITURE │ SOFA │ 1.0 │ EAST │ 1993.0 │
│ 4 │ 642.0 │ CANADA │ EDUCATION │ 1993-04-01 │ 533.0 │ FURNITURE │ SOFA │ 2.0 │ EAST │ 1993.0 │
│ 5 │ 656.0 │ CANADA │ EDUCATION │ 1993-05-01 │ 646.0 │ FURNITURE │ SOFA │ 2.0 │ EAST │ 1993.0 │
```
If you only need to read a few columns, just pass an `include_columns` argument:
```
julia> head(DataFrame(readsas("productsales.sas7bdat", include_columns=[:YEAR, :MONTH, :PRODUCT, :ACTUAL])[:data]))
Read data set of size 1440 x 4 in 0.004 seconds
6×4 DataFrames.DataFrame
│ Row │ ACTUAL │ MONTH │ PRODUCT │ YEAR │
├─────┼────────┼────────────┼─────────┼────────┤
│ 1 │ 925.0 │ 1993-01-01 │ SOFA │ 1993.0 │
│ 2 │ 999.0 │ 1993-02-01 │ SOFA │ 1993.0 │
│ 3 │ 608.0 │ 1993-03-01 │ SOFA │ 1993.0 │
│ 4 │ 642.0 │ 1993-04-01 │ SOFA │ 1993.0 │
│ 5 │ 656.0 │ 1993-05-01 │ SOFA │ 1993.0 │
│ 6 │ 948.0 │ 1993-06-01 │ SOFA │ 1993.0 │
```
Likewise, you can read all columns except those listed in the `exclude_columns` argument:
```
julia> head(DataFrame(readsas("productsales.sas7bdat", exclude_columns=[:YEAR, :MONTH, :PRODUCT, :ACTUAL])[:data]))
Read data set of size 1440 x 6 in 0.031 seconds
6×6 DataFrames.DataFrame
│ Row │ COUNTRY │ DIVISION │ PREDICT │ PRODTYPE │ QUARTER │ REGION │
├─────┼─────────┼───────────┼─────────┼───────────┼─────────┼────────┤
│ 1 │ CANADA │ EDUCATION │ 850.0 │ FURNITURE │ 1.0 │ EAST │
│ 2 │ CANADA │ EDUCATION │ 297.0 │ FURNITURE │ 1.0 │ EAST │
│ 3 │ CANADA │ EDUCATION │ 846.0 │ FURNITURE │ 1.0 │ EAST │
│ 4 │ CANADA │ EDUCATION │ 533.0 │ FURNITURE │ 2.0 │ EAST │
│ 5 │ CANADA │ EDUCATION │ 646.0 │ FURNITURE │ 2.0 │ EAST │
│ 6 │ CANADA │ EDUCATION │ 486.0 │ FURNITURE │ 2.0 │ EAST │
```
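The two arguments are complementary, which the following sketch tries to make concrete using plain Julia only. The column list is copied from the `:column_symbols` output above; the variable names are hypothetical, and the snippet does not call SASLib at all, so it only illustrates which columns one would expect in each result.

```julia
# Column symbols as printed for productsales.sas7bdat.
all_columns = [:QUARTER, :YEAR, :COUNTRY, :DIVISION, :REGION,
               :MONTH, :PREDICT, :ACTUAL, :PRODTYPE, :PRODUCT]

# The list passed as include_columns / exclude_columns in the examples.
selected = [:YEAR, :MONTH, :PRODUCT, :ACTUAL]

# Excluding `selected` should leave exactly the columns not in that list.
remaining = setdiff(all_columns, selected)

println("included: ", length(selected), " columns")
println("excluded leaves: ", length(remaining), " columns: ", sort(remaining))
```

The counts line up with the "1440 x 4" and "1440 x 6" sizes reported in the two examples, and the sorted remainder matches the column headers of the second table.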
If you need to read files incrementally:
```julia
handler = SASLib.open("test1.sas7bdat")
handler = SASLib.open("productsales.sas7bdat")
results = SASLib.read(handler, 3) # read 3 rows
results = SASLib.read(handler, 4) # read next 4 rows
SASLib.close(handler) # remember to close the handler when done