Enhance RDA #751

alyst · 2014-12-30T12:27:18Z

Various enhancements to RDA import:

- Refactor RDA: introduce the RVector{T} type and RDAContext to minimize code duplication.
- Fix NA/NaN recognition for numeric and complex vectors.
- Allow loading dataframe column names as is.
- Allow automatic conversion of R dataframes into DataFrame objects within read_rda().
- Additional tests.

coveralls · 2014-12-30T12:32:23Z

Coverage increased (+0.6%) when pulling 3722beb on alyst:enhance_RDA into 714e633 on JuliaStats:master.

- add ROBJ and RVector{T} to minimize code duplication
- add RDAIO and RDAContext to store the RDA stream state and props and minimize the number of parameters passed to readxxx()
- rename all funcs that read from stream into readxxx()
- Dict-based dispatch of readxxx() methods
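The Dict-based dispatch mentioned in the last item can be sketched roughly as follows. This is a toy illustration in Python, not the PR's Julia code: `ReadContext`, `read_item` and the whitespace-token input format are hypothetical stand-ins for RDAContext and the readxxx() functions. Only the numeric SEXP type codes (13, 14, 16) are taken from R itself.

```python
from typing import Callable, Dict, Iterable, List

# SEXP type codes as used by R (INTSXP, REALSXP, STRSXP).
INTSXP, REALSXP, STRSXP = 13, 14, 16


class ReadContext:
    """Hypothetical stand-in for RDAContext: keeps the stream state in one
    object so reader functions need only a single parameter."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._tokens = iter(tokens)

    def next_token(self) -> str:
        return next(self._tokens)


def read_int_vector(ctx: ReadContext) -> List[int]:
    n = int(ctx.next_token())  # length prefix, then n elements
    return [int(ctx.next_token()) for _ in range(n)]


def read_real_vector(ctx: ReadContext) -> List[float]:
    n = int(ctx.next_token())
    return [float(ctx.next_token()) for _ in range(n)]


def read_string_vector(ctx: ReadContext) -> List[str]:
    n = int(ctx.next_token())
    return [ctx.next_token() for _ in range(n)]


# The dispatch table: type code -> reader function.
READERS: Dict[int, Callable[[ReadContext], list]] = {
    INTSXP: read_int_vector,
    REALSXP: read_real_vector,
    STRSXP: read_string_vector,
}


def read_item(ctx: ReadContext) -> list:
    """Read a type code, then hand off to the matching reader."""
    type_code = int(ctx.next_token())
    reader = READERS.get(type_code)
    if reader is None:
        raise ValueError(f"unsupported type code: {type_code}")
    return reader(ctx)
```

Supporting a new element type then means writing one reader and adding one table entry, with no changes to `read_item` itself.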

- NA is not NaN in R, but Julia was treating R NaNs as NA for RNumeric
- since NaN != NaN for any NaN number, to properly detect NA the `===` op should be used
- add NA/NaN tests
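To make the distinction concrete: R stores a numeric NA as a NaN whose low 32 bits hold the value 1954, so an ordinary `==` comparison cannot find it and the bits have to be inspected. The sketch below is a Python analogue, not the PR's code. It follows the check R itself performs (NaN plus a low-word test) rather than Julia's whole-value `===`, and the helper names are hypothetical.

```python
import math
import struct

# Bit pattern of R's numeric NA: a NaN with 1954 in the low 32 bits.
R_NA_BITS = 0x7FF00000000007A2
R_NA_LOW_WORD = 1954


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits: int) -> float:
    """Build a float from a 64-bit IEEE 754 pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_r_na(x: float) -> bool:
    """True only for NaNs carrying R's NA payload in the low word.

    Checking just the low word keeps the test valid even if the
    platform has set the quiet-NaN bit in the high word."""
    return math.isnan(x) and (float_bits(x) & 0xFFFFFFFF) == R_NA_LOW_WORD


r_na = bits_to_float(R_NA_BITS)
plain_nan = float("nan")
```

Both `r_na` and `plain_nan` satisfy `math.isnan`, and neither equals itself under `==`. Only `is_r_na` tells them apart, which is why an importer has to check the payload before deciding whether a value is missing.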

coveralls · 2014-12-30T12:45:03Z

Coverage increased (+0.6%) when pulling fb4f568 on alyst:enhance_RDA into 714e633 on JuliaStats:master.

- if true (default behaviour) R column names are checked to be valid Julia identifiers and fixed, if necessary
- if false, column names are imported as is
- test added
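The behaviour of such a flag can be sketched as below. This is a hypothetical Python analogue: `fix_colname` and `import_colnames` are illustrative names, and the replacement rules follow Python's identifier rules (restricted to ASCII), not the rules DataFrames.jl applied for Julia identifiers.

```python
import keyword
import re
from typing import Iterable, List


def fix_colname(name: str) -> str:
    """Turn an arbitrary column name into a valid ASCII Python identifier.

    Assumed rules, for illustration only: replace every character outside
    [A-Za-z0-9_] with "_", prefix names that are empty or start with a
    digit, and prefix reserved words."""
    fixed = re.sub(r"\W", "_", name, flags=re.ASCII)
    if not fixed or fixed[0].isdigit():
        fixed = "x" + fixed
    if keyword.iskeyword(fixed):
        fixed = "_" + fixed
    return fixed


def import_colnames(names: Iterable[str], fixcolnames: bool = True) -> List[str]:
    """Return column names either fixed (the default) or exactly as given."""
    if fixcolnames:
        return [fix_colname(n) for n in names]
    return list(names)
```

With `fixcolnames=False` the names pass through untouched, which preserves any metadata encoded in them at the cost of attribute-style access.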

- support passing keyword options to read_rda()
- add support for convertdataframes= option
- add support for fixcolnames= option

coveralls · 2014-12-30T20:19:22Z

Coverage increased (+0.6%) when pulling 8e604e0 on alyst:enhance_RDA into 714e633 on JuliaStats:master.

Enhance RDA Thanks for doing this.

This was discussed and intentionally left out for all IO methods. The plan is for `df.colname` to be the idiomatic way of specifying a column in the near future (it already works on some experimental AbstractDataFrame types), and `df.col.name`, for example, can't be parsed as desired. (`.` is meaningful syntax in Julia, so using `col.name` in place of a valid identifier is like using `col+name`, which wouldn't be valid in R, for example.) It's trivial to work around in user code if the user insists, but it doesn't belong in any package code for now, though adding it uniformly to all IO methods may be revisited later if the roadmap changes.

garborg · 2014-12-31T22:55:06Z

Thanks for submitting this -- it was a major improvement! As a heads up, fixcolnames was removed in the commit linked above -- more context in that commit message. Thanks again.

alyst · 2015-01-01T21:38:04Z

Thanks for merging my PR that fast! And no problems with fixcolnames removal. :)

Regarding the `df.colname` syntax discussed in f3a89d7 -- I think it's a very nice idea indeed. In R, where `df$colname` serves the same purpose, it's also possible to use that syntax with non-standard characters in column names via "symbolic quotes", e.g. ``df$`col.name!` ``. Maybe DataFrames.jl could support a similar approach as well? I just don't know if there are symbolic quotes in Julia, but it would be nice to have them as a language feature.
Some R packages produce dataframes with non-standard column names (e.g. `2.5%` and `25%` in RStan). There are also cases where column names contain metadata and have to be parsed. If such dataframes were imported into Julia with modified column names, individual workarounds would have to be written.

garborg · 2015-01-01T22:13:59Z

Those are good points and suggestions. Metadata / pretty printing headers, etc. have gotten some discussion, and there seemed to be agreement that they're important -- just perhaps a little ways off while we wait and see how dot-overloading is implemented in Base, play catch up on basic functionality, and figure out even what DataFrame objects should look like in Julia (see #744, for recent discussion).

alyst added 3 commits December 30, 2014 13:37

fix data.frame detection

d3e92f4

fix NA and NaN detection

6b12e89

alyst force-pushed the enhance_RDA branch from 3722beb to fb4f568 Compare December 30, 2014 12:39

alyst added 4 commits December 30, 2014 21:12

add fixcolnames param to DataFrames(RList)

8cdb8e7

add tests for complex numbers import from RDA

aa59874

add test for ASCII RDA

16aa762

support automatic RDA dataframes conversion

8e604e0

alyst force-pushed the enhance_RDA branch from fb4f568 to 8e604e0 Compare December 30, 2014 20:13

dmbates added a commit that referenced this pull request Dec 30, 2014

Merge pull request #751 from alyst/enhance_RDA

4035fda

Enhance RDA Thanks for doing this.

dmbates merged commit 4035fda into JuliaData:master Dec 30, 2014

alyst deleted the enhance_RDA branch September 25, 2017 14:37

nalimilan pushed a commit that referenced this pull request May 26, 2022

Merge pull request #751 from alyst/enhance_RDA

8ae13e8

Enhance RDA Thanks for doing this.
