Enhance RDA #751

Merged: 7 commits, Dec 30, 2014

alyst (Contributor) commented Dec 30, 2014

Various enhancements to RDA import

  1. Refactor RDA: introduce the RVector{T} type and RDAContext to minimize code duplication
  2. Fix NA/NaN recognition for numeric and complex vectors.
  3. Allow loading dataframe column names as is.
  4. Allow automatic conversion of R dataframes into DataFrame objects within read_rda().
  5. Additional tests.
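Item 1, together with the refactoring commit message further down, describes a context object that holds the stream state plus Dict-based dispatch of the readxxx() functions. The Python sketch that follows illustrates that pattern; it is not the PR's Julia code. It assumes an XDR (big-endian) stream already positioned at the start of an item, handles only integer (SEXPTYPE 13) and double (SEXPTYPE 14) vectors, and ignores the file header, attributes, and long vectors. RDAContext and RVector echo names from the PR, while read_item, read_integers, read_doubles and READERS are hypothetical.

```python
import io
import struct
from dataclasses import dataclass, field

# R SEXPTYPE codes used in R's serialization format (subset only).
INTSXP = 13
REALSXP = 14


@dataclass
class RVector:
    """Hypothetical Python analogue of RVector{T}: payload plus R attributes."""
    sexptype: int
    data: list
    attrs: dict = field(default_factory=dict)


class RDAContext:
    """Holds the stream state so each reader needs a single argument."""

    def __init__(self, stream):
        self.stream = stream

    def read_int(self):
        # XDR integers: 4 bytes, big-endian, two's complement.
        return struct.unpack(">i", self.stream.read(4))[0]

    def read_double(self):
        # XDR doubles: 8 bytes, big-endian IEEE 754.
        return struct.unpack(">d", self.stream.read(8))[0]


def read_integers(ctx, flags):
    # `flags` is passed so readers could inspect attribute bits (ignored here).
    n = ctx.read_int()
    return RVector(INTSXP, [ctx.read_int() for _ in range(n)])


def read_doubles(ctx, flags):
    n = ctx.read_int()
    return RVector(REALSXP, [ctx.read_double() for _ in range(n)])


# Dict-based dispatch: SEXPTYPE code -> reader function.
READERS = {
    INTSXP: read_integers,
    REALSXP: read_doubles,
}


def read_item(ctx):
    flags = ctx.read_int()
    sexptype = flags & 0xFF  # the low 8 bits of the flags word hold the type
    try:
        reader = READERS[sexptype]
    except KeyError:
        raise NotImplementedError(f"SEXPTYPE {sexptype} not supported") from None
    return reader(ctx, flags)


payload = struct.pack(">iiiii", INTSXP, 3, 1, 2, 3)
print(read_item(RDAContext(io.BytesIO(payload))).data)  # [1, 2, 3]
```

Supporting another R type then means registering one more reader in READERS rather than growing an if/else chain, which is one way such a table cuts down on duplicated branching.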

coveralls commented:
Coverage increased (+0.6%) when pulling 3722beb on alyst:enhance_RDA into 714e633 on JuliaStats:master.

- add ROBJ and RVector{T} to minimize code duplication
- add RDAIO and RDAContext to store the RDA stream state and properties
  and minimize the number of parameters passed to readxxx()
- rename all funcs that read from stream into readxxx()
- Dict-based dispatch of readxxx() methods
- NA is not NaN in R, but Julia was treating R NaNs as NA for RNumeric
- since NaN != NaN for every NaN value, the `===` operator
  must be used to detect NA properly
- add NA/NaN tests
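The NA/NaN fix hinges on the bit pattern: R's NA_real_ is itself a NaN, distinguished by the value 1954 in its low 32-bit word, so no value comparison can single it out. Julia's `===` compares Float64 bit patterns; the Python sketch below makes the same distinction by inspecting the raw 64-bit integer, following the rule used by R's C-level R_IsNA(). classify_r_double is a hypothetical helper, and its input is assumed to be the 8 big-endian bytes of one double as stored in an XDR RDA stream.

```python
import struct

# R's NA_real_ is a NaN whose low 32-bit word equals 1954 (0x7A2).
R_NA_LOW_WORD = 1954


def classify_r_double(raw: bytes) -> str:
    """Classify 8 big-endian bytes as 'NA', 'NaN' or 'number'.

    Works on raw bits because every NaN, NA included, compares unequal
    to itself, so `x == NA` can never be true.
    """
    bits = int.from_bytes(raw, "big")
    exponent = (bits >> 52) & 0x7FF          # sign bit is masked off
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x7FF and mantissa != 0:  # IEEE 754 NaN
        return "NA" if (bits & 0xFFFFFFFF) == R_NA_LOW_WORD else "NaN"
    return "number"                          # finite values and +/-Inf


na_bytes = bytes.fromhex("7ff00000000007a2")  # R's NA_real_, big-endian
print(classify_r_double(na_bytes))                         # NA
print(classify_r_double(struct.pack(">d", float("nan"))))  # NaN
```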
coveralls commented:
Coverage increased (+0.6%) when pulling fb4f568 on alyst:enhance_RDA into 714e633 on JuliaStats:master.

- if true (the default), R column names are checked to be valid
  Julia identifiers and fixed if necessary
- if false, column names are imported as is
- test added
- support passing keyword options to read_rda()
- add support for convertdataframes= option
- add support for fixcolnames= option
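One way to picture the fixcolnames behaviour is the Python sketch below. The rule in fix_colname (replace anything outside [A-Za-z0-9_] with `_`, and prefix `x` when the result starts with a digit) is an assumption chosen for illustration, not the rule the PR implemented, and import_colnames is likewise hypothetical.

```python
import re


def fix_colname(name: str) -> str:
    """Hypothetical fixcolnames-style rule (illustration only)."""
    # Replace every character that is not an ASCII letter, digit or "_".
    fixed = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Identifiers may not start with a digit (or be empty): add a prefix.
    if not fixed or fixed[0].isdigit():
        fixed = "x" + fixed
    return fixed


def import_colnames(names, fixcolnames=True):
    """Mimic the option: fix names when True, keep them as is when False."""
    return [fix_colname(n) for n in names] if fixcolnames else list(names)


print(import_colnames(["col.name", "col_name", "2.5%"]))
# ['col_name', 'col_name', 'x2_5_']
print(import_colnames(["col.name", "2.5%"], fixcolnames=False))
# ['col.name', '2.5%']
```

The first call shows why such an option is lossy: the distinct R names `col.name` and `col_name` collapse to the same identifier.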
coveralls commented:
Coverage increased (+0.6%) when pulling 8e604e0 on alyst:enhance_RDA into 714e633 on JuliaStats:master.

dmbates added a commit that referenced this pull request Dec 30, 2014
Enhance RDA

Thanks for doing this.
dmbates merged commit 4035fda into JuliaData:master Dec 30, 2014
garborg added a commit that referenced this pull request Dec 31, 2014
This was discussed and intentionally left out for all IO methods.

The plan is for `df.colname` to be the idiomatic way of specifying
a column in the near future (it already works on some experimental
AbstractDataFrame types), and `df.col.name`, for example, can't be
parsed as desired. (`.` is meaningful syntax in Julia, so using
`col.name` in place of a valid identifier is like using `col+name`,
which wouldn't be valid in R, for example.)

It's trivial to work around in user code if the user insists, but
it doesn't belong in package code for now. Adding it uniformly to
all IO methods may be revisited later if the roadmap changes.
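The parsing argument holds in any language where `.` means field access. Python serves here only as a stand-in, because its standard ast module makes the parse tree easy to inspect: the sketch shows that df.col.name is read as (df.col).name, not as a lookup of a column named "col.name", just as col+name is read as an addition rather than as one name.

```python
import ast

# "." is attribute access, so this nests: Attribute(Attribute(df, col), name).
dotted = ast.parse("df.col.name", mode="eval").body
# "+" is an operator, so this is a binary operation, not a single name.
plus = ast.parse("col+name", mode="eval").body

print(type(dotted).__name__, dotted.attr, dotted.value.attr, dotted.value.value.id)
# Attribute name col df
print(type(plus).__name__)
# BinOp
```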
garborg (Contributor) commented Dec 31, 2014

Thanks for submitting this -- it was a major improvement! As a heads up, fixcolnames was removed in the commit linked above -- more context in that commit message. Thanks again.

alyst (Contributor, Author) commented Jan 1, 2015

Thanks for merging my PR so fast! And no problem with the fixcolnames removal. :)

Regarding the df.colname syntax discussed in f3a89d7 -- I think it's a very nice idea indeed. In R, where df$colname serves the same purpose, that syntax also works with non-standard characters in column names via backtick "symbolic quotes", e.g. df$`col.name!`. Maybe DataFrames.jl could support a similar approach? I just don't know whether Julia has symbolic quotes, but it would be nice to have them as a language feature.
Some R packages produce data frames with non-standard column names (e.g. `2.5%` and `25%` in RStan). There are also cases where column names contain metadata that has to be parsed. If such data frames were imported into Julia with modified column names, individual workarounds would have to be written.
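The metadata point can be made concrete with RStan-style quantile columns. The Python sketch below uses a hypothetical helper, quantile_from_colname, to recover the probability encoded in labels such as `2.5%` and `25%`; once a name has been rewritten into an identifier, that information can no longer be read back from the name alone.

```python
import re

# Matches percent labels such as "2.5%", "25%" or "97.5%".
_PERCENT_LABEL = re.compile(r"^(\d+(?:\.\d+)?)%$")


def quantile_from_colname(name):
    """Return the probability encoded in a percent label, or None."""
    match = _PERCENT_LABEL.match(name)
    return float(match.group(1)) / 100 if match else None


for col in ["mean", "2.5%", "25%", "97.5%", "x2_5_"]:
    print(col, quantile_from_colname(col))
```

For instance, `2.5%` yields 0.025, while a hypothetical sanitized form such as `x2_5_` yields None.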

garborg (Contributor) commented Jan 1, 2015

Those are good points and suggestions. Metadata, pretty-printing headers, etc. have gotten some discussion, and there seemed to be agreement that they're important -- just perhaps a little way off while we wait to see how dot-overloading is implemented in Base, play catch-up on basic functionality, and figure out what DataFrame objects should even look like in Julia (see #744 for recent discussion).

alyst deleted the enhance_RDA branch September 25, 2017 14:37
nalimilan pushed a commit that referenced this pull request May 26, 2022
nalimilan pushed a commit that referenced this pull request May 26, 2022
4 participants