SASDataResult object #30
Comments
Breaking changes. Do you guys have time to review the new design? Appreciate your help. |
Thanks for the ping. In general I like it, but I have some questions.
I think moving that direction would definitely address some of the current shortcomings of the package, so I'm in favor of it. |
Thanks for the detailed feedback.
|
I think this looks great. At some point it might be neat to be able to stream the data out as an iterator of named tuples without the overhead of ever allocating an array. That can be quite convenient in combination with Query.jl. But, that will require quite a different API and maybe is just an addition for a future iteration? |
Yes, I think that could be a separate enhancement. Are you thinking about integrating with IterableTables or implementing DataStreams API or something different? |
IterableTables. Or rather, I'm thinking that StatFiles.jl might use this package instead of ReadStat.jl for the file formats supported. StatFiles.jl is just an iterable tables implementation. |
That's alright. I know you already have tons of packages to worry about. Let me know if you need help integrating with StatFiles.jl. However, StatFiles.jl does not seem to have a streaming iterator. Please correct me if I'm wrong. Given that SASLib can read the file incrementally, it would be possible to build a buffered DataStream source. As Query.jl can take data streams, it would directly support #26. |
StatFiles.jl does produce an iterator of named tuples, the iterator table interface is by definition streaming (it is just an iterator of named tuples, more or less). BUT, the implementation is pretty stupid: it first reads everything into arrays, and then streams things from these arrays. The reason for that is simply that ReadStat.jl returns full arrays (and also that this was the quickest way to implement things). I think one could probably pretty easily change StatFiles.jl into the kind of buffered stream with the implementation of SASLib.jl as it stands right now. But one could probably go one step further: the iterator in StatFiles.jl could literally just read every row on demand from disc, i.e. we could even get rid of the buffer. All of these strategies though would read everything from disc, even if a user only wants to use a small part of it. If we query later (e.g. |
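A minimal sketch of the "read into arrays first, then stream" strategy described above, assuming the columns are already held in a named tuple of vectors. The `rows` helper and the sample data are hypothetical and are not taken from StatFiles.jl or SASLib.jl.

```julia
# Hypothetical column store: one vector per column, as a reader might return.
cols = (id = [1, 2, 3], name = ["a", "b", "c"])

# Lazily yield one named tuple per row. `map` over a NamedTuple keeps the
# field names, so each row has the same names as the columns.
rows(cols::NamedTuple) = (map(col -> col[i], cols) for i in eachindex(first(cols)))

for row in rows(cols)
    println(row.id, " ", row.name)
end
```

The generator never materializes a vector of rows, but the columns themselves are still fully loaded; reading each row on demand from disk would need a reader that exposes row-level access.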
@tk3369 I do suppose that the decision to bring in or not bring in @davidanthoff The plans to make things stream right from disc sound great, but I agree with your sentiment earlier that perhaps that is best left for a future iteration. The integration with StatFiles.jl as it is currently would be a big step forward though. |
When you start working on this, will you let us know what branch to look at? |
See |
Just updated user guide (README.md) in the same branch. |
SASLib currently returns a `Dict` object as the result of reading the file. That decision was made during the early stage of development to make development easier, e.g. Revise.jl does not work whenever there are struct changes, if I remember correctly. I think it would be beneficial to migrate to a struct so the result is more sane and we can write additional functions that operate on it.
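To illustrate the idea, here is a rough sketch of what a struct-based result could look like and how functions could then dispatch on it. The type name `SASResult`, its fields, and the methods shown are assumptions made for illustration only, not the actual SASLib design.

```julia
# Hypothetical result type; field names are illustrative assumptions.
struct SASResult
    data::Dict{Symbol,Vector}   # column name => column values
    names::Vector{Symbol}       # column order as read from the file
    nrows::Int                  # number of rows read
end

# Convenience constructor that derives the row count from the first column.
function SASResult(data::Dict{Symbol,Vector}, names::Vector{Symbol})
    nrows = isempty(names) ? 0 : length(data[first(names)])
    return SASResult(data, names, nrows)
end

# With a concrete type, helper functions can dispatch on the result.
Base.size(r::SASResult) = (r.nrows, length(r.names))
Base.getindex(r::SASResult, col::Symbol) = r.data[col]

raw = Dict{Symbol,Vector}(:id => [1, 2, 3], :name => ["a", "b", "c"])
rs = SASResult(raw, [:id, :name])
println(size(rs))
println(rs[:name])
```

A plain `Dict` cannot carry methods like `size` with table semantics without type piracy, which is one reason a dedicated struct may be easier to extend.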