SASLib produces different (and sometimes obviously incorrect) output vs. readstat/ReadStat.jl #80

Open
kleinschmidt opened this issue Jul 21, 2022 · 3 comments

Comments

@kleinschmidt

I'm sorry for a vague bug report, but I can't share the data files we're dealing with here since they're confidential.

But for both of the sas7bdat files I've tried with SASLib.jl, the output is either obviously wrong or diverges from what readstat (with CSV export + CSV.jl) and ReadStat.jl (which uses the C API of readstat directly) produce. By "obviously wrong" I mean that, for one file, SASLib produces a table with the correct schema (types and column names) but with all 0.0/"" values. For the other file, the structure again appears to be correct, but some values are incorrect (e.g., a bunch of numeric values are different; the strings appear to be okay).

Again, sorry I can't share any more details about this but I'd be willing to do some debugging if you have suggestions about where to start!
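
One way to narrow down a divergence like this is to diff the two readers' outputs column by column. The following is a minimal sketch, not part of either package: `column_mismatches` and `same_value` are hypothetical helper names, and it assumes both results have already been converted to NamedTuples of equal-length vectors (for example via `Tables.columntable`, assuming both readers expose the Tables.jl interface).

```julia
# Treat two values as the same if they are `isequal` (so NaN matches NaN and
# missing matches missing) or, for real numbers, within the tolerance `atol`.
same_value(x::Real, y::Real, atol) = isequal(x, y) || abs(x - y) <= atol
same_value(x, y, atol) = isequal(x, y)

# Compare two column tables (NamedTuples of vectors) column by column.
# Returns a Dict mapping each differing column name to the row indices
# where the two tables disagree. Columns that match are omitted.
function column_mismatches(a::NamedTuple, b::NamedTuple; atol::Real = 0.0)
    keys(a) == keys(b) || error("column names differ: $(keys(a)) vs $(keys(b))")
    mismatches = Dict{Symbol,Vector{Int}}()
    for name in keys(a)
        x, y = a[name], b[name]
        length(x) == length(y) || error("row counts differ in column $name")
        bad = [i for i in eachindex(x, y) if !same_value(x[i], y[i], atol)]
        isempty(bad) || (mismatches[name] = bad)
    end
    return mismatches
end

# Toy data standing in for the two readers' outputs.
a = (x = [1.0, 2.0, NaN], s = ["a", "b", "c"])
b = (x = [1.0, 0.0, NaN], s = ["a", "b", "c"])
result = column_mismatches(a, b)
```

The affected columns and row ranges may then hint at whether the problem is tied to a particular column type or to a region of the file.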

@tk3369
Owner

tk3369 commented Jul 23, 2022

Might be related to #53. Unfortunately I have been really busy lately. If you can fix this bug then you'll be my hero :-)

@tk3369
Owner

tk3369 commented Jul 23, 2022

On another note, I have also seen issues when the SAS file is compressed. If you also control the upstream data pipeline, you could experiment with different compression options.

@tk3369
Owner

tk3369 commented Jul 27, 2022

It would be helpful if you could take a dataset, mask the data with random values, and share that file for testing. A smaller file is also easier to work with, as long as it still reproduces the same problem.
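
As a rough illustration of the masking step, the sketch below replaces every value in an in-memory column table with a random value of the same type, keeping column names, element types, and row count. `mask_column` and `mask_table` are hypothetical names, only float, integer, and string columns are handled, and writing the masked table back to a sas7bdat file is outside the scope of the sketch (that would have to be done with whatever tool produced the original files).

```julia
using Random

# Replace each column with random values of the same element type and length.
mask_column(rng, col::AbstractVector{<:AbstractFloat}) = rand(rng, eltype(col), length(col))
mask_column(rng, col::AbstractVector{<:Integer}) = rand(rng, eltype(col), length(col))
# Strings keep their original length in characters.
mask_column(rng, col::AbstractVector{<:AbstractString}) =
    [randstring(rng, length(s)) for s in col]
mask_column(rng, col::AbstractVector) =
    error("no masking rule for element type $(eltype(col))")

# Mask every column of a NamedTuple-of-vectors table; names are preserved.
mask_table(tbl::NamedTuple; rng = Random.default_rng()) =
    map(col -> mask_column(rng, col), tbl)

# Toy data standing in for a confidential dataset.
original = (id = [1, 2, 3], score = [0.5, 1.5, 2.5], name = ["ann", "bo", "cyd"])
masked = mask_table(original; rng = MersenneTwister(1))
```

Masking changes the stored values, so it is worth re-checking that the masked file still triggers the same incorrect output before sharing it.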


2 participants