Add extra conversion test for incomplete files
- Include a new complete 1.5MB sas7bdat file to test against.
- Also include a truncated 512KB version of the same file to verify partial read behavior.
- Add a test that reads the incomplete file and proves that all the important SAS file metadata is contained at the beginning of the file. Reading the incomplete stream was expected to produce an error, but it did not. Instead, the result is a partial CSV file containing garbage data for the missing rows, with no clear indication of where parsing failed. The test verifies the last correctly written row and the first garbage row after it.
- Requires regenerating the checksums in columnize_checksums.json: set generateColumnize to true in columnize_test.go and re-run TestGenerateColumnize.
- Also capture CSV equivalents of the converted project.sas7bdat and project_incomplete.sas7bdat to reflect what the conversions actually look like.

NOTE: the original upstream source data is formatted a little differently within the CSV. In the original source data, values are quoted, the precision of the RAD field is different, and the columns appear in a different order. Another conversion tool similarly uses values like 96 rather than 96.000000 for the RAD field. It's unclear whether this is a bug in this version of the library.

project-source.csv is from the AHS source
project-converted.csv is converted by https://dumbmatter.com/sas7bdat/

NOTE: project_incomplete.csv contains exactly the converted output, which includes a lot of extraneous data that could not have been read from the original file. This clearly seems like a bug, but the file documents the current behavior for partial files accurately. A future PR will try to address this bug.
- Good sources of test data files:
  https://github.com/xiaodaigh/sas7bdat-resources
  https://github.com/olivia76/cpp-sas7bdat/tree/main/test
  https://github.com/tk3369/SASLib.jl/tree/master/test

project.sas7bdat is from: https://www.census.gov/programs-surveys/ahs/data/2013/ahs-2013-public-use-file--puf-/ahs-2013-national-public-use-file--puf-.html