This repository has been archived by the owner on Jun 28, 2021. It is now read-only.

Preserve numbers within quotes as String with auto_parse set to true #182

Closed
saswatds opened this issue Mar 19, 2018 · 2 comments

Comments

@saswatds

Consider the following CSV file:

nameWithComma , nameWithoutComma ,numString, num
"Tim, Lee", "Tim \\\"TDOS\\\" Lee","1", 1

The parser converts both numString and num to integers. We want numString to remain a string instead.

One way to achieve this would be to use a custom auto_parse function with the quote option set to false, manually check whether each value is enclosed in quotes, and skip the integer conversion when it is.
But as a side effect, "Tim, Lee" then gets split into two values, Tim and Lee, which is not what we want.

Could you suggest a way to achieve this preservation?
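
To make the requirement above concrete, here is a minimal sketch of a quote-aware field splitter written in plain JavaScript. It is not csv-parse's API: `splitLine` and `castUnquoted` are hypothetical helpers, the sketch handles a single line only, and it assumes quotes inside a quoted field are escaped by doubling them (`""`) rather than with backslashes as in the sample file.

```javascript
// Hypothetical helper, not part of csv-parse: split one CSV line into fields
// and remember whether each field was enclosed in double quotes.
function splitLine(line) {
  const fields = [];
  let value = '';
  let quoted = false;   // the current field opened with a double quote
  let inQuotes = false; // we are currently between the opening and closing quote

  const pushField = () => {
    // Unquoted fields are trimmed; quoted fields keep their content verbatim.
    fields.push({ value: quoted ? value : value.trim(), quoted });
    value = '';
    quoted = false;
  };

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        value += '"'; // doubled quote is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        value += ch; // commas inside quotes are ordinary characters
      }
    } else if (ch === ',') {
      pushField();
    } else if (ch === '"' && !quoted && value.trim() === '') {
      value = ''; // drop whitespace before the opening quote
      quoted = true;
      inQuotes = true;
    } else if (!quoted) {
      value += ch; // text after a closing quote is ignored in this sketch
    }
  }
  pushField();
  return fields;
}

// Convert to an integer only when the field was NOT quoted.
function castUnquoted(field) {
  if (!field.quoted && /^-?\d+$/.test(field.value)) {
    return parseInt(field.value, 10);
  }
  return field.value;
}

const row = splitLine('"Tim, Lee","1", 1').map(castUnquoted);
console.log(row);
```

With this approach "Tim, Lee" stays a single value because the comma is consumed while inside quotes, and only the unquoted 1 is converted.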

@wdavidw
Member

wdavidw commented Mar 19, 2018

I understand your use case, but as far as CSV is concerned, there is no reason to treat a field surrounded by double quotes differently from a field without them. The "auto_parse" option is intrusive and, as such, has limits. If you need full control, a post-parse stream transformer is needed to apply custom user functions. This is the reason why we ship stream-transform. Do you agree?
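
As a rough illustration of that post-parse step, the sketch below leaves every value as a string (as if auto_parse were disabled) and then converts only the columns named explicitly. `castRecord` and `columnCasters` are hypothetical names, and the record is written out by hand rather than produced by the parser; a function of this shape is the kind of user function a transformer stage could apply to each record.

```javascript
// Hypothetical per-column casters: any column not listed here stays a string.
const columnCasters = new Map([
  ['num', (v) => parseInt(v, 10)],
]);

// Return a copy of the record with the selected columns converted.
function castRecord(record, casters) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    const cast = casters.get(key);
    out[key] = cast ? cast(value) : value;
  }
  return out;
}

// A record as it might look with all values left as strings.
const parsed = { nameWithComma: 'Tim, Lee', numString: '1', num: '1' };
const converted = castRecord(parsed, columnCasters);
console.log(converted);
```

Choosing columns by name sidesteps the quoting question entirely: numString stays a string because it is not listed, not because it was quoted in the file.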

@saswatds
Author

@wdavidw Yeah, you are right. I will try to work out a way to accommodate our use case along the lines of stream-transform. Thanks for the quick reply 😄

@wdavidw wdavidw closed this as completed Mar 19, 2018