RFC: a type to attempt parsing #146

Open
ivan-m opened this issue Jun 28, 2017 · 6 comments

@ivan-m
Contributor

ivan-m commented Jun 28, 2017

In my (internal) codebase, I've just written this type:

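{-# LANGUAGE DeriveFunctor #-}
import Data.Csv (FromField (..), runParser)

-- | Try parsing a value.  If it fails, record the error message.
newtype Try a = Try { tryValue :: Either String a }
  deriving (Eq, Ord, Show, Read, Functor)

-- | Always succeeds: a parse failure is captured as a 'Left' instead of
-- failing the enclosing row.
instance (FromField a) => FromField (Try a) where
  parseField = return . Try . runParser . parseField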

This is intended for fields that may not match a specified format (e.g. the Color type used as an example in the documentation for FromField). In practice, if such a field fails to parse in the middle of a row, then parsing that row fails, and thus parsing an entire CSV file can fail.

Whilst the type can be wrapped with a Maybe, this has two unfortunate side-effects:

  1. Cannot distinguish between "Empty cell" and "Failed to parse cell"
  2. No error message available to report back as to why a row may be bad

As such, this wrapper type allows you to successfully parse a file, and then discard rows which did not actually succeed (logging errors, etc.).
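As a rough sketch of that workflow (not code from my actual codebase), the following decodes a whole file and only afterwards separates the good rows from the bad ones. The `Row` shape and the `splitRows` helper are hypothetical names made up for illustration; `decode`, `NoHeader` and `runParser` are the existing cassava functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import Data.Csv (FromField (..), HasHeader (NoHeader), decode, runParser)
import Data.Either (partitionEithers)
import Data.Foldable (toList)
import Data.Vector (Vector)

-- Same wrapper as above, repeated so this sketch stands alone.
newtype Try a = Try { tryValue :: Either String a }

instance FromField a => FromField (Try a) where
  parseField = pure . Try . runParser . parseField

-- | Hypothetical row shape: a key column followed by a quantity
-- column that may be malformed.
type Row = (Int, Try Int)

-- | Decode every row, then split into (key, error message) pairs for
-- the bad rows and (key, quantity) pairs for the good ones.
-- The outer 'Left' is still possible, e.g. if a key fails to parse.
splitRows :: BL.ByteString -> Either String ([(Int, String)], [(Int, Int)])
splitRows input = do
  rows <- decode NoHeader input :: Either String (Vector Row)
  pure (partitionEithers (map classify (toList rows)))
  where
    classify (key, Try (Left err))  = Left (key, err)
    classify (key, Try (Right qty)) = Right (key, qty)
```

With an input such as `"1,10\n2,oops\n3,30\n"`, the second row should end up in the error list with its message while the other two rows are kept.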

If we also store the attempted field in the Left case, we can then use that for ToField and thus have the round-trip property. We can similarly apply this to the FromRow/ToRow case to be able to discard rows that cannot succeed (especially useful for streaming scenarios).
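A minimal sketch of what that round-tripping variant might look like, assuming the Left case holds both the raw field and the error message. The names `TryField` and `reencodeInt` are hypothetical and only here for illustration; this is not a proposed final API.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Csv (Field, FromField (..), Parser, ToField (..), runParser)

-- | Hypothetical variant of the wrapper: on failure, keep the raw
-- field alongside the error message.
newtype TryField a = TryField { tryFieldValue :: Either (Field, String) a }
  deriving (Eq, Show)

-- | Always succeeds, remembering the original bytes on failure.
instance FromField a => FromField (TryField a) where
  parseField raw = pure . TryField $ case runParser (parseField raw) of
    Left err -> Left (raw, err)
    Right v  -> Right v

-- | Failed fields are written back out unchanged.
instance ToField a => ToField (TryField a) where
  toField (TryField (Left (raw, _))) = raw
  toField (TryField (Right v))       = toField v

-- | Illustration only: parse a field as an Int via the wrapper and
-- encode it again.
reencodeInt :: Field -> Either String Field
reencodeInt raw = toField <$> runParser (parseField raw :: Parser (TryField Int))
```

Under these assumptions, an unparseable field comes back out byte-for-byte, while a valid one is re-encoded through its normal ToField instance.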

Do you think this is a useful enough type for me to send a PR? What should the name be?

@maxigit

maxigit commented Jul 5, 2017

How is this different from just using Either Text (or Either String)? The default instances are already written, and I think they are identical to yours.

@ivan-m
Contributor Author

ivan-m commented Jul 5, 2017

There is an Either Field a instance which keeps the field value, but not the error message. When I wrote that type I needed the error message to be able to report why a row was skipped.

@maxigit

maxigit commented Jul 5, 2017

I'm a bit confused. Your Try is at the column level, not the row level, so I'm not sure which error message you get at the row level.

@ivan-m
Contributor Author

ivan-m commented Jul 5, 2017 via email

@maxigit

maxigit commented Jul 5, 2017

What I mean is that I have similar problems, and my approach is to have the row parametrized by a functor:

data Product f = Product { name :: f String, price :: f Double }

What I parse (and declare as a FromRecord instance) is Product (Either Text). Then I have a validate function, Product (Either Text) -> Either (Product (Either Text)) (Product Identity), which returns either a valid product or an invalid one. I can then display, if needed, all the rows (valid or not) with the original value and error message. I'm not sure how your approach is different?
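Roughly, the validation step could look like the sketch below. It uses Either String rather than Either Text purely to stay within base, and `fieldErrors` is a hypothetical helper added for illustration; the FromRecord instance itself is left out.

```haskell
import Data.Functor.Identity (Identity (..))

-- | Row type parametrized by the functor wrapping each field.
data Product f = Product { name :: f String, price :: f Double }

-- | Return the fully valid product if every field parsed; otherwise
-- hand back the partially parsed row so it can still be displayed.
validate :: Product (Either String) -> Either (Product (Either String)) (Product Identity)
validate p = case (name p, price p) of
  (Right n, Right x) -> Right (Product (Identity n) (Identity x))
  _                  -> Left p

-- | Hypothetical helper: collect the Left payloads of an invalid row.
fieldErrors :: Product (Either String) -> [String]
fieldErrors p = [e | Left e <- [() <$ name p, () <$ price p]]
```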

@ivan-m
Contributor Author

ivan-m commented Jul 6, 2017

My use-case was for fields that were meant to have specific textual values (hence my reference to the Color example) but were at times incorrect; alternatively, they needed to be in a specific format (such as a date encoded as an 8-digit integer) that I had to parse. However, I wanted to distinguish between a Maybe because the column was empty and a Maybe because the value was invalid.

I wasn't aware of the Either Field a instance or I might have used that; but what I actually wanted was to get the actual error message and display that as I processed the CSV file (whilst continuing on with the rest of the file rather than erroring), so if I used Either I'd have to re-parse the (invalid) Field again just to get that message.

So in practice I have a bunch of Try (Maybe Foo) in my datatype that parses the CSV file, then a function to convert from that to a datatype that contains what I actually want which just has Maybe Foo; that function is of type CSVType -> Either ErrorType UsedType where ErrorType is a richer consistent error type that I use throughout my program that contains the error message, unique key from the record, etc.
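To make that concrete, here is a cut-down sketch of the conversion step. All of the record types and field names (`CsvRow`, `UsedRow`, `RowError`, `convert`) are hypothetical stand-ins for my real ones, and `Try` is redefined so the snippet stands alone.

```haskell
-- Same wrapper as in the original post.
newtype Try a = Try { tryValue :: Either String a }

-- | Hypothetical type that the CSV file is parsed into.
data CsvRow = CsvRow
  { rowKey  :: Int
  , rowDate :: Try (Maybe Int)  -- empty cell => Right Nothing; bad cell => Left msg
  }

-- | Hypothetical type the rest of the program works with.
data UsedRow = UsedRow { usedKey :: Int, usedDate :: Maybe Int }
  deriving (Eq, Show)

-- | Hypothetical richer error: the record's key plus the parse message.
data RowError = RowError { errKey :: Int, errMessage :: String }
  deriving (Eq, Show)

-- | Either unwrap the Try or report why this row was skipped.
convert :: CsvRow -> Either RowError UsedRow
convert (CsvRow key (Try result)) = case result of
  Left msg   -> Left (RowError key msg)
  Right date -> Right (UsedRow key date)
```

The point of the design is that an empty cell and an invalid cell land in different constructors, and the error message is available without re-parsing the field.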

TL;DR: I want error messages; to use Either Field a I would have to re-parse them, and there is no corresponding Either String a for that (that could be used instead of wrapping it up, but it means that e.g. #116 couldn't be implemented).
