What does the Validation construct actually buy us? #966

WardBrian · 2021-09-16T14:53:15Z

WardBrian
Sep 16, 2021
Maintainer

The note in the source file says:

In some situations, it may be preferable to report all of these errors to a user in one go. Validation gives us this ability*.

with the caveat saying

This is true so long as we restrict our usage to apply (and related)
functions i.e. the applicative interface. As soon as we use bind
(i.e. the monadic interface) we lose this ability. Technically, Validation
doesn't constitute a monad

My question is, are we actually leveraging this ability? We seem to be using bind a lot in the actual semantic check, and in our type_ast_or_exit function we have this chunk of code:

 match Semantic_check.semantic_check_program ast with
    | Result.Ok p -> emit_warnings_and_return_ast p
    | Result.Error (error :: _) ->
        Errors.pp_semantic_error Fmt.stderr error ;
        exit 1

which is just peeling off the first error to report and then throwing the rest away. (For the record: I think this is probably the correct behavior -- reporting multiple errors is very rarely useful in my experience, because [for example] one mis-typed function definition can generate a lot of errors, but only the first one is really relevant.)

So, is there a strong reason we still use this abstraction? In my work so far I've noticed that the semantic check code can be a real bear and I think a lot of it is to work around this almost-monad we don't seem to actually use. A refactor to something which truly fails on the first error (or simply returns a vanilla Result type) would make the code a lot cleaner imo, and a lot closer to the more pedagogical examples of typecheckers (e.g https://www.cs.cornell.edu/courses/cs3110/2019sp/textbook/interp/simpl_typecheck.html)

Answered by WardBrian

Sep 17, 2021

I've spent the past day digging around in the typechecker and prototyping an alternative style, so I want to share some of what I've thought about. I want to be clear I'm not trying to start an exception-monad holy war or anything, they are 100% the correct thing for errors that need to be handled and recovered from and a lot of other applications.

But, when the correct behavior for an error is to stop the computation and report the error, which is how our typechecker ultimately behaves right now, I really think it's better to do the simpler solution, even if it's not the most 'functional' thing to do [1].

Here are the final 6 lines of the typechecker at the moment (this function is ~70 l…

View full answer

nhuurre · 2021-09-16T15:49:06Z

nhuurre
Sep 16, 2021
Maintainer

I think you'll need @seantalts or @rybern for a proper answer but my impression is that monads and applicatives are just the conventional functional programming style, i.e. be explicit about the error types instead of smuggling errors by raising exceptions (like the "pedagogical example" you link to)

The semantic check code can be a bit cumbersome but IMO upgrading to 4.08 (#710) should allow cleaning it up enough without the need to ditch Validation.

1 reply

WardBrian Sep 16, 2021
Maintainer Author

I'm not opposed to some structure for errors other than simply raising exceptions, I'm more so wondering why we go through all this effort to not use the built in Result type, and then when it comes time to actually emit an error, we throw away the difference between what we've done and a Result

WardBrian · 2021-09-16T16:17:14Z

WardBrian
Sep 16, 2021
Maintainer Author

Out of curiousity I replaced the error-emitting code with

    match Semantic_check.semantic_check_program ast with
    | Result.Ok p -> emit_warnings_and_return_ast p
    | Result.Error es ->
        List.iter ~f:(Errors.pp_semantic_error Fmt.stderr) es ;
        exit 1

to actually see all the errors, and ran the tests. Here are the diffs: https://gist.github.com/WardBrian/afba10757e1058c359cd97bae8e1febf

It mostly seems like it repeats the same error multiple times in many cases, but there are some interesting additional errors caught

2 replies

SteveBronder Sep 17, 2021
Maintainer

(For the record: I think this is probably the correct behavior -- reporting multiple errors is very rarely useful in my experience, because [for example] one mis-typed function definition can generate a lot of errors, but only the first one is really relevant.)

Can you run a dedup on the list of errors before printing? imo I kind of like the above. Another good example of throwing multiple errors nicely would be if I had multiple functions that tried using signatures that did not exist. Though that also gets really weird and tricky, like in gcc they continue printing out errors and just assume the return of the failed function call is int. That generates a lot of nasty / not useful errors later when that object gets used though so I'd like if we had something nicer to handle that situation. If it's not hard to allow for throwing multiple errors then, seems good. But I'm not familiar with the time/reward ratio here so I'll differ to others

WardBrian Sep 17, 2021
Maintainer Author

Removing duplicates would need to decide if it wants to remove duplicates by error, by location, or by both, but would be doable in theory. My opinion in general is that this is messy and usually not as helpful as you'd like. take for example a really easy mistake

real x = 1;
int z_start = x +1;
int z_2 = x+2;
...
int z_end = x+20;

This snippet of code would output 20 error messages, when the likely error is just that x was meant to be int, which would be just as clear as if the program failed with only one error. Reporting extra errors here doesn't serve anyone.

Additionally, if the comment in Validation.ml is to be believed, we lose all but the first error anytime we use bind anyway, and we use it quite a lot (this might have a few false positives, but there are 34 results for >>= in Semantic_check.ml). So I think without a lot more careful use of Validate in the typechecker we would only report some, not all, errors, which isn't ideal.

WardBrian · 2021-09-17T18:58:35Z

WardBrian
Sep 17, 2021
Maintainer Author

I've spent the past day digging around in the typechecker and prototyping an alternative style, so I want to share some of what I've thought about. I want to be clear I'm not trying to start an exception-monad holy war or anything, they are 100% the correct thing for errors that need to be handled and recovered from and a lot of other applications.

But, when the correct behavior for an error is to stop the computation and report the error, which is how our typechecker ultimately behaves right now, I really think it's better to do the simpler solution, even if it's not the most 'functional' thing to do [1].

Here are the final 6 lines of the typechecker at the moment (this function is ~70 lines long):

  Validate.(
    ok mk_typed_prog |> apply_to ufb |> apply_to udb |> apply_to utdb
    |> apply_to upb |> apply_to utpb |> apply_to umb |> apply_to ugb
    |> check_correctness_invariant_validate
    |> get_with
         ~with_ok:(fun ok -> Result.Ok ok)
         ~with_errors:(fun errs -> Result.Error errs))

All of these applys are only necessary because of the Validation construct, and even though I know what this code must output it's very hard to read it and actually understand what it does on the first go.

Here is my quick attempt at the entire main function of a version that raised SemanticError exceptions rather than internal use of an error monad (The other difference here is using a functional map rather than a global mutable symbol table, because I think that is one FP tenet worth holding to if we can, and my original reason for starting this thread was to understand this part of our compiler more to work on things like function overloading):

let check_program_exn
    ( { functionblock= fb
      ; datablock= db
      ; transformeddatablock= tdb
      ; parametersblock= pb
      ; transformedparametersblock= tpb
      ; modelblock= mb
      ; generatedquantitiesblock= gqb
      ; comments } as ast ) : typed_program =
  (* create a new type environment which has only stan-math functions *)
  let tenv = Environment.create () in
  let tenv, typed_fb = check_ostatements_in_block Functions tenv fb in
  let () = check_all_fns_used tenv typed_fb in
  let tenv, typed_db = check_ostatements_in_block Data tenv db in
  let tenv, typed_tdb = check_ostatements_in_block TData tenv tdb in
  let tenv, typed_pb = check_ostatements_in_block Param tenv pb in
  let tenv, typed_tpb = check_ostatements_in_block TParam tenv tpb in
  let _, typed_mb = check_ostatements_in_block Model tenv mb in
  let _, typed_gqb = check_ostatements_in_block GQuant tenv gqb in
  let prog =
    { functionblock= typed_fb
    ; datablock= typed_db
    ; transformeddatablock= typed_tdb
    ; parametersblock= typed_pb
    ; transformedparametersblock= typed_tpb
    ; modelblock= typed_mb
    ; generatedquantitiesblock= typed_gqb
    ; comments }
  in
  check_correctness_invariant ast prog

let check_program ast =
  try Result.Ok (check_program_exn ast) with Errors.SemanticError err ->
    Result.Error err

Now, this approach still ultimately uses a Result monad to actually interface with the rest of the compiler, it just doesn't need to use one internally - all functions would return bare values, or throw an exception which gets caught in check_program. To the rest of the compiler, and to the output to the user, it would be indistinguishable

This style means that you definitely lose the ability to report multiple errors, but in terms of our current behavior it could reproduce it with less (and more readable/modifiable) code.

[1] As a final footnote, I think even if we want to stick with monadic error handling in the typechecker, moving away from our roll-your-own almost-monad into the more standard Result.t would be worthwhile -- you lose everything but the very first error (which is all we need), but that code would have a lot less gymnastics involved than it presently does

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What does the Validation construct actually buy us? #966

{{title}}

Replies: 3 comments 3 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

What does the Validation construct actually buy us? #966

WardBrian Sep 16, 2021 Maintainer

Replies: 3 comments · 3 replies

nhuurre Sep 16, 2021 Maintainer

WardBrian Sep 16, 2021 Maintainer Author

WardBrian Sep 16, 2021 Maintainer Author

SteveBronder Sep 17, 2021 Maintainer

WardBrian Sep 17, 2021 Maintainer Author

WardBrian Sep 17, 2021 Maintainer Author

WardBrian
Sep 16, 2021
Maintainer

Replies: 3 comments 3 replies

nhuurre
Sep 16, 2021
Maintainer

WardBrian Sep 16, 2021
Maintainer Author

WardBrian
Sep 16, 2021
Maintainer Author

SteveBronder Sep 17, 2021
Maintainer

WardBrian Sep 17, 2021
Maintainer Author

WardBrian
Sep 17, 2021
Maintainer Author