Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST representation changes #234

Merged
merged 1 commit into from
Apr 26, 2021
Merged

Conversation

sonologico
Copy link
Collaborator

@sonologico sonologico commented Apr 13, 2021

Opening the PR to continue the discussion started on #228.

Changes to the AST are:

  • Remove the record wrapper for each node and inline the attribute field
  • Make the block type parametric with the goal of using this flexibility to later have a way to parse and return the AST with source locations and whatever information we might want per node.

Copy link
Collaborator

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few suggestions, but they are just that. Let me know if you think any of them aren't for the better :)

I'm psyched to get this in place!

src/ast.ml Outdated Show resolved Hide resolved
Comment on lines +44 to +61
| Hard_break of 'attr
| Soft_break of 'attr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit strange at first to see these constructors take attributes, tho we have a good reason. Perhaps a comment on these constructors (or above this type declaration) to explain why we want to old space for attrs throughout would be helpful for future readers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Haven't added a comment yet. Not sure if I want to add a comment here or at the "external interface" level (omd.mli). I'm leaning towards adding it here. I'll add something before we merge this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment. Let me know if it's good enough.

src/ast.ml Outdated
Comment on lines 54 to 55
type raw = (string, attributes) block
type t = (attributes inline, attributes) block
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea for consideration -- one reason we might think about nested modules here is for namespacing like:

module Raw = struct
  type inline = string
  type t = (inline, attributes) block
end

module Plain = struct
  type inline = attributes inline
  type t = (inline, attributes) block
end

module SourceMap = struct
  type loc
  type inline = loc inline
  type t = (inline, loc) block
end

We might also end up with some functors that would apply to any of these ast representations. That might be too speculative, and perhaps not worth the complication currently, but just wanted to float the idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the benefit of it. I don't feel strongly about it, but I would either

  • keep it as is and wait until we have a need for functors (which is what I'd favour)
  • do it in the previous style in which the constructors are part of a functor so that they are also namespaced

src/ast.ml Outdated Show resolved Hide resolved
@shonfeder shonfeder added this to the 2.0 milestone Apr 14, 2021
@sonologico
Copy link
Collaborator Author

Let me know what you think after I have changed the rest of the code.

If we're going with this, I think that removing the open Ast line in some files and using the full path throughout the code would be better. What do you think?

Copy link
Collaborator

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me! Just two small comments.

After having propagated the AST through the code base, what's your feeling on its merits compared to the current representation? Do you feel like it's an improvement to work with?

src/block.ml Show resolved Hide resolved
src/omd.mli Outdated
| Html_block of 'attr * string
| Definition_list of 'attr * 'a def_elt list

type doc = (attributes, attributes inline) block list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about exposing the Ast.t in this interface and then using it here?

Suggested change
type doc = (attributes, attributes inline) block list
type doc = attributes Ast.t list

Copy link
Collaborator Author

@sonologico sonologico Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the thing that is making me consider moving back to using a functor as we were using before in the Ast module. There's no way to expose block without a parameter here. It feels weird to have Omd.block, Omd.t and Omd.doc.

I'd like to either

  1. Pick a different name for block and use the name block for t and update the Ast module to match it
  2. Go back to having the functor in the Ast module as it was before so that block doesn't have a type parameter and we don't need a t.

Let me know what you think and I'll try to wrap up this up by tomorrow so that we can unblock the other things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't quite follow. I think it makes sense to have a these two different type constructors exposed:

  • ('attr, 'a) block: types for block-level elements of any attributes and any content
  • 'attr Ast.t: types for the standard markdown AST, over any attributes

And to have this additional type:

  • doc = attributes t list: the type for the standard representation of a markdown document, with the standard attributes.

Tho I could also see an argument Ast.t being defines as a list of blocks.
(Well, if I were designing from scratch, I'd probably call doc Omd.t, but that's inessential.)

That said, if you think some use of simple functors will improve the interface and make a nicer presentation of types, I wouldn't be opposed. Do you want to sketch out what you have in mind?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ast module used a functor. I'm thinking of reintroducing that so that Omd can have the following signature:

(* ...  *)

type 'attr inline =
  | Concat of 'attr * 'attr inline list
  | Text of 'attr * string
  | Emph of 'attr * 'attr inline
  | Strong of 'attr * 'attr inline
  | Code of 'attr * string
  | Hard_break of 'attr
  | Soft_break of 'attr
  | Link of 'attr * 'attr inline link
  | Image of 'attr * 'attr inline link
  | Html of 'attr * string

(* ... *)

type 'attr block =
  | Paragraph of 'attr * 'attr inline
  | List of 'attr * list_type * list_spacing * 'attr block list list
  | Blockquote of 'attr * 'attr block list
  | Thematic_break of 'attr
  | Heading of 'attr * int * 'attr inline
  | Code_block of 'attr * string * string
  | Html_block of 'attr * string
  | Definition_list of 'attr * 'attr def_elt list

type doc = attributes block list

With the current setup in my branch, we can't really expose Ast.t in Omd without exposing the Ast module as well (maybe with a restricted signature). I find that this Omd signature, the solution currently in my branch, is awkward:

(* ... *)
type ('attr, 'a) block =
  | Paragraph of 'attr * 'a
  | List of 'attr * list_type * list_spacing * ('attr, 'a) block list list
  | Blockquote of 'attr * ('attr, 'a) block list
  | Thematic_break of 'attr
  | Heading of 'attr * int * 'a
  | Code_block of 'attr * string * string
  | Html_block of 'attr * string
  | Definition_list of 'attr * 'a def_elt list

type 'attr t = ('attr, 'attr inline) block (* equivalent to Ast.t *)

type doc = attributes t list
(** A markdown document *)

I would either change the name of block to something else and change t to block or change t to something else.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back the functor to the Ast module. There's no real cons internally besides a heavier weight syntax. I prefer the simpler external interface with a single parameter for the attributes on the block and inline types in Omd.

If you give me your approval, I'll squash the commits and merge this.

@shonfeder
Copy link
Collaborator

If we're going with this, I think that removing the open Ast line in some files and using the full path throughout the code would be better. What do you think?

I'm fine with narrower scoping of the module if you think it's an improvement 👍

@sonologico
Copy link
Collaborator Author

sonologico commented Apr 19, 2021

Looking good to me! Just two small comments.

After having propagated the AST through the code base, what's your feeling on its merits compared to the current representation? Do you feel like it's an improvement to work with?

I think it's an improvement overall.

  • I like having just the variants without the wrapping records. The small downside is that we might want to add a function to pattern match and return the attributes uniformly. I think it's worth it.
  • Adding the type parameter for the attributes has made types a bit noisier. I think it's worth it, because it hasn't made the type harder to work with and it makes adding support for an alternative more detailed representation (with locations, etc) easier.

@sonologico sonologico marked this pull request as ready for review April 24, 2021 18:47
Copy link
Collaborator

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about the functorial code here, but that's inherited :)

I suspect we can maybe find a nicer way of wrangling these types in the future, but I think the external API exposed in terms of the AST won't change!

LGTM!

* Remove the record wrapper for each node and inline the attribute field
* Make the block type parametric with the goal of using this later to have
  a version of the AST with source locations and whatever information we might
  want per node.
@sonologico sonologico merged commit 6156c9d into ocaml:master Apr 26, 2021
@sonologico sonologico deleted the sonologico/alt-ast branch April 26, 2021 22:33
SamB added a commit to SamB/rems-opam-repository that referenced this pull request Feb 10, 2022
As you probably know, leaving it unconstrained can result in opam trying
to build sail with omd 2.0.0~alpha2, which ends up producing an error something
like:

```
File "latex.ml", line 226, characters 6-21:
226 |     | Paragraph elems ->
            ^^^^^^^^^^^^^^^
Error: The constructor Paragraph expects 2 argument(s),
       but is applied here to 1 argument(s)
```

... due to omd 2's revamp of the
AST (<ocaml/omd#234>).

Anyway, it turns out that opam can still end up picking sail 0.13 and
earlier, which (unlike version 0.14) don't yet have this constraint, and
so still fail fail to build in this manner.

I'm not really sure why sail 0.14 chose the constraint "omd<1.4", but in
the interest of monotonicity, I elected to apply the same one retroactively:

``` sh
opam admin add-constraint 'omd<1.4'
```

It may seem alarming to update existing packages in this way, but
apparently this is the state-of-the-art in opam repository
administration at the moment. (Meaning: nobody's managed to figure out
how to avoid needing to do it.)

Apparently, nothing too terrible happens as long as you restrict such
changes to tightening up version constraints, and don't do so in ways
that would prevent something that worked before from working now. Since
this change only excludes two versions of omd that currently
exist (2.0.0~alpha1 and 2.0.0~alpha2), which as we've seen don't work
for sail, we should be fine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants