Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comment constructor #57

Open
Profpatsch opened this issue May 15, 2017 · 33 comments
Open

Comment constructor #57

Profpatsch opened this issue May 15, 2017 · 33 comments
Labels
documentation Art of how not to explain. enhancement Betterness, without new explicit feature possibilities

Comments

@Profpatsch
Copy link
Contributor

Profpatsch commented May 15, 2017

As far as I can see, there is no way to represent comments in the AST at the moment.

Since I’m generating files that should be human-readable as well, it would be nice to have them (only line comments as far as I’m concerned, but inline/block comments are probably a good idea as well).

@jwiegley
Copy link
Member

You'll probably need to add that support by using an annotated AST (there's a file for supplying annotation in the repository).

@Profpatsch
Copy link
Contributor Author

So the parser just skips comments right now?

@jwiegley
Copy link
Member

Yes, I believe so. I'm not sure if the annotated parser also does, though.

@domenkozar
Copy link
Collaborator

domenkozar commented Jun 27, 2017

I'm out of time these days, but can Ichip-in or help otherwise to make this happen?

This would be ground work for declarative documentation in Nix, to generate docs from the source code.

@Profpatsch
Copy link
Contributor Author

Profpatsch commented Jun 28, 2017

Right now, comments are skipped in the lexer: https://hackage.haskell.org/package/hnix-0.3.4/docs/src/Nix-Parser-Library.html#commentStyle, with CommentStyle.

Comments are tricky:

# function comment
# Something something
{ args /*here’s another comment*/
, moreArgs # but where did it come from
# and more important:
, evenMoreArgs
# where do these comments belong?
}:

/* should I be
   explicitely kept as multiline comment?
*/

# if so, is it necessary to keep the info that this was not a ML-comment?
# Maybe pretty-printing should handle that by looking at the line size?

/* also
 * what about
 * comments formatted like this
 *
 * Should lines be preserved?
 * What about the strange stars in front of each line?

And if the comment suddenly stops using stars?
What then?
*/

# not all that easy on first thought?

{
  # docstring?
  attrWithComment = "foo"; # and line comment

}

Comments seem to be completely removed in the lexer phase, stuff like

{ foo /*uh oh*/ = /*oh uh*/ "23" /*test*/; }
/*comment*/

parses into { foo = "23"; } like a charm.

That might make it a tad tricky to integrate them into the parsed AST in a sensible way?

@Profpatsch
Copy link
Contributor Author

Profpatsch commented Jun 28, 2017

I thought about it a bit more.
It doesn’t make very much sense to add comments to NExprF because they are on the lexer level and should probably be added to the annotations:

data Comment = Comment
  { commentType :: CommentType
  , commentContent :: Text
  , commentSpan :: SrcSpan }

data CommentType = SingleLine | MultiLine

data CommentSpan = CommentSpan
  { srcSpan :: SrcSpan
  , comments :: Comment }

type NExprCommentF = AnnF CommentSpan NExprF

Comments are parsed out into a list containing their contents & SrcSpans. Then a second pass can be used to convert this annotation to docstrings (by some more elaborate rules, like looking at the placement of comments and for instance combining multiple single line comments).

Right now the parser always creates a NExprLocF and then strips it away if one only uses nixExpr :: Parser NExpr. If we extend that do the same with comments, I’m not sure if the performance/memory footprint of nixExpr would suffer? Probably depends if the laziness of stripAnnotation <$> nixExprLoc reaches the many (annotateLocation (void $ symbol name))-part in nixExprLoc? I’m not sure.

@jwiegley
Copy link
Member

jwiegley commented Apr 2, 2018

@Profpatsch I like the idea of adding the comments to the annotations. I wouldn't worry about the speed impact just yet.

@deepfire
Copy link
Contributor

deepfire commented Apr 9, 2018

@Profpatsch, I'm slightly worried when you mention docstrings -- since they presumably can't be attached at arbitrary points, and also potentially due to the transformations you mention..

Do you envision being able to have an identity parser/pretty-printer with that scheme?

@Profpatsch
Copy link
Contributor Author

Do you envision being able to have an identity parser/pretty-printer with that scheme?

I don’t really have a vision, just incoherent thoughts. :)

hnix already keeps source spans by default:

parseNixFileLoc :: MonadIO m => FilePath -> m (Result NExprLoc)
type NExprLoc = Fix NExprLocF
type NExprLocF = AnnF SrcSpan NExprF

parseNixFile is just a parseNixFileLoc which throws away its SrcSpans (probably doesn’t even instantiate them into memory given enough laziness).

So I don’t see how something like parseNixFileLocComments :: MonadIO m => FilePath -> m (Result (Fix (AnnF CommentSpan (AnnF SourceSpan NExprF)))) changes the status quo.

@Ekleog
Copy link

Ekleog commented Apr 9, 2018

Hmm, so you're thinking of tying each comment to an AST item, if I understand correctly the type signature? I'm not sure how you'd be able to do it for all the examples you raised in #57 (comment): maybe full-line comments attach to the token after and part-of-line comments attach to the token before?

(for the context, I'm currently trying to implement a tool for indenting nix code that basically parses then pretty-prints it, and comments are likely the last missing piece to integrate, so I'm looking this quite closely :))

@Profpatsch
Copy link
Contributor Author

@Ekleog I suspect you’d have to just decide on what lexer token you assign them to.

The definition of CommentSpan in #57 (comment) is missing a list I think:

data CommentSpan = CommentSpan
  { srcSpan :: SrcSpan
  , comments :: [Comment] }

because of stuff like { foo /*uh oh*/ = /*oh uh*/ "23" /*test*/; }

@jwiegley jwiegley added the enhancement Betterness, without new explicit feature possibilities label Apr 12, 2018
@Ekleog
Copy link

Ekleog commented May 9, 2018

Hmm, but then the issue is to decide to which lexer token to assign the comments?

For a few comments it's easy, like:

nothere
here // my comment
nothere
nothere
here /* my comment */
nothere
nothere

// my comment
here
here
// my comment

nothere

But for others even as a human without additional context I couldn't guess to which token the following are to be assigned?

maybehere /* my comment */ orhere
maybehere

// my comment

orhere

Then I guess it's just good enough to autocratically pick one and say “this is it” :)

@taktoa
Copy link
Collaborator

taktoa commented Aug 21, 2018

It's been a long time since I looked at the hnix source code, so I'm not sure how much rearchitecting this would take, but what do you guys think about this way of integrating comments:

  1. Lex the source file into a stream of tokens, each of which has a corresponding source span
  2. Parse this into an AST, but instead of having source spans in the AST, you have offsets into the stream of tokens (or perhaps you have both).
  3. The stream of tokens contains comments, but they don't result in any AST nodes.

This allows the comment information to be fully preserved and accessible without making the AST cumbersome to use.

@costrouc
Copy link

In regards to the announced sprint tomorrow at nixcon it was said that adding parsing for comments is a wanted feature. I just wanted to ask that if this feature be added to https://github.com/NixOS/nix so that other language projects can benefit as well! I myself have been working on a python parser and the one thing that is dropped is comments due to the nix ast parser. Mic92/pythonix#2

@roberth
Copy link

roberth commented Oct 27, 2018

I've had a look at the AST and it seems that for the purpose of formatting nix code reliably, we will need to have access to token spans. This is because the AST simply can not (and should not) represent the comments and/or whitespace in every possible place. Example:

inherit /*a*/ ( /*b*/ foo) bar;

vs Binding constructor:

Inherit !(Maybe r) ![NKeyName r] !SourcePos

Of course preserving all comments is not a requirement for retaining comments, but both a formatter and some kind of comment processing seem to benefit from having a token stream, as @taktoa points out.

I'll see how far I can still get with tokenizing today at the NixCon 2018 hackday.

@Ekleog
Copy link

Ekleog commented Oct 27, 2018

As I see people pointing to formatting nix code reliably, I'll point to https://github.com/Ekleog/nix-bikeshed/ so that people don't need to re-do the same thing twice :)

It's more or less working, mostly missing comments at the time being. (actually, running it is how I found out #152: parsing, generating the file and re-parsing didn't give the same answer, and for my generation)

So if some support for comments could be added, it'd be great! :)

@domenkozar
Copy link
Collaborator

@Ekleog how does it compare to hnix formatter?

@Ekleog
Copy link

Ekleog commented Oct 27, 2018 via email

@roberth
Copy link

roberth commented Oct 27, 2018

Sorry guys, I have given up on lexing. My findings:

  • Error messages will be worse when encountering a lexing error. Most errors may not be lexing errors, but those that are will be awful.
  • The nix lexer has to support string interpolation. Chances that this can be ported to alex are low. Alex might have improved performance compared to parsing with combinators twice (lex-parse + parse)
  • Locations where we can't preserve whitespace/comments in the AST seem unlikely to be very problematic in practice
  • To support a comment in the empty list AST [/* empty because important */] we'll need to use the dual of Ann
  • We can reference the source text in the annotations to recover preceding and following comments on demand without IO
  • Reverse scanning can be done, but it won't be pretty. (Also, a better approach to it should exist and the approach should probably depend on the goal)

@NorfairKing
Copy link

NorfairKing commented Nov 3, 2018

+1

See https://github.com/NorfairKing/nixfmt for an attempt at a normalising formatter.
The only thing missing there are comments, as mentioned in the README.

@domenkozar
Copy link
Collaborator

@NorfairKing @Ekleog now we have three formatters :) It would be good to see what's the difference and just get hnix formatter to do the correct thing (or provide modes of operation).

@NorfairKing
Copy link

NorfairKing commented Nov 4, 2018

@domenkozar https://github.com/NorfairKing/nixfmt just provides a CLI wrapper around the hnix parser and pretty-printer, nothing more. If the hnix parser and pretty printer are fixed to deal with comments, nixfmt will work as intended.

@domenkozar
Copy link
Collaborator

domenkozar commented Nov 4, 2018

You can format by invoking hnix <filepath> :)

@Ekleog
Copy link

Ekleog commented Nov 4, 2018

@domenkozar I think Nix.Pretty doesn't try to format nix code but to pretty-print nix expressions, which is IMO a vastly different objective. For instance, pretty-printing YAML may be just setting it to always-indent like:

foo:
 bar:
  - a
  - b
  - c
 baz:
  - some long stuff
  - other long stuff
  - let us say this overflows max line length

While formatting it would be like:

foo:
 bar: [a, b, c]
 baz:
  - some long stuff
  - other long stuff
  - let us say this overflows max line length

As such, I think the objective of the two projects (Nix.Pretty and nix-bikeshed) don't match, and thus they aren't overlapping :)

As for having hnix do the correct thing, it's another problem, and it could very well call into a formatting library.

@domenkozar
Copy link
Collaborator

The difference would only be setting how long can a line be, so setting it to 1 char would "pretty-print" it. So it could easily be under the same codebase, I think.

@Ekleog
Copy link

Ekleog commented Nov 4, 2018

Well, we totally agree that a formatter can almost do pretty-printing. However I think is that a pretty-printer can't do formatting at all :)

(also, the difference also includes eg. whether ''foo'' can be turned to "foo" which changes the AST and thus can't be included in a pretty-printer, and similar stuff)

@NorfairKing
Copy link

Just my 2 cents: I don't much care about formatting vs pretty-printing. I just want a single normalized form of a given AST that is vaguely good for humans to edit. Something like hindent or gofmt. If that's done with a formatter instead of a pretty-printer: great, but I'm not (personally) interested in bikeshedding what the format would be.

@domenkozar
Copy link
Collaborator

domenkozar commented Nov 4, 2018

@NorfairKing same here. I worked on formatter to do "something sane", but I don't prefer the discussion on the style as much as I value consistency.

The current formatter is used in https://github.com/domenkozar/hnix-lsp and I'd love if hnix would preserve even most basic comments (after lines and in-between).

@Ekleog
Copy link

Ekleog commented Nov 4, 2018 via email

@jwiegley
Copy link
Member

jwiegley commented Nov 5, 2018

One advantage of pretty printing over formatting: smaller diffs. With automated formatting, you never know when it's going to decide to completely rewrite a large block of code because some limit has been reached.

@expipiplus1
Copy link
Collaborator

Did anything ever come of this?

I think that even a simple solution which recognizes comments only in specific places would be quite useful!

@expipiplus1
Copy link
Collaborator

expipiplus1 commented Oct 25, 2020

I've written a pretty basic comment annotator here: https://github.com/expipiplus1/update-nix-fetchgit/blob/2e9df344c40e289920e94c82dbce2d87ae54ecc5/src/Nix/Comments.hs

It simply attaches comments to the expression which immediately precedes them (so multiple expressions may own the same comment). It also only handles single line comments.

Requires a fix for #743 (#744)

@Anton-Latukha
Copy link
Collaborator

Anton-Latukha commented Jul 31, 2021

Would mention the core point of current comment processing:

hnix/src/Nix/Parser.hs

Lines 498 to 506 in 57a55fd

whiteSpace :: Parser ()
whiteSpace = do
put =<< getSourcePos
Lexer.space space1 lineCmnt blockCmnt
where
lineCmnt = skipLineComment' "#"
blockCmnt = Lexer.skipBlockComment "/*" "*/"

So, indeed (as points the topic), currently, HNix parser processes comments as whitespaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Art of how not to explain. enhancement Betterness, without new explicit feature possibilities
Projects
None yet
Development

No branches or pull requests