Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure every CST element has a SOURCE, suggest how to capture whitespace and comments #28

Open
eschulte opened this issue May 24, 2018 · 6 comments
Assignees
Labels
question Further information is requested

Comments

@eschulte
Copy link

I'd like to be able to parse a source file into a tree structure which includes all information required to build the source (including whitespace and comments). This would be easier if every parsed CST element has a valid associated source range. Currently that isn't the case for (at least) QUOTE elements. See:

SEL/LISP-DIFF> (labels ((maptree (fn tree)
                          (if (concrete-syntax-tree:consp tree)
                              (cons
                               (maptree fn (concrete-syntax-tree:first tree))
                               (maptree fn (concrete-syntax-tree:rest tree)))
                              (funcall fn tree))))
                 (maptree #'concrete-syntax-tree:source
                          (with-input-from-string (in "'(1 2 3 4)")
                            (cst-read in))))
(NIL ((2 . 3) (4 . 5) (6 . 7) (8 . 9)))

I've taken a shot at building this using cst-read and record-skipped-input with some custom code to fold everything together and add whitespace. See https://github.com/GrammaTech/sel/blob/master/ast-diff/lisp-diff.lisp#L71, is there a better way?

Thanks!

@scymtym
Copy link
Member

scymtym commented May 24, 2018

There are two separate questions here:

  1. Can't we ensure that every CST node has an associate source range?

    For eclector.concrete-syntax-tree:cst-read we take the position that not all CST nodes have a corresponding source range.

    • One reason are reader macros which can fabricate arbitrary expressions not related to the source at all. The symbol quote in (quote 1 2 3 4) is a simple example of this.

    • Another reason are parts of the list structure that have no representation in the input string:

      (labels ((maptree (fn tree)
                 (if (concrete-syntax-tree:consp tree)
                     (list
                      (funcall fn tree) ; CHANGED
                      (maptree fn (concrete-syntax-tree:first tree))
                      (maptree fn (concrete-syntax-tree:rest tree)))
                     (funcall fn tree))))
        (maptree #'concrete-syntax-tree:source
                 (with-input-from-string (in "'(1 2 3 4)")
                   (eclector.concrete-syntax-tree:cst-read in))))
      ((0 . 10) NIL (NIL ((1 . 10) (2 . 3) (NIL (4 . 5) (NIL (6 . 7) (NIL (8 . 9) NIL)))) NIL))

    A custom client can of course implement a different behavior. To this end, consider the wip-parse-result-protocol-2 branch and in particular https://github.com/robert-strandh/Eclector/blob/wip-parse-result-protocol-2/code/parse-result/second-climacs-test.lisp .

  2. Is there a better way to integrate skipped input into the parse result?

    Again, consider the wip-parse-result-protocol-2 branch and in particular https://github.com/robert-strandh/Eclector/blob/wip-parse-result-protocol-2/code/parse-result/second-climacs-test.lisp .

    There is a good chance we will do something very similar in the master branch to better support clients other than cst-read.

@scymtym scymtym added the question Further information is requested label May 24, 2018
@scymtym scymtym self-assigned this May 24, 2018
@eschulte
Copy link
Author

eschulte commented Jun 8, 2018

Thank you for the pointer to that branch. I hope this functionality makes its way into master. Using the branch and adopting the example you cited, I was able to get exactly what I wanted (a tree structure with sufficient information to fully reproduce the full text of the source file) with a much simpler implementation. See https://github.com/GrammaTech/sel/blob/wip-lisp-diff-uses-eclector-parse-result-protocol-2/ast-diff/lisp-diff.lisp

@scymtym scymtym closed this as completed in c3de63d Aug 9, 2018
@eschulte
Copy link
Author

eschulte commented Sep 6, 2019

To followup, over a year later we're still using this branch. Any chance this functionality will (or maybe has already) make its way to master?

@scymtym
Copy link
Member

scymtym commented Sep 6, 2019

Any chance this functionality will (or maybe has already) make its way to master?

This has not happened yet, but several steps towards that goal are underway.

@scymtym scymtym reopened this Sep 6, 2019
@scymtym
Copy link
Member

scymtym commented Sep 6, 2019

To clarify, the parse-result protocol has made it into master, but I'm not sure it fully supports this use-case.

@scymtym
Copy link
Member

scymtym commented Sep 7, 2019

I made GrammaTech/sel#11. Let me know if that works for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants