Build out rule crawling mechanisms #3717
Conversation
I'll do a proper review later. One quick thought about…
...and linting
Force-pushed from 5bfa7a7 to 9a37bfa.
Go on? The problem I see is that at the moment a segment isn't aware of its parent. All the references go down the tree, not up. While crawling we do have good access to the parent, but the previous raw segment might not be in the parent; it might be several layers up. Combine that with more aggressive skipping, where we don't process every raw segment anyway, and the easiest way to get the previous raw would be to use…
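Since child segments hold no upward references, the "previous raw segment" has to be tracked by the crawl itself rather than looked up from a parent. A minimal sketch of that idea (the `Segment` class and `crawl_with_prev` helper are illustrative, not sqlfluff's actual API):

```python
# Hypothetical sketch: track the previous raw segment while crawling a
# tree whose nodes only reference their children, never their parents.
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Segment:
    name: str
    segments: List["Segment"] = field(default_factory=list)

    @property
    def is_raw(self) -> bool:
        # A leaf with no children stands in for a RawSegment here.
        return not self.segments


def crawl_with_prev(root: Segment) -> Iterator[Tuple[Segment, Optional[Segment]]]:
    """Yield (raw_segment, previous_raw_segment) pairs in source order."""
    prev: Optional[Segment] = None
    stack = [root]
    while stack:
        seg = stack.pop()
        if seg.is_raw:
            yield seg, prev
            prev = seg
        else:
            # Push children in reverse so they pop in source order.
            stack.extend(reversed(seg.segments))


tree = Segment("stmt", [Segment("a"), Segment("grp", [Segment("b")]), Segment("c")])
pairs = [(s.name, p.name if p else None) for s, p in crawl_with_prev(tree)]
# pairs == [("a", None), ("b", "a"), ("c", "b")]
```

The previous raw segment comes "for free" as crawl state, even when it lives several layers up from the current node.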
Codecov Report
```
@@            Coverage Diff            @@
##              main     #3717   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files          176       178      +2
  Lines        13466     13603    +137
=========================================
+ Hits        13466     13603    +137
```
Definitely not a necessity. It was a good optimization for the current design. We can get rid of it for now. 👍
Looks great so far! A few small tactical suggestions, but overall direction is 💯.
@WittierDinosaur, are there any specific rules you've noticed use a lot of runtime that could benefit from using the new crawl behavior? Interested in testing this at some point on some of your large SQL files?
Co-authored-by: Barry Hart <barrywhart@yahoo.com>
+ Don't duplicate the work of L001
Force-pushed from 89c39f6 to 108cffc.
I think the performance issue comes from continuously building and appending to long sequences (every raw segment in the file). That's pretty expensive whether a list or a tuple is used. We don't need to address it in this PR, which significantly improves performance in other ways that probably drown out the impact of this. Just sharing the thought as a possibility for future performance work.
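The cost difference being hinted at is easy to demonstrate in isolation (this is a generic illustration, not sqlfluff code): rebuilding an immutable tuple on every append copies the whole sequence each time, so it's quadratic overall, while `list.append` is amortised constant time.

```python
# Illustration of the append-cost asymmetry the comment refers to.
import timeit


def build_tuple(n: int) -> tuple:
    acc: tuple = ()
    for i in range(n):
        acc = acc + (i,)  # copies the whole tuple each time: O(n^2) total
    return acc


def build_list(n: int) -> list:
    acc: list = []
    for i in range(n):
        acc.append(i)  # amortised O(1) per append: O(n) total
    return acc


t_tuple = timeit.timeit(lambda: build_tuple(2000), number=10)
t_list = timeit.timeit(lambda: build_list(2000), number=10)
# Expect build_tuple to be markedly slower, and increasingly so as n grows.
```

Either way, as the comment says, avoiding the accumulation entirely (by not visiting every raw segment) dwarfs the list-vs-tuple choice.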
Looks good overall! Lots of code to review, but I tried to identify any red flags. The existing test cases plus the "rules critical error" checks tend to be very helpful, but I did find a few potential issues where an IndexError may be triggered.
```diff
@@ -79,7 +80,7 @@ def _handle_preceding_inline_comments(before_segment, anchor_segment):
             if s.is_comment
             and s.name != "block_comment"
             and s.pos_marker.working_line_no
-            == anchor_segment.pos_marker.working_line_no
+            == anchor_segment.raw_segments[-1].pos_marker.working_line_no
```
Is it possible `raw_segments` is empty and will fail with `IndexError`?
Good question 🤔 . I'm going to have to look into that one.
I looked into this one - we can actually be sure that `anchor_segment` is actually a `RawSegment` itself. Calling `.raw_segments` on a `RawSegment` just returns `[self]`, so this can just be reduced to `anchor_segment.pos_marker.working_line_no`. I've updated the mypy type hints accordingly and simplified the function.
Nope - I'm wrong. `anchor_segment` isn't always a `RawSegment` (found that out in the now-failing tests). However, `raw_segments` will always have length: either a segment will have children, or it will return `[self]` from `.raw_segments` if it's a `RawSegment`.
Applied review suggestions:
- Update src/sqlfluff/core/parser/segments/base.py
- Update test/core/parser/segments_base_test.py
- Update src/sqlfluff/rules/L053.py
- Update src/sqlfluff/core/rules/context.py
- Update src/sqlfluff/core/rules/context.py
- Update src/sqlfluff/core/rules/crawlers.py
- Update src/sqlfluff/rules/L010.py

Co-authored-by: Barry Hart <barrywhart@yahoo.com>
Force-pushed from a1b1252 to 4ab0855.
Looks good! I fixed one small typo in a docstring. Good to merge once the build passes.
Excellent work!! 🎉🥳
This builds on the work started with `CrawlBehavior` in the base rules file. I've extended that into a new module with a few variants on the idea and some more flexibility for specific rules. More specifically, the newest logic is in `SegmentSeekerCrawler`, which more aggressively prunes the tree if segments of particular types do not exist anywhere within a segment or its children. On the less clever side, I've also introduced `RootOnlyCrawler`, which will be relevant for rules which want to handle their own crawling entirely (which is what I'm thinking with reflow application, see #3673). To facilitate this, `RuleContext` has now also been moved (mostly untouched) to its own file.

To achieve this I've done a few other things:
- Added `SegmentMetaclass` to allow caching of `_class_types` on `BaseSegment` for accessing types.
- Added a `repr()` method for `LintResult` and several places with additional logging to help debugging.
- Reworked the `CrawlBehavior` class.
- Removed `.is_final_segment()`, which is now unused (I thought about this one, where I could just put a `pragma: no cover` on it, but we could always recreate this method if needed later from the git history).
- `context.raw_segment_pre` has been removed, partly because it's hard to reimplement in this model, but also because the new crawling is already a lot more efficient. Affected rules: L016, L039, L052, L053, L063.

This is a BIG pull request, but I'm not sure there's a simple way of doing it in less than one big push.
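The pruning idea described for `SegmentSeekerCrawler` can be sketched in miniature (this is a hedged illustration of the technique, not sqlfluff's implementation; `Seg`, `descendant_types`, and `seek` are hypothetical names): cache the set of types occurring anywhere in each subtree, then skip a subtree entirely when it cannot contain a target type.

```python
# Hypothetical sketch of type-set pruning: skip whole subtrees where
# none of the rule's target segment types can possibly occur.
from dataclasses import dataclass, field
from functools import cached_property
from typing import Iterator, List, Set


@dataclass
class Seg:
    seg_type: str
    children: List["Seg"] = field(default_factory=list)

    @cached_property
    def descendant_types(self) -> Set[str]:
        # Union of this segment's type and everything below it, cached
        # so repeated crawls over the same tree stay cheap.
        types = {self.seg_type}
        for child in self.children:
            types |= child.descendant_types
        return types


def seek(root: Seg, targets: Set[str]) -> Iterator[Seg]:
    """Yield segments whose type is in `targets`, pruning barren subtrees."""
    if not (root.descendant_types & targets):
        return  # nothing of interest anywhere below: prune this branch
    if root.seg_type in targets:
        yield root
    for child in root.children:
        yield from seek(child, targets)


tree = Seg("file", [Seg("stmt", [Seg("comma"), Seg("ident")]), Seg("comment")])
hits = [s.seg_type for s in seek(tree, {"comma"})]
# hits == ["comma"]; the "comment" and "ident" branches are never descended
```

A rule that only ever fires on, say, commas then pays roughly in proportion to how many commas the file contains, rather than to the total number of segments.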