-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better concrete syntax semantics #671
Conversation
replace_node = "xs", | ||
replace = "@xs, 2" | ||
replace_node = "*", | ||
replace = "println2(@xs, 2)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since :[x]
can match multiple sequential siblings, this would be an infinite loop if we only rewrote :[x]
src/tests/test_piranha_java.rs
Outdated
@@ -463,7 +463,10 @@ fn test_multiple_code_bases() { | |||
// Note that we expect 2 matches because we have 2 code bases, and each code base has 1 match. | |||
// We also have another codebase `folder_3` but the `paths_to_codebase` does not include it. | |||
assert_eq!(output_summaries.len(), 2); | |||
assert_frequency_for_matches(&output_summaries, &HashMap::from([("match_import", 2)])); | |||
assert_frequency_for_matches(&output_summaries, &HashMap::from([("match_import", 4)])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snippet
package org.piranha.examples;
import java.util.List;
[0, 0] - [4, 0]
[package_declaration] [0, 0] - [0, 29]
[scoped_identifier] [0, 8] - [0, 28]
scope: [scoped_identifier] [0, 8] - [0, 19]
scope: [identifier] [0, 8] - [0, 11]
name: [identifier] [0, 12] - [0, 19]
name: [identifier] [0, 20] - [0, 28]
[import_declaration] [2, 0] - [2, 22]
[scoped_identifier] [2, 7] - [2, 21]
scope: [scoped_identifier] [2, 7] - [2, 16]
scope: [identifier] [2, 7] - [2, 11]
name: [identifier] [2, 12] - [2, 16]
name: [identifier] [2, 17] - [2, 21]
Now has two ways to match. Entire the entire sequential children of the import_declaration
node, or the last node of the program
node
src/models/concrete_syntax.rs
Outdated
&& recursive_matches[var_name].text.trim() != current_node_code.trim() | ||
{ | ||
return (HashMap::new(), false); | ||
let mut should_match = find_next_sibling(&mut tmp_cursor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
determine whether the next recursive call to get_matches_for_node
should match any node. If there are no "siblings" left, then we should not match anything
src/models/concrete_syntax.rs
Outdated
let mut last_node = the_node.child(the_node.child_count() - 1); | ||
if let Some(last_node_index) = indx { | ||
last_node = the_node.child(last_node_index); | ||
matched = matched && (last_node_index != child_incr || the_node.child_count() == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit convoluted and should be refactored.
Here we say, we match if:
- we were able to align the sequence with the templates
- the sequence is of length > 1, UNLESS the node only has one child
src/models/concrete_syntax.rs
Outdated
/// | ||
/// 1. Initialize cursor to the first child and iterate through siblings. | ||
/// 2. Use `get_matches_for_node` to attempt matching the template against the subtree starting at each sibling. | ||
/// 3. If a match is found, determine the range of matched nodes and return the match mapping, status, and range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status?
src/models/concrete_syntax.rs
Outdated
/// 4. If no match is found, return an empty mapping, false status, and None for range. | ||
pub(crate) fn match_sequential_siblings( | ||
cursor: &mut TreeCursor, source_code: &[u8], meta: &ConcreteSyntax, | ||
) -> (HashMap<String, CapturedNode>, bool, Option<Range>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could make a struct for this
src/models/concrete_syntax.rs
Outdated
/// 3. If a match is found, determine the range of matched nodes and return the match mapping, status, and range. | ||
/// 4. If no match is found, return an empty mapping, false status, and None for range. | ||
pub(crate) fn match_sequential_siblings( | ||
cursor: &mut TreeCursor, source_code: &[u8], meta: &ConcreteSyntax, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to cs
src/models/concrete_syntax.rs
Outdated
let range = Range::from_siblings(cursor.node().range(), last_node.unwrap().range()); | ||
return ( | ||
mapping, | ||
last_node_index != child_incr || parent_node.child_count() == 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u add a comment around this ?
Just extract this condition into a variable, document it and pass the variable.
src/models/concrete_syntax.rs
Outdated
|
||
if let (mut recursive_matches, true) = | ||
get_matches_for_node(&mut tmp_cursor, source_code, &meta_advanced) | ||
let mut tmp_cursor = cursor.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codument - why this clone
src/models/concrete_syntax.rs
Outdated
let mut should_match = find_next_sibling(&mut tmp_cursor); // Advance the cursor to match the rest of the template | ||
let mut is_final_sibling = false; | ||
loop { | ||
let mut walkable_cursor = tmp_cursor.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
src/models/concrete_syntax.rs
Outdated
parent_node | ||
.children(&mut parent_node.walk()) | ||
.enumerate() | ||
.find_map(|(i, child)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use filter
instead of if
inside find_map(...)
src/models/matches.rs
Outdated
@@ -321,6 +321,15 @@ impl Range { | |||
end_point: position_for_offset(source_code.as_bytes(), mtch.end()), | |||
} | |||
} | |||
|
|||
pub(crate) fn from_siblings(left: tree_sitter::Range, right: tree_sitter::Range) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this function like merging two ranges?
Co-authored-by: Ameya Ketkar <94497232+ketkarameya@users.noreply.github.com>
src/models/concrete_syntax.rs
Outdated
cursor: &mut TreeCursor, source_code: &[u8], cs: &ConcreteSyntax, | ||
) -> Option<MatchResult> { | ||
let parent_node = cursor.node(); | ||
let mut child_incr = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be renamed to child_seq_match_start
Overview
This pull request introduces two significant changes to the semantics of the concrete syntax matching algorithm. These changes enhance the flexibility and expressiveness of the pattern matching with concrete syntax.
Change 1: Semantics of Template Variables
:[var]
Previous Semantics
Previously, template variables
:[var]
could only match entire subtrees. This means it was not possible to match sequences of sibling nodes. For example, if matching[var] = 0;
againstint x = 0;
, the match would fail becauseint
andx
are sibling nodes and not part of the same subtree.New Semantics
Now template variables can match multiple sequential siblings at the same level. So matching
[var] = 0;
withint x = 0;
is possible.Change 2: Semantics of Matching Concrete Template
Previous Semantics
Previously, the algorithm required that the concrete template match an entire node, and only one node. This restriction limited the pattern matching to single statements rather than sequences of statements. For example:
This template could never match, because it corresponds to two sibling nodes, rather than a single subtree. For example, the snippet
has this tree representation.
Notice that
int x = 0
; andx++
; are sibling nodes. The template would never match just ONE node.New Semantics
Now, the template can match entire sequential sibling nodes. This allows the template to match sequences of statements.
For example, the following code:
Code:
matches the following template.
Template:
and the alignment is as follow (loosely)