Better concrete syntax semantics #671

Merged
merged 25 commits into from
Jul 1, 2024

Conversation

Collaborator

@danieltrt danieltrt commented Jun 3, 2024

Overview

This pull request introduces two significant changes to the semantics of the concrete syntax matching algorithm. These changes enhance the flexibility and expressiveness of the pattern matching with concrete syntax.

Change 1: Semantics of Template Variables :[var]

Previous Semantics

Previously, a template variable :[var] could only match an entire subtree, so it could not match a sequence of sibling nodes. For example, matching :[var] = 0; against int x = 0; would fail because int and x are sibling nodes rather than a single subtree.

graphviz-3

New Semantics

Now a template variable can match multiple sequential siblings at the same level, so matching :[var] = 0; against int x = 0; succeeds.
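
To illustrate the idea outside of tree-sitter, here is a minimal sketch in which a hole may absorb one or more consecutive siblings. It is not the code from this PR: `Elem` and `match_siblings` are hypothetical names, siblings are modelled as plain token strings, and the backtracking is deliberately simplistic.

```rust
use std::collections::HashMap;

/// One element of a toy template: a literal token or a `:[name]` hole.
#[derive(Debug, Clone, PartialEq)]
enum Elem {
    Lit(&'static str),
    Var(&'static str),
}

/// Tries to align `template` with `siblings`, letting each hole absorb one or
/// more consecutive siblings. Returns the captured text per hole on success.
fn match_siblings(template: &[Elem], siblings: &[&str]) -> Option<HashMap<String, String>> {
    match template.split_first() {
        // Template exhausted: succeed only if every sibling was consumed.
        None => siblings.is_empty().then(HashMap::new),
        Some((Elem::Lit(tok), rest)) => {
            let (first, tail) = siblings.split_first()?;
            if first == tok {
                match_siblings(rest, tail)
            } else {
                None
            }
        }
        Some((Elem::Var(name), rest)) => {
            // Try the shortest capture first, then grow it one sibling at a time.
            for take in 1..=siblings.len() {
                if let Some(mut caps) = match_siblings(rest, &siblings[take..]) {
                    let text = siblings[..take].join(" ");
                    // A repeated hole must capture the same text every time.
                    if caps.get(*name).map_or(true, |prev| *prev == text) {
                        caps.insert(name.to_string(), text);
                        return Some(caps);
                    }
                }
            }
            None
        }
    }
}
```

With the template `:[var] = 0 ;` and the siblings `int`, `x`, `=`, `0`, `;`, the hole in this toy model captures the two siblings `int x`.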

graphviz-4

Change 2: Semantics of Matching Concrete Template

Previous Semantics

Previously, the algorithm required that the concrete template match an entire node, and only one node. This restriction limited the pattern matching to single statements rather than sequences of statements. For example:

int :[var] = 0;
:[var]++;

This template could never match, because it corresponds to two sibling nodes, rather than a single subtree. For example, the snippet

{
  int x = 0;
  x++;
  print(x);
}

has this tree representation.

graphviz-5

Notice that int x = 0; and x++; are sibling nodes. The template would never match just ONE node.

New Semantics

Now, the template can match a sequence of consecutive sibling nodes rather than a single node. This allows the template to match sequences of statements.

For example, the following code:

Code:

// just a regular loop in C++
int some = 0; 
while(some < 100) { 
    float length = 3.14;
    float area = length * length;
    some++; 
}
print(some);

matches the following template.

Template:

int :[var] = 0; 
while(:[var] < 100) { 
    :[body]
    
    :[var]++; 
}

and the alignment is (loosely) as follows:

graphviz-6
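
The statement-sequence case can be sketched in the same spirit, using the smaller two-statement template from Change 2 (`int :[var] = 0;` followed by `:[var]++;`). This is a toy model under stated assumptions, not the PR's implementation: statements are plain strings, each pattern contains at most one hole literally named `:[var]`, and `match_stmt` and `match_sequential` are hypothetical helpers.

```rust
/// Matches one statement against a pattern with at most one `:[var]` hole.
/// `None` = no match, `Some(None)` = matched without a hole,
/// `Some(Some(text))` = matched and captured `text`.
fn match_stmt<'a>(pattern: &str, stmt: &'a str) -> Option<Option<&'a str>> {
    match pattern.split_once(":[var]") {
        None => (pattern == stmt).then_some(None),
        Some((prefix, suffix)) => stmt.strip_prefix(prefix)?.strip_suffix(suffix).map(Some),
    }
}

/// Slides the template over `children` and returns the inclusive index range
/// of the first run of consecutive siblings that matches, plus the binding.
fn match_sequential(template: &[&str], children: &[&str]) -> Option<(usize, usize, String)> {
    if template.is_empty() || template.len() > children.len() {
        return None;
    }
    'starts: for start in 0..=children.len() - template.len() {
        let mut binding: Option<&str> = None;
        for (&pattern, &stmt) in template.iter().zip(&children[start..]) {
            match match_stmt(pattern, stmt) {
                None => continue 'starts,
                Some(None) => {}
                Some(Some(captured)) => match binding {
                    // The same hole must capture the same text in every statement.
                    Some(prev) if prev != captured => continue 'starts,
                    _ => binding = Some(captured),
                },
            }
        }
        let end = start + template.len() - 1;
        return Some((start, end, binding.unwrap_or("").to_string()));
    }
    None
}
```

Against the children `int x = 0;`, `x++;`, `print(x);`, this sketch reports the sibling range 0 to 1 with `:[var]` bound to `x`; the third statement is simply left outside the match.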

replace_node = "xs",
replace = "@xs, 2"
replace_node = "*",
replace = "println2(@xs, 2)"
Collaborator Author

@danieltrt danieltrt Jun 3, 2024

Since :[x] can match multiple sequential siblings, this would be an infinite loop if we only rewrote :[x]

@@ -463,7 +463,10 @@ fn test_multiple_code_bases() {
// Note that we expect 2 matches because we have 2 code bases, and each code base has 1 match.
// We also have another codebase `folder_3` but the `paths_to_codebase` does not include it.
assert_eq!(output_summaries.len(), 2);
assert_frequency_for_matches(&output_summaries, &HashMap::from([("match_import", 2)]));
assert_frequency_for_matches(&output_summaries, &HashMap::from([("match_import", 4)]));
Collaborator Author

The snippet

package org.piranha.examples;

import java.util.List;
[0, 0] - [4, 0]
[package_declaration] [0, 0] - [0, 29]
[scoped_identifier] [0, 8] - [0, 28]
    scope: [scoped_identifier] [0, 8] - [0, 19]
        scope: [identifier] [0, 8] - [0, 11]
        name: [identifier] [0, 12] - [0, 19]
   name: [identifier] [0, 20] - [0, 28]
[import_declaration] [2, 0] - [2, 22]
    [scoped_identifier] [2, 7] - [2, 21]
        scope: [scoped_identifier] [2, 7] - [2, 16]
            scope: [identifier] [2, 7] - [2, 11]
            name: [identifier] [2, 12] - [2, 16]
    name: [identifier] [2, 17] - [2, 21]

It now has two ways to match: either the entire sequence of children of the import_declaration node, or the last child of the program node.

&& recursive_matches[var_name].text.trim() != current_node_code.trim()
{
return (HashMap::new(), false);
let mut should_match = find_next_sibling(&mut tmp_cursor);
Collaborator Author

Determine whether the next recursive call to get_matches_for_node should match any node. If there are no siblings left, then we should not match anything.

let mut last_node = the_node.child(the_node.child_count() - 1);
if let Some(last_node_index) = indx {
last_node = the_node.child(last_node_index);
matched = matched && (last_node_index != child_incr || the_node.child_count() == 1);
Collaborator Author

A bit convoluted and should be refactored.

Here we say, we match if:

  1. we were able to align the sequence with the templates
  2. the sequence is of length > 1, UNLESS the node only has one child
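
One possible shape for the refactor, sketched with plain values instead of tree-sitter nodes. The function and parameter names are hypothetical; only the boolean expression mirrors the diff line above.

```rust
/// Decides whether an aligned sibling sequence counts as a match.
///
/// `aligned`      - the template could be aligned with the sibling sequence
/// `first_index`  - index of the first matched child (`child_incr` in the diff)
/// `last_index`   - index of the last matched child
/// `child_count`  - number of children of the parent node
fn is_valid_sequence_match(
    aligned: bool,
    first_index: usize,
    last_index: usize,
    child_count: usize,
) -> bool {
    // Condition 2 from the list above, split into named parts.
    let spans_multiple_children = last_index != first_index;
    let parent_has_single_child = child_count == 1;
    aligned && (spans_multiple_children || parent_has_single_child)
}
```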

@danieltrt danieltrt changed the title [WIP] Better concrete syntax semantics Better concrete syntax semantics Jun 4, 2024
src/models/concrete_syntax.rs (outdated, resolved)
///
/// 1. Initialize cursor to the first child and iterate through siblings.
/// 2. Use `get_matches_for_node` to attempt matching the template against the subtree starting at each sibling.
/// 3. If a match is found, determine the range of matched nodes and return the match mapping, status, and range.
Collaborator

status?

/// 4. If no match is found, return an empty mapping, false status, and None for range.
pub(crate) fn match_sequential_siblings(
cursor: &mut TreeCursor, source_code: &[u8], meta: &ConcreteSyntax,
) -> (HashMap<String, CapturedNode>, bool, Option<Range>) {
Collaborator

you could make a struct for this
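
A rough sketch of what such a struct could look like. The later revision in this PR returns `Option<MatchResult>`, but its fields are not shown here, so the field names below are assumptions, and `Range` and `CapturedNode` are simplified stand-ins for the crate's own types.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the crate's own `Range` and `CapturedNode` types.
#[derive(Debug, Clone, PartialEq)]
struct Range {
    start_byte: usize,
    end_byte: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct CapturedNode {
    range: Range,
    text: String,
}

/// Groups what the tuple `(HashMap<String, CapturedNode>, bool, Option<Range>)`
/// carried. Failure is expressed as `None`, so no separate status flag is needed.
#[derive(Debug, Clone, PartialEq)]
struct MatchResult {
    mapping: HashMap<String, CapturedNode>,
    range: Range,
}

/// Converts the old tuple shape into the struct-based shape.
fn from_tuple(t: (HashMap<String, CapturedNode>, bool, Option<Range>)) -> Option<MatchResult> {
    let (mapping, matched, range) = t;
    if !matched {
        return None;
    }
    range.map(|range| MatchResult { mapping, range })
}
```

Folding the boolean into `Option` also removes the state where the flag is true but the range is missing.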

/// 3. If a match is found, determine the range of matched nodes and return the match mapping, status, and range.
/// 4. If no match is found, return an empty mapping, false status, and None for range.
pub(crate) fn match_sequential_siblings(
cursor: &mut TreeCursor, source_code: &[u8], meta: &ConcreteSyntax,
Collaborator

rename to cs

let range = Range::from_siblings(cursor.node().range(), last_node.unwrap().range());
return (
mapping,
last_node_index != child_incr || parent_node.child_count() == 1,
Collaborator

can u add a comment around this ?
Just extract this condition into a variable, document it and pass the variable.

src/models/concrete_syntax.rs (resolved)

if let (mut recursive_matches, true) =
get_matches_for_node(&mut tmp_cursor, source_code, &meta_advanced)
let mut tmp_cursor = cursor.clone();
Collaborator

document - why this clone

let mut should_match = find_next_sibling(&mut tmp_cursor); // Advance the cursor to match the rest of the template
let mut is_final_sibling = false;
loop {
let mut walkable_cursor = tmp_cursor.clone();
Collaborator

same

parent_node
.children(&mut parent_node.walk())
.enumerate()
.find_map(|(i, child)| {
Collaborator

use filter instead of if inside find_map(...)

@@ -321,6 +321,15 @@ impl Range {
end_point: position_for_offset(source_code.as_bytes(), mtch.end()),
}
}

pub(crate) fn from_siblings(left: tree_sitter::Range, right: tree_sitter::Range) -> Self {
Collaborator

is this function like merging two ranges?
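
The body of `from_siblings` is not shown in this excerpt. One plausible reading, and it is only an assumption, is that it spans from the start of the left sibling to the end of the right one. A self-contained sketch with hypothetical `Point` and `Range` types (not tree-sitter's) and a hypothetical `span_siblings` function:

```rust
// Hypothetical, simplified versions of a source position and a source range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    row: usize,
    column: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    start_byte: usize,
    end_byte: usize,
    start_point: Point,
    end_point: Point,
}

/// Builds the range covering a run of siblings: it starts where the leftmost
/// sibling starts and ends where the rightmost sibling ends.
/// Assumes `left` does not come after `right` in the source.
fn span_siblings(left: Range, right: Range) -> Range {
    Range {
        start_byte: left.start_byte,
        end_byte: right.end_byte,
        start_point: left.start_point,
        end_point: right.end_point,
    }
}
```

Applied to the package_declaration ([0, 0] - [0, 29]) and import_declaration ([2, 0] - [2, 22]) nodes from the tree dump earlier in this conversation, the sketch yields [0, 0] - [2, 22].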

cursor: &mut TreeCursor, source_code: &[u8], cs: &ConcreteSyntax,
) -> Option<MatchResult> {
let parent_node = cursor.node();
let mut child_incr = 0;
Collaborator

could be renamed to child_seq_match_start

@danieltrt danieltrt requested review from dvmarcilio, stefanheule and yuxincs and removed request for dvmarcilio June 14, 2024 00:43
@danieltrt danieltrt merged commit 3f79dcb into master Jul 1, 2024
10 checks passed
@danieltrt danieltrt deleted the better_concrete branch July 2, 2024 13:37