Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String escapes #3

Merged
merged 4 commits into from
Jul 26, 2021
Merged

String escapes #3

merged 4 commits into from
Jul 26, 2021

Conversation

dimbleby
Copy link
Contributor

Some comments

  • This all looks right to me, but I'm a treesitter noob so don't trust me on that
  • I couldn't figure out how to exclude comment sequences from unquoted strings, so I haven't
  • I wasn't sure whether it was expected to include negative tests in the corpus ie to include the things that should fail to parse and verify that we get an error. This doesn't seem to be the done thing in a couple of parsers that I looked at, so I didn't do it here either
  • neovim by default doesn't link errors to any highlight group (because it's too annoying while you're typing apparently) so the query for errors doesn't do anything unless your colorscheme explicitly adds something for TSError.

@Hubro Hubro self-requested a review July 25, 2021 22:10
Copy link
Owner

@Hubro Hubro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! 😁

There seems to be some errors produced when parsing the standard models though:

$ ./scripts/test-parsing-yang-modules.sh
Total parses: 368; successful parses: 363; failed parses: 5; success percentage: 98.64%

Could you check that out?

@dimbleby
Copy link
Contributor Author

dimbleby commented Jul 25, 2021

Oh, interesting.

Looks like the problematic files have patterns, like:

  typedef ieNameType {
    type string {
      length "1..max";
      pattern "\S+";
    }
    description "Type for Information Element names.  Whitespaces
      are not allowed.";
  }

But according to 6.1.3:

The backslash MUST NOT be followed by any other character.

which would make that double-quoted pattern completely not OK.

9.4.5 says:

The "pattern" statement, which is an optional substatement to the
"type" statement, takes as an argument a regular expression string,
as defined in [XSD-TYPES].

Not too sure how these parts of the spec can be reconciled, except perhaps by interpreting a "regular expression string" to be somehow outside of the normal rules for strings. Or perhaps we have discovered that these modules really are faulty.

Awkward!

@dimbleby
Copy link
Contributor Author

YangModels/yang#366 suggests that the parser is right and the models are wrong.

@dimbleby
Copy link
Contributor Author

So I think probably the way to resolve this is to exclude those faulty models from that test. What do you think?

@Hubro
Copy link
Owner

Hubro commented Jul 26, 2021

One thing I'm not entirely clear on, is the string argument for pattern interpreted just like a normal string, or is it a special case? Does the YANG compiler complain about "\S+" or does it work? 🤔 I mean, how does a standard module get published without even compiling...

In any case I'm fine with excluding them as you have done.

One thing though, if I pull this branch and run npm run build it modifies stuff:

tree-sitter-yang on  string-escapes [!] is 📦 v0.0.1 via ⬢ v14.17.3 via 🐍 pyenv via 🦀 v1.53.0
➜ gs
On branch string-escapes
Your branch is up to date with 'origin/string-escapes'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   src/parser.c
	modified:   src/tree_sitter/parser.h

no changes added to commit (use "git add" and/or "git commit -a")

Did you generate the parser before you committed? Or is there something environmental that gives us different C output?


diff --git a/src/parser.c b/src/parser.c
index fefc9e0..409ef25 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -179,7 +179,7 @@ enum {
   alias_sym_enum_value = 152,
 };

-static const char *ts_symbol_names[] = {
+static const char * const ts_symbol_names[] = {
   [ts_builtin_sym_end] = "end",
   [sym__word] = "_word",
   [anon_sym_module] = "module",
@@ -1113,7 +1113,7 @@ enum {
   field_submodule_name = 4,
 };

-static const char *ts_field_names[] = {
+static const char * const ts_field_names[] = {
   [0] = NULL,
   [field_module_block] = "module_block",
   [field_module_name] = "module_name",
diff --git a/src/tree_sitter/parser.h b/src/tree_sitter/parser.h
index a3a87bd..cbbc7b4 100644
--- a/src/tree_sitter/parser.h
+++ b/src/tree_sitter/parser.h
@@ -102,8 +102,8 @@ struct TSLanguage {
   const uint16_t *small_parse_table;
   const uint32_t *small_parse_table_map;
   const TSParseActionEntry *parse_actions;
-  const char **symbol_names;
-  const char **field_names;
+  const char * const *symbol_names;
+  const char * const *field_names;
   const TSFieldMapSlice *field_map_slices;
   const TSFieldMapEntry *field_map_entries;
   const TSSymbolMetadata *symbol_metadata;

@dimbleby
Copy link
Contributor Author

I can't see any justification for treating pattern strings as a special case: so far as I can see those models are simply bad.

Presumably we have slightly different versions of tree-sitter CLI. I am usuing v0.20.0, which I have compiled locally.

The other idea I had re fixing up the testing of parsing standard YANG modules was that we could patch them to be correct before testing. Slightly more trouble than excluding the bad models completely, but maybe better?

@Hubro Hubro merged commit 89b0397 into Hubro:master Jul 26, 2021
@Hubro
Copy link
Owner

Hubro commented Jul 26, 2021

Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants