Annotated Text for C# and VB #1775

Closed
AdamSpeight2008 opened this issue Aug 6, 2018 · 32 comments
Comments

@AdamSpeight2008

AdamSpeight2008 commented Aug 6, 2018

I've been thinking about abstracting out the internal definitions of the String / Char syntax nodes in C# and VB, so that we can raise some validation from syntax to semantics, as well as provide a base to help implement additional textual constructs.

Syntax Layout (Draft)

syntax AnnotatedText
{
  .Prefix  : Optional<Prefix>
  .Body    : Content
  .Postfix : Optional<Postfix>

  abstract syntax Prefix
  {
    .Chars : OneOrMore<VirtualChar>
  }
  abstract syntax Postfix
  {
    .Chars : OneOrMore<VirtualChar>
  }
  abstract syntax Content
  {
    .Chars   : ZeroOrMore<VirtualChar>
    .Content : Optional<SyntaxNode>
  }
}
String_Content : AnnotatedText.Content<String> { }
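Read purely illustratively, the draft shapes could be modelled as ordinary C# records (the `VirtualChar` here is a stand-in for Roslyn's internal type; all names simply mirror the draft):

```csharp
using System.Collections.Generic;

// Stand-in for Roslyn's internal VirtualChar: one logical character
// plus its position in the original token text.
public readonly record struct VirtualChar(char Value, int Position);

public record Prefix(IReadOnlyList<VirtualChar> Chars);
public record Postfix(IReadOnlyList<VirtualChar> Chars);

// Body characters plus an optional parsed representation
// (e.g. a JSON tree for a #JSON#"..." literal).
public record Content(IReadOnlyList<VirtualChar> Chars, object Parsed);

// AnnotatedText = optional prefix + body + optional postfix.
public record AnnotatedText(Prefix Prefix, Content Body, Postfix Postfix);
```

This is only a sketch of the data shapes; the draft's `Optional<>`, `OneOrMore<>`, and `ZeroOrMore<>` are approximated with nullable references and lists.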

Existing Definitions as AnnotatedText

StringLiteral        : AnnotatedText with { .Prefix  = '"',
                                            .Postfix = '"',
                                            .Content : String_Content
                                          }
CharacterLiteral_VB  : AnnotatedText with { .Prefix  = '"',
                                            .Postfix = '"' ( 'c' | 'C' ),
                                            .Content : String_Content
                                          }
CharacterLiteral_CS  : AnnotatedText with { .Prefix  = ''',
                                            .Postfix = ''',
                                            .Content : String_Content
                                          }
Interpolated         : AnnotatedText with { .Prefix  = '$' '"',
                                            .Postfix = '"',
                                            .Content : Interpolated_Content
                                          }
Verbatim             : AnnotatedText with { .Prefix  = '@' '"',
                                            .Postfix = '"',
                                            .Content : Verbatim_Content
                                          }
VerbatimInterpolated : AnnotatedText with { .Prefix  = ('$' '@') | ('@' '$'),
                                            .Postfix = '"',
                                            .Content : Verbatim_Interpolated_Content
                                          }

Reference: VirtualChar

Semantic Validation

Dim c0 = " "c
Dim c1 = ""c
         ' ""c
         ' ~~ Invalid Char Literal: one character was expected.
         ' ""c
         '   ~ String Literal has an unexpected Char Literal character following it.
Dim c2 = "abc"c
         ' "abc"c
         '  ~~~  Invalid Char Literal: only one character was expected within the quotes.
         ' "abc"c
         '      ~ String Literal has an unexpected Char Literal character following it.

This allows possible codefixes to be offered, e.g. Remove Invalid Characters and Remove Char Literal Character.

Prefix

The Prefix part of the design allows the IDE to support contextual IntelliSense, compile-time validation, and colourisation.

Content

This provides specific contextual parsing / syntax.

               String_Content : AnnotatedText.Content<String> { }
             Verbatim_Content : AnnotatedText.Content< ...> { }
Verbatim_Interpolated_Content : AnnotatedText.Content< ...> { }
         Interpolated_Content : AnnotatedText.Content< ...> { }

Examples

GUID Literal

                 GUID_Content : AnnotatedText.Content<GUID> { }

Guid : AnnotatedText with { .Prefix  = ('g' | 'G') '"',
                            .Postfix = '"',
                            .Content : GUID_Content
                          }
var guid = g"00000000000000000000000000000000"; /* typed as Guid */
AnnotatedText g = g"00000000000000000000000000000000";

JSON

                 JSON_Content : AnnotatedText.Content<JToken> { }
JSON : AnnotatedText with { .Prefix  = "#JSON#" '"',
                            .Postfix = '"',
                            .Content : JSON_Content
                          }
Dim v1 As String = #JSON#"{
first: 0,
second@ ['s1', /*comment*/ 's2']
}"
@HaloFour
Contributor

HaloFour commented Aug 6, 2018

@AdamSpeight2008

I think that it's overcomplicated and underperforms. For example, in your JSON example, due to the generic parsing of the string you eliminate the ability to include double quotes. Strict JSON only accepts double quotes for strings. So you could switch to using single quotes as the delimiters, but then you preclude the ability to include single quotes within strings. You'd need a second escape mechanism just to support that, twice over, as it also needs to support escaped characters embedded in the JSON.

For normal strings that might require validation I'd much prefer the simple analyzer approach that can be tuned to know when a String literal might contain JSON either by recognizing known method invocation patterns (e.g. JObject.Parse("foo")) or with accompanying no-op helper methods (e.g. string json = JSON("foo")). The language doesn't need to be bogged down with this.
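The no-op helper variant can be sketched like this (`LanguageHints` and its `JSON` method are hypothetical names for illustration, not an existing API):

```csharp
using System;

static class LanguageHints
{
    // No-op marker: returns its argument unchanged. An analyzer could
    // recognize calls to this method and validate/colourise the
    // argument literal as JSON at edit time.
    public static string JSON(string text) => text;
}

class Demo
{
    static void Main()
    {
        // At runtime the helper is a plain pass-through; only tooling
        // gives the call any special meaning.
        string json = LanguageHints.JSON("{ \"a\": 1 }");
        Console.WriteLine(json);
    }
}
```

The appeal of this approach is that it needs no language change at all: the hint lives in ordinary code that every tool can see.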

@AdamSpeight2008
Author

AdamSpeight2008 commented Aug 6, 2018

@HaloFour
Escape sequences are already supported by the existing parser, e.g. \" or "". I'm not proposing changing that.

In the JSON example the content could be the plain old contents of a string. Or it could be provided through an extension plugin parser, which validates the annotated string's contents as if they were JSON, thus supplying additional information about the string content and resulting in better validation at compile-time / editing-time. The resulting structural representation of the JSON (or other structured text) would be accessible via the .Content property. This provides a mechanism for embedded languages.

@HaloFour
Contributor

HaloFour commented Aug 6, 2018

@AdamSpeight2008

I'm not proposing changing that.

You have to propose changing that if you're planning on having custom prefixes and postfixes. That would completely change the meaning of " within a string, thus the meaning of \". And if the custom prefix/postfix is or contains \ then who knows what that means.

@bondsbw

bondsbw commented Aug 6, 2018

I like the idea of custom string literals. Custom escape sequences, compile-time and/or runtime errors, and custom parsing can really add value and enable embedded DSLs.

The compile-time side would probably need to be something that could plug into Roslyn akin to an analyzer. Then there wouldn't be a need for the custom syntax. (Of course, you could then create the custom syntax on top of that using such an embedded DSL.)

@CyrusNajmabadi
Member

@AdamSpeight2008 This feels very unnecessary for something at the csharplang level. This seems to be about tooling/APIs. Why would it not be sufficient to just have /*json*/"{ a: 1 }" for example?

@CyrusNajmabadi
Member

CyrusNajmabadi commented Aug 6, 2018

Note: what would be interesting (and i think is in line with what @HaloFour is saying) would be if you could have a language construct that then allowed you to control things like escaping/quotation. Because then you could embed something like html, without having to figure out how to escape every quote character you will definitely encounter.

If this doesn't really address that, then this is effectively mostly a request to make public the work i'm doing over in the linked PR. That would be more appropriate over at dotnet/roslyn, as this is asking for a plugin API for DSLs. I'm definitely supportive of that. However, it would be driven elsewhere.

@HaloFour
Contributor

HaloFour commented Aug 6, 2018

@CyrusNajmabadi

cough #89 cough 😁

@iam3yal
Contributor

iam3yal commented Aug 6, 2018

@CyrusNajmabadi

Why would it not be sufficient to just have /*json*/"{ a: 1 }" for example?

Semantically, this feels wrong. I understand why you would suggest it, but asking why it wouldn't be sufficient is really weird coming from you, which to me feels almost as if you're trolling. Regardless, here are 4 reasons why it wouldn't be sufficient:

  1. Comments are meant to be used to comment code or write notes about code, not to provide hints to productivity tools. This is absurd because there's no agreement nor standard around it: you chose to use /*json*/ but R# actually expects /*language=json*/, and go figure what the next tool will expect.

  2. Comments have no structure and are error-prone; you aren't going to get any intellisense for the available hints, or anything at all really.

  3. If tooling isn't going to remove comments then people will, because they might not be aware that it's actually a hint that's supposed to improve developers' productivity.

  4. Finally, how is it supposed to work when multiple productivity tools/editors have their own way of hinting that a string is JSON?

Something like this should be addressed by the language and in my opinion this is a step in the right direction, combining #89 and #1452 we could do something like this [Json] @({ "a": 1 }) which is pretty neat even though for this trivial case the attribute might not be needed.

@HaloFour
Contributor

HaloFour commented Aug 6, 2018

@eyalsk

Finally, how is it supposed to work when multiple productivity tools/editors have their own way of hinting that a string is JSON?

Source-only attributes just add another way that tools/editors have for hinting that the string is JSON. 😄

@iam3yal
Contributor

iam3yal commented Aug 6, 2018

@HaloFour

Source-only attributes just add another way

I'm puzzled.

@CyrusNajmabadi
Member

Semantically, this feels wrong. I understand why you would suggest it, but asking why it wouldn't be sufficient is really weird coming from you, which to me feels almost as if you're trolling. Regardless, here are 4 reasons why it wouldn't be sufficient:

Not trolling. It was a genuine question.

Comments are meant to be used to comment code or write notes about code, not to provide hints to productivity tools. This is absurd because there's no agreement nor standard around it: you chose to use /*json*/ but R# actually expects /*language=json*/, and go figure what the next tool will expect.

Your second sentence completely contradicts the first. You say they're not supposed to provide hints to productivity tools... and then you show how they're exactly used to provide hints to productivity tools :)

Comments have no structure and are error-prone; you aren't going to get any intellisense for the available hints, or anything at all really.

Why not? We can have intellisense do whatever we want it to do. Also, using comments to provide extra info has a long history of prior art. TypeScript still uses this as a core part of its language/tooling story.

If tooling isn't going to remove comments then people will, because they might not be aware that it's actually a hint that's supposed to improve developers' productivity.

I don't get this. So what if users remove it? If it's not helping them, then they'll remove it and will be fine. If it's helping them... they won't remove it :)

Finally, how is it supposed to work when multiple productivity tools/editors have their own way of hinting that a string is JSON?

Easy. Have those tools update themselves to support the community conventions. I mean... how would you expect it to work with Adam's suggestion? Why/how would resharper understand #json#?

@CyrusNajmabadi
Member

Source-only attributes just add another way that tools/editors have for hinting that the string is JSON.

Agreed. Humorously enough, part of the discussion around source-attributes is that they're simply 'trivia'. That way from a language/impl perspective you can have them between any tokens. They would effectively just be comments, but perhaps with a slight bit of structure to them to help drive consistency across usage.

if we had that, then i would simply say:

Use a source-only attribute to mark the string as being json. [[json]]"{ a: 1 }".

I don't see the need for a specific type of string literal. Just have a generalized annotation ability (and, absent that, just use a comment). It's more than sufficient for the needs of intellisense.

@iam3yal
Contributor

iam3yal commented Aug 7, 2018

@CyrusNajmabadi

Not trolling. It was a genuine question.

Okay. :)

Your second sentence completely contradicts the first. You say they're not supposed to provide hints to productivity tools... and then you show how they're exactly used to provide hints to productivity tools :)

lol, it's merely poor wording on my part, but what I meant is that comments shouldn't be used for hints; they are by some productivity tools because we don't have a better alternative.

Why not? We can have intellisense do whatever we want it to do. Also, using comments to provide extra info has a long history of prior art. TypeScript still uses this as a core part of its language/tooling story.

Are you referring to JSDoc? JSDoc was defined prior to TypeScript, and TypeScript supports it because it has to: people that migrate their codebases from JavaScript to TypeScript want to do it gradually, and it wouldn't be sensible for a tool that's supposed to be a superset of JavaScript to just ignore it, especially when TypeScript can take advantage of it and improve the experience. But JSDoc is also well defined, unlike in the case of C# where it's really arbitrary, at least for the time being, and each vendor has its own way to add hints.

@gafter
Member

gafter commented Aug 7, 2018

I don't see any C# language change request or proposal here. Am I not understanding, or is this in the wrong github repository?

@iam3yal
Contributor

iam3yal commented Aug 7, 2018

@CyrusNajmabadi

I don't get this. So what if users remove it? If it's not helping them, then they'll remove it and will be fine. If it's helping them... they won't remove it :)

The point is to have a well-defined approach that helps everyone and allows developers to be more productive everywhere, regardless of the tools they choose to use. The point is for these hints to be part of the code, not foreign to it.

Removing and adding comments will just add contention between developers, especially, in open source projects where it's not too far fetched for developers to use different tools.

Easy. Have those tools update themselves to support the community conventions. I mean... how would you expect it to work with Adam's suggestion? Why/how would resharper understand #json#?

I actually don't expect it to work out of the box.

If we could come up with a convention then it might be good enough for the IDE and productivity tools to recognize it and offer intellisense and color highlighting, but I really can't see how these "community conventions" will happen.

I don't see the need for a specific type of string literal. Just have a generalized annotation ability (and, absent that, just use a comment). It's more than sufficient for the needs of intellisense.

If this is sufficient then why are some attributes that are merely hints to the compiler not comments? I know there's no motivation to add attributes everywhere (or a similar concept) into the language atm, because there aren't enough use-cases, if any, to support it besides what we're discussing here, which can be solved by abusing comments and introducing conventions.

It's not just intellisense, it's highlighting too. With something like attributes-everywhere you get this for free; with comments it's not really free.

@AdamSpeight2008 changed the title from [Proposal] Annotated Text to Annotated Text for C# and VB, Aug 7, 2018
@AdamSpeight2008
Author

Using comments in VB would be a little trickier, as it doesn't have block comments, e.g. /* Comment */.
Everything after the comment marker up to the end of the line is treated as part of the comment text.
Would it be beneficial to add this form of comment to VB, to support this "convention"?

@AdamSpeight2008
Author

@HaloFour commented:
You have to propose changing that if you're planning on having custom prefixes and postfixes. That would completely change the meaning of " within a string, thus the meaning of \". And if the custom prefix/postfix is or contains \ then who knows what that means.

These are considered the same as plain old strings, just extended by a prefix (which helps the parser etc.).
Since the parser would see the prefix first, it can branch on it, giving a contextual understanding of the characters within the quotes.

  Output = prefix switch
  {
      "$"          => Parse_As_Interpolation_String(),
      "@"          => Parse_As_Verbatim_String(),
      "$@" or "@$" => Parse_As_InterpolationVerbatim_String(),
      // Fall back to a registered plugin parser, else a plain old string.
      _            => parsers.TryGetValue(prefix, out var parser)
                          ? parser()
                          : Parse_As_Plain_Old_String()
  };

@bondsbw

bondsbw commented Aug 7, 2018

@gafter This proposal enables custom string delimiters. If that part is relaxed, I don't think it would necessitate a language change.

@CyrusNajmabadi
Member

If we could come up with a convention then it might be good enough for the IDE and productivity tools to recognize it and offer intellisense and color highlighting, but I really can't see how these "community conventions" will happen.

Well... the IDE is going to support /*lang=regex*/ and /*language=regex*/ soon. So... community conventions are happening. I'm also for allowing /*regex*/ to be enough.

I'm really not seeing how it will be much different from typing [[regex]].

@CyrusNajmabadi
Member

lol, it's merely poor wording on my part, but what I meant is that comments shouldn't be used for hints; they are by some productivity tools because we don't have a better alternative.

Ok... but what makes the alternative better? At best i can get a marginal benefit claim here. But it's reallllllly razor thin. So i'd rather just work with what we have and what works in the ecosystem today rather than have to spend a bunch of effort to get back to the point we could already be at today.

@CyrusNajmabadi
Member

Are you referring to JSDoc? JSDoc was defined prior to TypeScript, and TypeScript supports it because it has to: people that migrate their codebases from JavaScript to TypeScript want to do it gradually, and it wouldn't be sensible for a tool that's supposed to be a superset of JavaScript to just ignore it, especially when TypeScript can take advantage of it and improve the experience. But JSDoc is also well defined, unlike in the case of C# where it's really arbitrary, at least for the time being, and each vendor has its own way to add hints.

I am not referring to JSDoc. I am referring to the triple-slash (///) directives which TS has had in the language since pre-1.0. Even now (in TS 3.0) they're adding a new <reference lib="..." /> directive that is simply a well-known pattern placed into a comment to help drive compilation. TypeScript simply calls these pragmas, and it's been a well-established way to drive extensibility in the compiler and elsewhere.

@CyrusNajmabadi
Member

These are considered the same as plain old strings, just extended by a prefix (which helps the parser etc.).
Since the parser would see the prefix first, it can branch on it, giving a contextual understanding of the characters within the quotes.

This is a very dangerous thing to support, because there is no guarantee your 'parser' function will support all the requirements necessary for incremental parsing.

It would also require that the internal implementation of the lexer and parser be made publicly available. These subsystems are quite complex due to many factors (incremental parsing, nested XML support, preprocessors, etc.), and i do not think there is any realistic way they could be exposed for extensibility purposes without a massive amount of effort being thrown into them.

@CyrusNajmabadi
Member

It's not just intellisense, it's highlighting too. With something like attributes-everywhere you get this for free; with comments it's not really free.

You rarely get anything for free :)

Even if this was 'attributes everywhere' it would not be free. You would still have to update pretty much all (or at least 'most') IDE features to support this new concept. You think it's free because someone (often me :D) goes off and does all the work to support that feature.

But that someone (again me, in this case :D) went and showed that all that work can also be done using comments as the signifier, and without requiring any new syntax.

@bondsbw

bondsbw commented Aug 7, 2018

@CyrusNajmabadi To me it depends on whether the custom string parser could affect C# parsing, say by providing a different escape literal.

If so, I would prefer not to use a comment for the annotation since I don't feel it is appropriate for the reader to ignore the annotation.

@CyrusNajmabadi
Member

@bondsbw Agreed. If this actually changes string parsing, i think it needs to be a new construct of some sort. However, for changing string parsing, it would then need to be something very limited that we can control (i.e. something like @HaloFour 's proposal). Otherwise, all bets are off in terms of how we could make things like incremental parsing work.

Note: i really don't know how to effectively expose offering different escape literals. The best i can think of is that you support custom delimiters. Then, while parsing, you treat it like a verbatim literal, where there are no escapes. And until you hit your delimiter, every character you run into is just a plain char.
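That "no escapes until the delimiter" idea could be sketched as follows (a hypothetical helper for illustration, not Roslyn's actual lexer):

```csharp
using System;

static class RawLiteralScanner
{
    // Scan 'text' starting at 'start' until the custom closing delimiter.
    // Every character before the delimiter is taken as a plain char,
    // verbatim-style: there are no escape sequences at all.
    // Returns the literal's contents, or null if the delimiter never appears.
    public static string Scan(string text, int start, string closeDelimiter)
    {
        int end = text.IndexOf(closeDelimiter, start, StringComparison.Ordinal);
        return end < 0 ? null : text.Substring(start, end - start);
    }
}

class Demo
{
    static void Main()
    {
        // With |# as the closing delimiter, quotes and backslashes
        // inside the literal need no escaping at all.
        var source = "#|<div class=\"x\">\\raw\\</div>|# trailing code";
        Console.WriteLine(RawLiteralScanner.Scan(source, 2, "|#"));
    }
}
```

The trade-off is exactly the one described above: you gain embeddability (HTML, regex, JSON with quotes) but lose any way to represent the delimiter itself inside the literal.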

@iam3yal
Contributor

iam3yal commented Aug 7, 2018

@CyrusNajmabadi I trust your judgement and expertise in this area and if you think that comments are just as good I'm down with it. ❤️

@CyrusNajmabadi
Member

I'm just stating that i think they're "good enough" especially for the case where this isn't actually changing/enabling anything different lexically/syntactically. I was able to personally get this working in just a few days (though it's now been 8 months just waiting on reviews ... :( ). So, to me, the marginal benefit of a language feature here is minimal. :)

@jpierson

I found this proposal after considering making one myself. I've recently started using lit-html for front-end JavaScript development and realized that it is founded on the concept of tagged template literals, which I hadn't realized were part of the newish template literal feature that is basically similar to C# string interpolation. The main difference with a tagged template literal is that it allows one to specify a function, referred to as a "tag", as a prefix to the literal; the function takes the template and the parameters as arguments and can return any type of value as a result. Think something like string.Format(...), but being able to define your own and not requiring the result to be a string.

Example lit-html in ES6:

const helloTemplate = (name) => html`<div>Hello ${name}!</div>`;

Having this feature in the .NET language space (C#, F#, VB?), but having it work to produce a custom literal at compile time, would be nice. My particular use case would be to allow defining a custom literal of sorts for utf8 strings as defined in the NStack library.

Example potential literal for ustring:

var myUtf8String = utf8`My String with each char as only a single byte.`;

Perhaps, as with lit-html in JavaScript, this could also open up the idea of embedded DSLs in C# through the use of templates, which I personally would be more interested in compared to the traditional Razor (cshtml) templates used in traditional ASP.NET development.

@CyrusNajmabadi
Member

I hadn't realized [these] were part of the newish template literal feature that is basically similar to C# string interpolation. The main difference with a tagged template literal is that it allows one to specify a function, referred to as a "tag", as a prefix to the literal; the function takes the template and the parameters as arguments and can return any type of value as a result. Think something like string.Format(...), but being able to define your own and not requiring the result to be a string.

You can already do this today in C#. In JS, when you write someTag`1${2 + 3}4`, that just translates to someTag(strings, ...exprs), or, in this case: someTag(["1", "4"], 2 + 3).

The equivalent in C# is to just write:

someTag($"1{2 + 3}4");

// ...
void someTag(FormattableString s)
{
    var format = s.Format;       // "1{0}4"
    var args = s.GetArguments(); // { 5 }
    // do whatever you want
}

The only difference between JS and C# here is one of syntax. JS allows tag`template`, whereas C# uses Tag($"template").

@CyrusNajmabadi
Member

Example potential literal for ustring:

You could do that today in C# as: Utf8($"whatever you want")
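A hedged sketch of what such a Utf8 helper could look like today (the helper and its return type are assumptions for illustration; NStack's ustring is replaced here by raw UTF-8 bytes):

```csharp
using System;
using System.Text;

static class Literals
{
    // Hypothetical helper: formats the interpolated string and returns
    // its UTF-8 encoding. A real version might return NStack's ustring.
    public static byte[] Utf8(FormattableString s) =>
        Encoding.UTF8.GetBytes(s.ToString());
}

class Demo
{
    static void Main()
    {
        var bytes = Literals.Utf8($"hello {1 + 1}");
        // "hello 2" is 7 ASCII characters, so 7 UTF-8 bytes.
        Console.WriteLine(bytes.Length);
    }
}
```

As with the someTag example above, the interpolation holes are formatted eagerly; a compile-time literal feature would let the conversion happen at build time instead.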

@jpierson

jpierson commented Aug 22, 2018 via email

@CyrusNajmabadi
Member

That's an entirely different request :) It's completely unrelated to tag or templates and whatnot. :)

Feel free to open an issue on that (if it doesn't already exist).

@dotnet locked and limited conversation to collaborators, Dec 3, 2024
@333fred converted this issue into discussion #8756, Dec 3, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
