Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(C) C23's digit separators are highlighted incorrectly #4054

Closed
Melkor-1 opened this issue May 19, 2024 · 7 comments
Closed

(C) C23's digit separators are highlighted incorrectly #4054

Melkor-1 opened this issue May 19, 2024 · 7 comments
Labels
bug help welcome Could use help from community language

Comments

@Melkor-1
Copy link
Contributor

Describe the issue

C23 has added digit specifiers to the language (', just like C++). They are not highlighted correctly in the code. It seems as if the engine treats it as the start of a string/character

Which language seems to have the issue?
c

Are you using highlight or highlightAuto?

Whichever one StackExchange uses.

Sample Code to Reproduce

#define _POSIX_C_SOURCE 200'809L
#define _XOPEN_SOURCE   700

Expected behavior

It should recognize it as a digit separator instead of a character or a string opening quote.

Additional context

This was found at this code review post: https://codereview.stackexchange.com/revisions/292149/6

And it was fixed by such a hack:

#define _POSIX_C_SOURCE 200'809L // '
@Melkor-1 Melkor-1 added bug help welcome Could use help from community language labels May 19, 2024
@joshgoebel
Copy link
Member

Does our C++ grammar properly support this already. If so might be an easy port.

@LarnuUK
Copy link

LarnuUK commented Aug 20, 2024

@joshgoebel based on the release over on Stack Overflow, neither C or C++ support the grammar.

@abarkat99
Copy link

abarkat99 commented Aug 20, 2024

Looks like both C++ and C already have support for digit separators. For example the regex for NUMBERS in C++ allow usage of single quotes as seprators:

const NUMBERS = {
className: 'number',
variants: [
// Floating-point literal.
{ begin:
"[+-]?(?:" // Leading sign.
// Decimal.
+ "(?:"
+"[0-9](?:'?[0-9])*\\.(?:[0-9](?:'?[0-9])*)?"
+ "|\\.[0-9](?:'?[0-9])*"
+ ")(?:[Ee][+-]?[0-9](?:'?[0-9])*)?"
+ "|[0-9](?:'?[0-9])*[Ee][+-]?[0-9](?:'?[0-9])*"
// Hexadecimal.
+ "|0[Xx](?:"
+"[0-9A-Fa-f](?:'?[0-9A-Fa-f])*(?:\\.(?:[0-9A-Fa-f](?:'?[0-9A-Fa-f])*)?)?"
+ "|\\.[0-9A-Fa-f](?:'?[0-9A-Fa-f])*"
+ ")[Pp][+-]?[0-9](?:'?[0-9])*"
+ ")(?:" // Literal suffixes.
+ "[Ff](?:16|32|64|128)?"
+ "|(BF|bf)16"
+ "|[Ll]"
+ "|" // Literal suffix is optional.
+ ")"
},
// Integer literal.
{ begin:
"[+-]?\\b(?:" // Leading sign.
+ "0[Bb][01](?:'?[01])*" // Binary.
+ "|0[Xx][0-9A-Fa-f](?:'?[0-9A-Fa-f])*" // Hexadecimal.
+ "|0(?:'?[0-7])*" // Octal or just a lone zero.
+ "|[1-9](?:'?[0-9])*" // Decimal.
+ ")(?:" // Literal suffixes.
+ "[Uu](?:LL?|ll?)"
+ "|[Uu][Zz]?"
+ "|(?:LL?|ll?)[Uu]?"
+ "|[Zz][Uu]"
+ "|" // Literal suffix is optional.
+ ")"
// Note: there are user-defined literal suffixes too, but perhaps having the custom suffix not part of the
// literal highlight actually makes it stand out more.
}
],
relevance: 0
};

From my understanding the actual issue is that the contains for PREPROCESSOR does not include NUMBERS but does have STRINGS causing single quotes to be considered the start of a string:

contains: [
{
begin: /\\\n/,
relevance: 0
},
hljs.inherit(STRINGS, { className: 'string' }),
{
className: 'string',
begin: /<.*?>/
},
C_LINE_COMMENT_MODE,
hljs.C_BLOCK_COMMENT_MODE
]

I am not familiar with Highlight.js but my guess is that adding NUMBERS in PREPROCESSOR.contains should fix the issue

@joshgoebel
Copy link
Member

I am not familiar with Highlight.js but my guess is that adding NUMBERS in PREPROCESSOR.contains should fix the issue

I think that would do a bit too much (for some directives), but that's the right idea. I think the pre-processors may need to be handled more precisely... for example #define x CODE might deserve full highlighting (for CODE) but not others... I'm trying to think of what false positive cases we might have.

@Dxuian
Copy link
Contributor

Dxuian commented Aug 27, 2024

is this still open
i recently made a commit that solved this issue i believe i will post a follow up

@Dxuian
Copy link
Contributor

Dxuian commented Aug 27, 2024

@joshgoebel upon further inspection #4094
handles all digit seperators. I believe we can mark this issue as resolved
image

@joshgoebel
Copy link
Member

Resolved by #4094

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help welcome Could use help from community language
Projects
None yet
Development

No branches or pull requests

5 participants