Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed hex decimals on C #4094

Merged
merged 10 commits into from
Aug 25, 2024
Merged

fixed hex decimals on C #4094

merged 10 commits into from
Aug 25, 2024

Conversation

Dxuian
Copy link
Contributor

@Dxuian Dxuian commented Aug 19, 2024

Resolves #4065

Changes

Changed the regex for hex numbers to accomdate decimal numbers
(first pr pls be nice 😁😁)
(https://github.com/highlightjs/highlight.js/issues/4065)

Checklist

before
after
Checklist

  • Added markup tests, or they don't apply here because...
  • Updated the changelog at CHANGES.md

@Dxuian Dxuian marked this pull request as draft August 19, 2024 00:35
@Dxuian Dxuian marked this pull request as ready for review August 19, 2024 00:45
],
relevance: 0
};
const NUMBERS = {
Copy link
Member

@joshgoebel joshgoebel Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tabbing/spacing here should not have changed...

Copy link
Contributor Author

@Dxuian Dxuian Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have fixed the indentation in the new commits

@joshgoebel
Copy link
Member

Also we really should have some tests here to validate both the old behavior and confirm the new behavior.

Could we perhaps pull over the number-literals tests from cpp and modify them to be C specific?

@Dxuian
Copy link
Contributor Author

Dxuian commented Aug 19, 2024

Could we perhaps pull over the number-literals tests from cpp and modify them to be C specific?

@joshgoebel i have also come up with a new test and expect files for number literals in C since all C++ number literals cannot be directly ported over due to c(gcc) not having digit seprators
the testing mainly modified the C++ files and its mostly the same
but i dont know why C doesnt have a separate testing file as of yet despite some compatibility issues in C++ and C code which is why i went with a new test file for C
I believe we should have a seperate testing file for all others features in C too
and i urge the maintainers to open a issue for the same

@Dxuian
Copy link
Contributor Author

Dxuian commented Aug 20, 2024

i have also noticed some keywords in the C keywords file that dont belong there for e.g -'auto' these work in c++ and not c.
i would really want to work on correcting the C/C++ thing

@joshgoebel
Copy link
Member

C keywords file that dont belong there for e.g -'auto' these work in c++ and not c.

Please make sure your knowledge is 100% up-to-date before offering to start removing keywords. auto is totally valid since C23 and does what you'd expect.

@joshgoebel
Copy link
Member

I believe we should have a seperate testing file for all others features in C too ... and i urge the maintainers to open a issue for the same

More tests would be welcome, but I don't think we need a separate issue for "add more tests to C"... but perhaps a meta-issue could be good - it would just be a placeholder for all grammars tagged with "good for beginners". Many grammars could use additional tests.

Comment on lines 60 to 63
{ begin: '\\b(0b[01\']+)' },
{ begin: '(-?)\\b([\\d\']+(\\.[\\d\']*)?|\\.[\\d\']+)((ll|LL|l|L)(u|U)?|(u|U)(ll|LL|l|L)?|f|F|b|B)' },
{ begin: '(-?)(0[xX][a-fA-F0-9]+(?:\'[a-fA-F0-9]+)*(?:\\.[a-fA-F0-9]*(?:\'[a-fA-F0-9]*)*)?(?:[pP][-+]?[0-9]+)?)' },
{ begin: '(-?)\\b\\d+(?:\'\\d+)*(?:\\.\\d*(?:\'\\d*)*)?(?:[eE][-+]?\\d+)?' }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now if we could please just switch to match vs begin and literal regex: /.../ (vs strings)... then this is good to go I think. I'm also happy to do this cleanup myself, just might be a little while.

Copy link
Contributor Author

@Dxuian Dxuian Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshgoebel i've changed the match to begin , i also managed to correct for a issue in detecting hex longs that went previously undetected.
please review if any changes are needed
before
after

],
{ match: /\b(0b[01']+)/ },
{ match: /(-?)\b([\d']+(\.[\d']*)?|\.[\d']+)((ll|LL|l|L)(u|U)?|(u|U)(ll|LL|l|L)?|f|F|b|B)/ },
{ match: /(-?)(0[xX][a-fA-F0-9]+(?:'[a-fA-F0-9]+)*(?:\.[a-fA-F0-9]*(?:'[a-fA-F0-9]*)*)?(?:[pP][-+]?[0-9]+)?(l|L)?(u|U)?)/ },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want need \b here also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did see a redundant \b and ive removed it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought these were needed to prevent false positives... like say g2324 where the 234 would be identified as a number without the \b, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test
the current behavior seems to work fine but ig we can do with a \b just to be sure but wouldnt that take a performance hit. should i add \b just to be sure or is it fine ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screen Shot 2024-08-24 at 8 39 40 PM

An example of false positives, wouldn't these be valid variable names? The \b is needed to prevent this.

wouldnt that take a performance hit.

Not really, no - I don't think so - but we don't really talk about regex performance here outside of security concerns (run-away exponential regex, etc).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should i add \b

Yep. I think we need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
@joshgoebel ive fixed the false positive issue please review changes

Copy link

Build Size Report

Changes to minified artifacts in /build, after gzip compression.

3 files changed

Total change +86 B

View Changes
file base pr diff
es/languages/c.min.js 1.93 KB 1.97 KB +44 B
highlight.min.js 8.22 KB 8.22 KB -1 B
languages/c.min.js 1.93 KB 1.97 KB +43 B

@joshgoebel joshgoebel merged commit 07ee351 into highlightjs:main Aug 25, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(C/C++) Hex float literal hilit
2 participants