Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: replace custom ASCII validation with simdutf one #46271

Merged
merged 1 commit into from
Jan 21, 2023

Conversation

addaleax
Copy link
Member

No description provided.

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Jan 19, 2023
Copy link
Member

@tniessen tniessen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have (naively) assumed that many compilers would vectorize contains_non_ascii_slow.

src/string_bytes.cc Outdated Show resolved Hide resolved
@addaleax
Copy link
Member Author

@tniessen Me too! Fwiw, I’m not really trying to optimize anything here (who even uses 'ascii'…) but rather just get rid of code we don’t need anymore.

@richardlau
Copy link
Member

The commit title is missing the "d" from "simdutf".

@addaleax addaleax changed the title src: replace custom ASCII validation with simutf one src: replace custom ASCII validation with simdutf one Jan 19, 2023
@addaleax addaleax added the request-ci Add this label to start a Jenkins CI on a PR. label Jan 19, 2023
@addaleax addaleax force-pushed the simdutf-containsnonascii branch from 5e740f8 to 8df6b9d Compare January 19, 2023 16:13
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jan 19, 2023
@nodejs-github-bot
Copy link
Collaborator

@@ -634,7 +581,8 @@ MaybeLocal<Value> StringBytes::Encode(Isolate* isolate,
}

case ASCII:
if (contains_non_ascii(buf, buflen)) {
if (simdutf::validate_ascii_with_errors(buf, buflen).error) {
Copy link
Member

@lpinca lpinca Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for using the error version? Is the common case to have invalid ASCII?

Copy link
Member Author

@addaleax addaleax Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The with_errors variant bails out early if it detects invalid ASCII, instead of running the entire string, so my thought was that it would match the performance profile of the previous code here best.

The common case is to not use this branch (ASCII) at all :)

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 19, 2023
@addaleax addaleax added the commit-queue Add this label to land a pull request using GitHub Actions. label Jan 19, 2023
}

return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete aside: I have a distinct memory of writing this code, it looks like how I would write such code, yet git blame attributes it to Isaac S... huh. The human mind is a fickle thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnoordhuis I double-checked out of interest (and also because it looks like code from you), and … you did! e325ace is all yours 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel vindicated now! Thanks for digging that up, Anna. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Written almost ten years ago, and you still remember your code. I've nothing but respect for all of you.

@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Jan 21, 2023
@nodejs-github-bot nodejs-github-bot merged commit 6913140 into nodejs:main Jan 21, 2023
@nodejs-github-bot
Copy link
Collaborator

Landed in 6913140

@addaleax addaleax deleted the simdutf-containsnonascii branch January 23, 2023 10:14
ruyadorno pushed a commit that referenced this pull request Feb 1, 2023
PR-URL: #46271
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
@ruyadorno ruyadorno mentioned this pull request Feb 1, 2023
juanarbol pushed a commit that referenced this pull request Mar 3, 2023
PR-URL: #46271
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
@juanarbol juanarbol mentioned this pull request Mar 3, 2023
juanarbol pushed a commit that referenced this pull request Mar 5, 2023
PR-URL: #46271
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants