Request to improve performance of stringifying in internal codes #2498
Comments
|
One caveat I found: nodejs/node#17431. %TypedArray%.prototype.subarray suffers from an upstream bug and performs slowly. However, I think %TypedArray%.prototype.toString will be faster than Buffer.prototype.toString, thanks to avoiding the JS/C++ language transition. |
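For a rough sense of the language-transition cost this refers to, one could compare the C++-backed `Buffer#toString` against a decode that stays entirely in JavaScript (a hypothetical micro-benchmark, not from the thread; `String.fromCharCode` only matches `toString` for latin1/ASCII bytes):

```js
// Hypothetical micro-benchmark: decode a small header-sized buffer.
const buf = Buffer.from("content-type");

console.time("Buffer#toString (crosses into C++)");
for (let i = 0; i < 1_000_000; i++) buf.toString("latin1");
console.timeEnd("Buffer#toString (crosses into C++)");

// Pure-JS path: each byte becomes one UTF-16 code unit (latin1 semantics).
console.time("String.fromCharCode (stays in JS)");
for (let i = 0; i < 1_000_000; i++) String.fromCharCode.apply(null, buf);
console.timeEnd("String.fromCharCode (stays in JS)");
```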
When is utf8Slice called? And what faster option does Uint8Array provide? |
Do you have a benchmark? |
I think this should be fixed in Node if it's faster. |
|
So if Buffer.toString is slow and has no room to improve, can we skip it as much as possible? Line 227 in e218fc6
If |
The question is how to achieve that... you would need to somehow determine that a specific buffer corresponds to a specific string. |
We have two constraints in our favor:
|
You are taking small buffers and converting them into V8 strings. It is highly likely that the bottleneck is the speed of creating V8 strings from an array of bytes. We can do some profiling to verify... Regarding the first half, we can compare Bun with Node:

Node: default: 128.842ms
Bun: 58.57ms

So Bun is 2x faster. Regarding the second half... we can compare Bun with Node:

Node: default: 128.932ms
Bun: 58.88ms

So Bun is 2x faster, again. So, here is the relevant Node code:

```cpp
void BindingData::DecodeUTF8(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);  // list, flags

  CHECK_GE(args.Length(), 1);
  if (!(args[0]->IsArrayBuffer() || args[0]->IsSharedArrayBuffer() ||
        args[0]->IsArrayBufferView())) {
    return node::THROW_ERR_INVALID_ARG_TYPE(
        env->isolate(),
        "The \"list\" argument must be an instance of SharedArrayBuffer, "
        "ArrayBuffer or ArrayBufferView.");
  }

  ArrayBufferViewContents<char> buffer(args[0]);

  bool ignore_bom = args[1]->IsTrue();
  bool has_fatal = args[2]->IsTrue();

  const char* data = buffer.data();
  size_t length = buffer.length();

  if (has_fatal) {
    auto result = simdutf::validate_utf8_with_errors(data, length);
    if (result.error) {
      return node::THROW_ERR_ENCODING_INVALID_ENCODED_DATA(
          env->isolate(), "The encoded data was not valid for encoding utf-8");
    }
  }

  if (!ignore_bom && length >= 3) {
    if (memcmp(data, "\xEF\xBB\xBF", 3) == 0) {
      data += 3;
      length -= 3;
    }
  }

  if (length == 0) return args.GetReturnValue().SetEmptyString();

  Local<Value> error;
  MaybeLocal<Value> maybe_ret =
      StringBytes::Encode(env->isolate(), data, length, UTF8, &error);
  Local<Value> ret;
  if (!maybe_ret.ToLocal(&ret)) {
    CHECK(!error.IsEmpty());
    env->isolate()->ThrowException(error);
    return;
  }

  args.GetReturnValue().Set(ret);
}
```
The relevant part is this:

```cpp
Local<Value> error;
MaybeLocal<Value> maybe_ret =
    StringBytes::Encode(env->isolate(), data, length, UTF8, &error);
```

It calls...

```cpp
{
  val = String::NewFromUtf8(isolate,
                            buf,
                            v8::NewStringType::kNormal,
                            buflen);
  Local<String> str;
  if (!val.ToLocal(&str)) {
    *error = node::ERR_STRING_TOO_LONG(isolate);
  }
  return str;
}
```

... which calls...

```cpp
i::Isolate* i_isolate = reinterpret_cast<internal::Isolate*>(v8_isolate);  \
ENTER_V8_NO_SCRIPT_NO_EXCEPTION(i_isolate);                                \
API_RCS_SCOPE(i_isolate, class_name, function_name);                       \
if (length < 0) length = StringLength(data);                               \
i::Handle<i::String> handle_result =                                       \
    NewString(i_isolate->factory(), type,                                  \
              base::Vector<const Char>(data, length))                      \
        .ToHandleChecked();                                                \
result = Utils::ToLocal(handle_result);
```

... which calls...

```cpp
inline i::MaybeHandle<i::String> NewString(i::Factory* factory,
                                           NewStringType type,
                                           base::Vector<const char> string) {
  if (type == NewStringType::kInternalized) {
    return factory->InternalizeUtf8String(string);
  }
  return factory->NewStringFromUtf8(string);
}
```

And down, down we go... turtles all the way down. So yeah, it is likely a fairly expensive process. And as long as you must create a V8 string, I don't see much in the way of possible optimization. It costs what it costs. |
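For what it's worth, the `default: ...ms` lines quoted above are the output format of `console.timeEnd()` called without a label, so the Bun-vs-Node comparison was presumably a loop of roughly this shape (a reconstruction under that assumption; the input and iteration count are guesses):

```js
// Hypothetical reconstruction of the comparison: decode a small buffer many
// times and report the total. console.time()/console.timeEnd() without a
// label print as "default", matching the numbers quoted above.
// Run the same file with `node` and with `bun`.
const buf = Buffer.from("content-type"); // assumed input
console.time();
for (let i = 0; i < 5_000_000; i++) {
  buf.toString("utf8");
}
console.timeEnd();
```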
I think the point here is that we might not have to create the strings. We could, e.g., hash the buffer and reuse strings, or find some other way to look up common header names from buffer to string. |
You can use the fact that, thanks to @anonrig, UTF-8 validation can be done nearly for free in Node, directly on a buffer. |
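(For reference, this is exposed to JavaScript as `isUtf8()` in `node:buffer` in recent Node releases, backed by simdutf; a minimal sketch:)

```js
// isUtf8() validates bytes without materializing a JS string.
import { isUtf8 } from "node:buffer";

console.log(isUtf8(Buffer.from("content-type"))); // true
console.log(isUtf8(Buffer.from([0xff, 0xfe]))); // false
```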
Do we even need to UTF-8-validate header strings? Shouldn't they be ASCII? I think llhttp might already validate that? |
So I was thinking something like:

```js
// Variant 1: compare against known header-name buffers byte for byte.
// CHAR_C and BUF_CONTENT_LENGTH are assumed to be precomputed constants.
function headerToString(buf) {
  if (buf[0] === CHAR_C) {
    if (buf.equals(BUF_CONTENT_LENGTH)) return 'content-length'
  }
  return buf.toString()
}
```

or:

```js
// Variant 2: hash the buffer and look the string up in a precomputed map.
function headerToString(buf) {
  const key = hash(buf) // sufficiently unique hash
  return lookup.get(key) ?? buf.toString()
}
```
|
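A concrete version of the second variant might look like this (a sketch under my own assumptions, not from the thread; FNV-1a is an arbitrary choice of cheap byte hash, and real code would still need a byte-for-byte check to guard against collisions):

```js
// Sketch: FNV-1a over the buffer's bytes, plus a prepopulated hash -> string map.
const KNOWN_HEADERS = ["content-length", "content-type", "connection", "host"];

function fnv1a(buf) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < buf.length; i++) {
    h ^= buf[i];
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept in uint32 range
  }
  return h;
}

const lookup = new Map(
  KNOWN_HEADERS.map((name) => [fnv1a(Buffer.from(name)), name])
);

function headerToString(buf) {
  return lookup.get(fnv1a(buf)) ?? buf.toString();
}

console.log(headerToString(Buffer.from("content-type"))); // "content-type" (map hit)
console.log(headerToString(Buffer.from("x-custom"))); // "x-custom" (falls back)
```

In practice the hash (or the buffer) would also have to be case-normalized, since header names arrive in mixed case.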
I think I could use a tree structure. Benchmark:

```js
// bench.mjs
import { bench, group, run } from "mitata";

const buffer = Buffer.from("content-type");
const decoder = new TextDecoder();

bench("noop", () => {});
bench("noop", () => {});
bench("noop", () => {});
bench("noop", () => {});
bench("noop", () => {});
bench("noop", () => {});

class Tree {
  #node;
  constructor() {
    this.#node = {};
  }
  append(value) {
    const target = Buffer.from((value = value.toLowerCase()));
    let node = this.#node;
    for (let i = 0; i < target.length; ++i) {
      const key = target[i];
      node[key] ??= {};
      if (0x61 <= key && key <= 0x7a) {
        // Alias the uppercase byte to the same child node so lookups are
        // case-insensitive without lowercasing the input buffer.
        node[key - 32] ??= node[key];
      }
      node = node[key];
    }
    node.value = value;
  }
  lookup(buffer) {
    const length = buffer.length;
    let node = this.#node;
    for (let i = 0; i < length; ++i) {
      const key = buffer[i];
      if (node === undefined || !(key in node)) return null;
      node = node[key];
    }
    return node === undefined ? null : node.value;
  }
}

const tree = new Tree();
tree.append("content-type");

const fromCharCode = String.fromCharCode;

group("lookup", () => {
  bench("String.fromCharCode.apply", () => {
    fromCharCode.apply(null, buffer);
  });
  bench("Buffer#toString", () => {
    buffer.toString("latin1");
  });
  bench("new TextDecoder", () => {
    new TextDecoder().decode(buffer);
  });
  bench("TextDecoder", () => {
    decoder.decode(buffer);
  });
  bench("tree.lookup", () => {
    tree.lookup(buffer);
  });
});

group("lookup with toLowerCase", () => {
  bench("String.fromCharCode.apply", () => {
    fromCharCode.apply(null, buffer).toLowerCase();
  });
  bench("Buffer#toString", () => {
    buffer.toString("latin1").toLowerCase();
  });
  bench("new TextDecoder", () => {
    new TextDecoder().decode(buffer).toLowerCase();
  });
  bench("TextDecoder", () => {
    decoder.decode(buffer).toLowerCase();
  });
  bench("tree.lookup", () => {
    tree.lookup(buffer);
  });
});

await new Promise((resolve) => setTimeout(resolve, 7000));
await run();
```
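(To run it, assuming a recent Node: `npm install mitata`, then `node bench.mjs`.)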
|
What about |
I can work on this, what do you want me to do? |
Your tree lookup method seems like a good first try.
Let's go with that. |
This is not necessary, because even if the input is uppercase, it already has a lowercase reference in the tree.
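A quick illustration of that property, assuming the `Tree` class from the benchmark above is in scope (hypothetical demo, not from the thread):

```js
const tree = new Tree();
tree.append("content-type");

// append() aliases each uppercase byte (key - 32) to the lowercase child
// node, so mixed-case input reaches the same leaf without any lowercasing.
console.log(tree.lookup(Buffer.from("content-type"))); // "content-type"
console.log(tree.lookup(Buffer.from("Content-Type"))); // "content-type"
console.log(tree.lookup(Buffer.from("x-unknown"))); // null
```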
Bug Description
The C++ binding function utf8Slice, called by Buffer.prototype.toString(), accounts for a large proportion of the CPU profile obtained by benchmarking an HTTP proxy that uses undici.
By using Uint8Array, we can eliminate the overhead incurred by the language transition.
Reproducible By
Expected Behavior
Logs & Screenshots
Environment
node v22
linux
Additional context
I hope the maintainers can provide some guidance and, more importantly, caveats about applying this idea.