Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

πŸ› Bug Report β€” CPU profiling reports entire execution as (program) #1754

Closed
mrbbot opened this issue Mar 1, 2024 · 0 comments Β· Fixed by #2558
Closed

πŸ› Bug Report β€” CPU profiling reports entire execution as (program) #1754

mrbbot opened this issue Mar 1, 2024 · 0 comments Β· Fixed by #2558

Comments

@mrbbot
Copy link
Contributor

mrbbot commented Mar 1, 2024

Hey! πŸ‘‹ When investigating cloudflare/workers-sdk#4409, I observed that CPU profiles from workerd contain a single (program) entry for the entire duration of the profile. To reproduce this...

# config.capnp
using Workerd = import "/workerd/workerd.capnp";

const config :Workerd.Config = (
  services = [
    ( name = "main", worker = .worker ),
  ],
  sockets = [
    ( name = "http", address = "*:8080", http = (), service = "main" ),
  ]
);

const worker :Workerd.Worker = (
  modules = [
    ( name = "./index.mjs", esModule = embed "index.mjs" )
  ],
  compatibilityDate = "2024-01-01",
  compatibilityFlags = ["nodejs_compat"],
);
// index.mjs
import { Buffer } from "node:buffer";

const encoder = new TextEncoder();

async function pbkdf2Derive(password) {
  const passwordArray = encoder.encode(password);
  const passwordKey = await crypto.subtle.importKey(
    "raw", passwordArray, "PBKDF2", false, ["deriveBits"]
  );
  const saltArray = crypto.getRandomValues(new Uint8Array(16));
  const keyBuffer = await crypto.subtle.deriveBits(
    { name: "PBKDF2", hash: "SHA-256", salt: saltArray, iterations: 1_000_000 },
    passwordKey, 256
  );
  return Buffer.from(keyBuffer).toString("base64");
}

export default {
  async fetch(request, env, ctx) {
    return new Response(await pbkdf2Derive("hello!"));
  }
}
// stress.mjs
const total = 1_000;
for (let i = 0; i < total; i++) {
  const response = await fetch("http://127.0.0.1:8080");
  await response.arrayBuffer();
  if (i % 10 === 0) console.log(`${(i * 100 / total).toFixed(0)}%`);
}

...then run...

$ npx workerd serve config.capnp --verbose --control-fd=1 --watch --inspector-addr=127.0.0.1:9229 &
$ node stress.mjs

...then visit chrome://inspect to open DevTools and start a CPU profile for a few seconds. For me, the result looks something like...

Screenshot 2024-03-01 at 14 33 30

If I run the following similar code in Node directly with node --inspect-brk index-node.mjs, visit chrome://inspect and start a CPU profile I see something like...

// index-node.mjs
import { Buffer } from "node:buffer";
import { webcrypto as crypto } from "node:crypto";

const encoder = new TextEncoder();

async function pbkdf2Derive(password) {
  const passwordArray = encoder.encode(password);
  const passwordKey = await crypto.subtle.importKey(
    "raw", passwordArray, "PBKDF2", false, ["deriveBits"]
  );
  const saltArray = crypto.getRandomValues(new Uint8Array(16));
  const keyBuffer = await crypto.subtle.deriveBits(
    { name: "PBKDF2", hash: "SHA-256", salt: saltArray, iterations: 1_000_000 },
    passwordKey, 256
  );
  return Buffer.from(keyBuffer).toString("base64");
}

const total = 1_000;
for (let i = 0; i < total; i++) {
  console.log(await pbkdf2Derive("hello!"));
  if (i % 10 === 0) console.log(`${(i * 100 / total).toFixed(0)}%`);
}

Screenshot 2024-03-01 at 14 33 15

I get that Node probably implements more of these APIs in JavaScript, hence the deeper flame graphs, but I'd still expect to see pbkdf2Dervie() in the workerd profile's flame graph.

harrishancock added a commit that referenced this issue Jun 6, 2024
V8's profiler does not expect isolates to be locked from different threads. Pull request #710 moved inspector protocol handling to a separate thread, meaning profiling was started and stopped on the inspector thread, not the Worker's request connection thread. If I remember right, the result of this is that the profiler ends up sampling the inspector thread, which doesn't normally run JavaScript, and causes a bunch of "(program)" placeholder spans to show up in the profile.

We previously discovered this issue with our internal runtime, which has always been heavily multi-threaded, and patched around the problem. This commit copies the two relevant patches from our internal repo.

Fixes #1754.
harrishancock added a commit that referenced this issue Aug 20, 2024
This commit adds a regression test for #1754, based on @mrbbot's original reproduction.
harrishancock added a commit that referenced this issue Aug 20, 2024
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction.

I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage.

I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well.

Fixes #1754.
harrishancock added a commit that referenced this issue Aug 20, 2024
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction.

I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage.

I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well.

Fixes #1754.
harrishancock added a commit that referenced this issue Aug 20, 2024
This commit adds a regression test for the CPU profiling bug reported in #1754 and fixed in #2497. It is based on @mrbbot's original reproduction.

I backported the test to the commit just prior to #2497, and confirmed that the test caught the original breakage.

I wrote the test in JavaScript, because it seemed to have the richest ecosystem of tools for working with the Chrome Devtool Protocol. I originally intended to use Playwright to initiate a CDP connection to workerd, but it seemed to make too many assumptions that it was connecting to a browser. I tried the [chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface) library next, and it seemed to work well.

Fixes #1754.
@harrishancock harrishancock reopened this Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants