Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New dispatcher infrastructure. #2785

Closed
wants to merge 1 commit into from

Conversation

afinch7
Copy link
Contributor

@afinch7 afinch7 commented Aug 15, 2019

ref #2730

core/isolate.rs Outdated Show resolved Hide resolved
@afinch7
Copy link
Contributor Author

afinch7 commented Aug 17, 2019

I've have deno_core_http_bench working, but performance is awful currently.

Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.41ms  238.05us  11.67ms   93.35%
    Req/Sec     3.56k   277.97     3.91k    90.50%
  70784 requests in 10.01s, 3.44MB read
Requests/sec:   7074.11
Transfer/sec:    352.32KB

I don't even need to run a test on master to tell you that this is worse than 60% performance loss.

@@ -204,7 +193,7 @@ impl Isolate {

let shared = SharedQueue::new(RECOMMENDED_SIZE);

let needs_init = true;
let needs_init = AtomicBool::new(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be atomic now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can run init without a mutable reference, but I just realized this getting checked every time we run poll is going to slow things way down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to revert this entirely, but I'm going to wait for the time being just in case I need it somewhere else.

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 17, 2019

Got it sorted. Calling the op id accessor for every dispatch was slowing things down the most, but that atomic also accounted for about 15%.
This branch now

Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   286.08us   76.34us   2.36ms   93.77%
    Req/Sec    17.35k     1.61k   20.74k    64.85%
  348752 requests in 10.10s, 16.96MB read
Requests/sec:  34529.76
Transfer/sec:      1.68MB

Master

Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   286.14us   77.99us   2.46ms   94.04%
    Req/Sec    17.41k     1.61k   21.46k    71.78%
  349712 requests in 10.10s, 17.01MB read
Requests/sec:  34624.60
Transfer/sec:      1.68MB

Mostly within margin of error as far as I can tell. Maybe slightly in favor of master.

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 17, 2019

Now that performance is sorted this is coming along well. Here are just a few nice features this will add(mainly for embedders):

  • Dispatcher registries can be shared between isolates. This should make it trivial to implement multithreaded work stealing for even faster web servers.
  • Dispatcher registries can be shared with the multiple dispatchers and isolates at the same time(and still function correctly), so we can register new dispatchers during dispatch easily.
  • The dispatcher registry for a isolate can be swapped with a &mut reference to the isolate. This won't propagate to js code as easily yet. The information from the accessor Deno.core.ids will always be up to date, but checking this for every op call is very costly(see above).
  • Dispatcher instances can be shared between dispatcher registries.

I'm working on a better way to notify js of the current op id for any give op. Maybe something like this?

let listenOpId;
Deno.core.ids.builtins.OpListen = (id: number | undefined) => {
  listenOpId = id;
} 

/** Listens on 0.0.0.0:4500, returns rid. */
function listen() {
  if (!listenOpId) {
    throw new Error("Op not registered yet");
  }
  return sendSync(listenOpId, -1);
}

This should also allow async ops to await the op id they require with some resolvables.

I'm going to be focusing on getting cli working for now though.

Also I'm not sure if you care much, but I had to drop the whole trait NamedOpDispatcher: OpDispatcher thing. Turns out rust won't let you implement a foreign trait on a generic(impl<T: LocalDispatcherTrait> OpDispatcher for T).

@ry
Copy link
Member

ry commented Aug 17, 2019

This is really a mouthful: Deno.core.ids.builtins.OpListen

Can this be shortened to Deno.ops.listen ?

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 18, 2019

This is really a mouthful: Deno.core.ids.builtins.OpListen

Can this be shortened to Deno.ops.listen ?

I still want to include top level namespaceing(builtins), and OpListen can be changed in the op implementation(and the namespace, builtins, is also configurable).

I did remove the core prefix, and rename ids to ops, so the new layout looks like this: Deno.ops.namespace.name.

Also this might be void complaint, since it now supports namespace aliasing:

const ops = Deno.ops.someReallyLongNamespace;

let listenOpId;
ops.OpListen = id => {
  listenOpId = id;
}

/** Listens on 0.0.0.0:4500, returns rid. */
function listen() {
  return sendSync(listenOpId, -1);
}

It also still supports directly reading op ids:

const ops = Deno.ops.someReallyLongNamespace;

let listenOpId;

/** Listens on 0.0.0.0:4500, returns rid. */
function listen() {
  if (listenOpId === undefined) {
    listenOpId = ops.OpListen;
  }
  return sendSync(listenOpId, -1);
}

This will still cost some time to evaluate the if, so it's not ideal. This feature is mainly designed for debug/testing.

@bartlomieju
Copy link
Member

bartlomieju commented Aug 18, 2019

@afinch7 why is this bit needed?

  if (listenOpId === undefined) {
    listenOpId = ops.OpListen;
  }

Can't it be accessed directly via Deno.ops.namespace.name?

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 18, 2019

@afinch7 why is this bit needed:

  if (listenOpId === undefined) {
    listenOpId = ops.OpListen;
  }
```?

Can't it be accessed directly via `Deno.ops.namespace.name`?

You can, but if you care about performance at all you really shouldn't. This is implemented as a get accessor, so it's basically equivalent to calling a extra function for every dispatch(a pretty expensive one at that). It's slow for the same reason that for (let i = 0; i < something.length; i++) {} is slower than const length = something.length; for (let i = 0; i < length; i++) {}

@bartlomieju
Copy link
Member

@afinch7 why is this bit needed:

  if (listenOpId === undefined) {
    listenOpId = ops.OpListen;
  }
```?

Can't it be accessed directly via `Deno.ops.namespace.name`?

You can, but if you care about performance at all you really shouldn't. This is implemented as a get accessor, so it's basically equivalent to calling a extra function for every dispatch(a pretty expensive one at that). It's slow for the same reason that for (let i = 0; i < something.length; i++) {} is slower than const length = something.length; for (let i = 0; i < length; i++) {}

I always thought that V8 can optimize away such loops... Is there a big difference in performance using first approach?

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 18, 2019

i < something.length is a conditional, and in order to evaluate it we need to know the value of something.length which is some form of accessor. Since we evaluate i < something.length again each cycle(in case the length of something changed), this means we are calling the accessor for something.length every cycle. If we set const length = something.length we don't have to call the accessor every cycle anymore.

In our case for op ids it's a huge difference like 1/4-1/5 the speed if not slower. If you look at the benchmarks above the first one at the top is send(Deno.ops.builtins.OpListen, args, buf) and the second one is:

if (listenOpId === undefined) {
  listenOpId = ops.OpListen;
}
send(listenOpId, args, buf)

(plus another smaller optimization <10%)

@bartlomieju
Copy link
Member

In our case for op ids it's a huge difference like 1/4-1/5 the speed if not slower. If you look at the benchmarks above the first one at the top is send(Deno.ops.builtins.OpListen, args, buf) and the second one is:

To be honest that sounds really strange to have 20% perf hit on object lookup. I wonder if that might be related to #2758. I still need to drill down a bit more into this PR.

@afinch7 afinch7 marked this pull request as ready for review August 18, 2019 22:01
@afinch7 afinch7 force-pushed the new_dispatch_system branch from 676d1ff to 571ac0f Compare August 18, 2019 22:39
@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

Don't merge this yet. I think I reintroduced the bug from #2776. I will see if I can get that one figured out tomorrow. It seems to only occur on linux during deno bundle https://deno.land/std/http/file_server.ts.

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

Benchmarks from the last passing build

"req_per_sec": {
    "deno_tcp": 49632, 
    "deno_core_single": 56111, 
    "node_proxy_tcp": 20290, 
    "deno_proxy_tcp": 18218, 
    "deno_core_multi": 58596, 
    "deno_tcp_current_thread": 46730, 
    "hyper": 56331, 
    "node_tcp": 57969, 
    "deno_http": 18762, 
    "node_http": 25049, 
    "deno_proxy": 1670, 
    "node_proxy": 2709
  }, 
  "max_latency": {
    "deno_tcp": 8.94, 
    "deno_core_single": 0.582, 
    "node_proxy_tcp": 1.19, 
    "deno_proxy_tcp": 5.49, 
    "deno_core_multi": 1.13, 
    "deno_tcp_current_thread": 9.05, 
    "hyper": 0.421, 
    "node_tcp": 0.743, 
    "deno_http": 2.78, 
    "node_http": 1.94, 
    "deno_proxy": 28.09, 
    "node_proxy": 9.73
  }, 

@ry
Copy link
Member

ry commented Aug 19, 2019

I think I reintroduced the bug from #2776.

My theory is that the bug is present in master, we just see it rarely. Appveyor builds fail probably 50% of the time - I think with a segfault - it may be this bug.

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

I just ran ./target/release/deno bundle https://deno.land/std/http/file_server.ts locally, and it created a segfault on the first run. This is clearly a bug, but I'm not sure if it applies to master.

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

This bug doesn't apply to master, but that means that we know more specifically it's something I included in this PR.

@ry
Copy link
Member

ry commented Aug 19, 2019

I've never seen it in master, but that doesn't mean it's not there. It's hard to imagine introducing memory bugs with the change from #2776 ....

Bartek is debugging in #2794

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

Never mind about this being specific to this branch. Not so sure about that anymore. I just added some debug prints and rebuilt it, and now it works fine.

Either I just got really lucky, or it's a problem where the same code will result in good and bad binaries randomly. I only added debug prints to //js/write_file.ts, so it can only be an issue in the build of typescript library(I doubt this would cause a segfault) or build for cli rust code(I would put my money here). Maybe a bug in rustc or the linker?

My main problem at this point is getting another "bad" binary. I've reverted all my debug prints and everything, and rebuilt it about a dozen times and it's worked fine every time. This one seems rare enough that it shouldn't be a problem(<1/10 build is faulty), but still really want to figure it out at some point.

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

There is really only one slight issue still left in this commit, we can't safely make context sensitive calls from op id notifiers:

Deno.ops.builtins.listen = (id) => {
  // This shouldn't work although in some cases
  // it may occur while another call has a valid libdeno::deno::UserDataScope
  // but we shouldn't rely on this.
  Deno.core.dispatch(id, someData);
}

or

Deno.ops.builtins.listen = (id) => {
  // Same problem here
  const import = import(someFile);
  import.setOpId(id);
}

I think this can be solved, but I don't think it's really considered critical. It's not an intended use case, so I think we should punt this for now.

@afinch7 afinch7 force-pushed the new_dispatch_system branch from df98322 to 1eb723d Compare August 19, 2019 22:42
@afinch7
Copy link
Contributor Author

afinch7 commented Aug 19, 2019

Finally got it to compile a faulty binary again. Here is a comparison of deno -L debug bundle with two binaries compiled from the same code: https://gist.github.com/afinch7/e8ec7339a4cdea683a5ee68b8ebcedc1

@afinch7 afinch7 changed the title [WIP] New dispatcher infrastructure. New dispatcher infrastructure. Aug 20, 2019
@afinch7
Copy link
Contributor Author

afinch7 commented Aug 20, 2019

I'm now pretty sure this weird segfault has something to do with the addition of sendSyncMinimal, but I don't really have a good way to test that theory. I changed most of the rust side of things, and the issue still came up. This PR should be ready for a review. I'm going to continue to poke at this problem to see if I can find a specific solution.

@ry
Copy link
Member

ry commented Aug 21, 2019

@afinch7 I want to add at least one JSON op before doing this refactor (see #2799) - so we can ensure the abstractions are good for that use-case. Hopefully this rebase won't be too bad. I'll try to land the JSON op today.

I've recently implemented JSON ops in my dev repo and have a good idea of how they should look:
https://github.com/ry/deno_typescript/blob/master/deno_typescript/ops.rs#L45-L73
https://github.com/ry/deno_typescript/blob/master/deno_typescript/compiler_main.js#L152-L156

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 22, 2019

@ry I see there are already a half a dozen or more already in the works(#2801). It might be easier to land this first(after you land the first json ops #2799), and build these refactored ops ontop of the new dispatch system to begin with. I'm not sure what you want to happen first though.

@ry
Copy link
Member

ry commented Aug 22, 2019

@afinch7 It's probably more work, but I want to let the JSON ops settle in a bit before prescribing a new dispatch interface. I think we can find a lot of mini clean ups on that front and it won't be hard to then later modify the ops to use a dispatcher trait as you've outlined above. I've tried to leave the code very dumb and ripe for abstraction.

I apologize because it massively conflicts with this patch - I'll help getting this rebased after #2799 (and any other conversions that are ready to land at the time).

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 22, 2019

I don't think I'm going to bother keeping anything outside of core(just going to start fresh). I had to rebase most of my changes to //cli and //js over #2788, and I really don't like how some of it turned out. Long story short I think this is for the better. I just need to know when you are ready to land this, so I avoid mixing in reverted/removed stuff.

I figure now is a good time for a preliminary review of my changes to core, since I don't plan on changing much in core anymore.

I've also been working on more in depth embedding example for core. It's a multi worker tcp server using work stealing. I have a feeling that the main speed limit in deno right now is the speed of a single v8 isolate. This should give us a better understanding of the speed limiters on the rust side.

@ry
Copy link
Member

ry commented Aug 29, 2019

+4,344 −2,049

:<

@afinch7
Copy link
Contributor Author

afinch7 commented Aug 29, 2019

@ry I'm a little bothered by that too, but I think the ability to split our ops into standalone crates is worth this cost. I'm going to work on the cli implementation a bit here, and maybe come up with a way to reduce the boilerplate.

@afinch7 afinch7 force-pushed the new_dispatch_system branch from 51a7c58 to 447d07a Compare August 29, 2019 18:44
@afinch7
Copy link
Contributor Author

afinch7 commented Aug 29, 2019

This latest commit removed a lot of the complexity in the cli implementation. Op implementations looked like this:

pub struct OpMetrics {
  state: ThreadSafeState,
}

impl OpMetrics {
  pub fn new(state: ThreadSafeState) -> Self {
    Self { state }
  }
}

impl OpDispatcher for OpMetrics {
  fn dispatch(&self, control: &[u8], buf: Option<PinnedBuf>) -> CoreOp {
    wrap_json_op(
      move |_args, _zero_copy| {
        let m = &self.state.metrics;

        Ok(JsonOp::Sync(json!({
          "opsDispatched": m.ops_dispatched.load(Ordering::SeqCst) as u64,
          "opsCompleted": m.ops_completed.load(Ordering::SeqCst) as u64,
          "bytesSentControl": m.bytes_sent_control.load(Ordering::SeqCst) as u64,
          "bytesSentData": m.bytes_sent_data.load(Ordering::SeqCst) as u64,
          "bytesReceived": m.bytes_received.load(Ordering::SeqCst) as u64
        })))
      },
      &self.state,
      control,
      buf,
    )
  }
}

impl Named for OpMetrics {
  const NAME: &'static str = "metrics";
}

Now the same op looks like this:

pub struct OpMetrics;

impl DenoOpDispatcher for OpMetrics {
  fn dispatch(
    &self,
    state: &ThreadSafeState,
    control: &[u8],
    buf: Option<PinnedBuf>,
  ) -> CoreOp {
    wrap_json_op(
      move |_args, _zero_copy| {
        let m = &state.metrics;

        Ok(JsonOp::Sync(json!({
          "opsDispatched": m.ops_dispatched.load(Ordering::SeqCst) as u64,
          "opsCompleted": m.ops_completed.load(Ordering::SeqCst) as u64,
          "bytesSentControl": m.bytes_sent_control.load(Ordering::SeqCst) as u64,
          "bytesSentData": m.bytes_sent_data.load(Ordering::SeqCst) as u64,
          "bytesReceived": m.bytes_received.load(Ordering::SeqCst) as u64
        })))
      },
      control,
      buf,
    )
  }
}

impl Named for OpMetrics {
  const NAME: &'static str = "metrics";
}

@hayd
Copy link
Contributor

hayd commented Aug 29, 2019

Perhaps a silly question, but why is Named a separate trait? Could NAME be moved to DenoOpDispatcher?

@ry
Copy link
Member

ry commented Aug 29, 2019

@afinch7 I'm fine with a little extra code, but this is +2000 LoC without any significant functionality changes. I think we need to consider a different approach. For example, rather than traits, what if we register ops using closures?

cc @piscisaureus advice would be appreciated

@afinch7 afinch7 force-pushed the new_dispatch_system branch 4 times, most recently from 3e4c3f4 to 2e32bd9 Compare September 3, 2019 16:02
@afinch7 afinch7 force-pushed the new_dispatch_system branch 3 times, most recently from dbe541f to f4b817e Compare September 10, 2019 17:07
@afinch7 afinch7 closed this Sep 10, 2019
@afinch7 afinch7 force-pushed the new_dispatch_system branch from f4b817e to acee194 Compare September 10, 2019 17:43
@afinch7 afinch7 reopened this Sep 10, 2019
@afinch7 afinch7 force-pushed the new_dispatch_system branch 3 times, most recently from 660a0d5 to cc66714 Compare September 10, 2019 17:57
@afinch7
Copy link
Contributor Author

afinch7 commented Sep 10, 2019

@ry I'm going to punt implementing this in json ops for now. I did at least implement for dispatch minimal ops. This should help keep the size a lot more manageable.

@afinch7 afinch7 force-pushed the new_dispatch_system branch 2 times, most recently from 01406ff to baf24e1 Compare September 10, 2019 20:03
@ry
Copy link
Member

ry commented Oct 2, 2019

Closing because old. This PR was very helpful in starting a discussion - we've since landed
a569be8
75eeac0
ffbf0c2
which partially address what this PR was going for, but in more piecemeal bits. We still have more work to do in cleaning up the dispatch infrastructure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants