Multi-threaded setup #189

Closed · chr4 opened this issue May 12, 2016 · 32 comments

@chr4 commented May 12, 2016

I've been running the internal osrm-backend C++ API in the past, and recently learned that it's not supposed to be run in production. While migrating to node-osrm, I ran into an issue: I can't get a setup to work that uses more than ~1.5 CPU cores.

As Node.js is single-threaded, I was using the built-in cluster module, and I also tried load-balancing requests with haproxy across two node-osrm instances on the same machine.
I'm using var osrm = new OSRM(); to access the map data via shared memory (loaded via osrm-datastore).

While both approaches succeed in spawning multiple node-osrm instances, and all instances are receiving and processing requests, the overall CPU load on the server is still roughly the same as with the single node-osrm setup. I therefore assume there's a lock somewhere in the node-osrm -> libosrm stack preventing parallel use.

How do you guys run node-osrm in production? Are you just relying on a single core, or is there something I missed?

@danpat (Member) commented May 12, 2016

@chr4 node-osrm uses libuv to maintain a worker thread pool; route requests run asynchronously.

You should construct a single var osrm = new OSRM();, then re-use that in your request handler, node-osrm handles the threading.

By default, libuv should set the thread pool size equal to your number of CPUs. However, we usually see better utilization by setting it to 1.5x.

Try adding this to your JS code before creating the new OSRM() object:

process.env.UV_THREADPOOL_SIZE = Math.ceil(require('os').cpus().length * 1.5);
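Putting that together, a minimal sketch of the intended setup, assuming an express app (the hard-coded coordinates are only illustrative; a real handler would build the query from the request):

// Set the pool size before anything submits work to libuv's thread pool,
// i.e. before the OSRM object is constructed and before any request is handled.
process.env.UV_THREADPOOL_SIZE = Math.ceil(require('os').cpus().length * 1.5);

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM(); // one shared instance, reading the data loaded by osrm-datastore

app.get('/route/v1/driving/:coordinates', function(req, res) {
    // Illustrative query; parse req.params.coordinates in a real server.
    var query = { coordinates: [[13.388860, 52.517037], [13.397634, 52.529407]] };
    osrm.route(query, function(err, result) {
        if (err) return res.status(400).json({ error: err.message });
        res.json(result);
    });
});

app.listen(5000);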

@chr4 (Author) commented May 18, 2016

Thanks for the hints!
I'm using a single instance of var osrm = new OSRM(); shared by the request handler.

Unfortunately, regardless of the UV_THREADPOOL_SIZE (and regardless of whether I export it in the shell or set it in Node.js), the maximum CPU usage of the node process is ~160% (on a 16-core machine).

The osrm module was installed using npm install osrm.

Here's the code I'm using (the omitted parts just assemble the config object):

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM();

[...]

function getDrivingDirections(req, res) {
    [...]

    osrm.route(config, function(err, result) {
        if (err) {
            return res.json({
                error: err.message
            });
        } else {
            return res.json(result);
        }
    });
}

app.get('/route/v1/driving/:coordinates', getDrivingDirections);

console.log('Listening on port: ' + 5000);
app.listen(5000);

Any further hints?

@danpat (Member) commented May 18, 2016

@chr4 Sounds like you might be I/O bound at some layer.

Almost all of the routing data is loaded into RAM, except the coordinate index (the .fileIndex file). Every coordinate you supply in a request will cause at least one read from that file.

If you're not running on an SSD disk, try that. If you've got enough RAM, you can also try moving everything into a ramdisk (e.g. using tmpfs).

@chr4 (Author) commented May 18, 2016

I'm using an SSD RAID that delivers well over 700 MB/s (probably > 1 GB/s). Furthermore, the old setup using osrm-routed happily uses all cores, even though that system runs on slower hardware. Shouldn't an I/O issue also affect the osrm-routed machine?

top and iotop also show iowait/usage below 0.1% when firing around 1000 requests (64 in parallel).

Any further ideas?

To make sure I got this right: even when using osrm-datastore, requests still require reads from the .fileIndex file?

@danpat (Member) commented May 18, 2016

@chr4 Yes, even with osrm-datastore, the .fileIndex file remains on disk. But you're right, that doesn't sound like your problem.

I'm not sure what else to suggest at this stage. There's a bottleneck somewhere that you're going to need to track down. Have you tried removing the OSRM call to see whether your Node process can create sufficient CPU load by itself?

@TheMarex (Member)

@chr4 We actually tried to reproduce this and saw similar behavior. @daniel-j-h is working on a fix that should utilize all the cores by using nan::AsyncWorker instead of hooking into the libuv thread pool directly.

@chr4 (Author) commented May 19, 2016

Thanks for looking into this!
Your explanation sounds like this is a libuv issue?

Let me know if you have something to try out, or in case you need any other feedback.

@TheMarex (Member)

We haven't figured out exactly what changed in libuv's behavior since the 0.10.x series, but it seems like it doesn't play nicely with uv_queue_work(uv_default_loop(), ..) anymore. It looks like it actually spawns UV_THREADPOOL_SIZE threads, since the node process on our server has the right number of threads, but it only uses two of them.
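One way to check how many pool workers are actually active, independently of OSRM, is to saturate the pool with some other pool-backed work and watch when the callbacks fire. A rough diagnostic sketch (crypto.pbkdf2 runs on the libuv thread pool; the iteration count is arbitrary):

// Must be set before any work is queued on the pool.
process.env.UV_THREADPOOL_SIZE = 16;
var crypto = require('crypto');

var start = Date.now();
for (var i = 0; i < 16; ++i) {
    crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', function(err) {
        if (err) throw err;
        console.log('task finished after ' + (Date.now() - start) + ' ms');
    });
}
// If all 16 tasks finish at roughly the same time, 16 workers are really running;
// if they complete in small batches, far fewer workers are doing the work.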

@daniel-j-h (Member)

For the record, properly using NAN's AsyncWorker abstraction fixes this. I got a prototype working yesterday and will now rewrite our Node.js bindings accordingly.

@daniel-j-h (Member)

@chr4 I worked on this over the last few days and my pull request has already landed in master, making CPUs happy again. Can you check whether it resolves your issue and, if so, close this ticket?

@chr4 (Author) commented May 23, 2016

Wow, that was quick! Is the fix released in 5.1.1 or do I need to build master from source?
I'll check the new version this week.

@TheMarex (Member)

@chr4 it's in 5.1.1, should be good to just npm install.

@chr4 (Author) commented May 25, 2016

I tried version 5.1.1 from npm install osrm; unfortunately, I see exactly the same issue as before: max ~160% CPU usage for the node process, with no iowait.

Is there anything I have to change in my .js implementation to use the new changes?

Out of curiosity: The CHANGELOG of osrm-backend indicates improvements for osrm-routed. Does this mean that it might be recommended for production in the future?

@TheMarex (Member) commented May 25, 2016

Does this mean that it might be recommended for production in the future?

No, those were external contributions.

Can you test the following for me?

Go into the node-osrm repository and run:

npm install
make shm

Then execute the following script and watch CPU usage:

var OSRM = require('.');
var berlin_path = require('./test/osrm-data-path').data_path;

var osrm = new OSRM(berlin_path);
var options = {
    coordinates: [[13.393252,52.542648],[13.39478,52.543079],[13.397389,52.542107]],
    timestamps: [1424684612, 1424684616, 1424684620]
};
for(var i = 0; i < 10000; ++i)
{
    osrm.match(options, function(err, response) {
      console.log(i);
    });
}

EDIT: Sorry, I used the wrong script.

@chr4 (Author) commented May 25, 2016

npm install and make shm work, but starting the script results in the following error:

$ node test.js                                                                                              
module.js:327
    throw err;
    ^

Error: Cannot find module '/opt/tmp/lib/binding/osrm.node'
    at Function.Module._resolveFilename (module.js:325:15)
    at Function.Module._load (module.js:276:25)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/opt/tmp/lib/osrm.js:6:29)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Module.require (module.js:353:17)

When compiling it myself using make

export CXX=g++ # won't compile with clang++

I get the following:

/opt/tmp/test.js:4
var osrm = new OSRM(berlin_path);
           ^

TypeError: Invalid file paths given!

I tried pointing to the .osrm manually:

/opt/tmp/test.js:4
var osrm = new OSRM('./test/data/berlin-latest.osrm');
           ^

TypeError: Invalid file paths given!

The .osrm file exists though:

ls -hal test/data/berlin-latest.osrm
-rw-r--r-- 1 root root 8.4M May 25 17:02 test/data/berlin-latest.osrm

@TheMarex (Member)

@chr4 Sorry about that. You should probably do git checkout v5.1.1 first, since master doesn't have binaries. Just to be sure, also run make clean; it seems like there are some outdated data files.

@chr4 (Author) commented Jun 1, 2016

Sorry to bother you again, but I get exactly the same error messages after doing a fresh git clone and git checkout v5.1.1 before running npm install, make shm, and node test.js.

This time though, it seems to work after running export CXX=g++; make

top now shows ~500% usage, which is not all 16 cores, but a lot more than just 1.5.

When running my server.js in the v5.1.1 environment, CPU usage stays below ~200%, though.

Next steps? Let me know if there's anything else I can test or provide to help.

@TheMarex (Member) commented Jun 1, 2016

top now shows ~500% usage, which is not all 16 cores, but a lot more than just 1.5.

For the test script from above? Hm. Can you try adding

process.env.UV_THREADPOOL_SIZE = Math.ceil(require('os').cpus().length * 1.5);

at the very top of the script?

When running my server.js in the v5.1.1 environment, CPU usage stays below ~200%, though.

This could be caused by something saturating the main thread of your node process with blocking calls that are not OSRM. A good way to test this is to replace all OSRM calls with async timeouts like:

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM();

[...]

function getDrivingDirections(req, res) {
    [...]
    setTimeout(function() {
            return res.json(someDummyResponse);
    }, 100);
}

app.get('/route/v1/driving/:coordinates', getDrivingDirections);

console.log('Listening on port: ' + 5000);
app.listen(5000);

Expected behavior of the above script: it still maxes out at 200% CPU. If you can now magically push beyond 200%, we might have another performance bug on our hands in the node bindings (maybe validation?).

@chr4 (Author) commented Jun 1, 2016

Setting UV_THREADPOOL_SIZE helps a little. I tried increasing it up to a factor of 3.5, but I can't get beyond 600-700% CPU usage (with 16 cores listed in /proc/cpuinfo).

You're right. When wrapping the osrm calls in setTimeout(), CPU usage is roughly the same as before (~200%). I wasn't entirely sure what you meant by async timeouts; this is what I tried:

function getDrivingDirections(req, res) {
    [...]

    setTimeout(function() {
        osrm.route(config, function(err, result) {
            if (err) {
                return res.json({
                    error: err.message
                });
            } else {
                return res.json(result);
            }
        });
    }, 100);
}

@chr4 (Author) commented Jun 22, 2016

Update: I just replaced node-osrm with osrm-routed for testing purposes, and osrm-routed hit the same limit as node-osrm: only 2-3 CPU cores are used.
This doesn't happen in our legacy setup using osrm-backend 0.3.x, and I'm not aware of any fundamental differences in the server setup.
Any further ideas on your side? Do you think this could be an osrm-backend issue?

@TheMarex (Member)

Any further ideas on your side? Do you think this could be an osrm-backend issue?

Hmm, a few questions:

  1. Do you use shared memory (osrm-datastore + osrm-routed) or internal memory (only osrm-routed)?
  2. Do you specify the -t parameter to osrm-routed? Does setting it to 16 help?
  3. Does this also occur if you manually run a list of requests against osrm-routed directly, or is this with your production workload? Congestion could happen before requests actually hit osrm-routed.

To narrow this down you could also try 4.9.1 as the last pre-5 release (where a lot of things changed that could maybe cause regressions).

@chr4 (Author) commented Jun 22, 2016

  1. Yes. I ran osrm-datastore, then benchmarked node-osrm, shut down node-osrm and started osrm-routed (using --shared-memory true)
  2. Yes, I'm using --threads 16. (The osrm cookbook defaults to the number of available cores)
  3. I have a benchmark script that runs production-like queries. Where would the congestion bottleneck be most likely? The old setup has a small Jetty service that proxies the requests to osrm-routed, while the new setup sends the queries directly to node-osrm (or osrm-routed, respectively). I tried setting up a simple nginx reverse proxy in front of osrm-routed, but that didn't make any difference.

We never used 4.9.x in production, only for tests (production still uses ancient 0.3.x). I suppose even more things changed between 0.3.x and 5.0.x :))

For the record: the new setup is approximately as fast as the old one (which uses all the cores), so it's not overall slower. I just imagine it could be about three times as fast if we got it to use all available cores.

@TheMarex (Member)

Where would the congestion bottleneck be most likely?

If the script that sends the requests is too slow (for example, because it is not concurrent), the server can't reach full load.

If I get a sample of the requests I can try to reproduce this on my machines.
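For illustration, a rough sketch of a client that keeps a fixed number of requests in flight at all times (the routes.txt file name, the port, and the concurrency level are assumptions, not taken from this thread):

var fs = require('fs');
var http = require('http');

var CONCURRENCY = 64;
var agent = new http.Agent({ keepAlive: true, maxSockets: CONCURRENCY });
var paths = fs.readFileSync('routes.txt', 'utf8').trim().split('\n');
var next = 0;
var done = 0;
var start = Date.now();

function fire() {
    if (next >= paths.length) return;
    var path = paths[next++];
    http.get({ host: 'localhost', port: 5000, path: path, agent: agent }, function(res) {
        res.resume(); // drain the response body
        res.on('end', function() {
            done++;
            if (done === paths.length) {
                console.log(done + ' requests in ' + (Date.now() - start) + ' ms');
                agent.destroy();
            } else {
                fire();
            }
        });
    }).on('error', function(err) {
        console.error(err.message);
        fire();
    });
}

// Start CONCURRENCY requests; every completed request triggers the next one.
for (var i = 0; i < CONCURRENCY; ++i) fire();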

@chr4 (Author) commented Jun 22, 2016

We're generating 1000 requests and then firing them multi-threaded. I've generated a test set for you (sorry, I hope this won't blow up GitHub comments; does GitHub Markup support spoilers? :))

https://gist.github.com/chr4/f404f5bdfe81fe27fce1d33f037391e3

@chr4 (Author) commented Jun 22, 2016

You might be right. When using siege to benchmark, CPU usage goes up to 500%. While it's still not using all available cores, this is clearly a big improvement. There might be some other kind of limitation. Could it even be the TCP stack itself? Even when firing the requests from localhost, it won't get beyond the 5-core mark. How much usage can you get on your system?

@TheMarex (Member)

@chr4 These requests seem to have longitude and latitude swapped. The order for v5 is lon,lat, not lat,lon.
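For reference, a coordinate in both the v5 HTTP API and the node bindings is longitude first, then latitude (the Berlin coordinates below are only an illustration):

// HTTP API: /route/v1/driving/{lon},{lat};{lon},{lat}
// e.g. http://localhost:5000/route/v1/driving/13.388860,52.517037;13.397634,52.529407

// node-osrm: coordinate pairs are also [longitude, latitude]
var OSRM = require('osrm');
var osrm = new OSRM();

osrm.route({
    coordinates: [[13.388860, 52.517037], [13.397634, 52.529407]]
}, function(err, result) {
    if (err) throw err;
    console.log(result.routes[0].distance);
});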

@chr4 (Author) commented Jun 22, 2016

Oh, my bad, sorry. I wasn't sure about the order when generating the URLs. I've updated the post containing the gist with a link to a corrected list.

This obviously affects the results: CPU usage sometimes spikes briefly to 700-800% (for about 0.1 s), but the average is around 200% again.

@daniel-j-h (Member) commented Jun 23, 2016

With OSRM v5.2.5 via npm install osrm (binaries at ./node_modules/osrm/lib/binding/) and germany-latest.osm.pbf from Geofabrik, the setup runs multiple wget -i instances in parallel on your routes:

1/ osrm-routed --threads 16 germany-latest.osrm

[screenshot 1: CPU utilization]

2/ osrm-datastore germany-latest.osrm, then osrm-routed --threads 16 --shared-memory

[screenshot 2: CPU utilization]

@daniel-j-h (Member)

To be clear here: this shows osrm-routed and not a node-osrm-based JavaScript server.

Still, combined with your statement that

I just replaced node-osrm with osrm-routed for testing purposes, and osrm-routed hit the same limit as node-osrm: only 2-3 CPU cores are used.

this makes me think your bottleneck is not in OSRM at all.

@daniel-j-h (Member)

I'm closing this as resolved for us. We haven't heard back from you in a while.

Feel free to test with 5.4 and open a new issue if you still see performance issues.

@danpat (Member) commented Sep 3, 2018

Small note: if you're not interested in examining the OSRM object tree, but only care about sending JSON over the network (i.e., you're using node-osrm together with expressjs to build an HTTP server), then this PR: Project-OSRM/osrm-backend#5189 will likely be of interest.
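For later readers, a rough sketch of how that could look. The format: 'json_buffer' option and the Buffer result used below are assumptions based on the later node API documentation, not on anything in this thread, so check the current docs before relying on them:

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM();

app.get('/route/v1/driving/:coordinates', function(req, res) {
    // Illustrative query; parse req.params.coordinates in a real server.
    var query = { coordinates: [[13.388860, 52.517037], [13.397634, 52.529407]] };
    // Assumption: with format 'json_buffer' the callback receives a Buffer holding
    // UTF-8 JSON, which can be written to the socket without building an object tree.
    osrm.route(query, { format: 'json_buffer' }, function(err, result) {
        if (err) return res.status(400).json({ error: err.message });
        res.set('Content-Type', 'application/json');
        res.end(result);
    });
});

app.listen(5000);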

@f3d0r commented Sep 3, 2018

Thanks for the advice! Is there anything I can do to allow SSH-type behavior with the Node.js wrapper for OSRM? Also (this might be better as a separate issue), is it possible to use MLD-compiled .osrm files with node-osrm?
