Multi-threaded setup #189

Closed · chr4 opened this issue May 12, 2016 · 32 comments

@chr4 commented May 12, 2016

I've been running the internal osrm-backend C++ API in the past, and recently learned that it's not supposed to be run in production. While migrating to node-osrm, I ran into an issue: I can't get a setup to work that uses more than ~1.5 CPU cores.

As Node.js is single-threaded, I was using the built-in cluster module, and I also tried load-balancing requests with haproxy across two node-osrm instances on the same machine.
I'm using var osrm = new OSRM(); to access the map data via shared memory (loaded via osrm-datastore).

While both approaches succeed in spawning multiple node-osrm instances, and all instances are receiving and processing requests, the overall CPU load on the server is still roughly the same as with the single node-osrm setup. I therefore assume there's a lock somewhere in the node-osrm -> libosrm stack preventing parallel use.

How do you guys run node-osrm in production? Are you just relying on a single core, or is there something I missed?

@danpat (Member) commented May 12, 2016

@chr4 node-osrm uses libuv to maintain a worker thread pool; route requests run asynchronously.

You should construct a single var osrm = new OSRM();, then re-use that in your request handler, node-osrm handles the threading.

By default, libuv should set the thread pool size equal to your number of CPUs. However, we usually see better utilization by setting it to 1.5x.

Try adding this to your JS code before creating the new OSRM() object:

process.env.UV_THREADPOOL_SIZE = Math.ceil(require('os').cpus().length * 1.5);
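Putting that together, a minimal sketch of the intended setup, assuming an express app (the hard-coded coordinates are only illustrative; a real handler would build the query from the request):

// Set the pool size before anything submits work to libuv's thread pool,
// i.e. before the OSRM object is constructed and before any request is handled.
process.env.UV_THREADPOOL_SIZE = Math.ceil(require('os').cpus().length * 1.5);

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM(); // one shared instance, reading the data loaded by osrm-datastore

app.get('/route/v1/driving/:coordinates', function(req, res) {
    // Illustrative query; parse req.params.coordinates in a real server.
    var query = { coordinates: [[13.388860, 52.517037], [13.397634, 52.529407]] };
    osrm.route(query, function(err, result) {
        if (err) return res.status(400).json({ error: err.message });
        res.json(result);
    });
});

app.listen(5000);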

@chr4 (Author) commented May 18, 2016

Thanks for the hints!
I'm using a single instance of var osrm = new OSRM(); shared by the request handler.

Unfortunately, regardless of the UV_THREADPOOL_SIZE (and regardless of whether I export it in the shell or set it in Node.js), the maximum CPU usage of the node process is ~160% (on a 16-core machine).

The osrm module was installed using npm install osrm.

Here's the code I'm using (the omitted parts just assemble the config object):

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM();

[...]

function getDrivingDirections(req, res) {
    [...]

    osrm.route(config, function(err, result) {
        if (err) {
            return res.json({
                error: err.message
            });
        } else {
            return res.json(result);
        }
    });
}

app.get('/route/v1/driving/:coordinates', getDrivingDirections);

console.log('Listening on port: ' + 5000);
app.listen(5000);

Any further hints?

@danpat (Member) commented May 18, 2016

@chr4 Sounds like you might be I/O bound at some layer.

Almost all of the routing data is loaded into RAM, except the coordinate index (the .fileIndex file). Every coordinate you supply in a request will cause at least one read from that file.

If you're not running on an SSD disk, try that. If you've got enough RAM, you can also try moving everything into a ramdisk (e.g. using tmpfs).

@chr4 (Author) commented May 18, 2016

I'm using an SSD RAID that delivers well over 700 MB/s (probably > 1 GB/s). Furthermore, the old setup using osrm-routed happily uses all cores, even though that system runs on slower hardware. Shouldn't an I/O issue also affect the osrm-routed machine?

top and iotop also show iowait/usage below 0.1% when firing around 1000 requests (64 in parallel).

Any further ideas?

To make sure I got this right: even when using osrm-datastore, requests still require reads from the .fileIndex file?

@danpat (Member) commented May 18, 2016

@chr4 Yes, even with osrm-datastore, the .fileIndex file remains on disk. But you're right, that doesn't sound like your problem.

I'm not sure what else to suggest at this stage. There's a bottleneck somewhere that you're going to need to track down. Have you tried removing the OSRM call to see whether your Node process can create sufficient CPU load by itself?

@TheMarex (Member)

@chr4 We actually tried to reproduce this and saw similar behavior. @daniel-j-h is working on a fix that should utilize all the cores by using nan::AsyncWorker instead of hooking into the libuv thread pool directly.

@chr4 (Author) commented May 19, 2016

Thanks for looking into this!
Your explanation sounds like this is a libuv issue?

Let me know if you have something to try out, or in case you need any other feedback.

@TheMarex (Member)

We haven't figured out exactly what changed in libuv's behavior since the 0.10.x series, but it seems like it doesn't play nicely with uv_queue_work(uv_default_loop(), ..) anymore. It looks like it actually spawns UV_THREADPOOL_SIZE threads, since the node process on our server has the right number of threads, but it only uses two of them.
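One way to check how many pool workers are actually active, independently of OSRM, is to saturate the pool with some other pool-backed work and watch when the callbacks fire. A rough diagnostic sketch (crypto.pbkdf2 runs on the libuv thread pool; the iteration count is arbitrary):

// Must be set before any work is queued on the pool.
process.env.UV_THREADPOOL_SIZE = 16;
var crypto = require('crypto');

var start = Date.now();
for (var i = 0; i < 16; ++i) {
    crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', function(err) {
        if (err) throw err;
        console.log('task finished after ' + (Date.now() - start) + ' ms');
    });
}
// If all 16 tasks finish at roughly the same time, 16 workers are really running;
// if they complete in small batches, far fewer workers are doing the work.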

@daniel-j-h (Member)

For the record, properly using NAN's AsyncWorker abstraction fixes this. I got a prototype working yesterday and will now rewrite our Node.js bindings accordingly.

@daniel-j-h (Member)

@chr4 I worked on this over the last few days and my pull request has already landed in master, making CPUs happy again. Can you check whether it resolves your issue and, if so, close this ticket?

@chr4 (Author) commented May 23, 2016

Wow, that was quick! Is the fix released in 5.1.1 or do I need to build master from source?
I'll check the new version this week.

@TheMarex (Member)

@chr4 it's in 5.1.1, should be good to just npm install.

@chr4 (Author) commented May 25, 2016

I tried version 5.1.1 from npm install osrm; unfortunately, I see exactly the same issue as before: max ~160% CPU usage for the node process, with no iowait.

Is there anything I have to change in my .js implementation to use the new changes?

Out of curiosity: The CHANGELOG of osrm-backend indicates improvements for osrm-routed. Does this mean that it might be recommended for production in the future?

@TheMarex (Member) commented May 25, 2016

Does this mean that it might be recommended for production in the future?

No, those were external contributions.

Can you test the following for me?

Go into the node-osrm repository and run:

npm install
make shm

Then execute the following script and watch CPU usage:

var OSRM = require('.');
var berlin_path = require('./test/osrm-data-path').data_path;

var osrm = new OSRM(berlin_path);
var options = {
    coordinates: [[13.393252,52.542648],[13.39478,52.543079],[13.397389,52.542107]],
    timestamps: [1424684612, 1424684616, 1424684620]
};
for(var i = 0; i < 10000; ++i)
{
    osrm.match(options, function(err, response) {
      console.log(i);
    });
}

EDIT: Sorry, I used the wrong script.

@chr4 (Author) commented May 25, 2016

npm install and make shm work, but starting the script results in the following error:

$ node test.js                                                                                              
module.js:327
    throw err;
    ^

Error: Cannot find module '/opt/tmp/lib/binding/osrm.node'
    at Function.Module._resolveFilename (module.js:325:15)
    at Function.Module._load (module.js:276:25)
    at Module.require (module.js:353:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/opt/tmp/lib/osrm.js:6:29)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Module.require (module.js:353:17)

When compiling it myself using make

export CXX=g++ # won't compile with clang++

I get the following:

/opt/tmp/test.js:4
var osrm = new OSRM(berlin_path);
           ^

TypeError: Invalid file paths given!

I tried pointing to the .osrm manually:

/opt/tmp/test.js:4
var osrm = new OSRM('./test/data/berlin-latest.osrm');
           ^

TypeError: Invalid file paths given!

The .osrm file exists though:

ls -hal test/data/berlin-latest.osrm
-rw-r--r-- 1 root root 8.4M May 25 17:02 test/data/berlin-latest.osrm

@TheMarex (Member)

@chr4 Sorry about that. You should probably do git checkout v5.1.1 first, since master doesn't have binaries. Just to be sure, also run make clean; it seems like there are some outdated data files.

@chr4 (Author) commented Jun 1, 2016

Sorry to bother you again, but I get exactly the same error messages after doing a fresh git clone and git checkout v5.1.1 before running npm install, make shm, and node test.js.

This time though, it seems to work after running export CXX=g++; make

top now shows ~500% usage, which is not all 16 cores, but a lot more than just 1.5.

When running my server.js in the v5.1.1 environment, CPU usage stays below ~200%, though.

Next steps? Let me know if there's anything else I can test or provide to help.

@TheMarex (Member) commented Jun 1, 2016

top now shows ~500% usage, which is not all 16 cores, but a lot more than just 1.5.

For the test script from above? Hm. Can you try adding

process.env.UV_THREADPOOL_SIZE = Math.ceil(require('os').cpus().length * 1.5);

at the very top of the script?

When running my server.js in the v5.1.1 environment, CPU usage stays below ~200%, though.

This could be caused by something saturating the main thread of your node process with blocking calls that are not OSRM. A good way to test this is to replace all OSRM calls with async timeouts like:

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM();

[...]

function getDrivingDirections(req, res) {
    [...]
    setTimeout(function() {
            return res.json(someDummyResponse);
    }, 100);
}

app.get('/route/v1/driving/:coordinates', getDrivingDirections);

console.log('Listening on port: ' + 5000);
app.listen(5000);

Expected behavior of the above script: it still maxes out at 200% CPU. If you can now magically push beyond 200%, we might have another performance bug on our hands in the node bindings (maybe validation?).

@chr4 (Author) commented Jun 1, 2016

Setting UV_THREADPOOL_SIZE helps a little. I tried increasing it up to a factor of 3.5, but I can't get beyond 600-700% CPU usage (with 16 cores listed in /proc/cpuinfo).

You're right. When wrapping the osrm calls in setTimeout(), CPU usage is roughly the same as before (~200%). I wasn't entirely sure what you meant by async timeouts; this is what I tried:

function getDrivingDirections(req, res) {
    [...]

    setTimeout(function() {
        osrm.route(config, function(err, result) {
            if (err) {
                return res.json({
                    error: err.message
                });
            } else {
                return res.json(result);
            }
        });
    }, 100);
}

@chr4 (Author) commented Jun 22, 2016

Update: I just replaced node-osrm with osrm-routed for testing purposes, and osrm-routed hit the same limit as node-osrm: only 2-3 CPU cores are used.
This doesn't happen in our legacy setup using osrm-backend 0.3.x, and I'm not aware of any fundamental differences in the server setup.
Any further ideas on your side? Do you think this could be an osrm-backend issue?

@TheMarex (Member)

Any further ideas on your side? Do you think this could be an osrm-backend issue?

Hmm, a few questions:

  1. Do you use shared memory (osrm-datastore + osrm-routed) or internal memory (only osrm-routed)?
  2. Do you specify the -t parameter to osrm-routed? Does setting it to 16 help?
  3. Does this also occur if you manually run a list of requests against osrm-routed directly, or is this with your production workload? Congestion could happen before requests actually hit osrm-routed.

To narrow this down you could also try 4.9.1 as the last pre-5 release (where a lot of things changed that could maybe cause regressions).

@chr4 (Author) commented Jun 22, 2016

  1. Yes. I ran osrm-datastore, then benchmarked node-osrm, shut down node-osrm and started osrm-routed (using --shared-memory true)
  2. Yes, I'm using --threads 16. (The osrm cookbook defaults to the number of available cores)
  3. I have a benchmark script that runs production-like queries. Where would the congestion bottleneck be most likely? The old setup has a small Jetty service that proxies the requests to osrm-routed, while the new setup sends the queries directly to node-osrm (or osrm-routed, respectively). I tried setting up a simple nginx reverse proxy in front of osrm-routed, but that didn't make any difference.

We never used 4.9.x in production, only for tests (production still uses ancient 0.3.x). I suppose even more things changed between 0.3.x and 5.0.x :))

For the record: the new setup is approximately as fast as the old one (which uses all the cores), so it's not overall slower. I just imagine it could be about three times as fast if we got it to use all available cores.

@TheMarex (Member)

Where would the congestion bottleneck be most likely?

If the script that sends the requests is too slow (for example, because it is not concurrent), the server can't reach full load.

If I get a sample of the requests I can try to reproduce this on my machines.
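For illustration, a rough sketch of a client that keeps a fixed number of requests in flight at all times (the routes.txt file name, the port, and the concurrency level are assumptions, not taken from this thread):

var fs = require('fs');
var http = require('http');

var CONCURRENCY = 64;
var agent = new http.Agent({ keepAlive: true, maxSockets: CONCURRENCY });
var paths = fs.readFileSync('routes.txt', 'utf8').trim().split('\n');
var next = 0;
var done = 0;
var start = Date.now();

function fire() {
    if (next >= paths.length) return;
    var path = paths[next++];
    http.get({ host: 'localhost', port: 5000, path: path, agent: agent }, function(res) {
        res.resume(); // drain the response body
        res.on('end', function() {
            done++;
            if (done === paths.length) {
                console.log(done + ' requests in ' + (Date.now() - start) + ' ms');
                agent.destroy();
            } else {
                fire();
            }
        });
    }).on('error', function(err) {
        console.error(err.message);
        fire();
    });
}

// Start CONCURRENCY requests; every completed request triggers the next one.
for (var i = 0; i < CONCURRENCY; ++i) fire();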

@chr4 (Author) commented Jun 22, 2016

We're generating 1000 requests and then firing them multi-threaded. I've generated a test set for you (sorry, I hope this won't blow up GitHub comments; does GitHub Markup support spoilers? :))

https://gist.github.com/chr4/f404f5bdfe81fe27fce1d33f037391e3

@chr4 (Author) commented Jun 22, 2016

You might be right. When using siege to benchmark, CPU usage goes up to 500%. While it's still not using all available cores, this is clearly a big improvement. There might be some other kind of limitation. Could it even be the TCP stack itself? Even when firing the requests from localhost, it won't get beyond the 5-core mark. How much usage can you get on your system?

@TheMarex (Member)

@chr4 These requests seem to have longitude and latitude swapped. The order for v5 is lon,lat, not lat,lon.
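For reference, a coordinate in both the v5 HTTP API and the node bindings is longitude first, then latitude (the Berlin coordinates below are only an illustration):

// HTTP API: /route/v1/driving/{lon},{lat};{lon},{lat}
// e.g. http://localhost:5000/route/v1/driving/13.388860,52.517037;13.397634,52.529407

// node-osrm: coordinate pairs are also [longitude, latitude]
var OSRM = require('osrm');
var osrm = new OSRM();

osrm.route({
    coordinates: [[13.388860, 52.517037], [13.397634, 52.529407]]
}, function(err, result) {
    if (err) throw err;
    console.log(result.routes[0].distance);
});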

@chr4 (Author) commented Jun 22, 2016

Oh, my bad, sorry. I wasn't sure about the order when generating the URLs. I've updated the post containing the gist with a link to a corrected list.

This obviously affects the results: CPU usage sometimes spikes briefly to 700-800% (for about 0.1 s), but the average is around 200% again.

@daniel-j-h (Member) commented Jun 23, 2016

With OSRM v5.2.5 via npm install osrm (binaries at ./node_modules/osrm/lib/binding/) and germany-latest.osm.pbf from Geofabrik, the setup runs multiple wget -i instances in parallel on your routes:

1/ osrm-routed --threads 16 germany-latest.osrm

[screenshot 1: CPU utilization]

2/ osrm-datastore germany-latest.osrm, then osrm-routed --threads 16 --shared-memory

[screenshot 2: CPU utilization]

@daniel-j-h (Member)

To be clear here: this shows osrm-routed and not a node-osrm-based JavaScript server.

Still, combined with your statement that

I just replaced node-osrm with osrm-routed for testing purposes, and osrm-routed hit the same limit as node-osrm: only 2-3 CPU cores are used.

this makes me think your bottleneck is not in OSRM at all.

@daniel-j-h (Member)

I'm closing this as resolved for us. We haven't heard back from you in a while.

Feel free to test with 5.4 and open a new issue if you still see performance issues.

@danpat (Member) commented Sep 3, 2018

Small note: if you're not interested in examining the OSRM object tree, but only care about sending JSON over the network (i.e., you're using node-osrm together with expressjs to build an HTTP server), then this PR: Project-OSRM/osrm-backend#5189 will likely be of interest.
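For later readers, a rough sketch of how that could look. The format: 'json_buffer' option and the Buffer result used below are assumptions based on the later node API documentation, not on anything in this thread, so check the current docs before relying on them:

var express = require('express');
var OSRM = require('osrm');

var app = express();
var osrm = new OSRM();

app.get('/route/v1/driving/:coordinates', function(req, res) {
    // Illustrative query; parse req.params.coordinates in a real server.
    var query = { coordinates: [[13.388860, 52.517037], [13.397634, 52.529407]] };
    // Assumption: with format 'json_buffer' the callback receives a Buffer holding
    // UTF-8 JSON, which can be written to the socket without building an object tree.
    osrm.route(query, { format: 'json_buffer' }, function(err, result) {
        if (err) return res.status(400).json({ error: err.message });
        res.set('Content-Type', 'application/json');
        res.end(result);
    });
});

app.listen(5000);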

@f3d0r commented Sep 3, 2018

Thanks for the advice! Is there anything I can do to allow SSH-type behavior with the Node.js wrapper for OSRM? Also (this might be better as a separate issue), is it possible to use MLD-compiled .osrm files with node-osrm?
