Cluster
Today a Leshan server cannot easily be deployed in a cluster for high availability and scalability. This page lists the modifications needed in Leshan and Californium and proposes an example of how to deploy Leshan as a cluster.
The southbound interface is the interface between the devices and the server (CoAP).
If we deploy Leshan on multiple machines, we need a front load balancer to send each incoming message to one of the Leshan instances in the cluster.
Today Leshan accepts CoAP+DTLS device communications. It is all based on UDP (maybe TCP later). One of the few load balancers supporting UDP is LVS (Linux Virtual Server), so we will focus on providing a solution compatible with it. (If you know of another level 3 load balancer usable in this situation, please mention it!) We do not explore DNS round-robin load balancing, because the amount of state to share would be larger (the whole DTLS state). We prefer level 3 routing based on source IP/port.
- DTLS: a connection holds a lot of state (fragments, epoch, handshake state, master key, etc.)
- CoAP states: MID, tokens
- LWM2M: registrations, security parameters, observations
If we try to share all of this state, we will greatly reduce performance. We should try to keep some of it on the Leshan instance. For example, the MID and most of the DTLS state (apart from what is needed for DTLS resumption) can be kept in the Leshan instance, because the level 3 stateful load balancer will push all the UDP packets coming from the same source IP/port to the same Leshan instance.
The remaining long-term state to share would be: CoAP observations (tokens), LWM2M registrations, and the parameters for starting a DTLS handshake.
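The stickiness this relies on can be illustrated with a minimal sketch. It is not how LVS is implemented or configured: the `pick_instance` function, the instance names and the choice of SHA-256 are all assumptions made for illustration. It only shows that hashing the source IP/port always yields the same instance for the same device address.

```python
import hashlib

def pick_instance(src_ip, src_port, instances):
    """Deterministically map a (source IP, source port) pair to one instance.

    Hypothetical helper: a real load balancer uses its own scheduling logic.
    """
    key = f"{src_ip}:{src_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # to an index into the instance list.
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]

# Hypothetical instance names.
instances = ["leshan-1", "leshan-2", "leshan-3"]
first = pick_instance("192.0.2.10", 60071, instances)
second = pick_instance("192.0.2.10", 60071, instances)
```

Both calls return the same instance name, which is the property that lets the MID and most of the DTLS state stay local to one instance.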
As a first step, DTLS resumption can be excluded, because we can force the client into a full re-handshake (which is not so costly with PSK schemes).
Leshan: the registration store is already good; the only problem is the observation registry.
Californium: observations are based on the "exchange" abstraction and are quite difficult to persist. This should be refactored, or we could create a lower-level API for receiving .
Scandium: as a second step, we can extract the information needed for re-handshaking and share it.
The northbound interface is the interface between the Leshan cluster and the end user, like an IoT backend.
When a device is connected to a server, only this server (which holds all the DTLS state) is able to communicate with the device.
So when a northbound interface user sends a request, it should be "routed" to the server in charge of this device.
The proposed solution is to use fan-out publishing in a broker. The request is published from the user to all the Leshan server instances, and if one of the servers sees that it is a request for one of its devices, it answers back to the user that it is going to process the request. Once the request is processed, the server publishes the result in the broker (still in fan-out).
The user puts a ticket ID (a token) on the request and uses it to correlate the request with the received fan-out responses.
(source: https://docs.google.com/presentation/d/1rGkpmPx_W37ojvlp6SL4oLuCkDlIAdd6XkTyHx2Li5k/edit?usp=sharing)
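The fan-out exchange described above can be sketched with an in-memory broker. This is only an illustration under stated assumptions: `FanOutBroker`, `Instance`, the instance names and the `handledBy` placeholder are hypothetical and not part of Leshan. The channel names and the `ep`, `ticket`, `req`, `ack` and `resp` fields follow the payloads described on this page.

```python
import uuid

class FanOutBroker:
    """In-memory stand-in for a fan-out broker: every subscriber sees every message."""
    def __init__(self):
        self.subscribers = {}  # channel name -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        for callback in list(self.subscribers.get(channel, [])):
            callback(message)

class Instance:
    """Hypothetical Leshan instance that only answers for the devices it owns."""
    def __init__(self, name, broker, devices):
        self.name = name
        self.broker = broker
        self.devices = set(devices)
        broker.subscribe("LESHAN_REQ", self.on_request)

    def on_request(self, message):
        if message["ep"] not in self.devices:
            return  # another instance is in charge of this device
        ticket = message["ticket"]
        # Tell the user that this instance takes the request.
        self.broker.publish("LESHAN_RESP", {"ticket": ticket, "ack": True})
        # A real instance would send the CoAP request to the device here.
        result = {"handledBy": self.name}  # placeholder, not a real response
        self.broker.publish("LESHAN_RESP", {"ticket": ticket, "resp": result})

broker = FanOutBroker()
Instance("leshan-1", broker, ["myDevice"])
Instance("leshan-2", broker, ["myEndpoint"])

ticket = uuid.uuid4().hex
received = []

def on_response(message):
    # Fan-out means we also see answers to other users' requests,
    # so keep only the messages carrying our ticket.
    if message["ticket"] == ticket:
        received.append(message)

broker.subscribe("LESHAN_RESP", on_response)
broker.publish("LESHAN_REQ", {"ep": "myEndpoint", "ticket": ticket,
                              "req": {"kind": "read", "path": "/3/0/1"}})
```

Only the instance that owns `myEndpoint` reacts, and the user receives an acknowledgement followed by the result, both matched by ticket.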
Since the Leshan server exposes its capabilities through a broker interface (Redis Pub/Sub, AMQP, etc.), the demo web UI will not be provided in the Leshan server.
So for demo/testing, the leshan-server-demo project can use an in-memory implementation of the broker and tunnel it to the JavaScript UI using WebSockets.
A first cluster implementation will be based on Redis.
This API will be based on Redis Pub/Sub.
This API was implemented experimentally but is no longer available in master.
The old code is still available in the cluster branch, but it is not up to date.
New registration events are accessible on the LESHAN_REG_NEW channel. The payload is the new registration.
{
  regDate: 1467883514122,
  address: "127.0.0.1",
  port: 60071,
  regAddr: "0.0.0.0",
  regPort: 5683,
  lt: 30,
  ver: "1.0",
  bnd: "U",
  ep: "myDevice",
  regId: "ELP5Ql2v4b",
  objLink: [{
    "url": "/", "at": {"rt": "oma.lwm2m"}
  }, {
    "url": "/1/0", "at": {}
  }, {
    "url": "/3/0", "at": {}
  }, {
    "url": "/6/0", "at": {}
  }],
  addAttr: {},
  root: "/",
  lastUp: 1467883514122
}
Registration update events are accessible on the LESHAN_REG_UP channel. The payload contains the updated registration (regUpdated) and the registration update (regUpdate).
{
  regUpdate: {
    regId: "ELP5Ql2v4b",
    address: "127.0.0.1",
    port: 60071
  },
  regUpdated: {
    regDate: 1467883514122,
    address: "127.0.0.1",
    port: 60071,
    regAddr: "0.0.0.0",
    regPort: 5683,
    lt: 30,
    ver: "1.0",
    bnd: "U",
    ep: "myEndpoint",
    regId: "ELP5Ql2v4b",
    objLink: [{
      url: "/", at: {rt: "oma.lwm2m"}
    }, {
      url: "/1/0", at: {}
    }, {
      url: "/3/0", at: {}
    }, {
      url: "/6/0", at: {}
    }],
    addAttr: {},
    root: "/",
    lastUp: 1467884189419
  }
}
De-registration events are accessible on the LESHAN_REG_DEL channel. The payload is the removed registration.
{
  regDate: 1467883514122,
  address: "127.0.0.1",
  port: 60071,
  regAddr: "0.0.0.0",
  regPort: 5683,
  lt: 30,
  ver: "1.0",
  bnd: "U",
  ep: "myDevice",
  regId: "ELP5Ql2v4b",
  objLink: [{
    "url": "/", "at": {"rt": "oma.lwm2m"}
  }, {
    "url": "/1/0", "at": {}
  }, {
    "url": "/3/0", "at": {}
  }, {
    "url": "/6/0", "at": {}
  }],
  addAttr: {},
  root: "/",
  lastUp: 1467883514122
}
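A northbound user could fold the three registration channels into a local view of the registered devices. The sketch below is one possible way to do that, not part of Leshan: the `RegistrationView` class is hypothetical, keying the view by `ep` is an assumption, and it assumes the payloads arrive as strict JSON text (with quoted keys), which the examples above do not guarantee.

```python
import json

class RegistrationView:
    """Hypothetical local view of registrations, keyed by endpoint name (ep)."""
    def __init__(self):
        self.by_endpoint = {}

    def on_event(self, channel, raw):
        payload = json.loads(raw)  # assumes strict JSON on the wire
        if channel == "LESHAN_REG_NEW":
            self.by_endpoint[payload["ep"]] = payload
        elif channel == "LESHAN_REG_UP":
            # Keep the full updated registration, not the partial update.
            updated = payload["regUpdated"]
            self.by_endpoint[updated["ep"]] = updated
        elif channel == "LESHAN_REG_DEL":
            self.by_endpoint.pop(payload["ep"], None)

view = RegistrationView()
# Payloads trimmed to a few fields for brevity.
view.on_event("LESHAN_REG_NEW",
              '{"ep": "myDevice", "regId": "ELP5Ql2v4b", "lastUp": 1467883514122}')
view.on_event("LESHAN_REG_UP",
              '{"regUpdate": {"regId": "ELP5Ql2v4b"},'
              ' "regUpdated": {"ep": "myDevice", "regId": "ELP5Ql2v4b",'
              ' "lastUp": 1467884189419}}')
last_update = view.by_endpoint["myDevice"]["lastUp"]
view.on_event("LESHAN_REG_DEL", '{"ep": "myDevice", "regId": "ELP5Ql2v4b"}')
```

After the de-registration event the view is empty again.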
The LESHAN_REQ channel is used to send requests. The payload contains a ticket for this request (ticket), the destination endpoint (ep) and the request to send (req).
{
  ep: "myEndpoint",
  ticket: "8c90592249c74a9b8a2da5754145dcc0",
  req: {
    kind: "read",
    path: "/3/0/1",
    contentFormat: 1541
  }
}
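Building such a request message could look like the following sketch. The `build_read_request` helper is hypothetical. Generating the ticket from a random UUID and serializing the message as strict JSON are assumptions; the page only shows that the ticket is a 32-character hexadecimal string.

```python
import json
import uuid

def build_read_request(endpoint, path, content_format):
    """Build a LESHAN_REQ payload for a read request.

    Returns the ticket (to correlate responses later) and the JSON text.
    """
    ticket = uuid.uuid4().hex  # 32 hex characters, same shape as the ticket above
    message = {
        "ep": endpoint,
        "ticket": ticket,
        "req": {"kind": "read", "path": path, "contentFormat": content_format},
    }
    return ticket, json.dumps(message)

ticket, payload = build_read_request("myEndpoint", "/3/0/1", 1541)
decoded = json.loads(payload)
```

The caller keeps `ticket` and publishes `payload` on the LESHAN_REQ channel with whichever broker client it uses.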
The LESHAN_RESP channel is used to receive responses. Several kinds of messages can be received on this channel.
An Ack message, which means that the request is being handled by one instance in the cluster.
{
  ticket: "8c90592249c74a9b8a2da5754145dcc0",
  ack: true
}
An Error message, which means that an error occurred on the instance which chose to handle the request.
{
  ticket: "8c90592249c74a9b8a2da5754145dcc0",
  err: {
    errorMessage: "an error message"
  }
}
A Response message with the response returned by the device.
{
  ticket: "8c90592249c74a9b8a2da5754145dcc0",
  resp: {
    kind: "read",
    code: "CONTENT",
    node: {
      kind: "singleresource",
      id: 1,
      type: "string",
      value: "Lightweight M2M Client"
    }
  }
}
or
{
  ticket: "8c90592249c74a9b8a2da5754145dcc0",
  resp: {
    kind: "read",
    code: "NOT_FOUND",
    errorMessage: "a custom CoAP error message"
  }
}
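On the receiving side, a user has to tell the three message kinds apart and ignore tickets it did not issue. The sketch below shows one way to do this. The `classify` function and the `PendingRequests` class are hypothetical, the messages are assumed to be strict JSON, and treating an Error or Response message as the end of a request is an assumption, since timeouts and cancellation are not yet defined.

```python
import json

def classify(message):
    """Return 'ack', 'error' or 'response' for one LESHAN_RESP message."""
    if message.get("ack") is True:
        return "ack"
    if "err" in message:
        return "error"
    if "resp" in message:
        return "response"
    return "unknown"

class PendingRequests:
    """Hypothetical tracker for the tickets this user has sent."""
    def __init__(self):
        self.pending = {}  # ticket -> "sent" or "acked"

    def sent(self, ticket):
        self.pending[ticket] = "sent"

    def on_message(self, raw):
        message = json.loads(raw)
        ticket = message.get("ticket")
        if ticket not in self.pending:
            return None  # fan-out: this answers someone else's request
        kind = classify(message)
        if kind == "ack":
            self.pending[ticket] = "acked"
            return None
        if kind == "unknown":
            return None
        del self.pending[ticket]  # assumption: error or response ends the request
        return kind, message

TICKET = "8c90592249c74a9b8a2da5754145dcc0"
tracker = PendingRequests()
tracker.sent(TICKET)
tracker.on_message(json.dumps({"ticket": TICKET, "ack": True}))
state_after_ack = tracker.pending[TICKET]
outcome = tracker.on_message(json.dumps({
    "ticket": TICKET,
    "resp": {"kind": "read", "code": "NOT_FOUND",
             "errorMessage": "a custom CoAP error message"},
}))
```

Note that a device-level failure such as NOT_FOUND still arrives as a Response message; the Error message is reserved for failures on the Leshan instance itself.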
API still to define: timeout? cancel request? observe?