Server Failover
This page explains how a Leshan server behaves after a restart, crash, or reboot.
Here is a short description of what can and cannot be persisted with Leshan.
- At the DTLS/Scandium level: by default, no state is persisted. Neither the DTLS session nor the connection is persisted. Scandium provides a way to persist sessions via SessionCache; we have never experimented with it, but you can try it with Leshan.
- At the CoAP/Californium level: the only state that can be persisted is about observations. The MID/Token of short-lived requests cannot be persisted.
- At the LWM2M/Leshan level: we can persist registrations, observations, bootstrap configuration, and security information by implementing the corresponding stores.
Concretely, what does this mean?
A connection in DTLS contains all the information needed to decrypt a message. Without any DTLS extension, a connection is identified by the address/port of the foreign peer.
So when an encrypted packet is received, the server looks for a connection associated with this packet. If there is no connection, or if the available connection does not allow the packet to be decrypted, the packet is dropped/ignored. So after a restart, all encrypted packets are dropped, meaning that clients need to establish a new connection by doing a full or abbreviated handshake.
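The lookup described above can be sketched as follows. This is a minimal, hypothetical model and not Scandium's API: the `ConnectionTable` class and its methods are names invented for illustration, and a real connection would hold keys and sequence numbers rather than just an epoch.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a DTLS connection table; this is not Scandium's API.
final class ConnectionTable {
    // Peer address/port -> epoch of the established connection.
    // A real connection would also hold keys and sequence numbers.
    private final Map<InetSocketAddress, Integer> epochByPeer = new HashMap<>();

    // Called once a full or abbreviated handshake has completed.
    void handshakeCompleted(InetSocketAddress peer, int epoch) {
        epochByPeer.put(peer, epoch);
    }

    // Returns true if the record can be decrypted, false if it must be dropped.
    boolean onEncryptedRecord(InetSocketAddress peer, int recordEpoch) {
        Integer epoch = epochByPeer.get(peer);
        return epoch != null && epoch == recordEpoch;
    }
}
```

A table created after a restart starts empty, so `onEncryptedRecord` returns false for every peer until `handshakeCompleted` is called again for that peer.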
What happens if the server sends a request to a client after a restart?
- For queue mode, this should not really happen, as it is always the client that initiates the connection.
- For "standard/connected" mode, the LWM2M specification is not clear about this, but currently the Leshan server will try to establish a connection to the client, so the Leshan server will act as the DTLS client.
A session contains a set of negotiated cryptographic parameters from which a connection can be created. One connection is linked to exactly one session, but one session can be used to create several connections.
A full handshake creates a new session with a corresponding connection. A session can be reused to create a new connection using an abbreviated handshake (also called session resumption). Session resumption avoids some cryptographic parameter transfer and computation, saving some latency, CPU, and bandwidth. The benefit is mainly seen with X509, where certificates can be very large. For PSK there is almost no benefit.
By default, sessions are not persisted. But as explained above, Scandium provides a way to do that.
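To illustrate the difference between the two handshakes, here is a minimal sketch of how a server might choose between them based on a session cache. It is a hypothetical model and does not use Scandium's SessionCache API: `HandshakeChooser`, `Kind`, and the `Map`-based cache are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical model of session resumption; this is not Scandium's SessionCache API.
final class HandshakeChooser {
    enum Kind { FULL, ABBREVIATED }

    // Session id -> negotiated parameters.
    // Passing a map that outlives the server models a persisted cache.
    private final Map<String, String> sessionCache;

    HandshakeChooser(Map<String, String> sessionCache) {
        this.sessionCache = sessionCache;
    }

    // Called at the end of a full handshake to remember the new session.
    void sessionEstablished(String sessionId, String parameters) {
        sessionCache.put(sessionId, parameters);
    }

    // Decides which handshake to run for the session id offered by the peer (may be null).
    Kind onClientHello(String offeredSessionId) {
        boolean known = offeredSessionId != null && sessionCache.containsKey(offeredSessionId);
        return known ? Kind.ABBREVIATED : Kind.FULL;
    }
}
```

With an empty cache after a restart, every offered session id is unknown and the sketch falls back to a full handshake. If the same map is handed to the new instance, resumption remains possible.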
Californium keeps some state in memory, such as MIDs and tokens. The CoAP MID is used to detect duplicates and for optional reliability. The CoAP Token is used to match a request with its response.
Currently Californium does not provide any way to persist this state (except tokens for notifications).
So what happens after a restart?
Short answer: "all short-lived exchanges are lost" and "long-lived exchanges (notifications) can still be handled".
Detailed answer:
- Deduplication of old messages is not possible. Generally this is not an issue, because duplicate messages do not get past the DTLS layer once the connection is lost.
- Retransmission (reliability) of outgoing messages is lost.
- Responses to requests sent before the restart cannot be handled.
- For observe, once the relation is established, notifications can still be handled after a reboot (see more details below).
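The points above can be sketched with a small model of token matching. This is a hypothetical illustration and not Californium's implementation: `ExchangeState` and its methods are invented names, tokens are simplified to strings, and the persisted observation store is modeled as a `Map` passed in from outside.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of CoAP token matching; this is not Californium's implementation.
final class ExchangeState {
    // In-memory only: token -> path of a request awaiting a single response.
    private final Map<String, String> pendingRequests = new HashMap<>();

    // Token -> observed path. Backed by a map supplied by the caller,
    // which models an observation store that survives restarts.
    private final Map<String, String> observationStore;

    ExchangeState(Map<String, String> observationStore) {
        this.observationStore = observationStore;
    }

    void sendRequest(String token, String path) {
        pendingRequests.put(token, path);
    }

    void observe(String token, String path) {
        observationStore.put(token, path);
    }

    // Returns the path the incoming message belongs to, or null if it cannot be matched.
    String match(String token) {
        String path = pendingRequests.remove(token);
        return path != null ? path : observationStore.get(token);
    }
}
```

A new `ExchangeState` built on the same observation map after a restart still matches notification tokens, while tokens of short-lived requests sent before the restart no longer match anything.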
Registrations can be persisted and so can survive a restart. Observations can be persisted along with registrations (see more details below).
A digression is necessary to better understand how Leshan handles notifications.
The CoAP Observe RFC is very strict about request/notification matching over DTLS, maybe a bit too strict. This causes friction with real-world deployments and with some LWM2M features.
It says:
"All notifications resulting from a GET request with an Observe Option MUST be returned within the same epoch of the same connection as the request."
This means that if you lose the DTLS connection, the observe relation is lost.
This problem is discussed at OMA (see here) and IETF (see there).
At IETF, the CoRE working group seems to be aware of the issue, but we have not seen any proposal to change this for now.
At OMA, the solution was to strictly link session lifetime to registration lifetime, which does not really respect the CoAP Observe RFC either. This allows notifications to be handled after an abbreviated handshake but not after a full handshake.
Faced with this situation, the Californium team chose to go with a fully customizable way to match requests and responses/notifications. (see EndpointContextMatcher)
On the Leshan side, we chose to use the flexibility offered by Californium to relax the CoAP Observe constraint by default (it is up to you to make it stricter if you prefer). We currently accept notifications if the DTLS identity is the same as the one linked to the registration. This means that if you do a full handshake, and so have a new DTLS connection/session, we will accept your notification as long as you are using the right DTLS identity.
This allows observation relationships to be kept after a restart even if you do not persist the DTLS connection or session.
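The difference between the strict rule and the relaxed rule can be sketched as two predicates. This is a hypothetical model and not Californium's EndpointContextMatcher interface or Leshan's actual matcher: `PeerContext`, `NotificationMatching`, and their fields are invented for illustration, and "same connection" is approximated here by session id and epoch.

```java
// Hypothetical model; this is not Californium's EndpointContextMatcher interface.
final class PeerContext {
    final String identity;   // e.g. PSK identity or certificate subject
    final String sessionId;  // DTLS session the message was exchanged on
    final int epoch;         // DTLS epoch the message was exchanged in

    PeerContext(String identity, String sessionId, int epoch) {
        this.identity = identity;
        this.sessionId = sessionId;
        this.epoch = epoch;
    }
}

final class NotificationMatching {
    private NotificationMatching() {}

    // Strict reading of the Observe RFC, approximated by same session and same epoch.
    static boolean strict(PeerContext request, PeerContext notification) {
        return request.sessionId.equals(notification.sessionId)
                && request.epoch == notification.epoch;
    }

    // Relaxed rule: only the DTLS identity has to match.
    static boolean relaxed(PeerContext request, PeerContext notification) {
        return request.identity.equals(notification.identity);
    }
}
```

After a full handshake, the notification arrives with a new session id, so `strict` rejects it while `relaxed` still accepts it, provided the identity is unchanged.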