Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handshake probe. #1157

Merged
merged 1 commit into from
Jan 8, 2020
Merged

Conversation

boaks
Copy link
Contributor

@boaks boaks commented Dec 2, 2019

Intended to be used, if connectivity may be the cause for missing responses.
If the 1. handshake flight is not responded, the handshake is cancelled and the session still kept.
(Stop handshake after "deep sleep").

Signed-off-by: Achim Kraus achim.kraus@bosch-si.com

@boaks
Copy link
Contributor Author

boaks commented Dec 2, 2019

Extension for 2.1.0

@boaks boaks added this to the 2.1.0 milestone Dec 2, 2019
@sbernard31
Copy link
Contributor

Could explain the probe mode a bit more. This is not clear do me 😕

@boaks
Copy link
Contributor Author

boaks commented Dec 3, 2019

I basically use this for mobile connectivity with DTLS 1.2 connection id and a "mostly stable server". If the mobile peer doesn't get a ACK or response that may have two different causes:

  1. server has lost session (association)
  2. connectivity is lost

The second is sometime hard to detect; the peer's state is connected, but effectively it's not working. In that case, after some retransmissions, the peer starts a handshake. Without this PR that removes on the client the session. If the handshake timesout (though the connection is not working), the peer still requires a new handshake after the connectivity is established again.

With this PR, the probe handshake starts a handshake without removing the session. If some data is received, the session is removed and the handshake gets completed. If no data is received, the peer assumes, that the connectivity is lost (even if it's own state indicates connectivity) and just timesout the request. if the connectivity is established again, just a new request could be send without a handshake. I use this for my android test app and it reduces the handshakes in my test by 80%. (current test run: 2400 request, 114 retransmissions, 3 request failed, 18 connects (12 by restart) => only 6 handshakes (reaching the server) for 2400 request while moving around.)

@sbernard31
Copy link
Contributor

Thx for the clarification, I will review this now

Copy link
Contributor

@sbernard31 sbernard31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the sendMessage(final long nanos, final RawData message, ... begin to be a bit too long maybe we could consider to cut it in several method to improve readability.

The new isExpired state in handshaker in not so clear to me, isExpired means that last sent flight timed-out ? If yes why do we need to use a new timestamp attributes why not set the handshaked as expired in DTLSConnector.handleTimeout?

/**
* Force handshake probe before send this message. Doesn't start a
* handshake, if the connector is configured to act as server only.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should explain more what is hanshake probe (or at least add a link to #1157 (comment))

@@ -299,7 +306,8 @@ protected Handshaker(boolean isClient, int initialMessageSeq, DTLSSession sessio
this.session.setMaxTransmissionUnit(maxTransmissionUnit);
this.applicationLevelInfoSupplier = config.getApplicationLevelInfoSupplier();
this.inboundMessageBuffer = new InboundMessageBuffer();

long timeout = config.getRetransmissionTimeout();
this.expiresNanosTimeout = TimeUnit.MILLISECONDS.toNanos(timeout << (config.getMaxRetransmissions() + 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance to write this without << ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding a comment be enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the content :
I guess you try to calculate the maximum time to wait before to consider the message as timed-out.
But I'm not sure to understand the math, e.g. for max retransmission = 3
t + t*2 + t*2*2 + t*2*2*2 = t*(1+2+4+8) = t*15 != t*16 = t*2^(3+1) = t <<(3+1)
Did I miss something ?
And by the way the DTLS specification limit the waiting time to 60s.
So maybe it should be more like : min(t,60)+min(t*2,60) + ...

About the form :
If I'm correct about the content, I feel that *2^a is nearer from the original idea than << a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the 60s: OK, do it in a other small PR.

With that, the calculation will be adapted to a "loop", to address the 60s limit.
By the way, the expire timeout should be some larger than the usual sum of the retransmit times to ensure, that usually a handshake timesout prior to expire.

@@ -176,6 +177,11 @@
/** Realtime nanoseconds of last sending a flight */
private long flightSendNanos;

/** Expire realtime nanoseconds */
private long expiresNanos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea about a naming which express that expiresNanos is a timestamp and expiresNanosTimeout a timeout duration ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't "timeout" used for durations so far?
And I'm not sure, if we name the timestamps special, but I maybe wrong.

Copy link
Contributor

@sbernard31 sbernard31 Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right we do nothing special until now.
It's just I didn't get the meaning of this variable at first sight and I tell myself that maybe naming could help.
(not a strong opinion)

Maybe javadoc is enough ? (not strong opinion too)

* Usually a client handshake removes the session from the connection store
* with the start. Probing removes the session only with the first data
* received back.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adding the link to your #1157 (comment) here too

@@ -233,6 +239,7 @@
protected int statesIndex;
protected HandshakeState[] states;

protected boolean probe = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding all the "probe stuff" in Handshaker, it seems to me that this is only used in ResumingClientHandshaker ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you search for the usage, you should find, that it's mainly used in places, where a general Handshaker is available. To add dummy implementations in the base and override them in the ResumingClientHandshaker seems for me to make it more complex. But possible ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I saw that and "dummy implementations" was what I have in mind.
because this avoid to jump from class to class to understand the idea when reviewing (and so understanding the code)
(but this is not a strong opinion)

connection.resetSession();
Handshaker previous = connection.getOngoingHandshake();
if (probing) {
connection.setResumptionRequired(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the end of the week so maybe I'm a bit tired, but the code seems hard to read several line above we just connection.setResumptionRequired(true) if probing is true and now we set it to false because probing is true

And until now I believed that probing was only for abbreviated handshake (so resumption = true) 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, the connection.setResumptionRequired(true) was introduced to trigger the later
if (connection.isAutoResumptionRequired(timeout)). That was implemented to support message specific handshake behaviour.

if (probing) {
	connection.setResumptionRequired(false);
} else {
	connection.resetSession();
}

Here the setResumptionRequired(false) in case of probing replaces the resetSession(), which also includes that. The difference is to keep the session.

@boaks
Copy link
Contributor Author

boaks commented Dec 6, 2019

The new isExpired state in handshaker in not so clear to me

That happens on android in "deep sleep". The timers are not cancelled, they are just paused. On wakeup, even after some minutes, they continue. That extension therefore holds them up to try to continue a stale handshake.

@boaks
Copy link
Contributor Author

boaks commented Dec 9, 2019

I addressed some of the comments by a redesign of sendMessage.

See PR #1161 and PR #1160 as partial preparation for this PR.

@boaks boaks force-pushed the add_handshake_probe branch from e1c840f to b9db9f1 Compare December 9, 2019 21:05
@sbernard31
Copy link
Contributor

That happens on android in "deep sleep". The timers are not cancelled, they are just paused. On wakeup, even after some minutes, they continue. That extension therefore holds them up to try to continue a stale handshake.

I'm not sure I get it 🤔.
This means that a java timer will not wake up the device from deep sleep ?

@boaks
Copy link
Contributor Author

boaks commented Dec 10, 2019

I'm not sure I get it thinking.
This means that a java timer will not wake up the device from deep sleep ?

Yes, if android pushs the app in the background, android wakes it up on a conditional scheduled task (here, about every 5 minutes, when connectivity is available). In the sleep time (e.g. 4 minutes) System.nanoTime() is not running and so all timer and jobs of the java scheduler are also not updated.
I also work on the isolation of the expire into a own PR.
But after more intensive tests I found, that it will take some more time to work.
Currently I would appreciate to switch to the x509 certificate stuff (PR #1146), just because that seems to be clearer.

@sbernard31
Copy link
Contributor

Does it means you want I review #1146 ?

@boaks
Copy link
Contributor Author

boaks commented Dec 11, 2019

Does it means you want I review #1146 ?

Yes, since yesterday evening, I think, that will make more sense for now.

@boaks boaks force-pushed the add_handshake_probe branch 7 times, most recently from 6416366 to d934312 Compare December 16, 2019 18:48
@boaks boaks force-pushed the add_handshake_probe branch from d934312 to 91f4841 Compare December 20, 2019 15:53
@boaks boaks force-pushed the add_handshake_probe branch from 91f4841 to 8d23e72 Compare January 2, 2020 18:43
Signed-off-by: Achim Kraus <achim.kraus@bosch-si.com>
@boaks boaks force-pushed the add_handshake_probe branch from 8d23e72 to dabda47 Compare January 6, 2020 07:55
@boaks
Copy link
Contributor Author

boaks commented Jan 7, 2020

In the meantime I fixed my issues.

// on sending, abort, keep connection for new handshake
handshaker.handshakeAborted(new Exception("handshake already expired!"));
} else if (handshaker.isProbing()) {
if (checkOutboundEndpointContext(message, null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not linked to this PR, but maybe a method with a clearer name and without null param could make this clearer.

if (probing) {
// Only reset the resumption trigger, but keep the session for now
// the session will be reseted with the first received data
connection.setResumptionRequired(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can give some more hints, what is hard to get?

The intention of probing is documented in the javadoc of ResumingClientHandshaker.

if (probing) {
	// Only reset the resumption trigger, but keep the session for now
	// the session will be reseted with the first received data
	connection.setResumptionRequired(false);
} else {
	connection.resetSession();
}

And in processRecord ("the session will be reseted with the first received data"):

if (handshaker != null && handshaker.isProbing()) {
	// received record, probe successful
	if (connection.hasEstablishedSession()) {
		connectionStore.removeFromEstablishedSessions(connection.getEstablishedSession(), connection);
	}
	connection.resetSession();
	handshaker.resetProbing();
	LOGGER.debug("handshake probe successful {}", connection.getPeerAddress());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to be clearer.

I saw the documentation in ResumingClientHandshaker 👍

What I understand is that with probing mode, we try to do a handshake and if it failed we didn't remove the original/previous connection. (because maybe we lost connectivity but not connection)

What I didn't get is the line : connection.setResumptionRequired(false);
I knew there is a comment just above, but I didn't get why using probing means "no resumption required"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, no it requires one resumption, but this one is started in the lines afterwards.
Reset that mark, either by connection.setResumptionRequired(false) or
connection.resetSession(), suppresses further resumptions without new trigger.
(see javadoc of connection.resetSession()).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I finally get it.
We will initiate a handshake and so in case where resumption was required, we currently take it into account and so we go back in a "normal state"
Generally we go back to this normal state/"no resumption required" using resetSession()
But here we want to keep the session and so we just note that resumption is taking into account.

@@ -83,11 +117,12 @@
* if session, recordLayer or config is <code>null</code>
*/
public ResumingClientHandshaker(DTLSSession session, RecordLayer recordLayer, Connection connection,
DtlsConnectorConfig config, int maxTransmissionUnit) {
DtlsConnectorConfig config, int maxTransmissionUnit, boolean probe) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[API break] It would be easy to not break.

Copy link
Contributor Author

@boaks boaks Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I even don't want someone encourage to create nor implement a own Handshaker.

For me this is one the strange points of revapi:
A public interface may be used for two use-cases:

  • as callback interface, where it is intended to provide own implementations. Such interfaces can't be modified without major release. To add functionality requires additional interfaces (e.g. CloseSupportingConnectionStore) and a merge of these with the next major version.
  • as "functionality expose interface", where it is not intended to provide a own implementation. Here it's very easy to add function without breaking the API.

revapi doesn't reflect this nor provide guidance to deal with. At least I didn't found such guidance. Starting issues in revapi will help out, but as I already also said, for me this "public API definition" has, amoung other tasks, not htat prio, so I don't want to spend more of my time in it. Feel free, to spend your own time. Maybe you start to use revapi for leshan, even if the usage there is more to gain best practice with that tool (or @sophokles73 for hono).

As I already wrote "sometimes", without prior preparing the sources according agreed patterns and rules, it's in my opinion not too helpful, to apply strictness.
The Handshakergot exposed by a request of a user (checkout for the introduction of onInitializeHandshaker). At that time, no one concerned about unknown API design patterns. I also already asked you, to spend your time into investigate for the Handshaker. That is unfortunately time consuming, but would help more than insist in "not changing stuff at all". With agreed patterns and rules, the user's request at that time would have been implemented using such "expose API interfaces" and would have cause more changes, than the actually chosen approach. Fell free to spend your time in provide a proposal for that.

Please don't spend your time in answers without express your opinion on the two interface use-cases and how to setup revapi accordingly.

Copy link
Contributor

@sbernard31 sbernard31 Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uups I didn't expect that you take too much time to answer to this remark. As I said in our private discussion I didn't fight for this, I just notice it. (But maybe you didn't read this and so you could be irritated by my [API break] remarks, sorry about that)

About the 2 interface uses cases, I agree about those 2 ways to use interface.
Even if maybe, for the expose interface way, using a class (facade) to do the job is maybe better. So adding method is not a problem anymore.
With java8 default implementation is a solution too.
So I would say there is many way to handle this and available solution are not the same depending of the situation. That's why I think we can not agreed about patterns or rules before to really start to do the job.

My vision when revapi raise an issue :

  • if there is a simple way to handle it without exception we go with it
  • if there is no solution or the solution is too ugly or complex, we eventually create an exception if this make sense.
    In a general way, we could also discuss with revapi developer as he probably already think about that and could be interested by our feedback.

Maybe you start to use revapi for leshan

I will as soon as I release the 1.0.0. But we plan to consider all the public/protected stuff as part of the API. I plan 1.0.x release but I don't know if there will be a 1.1.x. The 2.0.0 will probably target LWM2M 1.1 and so will come with API break.
So I'm not sure this would help.

I also already asked you, to spend your time into investigate for the Handshaker

I already answer about that (still in the private mail) with the interface pattern way which work but taking into account your remark about extensibility, create a class (facade) is maybe better.

That is unfortunately time consuming, but would help more than insist in "not changing stuff at all".

Here this is a "straw man" of my position, anyway I suppose this remark is the result of irritability generated by the fact you didn't read my last email.

@boaks boaks merged commit 8bfa6a9 into eclipse-californium:master Jan 8, 2020
@boaks boaks deleted the add_handshake_probe branch February 12, 2020 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants