gcoap: Process CON responses #14178

chrysn · 2020-05-30T13:59:07Z

This generalizes the existing code for answering CoAP pings into general
message-layer responses. Such responses are now also sent as a reaction
to CON responses, which can otherwise follow the same code path as
existing other responses.

As a side effect, issues that would crop up when responding to odd empty
requests that have token length set are resolved.

Testing procedure

[edit: further down on #14178 (comment) ]

Issues/PRs references

Fixes #14169 (eventually; currently it only addresses the first parts).

chrysn · 2020-05-30T14:02:48Z

Note to self: There's a quite odd empty_as_response goto location sitting in the code; I presume that was the location where the response was originally envisioned to be sent, but as now we need to track some local state in the function anyway (which type to set), this is easier done in an if at the end, and the goto can go away.

chrysn · 2020-06-03T12:42:03Z

A nice side effect of this is that servers that take longer to create a response than they want to keep the gcoap thread waiting can now use gcoap_send_req to send their response as a CON. (Phrasing that in the documentation in a non-confusing way will be hard, and maybe we'd like to introduce a thin wrapper function just to make the docs easier).

This shows an open bug with the current PR (that CON requests without a response callback could have their memo freed immediately when an empty ACK arrives), but that's a missing optimization there anyway.

sys/net/application_layer/gcoap/gcoap.c

chrysn · 2020-06-29T11:56:54Z

Coming back to this, I think it'll be easiest not to cancel the timeout, but just to set a flag that "yes this was a CON but stop transmitting it's already ACKed". That would work around the race condition, and while this incurs the runtime penalty of having a few timers fire needlessly, it keeps the code in the timers smaller (set a flag vs. compute the total timeout and fiddle with timers).

Next steps on my side are addressing Cenk's comment and implementing that to replace the current WIP head.

chrysn · 2020-06-29T16:02:13Z

This should be reviewable now that I pushed an non-WIP final commit and the...

Testing procedure

As there are no delayed servers in gcoap yet this needs to be tested against an external application, for example aiocoap:

$ git clone https://github.com/chrysn/aiocoap
$ cd aiocoap
$ ./server.py

[edit: added]
As the /other/separate resource this will be tested on is rather large, you'll need to set CONFIG_GCOAP_PDU_BUF_SIZE to about 600. Otherwise, the eventual response CON will be silently dropped by gcoap as it can't receive it in parts, and thus the server will keep transmitting the CON responses. See also #14167.
[/edit]

Watch the network traffic on the bridge with Wireshark (filtering for CoAP) and run RIOT:

$ make -C examples/gcoap all term
> coap get -c fe80::176b:fd74:a58f:ff97%6 5683 /time
gcoap_cli: sending msg ID 14102, 11 bytes
> gcoap: response Success, code 2.05, 16 bytes
00000000  32  30  32  30  2D  30  36  2D  32  39  20  31  37  3A  35  35
> coap get -c fe80::176b:fd74:a58f:ff97%6 5683 /other/separate
gcoap_cli: sending msg ID 14103, 21 bytes
> gcoap: response Success, code 2.05, 189 bytes
00000000  54  68  72  65  65  20  72  69  6E  67  73  20  66  6F  72  20
...
>

The first response arrives immediately and shows as a CON and an ACK (before and after).

The second request shows as a CON GET (almost) immediately followed by an empty ACK. Then almost 3s later, a CON 2.05 and an empty ACK are shown.

Note that:

The response is correctly matched to the request
Gcoap sends an empty ACK for otherwise the server would keep sending
Gcoap does not send a second CON GET, even though the response takes longer than the default retransmission timeout of 2s. (The incoming ACK indicates that the request was received, and that the server takes responsibility for any further retransmission.)

It'd also be nice to test that if the final empty ACK is lost and the server sends another CON response, an RST is generated. I'm not perfectly sure it does right now, and either way I'd like to test it. Do we have provisions for automatically testing against external software, and in particular for introducing package loss at a defined pattern or sequence?

cgundogan · 2020-06-29T16:06:06Z

This should be reviewable now that I pushed an non-WIP final commit

nice, will have a look at it (:

cgundogan

Here is my first batch of comments. In general, the changes look good. IMO, we should find a better solution for handling the request timeout after receiving an empty response. I'll test the code now as outlined in ~~the PR description~~ #14178 (comment).

sys/net/application_layer/gcoap/gcoap.c

cgundogan · 2020-07-07T11:59:58Z

sys/net/application_layer/gcoap/gcoap.c

@@ -256,6 +278,12 @@ static void _on_resp_timeout(void *arg) {
 #endif
        event_timeout_set(&memo->resp_evt_tmout, timeout);

+        if (memo->state == GCOAP_MEMO_WAIT) {


hmm, handling the timeout this way is IMO suboptimal as it depends on the number of configured retransmissions (and how many of them are still left to fire, imagine we are in the last iteration when the empty response arrives). The random component of the backoff calculation also makes it quite indeterministic.

Another solution could be to disarm the timer in _cease_retransmission() and set a new one with a constant (or maybe slightly jittered) offset, like e.g. done with CONFIG_GCOAP_NON_TIMEOUT.

The issue with disarming timers is that that'd open up to a race condition described in #14390 -- that condition already exists, but I'd prefer not to add another case.

I've toyed around a bit with such solutions, but they were invariably more complex than just accepting that a timer fires a few more times than absolutely necessary without doing anything than decrement a counter.

Yes, there's some random component (and one might consider trimming that off the decrement-only case); the way it is phrased now, the CON timeout will happen just when it would have happened if there had not been any ACK. (Conceptually, the application and retransmission timeouts are different things anyway, but it's a practical choice widespread in implementations to conflate them. Thus, sticking with what would happen if there was no ACK is consistent.).

I mean you could just not add the random jitter in the memo->state == GCOAP_MEMO_WAIT case, if that is desirable.

Could, but I doubt it'd make it better. The backup calculation code is already hard to read as it was, I don't expect that skipping the jitter will make the ROM size shrink, and at runtime it'd save the random number generation of an unlikely event at cost of another jump.

When the general raciness is addressed, there'll be a code path that explicitly calculates the remaining timeout and that can then be done bypassing the jitter as it happens at a completely different point, but that's for #14390.

cgundogan · 2020-07-07T13:46:50Z

The second request shows as a CON GET (almost) immediately followed by an empty ACK. Then almost 3s later, a CON 2.05 and an empty ACK are shown.

I followed the test description in #14178 (comment). It seems to work properly on the RIOT side, I was a little bit confused about the unnecessary retransmission that originate from ./server.py. Is that a problem with aiocoap?

See RIOT-OS#14178 (comment) and code guidelines point about parentheses in complex statements.

chrysn · 2020-09-02T13:53:15Z

Hm, my own tests confirm what you've observed, that the client's ACK is not sent, and thus the server keeps retransmitting.

Ah, found it: The typical response of an aiocoap server to the test resource is too long to fit in the recv. That is already tracked in #14167 ; I'm updating the testing instructions to increase CONFIG_GCOAP_PDU_BUF_SIZE.

chrysn · 2020-10-14T15:26:02Z

@cgundogan, is there anything you'd like to have addressed before I rebase this onto the current master? It's been sitting a while, I'd like to get the proxy parts upstreamed, and this is part of the ground work.

benpicco · 2020-10-20T22:19:09Z

I'd say just rebase 😉

chrysn · 2020-10-20T22:57:58Z

Rebased and tested again here. Ran into #14167 again but at least this time I was faster seeing it :-)

benpicco

Is the WIP label still valid?

sys/net/application_layer/gcoap/gcoap.c

benpicco · 2020-10-21T22:01:05Z

sys/net/application_layer/gcoap/gcoap.c

+ * The current implementation does not touch the timers, but merely sets the
+ * memo's state to GCOAP_MEMO_WAIT. This approach needs less complex code at
+ * the cost of the remaining `send_limit` timers firing and some memory not
+ * being freed until the actual response arrives.


What happens if the response never arrives?

The same as if no ACK were sent: When the retransmission timer fires but the retransmission counter has reached zero, the application is informed of the failure. By not touching the rest of the mechanism, things stay exactly as it would have been.

benpicco

I think this is good.
The CON timeout being jittery should not be an issue and if it keeps the code simpler that is preferable.

This generalizes the existing code for answering CoAP pings into general message-layer responses. Such responses are now also sent as a reaction to CON responses, which can otherwise follow the same code path as existing other responses. As a side effect, issues that would crop up when responding to odd empty requests that have token length set are resolved. Contributes-To: RIOT-OS#14169

Simplify the code path and give consistent debug messages.

The actual implementation will follow in a separate commit, this does the groundwork and sets the intention.

This introduces an additional state to the COAP_MEMO_* series to avoid enlarging the memo struct needlessly. While they are documented publicly, practically only the COAP_MEMO_TIMEOUT and COAP_MEMO_RESPONSE are used in communication with the application, as a gcoap_request_memo_t is only handed out in that state.

chrysn added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Area: CoAP Area: Constrained Application Protocol implementations labels May 30, 2020

chrysn requested review from cgundogan, haukepetersen, miri64 and PeterKietzmann as code owners May 30, 2020 13:59

chrysn mentioned this pull request May 30, 2020

Gcoap does not handle separate responses #14169

Closed

cgundogan reviewed Jun 29, 2020

View reviewed changes

sys/net/application_layer/gcoap/gcoap.c Outdated Show resolved Hide resolved

chrysn mentioned this pull request Jun 29, 2020

gcoap: Suspected crosstalk between requests (possible NULL call) #14390

Open

chrysn force-pushed the gcoap-handle-con-response branch from 741d253 to 32e51c0 Compare June 29, 2020 15:50

This was referenced Jun 30, 2020

Gcoap can not send separate responses #14393

Open

gcoap: Send separate response #14395

Draft

gcoap: confirmable messaging plan #7192

Open

cgundogan requested changes Jul 7, 2020

View reviewed changes

chrysn added a commit to chrysn-pull-requests/RIOT that referenced this pull request Sep 2, 2020

fixup! gcoap: Prepare to cease retransmission

d3a218a

See RIOT-OS#14178 (comment) and code guidelines point about parentheses in complex statements.

chrysn added a commit to chrysn-pull-requests/RIOT that referenced this pull request Sep 2, 2020

fixup! gcoap: Prepare to cease retransmission

fb47e17

See RIOT-OS#14178 (comment) and code guidelines point about parentheses in complex statements.

chrysn force-pushed the gcoap-handle-con-response branch from d3a218a to 094e9eb Compare September 2, 2020 13:20

chrysn force-pushed the gcoap-handle-con-response branch from 094e9eb to 9eaa959 Compare October 20, 2020 22:48

chrysn added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 21, 2020

benpicco reviewed Oct 21, 2020

View reviewed changes

chrysn removed the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label Oct 22, 2020

benpicco approved these changes Oct 22, 2020

View reviewed changes

chrysn added 4 commits October 22, 2020 19:08

gcoap: Cleanup around empty responses

0a2f976

Simplify the code path and give consistent debug messages.

gcoap: Prepare to cease retransmission

f32d758

The actual implementation will follow in a separate commit, this does the groundwork and sets the intention.

chrysn force-pushed the gcoap-handle-con-response branch from d2011c8 to 462f886 Compare October 22, 2020 17:09

cgundogan approved these changes Oct 23, 2020

View reviewed changes

benpicco merged commit b5cd19e into RIOT-OS:master Oct 23, 2020

chrysn deleted the gcoap-handle-con-response branch October 23, 2020 13:24

Jnae mentioned this pull request Jun 24, 2021

sys/net/gcoap: allow seperate response #16578

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gcoap: Process CON responses #14178

gcoap: Process CON responses #14178

chrysn commented May 30, 2020 •

edited

Loading

chrysn commented May 30, 2020

chrysn commented Jun 3, 2020

chrysn commented Jun 29, 2020

chrysn commented Jun 29, 2020 •

edited

Loading

cgundogan commented Jun 29, 2020

cgundogan left a comment •

edited

Loading

cgundogan Jul 7, 2020

chrysn Sep 2, 2020

benpicco Oct 22, 2020

chrysn Oct 22, 2020

cgundogan commented Jul 7, 2020

chrysn commented Sep 2, 2020

chrysn commented Oct 14, 2020

benpicco commented Oct 20, 2020

chrysn commented Oct 20, 2020

benpicco left a comment

benpicco Oct 21, 2020

chrysn Oct 22, 2020

benpicco left a comment

gcoap: Process CON responses #14178

gcoap: Process CON responses #14178

Conversation

chrysn commented May 30, 2020 • edited Loading

Testing procedure

Issues/PRs references

chrysn commented May 30, 2020

chrysn commented Jun 3, 2020

chrysn commented Jun 29, 2020

chrysn commented Jun 29, 2020 • edited Loading

Testing procedure

cgundogan commented Jun 29, 2020

cgundogan left a comment • edited Loading

Choose a reason for hiding this comment

cgundogan Jul 7, 2020

Choose a reason for hiding this comment

chrysn Sep 2, 2020

Choose a reason for hiding this comment

benpicco Oct 22, 2020

Choose a reason for hiding this comment

chrysn Oct 22, 2020

Choose a reason for hiding this comment

cgundogan commented Jul 7, 2020

chrysn commented Sep 2, 2020

chrysn commented Oct 14, 2020

benpicco commented Oct 20, 2020

chrysn commented Oct 20, 2020

benpicco left a comment

Choose a reason for hiding this comment

benpicco Oct 21, 2020

Choose a reason for hiding this comment

chrysn Oct 22, 2020

Choose a reason for hiding this comment

benpicco left a comment

Choose a reason for hiding this comment

chrysn commented May 30, 2020 •

edited

Loading

chrysn commented Jun 29, 2020 •

edited

Loading

cgundogan left a comment •

edited

Loading