Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delay() refactoring #6212

Closed
wants to merge 12 commits into from
Closed

delay() refactoring #6212

wants to merge 12 commits into from

Conversation

emelianov
Copy link
Contributor

Refactoring delay to allow to run esp_schedule() and esp_yield()
periodically (not only on delay ends).
This will be helpfull in implementation of Ethernet dirver.

Refactoring delay to allow to run esp_schedule() and esp_yield()
periodically (not only on delay ends).
This will be helpfull in implementation of Ethernet dirver.
@d-a-v
Copy link
Collaborator

d-a-v commented Jun 20, 2019

Per this detailed explanation, delay() is sometimes supposed to be interrupted by esp_schedule() before its end (like the timeout in ClientContext).

This is used in some places in the core, so we also need to

  • rename current delay() to interruptible_delay()
  • identify the delays() supposed to be interrupted and use interruptible_delay() instead
  • be aware of all potential side effects

However this PR is very nice for recurrent scheduled functions to be called when some libraries or sketches call delay(some-long-delay);

(edit: link to ethernet thread)

@d-a-v
Copy link
Collaborator

d-a-v commented Jun 24, 2019

This delay is supposed to be interrupted by esp_schedule() from wifi_dns_found_callback():

@devyte devyte self-requested a review July 5, 2019 03:37
Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there really no other delays that are meant to be interrupted anywhere in the code? I thought some of the TCP timeouts would be like that (WiFiClient or ClientContext etc)

cores/esp8266/core_esp8266_wiring.cpp Outdated Show resolved Hide resolved
@emelianov
Copy link
Contributor Author

Other delays (in ClientContext) updated during #6213

@dok-net
Copy link
Contributor

dok-net commented Nov 14, 2019

@devyte Please, with all due respect, this PR right now is a horrible idea from my POV, given that it would, depending on how exactly it's fixed, either terribly break or at least work against the (cooperative multi-loop() multitasking/multithreading library CoopTask](https://github.com/dok-net/CoopTask).

I cannot be sure to claim that CoopTask is sufficiently evolved to handle what #6213 needs (believing it does so fine is another thing ;-) ), but letting other code run during delay()s is precisely one of the use cases for cooperative multitasking. I'm sorry for the pitch, but you can probably empathize that I'd rather have my implemention get more use rather than being locked out of the ESP8266 core libraries by ongoing work in the opposite direction. An alternate delay-like delay_interruptible() must not happen, please.

@d-a-v
Copy link
Collaborator

d-a-v commented Nov 14, 2019

@dok-net interruptible_delay() can be a weak function that you'd override in your environment.

@dok-net
Copy link
Contributor

dok-net commented Nov 14, 2019

@d-a-v @devyte I think it's not as simple as that, because it's not the delays, but the breaking from the delays that is troubling:

libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp:657:    esp_schedule(); // resume the hostByName function
libraries/ESP8266WiFi/src/ESP8266WiFiScan.cpp:326:        esp_schedule();
libraries/ESP8266WiFi/src/ESP8266WiFiSTA-WPS.cpp:110:    esp_schedule(); // resume the beginWPSConfig function
libraries/ESP8266WiFi/src/include/ClientContext.h:438:            esp_schedule(); // break current delay()
libraries/ESP8266WiFi/src/include/ClientContext.h:539:            esp_schedule(); // break current delay()
libraries/ESP8266WiFi/src/include/ClientContext.h:615:            esp_schedule(); // break current delay()

Once there are multiple concurrent delays in multitasking, the concept of "current delay()" no longer applies. Add on top that the esp_scheduler follows only a minimal 2 task design, sys and cont, and there is no off-the-shelf solution to this problem.
Right now,

// Following delay will be interrupted by connect callback
for (decltype(_timeout_ms) i = 0; _connect_pending && i < _timeout_ms; i++) {
    // Give scheduled functions a chance to run (e.g. Ethernet uses recurrent)
    delay(1);
}

would work without the esp_schedule() calls to cancel the 1ms delay immediately, just using _connect_pending = false; and letting the remaining part of the 1ms expire.
Luckily, also right now, it seems to me that esp_schedule() is not causing harm regardless that cont_yield() is overwritten by CoopTask. To be continued...

@d-a-v
Copy link
Collaborator

d-a-v commented Nov 14, 2019

I think you don't understand @dok-net

We can modify the delay(1) loop that you cite above to call a function that does the same job, and make it weak.

So you can redefine it with whatever you think is good for your library. You know we can adapt, and we often do, so every use case has its chance to live and work.

We can do that for both function delay and delay_interruptible, both can be weak. So you don't need to be so much concerned about that. Ethernet is not yet merged and this PR will probably happen afterwards.

@dok-net
Copy link
Contributor

dok-net commented Nov 14, 2019

@d-a-v Each time there is code that expects esp_schedule() a.k.a. ets_post() to cause a delay() to return prematurely, weak bindings of delay() and friends have no effect. esp_schedule() can't know which of many concurrent delays needs to be shortcut, and the whole work-stealing multitasking breaks down :-(
The loops in place right now that check a flag, and used delay(1) still work well, yes, at the expense of delaying up to 1ms even if the event handler notified that the condition is signaled. I am hoping that this is an acceptable delay, in comparison to the whole lot of user code that can execute in a multitasking setting during 1ms, even on ESP3266. I've measured cycle times under multitasking load (webserver, blinking LED, button reading, some text diagnostics to USB serial and a particular matter sensor on SoftwareSerial) of under 44 microseconds, if there are no uncooperative long-running delayMicrosecond() calls.
In an ideal world, we'd just port the ESP8266 two-tasking to RIOT OS or CoopTasks and use the task and ISR synchronization of those :-) In the meantime, please understand I'm trying to save what I am sure is a much-requested and very useful feature.

@dok-net
Copy link
Contributor

dok-net commented Nov 15, 2019

I'd like to point out that the changes in this PR are contradictory and there's a lack of separation of concern. The change in ESP8266WiFiGeneric.cpp has nothing to do with the delay()/interruptable_delay() changes, as the use of delay(1) is simply made a bit less efficient due to the oneShotMs loop guard.

Also it's interruptible (with an I).

@earlephilhower earlephilhower added the merge-conflict PR has a merge conflict that needs manual correction label Feb 26, 2020
@infrafast
Copy link

infrafast commented Dec 11, 2020

Dear all, I am in early stage to use ethernet driver in my esp8266 and it appears that I faced some issue with the name resolution which takes much longer than via WIFI. Seems that the delay used in the "hostByName" function could be one of the cause. I propose to change it to a non blocking delay using the following patch:

diff --git a/libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp
b/libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cppba07e07 100644
+++ b/libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp
@@ -632,7 +632,8 @@ int ESP8266WiFiGenericClass::hostByName(const char*
aHostname, IPAddress& aResul
          aResult = IPAddress(&addr);
      } else if(err == ERR_INPROGRESS) {
          _dns_lookup_pending = true;
-        delay(timeout_ms);
+        for (int i = 0; i < timeout_ms && !aResult.isSet(); i++)
+            delay(1);
          // will resume on timeout or when wifi_dns_found_callback fires
          _dns_lookup_pending = false;
          // will return here when dns_found_callback fires

@d-a-v d-a-v added this to the 3.0.1 milestone Jun 15, 2021
@d-a-v d-a-v self-assigned this Jun 15, 2021
@d-a-v d-a-v modified the milestones: 3.0.1, 3.1 Jun 16, 2021
@MassiPi
Copy link

MassiPi commented Jul 20, 2021

Dear all, I am in early stage to use ethernet driver in my esp8266 and it appears that I faced some issue with the name resolution which takes much longer than via WIFI. Seems that the delay used in the "hostByName" function could be one of the cause. I propose to change it to a non blocking delay using the following patch:

Hello @d-a-v ,
sad to see this specific modification was delayed to 3.1, since this totally breaks the ethernet usability..
Is it possible to expect this pull request to be implemented before 3.1 (which now is 3%), or should i keep manually modifying locally the source files?
Thanks a lot

dok-net added a commit to dok-net/arduino-esp8266 that referenced this pull request Jul 20, 2021
@d-a-v
Copy link
Collaborator

d-a-v commented Jul 26, 2021

@MassiPi I'm afraid we'll have to wait a bit more,
but you are right, now that new/experimental ethernet drivers are in the core, these delays have to be changed.
3.0.2 should be - I hope - the last regression-bugfix release after 3.0.0.

dok-net added a commit to dok-net/arduino-esp8266 that referenced this pull request Oct 6, 2021
mcspr pushed a commit that referenced this pull request Oct 16, 2021
esp_yield() now also calls esp_schedule(), original esp_yield() function renamed to esp_suspend().

Don't use delay(0) in the Core internals, libraries and examples. Use yield() when the code is
supposed to be called from CONT, use esp_yield() when the code can be called from either CONT or SYS.
Clean-up esp_yield() and esp_schedule() declarations across the code and use coredecls.h instead.

Implement helper functions for libraries that were previously using esp_yield(), esp_schedule() and
esp_delay() directly to wait for certain SYS context tasks to complete. Correctly use esp_delay()
for timeouts, make sure scheduled functions have a chance to run (e.g. LwIP_Ethernet uses recurrent)

Related issues:
- #6107 - discussion about the esp_yield() and esp_delay() usage in ClientContext
- #6212 - discussion about replacing delay() with a blocking loop
- #6680 - pull request introducing LwIP-based Ethernet
- #7146 - discussion that originated UART code changes
- #7969 - proposal to remove delay(0) from the example code
- #8291 - discussion related to the run_scheduled_recurrent_functions() usage in LwIP Ethernet
- #8317 - yieldUntil() implementation, similar to the esp_delay() overload with a timeout and a 0 interval
@mcspr
Copy link
Collaborator

mcspr commented Oct 19, 2021

Closing via #7148

@mcspr mcspr closed this Oct 19, 2021
@mcspr mcspr removed this from the 3.1 milestone Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-conflict PR has a merge conflict that needs manual correction
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants