Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiP -Audio fastpath #40

Merged
merged 114 commits into from
Jul 13, 2023
Merged

WiP -Audio fastpath #40

merged 114 commits into from
Jul 13, 2023

Conversation

netmindz
Copy link
Collaborator

@netmindz netmindz commented May 4, 2023

No description provided.

softhack007 and others added 9 commits April 30, 2023 18:37
led settings: added hint on max leds per pin
fx.h: faster FRAMERATE_FIXED
*  make local functions "static"
* use fast_ int types where possible
* use native min/max instead of MIN/MAX macros. Macros evaluate each parameter TWICE!!
* adding __attribute__((pure)) and __attribute((const)) to help the compiler optimize
* ws.cpp: reduce max "live leds" in fastpath mode
web response buffer size (corruption when websockets not used)

Aircoookie@432c583
- reduced filter strength, which also reduces delays
- increased I2S buffer count, and reduced I2S cycle time to recover from delays faster.
fastpath: reduced websocket frequency, to have more time for physically attached LEDs.

fingers crossed that this won't break anything....
use 4 RMT channels, then I2S#1, then RMT 5-8. This allows to have up to 5 "fast" LED pins.

"Even though the ESP32 has 8 channels of RMT hardware, using beyond 4 has shown to cause sending delays." (https://github.com/Makuna/NeoPixelBus/wiki/ESP32-NeoMethods#rmt)
* getpixelcolor: attribute pure - it reads memory, but does not write
* some optimizations for SEGMENT.blur() and SEGMENT.fadeToBlackBy()

* FX.c:pp remove double calls to blur() and fade_out()
* FX.cpp: SEGMENT.setUpLeds() added to effects, to enable LED buffering (safe some time because getPixelColor does not need to access NeopixelBus)

* a few other optimizations to safe time and avoid "expensive" operations
* set I2C bus speed to 400kHz (default is 100Khz)

* a few other small optimizations and tweaks
* pio: esp32 V4 builds use "patch5" toolchain version, which contains a few bugfixes especially for memory management.
Copy link
Collaborator Author

@netmindz netmindz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@softhack007 added a few questions and comments. Overall appears that most changes are just about higher frame rates in general than actually audio related

#define FFT_MIN_CYCLE 21 // minimum time before FFT task is repeated. Use with 22Khz sampling
#else
#define FFT_MIN_CYCLE 15 // reduce min time, to allow faster catch-up when I2S is lagging
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any downside to this change? "Compromise" for speed or just masking changes behind flag until we have done enough testing to know if this should just be right value for all uses?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the downside is that the FFT task will block the core for longer (2ms in "standard", vs 7ms in "fastpath").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are using the second core, is that an issue? What else (if anything) is using that core?

@@ -1129,8 +1133,13 @@ class AudioReactive : public Usermod {
{
float sampleAdj; // Gain adjusted sample value
float tmpSample; // An interim sample variable used for calculatioins.
#ifdef WLEDMM_FASTPATH
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any downside to this change? "Compromise" for speed or just masking changes behind flag until we have done enough testing to know if this should just be right value for all uses?

@@ -2243,7 +2243,7 @@ uint16_t mode_colortwinkle() {
}
}
}
return FRAMETIME_FIXED;
return FRAMETIME_FIXED_SLOW;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have all effects that use this constant been updated or just been changing one by one as you test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as I tested it. Some effects seem to be meant to be slow, like Halloween eyes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should they not be time-based rather than FPS based?

@@ -57,10 +57,21 @@
#endif

/* Not used in all effects yet */
#if defined(ARDUINO_ARCH_ESP32) && defined(WLEDMM_FASTPATH) // WLEDMM go faster on ESP32
#define WLED_FPS 120
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value that you can then update in the config? Do we have any way to "force" an update to this value so users get the benefit without needing to read release notes to know to change the value?

Copy link
Collaborator

@softhack007 softhack007 May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a very problematic constant ... it seems like just the the default value, however changing this also impacts some "internal clocking" code that may run faster - don't know how exactly that happens.

In general, yes this value should be part of the config. It looks like WLED-internal legacy - maybe older WLED versions only had a define, then later it became a user setting but the old constant still stayed in the code...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this was partly why I asked. I had a feeling it might have been a constant that became not-so-constant when gui option was added, but not 100% of the code changed to use the new runtime value

@@ -160,7 +160,7 @@ void IRAM_ATTR_YN WS2812FX::setPixelColorXY(int x, int y, uint32_t col) //WLEDMM
{
#ifndef WLED_DISABLE_2D
if (!isMatrix) return; // not a matrix set-up
uint16_t index = y * Segment::maxWidth + x;
uint_fast16_t index = y * Segment::maxWidth + x;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these fast changes be in separate branch so they can be pushed upstream too? Is there any downside to using fast? Was trying to read up about it but docs rather vague. Faster but possibly larger memory usage?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are faster, especially if you use the fast types for local vars. If used in global data structures, they might make data incompatible on binary level, as some fast types are actually the next bigger type (like fast16 = int which is 32bit).

The downside of "uint8_t" and friends is that the compiler will insert extra instruction to guarantee correct overflow behaviour. So when overflow is not a concern, "uint_fast16_t" is an advantage over "uint16_t".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to check if I've understood - use fast for local vars where you want speed, but do not need overflow, but might use more ram as it might actually be 32bit?

@@ -220,7 +220,7 @@ CRGBPalette16 &Segment::loadPalette(CRGBPalette16 &targetPalette, uint8_t pal) {
static unsigned long _lastPaletteChange = 0; // perhaps it should be per segment
static CRGBPalette16 randomPalette = CRGBPalette16(DEFAULT_COLOR);
static CRGBPalette16 prevRandomPalette = CRGBPalette16(CRGB(BLACK));
byte tcp[72];
byte tcp[76] = { 255 }; //WLEDMM: prevent out-of-range access in loadDynamicGradientPalette()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like more generic bugfix than fastpath related

Copy link
Collaborator

@softhack007 softhack007 May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is generic 👍 and upstream will start to argue about 4 bytes.

Copy link
Collaborator Author

@netmindz netmindz May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed due to the extra palettes I added?

wled00/FX_fcn.cpp Show resolved Hide resolved
#else
// ESP32 "audio_fastpath" - 8 RMT and 1 I2S channels. RMT 5-8 have sending delays, so use I2S#1 as 5th bus, before going for RMT 5-8
if (num > 8) return I_NONE;
if (num == 4) offset = 2; // use I2S channel 2 as 5th bus. Ladies and Gentlemen, _this_ is a dirty hack.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a hack? What are you trying to achieve and what don't you like about current change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it was something I did in a quick and dirty way, not thinking much about any side-effects.
In fact i've tested this change for some time already, AND it seems to be safe. It still "disrupts" the logic implemented in upstream - for some reason upstream uses I2S for the 9th and 10th LED pin, not sure if there are any downsides coming with the I2S driver.

For speed, I found that RMT5-8 are slow, so this change "injects" I2S#1 (free as we don't need it for audio in) directly after all faster RMT channels have been assigned.

@@ -88,7 +88,9 @@ void _overlayAnalogCountdown()
}

void handleOverlayDraw() {
#if !defined(WLEDMM_FASTPATH) // WLEDMM expensive operation, and most usermods don't draw overlays
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than disabling all compile time, is there a way to add a shortcut return based on any active usermods that actual use this?

This feels like one of the only "compromises" of the build flag

wled00/wled_server.cpp Show resolved Hide resolved
@softhack007
Copy link
Collaborator

@softhack007 added a few questions and comments. Overall appears that most changes are just about higher frame rates in general than actually audio related

Thanks :-) Yes true, its still incomplete. Audio part is coming over the weekend. Right now i've tried to get LEDs to react faster, so when we have less audio latency it will make a difference.

softhack007 and others added 8 commits May 4, 2023 17:09
this should move the main JSON doc into PSRAM on WROVER boards.
* free heap should increase by ~60kb
* PSRAM use should go up by the same amount.
* disables PSRAM fixes that are not necessary any more on "revision 3" boards.
small fix - wrong keyword
* FastLED 3.5.0 #master (up to 50% faster)
* All ESP32 env use NeoPixelBus 2.7.3 (slightly faster)
* Weather usermod: small update, as fastLED wants an explicit conversion from CRGB -> uint32_t
* weather usermod: avoid using palette color #0 = black
due to build errors on some esp32 envs.

Build Environments (esp32_16MB_M_eth)
In file included from wled00/bus_manager.cpp:9:0:
wled00/bus_wrapper.h: In static member function 'static void PolyBus::begin(void*, uint8_t, uint8_t*)':
wled00/bus_wrapper.h:187:61: error: 'DotStarEsp32DmaHspi5MhzMethod' was not declared in this scope
 #define B_HS_DOT_3 NeoPixelBrightnessBus<DotStarBgrFeature, DotStarEsp32DmaHspi5MhzMethod> //hardware HSPI with DMA (ESP32 only)
                                                             ^
wled00/bus_wrapper.h:300:37: note: in expansion of macro 'B_HS_DOT_3'
       case I_HS_DOT_3: (static_cast<B_HS_DOT_3*>(busPtr))->Begin(pins[1], -1, pins[0], -1); break;
                                     ^
wled00/bus_wrapper.h:187:90: error: template argument 2 is invalid
 #define B_HS_DOT_3 NeoPixelBrightnessBus<DotStarBgrFeature, DotStarEsp32DmaHspi5MhzMethod> //hardware HSPI with DMA (ESP32 only)
                                                                                          In file included from wled00/bus_manager.cpp:9:0:
wled00/bus_wrapper.h: In static member function 'static void PolyBus::begin(void*, uint8_t, uint8_t*)':
wled00/bus_wrapper.h:187:61: error: 'DotStarEsp32DmaHspi5MhzMethod' was not declared in this scope
 #define B_HS_DOT_3 NeoPixelBrightnessBus<DotStarBgrFeature, DotStarEsp32DmaHspi5MhzMethod> //hardware HSPI with DMA (ESP32 only)
                                                             ^
wled00/bus_wrapper.h:300:37: note: in expansion of macro 'B_HS_DOT_3'
       case I_HS_DOT_3: (static_cast<B_HS_DOT_3*>(busPtr))->Begin(pins[1], -1, pins[0], -1); break;
                                     ^
wled00/bus_wrapper.h:187:90: error: template argument 2 is invalid
 #define B_HS_DOT_3 NeoPixelBrightnessBus<DotStarBgrFeature, DotStarEsp32DmaHspi5MhzMethod> //hardware HSPI with DMA (ESP32 only)
platformio.ini Outdated
@@ -199,7 +199,8 @@ upload_speed = 115200
# ------------------------------------------------------------------------------
lib_compat_mode = strict
lib_deps =
fastled/FastLED @ 3.5.0
;fastled/FastLED @ 3.5.0
https://github.com/FastLED/FastLED.git#master ;; up to 50% faster
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it perhaps an idea to refer to a "known good" hash, so we get predictable builds?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Makes sense 👍

platformio.ini Outdated
makuna/NeoPixelBus @ 2.6.9
${env.lib_deps}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering significant or just accidental change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually ordering is significant. It seems that the first library reference "wins".
This was changed recently in upstream, to ensure that the right library versions are pulled in.

@softhack007 softhack007 changed the title Audio fastpath WiP -Audio fastpath May 11, 2023
After some experiments, it seems that 256 is indeed the optimal buffer length.
softhack007 and others added 29 commits June 13, 2023 13:19
in some setups, it helps to use the I2S#1 driver for the LED pin with most LEDs.
* allow user to control rotation speed (c3 slider)
* preserve accuracy by performing division _after_ multiplication: " (i * speed) / 32", instead of " i * (speed / 32)"
* proper rounding of "map" results, for better visual appearance
* avoid division by zero in map() function
* added option to use original floating point code that features anti-aliasing.
Looks nice, but requires hardware floating point support (ESP32-S3, or "classic" ESP32 dual core).
* npm run build
* upstream fix for XSS vulnerability
* re-enabled old detector which does not detect beats. However something bad may still be better than nothing...

* fixed a typo in RipplePeak and Waterfall effect, which cased wrong configuration of the peak detector.

resolves #43
robustness improvement - parsepacket() will fail if the udp handler still has an active receive buffer.
* Avoid uint16 underflow in width() and height(): stop > start is possible, and means "inactive segment".

* use size_t for ledsrgbSize, _dataLen and _usedSegmentData -  uint16_t could overflow in some situations.

* try to catch attempts to allocate zero bytes (inactive segment => size 0)
* handling of stop = 0 when calculating sizes (avoid unsigned underflow)
* make sure groupLength() is never zero (to avoid div/0)
* gapmaps: check for "(gapSize > 0)" added.

not sure if all the checks are 100% needed, but they will improve robustness in corner cases.
* adding an optional parameter to setBrightness(). ApplyPostAdjustments() will only be called if `immediate=true`. Only ABL will use immediate=true, to ensure electrical safety of equipment.

This allows some optimizations of performance, as ApplyPostAdjustments() is time consuming.

* busses.setBrightness(bri) --> applied to all future pixels (fast, lossless)
* busses.setBrightness(bri, true) --> applied directly to all previously set pixels (slower, lossy)
first version still cased some flickering.
This de-optimization makes LEDs more stable.
this optimization avoids to apply brightness two times .

NeoPixelBusLg has already applied global brightness at sPC. Due to internal working of the Lg bus, its sufficient to only post-apply scaling, and set the new (scaled) brightness for the next frame.
properly handle width=0 OR height=0
it seems that SEGMENT.blur() is the main bottleneck for many 2D effects.

This change optimizes performance of the function:
* avoid to re-write unchanged pixels
* early exit when blur_amount == 0 (=nothing to do)
* use _fast_ types where possible

I've seen up to 20% speedup with this change.
Segments are created/deleted on-the-fly; it seems that "local leds" buffers did not follow properly.
* revise segment copy/move constructors
* de-optimize use of local leds (as they need to be re-allocated when segment size changes)
* minor stability improvements
* reverts global brightness in preview
* can NOT revert brightness reductions from auto brightness limiter.
(from upstream alt-buffer branch)
* prevent drawing into inactive segments
* robustness improvements for transitions
avoid string overflow when constructing filename
cleanup:
- version increase
- clean up debug messages
- minor fixes in platformio.ini
- fix compilation with -D WLED_DISABLE_2D

helping the compiler to optimize:
- a few more __attribute__((pure))
- fx.cpp marked a few local functions as "static"
@softhack007 softhack007 merged commit 9243b90 into mdev Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants