Rewrote library to fix several issues #17

lord-ne · 2021-08-25T03:47:53Z

Fixes the issues #13 and #15. Supersedes the pull requests #12, #14, and #16.

lord-ne · 2021-08-25T03:48:17Z

Marked as draft because I haven't had a chance to test this yet.

lord-ne · 2021-08-25T14:20:20Z

I decided it wasn't a good idea to make a breaking API change on such an inactive repository, so I put back the u32 delay function, and marked it as deprecated.

lord-ne · 2021-08-25T15:27:07Z

Inspecting the assembly output, both long and short delays work well. Constant delays (known at compile time) compile with negligible overhead. Tested with both speed (opt-level = 3) and size (opt-level = "z") optimizations.

Patryk27 · 2021-09-17T15:49:29Z

Hi, thanks for this merge request - I've managed to run the code, but seems like on my atmega328p the delays are 10x as long as they are supposed to be: e.g. delay_ms(100); waits for one second.

I'm using AVR_CPU_FREQUENCY_HZ=20000000 and dropping one zero seems to help, but it kinda feels the value is alright and the calculation is off somewhere.

lord-ne · 2021-09-17T17:59:11Z

I double-checked the math and it looks right to me, and it seemed to work correctly on my 328p. I'm not sure what could be causing this issue. Can you post your code?

Patryk27 · 2021-09-17T20:20:58Z

Sure - my code is nothing more than:

#![feature(llvm_asm)]
#![no_std]
#![no_main]

use ruduino::cores::current::port;
use ruduino::Pin;

#[no_mangle]
pub extern "C" fn main() {
    port::B0::set_output();

    loop {
        port::B0::set_high();
        avr_delay::delay_ms(1000); // in this case it's set to 1s, but in reality takes 10s 
        port::B0::set_low();
        avr_delay::delay_ms(1000);
    }
}

... compiled using:

AVR_CPU_FREQUENCY_HZ=20000000 cargo build --release

... with .cargo/config:

[build]
target = "avr-atmega328p.json"

[unstable]
build-std = ["core"]

Maybe 20 MHz is too much for AVR_CPU_FREQUENCY_HZ? For reference, atmega-hal's Delay seems to be working correctly (also on 20 MHz).

lord-ne · 2021-09-17T20:46:30Z

20MHz shouldn't be causing any overflow. Maybe this is related to the issue with avr_config where it doesn't rebuild when the AVR_CPU_FREQUENCY_HZ environment variable changes?

Patryk27 · 2021-09-18T09:41:33Z

Yeah, I've read about that issue and I've done like a dozen of cargo clean-s to be sure I'm not running stale code 😄 Anyways, I don't have my 328p at hand now - I'll try to re-check when I have some free time; if everything works correctly on your side, then I might've just fused my processor incorrectly (I guess accidentally setting it to use its internal 8 MHz clock could yield similar results).

stappersg · 2022-05-21T08:30:51Z

For what it worth:

Fix build: Use the new rust asm! available in rustc 1.63.0-nightly #18 has been merged
next work is on cannot find macro llvm_asm in this scope blink#38

bombela

Reviewing as requested.

bombela · 2022-06-08T15:59:56Z

src/lib.rs

-    // microseconds
-    let us = ms * 1000;
-    delay_us(us);
+    let ticks: u64 = (u64::from(avr_config::CPU_FREQUENCY_HZ) * u64::from(ms)) / 4_000;


u64 is very costly on AVR. The hardware can only do 8 bits addition, subtraction and multiplication. It can also do 8 bits divide par two. Everything else will be implemented in software with a tiny bit of help from the hardware for carry. So here, we have a u64 multiplication and division. This means that it will require a lot instructions, taking precious flash space. And it will be very slow at runtime.

A nice summary of the instruction set: https://en.wikipedia.org/wiki/Atmel_AVR_instruction_set
(The official doc is the absolute reference, but from a cursory look, it seems to be almost all on the wikipedia page).

avr_config::CPU_FREQUENCY_HZ is a const u32.

So the challenge is to find a way to compute as much as possible in constant expression at compile time. And end up with the least amount of arithmetic at runtime. And most of it around 8 bits.

In my code, I have basically adapted the reference implementation in C. Which cost 16 bits comparison, subtractions and multiplications. The code follows.

/// Works for 4 KHz >= CPU freqency <= 262.14 MHz. /// Maximum 65_535 ms. pub fn busy_loop_ms(ms: u16) { const ITER_PER_MS: u16 = (crate::CPU_FREQUENCY_HZ / 4000) as u16; const MAX_MS_PER_LOOP: u16 = u16::MAX / ITER_PER_MS; const ITERS_PER_MAX_MS: u16 = MAX_MS_PER_LOOP * ITER_PER_MS; let mut ms: u16 = ms; while ms > MAX_MS_PER_LOOP { busy_loop_4_cycles(ITERS_PER_MAX_MS); ms -= MAX_MS_PER_LOOP; } if ms > 0 { busy_loop_4_cycles(ms * ITER_PER_MS); } }

It "feels" like it might be possible to do better still. Note that if the delay value is known at compile time (like delay(150)), and the code is successfully inlined, then most of the operations will be done at compile time. But you cannot count on this without looking at the assembly output and careful tuning of the application code.

I was counting on everything being inlined at compile time. The AVR C implementation that I was looking at uses floats, which are ridiculously slow when emulated at runtime. Since that implementation was depending on compile time evaluation, I felt that it was fine to also do that.

But I agree your implementation is better. The only potential problem I see is that it can only delay for a max of about one minute.

I don't know if it depends on the compile time evaluation, more like it hopes it does it? My naive understanding is compile time evaluation wouldn't be guaranteed in the C version.

The instruction set does have some float multiplication, but not float divisions. I don't know how costly floats are maybe its not that bad? This was written by Atmel, I would hope they know a thing or two about their CPUs.

Float would have no minute limit while retaining high precision for small values. I will investigate and report here.

@bombela
This is from <util/delay.h> (the file defining _delay_ms and _delay_us) included with Microchip Studio:

The functions in this header file are wrappers around the basic busy-wait functions from <util/delay_basic.h>. They are meant as convenience functions where actual time values can be specified rather than a number of cycles to wait for. The idea behind is that compile-time constant expressions will be eliminated by compiler optimization so floating-point expressions can be used to calculate the number of delay cycles needed based on the CPU frequency passed by the macro F_CPU.

In order for these functions to work as intended, compiler optimizations must be enabled, and the delay time must be an expression that is a known constant at compile-time. If these requirements are not met, the resulting delay will be much longer (and basically unpredictable), and applications that otherwise do not use floating-point calculations will experience severe code bloat by the floating-point library routines linked into the application.

Since AVR doesn't have hardware division at all, I think it's basically impossible to make delay_ms and delay_us work with values that aren't known at compile time without a ton of overhead. At the very least, I'd say it isn't a priority for this crate.

The division is not needed at runtime in my example. But it still requires 16bit substraction and multiplication.

But note that even if you call delay with a constant value. The compiler might still move the computation at runtime to reduce code bloat. I encountered this on my last project (attiny24a, 2048 bytes of flash). I used delays to output patterns on a pin, which helped me synchronize debugging on an oscilloscope. The trick was to use only two different delay constants. The compiler was happy to output only two functions.

example, with opt-level=z:

#[no_mangle] pub unsafe extern "C" fn main() { busy_loop_ms(66); busy_loop_ms(67); busy_loop_ms(68); loop {} }

Produces:

main: ldi r24, 66 ldi r25, 0 rcall _ZN12attiny24a_uc3mcu5delay12busy_loop_ms17hdb07dc8da5567d03E ldi r24, 67 ldi r25, 0 rcall _ZN12attiny24a_uc3mcu5delay12busy_loop_ms17hdb07dc8da5567d03E ldi r24, 68 ldi r25, 0 rcall _ZN12attiny24a_uc3mcu5delay12busy_loop_ms17hdb07dc8da5567d03E .LBB3_1: rjmp .LBB3_1 _ZN12attiny24a_uc3mcu5delay12busy_loop_ms17hdb07dc8da5567d03E: ldi r18, 0 ldi r19, 250 .LBB0_1: cpi r24, 33 cpc r25, r1 brlo .LBB0_3 movw r30, r18 ;APP .Ltmp0: sbiw r30, 1 brne .Ltmp0 ;NO_APP sbiw r24, 32 rjmp .LBB0_1 .LBB0_3: cpi r24, 0 cpc r25, r1 breq .LBB0_5 ldi r22, 208 ldi r23, 7 rcall __mulhi3 ;APP .Ltmp1: sbiw r24, 1 brne .Ltmp1 ;NO_APP .LBB0_5: ret

How are you getting that assembly output? I've been using avr-objdump on the .elf output file, and the formatting is much less nice

cargo rustc $(FLAGS) --target=avr-attiny24a.json -- --emit asm echo target/avr-attiny24a/release/deps/*.s

bombela · 2022-06-08T16:25:16Z

src/lib.rs

+/// * 'count' - a u16, the number of times to cycle the loop.
+#[inline(always)]
+#[allow(unused_variables, unused_mut, unused_assignments)]
+fn delay_loop_4_cycles(mut count: u16) {


I had to adapt my own code to the new asm macro recently. I found all the magical register naming in the Rust compiler source code at https://github.com/rust-lang/rust/blob/9ad5d82f822b3cb67637f11be2e65c5662b66ec0/compiler/rustc_target/src/asm/avr.rs.

The documentation of the asm! macro itself can be found there: https://doc.rust-lang.org/nightly/reference/inline-assembly.html.

Volatile is now the default. So I believe the following code should be the correct translation.
Even though we do not care of the final value of _count, we still must declare the register pair as both input and output. The hardware is mutating the register pair, and the compiler should know about this.

/// 4 cycles per iteration. #[inline(always)] pub fn busy_loop_4_cycles(count: u16) { let mut _count = count; // Quiet the linter. unsafe { asm!( "1: sbiw {regpair}, 1", "brne 1b", regpair = inout(reg_iw) _count, ) } }

@bombela please make a merge request for it.

stappersg · 2022-06-08T17:38:13Z

Fixes the issues #13 and #15. Supersedes the pull requests #12, #14, and #16.

That is no longer true due other "progress". The so called progress is a mix of

being blocked by a LLVM ERROR
switch from llvm_asm!() to asm!()
merging merge requests

I hope that there will be smaller Merge Requests that replace this large MR.

lord-ne · 2022-06-08T23:14:33Z

Okay, this is ready for review again. I basically just switched the llvm_asm! to the asm! that's currently in the main repo (here) and tidied up the code a bit.

It only compiles in release mode for some reason (so does the current main branch of the main repo), but that's an issue with compiling core for AVR, not an issue with this code.

@bombela's point about delays that aren't known at compile time still stands, but I think it isn't a priority right now (AVR libc doesn't do it either), we first need to get the crate in a working state.

lord-ne · 2022-06-09T03:08:10Z

I might try something with magic-number division (https://ridiculousfish.com/blog/posts/labor-of-division-episode-i.html) to work with values not known at compile time. But that will definitely be a separate pull request

EDIT: It looks like this wouldn't be that helpful, since we would end up with 64x64->128 but multiplications, which would be very slow anyway.

bombela · 2022-06-09T03:33:24Z

This is fascinating. The math goes over my head sadly. On my side, I am trying to produce something similar to the intrinsic delay_cycles from gcc-avr. With asm! and const generics. I will make a pull request if I ever get something worthwhile.

bombela · 2022-06-09T08:33:55Z

I actually got it working. Here is a gist (https://gist.github.com/bombela/cc90e5c29f3e7667326de4c087c1e148) with the code for reference. I tested all the corner cases I could think of, and it appears to work beautifully. It requires that the delay functions take their parameters via const generic (delay_ms::<42>()). I will try to assemble a PR together someday.

lord-ne · 2022-06-09T13:02:04Z

@bombela

That looks really good. A couple minor points:

I think that these compile time constant delay functions should probably be separate functions. There are still some use cases where the user knows that a value is known at compile time, but rust doesn't consider it valid in a const context for whatever reason (since the compiler's constant propagation is more powerful than rust's tools for determining const-ness). Although admittedly, I'm struggling to think of such cases off the top of my head. Maybe call the functions like delay_ms_const or something like that?
I think the arguments to delay_us, delay_ms, delay_s should be u32, so they can't overflow the Delayer. In general, I think that the Delayer itself should be made private, since it can overflow. And to be honest, does the MUL and DIV part really need to be part of the Delayer, wouldn't it be simpler to just multiply and divide at the call site (that is, in the body of delay_us, delay_ms, and delay_s) and pass in a single CYCLES value? And if it really is a useful operation, make a seperate function that takes in a u32 cycles and mul, so it can't overflow. EDIT: Overflow actually causes a compile-time error, so this isn't really an issue.
Some convenience macro rules so one can just write delay_s!(3 * 60) instead of delay_s::<{ 3 * 60 }>()
You should probably wait until Fix integer overflow in microseconds delay, typos #16 and/or Rewrote library to fix several issues #17 is merged before making another PR.

stappersg · 2022-06-09T21:38:02Z

On Thu, Jun 09, 2022 at 01:34:07AM -0700, François-Xavier Bourlet wrote: I actually got it working. Here is a gist (https://gist.github.com/bombela/cc90e5c29f3e7667326de4c087c1e148) with the code for reference. I tested all the corner cases I could think of, and it appears to work beautifully. It requires that the delay functions take their parameters via const generic (`delay_ms::<42>()`). I will try to assemble a PR together someday.

Request / idea Idea / request In directory examples in the rewrite branch Groeten Geert Stappers -- Silence is hard to parse

stappersg · 2022-06-09T22:44:09Z

That is not for this rewrite.

I closing this merge request. That might be stupid. The idea is getting a fresh start. Use branch rewrite for it.

Rewrite: Prevent overflow from public functions Hopefully are the underlaying commits preserved. Yes, I do fear that "github.com" does `git merge --squash`. What "lord-ne" wrote: This is the rewrite discussed in #17. Its main purpose is to ensure that public functions cannot overflow, by having them delegate to private functions that accept larger parameters. Some work was also put in to mitigating the effects of the delays not being inlined at compile time. Additional a link to #22 are is being asked for good commit messages.

lord-ne mentioned this pull request Aug 25, 2021

Fix integer overflow in microseconds delay, typos #16

Closed

lord-ne marked this pull request as ready for review August 25, 2021 15:11

stappersg mentioned this pull request Jun 8, 2022

Inline asm: add output constraint, add volatile #15

Closed

bombela reviewed Jun 8, 2022

View reviewed changes

lord-ne marked this pull request as draft June 8, 2022 17:50

lord-ne added 2 commits June 8, 2022 17:04

Add delay_count and delay_loop internal functions

3a6ff52

Implement delay, delay_ms, and delay_us

6558a18

lord-ne force-pushed the master branch from 1016f1a to 6558a18 Compare June 8, 2022 23:06

lord-ne marked this pull request as ready for review June 8, 2022 23:07

Fix compilation error

fcdc04d

lord-ne requested a review from stappersg June 9, 2022 08:06

Divide numerator and denominator by their GCD at compile time

7ce7d3b

Merge upstream

52c19dc

stappersg mentioned this pull request Jun 9, 2022

Commit messages of the rewrite #22

Closed

stappersg closed this Jun 9, 2022

bombela mentioned this pull request Jun 10, 2022

Cycle accurate implementation #23

Closed

lord-ne mentioned this pull request Jun 12, 2022

Rewrite: Prevent overflow from public functions #30

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrote library to fix several issues #17

Rewrote library to fix several issues #17

lord-ne commented Aug 25, 2021 •

edited

Loading

lord-ne commented Aug 25, 2021

lord-ne commented Aug 25, 2021 •

edited

Loading

lord-ne commented Aug 25, 2021 •

edited

Loading

Patryk27 commented Sep 17, 2021

lord-ne commented Sep 17, 2021 •

edited

Loading

Patryk27 commented Sep 17, 2021 •

edited

Loading

lord-ne commented Sep 17, 2021

Patryk27 commented Sep 18, 2021

stappersg commented May 21, 2022

bombela left a comment

bombela Jun 8, 2022 •

edited

Loading

lord-ne Jun 8, 2022 •

edited

Loading

bombela Jun 8, 2022

lord-ne Jun 8, 2022 •

edited

Loading

lord-ne Jun 8, 2022

bombela Jun 8, 2022 •

edited

Loading

bombela Jun 8, 2022

lord-ne Jun 9, 2022

bombela Jun 9, 2022 •

edited

Loading

bombela Jun 8, 2022

stappersg Jun 8, 2022

stappersg commented Jun 8, 2022

lord-ne commented Jun 8, 2022 •

edited

Loading

lord-ne commented Jun 9, 2022 •

edited

Loading

bombela commented Jun 9, 2022 •

edited

Loading

bombela commented Jun 9, 2022

lord-ne commented Jun 9, 2022 •

edited

Loading

stappersg commented Jun 9, 2022 via email

stappersg commented Jun 9, 2022

Rewrote library to fix several issues #17

Rewrote library to fix several issues #17

Conversation

lord-ne commented Aug 25, 2021 • edited Loading

lord-ne commented Aug 25, 2021

lord-ne commented Aug 25, 2021 • edited Loading

lord-ne commented Aug 25, 2021 • edited Loading

Patryk27 commented Sep 17, 2021

lord-ne commented Sep 17, 2021 • edited Loading

Patryk27 commented Sep 17, 2021 • edited Loading

lord-ne commented Sep 17, 2021

Patryk27 commented Sep 18, 2021

stappersg commented May 21, 2022

bombela left a comment

Choose a reason for hiding this comment

bombela Jun 8, 2022 • edited Loading

Choose a reason for hiding this comment

lord-ne Jun 8, 2022 • edited Loading

Choose a reason for hiding this comment

bombela Jun 8, 2022

Choose a reason for hiding this comment

lord-ne Jun 8, 2022 • edited Loading

Choose a reason for hiding this comment

lord-ne Jun 8, 2022

Choose a reason for hiding this comment

bombela Jun 8, 2022 • edited Loading

Choose a reason for hiding this comment

bombela Jun 8, 2022

Choose a reason for hiding this comment

lord-ne Jun 9, 2022

Choose a reason for hiding this comment

bombela Jun 9, 2022 • edited Loading

Choose a reason for hiding this comment

bombela Jun 8, 2022

Choose a reason for hiding this comment

stappersg Jun 8, 2022

Choose a reason for hiding this comment

stappersg commented Jun 8, 2022

lord-ne commented Jun 8, 2022 • edited Loading

lord-ne commented Jun 9, 2022 • edited Loading

bombela commented Jun 9, 2022 • edited Loading

bombela commented Jun 9, 2022

lord-ne commented Jun 9, 2022 • edited Loading

stappersg commented Jun 9, 2022 via email

stappersg commented Jun 9, 2022

lord-ne commented Aug 25, 2021 •

edited

Loading

lord-ne commented Aug 25, 2021 •

edited

Loading

lord-ne commented Aug 25, 2021 •

edited

Loading

lord-ne commented Sep 17, 2021 •

edited

Loading

Patryk27 commented Sep 17, 2021 •

edited

Loading

bombela Jun 8, 2022 •

edited

Loading

lord-ne Jun 8, 2022 •

edited

Loading

lord-ne Jun 8, 2022 •

edited

Loading

bombela Jun 8, 2022 •

edited

Loading

bombela Jun 9, 2022 •

edited

Loading

lord-ne commented Jun 8, 2022 •

edited

Loading

lord-ne commented Jun 9, 2022 •

edited

Loading

bombela commented Jun 9, 2022 •

edited

Loading

lord-ne commented Jun 9, 2022 •

edited

Loading