Various changes to `ChaChaRng` #243

pitdicker · 2018-01-21T07:56:44Z

This PR is based on top of #233.

Parts of these changes come from my experiments with a BlockRngCore trait, like #242. It seemed better to finish this separately, so that the functional changes and the documentation changes can be reviewed more easily.

The init function contained useful documentation that was not visible, because init was not a public function. So I moved it to the documentation of ChaChaRng, and added some more in a style similar to the HC-128 documentation.

I added some tests and better documentation for set_counter. Using it to set a nonce is difficult, given the endianness issues and the fact that different standards specify different layouts for the nonce and counter. So I documented that setting a nonce is not a supported use of set_counter.

It turned out to be easy to make the number of rounds run by the core ChaCha algorithm configurable: it only needed a new set_rounds method.

I also stumbled on a way to improve performance: using a usize for the number of rounds and keeping the core algorithm in a separate function. I have no idea why it works, though.
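As a rough illustration of what a configurable round count looks like, here is a minimal, self-contained sketch of a ChaCha block function that takes the number of rounds as a `usize` and lives outside any RNG struct. The names `quarter_round` and `chacha_core` are illustrative assumptions, not the code in this PR; the quarter round follows the standard ChaCha definition (add, xor, rotate by 16, 12, 8 and 7), and the sketch does not explain why the free-function version benchmarks faster.

```rust
/// One ChaCha quarter round on four words (indices a, b, c, d) of the state.
fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]); s[d] ^= s[a]; s[d] = s[d].rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]); s[b] ^= s[c]; s[b] = s[b].rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]); s[d] ^= s[a]; s[d] = s[d].rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]); s[b] ^= s[c]; s[b] = s[b].rotate_left(7);
}

/// Free-standing block function: runs `rounds` rounds (an even number,
/// e.g. 8, 12 or 20) over a copy of `input`, then adds the input back in.
fn chacha_core(output: &mut [u32; 16], input: &[u32; 16], rounds: usize) {
    debug_assert!(rounds % 2 == 0, "rounds come in column/diagonal pairs");
    let mut x = *input;
    for _ in 0..rounds / 2 {
        // Column round
        quarter_round(&mut x, 0, 4, 8, 12);
        quarter_round(&mut x, 1, 5, 9, 13);
        quarter_round(&mut x, 2, 6, 10, 14);
        quarter_round(&mut x, 3, 7, 11, 15);
        // Diagonal round
        quarter_round(&mut x, 0, 5, 10, 15);
        quarter_round(&mut x, 1, 6, 11, 12);
        quarter_round(&mut x, 2, 7, 8, 13);
        quarter_round(&mut x, 3, 4, 9, 14);
    }
    // Feed-forward: add the original state to the mixed state.
    for i in 0..16 {
        output[i] = x[i].wrapping_add(input[i]);
    }
}
```

With this shape, a `set_rounds` method only has to store the number that is later passed as `rounds`: 8 gives ChaCha8, 20 gives ChaCha20.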

Benchmarks before:

test gen_bytes_chacha   ... bench:   2,782,004 ns/iter (+/- 5,165) = 368 MB/s
test gen_u32_chacha     ... bench:      10,774 ns/iter (+/- 11) = 371 MB/s
test gen_u64_chacha     ... bench:      21,564 ns/iter (+/- 75) = 370 MB/s

Benchmarks after this and #242:

test gen_bytes_chacha   ... bench:   1,898,515 ns/iter (+/- 6,578) = 539 MB/s
test gen_u32_chacha     ... bench:       8,357 ns/iter (+/- 24) = 478 MB/s
test gen_u64_chacha     ... bench:      16,207 ns/iter (+/- 47) = 493 MB/s
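The MB/s column can be reproduced from the ns/iter column. A small sketch, assuming each gen_bytes iteration produces 1,024,000 bytes and each gen_u32 / gen_u64 iteration produces 1,000 values (4,000 and 8,000 bytes); these sizes are inferred from the printed figures rather than stated in this thread, and `mb_per_s` and `speedup` are hypothetical helpers:

```rust
/// Throughput as printed in the benchmark output, assuming integer arithmetic:
/// bytes per iteration * 1000 / nanoseconds per iteration
/// (1 byte per nanosecond corresponds to 1000 MB/s).
fn mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    bytes_per_iter * 1000 / ns_per_iter
}

/// Relative speedup between two timings of the same benchmark.
fn speedup(ns_before: u64, ns_after: u64) -> f64 {
    ns_before as f64 / ns_after as f64
}
```

Under these assumptions, gen_bytes_chacha goes from 368 to 539 MB/s, roughly a 1.47x speedup.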

I finally remembered why it is a good idea to delay generating results until first use: it makes things like ChaChaRng::new().set_counter(500) possible.
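A minimal sketch of why lazy generation helps. It uses a toy block function in place of ChaCha (it is NOT a cipher) and hypothetical names `LazyRng` and `toy_block`, with a single-u64 counter for brevity: because `new()` only marks the buffer as empty, a `set_counter` call made right after construction takes effect before any output is produced.

```rust
/// Toy stand-in for the ChaCha block function: the 16 output words depend
/// only on the block counter. Illustration only, not a real generator.
fn toy_block(counter: u64, out: &mut [u32; 16]) {
    for (i, w) in out.iter_mut().enumerate() {
        *w = (counter as u32).wrapping_mul(16).wrapping_add(i as u32);
    }
}

struct LazyRng {
    counter: u64,
    buffer: [u32; 16],
    index: usize, // 16 means "buffer empty, refill on next use"
}

impl LazyRng {
    fn new() -> Self {
        // Nothing is generated yet, so the counter can still be changed freely.
        LazyRng { counter: 0, buffer: [0; 16], index: 16 }
    }

    fn set_counter(&mut self, counter: u64) {
        self.counter = counter;
        self.index = 16; // discard any output buffered for the old counter
    }

    fn next_u32(&mut self) -> u32 {
        if self.index >= 16 {
            // Refill only when output is actually requested.
            toy_block(self.counter, &mut self.buffer);
            self.counter = self.counter.wrapping_add(1);
            self.index = 0;
        }
        let value = self.buffer[self.index];
        self.index += 1;
        value
    }
}
```

If `new()` filled the buffer eagerly instead, the first block would already belong to counter 0 and `set_counter` would have to throw that work away.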

cc @burdges, because I know you have an interest in ChaCha.

dhardy

Looks good!

Just curious, how does ChaCha with 8 or 12 rounds compare to HC128 and ISAAC in terms of performance?

dhardy · 2018-01-21T08:26:27Z

src/prng/chacha.rs

+        // https://tools.ietf.org/html/draft-nir-cfrg-chacha20-poly1305-04
+        let seed = [0u8; 32];
+        let mut rng: ChaChaRng = SeedableRng::from_seed(seed);
+        rng.set_counter(0, 2u64.to_be());


The user needs to call .to_be() on the indexes to get portable results? That is counter-intuitive for a counter. Are you sure it's correct, given that actually set_counter converts each u64 to two u32s via bit-shift? I suspect it's not. (In fact, it's different from your example in the documentation.)

Think we should replace set_counter with something easier to use correctly? Would it be better to use four u32s?

This comes from the test vector, which sets the 96-bit nonce to 00 00 00 00 00 00 00 00 00 00 00 02. I am not sure I got the .to_be() call right. I should probably just not add that test, because in the documentation I claimed we do not support using set_counter to set a nonce.

Think we should replace set_counter with something easier to use correctly? Would it be better to use four u32s?

If we think of ChaCha as an RNG and not as a stream cipher, a u128 counter is best, and two u64s are an OK alternative. I don't think four u32s are a convenient way to set a counter.
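To make the two counter interfaces concrete, here is a sketch assuming that words 12..16 of the state hold a 128-bit block counter, least-significant word first, and that set_counter splits each u64 into two u32s by bit-shifting, as described earlier in the review. The free functions `set_counter` and `set_counter_u128` operating on a bare `[u32; 16]` are illustrative, not the crate's API.

```rust
/// Two-u64 interface: low half of each u64 goes to the lower state word.
fn set_counter(state: &mut [u32; 16], counter_low: u64, counter_high: u64) {
    state[12] = counter_low as u32;
    state[13] = (counter_low >> 32) as u32;
    state[14] = counter_high as u32;
    state[15] = (counter_high >> 32) as u32;
}

/// Single-u128 interface: just splits the value and delegates.
fn set_counter_u128(state: &mut [u32; 16], counter: u128) {
    set_counter(state, counter as u64, (counter >> 64) as u64);
}
```

Both produce the same state words; the u128 form simply removes the question of which u64 is the "high" one.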

If we really want to set a nonce it should take a byte array, to match from_seed.

I just thought of another reason not to support setting a nonce: we still use it as part of a counter, not as a constant value. So if you run the RNG long enough, eventually the counter will tick over and increment the nonce, producing 'incorrect' results.

Then probably that test should use:

let counter = 2u64; (counter >> 32) | (counter << 32)

But I don't understand why the test passes as-is (it sounds like state[15] should be set to 2u32).

To be honest, I'm not really sure where this function would be useful, if at all. Perhaps we should also add:

fn set_nonce(&mut self, counter: u64, nonce: [u8; 8])
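One possible shape for the proposed method, as a sketch only: it assumes the original Bernstein layout (64-bit block counter in words 12-13, 64-bit nonce in words 14-15) and reads the nonce bytes as little-endian u32s, the format used inside the RNG. `ChaChaState` is a hypothetical stand-in for `ChaChaRng`.

```rust
struct ChaChaState {
    state: [u32; 16],
}

impl ChaChaState {
    /// Hypothetical: counter in words 12-13 (low half first),
    /// nonce bytes loaded as little-endian u32s into words 14-15.
    fn set_nonce(&mut self, counter: u64, nonce: [u8; 8]) {
        self.state[12] = counter as u32;
        self.state[13] = (counter >> 32) as u32;
        self.state[14] = u32::from_le_bytes([nonce[0], nonce[1], nonce[2], nonce[3]]);
        self.state[15] = u32::from_le_bytes([nonce[4], nonce[5], nonce[6], nonce[7]]);
    }
}
```

Taking the nonce as bytes sidesteps the endianness confusion seen with set_counter, since the caller never has to pre-swap anything.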

So if you run the RNG long enough, eventually the counter will tick over and increment the nonce, producing 'incorrect' results.

The only correct behaviour would be to stop with an error, which conflicts with our primary usage (as a good RNG). If used to encrypt/decrypt a longer byte stream, I guess it has to be up to external code to handle this part correctly (e.g. split into chunks, or just continue encrypting with a note on compatibility).
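The trade-off can be sketched with two increment strategies. Assumptions: the RNG treats words 12..16 as one 128-bit counter, while the cipher-style variant uses the 96-bit-nonce layout from the IETF draft cited above, with only word 12 as the block counter. Both function names and `CounterExhausted` are hypothetical.

```rust
/// RNG-style increment: words 12..16 form one 128-bit counter, so a carry
/// out of word 12 flows into the words an IETF-style cipher uses as nonce.
fn increment_rng(state: &mut [u32; 16]) {
    for w in &mut state[12..16] {
        let (value, carry) = w.overflowing_add(1);
        *w = value;
        if !carry {
            return;
        }
    }
}

#[derive(Debug, PartialEq)]
struct CounterExhausted;

/// Cipher-style increment: only word 12 is the block counter; running past
/// its end is an error and the nonce words are never touched.
fn increment_cipher(state: &mut [u32; 16]) -> Result<(), CounterExhausted> {
    state[12] = state[12].checked_add(1).ok_or(CounterExhausted)?;
    Ok(())
}
```

The first never fails, which suits an RNG; the second stops with an error, which is the only behaviour that keeps ciphertext compatible once the 32-bit counter is exhausted.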

I opened an issue: dhardy#86

For now just try shifting the nonce as I suggested?

Then probably that test should use:

let counter = 2u64; (counter >> 32) | (counter << 32)

But I don't understand why the test passes as-is (it sounds like state[15] should be set to 2u32).

Your code misses the endianness problem. But my 2u64.to_be() was also wrong, because it is a no-op on big-endian targets. rng.set_counter(0, 2u64 << 56); works. Still, I am quite happy to see the test gone.

The bytes [0u8, 0, 0, 2] read as a little-endian u32 (the format we standardised on inside the RNG) give 0x0200_0000u32. Because the two u32 halves of each u64 are stored low half first, that word ends up as the high half of the second argument, so the value is 0x0200_0000_0000_0000u64.
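A worked version of this arithmetic, as a sketch under stated assumptions: the 12 nonce bytes occupy state words 13..16, each read little-endian, and set_counter(low, high) puts the low half of `high` in word 14 and its high half in word 15. The helpers `nonce_words` and `high_counter_arg` are hypothetical.

```rust
/// Split a 96-bit nonce into three little-endian u32 words
/// (destined for state words 13, 14 and 15).
fn nonce_words(nonce: [u8; 12]) -> [u32; 3] {
    let mut w = [0u32; 3];
    for i in 0..3 {
        w[i] = u32::from_le_bytes([
            nonce[4 * i],
            nonce[4 * i + 1],
            nonce[4 * i + 2],
            nonce[4 * i + 3],
        ]);
    }
    w
}

/// Rebuild the second set_counter argument from the two words it fills:
/// word 14 is its low half, word 15 its high half.
fn high_counter_arg(word14: u32, word15: u32) -> u64 {
    ((word15 as u64) << 32) | word14 as u64
}
```

Under these assumptions state[15] must hold 0x0200_0000 rather than 2, swapping the halves of 2u64 gives 2 << 32 instead of 2 << 56, and `2u64.to_be()` only happens to equal 2 << 56 on little-endian targets.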

dhardy · 2018-01-21T08:30:16Z

src/prng/chacha.rs

 ///
-/// [1]: D. J. Bernstein, [*ChaCha, a variant of
-/// Salsa20*](https://cr.yp.to/chacha.html)
+/// ChaCha uses add-rotate-xor (ARX) operations as basis. These are safe


"as its basis"

dhardy · 2018-01-21T08:31:21Z

src/prng/chacha.rs

+///
+/// With the ChaCha algorithm it is possible to choose the number of rounds the
+/// core algorithm should run. By default `ChaChaRng` is created as ChaCha20,
+/// with means 20 rounds. The number of rounds is a tradeoff between performance


"which means"

dhardy · 2018-01-21T08:31:36Z

src/prng/chacha.rs

+/// With the ChaCha algorithm it is possible to choose the number of rounds the
+/// core algorithm should run. By default `ChaChaRng` is created as ChaCha20,
+/// with means 20 rounds. The number of rounds is a tradeoff between performance
+/// an security, 8 rounds are considered the minimum to be secure. A different


and security

dhardy · 2018-01-21T08:33:41Z

src/prng/chacha.rs

+/// core algorithm should run. By default `ChaChaRng` is created as ChaCha20,
+/// with means 20 rounds. The number of rounds is a tradeoff between performance
+/// an security, 8 rounds are considered the minimum to be secure. A different
+/// number of rounds can be set with [`set_rounds`].


set using set_rounds, or via

pitdicker · 2018-01-21T12:22:36Z

Good comments, thank you.

how does ChaCha with 8 or 12 rounds compare to HC128 and ISAAC in terms of performance?

Still slowest, but not bad at all (with 8 rounds):

test gen_bytes_chacha   ... bench:     902,419 ns/iter (+/- 15,331) = 1134 MB/s
test gen_bytes_hc128    ... bench:     441,192 ns/iter (+/- 2,454) = 2320 MB/s
test gen_bytes_isaac    ... bench:     710,000 ns/iter (+/- 2,664) = 1442 MB/s
test gen_bytes_isaac64  ... bench:     381,510 ns/iter (+/- 870) = 2684 MB/s

dhardy · 2018-01-24T11:26:45Z

benches/generators.rs

 gen_uint_new!(gen_u64_std, u64, StdRng);
 gen_uint_new!(gen_u64_os, u64, OsRng);

+// Do not test JitterRng like the others by running it RAND_BENCH_N times per,
+// measurement, because it is way to slow. Only run it once


I think it's time you stopped confusing 'to' and 'too'!

You are right. I will try too not make to many mistakes with them anymore. Starting tomorrow 😄.

dhardy · 2018-01-24T11:45:17Z

src/prng/chacha.rs

    }

    /// Refill the internal output buffer (`self.buffer`)
    fn update(&mut self) {
-        core(&mut self.buffer, &self.state);
+        // For some reason extracting this part into a seperate function


pitdicker · 2018-01-24T20:50:32Z

I tried using the code from @PeterReid's implementation of ChaCha, integrating it was super easy.

Results on stable:

test gen_bytes_chacha20 ... bench:   2,088,500 ns/iter (+/- 3,958) = 490 MB/s
test gen_u32_chacha20   ... bench:       9,195 ns/iter (+/- 31) = 435 MB/s
test gen_u64_chacha20   ... bench:      18,828 ns/iter (+/- 67) = 424 MB/s

Results on nightly with #[repr(simd)]:

test gen_bytes_chacha20 ... bench:   1,796,552 ns/iter (+/- 4,974) = 569 MB/s
test gen_u32_chacha20   ... bench:       7,722 ns/iter (+/- 24) = 518 MB/s
test gen_u64_chacha20   ... bench:      15,510 ns/iter (+/- 90) = 515 MB/s

Compared with our current implementation:

test gen_bytes_chacha   ... bench:   1,898,515 ns/iter (+/- 6,578) = 539 MB/s
test gen_u32_chacha     ... bench:       8,357 ns/iter (+/- 24) = 478 MB/s
test gen_u64_chacha     ... bench:      16,207 ns/iter (+/- 47) = 493 MB/s

The SIMD result can be improved by roughly another 3% by also making the state #[repr(simd)]. The SIMD assembly looks pretty much optimal to me. However, that change regresses stable by another 12%.

I am not sure this change is worth it at the moment, because it is only faster on nightly. But it is nice to know that our implementation is already pretty decent.

dhardy · 2018-01-25T11:19:53Z

Think I'll go ahead and merge manually. It's clear from the Travis failures that GitHub messed up the merge :-(

pitdicker added 5 commits January 21, 2018 08:36

Add tests for ChaChaRng set_counter

17b9571

Make the number of rounds in ChaCha variable

f6ddbe9

Extend ChaChaRng documentation

c22c5d5

Improve performance of ChaChaRng::update

b053638

Fold ChaCha init into from_seed

8abe13a

dhardy reviewed Jan 21, 2018


pitdicker added 2 commits January 21, 2018 13:42

Address review comments

feb5b10

Add benchmarks for ChaChaRng with 8 and 12 rounds

4a44964

dhardy mentioned this pull request Jan 22, 2018

ChaCha: nonce and counter dhardy/rand#86

Closed

pitdicker mentioned this pull request Jan 23, 2018

Port new SeedableRng trait #233

Merged

pitdicker added 2 commits January 24, 2018 07:39

Fix for Rust 1.15

c17f7f1

Add back test_chacha_set_counter

fb5b3bb

pitdicker force-pushed the chacha branch from 9c1a877 to fb5b3bb January 24, 2018 06:42

dhardy approved these changes Jan 24, 2018


Fix spelling mistakes

12e911c

pitdicker mentioned this pull request Jan 25, 2018

Add EntropySource wrapper #235

Merged

Replace SeedableRng::from_seed with ChaChaRng::from_seed

c320c7d

dhardy merged commit c320c7d into rust-random:master Jan 25, 2018

pitdicker deleted the chacha branch January 25, 2018 12:22
