Out of memory with Kani 0.46.0 on an s2n-quic harness #3030

zhassan-aws · 2024-02-15T17:16:20Z

Kani v0.46.0 runs out of memory on the inet::checksum::tests::differential harness in s2n-quic.

This is an example run that runs out of memory:

https://github.com/aws/s2n-quic/actions/runs/7892365794/job/21538690964

This is an example passing run with Kani 0.45.0:

https://github.com/aws/s2n-quic/actions/runs/7837153634/job/21386277828

The text was updated successfully, but these errors were encountered:

zhassan-aws · 2024-02-15T17:40:12Z

I traced this back to this upstream change:

rust-lang/rust#118578

Replacing all the split_at calls in https://github.com/aws/s2n-quic/blob/cf0b40ec213ba9437c8c4dfac43cca593b15e62f/quic/s2n-quic-core/src/inet/checksum.rs#L408 with split_at_unchecked (which requires using unsafe and adding #![feature(slice_split_at_unchecked)]), verification succeeds.

The upstream change effectively added an extra layer of Option<(&[T], &[T])> that split_at needs to unwrap (which it does through a match). This extra layer seems to result in a significant increase in the symex run time/unwinding. The CBMC stats with Kani 0.46.0 are:

Runtime Symex: 231.01s
size of program expression: 4084816 steps
slicing removed 3624753 assignments
Generated 94180 VCC(s), 36701 remaining after simplification
Runtime Postprocess Equation: 9.21824s
Passing problem to propositional reduction
converting SSA
Runtime Convert SSA: 23.0261s

whereas for a passing run with Kani 0.45.0, the stats are:

Runtime Symex: 11.9535s
size of program expression: 348664 steps
slicing removed 274660 assignments
Generated 20558 VCC(s), 6312 remaining after simplification
Runtime Postprocess Equation: 0.413516s
Passing problem to propositional reduction
converting SSA
Runtime Convert SSA: 1.28765s
Running propositional reduction
Post-processing
Runtime Post-process: 0.145403s

zhassan-aws · 2024-02-15T17:49:20Z

@tautschnig can you take a look?

zhassan-aws · 2024-02-16T01:09:12Z

Replacing all the split_at calls in https://github.com/aws/s2n-quic/blob/cf0b40ec213ba9437c8c4dfac43cca593b15e62f/quic/s2n-quic-core/src/inet/checksum.rs#L408 with split_at_unchecked (which requires using unsafe and adding #![feature(slice_split_at_unchecked)]), verification succeeds.

Actually, just replacing the split_at call in write_sized_generic with split_at_unchecked is sufficient to make verification succeed. This is the complete diff:

diff --git a/quic/s2n-quic-core/src/inet/checksum.rs b/quic/s2n-quic-core/src/inet/checksum.rs
index 551c88b9..9b0843ae 100644
--- a/quic/s2n-quic-core/src/inet/checksum.rs
+++ b/quic/s2n-quic-core/src/inet/checksum.rs
@@ -58,7 +58,7 @@ fn write_sized_generic<'a, const MAX_LEN: usize, const CHUNK_LEN: usize>(
     //# }
 
     while bytes.len() >= MAX_LEN {
-        let (chunks, remaining) = bytes.split_at(MAX_LEN);
+        let (chunks, remaining) = unsafe { bytes.split_at_unchecked(MAX_LEN) };
 
         bytes = remaining;
 
@@ -404,12 +404,12 @@ mod tests {
     // Reduce the length to 4 for Kani until
     // https://github.com/model-checking/kani/issues/3030 is fixed
     #[cfg(any(kani, miri))]
-    const LEN: usize = if cfg!(kani) { 4 } else { 32 };
+    const LEN: usize = if cfg!(kani) { 16 } else { 32 };
 
     /// * Compares the implementation to a port of the C code defined in the RFC
     /// * Ensures partial writes are correctly handled, even if they're not at a 16 bit boundary
     #[test]
-    #[cfg_attr(kani, kani::proof, kani::unwind(9), kani::solver(cadical))]
+    #[cfg_attr(kani, kani::proof, kani::unwind(17), kani::solver(cadical))]
     fn differential() {
         #[cfg(any(kani, miri))]
         type Bytes = crate::testing::InlineVec<u8, LEN>;
diff --git a/quic/s2n-quic-core/src/lib.rs b/quic/s2n-quic-core/src/lib.rs
index 4ec0e3cf..65971201 100644
--- a/quic/s2n-quic-core/src/lib.rs
+++ b/quic/s2n-quic-core/src/lib.rs
@@ -3,6 +3,7 @@
 
 #![cfg_attr(not(any(test, feature = "std")), no_std)]
 
+#![feature(slice_split_at_unchecked)]
 #[cfg(feature = "alloc")]
 extern crate alloc;

camshaft · 2024-02-16T02:46:06Z

What's weird is the line above the split_at checks to make sure that it's in bounds.

while bytes.len() >= MAX_LEN {

I guess I don't know how much optimizations we do in MIR or GOTO but I would expect the split_at to be equivalent to split_at_unchecked with that check. Even with --opt-level=1, all of the panics are removed with the LLVM backend: https://godbolt.org/z/rjcqn67Kc.

tautschnig · 2024-02-16T13:10:40Z

It seems the main problem is indeed the bounds check performed by split_at, which in turn results in a number of further operations that are conditional on that. What we seemingly would need, and this is in line with what @camshaft said, is an interval abstract domain running alongside the analysis. We want to do this for loops anyway, so we might as well extend the use to also cover cases like this one.

mina86 · 2024-02-20T11:32:29Z

FYI, I’ve no idea what Kani is or what it does, but if you’re happy with nightly features, the proper way to do loop seen in the diff is:

    while let Some((chunks, remaining)) = bytes.split_at(MAX_LEN) {

Or if MAX_LEN is a constant you can use split_first_chunk:

    while let Some((chunks, remaining)) = bytes.split_first_chunk::<{ MAX_LEN }>() {

celinval · 2024-02-20T19:22:39Z

Replacing all the split_at calls in https://github.com/aws/s2n-quic/blob/cf0b40ec213ba9437c8c4dfac43cca593b15e62f/quic/s2n-quic-core/src/inet/checksum.rs#L408 with split_at_unchecked (which requires using unsafe and adding #![feature(slice_split_at_unchecked)]), verification succeeds.

Actually, just replacing the split_at call in write_sized_generic with split_at_unchecked is sufficient to make verification succeed. This is the complete diff:

diff --git a/quic/s2n-quic-core/src/inet/checksum.rs b/quic/s2n-quic-core/src/inet/checksum.rs
index 551c88b9..9b0843ae 100644
--- a/quic/s2n-quic-core/src/inet/checksum.rs
+++ b/quic/s2n-quic-core/src/inet/checksum.rs
@@ -58,7 +58,7 @@ fn write_sized_generic<'a, const MAX_LEN: usize, const CHUNK_LEN: usize>(
     //# }
 
     while bytes.len() >= MAX_LEN {
-        let (chunks, remaining) = bytes.split_at(MAX_LEN);
+        let (chunks, remaining) = unsafe { bytes.split_at_unchecked(MAX_LEN) };
 
         bytes = remaining;
 
@@ -404,12 +404,12 @@ mod tests {
     // Reduce the length to 4 for Kani until
     // https://github.com/model-checking/kani/issues/3030 is fixed
     #[cfg(any(kani, miri))]
-    const LEN: usize = if cfg!(kani) { 4 } else { 32 };
+    const LEN: usize = if cfg!(kani) { 16 } else { 32 };
 
     /// * Compares the implementation to a port of the C code defined in the RFC
     /// * Ensures partial writes are correctly handled, even if they're not at a 16 bit boundary
     #[test]
-    #[cfg_attr(kani, kani::proof, kani::unwind(9), kani::solver(cadical))]
+    #[cfg_attr(kani, kani::proof, kani::unwind(17), kani::solver(cadical))]
     fn differential() {
         #[cfg(any(kani, miri))]
         type Bytes = crate::testing::InlineVec<u8, LEN>;
diff --git a/quic/s2n-quic-core/src/lib.rs b/quic/s2n-quic-core/src/lib.rs
index 4ec0e3cf..65971201 100644
--- a/quic/s2n-quic-core/src/lib.rs
+++ b/quic/s2n-quic-core/src/lib.rs
@@ -3,6 +3,7 @@
 
 #![cfg_attr(not(any(test, feature = "std")), no_std)]
 
+#![feature(slice_split_at_unchecked)]
 #[cfg(feature = "alloc")]
 extern crate alloc;

Can we stub this for now? Also, why wasn't this caught in our regression?

camshaft · 2024-02-20T19:28:45Z

Can we stub this for now?

We worked around the issue in aws/s2n-quic#2128. However, I wanted to capture this issue and see if we could return to the original length.

why wasn't this caught in our regression?

From my understanding, there are only a handful of s2n-quic harnesses running in the Kani CI. This harness was added ~8 months ago and isn't captured in that list. Even if the Kani CI was running all of the harnesses in s2n-quic, the submodule was quite old and didn't have this harness.

zhassan-aws · 2024-02-20T19:32:28Z

Can we stub this for now?

That should be doable.

Also, why wasn't this caught in our regression?

We hadn't updated the s2n-quic submodule for some time, so this harness wasn't included in our regressions. We've now updated it.

WesleyRosenblum · 2024-04-18T19:12:52Z

This might be happening again:

tautschnig · 2024-04-18T21:18:19Z

This might be happening again:

* https://github.com/aws/s2n-quic/actions/runs/8742616782/job/23991319945?pr=2184

* https://github.com/aws/s2n-quic/actions/runs/8742616782/job/23993046067?pr=2184#step:3:36574

This wouldn't be too surprising to me: in #3117 (comment) I logged that we appeared to be very narrowly staying within the limits of available memory for this test.

zhassan-aws · 2024-08-29T18:02:31Z

This issue is resolved with Kani 0.54.0: memory usage for the original harness (using a length of 16) does not exceed 1.5 GB:

$ /usr/bin/time --verbose cargo kani --harness inet::checksum::tests::differential
...
SUMMARY:
 ** 0 of 887 failed (6 unreachable)

 ** 1 of 1 cover properties satisfied


VERIFICATION:- SUCCESSFUL
Verification Time: 123.26682s

Complete - 1 successfully verified harnesses, 0 failures, 1 total.
        Command being timed: "cargo kani --harness inet::checksum::tests::differential"
        User time (seconds): 273.60
        System time (seconds): 13.51
        Percent of CPU this job got: 115%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:07.62
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1466228
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 26960
        Minor (reclaiming a frame) page faults: 2999267
        Voluntary context switches: 81426
        Involuntary context switches: 3385
        Swaps: 0
        File system inputs: 3528
        File system outputs: 1548984
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Closing this issue.

tautschnig · 2024-08-30T19:20:16Z

Looking at the logs of our past action runs it seems that #3269 made all the difference.

zhassan-aws · 2024-09-04T21:47:33Z

I tracked it down to this change: aws/s2n-quic#2233 which introduced the usage of get_unchecked, and thus removed the extra Option layer that was causing the memory blowup in symex.

Without this change, the memory blowup occurs even with the latest version of Kani (0.55.0). So, unfortunately, it's not a CBMC/Kani improvement that fixed this issue.

zhassan-aws added T-High Priority Tag issues that have high priority [E] Performance Track performance improvement (Time / Memory / CPU) [C] Feature / Enhancement A new feature request or enhancement to an existing feature. T-User Tag user issues / requests labels Feb 15, 2024

zhassan-aws assigned tautschnig Feb 15, 2024

zhassan-aws mentioned this issue Feb 15, 2024

test(s2n-quic-core): use a smaller length in the Kani test aws/s2n-quic#2128

Merged

tautschnig mentioned this issue Apr 4, 2024

Automatic cargo update to 2024-04-01 #3117

Merged

WesleyRosenblum mentioned this issue Apr 18, 2024

feat(s2n-quic-core): Add ability to create an incremental reader initialized with an offset aws/s2n-quic#2184

Merged

tautschnig mentioned this issue Apr 29, 2024

Do not assume that ZST-typed symbols refer to unique objects #3134

Merged

tautschnig mentioned this issue Jun 5, 2024

Inject pointer validity check when casting raw pointers to references #3221

Merged

camshaft mentioned this issue Jun 10, 2024

build(deps): update bolero to 0.11.0 aws/s2n-quic#2233

Merged

zhassan-aws mentioned this issue Aug 5, 2024

Performance drastically differs across two string indexing operations #3328

Open

zhassan-aws closed this as completed Aug 29, 2024

zhassan-aws mentioned this issue Aug 29, 2024

test(s2n-quic-core): revert to bigger length for Kani test aws/s2n-quic#2312

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Out of memory with Kani 0.46.0 on an s2n-quic harness #3030

Out of memory with Kani 0.46.0 on an s2n-quic harness #3030

zhassan-aws commented Feb 15, 2024

zhassan-aws commented Feb 15, 2024

zhassan-aws commented Feb 15, 2024

zhassan-aws commented Feb 16, 2024

camshaft commented Feb 16, 2024 •

edited

Loading

tautschnig commented Feb 16, 2024

mina86 commented Feb 20, 2024

celinval commented Feb 20, 2024

camshaft commented Feb 20, 2024

zhassan-aws commented Feb 20, 2024

WesleyRosenblum commented Apr 18, 2024

tautschnig commented Apr 18, 2024

zhassan-aws commented Aug 29, 2024

tautschnig commented Aug 30, 2024

zhassan-aws commented Sep 4, 2024

Out of memory with Kani 0.46.0 on an s2n-quic harness #3030

Out of memory with Kani 0.46.0 on an s2n-quic harness #3030

Comments

zhassan-aws commented Feb 15, 2024

zhassan-aws commented Feb 15, 2024

zhassan-aws commented Feb 15, 2024

zhassan-aws commented Feb 16, 2024

camshaft commented Feb 16, 2024 • edited Loading

tautschnig commented Feb 16, 2024

mina86 commented Feb 20, 2024

celinval commented Feb 20, 2024

camshaft commented Feb 20, 2024

zhassan-aws commented Feb 20, 2024

WesleyRosenblum commented Apr 18, 2024

tautschnig commented Apr 18, 2024

zhassan-aws commented Aug 29, 2024

tautschnig commented Aug 30, 2024

zhassan-aws commented Sep 4, 2024

camshaft commented Feb 16, 2024 •

edited

Loading