
Introspection #58

Merged · 24 commits · May 7, 2021

Conversation

ctfhacker
Collaborator

@ctfhacker ctfhacker commented Apr 18, 2021

This PR adds the perf_stats feature to libafl, which enables per-instance performance metrics for each fuzzing client. It is mostly meant for debugging, to understand what percentage of time is spent where in the fuzzing pipeline.

Default metrics:

# fuzzers/libfuzzer_libpng/Cargo.toml
libafl = { path = "../../libafl/", features = ["default"] }
[Stats #1] clients: 5, corpus: 950, objectives: 0, executions: 181549, exec/sec: 0
[Stats #1] clients: 5, corpus: 1104, objectives: 0, executions: 361607, exec/sec: 361607
[Stats #1] clients: 5, corpus: 1218, objectives: 0, executions: 529569, exec/sec: 264784
[Stats #1] clients: 5, corpus: 1293, objectives: 0, executions: 698878, exec/sec: 232958

With performance metrics:

# fuzzers/libfuzzer_libpng/Cargo.toml
libafl = { path = "../../libafl/", features = ["default", "perf_stats"] }
[PerfStats #1] clients: 5, corpus: 1425, objectives: 0, executions: 1699447, exec/sec: 205088
Client 001: Scheduler: 0.15 | Manager: 0.00 | Stages:
  Stage 0:
    0.0030: GetInputFromCorpus
    0.0713: Mutate
    0.0004: MutatePostExec
    0.1333: TargetExecution
    0.0113: PreExec
    0.0082: PostExec
    0.0048: PreExecObservers
    0.1051: PostExecObservers
    0.4978: GetFeedbackInterestingAll
    0.0004: GetObjectivesInterestingAll
  Not Measured: 0.01
Client 002: Scheduler: 0.14 | Manager: 0.00 | Stages:
  Stage 0:
    0.0031: GetInputFromCorpus
    0.0727: Mutate
    0.0004: MutatePostExec
    0.1321: TargetExecution
    0.0113: PreExec
    0.0081: PostExec
    0.0050: PreExecObservers
    0.1073: PostExecObservers
    0.5086: GetFeedbackInterestingAll
    0.0004: GetObjectivesInterestingAll
  Not Measured: 0.01

The metrics are stored in the ClientPerfStats struct, which is added to the ClientStats struct only when the perf_stats feature is enabled. The timing is performed using rdtsc on x86 (no other archs have been implemented so far) via cpu::read_time_counter, which is likewise only built for x86. Other archs will hit a compile-time error because this function does not exist for them.

The bulk of the new code is in State::execute_input, where timers (only with the perf_stats feature) are placed around each phase of fuzzing an input. This way, researchers can understand the performance ramifications of the components under experiment.
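The timer-around-each-phase pattern can be sketched roughly like this (names and fields here are illustrative, not LibAFL's actual API; the PR itself accumulates cycle counts from cpu::read_time_counter rather than Instant):

```rust
use std::time::Instant;

/// Hypothetical per-phase accumulator, mirroring the ClientPerfStats idea.
#[derive(Default, Debug)]
struct PhaseTimes {
    mutate: f64,
    target_execution: f64,
}

/// Run a closure and add its elapsed wall time (in seconds) to `acc`.
fn timed<T>(acc: &mut f64, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let ret = f();
    *acc += start.elapsed().as_secs_f64();
    ret
}

fn main() {
    let mut times = PhaseTimes::default();
    // Each phase of executing an input would be wrapped the same way:
    let input = timed(&mut times.mutate, || vec![0u8; 128]);
    let _len = timed(&mut times.target_execution, || input.len());
    println!("mutate: {:.4}s, exec: {:.4}s", times.mutate, times.target_execution);
}
```

The fractions in the reports above are then just each accumulator divided by the total elapsed time.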

The tests were done in the libfuzzer_libpng example.

Performance with performance monitoring

The following experiment was run with the libfuzzer_libpng example fuzzer.

These were executed on the following CPU:

Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz

The stats here are a bit odd in that, with perf_stats enabled, there seem to be more exec/sec. I haven't quite figured out why yet, but I was hoping someone else could verify on their system as well.

Test without perf_stats with 1 fuzzer

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[Stats #1] clients: 2, corpus: 351, objectives: 0, executions: 1257056, exec/sec: 47069

Test with perf_stats with 1 fuzzer

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[PerfStats #1] clients: 2, corpus: 391, objectives: 0, executions: 1447825, exec/sec: 64405
Client 001: Scheduler: 0.13 | Manager: 0.00 | Stages:
  Stage 0:
    0.0040: GetInputFromCorpus
    0.0965: Mutate
    0.0005: MutatePostExec
    0.1782: TargetExecution
    0.0310: PreExec
    0.0261: PostExec
    0.0074: PreExecObservers
    0.1089: PostExecObservers
    0.0005: GetObjectivesInterestingAll
    0.4103: Feedback index 0
    0.0005: Feedback index 1
  Not Measured: 0.01

Test without perf_stats with 3 fuzzers

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[Stats #1] clients: 4, corpus: 1293, objectives: 0, executions: 3291898, exec/sec: 137719

Test with perf_stats with 3 fuzzers

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[PerfStats #1] clients: 4, corpus: 1205, objectives: 0, executions: 3863856, exec/sec: 153861
Client 001: Scheduler: 0.18 | Manager: 0.00 | Stages:
  Stage 0:
    0.0035: GetInputFromCorpus
    0.0862: Mutate
    0.0004: MutatePostExec
    0.1977: TargetExecution
    0.0274: PreExec
    0.0235: PostExec
    0.0064: PreExecObservers
    0.0984: PostExecObservers
    0.0004: GetObjectivesInterestingAll
    0.3704: Feedback index 0
    0.0004: Feedback index 1
  Not Measured: 0.01
Client 002: Scheduler: 0.16 | Manager: 0.00 | Stages:
  Stage 0:
    0.0035: GetInputFromCorpus
    0.0881: Mutate
    0.0004: MutatePostExec
    0.1980: TargetExecution
    0.0280: PreExec
    0.0242: PostExec
    0.0071: PreExecObservers
    0.1009: PostExecObservers
    0.0004: GetObjectivesInterestingAll
    0.3800: Feedback index 0
    0.0004: Feedback index 1
  Not Measured: 0.01
Client 003: Scheduler: 0.14 | Manager: 0.00 | Stages:
  Stage 0:
    0.0036: GetInputFromCorpus
    0.0901: Mutate
    0.0004: MutatePostExec
    0.2008: TargetExecution
    0.0288: PreExec
    0.0247: PostExec
    0.0067: PreExecObservers
    0.1028: PostExecObservers
    0.0004: GetObjectivesInterestingAll
    0.3888: Feedback index 0
    0.0004: Feedback index 1
  Not Measured: 0.01

@domenukk
Member

Seems like all it's missing is a good old cargo fmt

Member

@domenukk domenukk left a comment

Cool stuff

fuzzers/libfuzzer_libpng/src/lib.rs (outdated, resolved)
libafl/src/fuzzer.rs (outdated, resolved)
const NUM_STAGES: usize = 4;

/// Number of feedback mechanisms to measure for performance
pub const NUM_FEEDBACKS: usize = 4;
Member

Maybe this can be generated at build time somehow, since feedback tuples are generated at build time?

Collaborator Author

Hmm... yeah, I'll start looking into how this could be reworked to be known at compile time, specific to the target being compiled.

Collaborator Author

Initially, I think this would clog up the code a bit, since I /think/ we would have to have something like:

pub struct ClientPerfStats<const S: usize, const F: usize> {
    // ...
    stages_used: [bool; S],
    // etc.
}

which would mean the ClientPerfs struct also needs the parameters:

struct ClientPerfs<const S: usize, const F: usize> {
    // ...
    perf_stats: ClientPerfStats<S, F>,
}

which would then mean all instances of ClientStats would have these across the entire code base.

I /think/ we could have an assert that checks on State creation whether the number of stages or feedbacks is greater than the number of allotted slots in the perf stats, and print an error if that is the case. Not quite as automatic, though, and it would require the researcher to change the variable if the error triggered.
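Such a run-time capacity check could look roughly like this (a minimal sketch; the constant and function names are hypothetical, not the PR's actual code):

```rust
/// Illustrative capacity constant (the PR hardcodes similar NUM_STAGES/NUM_FEEDBACKS).
const NUM_STAGES: usize = 4;

/// Fail fast at State creation if the stages tuple is longer than the
/// fixed-size array reserved for perf counters.
fn check_stage_capacity(stages_len: usize) -> Result<(), String> {
    if stages_len > NUM_STAGES {
        Err(format!(
            "perf_stats supports at most {} stages, got {}; bump NUM_STAGES",
            NUM_STAGES, stages_len
        ))
    } else {
        Ok(())
    }
}

fn main() {
    assert!(check_stage_capacity(3).is_ok());
    assert!(check_stage_capacity(5).is_err());
}
```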

Thoughts?

Collaborator Author

Thinking a bit more about this... I think we would need some const fn len() on FeedbackTuple and StagesTuple. I don't think const fn can be used in a trait (see rust-lang/rfcs#2632). So I think the only option here is to add a run-time check that fails if too many stages or feedbacks are added.

Member

Now if you sync from dev, HasLen has a const LEN field, so you can just do ST::LEN for the const len of the StagesTuple, for instance.

Member

FWIW, I just tried it, but ST::LEN doesn't work everywhere yet. It's not available in some places, and we would need nightly features in others. We could use Vecs instead of arrays, though. Not sure if/how much this would impact perf.
Or just leave it as-is...

Collaborator Author

The arrays are there to ensure we don't do any allocations. I don't mind either way, really. We could always just do a Vec::with_capacity, I guess.


/// Update the total stage counter and increment the stage counter for the next stage
pub fn finish_stage(&mut self) {
// Increment the stage to the next index. The check is only done if this were to
Member

These short functions may benefit from #[inline] (maybe not necessary with LTO builds?)

Collaborator Author

Yeah I can add a few just in case for the small ones

@s1341
Collaborator

s1341 commented Apr 19, 2021

Awesome! What is necessary to make this aarch64 compatible?

/// the fuzzer.
#[cfg(target_arch = "x86_64")]
pub fn read_time_counter() -> u64 {
unsafe { core::arch::x86_64::_rdtsc() }
Collaborator

Might be worth using https://docs.rs/llvmint/0.0.3/llvmint/fn.readcyclecounter.html, as that will automatically do the right thing for each arch.

Member

Sounds really useful for more obscure platforms!
We should only add the dependency for this feature flag, though (and maybe even only for the platforms we cannot reach otherwise?)

Collaborator Author

Oh sure! Didn't know about the llvm intrinsic. If we're okay with the llvmint dependency then I'm fine with using it.

Collaborator Author

The original thought was to add those specific instructions similar to Google's benchmark: https://github.com/google/benchmark/blob/master/src/cycleclock.h

I think that would avoid the external dependency, at the cost of just one instruction per arch.

Collaborator

But that would require asm... which is not stable.

Member

Let's go with:

  • libc, if available for this arch
  • the llvmint dependency otherwise

?

Member

Or since it's a feature flag anyway, just llvmint is fine too, if it's too much work otherwise

Collaborator Author

Just pushed a fix using llvmint::readcyclecounter. It looks like the needed feature requires nightly. I think you want to stay on stable as far as I'm aware, which would mean neither this nor the asm! fix would work. We could also require nightly for the perf_stats feature.

Collaborator Author

Edit to the last comment: I had the wrong cfg_attr. I've ripped out only the needed extern from llvmint into src/cpu.rs, so we don't need to add that dependency, and it seems to work fine now.

@@ -3,6 +3,7 @@ Welcome to libAFL
*/

#![cfg_attr(not(feature = "std"), no_std)]
#![cfg_attr(feature = "perf_stats", feature(link_llvm_intrinsics))]
Collaborator

Isn't this feature perma-unstable? Might be worth depending on llvmint instead of requiring nightly.

Member

I think so too. Seems like the stable and hardware-independent way to go for now

Collaborator

I plan to use it in libafl_frida anyways, if that helps.

Member

Oof... llvmint depends on simdty, which uses nightly features...
Not good.
We can either do our own C wrapper or go back to the initial proposal of using libc...

Collaborator Author

OK, so we do want to stay on stable. I can convert everything to just using Instant rather than the raw u64. It won't work for no_std, obviously. I guess I forgot that not everyone runs on nightly all the time ;-)

I guess asking to run on nightly for perf_stats might be a big ask as well?

Yeah, llvmint relying on simdty was the reason I ripped out just the extern in the first place.

Collaborator

I think the C wrapper is the way to go...

Member

C wrappers are the way to go, but be careful: Rust code is compiled with LLVM, but the C code may be compiled with GCC or MSVC, depending on the build.rs.

Collaborator

Yeah, I discovered the hard way that __builtin_thread_pointer is not universally accepted.

@domenukk
Member

@ctfhacker how about renaming this feature to introspection? Eventually, my goal is to add metadata like "mutation x was successful y times, and took z seconds in total"

@ctfhacker
Collaborator Author

Not a problem on my end. Note that the current setup uses clock cycles and not seconds, so we'd have to get the clock speed to convert to seconds. Or just go straight to Duration if that is a use case we already know about.
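The cycles-to-seconds conversion mentioned above could look roughly like this (an illustrative sketch, not part of the PR; in practice the TSC frequency would have to be calibrated or read from the OS rather than assumed):

```rust
use std::time::Duration;

/// Convert a raw cycle count into a Duration, given the timestamp-counter
/// frequency in Hz. Widened to u128 to avoid overflow in the multiply.
fn cycles_to_duration(cycles: u64, tsc_hz: u64) -> Duration {
    let nanos = (cycles as u128 * 1_000_000_000u128) / tsc_hz as u128;
    Duration::from_nanos(nanos as u64)
}

fn main() {
    // On an assumed 2.6 GHz counter, 2.6 billion cycles is one second.
    let d = cycles_to_duration(2_600_000_000, 2_600_000_000);
    println!("{:?}", d);
}
```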

@domenukk
Member

Seconds was just an example. Clock cycles are fine, as long as they're measured in the same way and measuring is fast :)

@domenukk domenukk changed the title Real time benchmarks Introspection Apr 29, 2021
@domenukk
Member

domenukk commented May 1, 2021

I've renamed it to introspection and moved this back to the x86-only flag. If anyone needs it on non-x86, we'll need to link C to it. Any takers?

* initial b2b implementation

* no_std and clippy fixes

* b2b testcase added

* more correct testcases

* fixed b2b

* typo

* fixed unused warning
@andreafioraldi andreafioraldi changed the base branch from old_dev to dev May 6, 2021 18:19
@domenukk
Member

domenukk commented May 7, 2021

I've done some very, very rudimentary measurements and, while using rdtsc was consistently faster than using the system time, it's not too far off: 1 million execs in 10 seconds on 4 slow, overheated cores vs. 850k using SystemTime.
So I've added SystemTime as a fallback, since the actual measurements seemed to be in the same ballpark.
We can eventually add other arch support using C code, but for now I would like to merge this, I think.
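The rdtsc-with-system-time-fallback approach could be sketched like this (illustrative only; the PR's actual function lives in cpu.rs and its fallback details may differ):

```rust
/// Read a monotonically increasing counter: rdtsc on x86_64, a coarser
/// system-time fallback on other archs (nanoseconds stand in for cycles).
#[cfg(target_arch = "x86_64")]
pub fn read_time_counter() -> u64 {
    unsafe { core::arch::x86_64::_rdtsc() }
}

#[cfg(not(target_arch = "x86_64"))]
pub fn read_time_counter() -> u64 {
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system time before UNIX epoch")
        .as_nanos() as u64
}

fn main() {
    let start = read_time_counter();
    let end = read_time_counter();
    println!("elapsed counter units: {}", end.wrapping_sub(start));
}
```

Since both paths return a plain u64, the rest of the introspection code can accumulate deltas without caring which source is in use.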

@andreafioraldi
Member

andreafioraldi commented May 7, 2021

I will try to remove the hardcoded NUM_FEEDBACKS if I can and then merge

@andreafioraldi
Member

OK, it needs a rework to remove these consts. Btw, NUM_FEEDBACKS = 4 is too few IMO; it should be 16.

@andreafioraldi andreafioraldi changed the base branch from dev to main May 7, 2021 07:59
@andreafioraldi andreafioraldi merged commit 94fa04f into main May 7, 2021
@andreafioraldi andreafioraldi mentioned this pull request May 7, 2021
@domenukk domenukk deleted the real_time_benchmarks branch May 19, 2021 13:34
@@ -230,7 +230,7 @@ where
let client = stats.client_stats_mut_for(sender_id);
client.update_corpus_size(*corpus_size as u64);
client.update_executions(*executions as u64, *time);
stats.display(event.name().to_string() + " #" + &sender_id.to_string());
// stats.display(event.name().to_string() + " #" + &sender_id.to_string());
Member

@andreafioraldi andreafioraldi May 20, 2021

@domenukk porcodio, now we have to release again because this line was commented out

@@ -241,7 +241,35 @@ where
// TODO: The stats buffer should be added on client add.
let client = stats.client_stats_mut_for(sender_id);
client.update_executions(*executions as u64, *time);
stats.display(event.name().to_string() + " #" + &sender_id.to_string());
if sender_id == 1 {
Member

Why this?

Member

Shouldn't be needed, I agree
