Change Corpus Pruning algorithm #2418

Merged
merged 18 commits into from
Jul 18, 2024
4 changes: 2 additions & 2 deletions libafl/src/events/llmp/restarting.rs
@@ -35,10 +35,10 @@ use serde::{Deserialize, Serialize};
#[cfg(feature = "std")]
use typed_builder::TypedBuilder;

#[cfg(feature = "std")]
use crate::events::{AdaptiveSerializer, CustomBufEventResult, HasCustomBufHandlers};
#[cfg(all(unix, feature = "std", not(miri)))]
use crate::events::EVENTMGR_SIGHANDLER_STATE;
#[cfg(feature = "std")]
use crate::events::{AdaptiveSerializer, CustomBufEventResult, HasCustomBufHandlers};
use crate::{
events::{
Event, EventConfig, EventFirer, EventManager, EventManagerHooksTuple, EventManagerId,
1 change: 1 addition & 0 deletions libafl/src/stages/mod.rs
@@ -81,6 +81,7 @@ pub mod tuneable;
pub mod unicode;

pub mod pruning;
pub use pruning::*;

/// A stage is one step in the fuzzing process.
/// Multiple stages will be scheduled one by one for each input.
84 changes: 58 additions & 26 deletions libafl/src/stages/pruning.rs
@@ -1,13 +1,16 @@
//! Corpus pruning stage

use alloc::string::ToString;
use core::marker::PhantomData;

use libafl_bolts::{rands::Rand, Error};

use crate::{
corpus::Corpus,
corpus::{Corpus, HasCurrentCorpusId},
schedulers::{RemovableScheduler, Scheduler},
stages::Stage,
state::{HasCorpus, HasRand, UsesState},
HasScheduler,
};
#[cfg(feature = "std")]
use crate::{events::EventRestarter, state::Stoppable};
@@ -48,47 +51,76 @@ impl<E, EM, Z> Stage<E, EM, Z> for CorpusPruning<EM>
where
EM: UsesState,
E: UsesState<State = Self::State>,
Z: UsesState<State = Self::State>,
Z: UsesState<State = Self::State> + HasScheduler,
<Z as HasScheduler>::Scheduler: RemovableScheduler,
Self::State: HasCorpus + HasRand,
{
#[allow(clippy::cast_precision_loss)]
fn perform(
&mut self,
_fuzzer: &mut Z,
fuzzer: &mut Z,
_executor: &mut E,
state: &mut Self::State,
_manager: &mut EM,
) -> Result<(), Error> {
// Iterate over every corpus entry
let n_corpus = state.corpus().count_all();
let mut do_retain = vec![];
let mut retain_any = false;
for _ in 0..n_corpus {
// Iterate over every corpus entry
let n_all = state.corpus().count_all();
let n_enabled = state.corpus().count();

let Some(currently_fuzzed_idx) = state.current_corpus_id()? else {
return Err(Error::illegal_state("Not fuzzing any testcase".to_string()));
};

// eprintln!("Currently fuzzing {:#?}", currently_fuzzed_idx);

let mut disabled_to_enabled = vec![];
let mut enabled_to_disabled = vec![];
// do it backwards so that the index won't change even after remove
for i in (0..n_all).rev() {
let r = state.rand_mut().below(100) as f64;
let retain = self.prob * 100_f64 < r;
if retain {
retain_any = true;
if self.prob * 100_f64 < r {
let idx = state.corpus().nth_from_all(i);

// skip the currently fuzzed id; don't remove it,
// because otherwise, after a restart, we can't call current.next() to find the next testcase
if idx == currently_fuzzed_idx {
// eprintln!("skipping {:#?}", idx);
continue;
}

let removed = state.corpus_mut().remove(idx)?;
fuzzer
.scheduler_mut()
.on_remove(state, idx, &Some(removed.clone()))?;
// because [n_enabled, n_all) is disabled testcases
// and [0, n_enabled) is enabled testcases
if i >= n_enabled {
// we are moving disabled to enabled now
disabled_to_enabled.push((idx, removed));
} else {
// we are moving enabled to disabled now
enabled_to_disabled.push((idx, removed));
}
}
do_retain.push(retain);
}

// Make sure that at least something is in the
if !retain_any {
let r = state.rand_mut().below(n_corpus);
do_retain[r] = true;
// Actually move them
for (idx, testcase) in disabled_to_enabled {
state.corpus_mut().add(testcase)?;
fuzzer.scheduler_mut().on_add(state, idx)?;
}

for (i_th, retain) in do_retain.iter().enumerate().take(n_corpus) {
if !retain {
let corpus_id = state.corpus().nth_from_all(i_th);

let corpus = state.corpus_mut();
let removed = corpus.remove(corpus_id)?;
corpus.add_disabled(removed)?;
}
for (idx, testcase) in enabled_to_disabled {
state.corpus_mut().add_disabled(testcase)?;
Member

I don't think we should ever allow a call to add(_disabled) with a testcase that's already added; we should have a set_enabled(true/false) method instead.

Member

It seems counter-intuitive as an API, and it will hide bugs where we add them twice by accident.

Member Author

But this testcase is not "already added" at this point of execution, because we just removed it a few lines before.

Member Author

@tokatoka tokatoka Jul 18, 2024

> we should have a set_enabled(true/false) method instead

Yes, I know.
But if we implement it, internally it would look the same as this, because disabled and enabled testcases live in separate corpora. If you want to move them, then the only way is (I guess) to delete from one and then add to the other.
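
To make the trade-off in this thread concrete, here is a minimal sketch of what a `set_enabled` wrapper could look like when enabled and disabled testcases are kept in two separate stores. Everything in it (`ToyCorpus`, `ToyId`, `set_enabled`, `is_enabled`, `count_disabled`) is hypothetical and written only for illustration; it is not LibAFL's `Corpus` trait, and unlike the real corpus it keeps the same id across a move.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a corpus id (not LibAFL's `CorpusId`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ToyId(pub usize);

/// Toy corpus that keeps enabled and disabled testcases in separate stores.
#[derive(Debug, Default)]
pub struct ToyCorpus {
    enabled: BTreeMap<ToyId, Vec<u8>>,
    disabled: BTreeMap<ToyId, Vec<u8>>,
    next_id: usize,
}

impl ToyCorpus {
    /// Add a new, enabled testcase and return its id.
    pub fn add(&mut self, input: Vec<u8>) -> ToyId {
        let id = ToyId(self.next_id);
        self.next_id += 1;
        self.enabled.insert(id, input);
        id
    }

    /// Number of enabled testcases.
    pub fn count(&self) -> usize {
        self.enabled.len()
    }

    /// Number of disabled testcases.
    pub fn count_disabled(&self) -> usize {
        self.disabled.len()
    }

    /// `Some(true)` if enabled, `Some(false)` if disabled, `None` if unknown.
    pub fn is_enabled(&self, id: ToyId) -> Option<bool> {
        if self.enabled.contains_key(&id) {
            Some(true)
        } else if self.disabled.contains_key(&id) {
            Some(false)
        } else {
            None
        }
    }

    /// Hypothetical `set_enabled`: internally still "remove from one store,
    /// insert into the other", but the caller can no longer add a testcase twice.
    pub fn set_enabled(&mut self, id: ToyId, enabled: bool) -> Result<(), String> {
        let (from, to) = if enabled {
            (&mut self.disabled, &mut self.enabled)
        } else {
            (&mut self.enabled, &mut self.disabled)
        };
        if to.contains_key(&id) {
            // Already in the requested state; nothing to move.
            return Ok(());
        }
        let input = from
            .remove(&id)
            .ok_or_else(|| format!("unknown id {:?}", id))?;
        to.insert(id, input);
        Ok(())
    }
}
```

In this toy version both points raised above hold at once: the body of `set_enabled` is the same remove-then-add sequence as in the diff, while the public surface makes an accidental double add impossible.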

fuzzer.scheduler_mut().on_add(state, idx)?;
}

// println!("There was {}, and we retained {} corpora", n_corpus, state.corpus().count());
/*
eprintln!(
"There was {}, and we retained {} corpora",
n_all,
state.corpus().count()
);
*/
Ok(())
}
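
The "do it backwards" comment in `perform` can be illustrated in isolation. The sketch below is an assumption-laden toy, not the stage itself: it uses a plain `Vec` instead of a corpus, a caller-supplied predicate instead of the random draw, and a hypothetical helper named `remove_backwards`. It shows why walking positions from last to first keeps the not-yet-visited positions valid, and how one position (standing in for the currently fuzzed testcase) can be skipped.

```rust
/// Hypothetical helper: remove every element whose position satisfies
/// `should_remove`, except the one at position `keep`, walking backwards.
///
/// Removing position `i` from a `Vec` only shifts elements at positions
/// greater than `i`, and those have already been visited, so the positions
/// still to be visited stay valid.
pub fn remove_backwards<T>(
    items: &mut Vec<T>,
    keep: usize,
    should_remove: impl Fn(usize) -> bool,
) -> Vec<(usize, T)> {
    let mut removed = Vec::new();
    // The range is built once from the initial length, then iterated in reverse.
    for i in (0..items.len()).rev() {
        if i == keep {
            // Mirrors skipping the currently fuzzed entry.
            continue;
        }
        if should_remove(i) {
            removed.push((i, items.remove(i)));
        }
    }
    removed
}
```

Iterating forwards over the same range would skip or misaddress elements after the first removal, since every later element moves down by one position.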
