Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MutatedTransform to the input type in TMinMutationalStage (#1251) #1971

Merged
merged 4 commits into from
Mar 27, 2024

Conversation

am009
Copy link
Contributor

@am009 am009 commented Mar 25, 2024

No description provided.

@am009
Copy link
Contributor Author

am009 commented Mar 25, 2024

I'm not very sure about modifications (three lines of code) in the second commit (12b3849). The TMinMutationalStage will construct a new Testcase with the minimized input and replace the original one. I guess the post-operation is needed, for example, to add metadata to the new testcase.


let before_len = input.len();
let before_len = base.len();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be input_transformed.len()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the awkward places.

The input_transformed is now of the wrapped type MutatedTransform<Input, State>. I think here we want the length of the internal input (if I understand correctly). If we use try_transform_into to get the internal input (so that we can call .len()), MutatedTransform<Input, State> will be consumed. But we still need to keep MutatedTransform<Input, State> to pass into the mutator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok got it

@tokatoka
Copy link
Member

I'm not very sure about modifications (three lines of code) in the second commit (12b3849). The TMinMutationalStage will construct a new Testcase with the minimized input and replace the original one. I guess the post-operation is needed, for example, to add metadata to the new testcase.

i think

            post.post_exec(state, corpus_idx)?;

already takes care of that post-operation right?

@am009
Copy link
Contributor Author

am009 commented Mar 25, 2024

I'm not very sure about modifications (three lines of code) in the second commit (12b3849). The TMinMutationalStage will construct a new Testcase with the minimized input and replace the original one. I guess the post-operation is needed, for example, to add metadata to the new testcase.

i think

            

already takes care of that post-operation right?

I think post.post_exec(state, corpus_idx)?; handles the corpus_id returned by fuzzer.process_execution. Is that corpus_id equal to the base_corpus_id?

I'm not sure about the behavior of fuzzer.process_execution. Will TMinMutationalStage inserts a new testcase into the corpus (and returns the new corpus_id) if the input is interesting like normal MutationalStage? This behavior does not seem useful for the minimization process.

@tokatoka
Copy link
Member

tokatoka commented Mar 25, 2024

I think post.post_exec(state, corpus_idx)?; handles the corpus_id returned by fuzzer.process_execution. Is that corpus_id equal to the base_corpus_id?

No
Base could be

  1. The initial testcase that you start minimizing with
    for this one you don't have to call post_exec() right?
    Or,
  2. The minimized testcase that you shrinked from 1)
    for this one you already called post_exec at line 162

Will TMinMutationalStage inserts a new testcase into the corpus (and returns the new corpus_id)

It's not really insert it's replace

@tokatoka
Copy link
Member

wait I got it.

i think you should replace post.post_exec() with base_post.post_exec().

@am009
Copy link
Contributor Author

am009 commented Mar 25, 2024

No Base could be

  1. The initial testcase that you start minimizing with
    for this one you don't have to call post_exec() right?
    Or,
  2. The minimized testcase that you shrinked from 1)
    for this one you already called post_exec at line 162

But the base is of type Input, not Testcase<Input>. And there is a new test case constructed before the replacement. Metadata is probably lost? (and the post-operation are used to maintain the metadata, at least for StringIdentificationMetadata.)

It's not really insert it's replace

Yeah, the replacement is happening out of the loop, but there is also a corpus_idx returned by the fuzzer.process_execution inside the loop. Is that a new index for a new testcase?

@tokatoka
Copy link
Member

Now I understand

Metadata is probably lost? (and the post-operation are used to maintain the metadata, at least for StringIdentificationMetadata.)

Yes you are right

@tokatoka
Copy link
Member

Yeah, the replacement is happening out of the loop, but there is also a corpus_idx returned by the fuzzer.process_execution inside the loop. Is that a new index for a new testcase?

I think It's the same in index as before.

@@ -171,6 +182,7 @@ where
fuzzer
.scheduler_mut()
.on_replace(state, base_corpus_idx, &prev)?;
base_post.unwrap().post_exec(state, Some(base_corpus_idx))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try not to use unwrap() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have updated.
Now it will return an error when the option is empty:

            // perform the post operation for the new testcase, e.g. to update metadata.
            // base_post should be updated along with the base (and is no longer None)
            base_post
                .ok_or_else(|| Error::empty_optional("Failed to get the MutatedTransformPost"))?
                .post_exec(state, Some(base_corpus_idx))?;

@@ -134,6 +143,7 @@ where
if feedback.is_interesting(state, manager, &input, observers, &exit_kind)? {
// we found a reduced corpus entry! use the smaller base
base = input;
base_post = Some(post.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to avoid this clone() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function MutatedTransformPost.post_exec consumes self. For example, the post is a metadata, and its ownership is transferred to the metadata map in the post_exec. So I guess it can't be avoided?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could set a bool to true here and then do base_post conversion and post_exec below, after calling post.post_exec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we still need to clone? I think the post object is no longer valid after post_exec, because it is comsumed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh...
Well I guess in that case not much we can do..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i removed it

@tokatoka
Copy link
Member

@am009
I removed it
can you check if this is good?

@am009
Copy link
Contributor Author

am009 commented Mar 26, 2024

@am009 I removed it can you check if this is good?

One issue is that there may be an inconsistency between the post object and the updated base. the post is updated in every loop. When the base is updated, the loop goes on, and the post object no longer originates from the same try_transform_into call as the base.

I debugged the baby_fuzzer_minimizing. I found that the fuzzer.process_execution always returns None for the corpus_idx. Because the fuzzer's crash or coverage feedback is set to empty (). The tmin stage creates its own feedback (self.create_feedback) to check if the input still triggers the crash or has the same execution path. I got the feeling that We just don't need to call fuzzer.process_execution at all.

@tokatoka
Copy link
Member

            let (untransformed, post) = transformed.try_transform_into(state)?;

i'm updating post

@tokatoka
Copy link
Member

i don't understand why it run execute() then later calls process_execution() for the second time. it's doing the same thing twice

@am009
Copy link
Contributor Author

am009 commented Mar 27, 2024

            let (untransformed, post) = transformed.try_transform_into(state)?;

i'm updating post

Let me explains it further. For example, num is two (we run the mutation loop 2 times).

  • In the first loop, try_transform_into returns untransformed@1 and post@1,
    • the untransformed passed all checks and untransformed is assigned to base.
  • In the next loop, try_transform_into returns untransformed@2 and post@2,
    • this time the untransformed@2 input is not assigned to base.
    • post is updated to post@2.
  • loop reached exit condition, and the loop returned the variable post, which is of value post@2
  • finally, new testcase is constructed from untransformed@1, but post@2.post_exec(state, new_corpus_id) is called to update metadata for the new testcase. Then the incorrect metadata is added to the testcase.

But still, this is a helpful new approach to prevent the clone.

i don't understand why it run execute() then later calls process_execution() for the second time. it's doing the same thing twice

I think the process_execution came from the original mutational stage.

  • Fuzzer run fuzzer.execute_input to execute the input
  • Fuzzer run fuzzer.process_execution to use the feedback to check if it is a solution or an interesting input. it returns a corpus id if the input is interesting.

However, the tmin stage does not rely on the feedback of fuzzer, but it has its own feedback (self.create_feedback) to check if the minimized input is still the same.

@tokatoka
Copy link
Member

Then the incorrect metadata is added to the testcase.

Ok I see. Then it's fine with your code.

@tokatoka
Copy link
Member

I debugged the baby_fuzzer_minimizing. I found that the fuzzer.process_execution always returns None for the corpus_idx. Because the fuzzer's crash or coverage feedback is set to empty (). The tmin stage creates its own feedback (self.create_feedback) to check if the input still triggers the crash or has the same execution path. I got the feeling that We just don't need to call fuzzer.process_execution at all.

I thought this fuzzer.process_execution()'s purpose is to make sure that we don't actually increase the corpus. we process the execution then

                if state.corpus().count() == corpus_count
                    && state.solutions().count() == solution_count

we can compare this later.

Because the fuzzer's crash or coverage feedback is set to empty ()

https://github.com/AFLplusplus/LibAFL/blob/main/fuzzers/baby_fuzzer_minimizing/src/main.rs#L42
It's this one right?

@am009
Copy link
Contributor Author

am009 commented Mar 27, 2024

In the baby_fuzzer_minimizing example, there are multiple stages.

  • First, The code fuzz the harness to find at least crash.
  • Then it creates a new fuzzer for minimization, loads initial input from the crash folder, and sets the current corpus to 0, the first crash.
  • Finally, it launches a new minimization stage.

I think the feedback you referenced is for the first fuzzing step, not for the minimization stage.

I think it is here, it sets the objective and feedback to (). and the executor's observer is also set to ().

let mut fuzzer = StdFuzzer::new(scheduler, (), ());
// Create the executor for an in-process function with just one observer
let mut executor = InProcessExecutor::new(&mut harness, (), &mut fuzzer, &mut state, &mut mgr)?;

@tokatoka
Copy link
Member

ok i see.
i think it's fine to have that here.
for that specific example, yes process_execution is doing nothing, but maybe for others, it is necessary to evaluate it and then make sure there's no addition to the corpus

@tokatoka tokatoka merged commit c221108 into AFLplusplus:main Mar 27, 2024
54 checks passed
@tokatoka
Copy link
Member

Looks good thank you 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants