export reproducers to gcp bucket #5374

Open
tarasmadan wants to merge 2 commits into master from syz_reprolist_by_namespace

tarasmadan
Collaborator

No description provided.

tarasmadan force-pushed the syz_reprolist_by_namespace branch 13 times, most recently from 9b4cac1 to 1f6a60b (October 9, 2024 08:35)
tarasmadan changed the title from "[wip] export reproducers" to "export reproducers to gcp bucket" (Oct 9, 2024)
tarasmadan marked this pull request as ready for review (October 9, 2024 08:35)
tarasmadan force-pushed the syz_reprolist_by_namespace branch 2 times, most recently from 727b2af to e82b78f (October 9, 2024 08:41)
"git checkout syz_reprolist_by_namespace\n" +
"export CI=1\n" +
"./tools/syz-env \"" +
"go run ./tools/syz-reprolist/... -namespace upstream; " +
Collaborator

I am not sure relying on syz-reprolist is a good idea.
syz-reprolist needs to know about changes in syzkaller code base:
https://github.com/google/syzkaller/blob/master/tools/syz-reprolist/reprolist.go#L172-L178
Nobody has updated that in the past 5 years (nor is it tested).

Collaborator Author

I'd like to switch syz-reprolist from dashapi to jsonapi and, once switched, delete the dashapi part.
Do you see the future of syz-reprolist differently?

Collaborator

I thought syz-reprolist works the way it worked before.
I am missing the larger picture: what happens with the exported archive later? How do we export C repros where they are missing, if at all? What namespaces do we export from?

Collaborator Author

> what happens with the exported archive later?

We'll need something like the build.sh and run.sh from https://github.com/dvyukov/syzkaller-repros.
Git is an alternative way to store these reproducers.
I decided to start with the GCS path because it costs one line now and lets me experiment with the next steps.
P.S. Thanks for asking George about the details.

> how do we export C repros where they are missing, if at all?

I don't understand the question. Could you please elaborate a little?

> What namespaces do we export from?

I wanted to export the upstream reproducers first.

"git checkout syz_reprolist_by_namespace\n" +
"export CI=1\n" +
"./tools/syz-env \"" +
"go run ./tools/syz-reprolist/... -namespace upstream; " +
Collaborator

"upstream" shouldn't be hardcoded here. It does not necessarily exist, and it's not the only namespace we want to export from.
You configured the bucket in the config, but "upstream" is also part of config.
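
A minimal sketch of what passing the namespace as a parameter could look like. Only the `-namespace` flag and the `go run ./tools/syz-reprolist/...` command come from the diff above; the helper name `reprolistCommand`, the validation pattern, and the namespace list in `main` are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// validNamespace restricts names to characters that are safe to paste into a
// shell command. The exact pattern is an assumption made for this sketch.
var validNamespace = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

// reprolistCommand returns the syz-reprolist invocation for one namespace.
// Hypothetical helper: in the PR the value "upstream" is hardcoded.
func reprolistCommand(namespace string) (string, error) {
	if !validNamespace.MatchString(namespace) {
		return "", fmt.Errorf("invalid namespace %q", namespace)
	}
	return fmt.Sprintf("go run ./tools/syz-reprolist/... -namespace %s", namespace), nil
}

func main() {
	// Hypothetical list; in practice it would come from the same config
	// that already holds the bucket location.
	for _, ns := range []string{"upstream", "example-ns"} {
		cmd, err := reprolistCommand(ns)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cmd)
	}
}
```

With this shape, the script builder would be called once per configured namespace instead of embedding one name in the string literal.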

Collaborator Author

Accepted. I'll add one more option.

Collaborator Author

tarasmadan, Oct 9, 2024

> and it's not the only namespace we want to export from

I was sure it is the only one for now.

func exportReproScript(archivePath string) string {
	script := "\n" +
		// "git clone --depth 1 --branch master --single-branch https://github.com/google/syzkaller\n" +
		"git clone https://github.com/tarasmadan/syzkaller\n" +
Collaborator

syz-reprolist needs a persistent output directory for caching. If I remember correctly, it takes forever to run with an empty dir. How long does it take for you? It's supposed to cache the results incrementally.

Collaborator Author

I need a tool to download the reproducers already exported by jsonAPI.
To avoid adding new names, I want to reuse syz-reprolist. This PR adds all the needed functionality.
If we don't have other use cases for syz-reprolist, I propose deleting the old code.

Collaborator Author

I used this line for testing. It actually works.
Let me just fix it.

Collaborator Author

Fixed.

func main() {
	flag.Parse()
	if *flagNamespace != "" {
		if err := exportNamespace(); err != nil {
			log.Printf("error: %s", err.Error())
Collaborator

Will this error be noticed by something?

Collaborator Author

I need it for two use cases:

  1. A manual "go run ./tools/syz-reprolist -namespace upstream" call.
  2. Batch job problem analysis. You can access the batch log from the GCP dashboard.

Collaborator

> Batch job problem analysis. You can access the batch log from the GCP dashboard.

This won't be noticed, right? Is it possible to have something noticeable for batch jobs (e.g. alert)?

Collaborator Author

tarasmadan, Oct 9, 2024

Hm. That is a good question.
What do you think about higher-level monitoring here? For example, we can alert if an archive such as reproducers-upstream.tar.gz has not been updated for a month. In case of an alert, I'll open the batch jobs page and check what went wrong. The logs will help at that point.

If we focus on the lower-level logs, we'll also see, for example, Spot machine preemptions (these are also errors for the GCP logs processor).

This PR exports the latest reproducer for every bug.
Reproducers are exported to "bug_id/repro_id.c" files.
This layout makes it possible to add metadata files or to export more reproducers per bug later.
All the files are then archived and uploaded to the preconfigured location.