Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase benchmark iters for Android jobs #6297

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Copy link

pytorch-bot bot commented Oct 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6297

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit df0db16 with merge base 5e44991 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 16, 2024
@guangy10
Copy link
Contributor Author

guangy10 commented Oct 16, 2024

@huydhn I scheduled 4 runs from different commits, and ensure all 4 runs finished successfully. However, the dashboard only shows 2 commits (expect 4).

@guangy10 guangy10 force-pushed the increase_benchmark_iter branch 3 times, most recently from b962c9b to 4855002 Compare October 16, 2024 18:35
@guangy10 guangy10 had a problem deploying to upload-benchmark-results October 16, 2024 18:58 — with GitHub Actions Failure
@guangy10 guangy10 temporarily deployed to upload-benchmark-results October 16, 2024 20:33 — with GitHub Actions Inactive
@guangy10 guangy10 had a problem deploying to upload-benchmark-results October 16, 2024 21:13 — with GitHub Actions Failure
@guangy10 guangy10 temporarily deployed to upload-benchmark-results October 16, 2024 21:31 — with GitHub Actions Inactive
@huydhn
Copy link
Contributor

huydhn commented Oct 16, 2024

Hmm, I'm seeing 4 commits on your branch?

Screenshot 2024-10-16 at 16 08 31

Also, GitHub data from around 11AM to 4PM today was lost due to the HUD outage. I think I'm seeing the same thing on my test branch.

As the issue is mitigated now, you probably need to schedule new run (or just rerun those missing one I think)

@guangy10
Copy link
Contributor Author

guangy10 commented Oct 16, 2024

Also, GitHub data from around 11AM to 4PM today was lost due to the HUD outage. I think I'm seeing the same thing on my test branch.

Expect 4 points from today. Maybe due to the HUD outage

@guangy10
Copy link
Contributor Author

guangy10 commented Oct 16, 2024

@kirklandsign It's a surprise that the total time in Running state varies so much. Take dl3 for example, you can see in job_1 (link), it's in the Running state (execute 100 iterations) for only 3 mins, and in job_2 (link), it's in the Running state for 12 mins; in job_3 (link), it's in the Running state for 1 mins.

In the 12mins job (seems to be an outlier, maybe due to context switching during execution??), the data is not shown in the dashboard, it's execution latency is 881.29ms, far different from 752.77 and 712.32.

1-3 mins are expected given 752.77ms x 100 / 60000 = 1.25mins for execution and additional 1-2 mins overhead for setup&teardown.

@guangy10 guangy10 marked this pull request as ready for review October 16, 2024 23:33
@kirklandsign
Copy link
Contributor

In the 12mins job (seems to be an outlier, maybe due to context switching during execution??), the data is not shown in the dashboard, it's execution latency is 881.29ms, far different from 752.77 and 712.32.

Good point. I also see something wrong in the log

2024-10-16T19:08:35.7452548Z Starting: Intent { cmp=org.pytorch.minibench/.BenchmarkActivity (has extras) }
2024-10-16T19:08:35.7453249Z Status: timeout
2024-10-16T19:08:35.7453625Z LaunchState: UNKNOWN (-1)
2024-10-16T19:08:35.7454097Z Activity: org.pytorch.minibench/.BenchmarkActivity
2024-10-16T19:08:35.7454617Z WaitTime: 10129

@guangy10
Copy link
Contributor Author

guangy10 commented Oct 16, 2024

Good point. I also see something wrong in the log

2024-10-16T19:08:35.7452548Z Starting: Intent { cmp=org.pytorch.minibench/.BenchmarkActivity (has extras) }
2024-10-16T19:08:35.7453249Z Status: timeout
2024-10-16T19:08:35.7453625Z LaunchState: UNKNOWN (-1)
2024-10-16T19:08:35.7454097Z Activity: org.pytorch.minibench/.BenchmarkActivity
2024-10-16T19:08:35.7454617Z WaitTime: 10129

That's another issue. All jobs report that actually, for dl3 xnnpack.
What does that error mean? "Status: timeout" even if the job is running for 1 or 3 mins? If this is a state we should track in the benchmark result to exclude bad results, we should add it.

@guangy10 guangy10 temporarily deployed to upload-benchmark-results October 17, 2024 00:19 — with GitHub Actions Inactive
@guangy10 guangy10 force-pushed the increase_benchmark_iter branch 3 times, most recently from 11a15cf to 529c161 Compare October 17, 2024 01:15
@guangy10 guangy10 temporarily deployed to upload-benchmark-results October 17, 2024 02:33 — with GitHub Actions Inactive
@guangy10
Copy link
Contributor Author

W/ 1000 iters, still end up having many green|red spots. Since it's still taking less then 10mins, could experiment by bump up iters to 2k

@kirklandsign
Copy link
Contributor

kirklandsign commented Oct 17, 2024

That's another issue. All jobs report that actually, for dl3 xnnpack.

Sorry let me fix the app. Probably the run itself is good, but I ran it on UI thread, which we shouldn't. Let me use a background thread for it. Try #6320

@guangy10 guangy10 temporarily deployed to upload-benchmark-results October 17, 2024 06:38 — with GitHub Actions Inactive
@@ -42,7 +42,7 @@ protected void onCreate(Bundle savedInstanceState) {
.findFirst()
.get();

int numIter = intent.getIntExtra("num_iter", 50);
int numIter = intent.getIntExtra("num_iter", 2000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly when I tried locally I see 100 is already stable :p

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants