-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable flaky filecoin test for now #419
Conversation
d106204
to
5a0031e
Compare
View stack outputs
|
5a0031e
to
346fd91
Compare
346fd91
to
20f60f7
Compare
20f60f7
to
346fd91
Compare
currently, seed.run fails 90%+ of the time because of the Filecoin test. until we have a more reliable way to test the filecoin flow, the PR disables it. the inserted comment suggests a few solutions -- either spinning up custom infra for w3filecoin pipeline on each pr to get more reliable results, or mocking out the pipeline
346fd91
to
13462fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It loathes me to do this. The staging environment is the closest thing we have to production and the last place we can test end to end interactions across the whole system.
The flakiness needs to be fixed, not ignored. Oftentimes it is the test code that is to blame but sometimes it's an actual bug.
(FWIW I did not write these tests and it frustrates me that not enough care and attention was made to ensure they are robust.)
If the staging environment itself has caused the tests to fail (changes to the system that are not under the control of this repo) then it needs to be fixed. If this is the case and we're disabling these tests because of it then we're effectively saying our staging environment is broken and we cannot use it and we are completely blind when deploying changes to know if the entire system works together.
I understand there is an inherent problem here that we cannot deploy the entire system from a single repo. Unfortunately it's an artefact of history (and also an artefact of deployment across multiple cloud providers to optimise for cost) that I'd like to change in the future.
I'm with @alanshaw's sentiments. I think if we have a test failure, that should be a blocker: either we confirm that the test is unnecessary, or we fix it. To do otherwise degrades our confidence in our continuous delivery, and the whole thing falls apart. That said, it doesn't need to block everyone. We can operate with slightly less confidence for a limited time. I recommend we merge this and escalate the card that resolves, so that one person is blocked on it while everyone else can move forward with slightly more caution. |
dd2d5e6
to
cc3e204
Compare
|
+1 I should have cross posted here but I did suggest this same resolution in Discord. |
@alanshaw Added to top of sprint as storacha/project-tracking#129. Good to merge this PR? |
**Before Merge:** * [ ] Set `HOSTED_ZONES` (plural) on `prod`. * [ ] Merge #419 (which this currently includes to get tests to pass). --- With `NEXT_PUBLIC_W3UP_SERVICE_URL=https://petra.up.storacha.network` in my local `console`: ![CleanShot 2024-08-19 at 17 33 15](https://github.com/user-attachments/assets/66aa54da-dbf6-47f1-9b1d-a151e9546b9f) Then… ![CleanShot 2024-08-19 at 17 34 12](https://github.com/user-attachments/assets/03141163-ec6f-447b-bdee-48e3d0ab8abc) And success! ![CleanShot 2024-08-19 at 17 33 55](https://github.com/user-attachments/assets/fe88b106-daa3-43dd-86ec-a71516bf1c0d) Or, failure. ![CleanShot 2024-08-19 at 17 34 19](https://github.com/user-attachments/assets/5a8dd394-0f90-429d-a3de-dc6ca9ea3819) Note that that error message is a dummy message in the test harness, not something real. Also, I see to have bumped something that made the Stripe section always appear. Previously, staging/prod was set not to show it by not providing the right env vars. I'm not sure why that changed, and whether we want it to. Closes storacha/project-tracking#119 --------- Co-authored-by: hannahhoward <hannah@hannahhoward.net>
Goals
currently, seed.run PRs fail 90%+ of the time because of the Filecoin test.
until we have a more reliable way to test the filecoin flow, the PR disables it.
the inserted comment suggests a few solutions -- either spinning up custom infra for w3filecoin pipeline on each pr to get more reliable results, or mocking out the pipeline
either way, this is the temporary fix -- I don't want to put in extra retries or anything -- we just need to find a better way to test, or investigate if there are in fact underlying issues, but in the meantime, other PRs need to get merged.