This repository has been archived by the owner on Oct 14, 2024. It is now read-only.

Retrospective for January 2021 Releases #195

Closed
adamfarley opened this issue Jan 20, 2021 · 16 comments

Comments

@adamfarley
Contributor

This will be a Slack call after the release, with at least a week of notice in the #Release Slack channel.

A bulleted list of all proposed agenda items in this issue (whether at the top or in a comment) will be created on the day of the call, with a list of actions added at the end.

@adamfarley
Contributor Author

  • Nightly runs were not disabled during the release.
  • Seemed unclear which release builds were at which status. Should the heat-map table from the previous release become standard policy? Could it be automated?

@smlambert

Slight correction from #195 (comment): Nightly test runs were not disabled a week prior to the release, but they were disabled before the release (adoptium/temurin-build#2392)

Seems like we'd benefit from following a checklist more closely (a proposed checklist: #178). The website banner was added by an attentive Andreas.

@adamfarley
Contributor Author

adamfarley commented Jan 20, 2021

Short list of potentially unresolved actions from previous retrospectives, where actions could be identified.
Each to be reviewed during this retrospective, and marked Ignored, Addressed (with link), or Carried over.
If all actions in a given retrospective issue have one of these responses, that issue will be closed.

October 2019 - Closed

  • Investigate & implement "concurrent" AdoptOpenJDK "build pipeline" support (Raised by: Andrew Leonard) - done
  • Grinder Slack notify support at AdoptOpenJDK (Raised by: Andrew Leonard) - ignored (Morgan investigated and found a serious blocker)
  • Review Adopt Nightly Build+platform and tests+platform scheduling for optimum execution (Raised by: Andrew Leonard) - done.

April 2020 - Open

  • Define what a release is (the Release guide is a good start; let's ensure every activity or component is captured in it, including creating installers, publishing docker images, writing release notes, etc). (Raised by: Shelley Lambert) - rolling task. Initial work done.
  • Ensure that there is an overarching workflow/pipeline for each of the items in a release definition. (Raised by: Shelley Lambert) - ask Shelley
  • Verify each step in the flow. Currently we have this in place for the 'build component'. We build, then we run testing, but need it for every other component (docker, installers, documentation, website, etc). (Raised by: Shelley Lambert) - ask Shelley

July 2020 - Open

  • aarch git mirror jobs still fail due to shenandoah tags coming in and need a manual tidy up each time (Raised by: Martijn Verburg) - ask George

October 2020 - Open

  • Raise issue to develop documentation for this problem: apt installers for 8u272 suffer a gap in update time which affects end users. (Volunteers: George & Stewart) - ask George
  • Create PR to effect this change to a template: Separate HotSpot/OpenJ9 release issues. (George Adams) - done
  • Host a call to discuss having a better visible release status like https://gist.github.com/aahlenst/bbb8ca9c87353e0c8928633961047340 (Volunteer: George Adams)
    (With all the different branches/release dates (think ARM on 8), it's super hard to track.) - ask George
  • Make Slack announcement regarding this: Build repo lockdown had some "leakage" which broke Solaris/SPARC (& others?) (Volunteer: Stewart Addison) - done
  • Raise build issue to discuss platform prioritization (e.g. run Windows/x64, Linux/x64, macOS/x64 pipelines first).
    Proposal to separate top-level build pipeline runs per major release into “important platforms” and “other platforms” (one top-level execution each). (Volunteer: Stewart Addison)
    Related: #186 ("Establish criteria for inclusion / exclusion of various platform/version builds") and adoptium/aqa-tests#2037 ("Assess test target execution time & define test schedule").
    Carried over, raise issue.
  • Raise issue to discuss solution for various issues with the patch number in build and API Windows installer version numbers and sorting (re: 11.0.9.1+1). (Volunteer: Andrew Leonard) - done

Note: Any unresolved actions have been folded into the next retrospective for review. Link.

@adamfarley
Contributor Author

adamfarley commented Jan 20, 2021

  • We need something to prevent retrospective issues piling up again.
    One solution: Create a retrospective template issue that includes the tickable steps below.

Step 1) Send the link to folks on the #Release Slack channel around the start of the new release.
Step 2) Copy actions from the previous retrospective into the new issue, while ignoring actions that have an issue link.
Step 3) Announce the retrospective Slack call's date + time at least one full week in advance, and send out meeting invites.
Step 4) On the day of the call, compile all of the agenda items into a single-comment, tick-box list.
Step 5) Host the Slack call for the retrospective, including:

  • Iterating over the actions from the previous retrospective, annotating both the current and old issues with any issues that have been raised for them.
  • Iterating over the agenda tick-list, ensuring everything gets debated.
  • Creating a list of actions at the end of the retrospective.

Step 6) Close the issue for the previous retrospective, ensuring that every action that has no issue link is either:

  • annotated with the action taken (if no issue is to be raised)
  • or linked to this retrospective issue.

Step 7) Raise a new retrospective issue for the next release.
Step 8) Set yourself a calendar reminder so that you remember to commence step 1 (in the new issue) just before the next release.
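
As an illustration of the "single-comment, tick-box list" from Step 4, here is a minimal sketch that turns collected agenda items into GitHub task-list lines. The function name `agenda_checklist` is hypothetical and not part of any existing tooling; the sample items are taken from this issue.

```python
def agenda_checklist(items):
    """Render agenda items as GitHub task-list lines, skipping blank entries."""
    cleaned = (item.strip() for item in items)
    return "\n".join(f"- [ ] {item}" for item in cleaned if item)


# Sample agenda items copied from this issue.
agenda = [
    "Nightly runs were not disabled during the release.",
    "IBM RHEL 6 machine missing some pre-reqs for language tests.",
]
print(agenda_checklist(agenda))
```

The output can be pasted into a single issue comment, where GitHub renders each line as a tickable box.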

@karianna
Member

JTHarness Test Output parameter may need to be lengthened:

...
[2021-01-20T12:59:50.376Z] Output overflow:
[2021-01-20T12:59:50.376Z] JT Harness has limited the test output to the text
[2021-01-20T12:59:50.377Z] at the beginning and the end, so that you can see how the
[2021-01-20T12:59:50.377Z] test began, and how it completed.
[2021-01-20T12:59:50.377Z] 
[2021-01-20T12:59:50.377Z] If you need to see more of the output from the test,
[2021-01-20T12:59:50.377Z] set the system property javatest.maxOutputSize to a higher
[2021-01-20T12:59:50.377Z] value. The current value is 100000
[2021-01-20T12:59:50.377Z] ...
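
To spot this condition during triage, a log could be scanned for the overflow notice and the reported limit. The sketch below assumes the message wording shown in the log above; the function name `find_output_overflow` is hypothetical.

```python
import re

# Matches the line that reports the current limit, as worded in the log above.
LIMIT_RE = re.compile(r"The current value is (\d+)")


def find_output_overflow(log_text):
    """Return the reported javatest.maxOutputSize value if the log contains a
    JT Harness "Output overflow" notice; return None if there is no notice or
    no limit could be parsed."""
    if "Output overflow:" not in log_text:
        return None
    match = LIMIT_RE.search(log_text)
    return int(match.group(1)) if match else None


sample = """\
[2021-01-20T12:59:50.376Z] Output overflow:
[2021-01-20T12:59:50.377Z] set the system property javatest.maxOutputSize to a higher
[2021-01-20T12:59:50.377Z] value. The current value is 100000
"""
print(find_output_overflow(sample))  # prints 100000
```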

@karianna
Member

IBM RHEL 6 machine missing some pre-reqs for language tests.

[2021-01-18T14:16:48.058Z] Running test MBCS_Tests_pref_ja_JP_linux_0 ...
[2021-01-18T14:16:48.703Z] ===============================================
[2021-01-18T14:16:48.703Z] MBCS_Tests_pref_ja_JP_linux_0 Start Time: Mon Jan 18 08:16:48 2021 Epoch Time (ms): 1610979408049
[2021-01-18T14:16:48.703Z] Nothing to be done for setup.
[2021-01-18T14:16:48.703Z] variation: NoOptions
[2021-01-18T14:16:48.703Z] JVM_OPTIONS:  
[2021-01-18T14:16:48.703Z] { itercnt=1; \
[2021-01-18T14:16:48.703Z] mkdir -p "/home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/../TKG/test_output_16109794004215/MBCS_Tests_pref_ja_JP_linux_0"; \
[2021-01-18T14:16:48.703Z] cd "/home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/../TKG/test_output_16109794004215/MBCS_Tests_pref_ja_JP_linux_0"; \
[2021-01-18T14:16:48.703Z] LANG=ja_JP.UTF-8 bash /home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/../../jvmtest/functional/MBCS_Tests/pref/test.sh; \
[2021-01-18T14:16:48.703Z] if [ $? -eq 0 ] ; then echo ""; echo "MBCS_Tests_pref_ja_JP_linux_0""_PASSED"; echo ""; cd /home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/..;  else echo ""; echo "MBCS_Tests_pref_ja_JP_linux_0""_FAILED"; echo ""; fi; } 2>&1 | tee -a "/home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/../TKG/test_output_16109794004215/TestTargetResult";
[2021-01-18T14:16:48.703Z] Linux_ja_JP.UTF-8
[2021-01-18T14:16:50.029Z] Can't locate Test/Simple.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/../../jvmtest/functional/MBCS_Tests/pref/tap_compare.pl line 16.
[2021-01-18T14:16:50.029Z] BEGIN failed--compilation aborted at /home/jenkins/workspace/Test_openjdk8_hs_special.functional_x86-64_linux/openjdk-tests/TKG/../../jvmtest/functional/MBCS_Tests/pref/tap_compare.pl line 16.
[2021-01-18T14:16:50.029Z] 
[2021-01-18T14:16:50.029Z] MBCS_Tests_pref_ja_JP_linux_0_FAILED
[2021-01-18T14:16:50.029Z] 
[2021-01-18T14:16:50.029Z] Nothing to be done for teardown.
[2021-01-18T14:16:50.029Z] MBCS_Tests_pref_ja_JP_linux_0 Finish Time: Mon Jan 18 08:16:49 2021 Epoch Time (ms): 1610979409627
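
A failure like this could be flagged before a release by scanning logs for Perl's "Can't locate ... in @INC" diagnostic, or by checking for the module up front. The sketch below is illustrative only: the helper names are hypothetical, and `perl -MModule -e 1` is a standard way to test whether a module loads.

```python
import re

# Matches Perl's "Can't locate Foo/Bar.pm in @INC" diagnostic.
MISSING_RE = re.compile(r"Can't locate (\S+)\.pm in @INC")


def missing_perl_modules(log_text):
    """Return the sorted, de-duplicated Perl module names reported as missing."""
    return sorted({path.replace("/", "::") for path in MISSING_RE.findall(log_text)})


def perl_check_command(module):
    """Build a command that exits non-zero if `module` cannot be loaded."""
    return ["perl", f"-M{module}", "-e", "1"]


line = "Can't locate Test/Simple.pm in @INC (@INC contains: /usr/share/perl5 .) at tap_compare.pl line 16."
print(missing_perl_modules(line))          # prints ['Test::Simple']
print(perl_check_command("Test::Simple"))  # prints ['perl', '-MTest::Simple', '-e', '1']
```

The command list can be passed to `subprocess.run` on the target machine as a pre-flight check; that step is left out here so the sketch runs without Perl installed.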

@andrew-m-leonard

Running out of disk space on a machine can have knock-on effects: one rogue job can disable a test machine, taking out any other job that subsequently tries to run on it.

Ideally we need a way of never hitting 100% disk space used:

  • Async monitor that aborts the active job when disk usage reaches 95%?
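
A minimal sketch of the monitoring idea, assuming a 95% threshold as proposed above. The `abort_job` callback is a hypothetical hook; how a running job would actually be aborted (for example through the CI server) is not specified in this issue.

```python
import shutil


def disk_used_fraction(path="/"):
    """Return the fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_abort(used_fraction, threshold=0.95):
    """True once usage has reached the threshold (95% by default)."""
    return used_fraction >= threshold


def check_disk(path="/", threshold=0.95, abort_job=None):
    """Check once; call the hypothetical abort_job hook if over the threshold."""
    fraction = disk_used_fraction(path)
    if should_abort(fraction, threshold) and abort_job is not None:
        abort_job(fraction)
    return fraction
```

An asynchronous monitor would call `check_disk` periodically (for example from a timer thread or a cron job) while the test job runs.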

@andrew-m-leonard

andrew-m-leonard commented Jan 21, 2021

The release dry-run needs to be earlier than the weekend before, to give several days to fully triage all the testcase failures?
We could then keep an "expected failure list" to cross-check against, and triage much more quickly.

Basically, in the week or two before a release I think we need to be on top of the test case triaging, which in this case would have identified several machine setup issues (mac dumps, mac gptest, AIX dumps).
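
The "expected failure list" cross-check can be sketched as a set comparison. This is an illustration only: the function name `triage` and the test name `hypothetical_known_failure_0` are made up, while `MBCS_Tests_pref_ja_JP_linux_0` comes from the log earlier in this issue.

```python
def triage(failed_tests, expected_failures):
    """Split a run's failures into new and known ones, and list expected
    failures that did not show up in this run."""
    failed = set(failed_tests)
    expected = set(expected_failures)
    return {
        "new_failures": sorted(failed - expected),     # need triage
        "known_failures": sorted(failed & expected),   # already understood
        "not_seen": sorted(expected - failed),         # fixed, or not run
    }


result = triage(
    failed_tests=["MBCS_Tests_pref_ja_JP_linux_0", "hypothetical_known_failure_0"],
    expected_failures=["hypothetical_known_failure_0"],
)
print(result["new_failures"])  # prints ['MBCS_Tests_pref_ja_JP_linux_0']
```

Only the `new_failures` bucket needs attention on release day, which is where the time saving would come from.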

@aahlenst
Contributor

Would be great if people could help collect topics for the release announcement. See AdoptOpenJDK/blog#561 for an example. I don't always realize that something is noteworthy and has an impact on end users.

Apart from that, we should think about how we can introduce proper change logs in each project. I'm speaking about a file like https://github.com/square/okhttp/blob/master/CHANGELOG.md and not a commit history. For example, we lost the statement about the GCC 7.5 update in the release announcement because I was apparently the only person that could remember the change.

@sxa
Member

sxa commented Jan 27, 2021

The release dry-run needs to be earlier than the weekend before, to give several days to fully triage all the testcase failures?

We're set up to do this weekly now, aren't we? Therefore we should be triaging the release test suite on a weekly basis.

@andrew-m-leonard

We do triage the weekend tests on the Monday, yes

@adamfarley
Contributor Author

adamfarley commented Feb 1, 2021

Agenda:

Previous Retrospective actions review

See this comment

Items for this retrospective

  • Proposal to prevent retrospective issues/actions pile-up in the future. See this comment - Action.
  • Seemed unclear which release builds were at which status. Should the heat-map table from the previous release become standard policy? Could it be automated? (Adam) - George is already on this.
  • Nightly test runs were not disabled a week prior to the release, but they were disabled before the release (adoptium/temurin-build#2392, "Disable nightly testing during January 2021 release cycle"). Seems like we'd benefit from following a checklist more closely (a proposed checklist: #178, "Sample Release Checklist to improve Release Automation"); website banner added by attentive Andreas. (Shelley) - Discussion needs to happen on this, raise TSC issue. (Volunteer: Adam)
  • JTHarness Test Output parameter may need to be lengthened. (Martijn) - Raise issue to discuss/action. (Volunteer: Andrew)
  • IBM RHEL 6 machine missing some pre-reqs for language tests. (Martijn) - Recommend Martijn raise issue (Adam to prod)
  • Ideally we need a way of never hitting 100% disk space used. Async monitor that Aborts the active job when disk reaches 95%? (Andrew Leonard) - solved by reducing the core dump size on AIX.
  • Proposal to run the full suite of release tests much earlier/weekly. (Andrew/Stewart) - OpenJ9 (do this for milestone, inc weeklies), Hotspot (no call for this) - (Volunteer: Andrew to raise issue)
  • Would be great if people could help collect topics for the release announcement. Example: blog#561, "January 2021 release announcement". (Andreas) - create an issue for documentation of this information. (Adam to speak with Andreas about doing this)
  • Apart from that, we should think about how we can introduce proper change logs in each project. E.g. a file like https://github.com/square/okhttp/blob/master/CHANGELOG.md and not a commit history.
    For example, we lost the statement about the GCC 7.5 update in the release announcement because I was apparently the only person that could remember the change. (Andreas)

Actions as a result of this retrospective:

Adam Farley:

  • Create template for future Retrospective issues. (PR)
  • Raise TSC issue to discuss having a checklist for release actions. E.g. a checklist of checklists in a TSC template, or some manner of wiki. (Issue raised)
  • Speak with Martijn about him raising an issue for "IBM RHEL 6 machine missing some pre-reqs for language tests." (To be discussed here.)
  • Speak with Andreas about storing this info in an issue and encouraging folks to use it. (To be discussed here.)

Andrew Leonard:

  • Raise issue to discuss JTHarness Test Output parameter lengthening. (Issue raised)
  • Raise issue to cover actions needed for inc weekly tests when testing OpenJ9 milestones. (Issue raised)

@adamfarley
Contributor Author

adamfarley commented Feb 1, 2021

Agenda for Retrospective: Part 2

Previous Retrospective actions review

See unaddressed actions here.

Actions as a result of this retrospective:

To be stored with the actions from part one (the comment above) to avoid confusion.

AOB

@andrew-m-leonard

Raised JTHarness log limit issue: adoptium/aqa-tests#2226

@andrew-m-leonard

Add the ReleaseType choice of "Weekly" to the build pipeline job: adoptium/temurin-build#2436

@adamfarley
Contributor Author

Note: Any unresolved actions have been folded into the next retrospective for review. Link.

If any have been unintentionally missed, feel free to add them.
