Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing synchronization in SemaphoreStep #235

Merged
merged 1 commit into from
Sep 12, 2023

Conversation

jglick
Copy link
Member

@jglick jglick commented Sep 11, 2023

#188 does not seem to have been complete; various HashMap-valued fields in State were accessed without consistent locking.

Motivated by frequent flakes in SCMRetrieverTest.selfTestLibraries (from pipeline-groovy-lib) in PCT. The test would time out waiting for one of the branch builds to complete; log messages Blocking… indicated that the step had been started on the right string, but then later the log would show Planning to unblock… rather than Unblocking…, as if contexts did not contain the string which had been added. I was able to reproduce such a failure locally after updating the core & BOM to newest 2.414.x and running the test about ten times in a row. My hypothesis is that unsynchronized access corrupted the HashMap.

@jglick jglick requested a review from a team as a code owner September 11, 2023 15:02
static synchronized State get() {
File home = Jenkins.get().getRootDir();
State state = states.get(home);
if (state == null) {
LOGGER.info(() -> "Initializing state in " + home);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some extra logging here and in stop in case the fault lied there, but it does not seem to have.

final Map<String,Object> returnValues = new HashMap<String,Object>();
final Map<String,Throwable> errors = new HashMap<String,Throwable>();
final Set<String> started = new HashSet<String>();
final Map<String,String> contexts = new HashMap<>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just code cleanup

@@ -96,47 +101,61 @@ private String k() {
return id + "/" + number;
}

/** Marks the step as having successfully completed; or, if not yet started, makes it do so synchronously when started. */
/**@deprecated use {@link #success(String, Object)} */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there are any tests actually obtaining this object directly. The normal usage involves solely static calls.

synchronized (s) {
if (!s.contexts.containsKey(k)) {
System.err.println("Planning to unblock " + k + " as success");
LOGGER.info(() -> "Planning to unblock " + k + " as success");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Logger is better style and gives us timestamps too.

s.returnValues.put(k, returnValue);
return;
}
c = getContext(s, k);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible point where synchronization was lacking.

@@ -147,29 +166,50 @@ public static void waitForStart(@NonNull String k, @CheckForNull Run<?,?> b) thr
if (b != null && !b.isBuilding()) {
throw new AssertionError(JenkinsRule.getLog(b));
}
s.wait(1000);
s.wait(100);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might shave off a fraction of a second from tests to check this more promptly.

*/
public final class SemaphoreStep extends AbstractStepImpl implements Serializable {
public final class SemaphoreStep extends Step implements Serializable {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning up deprecations.

Object returnValue = null;
Throwable error = null;
boolean success = false, failure = false, sync = true;
synchronized (s) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point where synchronization was inadequate. Rather cumbersome idiom to avoid holding the lock while making Pipeline API calls, to avoid possible deadlocks.

jglick added a commit to jglick/bom that referenced this pull request Sep 12, 2023
@jglick jglick disabled auto-merge September 12, 2023 16:42
@jglick jglick merged commit d85873b into jenkinsci:master Sep 12, 2023
13 checks passed
@jglick jglick deleted the SemaphoreStep branch September 12, 2023 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants