Fix issue with big logs, causing the browser to freeze (JENKINS-56377) #164

zhekoff316 · 2020-05-26T08:17:27Z

We've had this issue for a couple of months, where big logs would freeze the browser and the whole thing would load for ~2-3 minutes. I've addressed this issue and optimized the code in several places:

Moved "hide/show" link generation functionality in the back-end
Removed an unnecessary loop, which would loop over every element, while it was looping through each element. This was the main cause of the performance issues.
I've added the setTimeout() function, so it prevents browser freezing in especially big logs
I've also added some documentation, so it will be easier in the future for everyone to work on this script and improve it

I've tested it on a pipeline, which has 5-10 thousand steps, and my load time has been in the span of 2-5 seconds, compared to 90-150s before (including a browser crash/freeze). Also, this has been tested on Chrome and Mozilla.

zhekoff316 · 2020-05-26T09:15:18Z

I feel like there is an issue with the build process, as a higher version of Jenkins is required to run the tests (the POM states 2.176.4 as the minimum Jenkins version, while the tests ran at 2.164.1). I've ran the tests locally and they all pass,

dwnusbaum · 2020-05-26T13:18:32Z

Thanks for the PR!

I feel like there is an issue with the build process

Yeah there is, I just filed #165 to hopefully fix the issue.

jglick

Sounds reasonable. Have not reviewed in detail.

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

jglick · 2020-05-26T17:50:00Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

-        if (id == null) {
-            break
+
+        // The CSS rule for branch names only needs to be added once per node, so we


BTW https://github.com/jenkinsci/workflow-job-plugin/pull/164/files?diff=unified&w=1 improves reviewability due to indentation changes.

jglick · 2020-05-26T17:50:52Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

-    var startId = e.getAttribute('startId')
-    if (startId == null || startId == nodeId) {
-        e.innerHTML = e.innerHTML.replace(/.+/, '$&<span class="pipeline-show-hide"> (<a href="#" onclick="showHidePipelineSection(this); return false">hide</a>)</span>')
-        // TODO automatically hide second and subsequent branches: namely, in case a node has the same parent as an earlier one


I think this is already filed in JIRA somewhere.

I removed this part, because it would probably make more sense to generate this inside the back-end.

EDIT: upon further inspection, I will leave it as it was before due to its performance impact being way smaller than I thought (almost not noticeable).

I should have clarified that my review comment was in reference to the removal of the one-line code comment, which is properly an RFE that should be tracked in Jira, not to behavioral changes in your patch.

…move redundant documentation

dwnusbaum

@zhekoff316 Thanks for the PR! I am requesting changes to fix the issue I noted with parallel branch labels, but I would really like to see this merged once that is fixed!

I've tested it on a pipeline, which has 5-10 thousand steps

Is that a real Pipeline, or just one you created for testing?

dwnusbaum · 2020-05-28T19:49:54Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

-    var nodes = $$('.pipeline-new-node')
-    var enclosings = new Map() // id → enclosingId
-    var labels = new Map() // id → label
-    for (var i = 0; i < nodes.length; i++) {
-        var node = nodes[i]
-        var oid = node.getAttribute('nodeId')
-        enclosings.set(oid, node.getAttribute('enclosingId'))
-        labels.set(oid, node.getAttribute('label'))
-    }


Removing this loop causes branch names to no longer be shown when log output changes from one branch to another in many cases.

The problem is that in the while loop right after this, we need to walk through all of this node's parents to find what parallel branch it is a part of. Because the loop is gone, now we only walk through the immediate parents of each node, so if there is any nesting inside of the parallel branch, for example because of a block-scope step, we never find the enclosing parallel branch, and so the while loop exits before finding a label.

The old code is not optimal because we create a fresh enclosings and labels map for every node even though most of the content will be the same. I would try to come up with a good way to store those as page-level state so that you can keep the approach you have now of only updating enclosings and labels once per node, and then split the while loop below out somehow and only run it after the enclosings and labels have data for every node (not sure of a good way to do that).

Here is a Pipeline you can run to compare the behavior with and without this PR to see what I mean:

stage('Outer') { parallel(branchOne: { stage('Stage One') { node() { sh 'echo "from one"' sleep(time: 1) echo 'from one again' } } }, branchTwo: { stage('Stage Two') { node() { sh 'echo "from two"' sleep(time: 1) echo 'from two again' } } }) }

I see. I will try to find a solution to this problem.

dwnusbaum · 2020-05-28T19:51:30Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

+                break
+            }
+        }
+    }, 5000)


5 seconds is a long time to wait. Is 1 second (or maybe even less) not good enough just to get it out of the critical path for the main page load?

zhekoff316 · 2020-06-10T10:40:16Z

Hello again, thanks for the feedback.

First of all, sorry about the delay.

I have done some big changes in the latest commit. They resolve the issue with parallel pipelines (branch names are shown), and also greatly improve the performance with big logs.

I have removed the usage of the Behaviour.js library, as we can use basic JavaScript to accomplish the same results with greater performance. Also, this library is deprecated.
I have split most of the logic into separate functions. In my opinion, this enhances readability and maintainability.
More technically speaking, the problematic for loop, which generates the enclosings now only executes once, and if the pipeline is parallel it will go through another function, which will append the branch names. I feel like this is the most optimal way to do it while preserving all current functionality.

dwnusbaum · 2020-06-16T18:08:28Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

-        }
+var isParallel = false
+var nodes = $$('.pipeline-new-node')
+var pipelineElements = document.getElementsByClassName("pipeline-new-node")


Is this used anywhere?

Correct, those two variables are left-over after I made some changes to the code. Will remove them.

dwnusbaum · 2020-06-16T18:12:32Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

+    isParallelBuild($$('.pipeline-new-node'));
+    generatePipelineLinks($$('.pipeline-new-node'));


Doing this only once on page load seems like it will have issues if someone opens the log for an in-progress build, leaves it open while the build continues, and then comes back and looks at the logs. For example, there may not be a parallel step when the page is opened initially, but one may execute later on. I think that in-progress builds are supported today by Behaviour.specify and the old code, but I haven't specifically tested it.

I will test this use case with the current code and will let you know. It is also a variant to remove the parallel check all-together - the performance impact will be minimal, as the main performance issue was fixed.

oleg-nenashev · 2020-06-23T07:36:54Z

Added it to my review queue. Thanks to @zhekoff316 for this pull request, it is a very important improvement!

dwnusbaum

I took a quick look, and the logic seems a little off; I added some comments. Have you checked to see whether we can count on Behaviour.specify processing the nodes in any particular order? That could help us figure out a consistent way to build up the enclosings and labels maps without having to recreate them for every node.

dwnusbaum · 2020-08-10T21:20:48Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

    var enclosings = new Map() // id → enclosingId
    var labels = new Map() // id → label


Did you mean to create a local version of enclosings and labels here, or were you wanting to reuse and modify the global variables and use them as a kind of cache or something?

The way things are now, it doesn't look like generateEnclosing (no s) actually does anything because the global variables aren't used.

dwnusbaum · 2020-08-10T21:25:25Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

+var enclosings = new Map() // id → enclosingId
+var labels = new Map() // id → label


I'm not sure if scripts like this are encapsulated by Jenkins beyond standard JavaScript behavior, but having global vars like this seems like a bad idea. I would probably find the <pre> element with class console-output and attach these objects to that element instead, but I'm not a JavaScript expert, so I'm not sure what the preferred approach is for this kind of thing.

I think it'd be easier to simply wrap the whole script in an IIFE

(function() { // ... })();

This way those variables are not in the global scope but easily accessible from anywhere in this script.

dwnusbaum · 2020-08-10T21:28:27Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

        enclosings.set(oid, node.getAttribute('enclosingId'))
        labels.set(oid, node.getAttribute('label'))
+        appendBranchNames(oid, enclosings, labels)


It looks like this is the only place where appendBranchNames is called, which seems wrong, since this method is only called for the parallel step itself. What if we start executing the first branch in a parallel step, and then that branch stops and we start executing the other branch? The first branch might get labelled correctly, but I think that the second branch wouldn't be labeled (unless later on in the Pipeline there was another parallel step).

ngg · 2020-09-07T11:04:21Z

Hi! Is there any update regarding this PR? We hit this bug as well :(

bitwiseman · 2020-09-09T03:06:18Z

@ngg
We're waiting for updates from @zhekoff316, I believe.

zhekoff316 · 2020-09-09T06:12:05Z

I've been on a vacation, but I will take a look at the latest review comments as soon as I can, and will let you know if I can fix it.

zhekoff316 · 2021-10-22T11:53:50Z

Hello :) , sorry for taking so much time to come back to this.

I've noticed that @jglick has made a change (#112) that is mentioned in this PR. I haven't looked it in detail yet, but I can confirm that the problem does not appear anymore on big console logs on our Jenkins server (no browser crashes or freezes, it takes much faster for the logs to load).

Do you still have this problem on your side, as it seems the main issues have been fixed?

ngg · 2021-10-22T13:44:33Z

That change was made 3 years ago, I don't see how that can be relevant. The issue is still not fixed, it's still horribly slow to open the console log. An 1mb log file takes several minutes to load and the browser stays unresponsive until then.

zhekoff316 · 2021-10-22T13:52:01Z

That change was made 3 years ago, I don't see how that can be relevant. The issue is still not fixed, it's still horribly slow to open the console log. An 1mb log file takes several minutes to load and the browser stays unresponsive until then.

My bad on this one. I thought it was a newer change, as I can no longer re-produce the problem on our Jenkins server and I am talking about a 7mb log, which loads for 5-6 seconds maximum. The 1mb logs are loading almost instantly. The hardware or general configuration has not been changed.

Edit: I found some of the older jobs (they have a lot of steps) I used for testing this issue. I can still re-produce it, so I will start work on this again.

zhekoff316 · 2021-11-09T10:57:24Z

Hello,

I've tried to address the latest comments from before I stopped working on this PR.

(1) The global variables were indeed useless there, so I removed them and replaced them with variables in the function scope
(2) Regarding your comments @dwnusbaum about the appendBranchNames:

It looks like this is the only place where appendBranchNames is called, which seems wrong, since this method is only called for the parallel step itself. What if we start executing the first branch in a parallel step, and then that branch stops and we start executing the other branch? The first branch might get labelled correctly, but I think that the second branch wouldn't be labeled (unless later on in the Pipeline there was another parallel step).

Could you please give me an example pipeline where I can test this for example, because I can not reproduce anything wrong currently with the parallel builds. This function is only called if a parallel stage exists in the whole pipeline, and will go through all elements (from what I can see). Maybe I am doing something wrong, so it would be very helpful if you can provide me a very short example :) If there is something wrong there, I will fix it.

Overall, I tested again with a job with a lot of Pipeline Steps and the log would open very fast. Parallel builds with multiple stages also looked okay, and the show/hide buttons are working. I am ready to provide examples if you need them.

Will be happy if you can review this, as it's quite an important improvement.

halkeye · 2022-02-03T19:56:04Z

src/main/resources/org/jenkinsci/plugins/workflow/job/console/NewNodeConsoleNote/script.js

    var enclosings = new Map() // id → enclosingId
    var labels = new Map() // id → label
+    var nodes = $$(".pipeline-new-node")


in general, document.querySelectorAll will be much faster than $$ as one is a native browser API, and the other would be a prototypes scanner. its not a drop in replacement, but it doesn't seem like there's any prototype specific apis being used for this, so should work.

likely obsolete

jessejoe · 2022-02-03T20:38:21Z

I've also been hitting this issue as part of testing upgrading our old Jenkins server (v2.150.1) to current LTS, but this issue makes /console and /consoleFull pretty much unusable for us (I've had to use workflow-job-plugin v.2.36 or earlier so far). I've documented a lot of my troubleshooting in https://issues.jenkins.io/browse/JENKINS-62241.

I just tested the changes in this PR with a build log from an example pipeline in the ticket I mentioned above which is simply generating a log from parallel branches:

def genText(lines){
    (1..lines).each{
       println "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur interdum fringilla interdum"
    }
    return true
}
parallel firstBranch: {
    genText(20000)
}, secondBranch: {
    genText(20000)
}

I wanted to give some benchmarks of trying to load the different consoles for the the result of a build of the above pipeline, which winds up being a ~16MB log file:

Jenkins Version	`workflow-job-plugin` Version	`/console`	`/consoleFull`	`/consoleText`
`2.150.1` (our existing server)	`2.28`	<1 second	5-7 seconds	<1 second
`2.319.2` (local container)	`2.36`	<1 second	Never (5 min timeout) - browser slowdown warning	<1 second
`2.319.2` (local container)	`1145.v7f2433caa07f`	Never (5 min timeout) - browser slowdown warning	Never (5 min timeout) - browser slowdown warning	<1 second
`2.319.2` (local container)	`zhekoff316:master:43c58`	5-7 seconds	65-75 seconds	<1 second

The changes in this PR are definitely an improvement over the more current versions, but still not as efficient as our old Jenkins server 2.150.1 with workflow-job-plugin 2.28. Definitely at least make it usable though!

dwnusbaum · 2022-02-03T21:10:25Z

The difference between 1145.v7f2433caa07f and 2.36 is that prior to #150 (landed in 2.37), the relevant Javascript essentially never executes on /console for large log files due to the bug described in that PR, only on /consoleFull (this is why they have the same behavior on /consoleFull). The difference between 2.36 and 2.28 is that that #111 (landed in 2.31) added Javascript to try to show the names of parallel branches in the console UI, and this Javascript is slow because it traverses the Pipeline flow graph for each flow node that produced log output to figure out what branch each node belongs to (IIRC this PR (#164) tries to improve the performance by caching the intermediate steps of the traversal). Prior to #27 (landed in 2.26) the branch names were written directly into the logs, so there was no Javascript needed to show this data at all.

So in 2.28, there is no Javascript that tries to show branch names in the log, which is why things are fast.

jglick · 2022-02-03T21:23:00Z

Possibly we could just detect a large log file and disable the JS (compared to consoleText you would still see rendered console notes)? Probably needs bigger redesigns to make the display usable on a large build, especially when there are parallel branches involved.

jessejoe · 2022-02-03T21:34:35Z

After spending the week discussing this in the Jenkins Gitter, it seems like most large log files do not hit this issue (or I suspect there'd be a lot more complaints). The magic ingredient seems to be the multiple parallel branches logging (could detect that instead?). IMO the biggest issue is that even /console is affected, which really doesn't even show a lot of the log, and it is linked to from the job build history and build executor status. As a user, I think /console should load as fast as possible since its such a common endpoint and should not take multiple seconds to load a little bit of log.

wblair-ebsco · 2022-09-27T18:41:15Z

I hit this problem with large console logs quite often with version 2.293. Haven't complained but that does not means it's not a problem.
Behaves like a memory leak because I have to restart the controller to display large console logs.

Fix issue with big logs, causing the browser to freeze (JENKINS-56377)

f84c16b

dwnusbaum self-requested a review May 26, 2020 13:18

jglick reviewed May 26, 2020

View reviewed changes

Gergan Zhekov added 2 commits May 27, 2020 11:09

Add link generation back to front-end, move setTimeout() function, re…

f785961

…move redundant documentation

Remove TODO comment

10a58b7

dwnusbaum previously requested changes May 28, 2020

View reviewed changes

Re-factor JS, fix issues

26cc512

zhekoff316 requested review from dwnusbaum and jglick June 10, 2020 12:55

dwnusbaum reviewed Jun 16, 2020

View reviewed changes

jglick removed their request for review June 17, 2020 20:41

Re-add Behaviour.js, re-factor JS

b0c2eb2

zhekoff316 requested a review from dwnusbaum June 19, 2020 12:27

oleg-nenashev self-requested a review June 23, 2020 07:35

dwnusbaum reviewed Aug 10, 2020

View reviewed changes

bitwiseman self-requested a review September 9, 2020 03:06

jglick mentioned this pull request Oct 1, 2020

Optimized version of doConsoleText to avoid a memory buffer #112

Merged

zhekoff316 and others added 3 commits November 5, 2021 16:28

Merge branch 'jenkinsci:master' into master

d6f12dd

Remove global variables

1cd870c

Merge branch 'master' into master

43c58ea

zhekoff316 requested a review from dwnusbaum November 9, 2021 10:53

zhekoff316 requested a review from jglick November 9, 2021 11:00

jglick mentioned this pull request Feb 3, 2022

[JENKINS-60862] Add hidden spans with node data to truncated logs so branch names can be displayed #150

Merged

halkeye reviewed Feb 3, 2022

View reviewed changes

jglick requested a review from car-roll February 3, 2022 19:57

zhekoff316 closed this by deleting the head repository Feb 24, 2024

		isParallelBuild($$('.pipeline-new-node'));
		generatePipelineLinks($$('.pipeline-new-node'));

		var enclosings = new Map() // id → enclosingId
		var labels = new Map() // id → label

Fix issue with big logs, causing the browser to freeze (JENKINS-56377) #164

Fix issue with big logs, causing the browser to freeze (JENKINS-56377) #164

Conversation

zhekoff316 commented May 26, 2020 • edited by dwnusbaum Loading

zhekoff316 commented May 26, 2020

dwnusbaum commented May 26, 2020

jglick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhekoff316 May 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwnusbaum left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhekoff316 commented Jun 10, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oleg-nenashev commented Jun 23, 2020

dwnusbaum left a comment

Choose a reason for hiding this comment

dwnusbaum Aug 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ngg commented Sep 7, 2020

bitwiseman commented Sep 9, 2020

zhekoff316 commented Sep 9, 2020

zhekoff316 commented Oct 22, 2021

ngg commented Oct 22, 2021

zhekoff316 commented Oct 22, 2021 • edited Loading

zhekoff316 commented Nov 9, 2021

Choose a reason for hiding this comment

jessejoe commented Feb 3, 2022

dwnusbaum commented Feb 3, 2022

jglick commented Feb 3, 2022

jessejoe commented Feb 3, 2022 • edited Loading

wblair-ebsco commented Sep 27, 2022

zhekoff316 commented May 26, 2020 •

edited by dwnusbaum

Loading

zhekoff316 May 27, 2020 •

edited

Loading

dwnusbaum Aug 10, 2020 •

edited

Loading

zhekoff316 commented Oct 22, 2021 •

edited

Loading

jessejoe commented Feb 3, 2022 •

edited

Loading