Remove dependencies on commons-math3 and ssj by implementing simple linear regression directly, replacing the build duration distribution chart with a histogram, and deleting the smoothed trend lines in the test history charts #639

Merged
merged 4 commits into from
Aug 21, 2024

Conversation

dwnusbaum
Member

@dwnusbaum dwnusbaum commented Aug 20, 2024

This is a follow-up to #638 and #625 that further simplifies the dependencies, based on a discussion with some colleagues. This plugin is used pretty widely in the Jenkins ecosystem, so it seems worth trying to minimize its dependencies, especially those that are not being actively maintained.

This PR does 3 things (you can review commit by commit if you like):

  • It gets rid of the commons-math3 dependency by just directly implementing a simple linear regression algorithm. This is very straightforward since we are only using it for a trend line and don't care about any other data related to the regression.
  • It replaces the SmoothingCubicSpline-based build duration distribution chart with a histogram, which I think is a better representation of the data anyway.
  • It completely deletes the two SmoothingCubicSpline-based "Smooth of X" trend lines that are only shown in the test history widget if more than 200 builds' worth of test results are available. I did this so we could get rid of the ssj dependency completely. Here is my justification:
    • I do not think pluggable storage for test results is in widespread use. Most users set up build retention policies that prevent them from getting anywhere close to 201 builds' worth of test results, so those users can never see these trend lines anyway
    • Even for users who do have at least 201 builds' worth of test results, the page only shows 100 builds' worth of test results by default, so only users who actively play around with the options on the page would ever see these trend lines
    • These trend lines were introduced for the first time in Test history refactoring and improvements #625, so this is not removing long-lived functionality
    • I could not find any simple replacement for SmoothingCubicSpline in an actively-maintained library, and I am not confident in being able to reimplement it from scratch correctly in a reasonable amount of time
    • ssj is not actively maintained and has some warning signs, for example its 3.3.2 release is not listed anywhere in publicly available Git commit history
    • I am totally ok with adding this functionality back to the plugin with any of these three approaches:
      • If someone can find a replacement for SmoothingCubicSpline in an actively-maintained library with minimal dependencies and that ideally is not pulling in a whole grab bag of unrelated functionality that we do not need
      • If someone is willing to implement an equivalent algorithm directly
      • If someone wants to make it possible for https://github.com/jenkinsci/license-maven-plugin to support manually-specified license information, so that we could copy/paste the implementation of SmoothingCubicSpline into this repository while still having license information be reported accurately in Jenkins on the /manage/about/ page
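
For reviewers who want a picture of what "directly implementing a simple linear regression algorithm" in the first bullet amounts to, here is a minimal sketch of an ordinary least-squares fit. The class and method names are hypothetical and are not taken from this PR; it only computes the slope and intercept needed to draw a trend line.

```java
/** Hypothetical helper for illustration; not the class added by this PR. */
final class SimpleLinearRegression {

    /** Returns {slope, intercept} of the least-squares line y = slope * x + intercept. */
    static double[] fit(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = xs.length;

        // Means of x and y.
        double sumX = 0;
        double sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += xs[i];
            sumY += ys[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Covariance-like and variance-like sums around the means.
        double sxy = 0;
        double sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = xs[i] - meanX;
            sxy += dx * (ys[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }
}
```

A trend line then only needs `slope * x + intercept` evaluated at the first and last build index, which is why none of the other regression statistics from commons-math3 are required.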

Here are screenshots with and without the PR:

With this PR:

[Screenshot: Screen Shot 2024-08-20 at 1 46 41 PM]

Without this PR:

[Screenshot: Screen Shot 2024-08-20 at 1 45 38 PM]

Testing done

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@dwnusbaum dwnusbaum requested a review from a team as a code owner August 20, 2024 18:15
durationSeries.put("symbolSize", "0");
durationSeries.put("sampling", "lttb");
durationSeries.put("type", "bar");
durationSeries.put("barWidth", "99%");
Member Author

Slightly less than 100% so you see a border between each bar.

durationSeries.put("symbolSize", "0");
durationSeries.put("sampling", "lttb");
durationSeries.put("type", "bar");
durationSeries.put("barWidth", "99%");
ArrayNode durationData = MAPPER.createArrayNode();
durationSeries.set("data", durationData);
ObjectNode durationStyle = MAPPER.createObjectNode();
durationSeries.set("itemStyle", durationStyle);
durationStyle.put("color", "--success-color"); // "rgba(50, 200, 50, 0.8)");
Member Author

@dwnusbaum dwnusbaum Aug 20, 2024

This color is a bit ugly now that it is no longer just a simple line, so let me know if anyone has recommendations. I think ideally this would be a stacked histogram chart with each build result shown in a different color.

src/main/java/hudson/tasks/junit/History.java (outdated)
int idx2 = Math.max(0, Math.min(idx, lrY.length - 1));
lrY[idx2]++;
}
for (int i = 0; i < lrY.length; ++i) {
- lrX[i] = ((minDuration + (maxDuration - minDuration) / lrY.length * i) / scale * 100.0);
+ lrX[i] = minDuration + step * (i + 0.5);
Member Author

Offset the X values so they are in the middle of the bucket. We are no longer scaling them into the range 0-1 because we are using them directly in our data series.
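
To make the bucketing concrete, here is a self-contained sketch of the idea, assuming equal-width buckets between the minimum and maximum duration. `lrX`, `lrY`, `step`, and the clamped `idx2` mirror the names in the diff; the class name, the method signature, and the way `idx` is derived are assumptions for illustration only.

```java
/** Illustrative sketch only; the real logic lives in History.java and may differ. */
final class DurationHistogram {

    /** Returns {bucket centers, bucket counts} for equal-width buckets over the data range. */
    static double[][] build(double[] durations, int buckets) {
        if (durations.length == 0 || buckets < 1) {
            throw new IllegalArgumentException("need at least one duration and one bucket");
        }
        double minDuration = Double.POSITIVE_INFINITY;
        double maxDuration = Double.NEGATIVE_INFINITY;
        for (double d : durations) {
            minDuration = Math.min(minDuration, d);
            maxDuration = Math.max(maxDuration, d);
        }
        double step = (maxDuration - minDuration) / buckets;

        double[] lrX = new double[buckets];
        double[] lrY = new double[buckets];
        for (double d : durations) {
            // Assumed index computation; a zero-width range puts everything in bucket 0.
            int idx = step > 0 ? (int) ((d - minDuration) / step) : 0;
            // Clamp so the maximum duration lands in the last bucket.
            int idx2 = Math.max(0, Math.min(idx, lrY.length - 1));
            lrY[idx2]++;
        }
        for (int i = 0; i < lrX.length; i++) {
            // Use the middle of each bucket as its X value, unscaled.
            lrX[i] = minDuration + step * (i + 0.5);
        }
        return new double[][] {lrX, lrY};
    }
}
```

With durations 0 through 10 and five buckets, the centers come out as 1, 3, 5, 7, 9, and the last bucket picks up the maximum value because of the clamp.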

// Use float for smaller JSONs.
domainAxisLabels.add((float) (Math.round(mul * z * roundMul) / roundMul));
for (int i = 0; i < lrY.length; i++) {
double scaledX = mul * lrX[i];
Member Author

We no longer round our X values in the backend (z in the old code), because our axis type is now value instead of category, and so if we round here our bars will have weird gaps and not be contiguous.
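
A small sketch of why rounding matters on a value axis: evenly spaced bucket centers can become unevenly spaced once rounded, so bars placed at the rounded positions are no longer contiguous. The class name and the sample numbers are made up for illustration; only the `Math.round(x * factor) / factor` pattern comes from the old code.

```java
/** Illustrative only: shows how rounding evenly spaced X values makes the spacing uneven. */
final class RoundedBucketCenters {

    /** Rounds each value to the given number of decimal places. */
    static double[] round(double[] xs, int decimals) {
        double factor = Math.pow(10, decimals);
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = Math.round(xs[i] * factor) / factor;
        }
        return out;
    }

    /** Returns the distances between consecutive values. */
    static double[] gaps(double[] xs) {
        double[] out = new double[Math.max(0, xs.length - 1)];
        for (int i = 0; i < out.length; i++) {
            out[i] = xs[i + 1] - xs[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Bucket centers for a step of 0.15 starting at 0: every gap is 0.15.
        double[] centers = {0.075, 0.225, 0.375, 0.525};
        double[] rounded = round(centers, 1);
        System.out.println(java.util.Arrays.toString(gaps(rounded)));
    }
}
```

Here the unrounded gaps are all 0.15, while the gaps after rounding to one decimal place alternate between roughly 0.1 and 0.2, which is the kind of irregular spacing that leaves holes between bars.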

domainAxisLabels.add((float) (Math.round(mul * z * roundMul) / roundMul));
for (int i = 0; i < lrY.length; i++) {
double scaledX = mul * lrX[i];
double y = lrY[i];
Member Author

Nothing fancy with the Y values anymore, this is just the frequency of build durations in this bucket. We can probably make lrY an int[] now.

Comment on lines +329 to +330
formatter: function(value) {
return Math.round(value * model.distribution.xAxis.roundingFactor) / model.distribution.xAxis.roundingFactor;
Member Author

Round the labels in the frontend.

data: model.distribution.domainAxisLabels,
min: model.distribution.xAxis.min,
max: model.distribution.xAxis.max,
minInterval: model.distribution.xAxis.interval,
Member Author

What I really wanted was to be able to tell echarts to pick an interval that is a multiple of model.distribution.xAxis.interval, and then let it automatically choose which multiple based on screen and label size, but I didn't see any way to do it.

If you set interval: model.distribution.xAxis.interval, then you get one grid line per bucket, which is nice, but the labels are unreadably close to each other and I couldn't find a way to fix that dynamically. You can set interval: model.distribution.xAxis.interval * 2 and things look fine on "normalish" laptop screen size, but again it doesn't scale at all if you shrink your screen size.

I also wanted to have dynamic minorTick support. For example, if we let echarts pick the interval, could we at least set up the minor ticks to align with the buckets? Unfortunately I don't think so. You can only control how many minor ticks you want between each major tick, e.g. if you specify interval: model.distribution.xAxis.interval * 3, you could specify minorTick: { show: true, splitNumber: 3 }, but again that doesn't adjust based on the data or screen size at all.

Member

@basil basil left a comment

Thanks for cleaning this up.

@basil basil requested a review from timja August 20, 2024 20:06
Member

@timja timja left a comment

Thanks for the PR!

I've tested with 500 builds, each with roughly 8,000 test cases, and performance was good. Results look similar to before, except for the distribution graph, which has changed.

@timja timja merged commit f2321cf into jenkinsci:master Aug 21, 2024
16 checks passed