Add caching, tracing, logging, and batching to the plugin #413

craigtmoore · 2024-06-11T00:00:13Z

Adds caching using Caffeine, logging with java.util.logging, tracing using OpenTelemetry, and batched inserts into the database. We have some builds on our Jenkins instance with over 29,000 test results, so we added these changes to improve performance.

Fixes #412

Update DatabaseTestResultStorage:
- add caseResultsCache
- add packageResultsCache
- update publish() method to use batch updates to improve performance
- Add @WithSpan annotation to each method
- add logger
Update DatabaseTestResultStorageTest:
- add basic unit-test with a mocked database
- move duplicate code segments to constants and methods
- print tables with proper column widths (and truncate values that are too large)
Update pom.xml:
- add opentelemetry dependencies (for tracing)
- add caffeine dependency (for caching)
- add dependency on kotlin-stdlib-jdk8 to fix dependency resolution issues
- tie the hpi goal to the compile phase
update docker-compose.yaml:
- add commented out 'mysql' database config
- add jaeger server
- add jenkins server with otel agent embedded
docker-compose-with-zipkin.yaml: same as docker-compose.yaml, but using zipkin instead of jaeger
Add Dockerfile: creates an instance of the Jenkins docker image that includes the opentelemetry agent jar file, used to generate theweatherman/jenkins:lts-jdk17-otel image.
Add setup_jenkins.sh: bash script for installing the plugin and configuring Jenkins to store junit results in the database (works with the docker-compose.yaml)
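
The batching change to publish() listed above can be sketched roughly as follows. This is a minimal illustration in plain JDBC, not the plugin's actual code: the `CaseRow` type, the `caseResults` table and its column names, and the batch size passed by the caller are all assumptions made for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    // Hypothetical row type; the real plugin stores more columns per case result.
    static final class CaseRow {
        final String suite;
        final String testName;
        final double duration;

        CaseRow(String suite, String testName, double duration) {
            this.suite = suite;
            this.testName = testName;
            this.duration = duration;
        }
    }

    // Assumed table and column names, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO caseResults (suite, testname, duration) VALUES (?, ?, ?)";

    /** Inserts rows in batches and returns how many times executeBatch() was called. */
    static int insertInBatches(Connection connection, List<CaseRow> rows, int batchSize)
            throws SQLException {
        int flushes = 0;
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (CaseRow row : rows) {
                statement.setString(1, row.suite);
                statement.setString(2, row.testName);
                statement.setDouble(3, row.duration);
                statement.addBatch();          // queue the row instead of executing it
                if (++pending == batchSize) {
                    statement.executeBatch();  // send the whole batch in one call
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                statement.executeBatch();      // flush the final partial batch
                flushes++;
            }
        }
        return flushes;
    }
}
```

With a batch size of 1000, a build with 29,000 results would need 29 executeBatch() calls instead of 29,000 individual executeUpdate() calls. How much time that saves depends on the driver and the database.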

Testing done

I re-used the existing tests to verify that I've introduced no new regressions, and also added a new unit test, getCaseResults_mockDatabase(), to verify that it loads the data correctly.

Submitter checklist

Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
Ensure that the pull request title represents the desired changelog entry
Please describe what you did
Link to relevant issues in GitHub or Jira
[NA] Link to relevant pull requests, esp. upstream and downstream changes
Ensure you have provided tests - that demonstrates feature works or fixes the issue

timja

How extensively have you tested this? Have you deployed it to your instance with the large number of results?

My main concern is that the optimised database calls for reading only, e.g., the failed or skipped count are now done in Java, so the full set of results needs to be read into memory and filtered client side.

It's possible that most of the methods get called when this is used anyway, so it may not impact much, or possibly there could be more caches.

I assume you found a cache was needed because retrieval was slow? Do you have any info on this?

The batch changes make full sense

Dockerfile

README.md

pom.xml

setup_jenkins.sh

README.md

src/test/java/io/jenkins/plugins/junit/storage/database/DatabaseTestResultStorageTest.java

craigtmoore · 2024-06-11T20:05:00Z

As far as testing goes, the unit-test suites at our company are quite large, with over 29,000 test results in a single build. We're looking into using a plugin like this, but the performance was quite poor when we first tried it (which is why I looked at improving the performance). At first I deployed it using mysql instead of postgres because I could not connect the database to Jenkins.

Once I was able to deploy it and connect it to a dockerized mysql database, I ran the following pipeline:

pipeline {
    agent any
    stages {
        stage('Extract test results') {
            steps {
                sh 'unzip -o /tmp/archive.zip -d build'
            }
        }
    }
    post {
        always {
            junit 'build/**/TEST-*.xml'
        }
    }
}

Basically, I downloaded the zip file with the large number of test results from our production Jenkins server and then
copied that to the Jenkins instance, using:

docker cp archive.zip junit-sql-storage-plugin-jenkins-1:/tmp/archive.zip

Then I ran the pipeline, and it took a long time to store the test results (nearly 3 minutes using a mysql database). So I added OpenTelemetry to the plugin so I could see where the bottlenecks were. I decided to add batching to the publish method to reduce the time it takes to store the values. This certainly helped, but the other problem was that loading the test results took a really long time. I thought the queries might be too slow, but it turned out that the junit plugin was calling getAllPackageResults over 1500 times (each call can take 100s of milliseconds), so I added caching, which reduced the processing time to 10s of microseconds for each method call.

I decided to take it one step further and cache the value returned by the retrieveCaseResults() method (which I renamed getCaseResults()). This also really helped, because the case results query takes ~20ms to run, and that adds up when there are over 29,000 test results. Also, that query was being called by many of the meta-data methods, like:

- getFailedTests
- getSkippedTests
- getPassedTests

By caching the caseResults, we avoided running the query repeatedly and reduced the processing time to 10s of microseconds.
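
The caching idea described above can be sketched as follows. This is only an illustration under stated assumptions: it uses `ConcurrentHashMap.computeIfAbsent` from the standard library as a stand-in for the Caffeine cache the PR actually adds, and the `CaseResult` type, the `Status` enum, the string build key, and the `queryCount` counter are hypothetical names introduced for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CaseResultCacheSketch {
    enum Status { PASSED, FAILED, SKIPPED }

    // Hypothetical, simplified stand-in for a stored case result.
    static final class CaseResult {
        final String name;
        final Status status;

        CaseResult(String name, Status status) {
            this.name = name;
            this.status = status;
        }
    }

    // Standard-library stand-in for the Caffeine cache; unbounded, unlike a real cache.
    private final Map<String, List<CaseResult>> caseResultsCache = new ConcurrentHashMap<>();
    private final Function<String, List<CaseResult>> query;
    final AtomicInteger queryCount = new AtomicInteger(); // counts real (uncached) queries

    CaseResultCacheSketch(Function<String, List<CaseResult>> query) {
        this.query = query;
    }

    /** Runs the expensive query at most once per build key. */
    List<CaseResult> getCaseResults(String buildKey) {
        return caseResultsCache.computeIfAbsent(buildKey, key -> {
            queryCount.incrementAndGet();
            return query.apply(key);
        });
    }

    // The meta-data methods filter the cached list instead of re-querying.
    List<CaseResult> getFailedTests(String buildKey) { return byStatus(buildKey, Status.FAILED); }
    List<CaseResult> getSkippedTests(String buildKey) { return byStatus(buildKey, Status.SKIPPED); }
    List<CaseResult> getPassedTests(String buildKey) { return byStatus(buildKey, Status.PASSED); }

    private List<CaseResult> byStatus(String buildKey, Status wanted) {
        return getCaseResults(buildKey).stream()
                .filter(result -> result.status == wanted)
                .collect(Collectors.toList());
    }
}
```

The trade-off is the one raised in the review: the full result list for a build sits in memory and is filtered in Java rather than by the database. A real cache would also want a size or expiry limit, which this stand-in map does not have.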

So to answer your comment, yes, I did a lot of testing to verify that my changes improved the performance. The telemetry that I added was also very useful in figuring out the bottlenecks.

timja

Getting close

README.md

deploy.sh

docker-compose.yaml

jenkins-config.yaml

craigtmoore · 2024-06-13T19:45:45Z

I think I've resolved all of your comments; please let me know if there is anything else. I really appreciate all of the feedback. I'm going to do a bit more testing, especially with the junit attachments plugin.

- Update DatabaseTestResultStorage:
  * add caseResultsCache
  * add packageResultsCache
  * update publish() method to use batch updates to improve performance
  * add manual trace creation (annotation-based tracing is only supported when using an OpenTelemetry agent)
  * add withSpan(), createSpan(), addSqlAttributes(), addPackageAttributes(), and getTracer() helper methods for creating manual traces in the code
  * add logger
- Update DatabaseTestResultStorageTest:
  * add basic unit-test with a mocked database
  * move duplicate code segments to constants and methods
  * print tables with proper column widths (and truncate values that are too large)
  * use LoggerRule for configuring the log level of the DatabaseTestResultStorage's logger
- Update pom.xml:
  * add dependency on io.jenkins.plugins:opentelemetry plugin
  * update dependencyManagement: pin the kotlin and errorprone dependency versions, which otherwise cause the build to fail because multiple versions are pulled in by transitive dependencies
  * add io.jenkins.plugins:caffeine dependency (for caching)
  * add dependencyManagement for `kotlin-stdlib-jdk*` to fix dependency resolution issues
- Update docker-compose.yaml:
  * add jaeger server
  * add jenkins server: skip setup wizard and add otel exporter (to jaeger)
- Add Dockerfile:
  * creates an instance of the Jenkins docker image (used as the 'jenkins' image in the docker-compose.yaml file)
  * install plugins using jenkins-plugin-cli: configuration-as-code, database-postgresql, and opentelemetry
  * install plugin junit-sql-plugin from the target folder and configure jenkins using the jenkins-config.yaml file
- Add deploy.sh: bash script for compiling the plugin, building the 'jenkins' docker image, and deploying the docker-compose.yaml file
- Add jenkins-config.yaml:
  * add settings for postgresql plugin
  * configure the openTelemetry plugin
  * configure junit to store results in the database
- Update README.md:
  * add examples on how to build the plugin
  * add example on how to deploy docker compose swarm
  * add example on how to install the compiled plugin
  * add example on how to configure jenkins using the configuration-as-code plugin
  * add note on accessing jaeger
  * add note on how to access the db inside of docker

timja

This looks really good, I've tested it out, fixed up the deploy script to work with build and a couple of other minor things.

I've found one issue, though, with the OpenTelemetry integration: I would expect the spans to be linked to the build, but they aren't.

I'm wondering if the junit step needs a StepHandler?

https://github.com/jenkinsci/opentelemetry-plugin/blob/03559ec4adf9ed6843e7fb0635a5e4cc9f961fb2/src/main/java/io/jenkins/plugins/opentelemetry/job/step/DurableTaskHandler.java

see also:
jenkinsci/opentelemetry-plugin#850

I couldn't see any TRACEPARENT env values being set

cc @cyrille-leclerc or @kuisathaverat if you can provide any advice

timja · 2024-06-15T14:58:17Z

src/main/java/io/jenkins/plugins/junit/storage/database/DatabaseTestResultStorage.java

-                                statement.setNull(12, Types.VARCHAR);
+        @Override
+        public void publish(TestResult result, TaskListener listener) throws IOException {
+            var publishSpan = createSpan("DatabaseTestResultStorage.RemotePublisherImpl.publish");


As far as I can tell this only works for agents on the controller not remote agents

Setting -Dotel.java.global-autoconfigure.enabled=true on the agent seems to make it work

timja · 2024-06-15T15:26:51Z

I don't think it's StepHandler. I think if the variables are propagated through createRemotePublisher and this code is copied, then it would work:
https://github.com/jenkinsci/opentelemetry-plugin/blob/82e3e1b2574a68864dcd05b43b95d0aeb7f41249/src/main/java/io/jenkins/plugins/opentelemetry/job/OtelEnvironmentContributorService.java#L41-L52

timja · 2024-06-15T22:15:04Z

Continued at #414

jonesbusy · 2024-06-16T09:35:52Z

Shouldn't the opentelemetry plugin be optional? What about Jenkins instances using SQL storage but not connected to an OpenTelemetry collector?

From what I remember from a few months ago, having the OpenTelemetry plugin installed will display some dead links on the job/build page when the plugin is not configured.

timja · 2024-06-16T09:46:07Z

Possibly. I think it would be better to remove those dead links if the plugin isn't configured at all.

Plugins should be able to integrate without it adding a bunch of things to the UI

cyrille-leclerc · 2024-06-16T14:01:19Z

Thanks for your interest in the opentelemetry plugin.
I think we could make it optional for simpler integration of otel in the other Jenkins plugins, I have a few checks to do.
I'll get back to you ASAP

timja · 2024-06-16T14:06:39Z

just raised a PR jenkinsci/opentelemetry-plugin#868

kuisathaverat · 2024-06-16T19:53:35Z

I couldn't see any TRACEPARENT env values being set

You have to enable the export of the environment variables that define the TRACEPARENT, but I'm not sure how this plugin gets the OpenTelemetry context; it probably has to retrieve it from the job, not from an environment variable. We store the context in actions in the job.
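
For readers unfamiliar with the TRACEPARENT value discussed here: it follows the W3C Trace Context `traceparent` layout, `version-traceid-parentid-flags`, in lowercase hex. The sketch below only shows how such a value could be read and split into its parts. It is not how this plugin or the opentelemetry plugin obtains the context, and the `TraceParentSketch` class and its field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceParentSketch {
    // Version 00 layout: 2 hex version, 32 hex trace id, 16 hex parent span id, 2 hex flags.
    private static final Pattern TRACEPARENT =
            Pattern.compile("([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String parentSpanId;
    final boolean sampled;

    private TraceParentSketch(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    /** Returns the parsed value, or null if the input is missing or malformed. */
    static TraceParentSketch parse(String value) {
        if (value == null) {
            return null;
        }
        Matcher matcher = TRACEPARENT.matcher(value.trim());
        if (!matcher.matches()) {
            return null;
        }
        // The lowest bit of the flags byte is the "sampled" flag.
        boolean sampled = (Integer.parseInt(matcher.group(4), 16) & 0x01) != 0;
        return new TraceParentSketch(matcher.group(2), matcher.group(3), sampled);
    }
}
```

A caller could pass `System.getenv("TRACEPARENT")` to `parse` and treat a null result as "no parent context available", which matches the situation described above where the variable was not being set.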

timja · 2024-06-16T20:21:04Z

I was able to get it in #414

cyrille-leclerc · 2024-06-17T14:00:54Z

@craigtmoore @timja the way you integrate OpenTelemetry is great. GlobalOpenTelemetry.get() is the way to go.
I'm looking at some improvements to this pattern GlobalOpenTelemetry.get(). I'll get back to you shortly.

I saw @timja's PR. I'll review it ASAP; I have to understand the challenge you faced :-)

Don't add nudge to define visualisation url if plugin isn't configured opentelemetry-plugin#868

timja · 2024-06-17T15:23:39Z

Anything running on the controller would be quite straightforward, I think, but running on agents is a bit more complicated. It's not the nicest / cleanest implementation in #414.

timja added the enhancement New feature or request label Jun 11, 2024

timja changed the title ~~#412 Add caching, tracing, logging, and batching to the plugin~~ Add caching, tracing, logging, and batching to the plugin Jun 11, 2024

timja reviewed Jun 11, 2024

timja reviewed Jun 12, 2024

craigtmoore and others added 3 commits June 14, 2024 12:45

Minor fixes

6f60ed8

New lines

a16a48b

timja reviewed Jun 15, 2024

timja merged commit e33b238 into jenkinsci:master Jun 15, 2024
14 checks passed

timja mentioned this pull request Jun 15, 2024

Link OpenTelemetry traces to parent span when publishing test results #414

Draft

6 tasks

timja mentioned this pull request Jun 16, 2024

Don't add nudge to define visualisation url if plugin isn't configured jenkinsci/opentelemetry-plugin#868

Closed

6 tasks

craigtmoore deleted the add-caching-tracing-logging-batching branch July 26, 2024 14:18

craigtmoore mentioned this pull request Sep 9, 2024

Test results do not load when N > 1000 tests #187

Closed
