Jetty 12.0.5: High CPU/Load after some time in production, no active thread with business code #11326
Comments
We are not aware of any CPU-spinning loops in Jetty. The example thread you reported is trying to acquire a buffer as part of the standard processing of a request. Can you report standard, non-JSON stack traces? If you can monitor your environment, can you use a tool like TTOP (https://github.com/aragozin/jvm-tools/blob/master/sjk-core/docs/TTOP.md) to check whether the high CPU is caused by one thread spinning or by many threads? JMC, if you can connect it, will also report live whether it is just one thread consuming the CPU or many (and you can get a stack trace from the most CPU-consuming thread).
You can use the built-in tooling with the JVM to get a reasonable (and useful) thread dump.
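For reference, a minimal in-process sketch of the kind of per-thread CPU survey suggested above, using only the standard `java.lang.management` API (the class name is invented; wiring it into an endpoint or scheduled task is left out):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class ThreadCpuSnapshot {

    /** One line per live thread: name, state and accumulated CPU time. */
    public static String capture() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // dumpAllThreads(false, false): skip monitor/synchronizer details, we only want names, states and CPU
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            long cpuNanos = mx.getThreadCpuTime(info.getThreadId()); // -1 if CPU timing is disabled on this JVM
            out.append(String.format("%-60s %-15s %8d ms%n",
                    info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000));
        }
        return out.toString();
    }
}
```

Taking two captures a few seconds apart and comparing the CPU column shows whether one thread or many are actually spinning.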
Thanks for your fast feedback. Yes.
What the thread dumps say over time is likely going to be more important than CPU timings.
It will probably take some days to get more details. We had this situation 4 times in the last 3 weeks, sometimes with more than 7 days between incidents. I will come back when I have the details. In addition, I also checked the memory figures and everything is fine there, so it is not, for example, GC related.
Here is a thread dump from jstack. Currently it occurs every 2-3 days when we do not restart/redeploy the nodes manually. We will also try out 12.0.6 now, and if the provided data is not enough to get an idea, we will check whether we can use the mentioned live monitoring tools.
@gbrehmer thanks for the thread dump. It shows that many Jetty threads are busy in the buffer pool. The question for you is how you (or Spring) configure the Jetty buffer pool. Otherwise, we can suggest settings that hopefully would work around the problem and reduce CPU usage.
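For illustration only (this is not how Spring Boot wires it, and the sizes are made up), this is roughly how a plain embedded Jetty 12 server can be given an explicitly sized `ArrayByteBufferPool`, the component being discussed here:

```java
import org.eclipse.jetty.io.ArrayByteBufferPool;
import org.eclipse.jetty.io.ByteBufferPool;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;
import org.eclipse.jetty.util.thread.ScheduledExecutorScheduler;

public class SizedBufferPoolServer {
    public static void main(String[] args) throws Exception {
        // Illustrative sizes: min buffer capacity, bucket factor, max buffer capacity, max entries per bucket
        ByteBufferPool bufferPool = new ArrayByteBufferPool(0, 4096, 65536, 1024);

        // The pool is handed to the Server at construction time; connectors then use the server-wide pool
        Server server = new Server(new QueuedThreadPool(200), new ScheduledExecutorScheduler(), bufferPool);

        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        server.addConnector(connector);
        server.start();
        server.join();
    }
}
```

With Spring Boot, the Server is created inside JettyServletWebServerFactory, so a change like this would likely require customizing the factory rather than setting a simple property.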
Our setup is, from our perspective, quite simple, because we are just running Jetty embedded in a mostly default setup from Spring Boot 3.2.2 (https://github.com/spring-projects/spring-boot/blob/550651f88fdb69378ea650946c2507fa05cf34fa/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/web/embedded/jetty/JettyServletWebServerFactory.java#L207). The only customizations we made are in the Spring settings (but not
We are running this in k8s with
This sounds great!
@gbrehmer I'm working on an experimental Jetty branch that may help you with your reported problem. Since we don't have too many details, it's hard to say if that branch will solve your issue, but it's certainly worth a shot if you can spare the time. It contains two improvements:
So if you could build this PR's branch, retry your test, and collect a few server dumps while your test is running, that data would be invaluable to help us move forward. Thanks!
The problem is that "my test" is our production system. I currently cannot reproduce the issue; I only know that after some days the load quickly increases to 100%, from 1-5% median load to 100% in 10 minutes, so I assume the important stats are only available during this timeframe. We will discuss whether we can spend the effort to create a custom build, and we probably need a special endpoint to trigger the Jetty dump, because we normally have no JMX active.
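A hedged sketch of such a dump endpoint (class name and URL path are invented; secure it before exposing it): a `JettyServerCustomizer` captures the embedded `Server`, and a controller returns `Server.dump()` as plain text:

```java
import java.util.concurrent.atomic.AtomicReference;

import org.eclipse.jetty.server.Server;
import org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JettyDumpEndpoint implements WebServerFactoryCustomizer<JettyServletWebServerFactory> {

    private final AtomicReference<Server> serverRef = new AtomicReference<>();

    @Override
    public void customize(JettyServletWebServerFactory factory) {
        // Capture the embedded Server instance when Spring Boot builds the factory
        factory.addServerCustomizers(serverRef::set);
    }

    // Guard this endpoint (auth, management port, ...) before using it in production
    @GetMapping(value = "/internal/jetty-dump", produces = "text/plain")
    public String dump() {
        Server server = serverRef.get();
        return server == null ? "Jetty server not available yet" : server.dump();
    }
}
```

This avoids JMX entirely; the dump is only a snapshot, so it would need to be called while the CPU is high.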
Or what about exposing such stats as metrics? We already integrated Jetty with Micrometer by using
Unfortunately, we do not have a Micrometer integration. But if you could find a way to call the server dump yourself, that would work. Beware that this is an experimental branch. While it received some testing, if you're willing to try it in production, be prepared to roll back to the stable version if needed!
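For context, "integrated Jetty with Micrometer" usually refers to the standard Micrometer binders; the sketch below is an assumption about what the reporter means (the exact call was cut off above), and as the maintainers note, these binders do not expose the buffer pool internals:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jetty.JettyConnectionMetrics;
import org.eclipse.jetty.server.Server;

public final class JettyMetricsWiring {

    /** Registers Micrometer connection metrics on every connector of an embedded Jetty server. */
    public static void bind(Server server, MeterRegistry registry) {
        // Connection-level metrics (connections, bytes, messages). This does NOT cover
        // Jetty's internal buffer pool state, which is what the server dump is needed for.
        JettyConnectionMetrics.addToAllConnectors(server, registry);
    }
}
```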
So only a server dump from this branch would be helpful, and the current one from 12.0.6 will not give you much more insight? Because adding a "dump server" feature is probably easier to accomplish on our side.
Correct, the dump in version 12.0.6 does not contain enough information to get an idea of the state of your pool.
The same issue happened to us after upgrading to Spring Boot 3.2.1 with Jetty 12.0.5. We are now testing our solution with Spring Boot 3.1.8 and Jetty 11, which so far seems to work fine. Steps to reproduce:
Expected behavior: the server runs fine. Here is the stack trace I managed to get from our freezing Jetty container.
@ignusin your problem seems to be #11098, which was fixed in Jetty 12.0.6. Could you please try to reconfigure your environment to force Spring Boot to use Jetty 12.0.6 instead of 12.0.5? That should solve that NPE. @gbrehmer while I think your problem is slightly different, have you tried to force Spring Boot to use Jetty 12.0.6? Did that help? Thanks!
We have had 12.0.6 running for 8 days, but with interruptions (restarts) caused by new feature deployments, and previously it seemed that such problems only occur when the service has been running for several days without a reboot. So far we have had no high CPU usage event anymore, only one unexplained k8s OOM restart, but that is probably not related. I would say that if we have one additional week without problems, then 12.0.6 fixes our problems.
So far no more problems with 12.0.6. Thanks!
Jetty version(s)
12.0.5
Jetty Environment
ee10? core? (Spring Boot 3.2.2 Jetty embedded)
Java version/vendor
(use: java -version)
17.0.9 temurin
OS type/version
Linux
Description
I created a thread dump from Spring Boot Actuator during the high load, but next time I probably need a better one which also shows CPU times etc.
None of the relevant threads are sitting in our business code, so I assume that in rare cases there is some infinite loop or something like that in Jetty. Can you see anything suspicious? Thanks for your support!
I also checked the JVM memory stats: the heap was only 11% full.
One example thread
Full thread dump:
jsonformatter.txt
How to reproduce?
Sorry, currently this only occurs about once a week on one production server (a pod in k8s, not always the same one).