When the agent of opentelemetry is executed on jdk21, it will cause the CarrierThread to stop executing. #13811
Describe the bug
Our program makes extensive use of virtual threads, and Tomcat is also configured to use virtual threads.
When the program runs on JDK 21, the entire service occasionally becomes unresponsive.
At first, I suspected that some dependency was pinning virtual threads to their carriers (for example, through synchronized blocks).
However, the thread dump showed that pinning was not the cause:
None of the virtual threads is in the PARKED or PINNED state;
On the contrary, a large number of virtual threads are in the STARTED state;
The number of virtual threads in the RUNNABLE state is exactly equal to Runtime.getRuntime().availableProcessors().
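To compare the RUNNABLE count against the number of carrier threads, something like the following can be used. This is a minimal sketch, not part of the original reproduction: the class name CarrierCount and the method expectedCarriers are hypothetical, and it assumes the default scheduler takes its parallelism from the jdk.virtualThreadScheduler.parallelism system property and otherwise falls back to the processor count.

```java
public class CarrierCount {

    /**
     * Estimates how many carrier threads the default virtual thread
     * scheduler will use. Assumption: the scheduler reads the
     * jdk.virtualThreadScheduler.parallelism system property and falls
     * back to the number of available processors when it is not set.
     */
    static int expectedCarriers() {
        String configured = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        if (configured != null) {
            return Integer.parseInt(configured.trim());
        }
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
        System.out.println("expected carriers   = " + expectedCarriers());
    }
}
```

If the number of RUNNABLE virtual threads in the dump matches this value, every carrier is occupied and no other virtual thread can be mounted.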
Next, I examined the stacks of the CarrierThreads, and every CarrierThread is stopped at the same place:
Note that the last frame executed is at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:221).
In JDK 21, LockSupport.java:221 is the "park" path for non-virtual threads: https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/java.base/share/classes/java/util/concurrent/locks/LockSupport.java
Therefore, I suspected that other code was being injected into the scheduling path of the virtual thread tasks, so I added print statements to the relevant JVM classes for analysis.
From the output printed in java.util.concurrent.ScheduledThreadPoolExecutor, I found that when a virtual thread task is scheduled, the agent's instrumentation runs io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask.
However, the virtual thread is executing java.lang.VirtualThread.parkNanos at that point, and in JDK 21 this "Sets the current thread to the current carrier thread."
Coincidentally, when the next virtual thread task is submitted, the agent's attachContextToTask method runs first. After the attach, AbstractWeakConcurrentMap.expungeStaleEntries may call java.lang.ref.ReferenceQueue.poll, which can in turn call LockSupport.park(Ljava/lang/Object;).
Note that at this point, Thread.currentCarrierThread() returns the CarrierThread itself.
So the CarrierThread itself enters jdk.internal.misc.Unsafe.park(ZJ)V.
If all CarrierThreads reach this point at the same time, and waking them from park requires logic that runs in other virtual threads, then all virtual threads freeze.
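The final step of this reasoning can be illustrated with an analogy that does not need virtual threads or the agent. This is a hedged sketch, not the actual failure path: a fixed pool of platform threads stands in for the CarrierThreads, and a queued task stands in for the virtual thread that would perform the wake-up. The names StarvationSketch and workersReleased are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StarvationSketch {

    /**
     * Blocks every worker of a fixed pool on a latch that can only be
     * released by a task queued in the same pool. Returns true if the
     * workers were released within the timeout, false otherwise.
     */
    static boolean workersReleased(int workers, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch wakeUp = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(workers);
        try {
            // Occupy every worker; each one blocks until wakeUp is released.
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        wakeUp.await();
                        finished.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The only task that can release the workers is queued behind
            // them, so it never gets a thread to run on.
            pool.execute(wakeUp::countDown);
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt the blocked workers so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Prints "released = false": all workers stay blocked until the timeout.
        System.out.println("released = " + workersReleased(2, 500));
    }
}
```

In the reported scenario the blocked workers would be the CarrierThreads parked in Unsafe.park, and the queued task would be any virtual thread whose progress is needed to unpark them.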
Steps to reproduce
The code and steps below reproduce the problem:
AgentTest class:
ExecutorAgent:
MANIFEST.MF:
pom:
operation
mvn clean package
and
java -javaagent:jvm-agent-demo-1.0-SNAPSHOT-jar-with-dependencies.jar -jar jvm-agent-demo-1.0-SNAPSHOT-jar-with-dependencies.jar
output
Expected behavior
All virtual thread tasks run normally on JDK 21.
Actual behavior
All the virtual threads suddenly stopped executing.
This is the stack information of CarrierThread:
Javaagent or library instrumentation version
opentelemetry-api: 1.38.0; javaagent: Implementation-Version: 2.13.0
Environment
JDK:
OpenJDK 64-Bit Server VM (21.0.4+7-LTS mixed mode, sharing)
OS:
Ubuntu 24.04 LTS
Additional context
No response