-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix query cancellation #2077
fix query cancellation #2077
Conversation
Thanks for trying this out @TalkingFoxMid. Have you seen the comments in #1922? One of the thing we want to do is to signal to the database that it should cancel the running query on the database side too. |
67e6830
to
e32557f
Compare
Thanks for the reply. Cancellation on the database side is already working in this solution. I verified it with the following test:
This can also be demonstrated in the QueryCancellationSuite by adding a join on the lock fiber before the final table read. I believe that in this case, the insert query (2) was canceled on the DB side. |
But I'll also try testing this with pg_sleep to confirm. Thanks! |
2d86feb
to
a492eb4
Compare
a492eb4
to
cc3c5a0
Compare
jdbcBlockedFiber <- asyncM.start(asyncM.blocking(f(a)).attempt) | ||
a <- jdbcBlockedFiber.join.flatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got some concerns regarding this code. They may or may not be valid though, but let me pronounce them:
Sync#blocking
is non-cancellable by design. Therefore the execution pipeline can be cancelled either before enteringblocking
or right after it completes, but not inside. Therefore it can only cancel the fiber that runs the query, but not the operation itself.- There can be a potential fiber leak with this code: if a cancel signal comes when the fiber is created, but before it gets joined, then here it goes (btw, I believe this is one of the reasons why
background
is almost always preferable over low-level fiber start/join manipulations). - Firing a fiber and then joining it immediately after looks quite suspicious. What is it attempted to accomplish there exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore it can only cancel the fiber that runs the query, but not the operation itself.
Yes, it is. But cancellation handler that sends the cancel query to the DB is on the fiber that can be cancelled now. So, the blocked fiber will also be canceled from the JDBC side (the database stops the JDBC thread).
There can be a potential fiber leak with this code
I think there is no fiber leak, the asyncM.blocking fiber will continue running until the JDBC thread finishes dealing with the query. When the JDBC query is completed and the thread is unblocked, that fiber will be cleaned up. Currently, Doobie works in exactly the same way.
About background:
There is a problem with this approach when it comes to background tasks. The cancellation of a fiber also tries to cancel the jdbcBlockedFiber, but the issue is that the cancellation of jdbcBlockedFiber is blocking and doesn't work. So, using background tasks doesn't solve the problem
What is it attempted to accomplish there exactly?
You cannot cancel jdbcBlockedFiber because it is running on a blocked JDBC thread. However, you can cancel the jdbcBlockedFiber.join.start fiber, because it is blocked asynchronously. This allows the cancellation handler, which sends a cancel query to the database, to be executed. Please refer to the example code in the description, as it illustrates the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no fiber leak, the asyncM.blocking fiber will continue running until the JDBC thread finishes dealing with the query. When the JDBC query is completed and the thread is unblocked, that fiber will be cleaned up.
Actually, this is what I would call a "fiber leak". Because the fiber goes out of control at this point and lives solely on the mercy of JDBC.
This allows the cancellation handler, which sends a cancel query to the database, to be executed.
Could you elaborate a bit – why do you think a cancel query is sent to the database in this case? Where is it happening exactly?
I see that you are trying to reproduce it with the test. However, I'm not sure it is working because the query gets cancelled gracefully. Perhaps it is happening just because the connection to the DB gets closed after the cancellation, which in turn cancels the query. However, it may not always work with connection pooling, for example.
There is a problem with this approach when it comes to background tasks. The cancellation of a fiber also tries to cancel the jdbcBlockedFiber, but the issue is that the cancellation of jdbcBlockedFiber is blocking and doesn't work.
Yes, exactly. Because it guarantees that there's no leak in that case. I feel it was designed that way for a good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is what I would call a "fiber leak". Because the fiber goes out of control at this point and lives solely on the mercy of JDBC.
There is no way in CE3 to cancel a fiber running on a blocked thread, I suppose. So the fiber lives solely on the mercy of JDBC anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate a bit – why do you think a cancel query is sent to the database in this case? Where is it happening exactly?
I added a debug log to PreparedStatementInterpreter.close to check if the raw JDBC method (PreparedStatement.close) was called after the fiber was canceled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the database is slow to respond, the close operation might not execute instantly. The delay occurs also because the cancellation request has to be sent to the database over the network, which can introduce additional latency. Consequently, the timeout might not cancel the query after the intended 300ms, but instead after 600ms or more, due to this added latency (and it leads to user request stucks)
@TalkingFoxMid Thanks for explaining. If I understood correctly, what you are describing is not a limitation of cancelable
but actually a general feature of Cats Effect, which is that it backpressures on cancelation i.e. if you request cancelation of some fiber, it will wait until that cancelation fully completes.
Consider this example:
IO.sleep(2.seconds).onCancel(IO.sleep(1.second)).timeout(1.second)
Even though the timeout is after 1 second, it will take 2 seconds to complete because of the onCancel
finalizer. Is this what you are considering to be a limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the database is slow to respond, the close operation might not execute instantly. The delay occurs also because the cancellation request has to be sent to the database over the network, which can introduce additional latency. Consequently, the timeout might not cancel the query after the intended 300ms, but instead after 600ms or more, due to this added latency (and it leads to user request stucks)
@TalkingFoxMid Thanks for explaining. If I understood correctly, what you are describing is not a limitation of
cancelable
but actually a general feature of Cats Effect, which is that it backpressures on cancelation i.e. if you request cancelation of some fiber, it will wait until that cancelation fully completes.Consider this example:
IO.sleep(2.seconds).onCancel(IO.sleep(1.second)).timeout(1.second)Even though the timeout is after 1 second, it will take 2 seconds to complete because of the
onCancel
finalizer. Is this what you are considering to be a limitation?
Consider also this example:
import cats.effect.{Async, IO, IOApp}
import scala.concurrent.duration._
object OnCancelTest extends IOApp.Simple {
@volatile var blocked = true
def goDB: IO[Unit] =
IO {
while (blocked) {Thread.sleep(100)}
}.cancelable(
{IO.sleep(5.seconds) >> IO {blocked = false}}.start.void
)
override def run: IO[Unit] =
for {
fiber <- goDB.start
_ <- IO.sleep(100.millis)
_ <- fiber.cancel
_ <- IO(println("Cancelled"))
} yield ()
}
In that snippet, fiber.cancel was blocked not only because it waits for the cancelable handler to complete (the handler completes instantly because it only starts another fiber). The problem also lies in the fact that the external state wasn't modified instantly, even if the cancellation handler completed successfully. This is similar to what we see in Doobie: even after a cancel query is sent to the database, there is still some latency before the database unblocks the blocked JDBC thread running query.
I don't think this is a limitation of CE3, but it does seem like a slight disadvantage of this approach in the context of the current problem in Doobie.
This approach can cause the user query to hang for a long time because we are trying to wait for the query to be canceled in the database. However, there are situations when you just need to timeout immediately, return the default value and continue executing the request further. This is exactly our case, so this disadvantage is relevant for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@satorg Could you give an approve here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TalkingFoxMid thanks for including that second example, that was very helpful.
However, there are situations when you just need to timeout immediately, return the default value and continue executing the request further. This is exactly our case, so this disadvantage is relevant for us
Cats Effect specifically has a timeoutAndForget
API for this.
I agree that sometimes behavior like this is necessary, but that should be the (careful) choice of the user. Once backpressure is removed, it's not possible to restore it. I am dubious about removing backpressure signals in a library, since then users no longer have agency to make the appropriate decision for their specific usecase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, there are situations when you just need to timeout immediately, return the default value and continue executing the request further.
The catch here is that most likely there will be no immediate relief whatsoever. As @jatcwang found out recently, even if we give up here immediately, the query will be canceled as a part of a bracket's release effect at the end of the transaction. Therefore in a case of long-running cancel query, the entire transaction will have to await until the cancel query completes anyway.
modules/free/src/main/scala/doobie/free/kleisliinterpreter.scala
Outdated
Show resolved
Hide resolved
modules/free/src/main/scala/doobie/free/kleisliinterpreter.scala
Outdated
Show resolved
Hide resolved
4c0f87e
to
4cd42e8
Compare
config.setUsername("postgres") | ||
config.setPassword("password") | ||
config.setPassword("mysecretpassword") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be reverted back to jdbc:postgresql:world
right?
} | ||
|
||
object Primitive { | ||
case class Default[M[_]]()(implicit asyncM: WeakAsync[M]) extends Primitive[M] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, it is recommended to avoid using case classes in publicly available libraries unless absolutely necessary. I guess, the primary reason for that here was to get a static apply
method. Totally understandable it is, but case classes generate a lot of other stuff which makes them difficult to deal with in terms of maintaining binary compatibility. For example: a copy
method, inheriting Product
, and the last but not least a notorious unapply
static method.
A regular class should work just as well here, and if the static apply
is desirable, it's better to add it manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same is applicable to the case class Cancelable
below.
class KleisliInterpreter[M[_]](logHandler: LogHandler[M], customPrimitive: Option[Primitive[M]] = None)(implicit val asyncM: WeakAsync[M]) { outer => | ||
|
||
private val _primitive = customPrimitive.getOrElse(Primitive.Default[M]()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion here to consider:
class KleisliInterpreter[M[_]](logHandler: LogHandler[M], customPrimitive: Option[Primitive[M]] = None)(implicit val asyncM: WeakAsync[M]) { outer => | |
private val _primitive = customPrimitive.getOrElse(Primitive.Default[M]()) | |
class KleisliInterpreter[M[_]](logHandler: LogHandler[M], customPrimitive: Primitive[M])(implicit val asyncM: WeakAsync[M]) { outer => | |
def this(logHandler: LogHandler[M])(implicit val asyncM: WeakAsync[M]) = | |
this(logHandler, Primitive.Default[M]) |
Now you can use customPrimitive
directly.
I.e. in this case the overloaded constructor helps to avoid dealing with Option
.
Superceded by #2079 |
After migrating Doobie to Cats Effect 3 (CE3), query cancellation no longer works as expected. When a query is canceled (using IO.cancel), the current fiber blocks, and PreparedStatement.close is not called immediately. This issue arises because in CE3, the cancellation of a fiber on a blocked thread does not trigger cancellation handlers right away, while PreparedStatement.close needs to be called in a cancellation handler.
Here is a code example that demonstrates this behavior. In CE2, the cancellation handler runs immediately, but in CE3, it only runs after the thread becomes unblocked:
I also added unit test which reproduce that behaviour.
Alternative more appropriate solution: #2079 (comment)