
fix query cancellation #2077

Closed

Conversation

@TalkingFoxMid (Contributor) commented Aug 5, 2024

After migrating Doobie to Cats Effect 3 (CE3), query cancellation no longer works as expected. When a query is canceled (via Fiber#cancel), the cancelling fiber blocks and PreparedStatement.close is not called immediately. This happens because in CE3, cancelling a fiber that is blocked on a thread does not run its cancellation handlers right away, while PreparedStatement.close needs to be called from a cancellation handler.

Here is a code example that demonstrates this behavior. In CE2, the cancellation handler runs immediately, but in CE3, it only runs after the thread becomes unblocked:

import cats.effect.{IO, IOApp}

object OnCancelTest extends IOApp.Simple {
  import scala.concurrent.duration._

  def goDB: IO[Unit] =
    IO(println("START goDB"))
      .map(_ => Thread.sleep(10000))
      .guarantee(IO(println("SEND CANCELLATION VIA JDBC")))

  override def run: IO[Unit] =
    for {
      f <- goDB.start
      _ <- IO.sleep(1.second)
      _ <- f.cancel
      _ <- IO(println("CANCELLED goDB"))
    } yield ()
}

I also added a unit test which reproduces this behaviour.

A more appropriate alternative solution: #2079 (comment)

@jatcwang (Collaborator) commented Aug 5, 2024

Thanks for trying this out @TalkingFoxMid. Have you seen the comments in #1922?

One of the things we want to do is to signal to the database that it should cancel the running query on the database side too.
I believe that we might need PreparedStatement#cancel for that. If you want to simulate a long-running query in Postgres, you can try using pg_sleep; that way you can test whether a query was really cancelled on the database side.
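
For illustration, here is roughly what such a pg_sleep check could look like with Doobie (a minimal sketch; slowQuery, demo, and the SQL are illustrative and not part of this PR):

import scala.concurrent.duration._
import cats.effect.IO
import doobie._
import doobie.implicits._

// Sleeps 10 seconds on the server side; if cancellation is propagated to JDBC,
// cancelling this fiber should also cancel the query in Postgres
// (which can be checked in pg_stat_activity).
def slowQuery(xa: Transactor[IO]): IO[Long] =
  sql"select count(*) from (select pg_sleep(10)) s".query[Long].unique.transact(xa)

def demo(xa: Transactor[IO]): IO[Unit] =
  for {
    fiber <- slowQuery(xa).start
    _     <- IO.sleep(1.second)
    _     <- fiber.cancel // should return promptly and cancel the query server-side
  } yield ()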

@TalkingFoxMid force-pushed the fix_query_cancellation branch from 67e6830 to e32557f on August 5, 2024 09:28
@TalkingFoxMid (Contributor, Author)

> Thanks for trying this out @TalkingFoxMid. Have you seen the comments in #1922?
>
> One of the things we want to do is to signal to the database that it should cancel the running query on the database side too. I believe that we might need PreparedStatement#cancel for that. If you want to simulate a long-running query in Postgres, you can try using pg_sleep; that way you can test whether a query was really cancelled on the database side.

Thanks for the reply. Cancellation on the database side is already working in this solution. I verified it with the following test:

  1. Fiber 1 locks the table.
  2. Fiber 2 attempts to insert data.
  3. Cancel Fiber 2.
  4. Fiber 1 unlocks the table.
  5. Select from the table → The table is empty.

This can also be demonstrated in the QueryCancellationSuite by adding a join on the lock fiber before the final table read. I believe that in this case, the insert query (2) was canceled on the DB side.
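
For reference, a rough sketch of that test flow (the table, SQL, and helper names are illustrative, not the actual QueryCancellationSuite code):

import scala.concurrent.duration._
import cats.syntax.all._
import cats.effect.IO
import doobie._
import doobie.implicits._

// 1. Fiber 1 locks the table and holds the lock; 2. Fiber 2 tries to insert
// and is cancelled while blocked; 3. Fiber 1 releases the lock;
// 4. the table should end up empty.
def cancellationTest(xa: Transactor[IO]): IO[Boolean] =
  for {
    locker <- (sql"lock table test_table in access exclusive mode".update.run *>
                 sql"select count(*) from (select pg_sleep(3)) s".query[Long].unique)
                .transact(xa).start
    _      <- IO.sleep(500.millis)
    ins    <- sql"insert into test_table (v) values ('x')".update.run.transact(xa).start
    _      <- IO.sleep(500.millis)
    _      <- ins.cancel   // should cancel the blocked insert on the DB side
    _      <- locker.join  // fiber 1's transaction ends here, releasing the lock
    rows   <- sql"select count(*) from test_table".query[Long].unique.transact(xa)
  } yield rows == 0L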

@TalkingFoxMid (Contributor, Author)

> Thanks for trying this out @TalkingFoxMid. Have you seen the comments in #1922?
>
> One of the things we want to do is to signal to the database that it should cancel the running query on the database side too. I believe that we might need PreparedStatement#cancel for that. If you want to simulate a long-running query in Postgres, you can try using pg_sleep; that way you can test whether a query was really cancelled on the database side.

But I'll also try testing this with pg_sleep to confirm. Thanks!

@TalkingFoxMid force-pushed the fix_query_cancellation branch 2 times, most recently from 2d86feb to a492eb4 on August 5, 2024 10:28
@TalkingFoxMid force-pushed the fix_query_cancellation branch from a492eb4 to cc3c5a0 on August 5, 2024 12:16
Comment on lines 106 to 107
jdbcBlockedFiber <- asyncM.start(asyncM.blocking(f(a)).attempt)
a <- jdbcBlockedFiber.join.flatMap {
@satorg (Contributor) commented Aug 5, 2024

I've got some concerns regarding this code. They may or may not be valid, but let me spell them out:

  • Sync#blocking is non-cancellable by design. This means the execution pipeline can be cancelled either before entering blocking or right after it completes, but not inside it. Therefore it can only cancel the fiber that runs the query, but not the operation itself.
  • There can be a potential fiber leak with this code: if a cancel signal comes when the fiber is created but before it gets joined, then the fiber leaks (by the way, I believe this is one of the reasons why background is almost always preferable over low-level fiber start/join manipulations).
  • Starting a fiber and then joining it immediately afterwards looks quite suspicious. What exactly is it intended to accomplish?
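
For context, a minimal sketch of the start/join shape versus the background alternative mentioned above (illustrative only, not the PR code):

import cats.effect.IO

// start/join: if cancellation arrives between start and join, the started
// fiber can be left running with nothing supervising it.
def viaStartJoin(task: IO[Int]): IO[Int] =
  task.start.flatMap(_.joinWithNever)

// background: the fiber is supervised by a Resource, so releasing the
// Resource (including on cancellation) always cancels and awaits the fiber.
def viaBackground(task: IO[Int]): IO[Int] =
  task.background.use(outcome => outcome.flatMap(_.embedNever))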

@TalkingFoxMid (Contributor, Author) commented Aug 6, 2024

> Therefore it can only cancel the fiber that runs the query, but not the operation itself.

Yes, that's true. But the cancellation handler that sends the cancel query to the DB is now on a fiber that can be cancelled. So the blocked fiber will also be cancelled from the JDBC side (the database unblocks the JDBC thread).

> There can be a potential fiber leak with this code

I think there is no fiber leak: the asyncM.blocking fiber will continue running until the JDBC thread finishes dealing with the query. When the JDBC query completes and the thread is unblocked, that fiber will be cleaned up. Doobie currently works in exactly the same way.

About background:
There is a problem with that approach here. The cancellation of the outer fiber also tries to cancel the jdbcBlockedFiber, but the cancellation of jdbcBlockedFiber blocks and doesn't complete. So using background doesn't solve the problem.

> What exactly is it intended to accomplish?

You cannot cancel jdbcBlockedFiber because it is running on a blocked JDBC thread. However, you can cancel the jdbcBlockedFiber.join.start fiber, because it is suspended asynchronously rather than blocked on a thread. This allows the cancellation handler, which sends a cancel query to the database, to be executed. Please refer to the example code in the description, which illustrates the problem.
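
A condensed, self-contained illustration of that trick (the blocking call and cancel handler are simulated; this is not the actual interpreter code):

import scala.concurrent.duration._
import cats.effect.{IO, IOApp}

object JoinTrickDemo extends IOApp.Simple {

  // Simulates a JDBC call that blocks its thread for 10 seconds.
  def blockingJdbcCall: IO[Unit] = IO.blocking(Thread.sleep(10000))

  // Simulates sending PreparedStatement.cancel/close from a cancellation handler.
  def sendCancelToDb: IO[Unit] = IO(println("SEND CANCELLATION VIA JDBC"))

  def run: IO[Unit] =
    for {
      // The blocking call runs on its own fiber; the outer fiber only joins it,
      // so the outer fiber is suspended asynchronously and can be cancelled,
      // which lets the onCancel handler fire immediately.
      outer <- (for {
                 blocked <- blockingJdbcCall.attempt.start
                 _       <- blocked.join.onCancel(sendCancelToDb)
               } yield ()).start
      _ <- IO.sleep(1.second)
      _ <- outer.cancel // returns quickly; the cancel handler has already run
      _ <- IO(println("outer fiber cancelled"))
    } yield ()
}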

@satorg (Contributor)

> I think there is no fiber leak: the asyncM.blocking fiber will continue running until the JDBC thread finishes dealing with the query. When the JDBC query completes and the thread is unblocked, that fiber will be cleaned up.

Actually, this is what I would call a "fiber leak": the fiber goes out of our control at this point and lives solely at the mercy of JDBC.

> This allows the cancellation handler, which sends a cancel query to the database, to be executed.

Could you elaborate a bit – why do you think a cancel query is sent to the database in this case? Where is it happening exactly?

I see that you are trying to reproduce it with the test. However, I'm not sure it passes because the query actually gets cancelled gracefully. Perhaps it passes just because the connection to the DB gets closed after the cancellation, which in turn cancels the query. That may not always work with connection pooling, for example.

> There is a problem with that approach here. The cancellation of the outer fiber also tries to cancel the jdbcBlockedFiber, but the cancellation of jdbcBlockedFiber blocks and doesn't complete.

Yes, exactly. That's because it guarantees there's no leak in that case. I feel it was designed that way for a good reason.

@TalkingFoxMid (Contributor, Author)

> Actually, this is what I would call a "fiber leak": the fiber goes out of our control at this point and lives solely at the mercy of JDBC.

There is no way in CE3 to cancel a fiber running on a blocked thread, I suppose. So the fiber lives solely at the mercy of JDBC anyway.

@TalkingFoxMid (Contributor, Author)

> Could you elaborate a bit – why do you think a cancel query is sent to the database in this case? Where is it happening exactly?

I added a debug log to PreparedStatementInterpreter.close to check if the raw JDBC method (PreparedStatement.close) was called after the fiber was canceled.

@armanbilge (Member) commented Aug 7, 2024

> If the database is slow to respond, the close operation might not execute instantly. The delay also occurs because the cancellation request has to be sent to the database over the network, which can introduce additional latency. Consequently, the timeout might not cancel the query after the intended 300ms, but instead after 600ms or more, due to this added latency (and this causes user requests to get stuck).

@TalkingFoxMid Thanks for explaining. If I understood correctly, what you are describing is not a limitation of cancelable but actually a general feature of Cats Effect, which is that it backpressures on cancelation, i.e. if you request cancelation of some fiber, it will wait until that cancelation fully completes.

Consider this example:

IO.sleep(2.seconds).onCancel(IO.sleep(1.second)).timeout(1.second)

Even though the timeout is after 1 second, it will take 2 seconds to complete because of the onCancel finalizer. Is this what you are considering to be a limitation?

@TalkingFoxMid (Contributor, Author)

> If the database is slow to respond, the close operation might not execute instantly. The delay also occurs because the cancellation request has to be sent to the database over the network, which can introduce additional latency. Consequently, the timeout might not cancel the query after the intended 300ms, but instead after 600ms or more, due to this added latency (and this causes user requests to get stuck).
>
> @TalkingFoxMid Thanks for explaining. If I understood correctly, what you are describing is not a limitation of cancelable but actually a general feature of Cats Effect, which is that it backpressures on cancelation, i.e. if you request cancelation of some fiber, it will wait until that cancelation fully completes.
>
> Consider this example:
>
> IO.sleep(2.seconds).onCancel(IO.sleep(1.second)).timeout(1.second)
>
> Even though the timeout is after 1 second, it will take 2 seconds to complete because of the onCancel finalizer. Is this what you are considering to be a limitation?

Consider also this example:

import cats.effect.{IO, IOApp}
import scala.concurrent.duration._

object OnCancelTest extends IOApp.Simple {
  @volatile var blocked = true

  def goDB: IO[Unit] =
    IO {
      while (blocked) Thread.sleep(100)
    }.cancelable(
      (IO.sleep(5.seconds) >> IO { blocked = false }).start.void
    )

  override def run: IO[Unit] =
    for {
      fiber <- goDB.start
      _ <- IO.sleep(100.millis)
      _ <- fiber.cancel
      _ <- IO(println("Cancelled"))
    } yield ()
}

In that snippet, fiber.cancel was blocked not only because it waits for the cancelable handler to complete (the handler completes almost instantly, since it only starts another fiber). It was also blocked because the external state wasn't modified instantly, even though the cancellation handler completed successfully. This is similar to what we see in Doobie: even after a cancel query is sent to the database, there is still some latency before the database unblocks the JDBC thread running the query.

I don't think this is a limitation of CE3, but it does seem like a slight disadvantage of this approach in the context of the current problem in Doobie.

This approach can cause the user query to hang for a long time because we are trying to wait for the query to be canceled in the database. However, there are situations when you just need to time out immediately, return a default value, and continue executing the request. This is exactly our case, so this disadvantage is relevant for us.

@TalkingFoxMid (Contributor, Author)

@satorg Could you give an approval here?

@armanbilge (Member)

@TalkingFoxMid thanks for including that second example, that was very helpful.

> However, there are situations when you just need to time out immediately, return a default value, and continue executing the request. This is exactly our case, so this disadvantage is relevant for us.

Cats Effect specifically has a timeoutAndForget API for this.

I agree that sometimes behavior like this is necessary, but that should be the (careful) choice of the user. Once backpressure is removed, it's not possible to restore it. I am dubious about removing backpressure signals in a library, since then users no longer have agency to make the appropriate decision for their specific use case.
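
For comparison, a small sketch of the two behaviours using armanbilge's earlier example (illustrative only):

import scala.concurrent.duration._
import cats.effect.IO

// A task whose cancellation finalizer itself takes 1 second to run.
val task: IO[Int] = IO.sleep(2.seconds).as(42).onCancel(IO.sleep(1.second))

// Backpressured: fails with a TimeoutException only after the finalizer
// has finished, i.e. after roughly 2 seconds in total.
val backpressured: IO[Int] = task.timeout(1.second)

// Fire-and-forget: fails with a TimeoutException after roughly 1 second and
// lets the cancellation (and its finalizer) complete in the background.
val forgetful: IO[Int] = task.timeoutAndForget(1.second)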

@satorg (Contributor) commented Aug 9, 2024

@TalkingFoxMid,

> However, there are situations when you just need to time out immediately, return a default value, and continue executing the request.

The catch here is that most likely there will be no immediate relief whatsoever. As @jatcwang found out recently, even if we give up here immediately, the query will be canceled as part of a bracket's release effect at the end of the transaction. Therefore, in the case of a long-running cancel query, the entire transaction will still have to wait until the cancel query completes anyway.

mergify bot added the postgres label on Aug 7, 2024
@TalkingFoxMid force-pushed the fix_query_cancellation branch from 4c0f87e to 4cd42e8 on August 8, 2024 13:34
config.setUsername("postgres")
config.setPassword("password")
config.setPassword("mysecretpassword")
@jatcwang (Collaborator)

These should be reverted back to jdbc:postgresql:world right?

}

object Primitive {
case class Default[M[_]]()(implicit asyncM: WeakAsync[M]) extends Primitive[M] {
@satorg (Contributor)

In general, it is recommended to avoid case classes in publicly available libraries unless absolutely necessary. I guess the primary reason for using one here was to get a static apply method. That is totally understandable, but case classes generate a lot of other things which make them difficult to deal with in terms of maintaining binary compatibility: for example, a copy method, inheriting Product, and, last but not least, the notorious unapply static method.

A regular class should work just as well here, and if the static apply is desirable, it's better to add it manually.
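
A small sketch of what that could look like (WeakAsync and Primitive are stubbed here so the snippet stands alone; the real implementation body is omitted):

// Minimal stand-ins so the sketch compiles on its own; in the PR these are
// doobie's WeakAsync and Primitive.
trait WeakAsync[M[_]]
trait Primitive[M[_]]

object Primitive {
  // A plain class instead of a case class: no copy/unapply/Product to keep
  // binary-compatible over time, while a manual apply keeps the same
  // construction syntax as before.
  final class Default[M[_]](implicit val asyncM: WeakAsync[M]) extends Primitive[M]

  object Default {
    def apply[M[_]]()(implicit asyncM: WeakAsync[M]): Default[M] = new Default[M]()
  }
}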

@satorg (Contributor)

The same applies to the case class Cancelable below.

Comment on lines +82 to +85
class KleisliInterpreter[M[_]](logHandler: LogHandler[M], customPrimitive: Option[Primitive[M]] = None)(implicit val asyncM: WeakAsync[M]) { outer =>

private val _primitive = customPrimitive.getOrElse(Primitive.Default[M]())

@satorg (Contributor)

Just a suggestion here to consider:

Suggested change:

- class KleisliInterpreter[M[_]](logHandler: LogHandler[M], customPrimitive: Option[Primitive[M]] = None)(implicit val asyncM: WeakAsync[M]) { outer =>
-   private val _primitive = customPrimitive.getOrElse(Primitive.Default[M]())
+ class KleisliInterpreter[M[_]](logHandler: LogHandler[M], customPrimitive: Primitive[M])(implicit val asyncM: WeakAsync[M]) { outer =>
+   def this(logHandler: LogHandler[M])(implicit asyncM: WeakAsync[M]) =
+     this(logHandler, Primitive.Default[M]())

Now you can use customPrimitive directly, i.e. the overloaded constructor helps avoid dealing with Option.

@jatcwang (Collaborator)

Superseded by #2079

@jatcwang closed this on Aug 19, 2024