Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix query cancellation with cancelable #2079

Merged

Conversation

TalkingFoxMid
Copy link
Contributor

@TalkingFoxMid TalkingFoxMid commented Aug 7, 2024

Alternative fix of problem described here #2077

@satorg
Copy link
Contributor

satorg commented Aug 9, 2024

@TalkingFoxMid thank you for giving it a shot!

@jatcwang, to be honest, I personally like this approach better.
It might look more complicated due to more changes made, but in fact it is not that much because most of the changes happened in the auto-generated code.

I wonder, is there a chance that we could weigh this PR in?
Also I would like to summon @armanbilge as a "cancelation guru" :)

Copy link
Member

@armanbilge armanbilge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in fact it is not that much because most of the changes happened in the auto-generated code.

yes, would you mind pointing me at the non-auto-generated changes? 😅 still at a glance I think this approach is better too, since its using the existing cancelable IIUC.

}
}

scenario.unsafeRunSync()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of unsafeRunSync() should use munit-cats-effect CatsEffectSuite.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently don't use it at the moment - I can handle this in another PR to move all tests to use CatsEffectSuite

@jatcwang
Copy link
Collaborator

@armanbilge Non-autogenerated changes are in modules/core/src/main/scala/doobie/hi/connection.scala

@jatcwang
Copy link
Collaborator

jatcwang commented Aug 10, 2024

execImpl pretty much handles all Query/Update methods so I made the following updates:

     createLogged
-      .bracket(ps => IFC.embed(ps, prepLogged *> execAndProcessLogged))(ps => IFC.embed(ps, IFPS.close))
+      .flatMap(ps =>
+        WeakAsyncConnectionIO.cancelable(
+          IFC.embed(ps, prepLogged *> execAndProcessLogged)
+            .attempt.flatTap(_ => IFC.embed(ps, IFPS.close)).rethrow,
+          IFC.embed(ps, IFPS.close)))

which made HikariQueryCancellationSuite pass. (Posting the diff as I cannot push to the PR)

We should also need to make Streaming queries cancellable (doobie.hi.connection#stream) but I can tackle this in a separate PR.

In general I'm quite happy with this approach too as it should have less overhead 👍 Thanks @TalkingFoxMid !

@TalkingFoxMid TalkingFoxMid force-pushed the fix_query_cancellation_cancelable branch from 5591005 to 4b90d8f Compare August 12, 2024 08:49
@TalkingFoxMid

This comment was marked as resolved.

@TalkingFoxMid

This comment was marked as resolved.

@TalkingFoxMid

This comment was marked as resolved.

@armanbilge
Copy link
Member

@TalkingFoxMid

However, if we change onCancel to cancelable

onCancel and cancelable are not equivalent. task.onCancel(fin) runs fin after task is canceled. task.cancelable(fin) runs fin concurrently while task is running.


the guarantee of resource release is no longer present. Furthermore, there is also a "USE RESOURCE" print, so it seems that cancelable makes the unmasked effect uncancelable altogether. Here is edited code:

To be cancelable, the cancelable needs to be inside the unmask.

@@ -10,8 +10,8 @@ object TestCancelable extends IOApp.Simple {
       acquire.flatMap(
         r =>
           unmask {
-            IO.sleep(5.seconds) >> IO(println(s"USE $r"))
-          }.onCancel(IO(println(s"release $r")))
+            (IO.sleep(5.seconds) >> IO(println(s"USE $r"))).onCancel(IO(println(s"release $r")))
+          }
       )
   }

After making this change the result is:

START AQ
END AQ

@TalkingFoxMid

This comment was marked as resolved.

@armanbilge
Copy link
Member

@TalkingFoxMid thanks, I think there is still some confusion. cancelable is not a drop-in equivalent for onCancel. They are different combinators which serve different purposes. You may even use them together:

object TestCancelable extends IOApp.Simple {
  val acquire: IO[String] = IO.println("START AQ") >> IO.sleep(2.seconds) >> IO.println("END AQ") >> IO("RESOURCE")

  def evaluation: IO[Unit] = IO.uncancelable {
    unmask =>
      acquire.flatMap(
        r =>
          unmask {
            (IO.sleep(5.seconds) >> IO(println(s"USE ${r}"))).cancelable(IO(println(s"canceling USE ${r}")))
          }.onCancel(IO(println(s"release ${r}")))
      )
  }

  override def run: IO[Unit] =
    for {
      f <- evaluation.start
      _ <- IO.sleep(1.seconds)
      _ <- f.cancel
      _ <- IO.never[Unit]
    } yield ()
}
START AQ
END AQ
release RESOURCE

@TalkingFoxMid

This comment was marked as resolved.

@armanbilge
Copy link
Member

armanbilge commented Aug 13, 2024

Edit: Sergey's answer in #2079 (comment) is much better :) my original comment is hidden below.

So cancelable is not working here, isn't it?

Strange, I was not able to replicate your result. It works for me. Here is the exact code I am running, which is directly copied from your example:

//> using dep org.typelevel::cats-effect::3.5.4

import cats.effect.{IO, IOApp}

import scala.concurrent.duration.DurationInt

object TestCancelable extends IOApp.Simple {
  val acquire: IO[String] = IO.println("START AQ") >> IO.sleep(2.seconds) >> IO.println("END AQ") >> IO("RESOURCE")

  def evaluation: IO[Unit] = IO.uncancelable {
    unmask =>
      acquire.flatMap(
        r =>
          unmask {
            (IO(Thread.sleep(5)) >> IO(println(s"USE ${r}"))).cancelable(IO(println(s"canceling USE ${r}")))
          }.onCancel(IO(println(s"release ${r}")))
      )
  }

  override def run: IO[Unit] =
    for {
      f <- evaluation.start
      _ <- IO.sleep(1.seconds)
      _ <- f.cancel
      _ <- IO.never[Unit]
    } yield ()
}
% scala-cli --server=false sandbox.scala
START AQ
END AQ
release RESOURCE

But if we write not IO.sleep but Thread.sleep to simulate a fiber on a blocked thread (this is exactly similar to case in doobie, we are trying to cancel a fiber which is running on a blocked thread)

Note, the correct way to write a fiber on a blocked thread is IO.blocking(Thread.sleep(...)) (or IO.interruptible because sleep support cancelation via Thread#interrupt).

@satorg
Copy link
Contributor

satorg commented Aug 13, 2024

@TalkingFoxMid ,

The complete example is
...
Prints
...

Actually, your example (after I fix IO.(Thread.sleep(5)) to IO(Thread.sleep(5))) prints the following:

START AQ
END AQ
release RESOURCE

I.e. the callback effect fin registered in unmask { ... }.onCancel( fin ) gets triggered.

The reason why the fin effect provided to cancelable is not executed in your example is that due to the specific timings the cancelation occurs at the boundary of the unmasked region. I.e. the task gets canceled before is gets into cancelable.

However, if you change your timings to the following values:

  val acquire: IO[String] =
    IO.println("START AQ") >> IO.sleep(2.seconds) >> IO.println("END AQ") >> IO("RESOURCE")

  def evaluation: IO[Unit] = IO.uncancelable { unmask =>
    acquire.flatMap(r =>
      unmask {
        // Sleep time increased to 2 seconds (was 5 ms)
        (IO(Thread.sleep(2000)) >> IO(println(s"USE $r")))
          .cancelable(IO(println(s"canceling USE $r")))
      }.onCancel(IO(println(s"release ${r}")))
    )
  }

  override def run: IO[Unit] =
    for {
      f <- evaluation.start
      _ <- IO.sleep(3.seconds) // increased to 3 seconds
      _ <- f.cancel
      _ <- IO.never[Unit]
    } yield ()
then the app prints the following:
START AQ
END AQ
canceling USE RESOURCE
release RESOURCE

Here the cancel signal comes inside the unmasked region.
Therefore cancelable is applied and the cancel signal takes an effect right after Thread.sleep completes (which cannot be interrupted).

@TalkingFoxMid
Copy link
Contributor Author

TalkingFoxMid commented Aug 14, 2024

def cancelable[A](outer: Poll[IO])(fa: IO[A], fin: IO[Unit]): IO[A] = {
IO.uncancelable { poll =>
fa.start.flatMap { fiber =>
poll(outer(fiber.join))
.onCancel(fin *> fiber.cancel)
.flatMap(_.embed(poll(outer(IO.canceled *> IO.never[A]))))
}
}
}

Sorry for confusion and thank you all for trying to explain, it was very helpful. Now I see that there is cases when onCancel works and cancelable is not working, and vice versa (even if our effect is running on a blocked thread). So the most appropriate way to guarantee PreparedStatement cancelation I suppose is to use onCancel and cancelable together, to cover all possible cases of cancelation after resource was created.

However there is also small issue: call duplication. There are some cases when cancelable and onCancel can be executed both and this causes the effect to execute twice. It might worth if effect isn't idempotent (if we want guarantee exactly-once effect completing) but in case of PreparedStatement I suppose there is no such problem

@TalkingFoxMid TalkingFoxMid force-pushed the fix_query_cancellation_cancelable branch 2 times, most recently from 90c645c to 5b5e3e9 Compare August 14, 2024 08:19
@TalkingFoxMid
Copy link
Contributor Author

@satorg @armanbilge @jatcwang

I've finally done a solution that seems to be complete now and looking forward for your comments/approves

As for streams, I suggest doing this in another PR later

@TalkingFoxMid TalkingFoxMid force-pushed the fix_query_cancellation_cancelable branch 2 times, most recently from 4db468d to bdd1979 Compare August 14, 2024 13:56
Comment on lines +21 to +22
val logEventRef: Ref[IO, LogEvent] =
Ref.of[IO, LogEvent](null).unsafeRunSync()
Copy link
Contributor Author

@TalkingFoxMid TalkingFoxMid Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOLocal is no longer works in logHandler because there is cancelable which forks a fiber in a middle of query processing and IOLocal context isn't shared between two fibers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we guarantee that IOLocal works with logHandler ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch yes we probably need to. One of our docuemented examples also uses IOLocal to pass additional context to LogHandler.
In a perfect world, I do feel like IO.cancelable implementation should pass on the IOLocal though..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a perfect world, I do feel like IO.cancelable implementation should pass on the IOLocal though..

In general, you cannot make these kinds of assumptions when using IOLocal with combinators. Further reading:

Copy link
Contributor Author

@TalkingFoxMid TalkingFoxMid Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a lot of problems with maintaining this feature:

  1. It is not very functional. I think the most appropriate way to do the same - use writer monads, and I suppose that is possible in current setup.
  2. It restricts ours capabilities to rely on CE3 functional combinators in Doobie code

To be honest, I think that IOLocal should be used exclusively in the internal code (as it is impure) of an application (or library). The API of a functional library should not provide (or at least guarantee that it will work fine) the use of this tool to users.

So I wish to stop maintaining this feature, remove code example and migrate users to more suitable instruments (like a writer monad or a reader with Ref in context). Is it possible?

Copy link
Contributor Author

@TalkingFoxMid TalkingFoxMid Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jatcwang

Please give a reference to place in the documentation where IOLocal is used. IOLocal can be used to pass context to LogHandler. But It can't be used to modify this context by logHandler if we use cancelable (in tests loghandler modifies IOLocal context, it puts log in IOLocal). I've found this example in doc:

import cats.effect.{IOLocal, Ref}
import doobie.util.log.Success

def users = List.range(0, 4).map(n => s"user-$n")

def program: IO[List[String]] =
  for {
    // define an IOLocal where we store the user which caused the query to be run  
    currentUser <- IOLocal("")
    // store all successful sql here, for all users
    successLogsRef <- Ref[IO].of(List.empty[String])
    xa = Transactor.fromDriverManager[IO](
        driver = "org.h2.Driver",
        url = "jdbc:h2:mem:queryspec;DB_CLOSE_DELAY=-1",
        user = "sa", 
        password = "",
        logHandler = Some(new LogHandler[IO] {
          def run(logEvent: LogEvent): IO[Unit] =
            currentUser.get.flatMap(user => successLogsRef.update(logs => s"sql for user $user: '${logEvent.sql}'" :: logs))
        })
      )
    // run a bunch of queries
    _ <- users.parTraverse(user =>
      for {
        _ <- currentUser.set(user)
        _ <- sql"select 1".query[Int].unique.transact(xa)
      } yield ()
    )
    // return collected log messages
    logs <- successLogsRef.get
  } yield logs

program.unsafeRunSync().sorted

It will work because child fibers inherit context of parents. So if we set up context in parrent fiber it can be accessed in loghandler. But if we modify context in loghandler this modification can't be propagated to a parent fiber. In tests there is modification of context in child fiber

Copy link
Contributor Author

@TalkingFoxMid TalkingFoxMid Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider these two examples:

import cats.effect.{IO, IOApp, IOLocal}

object IOCancelTest extends IOApp.Simple {

  def childTask(local: IOLocal[Int]): IO[Unit] =
    for {
      _ <- local.get.flatTap(IO.println) // 42 inherits from parent fiber
    } yield ()

  override def run: IO[Unit] = for {
    local <- IOLocal(42)
    _ <- childTask(local).cancelable(IO(println("some finalizer")))
  } yield ()
}
import cats.effect.{IO, IOApp, IOLocal}

object IOCancelTest2 extends IOApp.Simple {

  def childTask(local: IOLocal[Int]): IO[Unit] =
    for {
      _ <- local.set(64)
    } yield ()

  override def run: IO[Unit] = for {
    local <- IOLocal(42)
    _ <- childTask(local).cancelable(IO(println("some finalizer")))
    _ <- local.get.flatTap(IO.println) // 42 even after update in child fiber
  } yield ()
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep you've found the example in the docs 👍

I think trying to modify IOLocal within a ConnectionIO should be quite rare. Since this is a general issue with IOLocal propagation it's not something we can solve here.

Thanks for updating the tests to use Ref 👍


import scala.concurrent.duration.DurationInt

class HikariQueryCancellationSuite extends munit.FunSuite {
Copy link
Contributor Author

@TalkingFoxMid TalkingFoxMid Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed QueryCancellationSuite in favor of more appropriate HikariQueryCancellationSuite because current solution doesn't work well with QueryCancellationSuite. Here is code for history:

// Copyright (c) 2013-2020 Rob Norris and Contributors
// This software is licensed under the MIT License (MIT).
// For more information see LICENSE or https://opensource.org/licenses/MIT

package doobie.util

import cats.effect.IO
import doobie.Transactor
import doobie.*
import doobie.implicits.*
import cats.syntax.all.*

import scala.concurrent.duration.DurationInt

class QueryCancellationSuite extends munit.FunSuite {
  import cats.effect.unsafe.implicits.global

  val xa = Transactor.fromDriverManager[IO](
    driver = "org.h2.Driver",
    url = "jdbc:h2:mem:queryspec;DB_CLOSE_DELAY=-1;DEFAULT_LOCK_TIMEOUT=60000;LOCK_TIMEOUT=60000",
    user = "sa",
    password = "",
    logHandler = None
  )


  test("Query cancel") {
    val scenario = WeakAsync.liftIO[ConnectionIO].use { elevator =>
      for {
        _ <- sql"CREATE TABLE IF NOT EXISTS example_table ( id INT)".update.run.transact(xa)
        _ <- sql"TRUNCATE TABLE example_table".update.run.transact(xa)
        _ <- sql"INSERT INTO example_table (id) VALUES (1)".update.run.transact(xa)
        _ <- {
          sql"select * from example_table for update".query[Int].unique >> elevator.liftIO(IO.never)
        }.transact(xa).start

        insertWithLockFiber <- {
          for {
            _ <- IO.sleep(100.milli)
            insertFiber <- sql"UPDATE example_table SET id = 2".update.run.transact(xa).start
            _ <- IO.sleep(100.milli)
            _ <- insertFiber.cancel

          } yield ()
        }.start

        _ <- IO.race(insertWithLockFiber.join, IO.sleep(9.seconds) >> IO(fail("Cancellation is blocked")))
        result <- sql"SELECT * FROM example_table".query[Int].to[List].transact(xa)
      } yield assertEquals(result, List(1))
    }
    scenario.unsafeRunSync()
  }
}

@armanbilge
Copy link
Member

Now I see that there is cases when onCancel works and cancelable is not working, and vice versa (even if our effect is running on a blocked thread).

Sorry, what is an example of a case where cancelable runs but onCancel does not run (this is the "vice versa")? That seems problematic.

Comment on lines 178 to 182
.bracket(ps => IFC.embed(ps, prepLogged *> execAndProcessLogged))(ps => IFC.embed(ps, IFPS.close))
.bracket(ps =>
WeakAsyncConnectionIO.cancelable(
IFC.embed(ps, prepLogged *> execAndProcessLogged),
IFC.embed(ps, IFPS.close)
))(IFC.embed(_, IFPS.close) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks right 👍

@TalkingFoxMid
Copy link
Contributor Author

Now I see that there is cases when onCancel works and cancelable is not working, and vice versa (even if our effect is running on a blocked thread).

Sorry, what is an example of a case where cancelable runs but onCancel does not run (this is the "vice versa")? That seems problematic.

Sorry for confusion. I mean that if we want finalizer to run instantly after we cancel a fiber - we must use cancelable and onCancel together. So there is two possible branches: if the fiber is on a blocked thread then cancelable will run instantly but if the fiber continues running runLoop then onCancel will run finalizer instantly

@satorg
Copy link
Contributor

satorg commented Aug 15, 2024

@TalkingFoxMid ,

... if we want finalizer to run instantly after we cancel a fiber - we must use cancelable and onCancel together. So there is two possible branches ...

I feel there's still a confusion. onCancel and cancelable are very different combinators that serve different purposes. We cannot talk about them in terms of "either ... or ... ".

  • onCancel simply lets us know when (and whether) cancelation occurred. Its fin effect gets executed after the cancellation happened. Therefore, onCancel does not affect the behavior of the effect it is applied to.
  • cancelable allows us to convert otherwise a "non-cancelable" effect into a "cancelable" one (it does not matter whether it is blocking or not). The cancelable's fin effect is the key for the conversion – it should contain an action that interrupts the target "non-cancelable" effect. Therefore, cancelable does affect the behavior of the effect it is applied to.

@armanbilge
Copy link
Member

Yes, and to add on to that a bit: I would not say that cancelable runs a "finalizer", it runs an "interrupt" for the primary task. If you want to run a finalizer, you must use a combinator such as guarantee or onCancel.

@satorg
Copy link
Contributor

satorg commented Aug 15, 2024

@armanbilge ,

I would not say that cancelable runs a "finalizer", it runs an "interrupt" for the primary task.

I guess the confusion might come from the fact, "finalizer" is the name used in the docs for cancelable:

https://github.com/typelevel/cats-effect/blob/caee0e86f815b641af61f19becd4eac27a769f11/core/shared/src/main/scala/cats/effect/IO.scala#L400-L442

Given an effect which might be [[uncancelable]] and a finalizer, produce an effect which can be canceled by running the finalizer.

Also, its parameter name is fin similar to one used for onCancel.

@TalkingFoxMid TalkingFoxMid force-pushed the fix_query_cancellation_cancelable branch 2 times, most recently from bc252b1 to 061d092 Compare August 16, 2024 09:13
@TalkingFoxMid TalkingFoxMid force-pushed the fix_query_cancellation_cancelable branch from 061d092 to ed60a5f Compare August 16, 2024 09:30
Copy link
Collaborator

@jatcwang jatcwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM thanks @TalkingFoxMid! Very much appreciate the reviews too @armanbilge @satorg

@jatcwang jatcwang merged commit 3898e04 into typelevel:main Aug 19, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants