Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7: async iterables must checkpoint after yields #17

Merged
merged 10 commits into from
Aug 8, 2022

Conversation

jakkdl
Copy link
Member

@jakkdl jakkdl commented Aug 1, 2022

I continue doing stuff in a bit of an overkill way, but handling continue/break seems kind of important on this one. I gotta rush to a board game evening, so there's some cleanup and missing tests left, and probably needs a bunch more comments - but you can look at overall approach and look at tests.

Also fixing a bunch of stuff for 107/108 .. and some other rando stuff too. Messy.

Also, I should tweak code redundancy and stuff in the tests - started getting seriously irritating to have to care about all other codes so wrote a quick&dirty solution for filtering.
Not loving splitting up the tests, but the diff printer wasn't smart enough to handle small diffs in large lists - might change that / write my own diff printer. And also frustrating to shift a hundred error messages if adding a line at the top of the file (although thank god for vim's Ctrl-A & Ctrl-X) - sliiiightly tempted to generate the tests from comments in the files exclusively, but that is likely overkill.

@jakkdl jakkdl marked this pull request as draft August 1, 2022 15:48
@jakkdl jakkdl changed the title First draft of 7: async iterables must checkpoint after yields 7: async iterables must checkpoint after yields Aug 1, 2022
@jakkdl jakkdl force-pushed the 7_async_iterable_checkpoints branch 2 times, most recently from bc1351c to 7b46176 Compare August 1, 2022 15:51
@Zac-HD Zac-HD mentioned this pull request Aug 2, 2022
12 tasks
@jakkdl
Copy link
Member Author

jakkdl commented Aug 2, 2022

I'm confused by the lack of a requirement of a checkpoint before the first yield, isn't the point to enforce that a call to an async function will always checkpoint? But a function

async def foo():
  while True:
    yield 1
    await trio.sleep(1)

isn't a checkpoint the first time it's called.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally seems good to me!

Also fixing a bunch of stuff for 107/108 .. and some other rando stuff too. Messy.

If you can split stuff out in separate PRs that don't seem too prone to merge conflicts, that would make review a bit easier. Or if there are conflicts but we expect to have a day to rebase in between, I guess; that's more of a judgment call.

(nothing jumps out as "should be a second PR" from this one, fwiw, this is a general principle)

And also frustrating to shift a hundred error messages if adding a line at the top of the file - sliiiightly tempted to generate the tests from comments in the files exclusively, but that is likely overkill.

No kill like overkill! I'd do it 😁


I'm confused by the lack of a requirement of a checkpoint before the first yield, isn't the point to enforce that a call to an async function will always checkpoint?

I think you're right, and that example should trigger a lint warning. Quoting the Trio docs:

If you call an async function provided by Trio (await ), and it doesn’t raise an exception, then it always acts as a checkpoint. (If it does raise an exception, it might act as a checkpoint or might not.)
This includes async iterators: If you write async for ... in <a trio object>, then there will be at least one checkpoint before each iteration of the loop and one checkpoint after the last iteration.

# In each example, *all* of the `await`s are required (though async-for or async-with could substitute)
async def minimal_ok():
    await ...
    yield
    await ...

async def multiple_ok():
    await ...
    yield
    await ...
    yield
    await ...

async def loop_standard_ok():
    for _ in range(5):
        await ...
        yield
    await ...

async def loop_reverse_ok():
    await ...
    for _ in range(5):
        yield
        await ...

async def same_line_ok():
    yield await ...
    await ...

async def same_line_yield_from_ok():
    yield from (x async for x in ...)  # yield from is *only* OK when it's an async iterable
    await ...

I'm not proposing to check for exactly one checkpoint after the last iteration, pretty sure that's not actually the meaning (but see python-trio/trio#2388).

flake8_trio.py Outdated Show resolved Hide resolved
tests/trio109.py Outdated Show resolved Hide resolved
@jakkdl
Copy link
Member Author

jakkdl commented Aug 2, 2022

I'm confused by the lack of a requirement of a checkpoint before the first yield, isn't the point to enforce that a call to an async function will always checkpoint?

I think you're right, and that example should trigger a lint warning. Quoting the Trio docs:

Ooh nice, that means I can entirely remove 109 and merge it with 107 - such that async iterables simply are a special case of 107. That should simplify this beast down a little bit~

flake8_trio.py Outdated Show resolved Hide resolved
@jakkdl jakkdl force-pushed the 7_async_iterable_checkpoints branch 2 times, most recently from bf98409 to 6cde589 Compare August 3, 2022 22:05
@jakkdl
Copy link
Member Author

jakkdl commented Aug 3, 2022

I made a complete mess in the git history, and almost lost several hours of work today trying to clean that up. But recovered everything and everything is now squashed up into a single commit rebased onto master.
Should be fully working, and relatively clean. But do poke comments where a function or test needs better comments - I am starting to see the case for a simpler check, but making 107/108 cover complicated cases is way too fun to pass up 😇
the 107/108 test files are kind of massive, and might be somewhat messy after I merged the yield tests I wrote into them, but I hope I've cleaned them up enough it's possible to get a decent overview of them.

The 107 error message might want to be more descriptive, or split back into a 109 for iterables, to be more verbose about needing checkpoints before and after yields. But the large rewrite merging the logic was well worth it regardless, caught some silly stuff with it.
Can split 108 into a 110 too, or even have three error messages for return w/o checkpoint / return after yield with no checkpoint inbetween / yield after yield with no checkpoint inbetween.

@jakkdl jakkdl marked this pull request as ready for review August 3, 2022 22:10
@jakkdl
Copy link
Member Author

jakkdl commented Aug 3, 2022

Ooh and mayhaps even good to give a line indicator to (one of) the offending yields. E.g. yield with no guaranteed checkpoint since yield on line {} / yield with no guaranteed checkpoint since function definition on line {} / function exit with no guaranteed checkpoint since yield on line {}

@jakkdl
Copy link
Member Author

jakkdl commented Aug 3, 2022

Added details to TRIO108, if you like it I'll give the same treatment to 107.

flake8_trio.py Outdated Show resolved Hide resolved
@Zac-HD
Copy link
Member

Zac-HD commented Aug 4, 2022

Woke up this morning pretty unwell, unfortunately, so I probably won't review this before next week. Feel free to add new PRs, jump over to pytest-exceptiongroup, or take some time off as suits you 🙂

@jakkdl
Copy link
Member Author

jakkdl commented Aug 4, 2022

Aww, hoping for a speedy recovery ❤️
No pressure on getting back ASAP, I'll find stuff to do 🙂

After some sleep I got a good solution for my ugly loop, and thought a bit more about the error codes. I think it largely comes down to if you want more or fewer error codes - I could in theory cut it all down to a single error code, changing TRIO107 from
async [function/iterable] must have at least one checkpoint on every code path, unless an expection is raised
to
[exit] from async [function] with no guaranteed checkpoint since [function definition] on line [<same line>]
and for iterables
[exit] from async [iterable] with no guaranteed checkpoint since [yield] on line [<line>]

so there's a single error code

  1. [return/yield/exit] from async [function/iterable] with no guaranteed checkpoint since [function definition/yield] on line [<line>]

Or create up to five error codes:

  1. async function must have at least one checkpoint on every code path, unless an exception is raised
  2. return from async function must have guaranteed checkpoints on all code paths before it
  3. async iterable with no checkpoint when exhausting the iterable. Add a checkpoint after yield on line <line>
  4. yield from async iterable with no guaranteed checkpoint since entering the function on line `
  5. yield with no guaranteed checkpoint since yield on line `

A reasonable compromise is perhaps to merge 1-2, and 3-5, such that there's one error code for async functions and one for async iterables - but this will depend on your workflow and stuff.
Or do something like define these five as 2XY, with X=0 for async functions and X=1 for async iterables, so there's a lot of flexibility in what error codes to ignore.

@jakkdl
Copy link
Member Author

jakkdl commented Aug 5, 2022

Another thing I thought about, when there's multiple possible un-checkpointed paths before a yield, I could make it point out all problems (either through multiple problems on the same statement, or with a message that extends to arbitrary size.
Example:

async def foo():
  if ...:
    await foo()
    yield
  yield # error: no checkpoint since function def, and also no checkpoint since yield

Currently it's semi-arbitrary which one it will complain about, usually the most recent one.

Though would need to avoid stuff like

yield
yield # error no checkpoint since yield on $line-1
yield # error no checkpoint since yield on $line-1, and $line-2
yield # error no checkpoing since yield on $line-1, and $line-2, and $line-3

Probably something like "if complaining about multiple statements, only point out statements no other problem has complained about"

@jakkdl
Copy link
Member Author

jakkdl commented Aug 5, 2022

Since I've touched on various smaller things in this PR (get/set_state being the most prominent one) there will be merge conflicts & some code needs to be changed for consistency - so probably don't try to merge this one at the same time as any one of the others without giving me an opportunity to rebase/merge inbetween.

@jakkdl
Copy link
Member Author

jakkdl commented Aug 5, 2022

Re multiple possible problematic yields, it might also be nice to try and inform about awaits that aren't part of guaranteed checkpoints - if they're on the path between statements that require checkpoints between them:

async def foo():
  if ...:
    await bar() # warning: not guaranteed
  return # error, no guaranteed checkpoint since funcDef.

silly in this case, but with more complicated stuff (continue/break/etc) and potential bugs in the check or stuff that's intentionally skipped due to complexity, it might be good to get feedback on stuff that a user might think is guaranteed. And reduce the number of instances of a user raising an issue "why doesn't it think this massive function is checkpointed?? It's clearly guaranteed, duhh".

Probably a separate PR, would require modifying a couple different things.

@Zac-HD
Copy link
Member

Zac-HD commented Aug 5, 2022

I think that having different codes for async functions vs iterables makes sense, no real opinion on whether that means we should have two or five though. Which I guess suggests two for simplicity, albeit with custom/helpful messages?

Another thing I thought about, when there's multiple possible un-checkpointed paths before a yield, I could make it point out all problems (either through multiple problems on the same statement, or with a message that extends to arbitrary size.

I think having multiple warnings on the same statement makes sense, and if each mention the "since xxx" location that should help users understand which code paths are missing checkpoints. Only pointing to the most recent yield on each code path should fix your example, I think; each would point to the previous line and that in turn suggests the obvious "add a checkpoint between each" fix.

Re multiple possible problematic yields, it might also be nice to try and inform about awaits that aren't part of guaranteed checkpoints - if they're on the path between statements that require checkpoints between them:

I think we can probably get away without this - let's leave it out for now and we can revisit if anyone actually does complain.

@jakkdl jakkdl force-pushed the 7_async_iterable_checkpoints branch 3 times, most recently from 929e5f3 to afc1ee1 Compare August 6, 2022 13:11
@Zac-HD
Copy link
Member

Zac-HD commented Aug 6, 2022

Actually, slightly different request: this continues to be a huge PR (obviously for a nontrivial feature!). Once you've merge in main, let's focus on getting the minimum viable version of this merged; I'm happy to wait for a follow-up before releasing if you think that would leave it unfinished. I just can't meaningfully review a 1.5k-line diff touching several different things 😅

@jakkdl
Copy link
Member Author

jakkdl commented Aug 7, 2022

Sorry yeah I'll get it in working shape and split off into separate pull requests. Requires so much discipline to work this way!
Though the line count will stay kinda huge due to the test files being hundreds of lines long

@jakkdl
Copy link
Member Author

jakkdl commented Aug 7, 2022

Down to 1000 lines added! 🎉 😂

The main stuff is the +210/-58 in flake8_trio.py - tried commenting up stuff so it should be possible to follow the logic (and ofc caught a minor bug in the process).
The test files are sorta messy, having copied tests back and forth between 107/108/109 multiple times, but I did another cleanup pass and there shouldn't be too many redundant tests in them now. Also added a bunch of comments, I'm hopefully not too too deep into the weeds now and incapable of judging what checkpoints are unintuitive or not 😅

Don't think I can meaningfully split off more since 107&108 are so intertwined now :)

I think/hope that transitioning to multiple errors per line is gonna be quick, I'd already started on this branch originally (:flushed:). But branched that off from this one~

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Big diff, but all the prep work has paid off 🙂

flake8_trio.py Show resolved Hide resolved
@Zac-HD Zac-HD merged commit ff2d0b3 into python-trio:main Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants