Add IO documentation #1877

Closed
wants to merge 3 commits into from

@slouc slouc commented Apr 2, 2021

Resolves one part of #1715.

@djspiewak I'm gonna tag you because you submitted the issue. Apart from local fixes (typos, wording etc.), let me know what you think about it in general. I could contribute more in other parts, but it's my first time contributing documentation, so I'm not sure if I'm breaking any particular conventions here.

Member

@kubukoz kubukoz left a comment

I think there's quite a lot of text/theory/history in here without concrete examples that would showcase the abilities of IO. While that's not bad per se (the explanations are quite good), I think that shouldn't be the main focus of the page.

There's also a lot of information written in a quite dense way, and I feel like each paragraph could easily be a couple paragraphs in a section explaining e.g. what the runtime means, how fibers are represented / how they use threads, what Ref/Deferred are etc. - hoping to see other maintainers' opinions on this, I'd like to avoid having a very long page like in CE2 but still not have too much information in just a couple paragraphs of dense text.

docs/io.md Outdated
- Interpretation of that description

This "programs as data" concept allows us to obtain a value that represents the whole program.
We can handle that value in a referentially transparent way and eventually interpret it, which means translating it into something valuable. For example, a description of a REST API can be interpreted by a web server into a set of endpoints to be served, or by Swagger to generate API documentation.
Member

"Referentially transparent" should either be explained before, or avoided - ideally we could explain what IO is without ever introducing RT as a concept. Focusing on laziness and the ability to refactor would be better I think
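As a rough illustration of what focusing on laziness and refactoring could look like, here is a minimal sketch. It assumes Cats Effect 3 is on the classpath; the object name `LazinessSketch` and the `counter` variable are hypothetical and exist only to make the timing of the side effect observable.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object LazinessSketch {
  // Hypothetical counter, used only to observe when the side effect runs.
  var counter = 0

  // Building this value does not touch the counter; it only describes the increment.
  val increment: IO[Int] = IO { counter += 1; counter }

  // Because `increment` is a plain value, it can be named, reused and refactored freely.
  val twice: IO[(Int, Int)] =
    for {
      a <- increment
      b <- increment
    } yield (a, b)

  def main(args: Array[String]): Unit = {
    assert(counter == 0)               // nothing has run yet
    val result = twice.unsafeRunSync() // the described effect runs here, twice
    assert(result._1 == 1 && result._2 == 2)
    assert(counter == 2)
  }
}
```

Extracting `increment` into a `val` does not change the behaviour of `twice`, which is the refactoring property the page could lean on without naming referential transparency.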

Member

I think the whole page makes sense even without this paragraph actually :)

Author

I think it's a useful couple of sentences and doesn't hurt to have them, but I did omit the RT and introduce laziness / refactoring. Take a look, we can also choose to omit the whole paragraph in the next iteration. 👍

docs/io.md Outdated
We can handle that value in a referentially transparent way and eventually interpret it, which means translating it into something valuable. For example, a description of a REST API can be interpreted by a web server into a set of endpoints to be served, or by Swagger to generate API documentation.

So here's a simple definition: `IO` is a type that Cats Effect uses to capture the description of a program.
This description can then be interpreted by the runtime, evaluating the program and performing its side effects.
Member

Suggested change
- This description can then be interpreted by the runtime, evaluating the program and performing its side effects.
+ This description can then be interpreted by the runtime, which will be responsible for evaluating the program and performing its side effects.

docs/io.md Outdated

2. Using `IOApp`.

First option runs the program here and now, and it should be done "at the end of the world".
Member

This needs an explanation of what "end of the world" (EOW) means.

In other words, I'd also say that it is the place where all guarantees about laziness and suspended side effects stop holding.
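To make the "end of the world" idea concrete, the page could show something like the following minimal sketch. It assumes Cats Effect 3 and its default global runtime; `EndOfWorldSketch` and `program` are hypothetical names.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object EndOfWorldSketch {
  // A pure description: constructing it prints nothing and computes nothing.
  val program: IO[Int] =
    IO.println("computing...").as(42)

  def main(args: Array[String]): Unit = {
    // The single place where the description is actually executed.
    // Past this call, laziness and suspension guarantees no longer apply.
    val answer: Int = program.unsafeRunSync()
    assert(answer == 42)
  }
}
```

Keeping the `unsafeRunSync()` call in `main`, and nowhere else, is one way to show why it belongs at the very edge of the program.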

docs/io.md Outdated

First option runs the program here and now, and it should be done "at the end of the world".
Second option is the preferred one - extending the main entry point of our program with `IOApp` allows us to simply provide the final `IO` value (containing the full description of our program) and let the library worry about execution.
Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run).
Member

Suggested change
- Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run).
+ Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run) using various methods on IO and its type class instances.
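A small sketch of what this could look like in practice, assuming Cats Effect 3: `IOApp.Simple` supplies the runtime, and `evalOn` is one of the methods on `IO` that controls where a step runs. The names `EvalOnSketch`, `dedicatedPool` and `threadName` are hypothetical.

```scala
import cats.effect.{IO, IOApp, Resource}
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

object EvalOnSketch extends IOApp.Simple {
  // Hypothetical single-threaded pool, shut down when the Resource is released.
  val dedicatedPool: Resource[IO, ExecutionContext] =
    Resource
      .make(IO(Executors.newSingleThreadExecutor()))(pool => IO(pool.shutdown()))
      .map(pool => ExecutionContext.fromExecutor(pool))

  // Reads the current thread name while shifted onto the dedicated pool.
  val threadName: IO[String] =
    dedicatedPool.use(ec => IO(Thread.currentThread().getName).evalOn(ec))

  // IOApp runs this value for us; there is no explicit unsafe call.
  def run: IO[Unit] =
    threadName.flatMap(name => IO.println(s"ran on $name"))
}
```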

docs/io.md Outdated
Second option is the preferred one - extending the main entry point of our program with `IOApp` allows us to simply provide the final `IO` value (containing the full description of our program) and let the library worry about execution.
Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run).

`IO` was initially conceived as a simple schoolbook example of a Cats Effect type for capturing effects.
Member

I'm not sure this paragraph belongs here, it looks more like a changelog / migration guide notice - maybe someone disagrees though :)

I'd focus the page on how IO can be used and why the user would be interested in using it, what magical capabilities it has (both, race, start, parTraverse etc.) instead of the theory / history

Author

@slouc slouc Apr 3, 2021

Agreed on the second part, we'll add it (see the last commit; only TODOs for now though).

On the first part, well, I disagree that it's migration guide material. It just sets a bit of historical context in one or two sentences. But it's definitely not a hill I would die on 🙂 we can remove it or put it elsewhere if you and/or others really prefer it that way.

docs/io.md Outdated

## IO runs on fibers

We build our program description by using various `IO` constructs, most importantly flatmapping different `IO` values into a chain of computations:
Member

Flatmapping is mentioned, but a for comprehension is used. While this is the same thing, it might not be obvious at first sight that `flatMap` is involved.
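One way to make the connection explicit would be to show both forms side by side. This is only a sketch assuming Cats Effect 3; `queryDb` and `sendRequest` are hypothetical stand-ins for the steps used in the page.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object FlatMapSketch {
  // Hypothetical steps standing in for a database query and an HTTP request.
  def queryDb: IO[String] = IO.pure("user-42")
  def sendRequest(user: String): IO[Int] = IO.pure(user.length)

  // The for comprehension form.
  val viaFor: IO[Int] =
    for {
      user   <- queryDb
      status <- sendRequest(user)
    } yield status

  // Roughly what the compiler rewrites it to: a flatMap followed by a final map.
  val viaFlatMap: IO[Int] =
    queryDb.flatMap(user => sendRequest(user).map(status => status))

  def main(args: Array[String]): Unit =
    assert(viaFor.unsafeRunSync() == viaFlatMap.unsafeRunSync())
}
```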

docs/io.md Outdated

We build our program description by using various `IO` constructs, most importantly flatmapping different `IO` values into a chain of computations:

    ```scala
Member

Prefer `scala mdoc` so that your snippets will be compiled and checked for errors (you can run that code with `docs/mdoc` in sbt).

docs/io.md Outdated
Every fiber consists of a sequence of such flatmapped `IO` values.
More information on fibers can be found in the [tutorial](tutorial.md), but in a nutshell, that's what a fiber really is. The continuation (sequence of flatmapped `IO` values) is then later assigned by the scheduler to run on one of the available threads.
Don't forget, the `IO` itself merely holds a description of the program - no database calls have yet been made, no HTTP requests sent, nothing.
Only once the runtime interprets it "at the end of the world" (see previous section) will our `IO` chain be materialized on the heap in the form of a fiber, ready to be selected by the scheduler and submitted for execution on a thread.
Member

Suggested change
- Only once the runtime interprets it "at the end of the world" (see previous section) will our `IO` chain be materialized on the heap in the form of a fiber, ready to be selected by the scheduler and submitted for execution on a thread.
+ Only once we pass that IO to the runtime, which interprets it "at the end of the world" (see previous section), will our `IO` chain be materialized on the heap in the form of a fiber, ready to be selected by the scheduler and submitted for execution on a thread.

Member

Also, this might be confusing with respect to threads: a single fiber can switch threads at any point.

Also, every IO is on the heap anyway, whether you run it or not, so it would be clearer to say that passing an IO to the runtime starts the fibers. There's no real need to mention the heap; it's a low-level detail of the JVM.

docs/io.md Outdated
Having many more fibers than we have threads is thus OK, because a) fibers reside fully in memory, don't block any system resources and have very little context switching overhead, and b) starvation is prevented by cooperative yielding.
This is also the reason why it's OK to "block a fiber" - it's only semantically blocking, because all that's really happening is that the fiber gets taken off its thread. The thread itself is not blocked, and it will happily run another fiber.

Now, if we wanted to run the database query on one fiber, send the request on another, and use a third one for logging, all we'd have to do is append `.start` to each `IO`.
Member

I wouldn't recommend `start` this early in the documentation. It's quite a low-level construct, and most concurrent stuff can be done with combinators like `race`, `both` etc.
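For comparison, a sketch of the higher-level combinators next to the manual `start` version, assuming Cats Effect 3. The names `CombinatorsSketch`, `fast` and `slow` are hypothetical, and the sleep durations are arbitrary.

```scala
import cats.effect.{IO, IOApp}
import scala.concurrent.duration._

object CombinatorsSketch extends IOApp.Simple {
  // Semantically blocking sleeps: no thread is blocked while these wait.
  val fast: IO[String] = IO.sleep(10.millis).as("fast")
  val slow: IO[String] = IO.sleep(200.millis).as("slow")

  // Run both concurrently and wait for both results.
  val paired: IO[(String, String)] = fast.both(slow)

  // Run both concurrently, keep the first result and cancel the loser.
  val winner: IO[Either[String, String]] = fast.race(slow)

  // The lower-level equivalent of `both`, with explicit fibers.
  // Unlike `both`, this does not cancel the other fiber if one of them fails.
  val manual: IO[(String, String)] =
    for {
      fiberA <- fast.start
      fiberB <- slow.start
      a      <- fiberA.joinWithNever
      b      <- fiberB.joinWithNever
    } yield (a, b)

  def run: IO[Unit] =
    for {
      p <- paired
      w <- winner
      m <- manual
      _ <- IO.println(s"both: $p, race: $w, start/join: $m")
    } yield ()
}
```

The `manual` version is longer and easier to get wrong, which supports introducing `both` and `race` first and leaving `start` for a later section.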

Author

Kind of agreed. I wouldn't necessarily hide it from the reader, but I agree it should not be in the spotlight either.

Member

@kubukoz kubukoz commented Apr 2, 2021

@slouc thanks for the contribution! I think we can work with that :) see attached comments and let me know what you think. Also curious about @RaasAhsan @djspiewak @vasilmkd et al.'s opinions

Author

@slouc slouc commented Apr 2, 2021

> I think there's quite a lot of text/theory/history in here without concrete examples that would showcase the abilities of IO. While that's not bad per se (the explanations are quite good), I think that shouldn't be the main focus of the page.
>
> There's also a lot of information written in a quite dense way, and I feel like each paragraph could easily be a couple paragraphs in a section explaining e.g. what the runtime means, how fibers are represented / how they use threads, what Ref/Deferred are etc. - hoping to see other maintainers' opinions on this, I'd like to avoid having a very long page like in CE2 but still not have too much information in just a couple paragraphs of dense text.

I'm afraid that if we just show some examples using race, both, ref, deferred etc. then it basically becomes the showcase page for stuff from Spawn / Concurrent. But that being said, I totally get what you mean about dense text packed in a small page. I'd be happy to split it up and add more text in each section.

For example:

- IO:
  - History / theory
  - Runtime (unsafe / IOApp etc)
  - Supported operations

- Fibers
  - Overview (lightweight, semantic blocking)
  - Continuations / run loop
  - Scheduling
  - Cooperative yielding 

It did cross my mind as well that I might be cramming several pages into one, and I considered going for something like the above instead, but then I realised that parts of it (fibers in particular) are already covered in the tutorial. So I didn't want to go and submit a bunch of stuff nobody asked for - instead, I wanted to just cover IO because it's an open issue, and I figured I'd do it by shedding some light on it from several angles.

I think documentation is super important so I'm OK with a bit of back-and-forth and some nitpicking until we all agree 100% on what we want the docs to look like. If we want to go for one long tutorial and short example-based explanations on the side, that's fine, but feels a bit arbitrary to have an IO section and not have a, say, Fibers section. 🤷 BTW I skimmed through your suggestions and they seem fine, but let's first agree on the big picture. 👍

@slouc slouc marked this pull request as draft April 3, 2021 09:49
Author

@slouc slouc commented Apr 3, 2021

@kubukoz Okay, I adopted your comments and spread the text into several different sections. Also marked the PR as draft because we're still sketching the layout.

I didn't introduce new pages yet, everything is still IO, but it's been stretched a bit more, so that we can more easily decide what goes in a different page and what goes away. You'll notice the TODOs on the methods section - these are the blanks that shouldn't be too hard to fill out once we know what we want.

Thanks for all the feedback, it was very useful. 👍 I'm hoping to hear other opinions on this as well.

Member

@kubukoz kubukoz left a comment

I need to give it more thought but don't really have a mind for this today. I'll try to look again soon

Author

@slouc slouc commented Apr 19, 2021

@kubukoz @djspiewak Hi, I'm going to bump this PR a bit.

My impression is that the general vision for the documentation is not super clear, and I'd love to help, but the only way to converge to something is to have a discussion. Perhaps it's only unclear to me, and there's already an established convention on what we want the docs to look like, but I'm not aware of one. 🤷

So here's a concrete question: How do we envision the documentation page for the IO type?

  1. Listing some of the most important methods available on IO and explaining what they do via examples (the "ScalaDoc" approach)
  2. Telling a story about how IO works, what it means to be a "monad that captures the description of an effectful program" (the "blogpost" approach)
  3. Something in between

The downside of the ScalaDoc approach is, well, we're just repeating the stuff from ScalaDoc. Here's an example of racePair just to illustrate my point. Repeating the information from ScalaDoc is not a big deal in itself, but I feel that people who need information on available methods for things like racing fibers concurrently are usually the ones that are already writing some code, and there's a higher chance that they'll be looking for this info in the ScalaDoc on available IO methods, rather than browsing through docs (I know I do).

On the other hand, beginners will often come to the documentation page first, before they ever lay their eyes (and SBTs) on the library itself. But now we're in a pickle; what if I'm wrong, and what if the biggest proportion of people who read our docs are actually people who are already familiar with the idea of an IO monad and they just want to jump into the technical nitty-gritty that's specific to Cats Effect? They definitely don't need over-verbose explanations of theoretical concepts, they just need "the meat". Your car's technical documentation doesn't tell you anything about combustion engines, does it?

Finally, the downside of the mixed approach is that we could end up with the best of both worlds, but also the worst. For example, the theoretical part might be too concise and packed with lots of dense concepts in a very small space, and therefore not really beginner-friendly, or it could become too big and keep people from finding more concrete library-related information about available methods and best practices.

Maybe I'm overthinking this, but I'd really like to help make CE more approachable and "consumable", and I think these conversations are important. Feel free to suggest some other way forward (for starters, maybe this discussion belongs on Gitter rather than in a PR?). Also, as I said in the beginning, let me know if I'm the only one who's in the dark here, and the consensus already exists - in that case, all I need is a couple of pointers and I can then rework this PR into a mergeable version.

Author

@slouc slouc commented Apr 19, 2021

Also, to nudge us further in some constructive direction, I'm going to suggest how I'd do it - number 3, but with separate sections. We currently have

  • Overview
  • Typeclasses
  • Standard Library

One could argue that typeclasses correspond to kernel, standard library corresponds to std, and we could/should have a separate big section for core:

  • Overview
  • Typeclasses
  • Standard Library
  • IO

Then we could pack at least three subpages in there:

  • A theoretical and perhaps even somewhat historical description (I'm fine with omitting historical tho)
  • A more technical one that resembles a user's manual with listed methods and examples
  • And possibly even a third one with some best practices specific to IO (e.g. methods that are not in the type class hierarchy, maybe some caveats etc)

@djspiewak
Member

Sorry for the manic delay on this! 👀

@djspiewak djspiewak added this to the v3.3.0 milestone Aug 30, 2021
@djspiewak djspiewak modified the milestones: v3.3.0, v3.3.1 Nov 15, 2021
@djspiewak djspiewak closed this Sep 19, 2022