Add IO documentation #1877

slouc · 2021-04-02T16:47:34Z

Resolves one part of #1715.

@djspiewak I'm gonna tag you because you submitted the issue. Apart from local fixes (typos, wording etc.), let me know what you think about it in general. I could contribute more in other parts, but it's my first time contributing documentation, so I'm not sure if I'm breaking any particular conventions here.

kubukoz

I think there's quite a lot of text/theory/history in here without concrete examples that would showcase the abilities of IO. While that's not bad per se (the explanations are quite good), I think that shouldn't be the main focus of the page.

There's also a lot of information written in a quite dense way, and I feel like each paragraph could easily be a couple paragraphs in a section explaining e.g. what the runtime means, how fibers are represented / how they use threads, what Ref/Deferred are etc. - hoping to see other maintainers' opinions on this, I'd like to avoid having a very long page like in CE2 but still not have too much information in just a couple paragraphs of dense text.

docs/io.md

kubukoz · 2021-04-02T19:25:30Z

docs/io.md

+- Interpretation of that description
+
+This "programs as data" concept allows us to obtain a value that represents the whole program.
+We can handle that value in a referentially transparent way and eventually interpret it, which means translating it into something valuable. For example, a description of a REST API can be interpreted by a web server into a set of endpoints to be served, or by Swagger to generate API documentation.


"Referentially transparent" should either be explained before, or avoided - ideally we could explain what IO is without ever introducing RT as a concept. Focusing on laziness and the ability to refactor would be better I think

I think the whole page makes sense even without this paragraph actually :)

I think it's a useful couple of sentences and doesn't hurt to have them, but I did omit the RT and introduce laziness / refactoring. Take a look, we can also choose to omit the whole paragraph in the next iteration. 👍

kubukoz · 2021-04-02T19:26:15Z

docs/io.md

+We can handle that value in a referentially transparent way and eventually interpret it, which means translating it into something valuable. For example, a description of a REST API can be interpreted by a web server into a set of endpoints to be served, or by Swagger to generate API documentation.
+
+So here's a simple definition: `IO` is a type that Cats Effect uses to capture the description of a program.
+This description can then be interpreted by the runtime, evaluating the program and performing its side effects.


Suggested change

-This description can then be interpreted by the runtime, evaluating the program and performing its side effects.
+This description can then be interpreted by the runtime, which will be responsible for evaluating the program and performing its side effects.

kubukoz · 2021-04-02T19:27:14Z

docs/io.md

+
+2. Using `IOApp`.
+
+First option runs the program here and now, and it should be done "at the end of the world".


This needs an explanation of what "end of the world" (EOW) means.

In other words, I'd also say that it is the place where all guarantees about laziness and suspended side effects stop holding.
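To make the point concrete, here is a minimal sketch, assuming Cats Effect 3 is on the classpath. The names `EndOfTheWorld`, `counter`, `increment` and `twice` are hypothetical and not part of the PR. It is meant to show that building and composing `IO` values performs nothing, and that the side effects happen only at the `unsafeRunSync()` call:

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object EndOfTheWorld {
  // Mutable counter, used only to make the side effect observable.
  var counter = 0

  // Constructing the IO runs nothing; it only describes the increment.
  val increment: IO[Int] = IO { counter += 1; counter }

  // Reusing the same value twice describes two increments.
  val twice: IO[Int] = increment.flatMap(_ => increment)

  def main(args: Array[String]): Unit = {
    println(counter)                   // still 0: nothing has run yet
    val result = twice.unsafeRunSync() // the "end of the world"
    println(result)                    // 2: both increments ran here
  }
}
```

Everything before the `unsafeRunSync()` call can be refactored freely, for example by extracting `increment` into a value and reusing it, without changing what the program does.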

kubukoz · 2021-04-02T19:27:49Z

docs/io.md

+
+First option runs the program here and now, and it should be done "at the end of the world".
+Second option is the preferred one - extending the main entry point of our program with `IOApp` allows us to simply provide the final `IO` value (containing the full description of our program) and let the library worry about execution.
+Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run).


Suggested change

-Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run).
+Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run) using various methods on IO and its type class instances.
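For comparison with the unsafe option, a minimal sketch of the `IOApp` route could look like the following. It assumes Cats Effect 3; `Main` and `program` are hypothetical names chosen for illustration. The final `IO` value is handed to the library, and user code contains no unsafe call:

```scala
import cats.effect.{IO, IOApp}

object Main extends IOApp.Simple {
  // Hypothetical program: a pure description that yields a number.
  val program: IO[Int] = IO.pure(41).map(_ + 1)

  // IOApp interprets this value for us when the JVM starts the app.
  val run: IO[Unit] = program.flatMap(n => IO.println(s"result: $n"))
}
```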

kubukoz · 2021-04-02T19:29:15Z

docs/io.md

+Second option is the preferred one - extending the main entry point of our program with `IOApp` allows us to simply provide the final `IO` value (containing the full description of our program) and let the library worry about execution.
+Note that we still have the option of controlling the details of that execution (e.g. on which thread pool to run).
+
+`IO` was initially conceived as a simple schoolbook example of a Cats Effect type for capturing effects.


I'm not sure this paragraph belongs here, it looks more like a changelog / migration guide notice - maybe someone disagrees though :)

I'd focus the page on how IO can be used and why the user would be interested in using it, what magical capabilities it has (both, race, start, parTraverse etc.) instead of the theory / history

Agreed on the second part, we'll add it (see the last commit; only TODOs for now though).

On the first part, well, I disagree that it's migration guide material. It just sets a bit of historical context in one or two sentences. But it's definitely not a hill I would die on 🙂 we can remove it or put it elsewhere if you and/or others really prefer it that way.

kubukoz · 2021-04-02T19:29:50Z

docs/io.md

+
+## IO runs on fibers
+
+We build our program description by using various `IO` constructs, most importantly flatmapping different `IO` values into a chain of computations:


Flatmapping is mentioned, but a for-comprehension is used. While the two are the same thing, it may not be obvious at first sight that `flatMap` is involved.
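A short sketch of the equivalence, assuming Cats Effect 3 (`FlatMapChain`, `readA` and `readB` are hypothetical names standing in for real effects), might help readers see that the two forms describe the same chain:

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object FlatMapChain {
  // Stand-ins for real effects such as a database query or an HTTP call.
  val readA: IO[Int] = IO.pure(20)
  val readB: IO[Int] = IO.pure(22)

  // For-comprehension form.
  val viaFor: IO[Int] =
    for {
      a <- readA
      b <- readB
    } yield a + b

  // What the compiler desugars it to: a flatMap followed by a final map.
  val viaFlatMap: IO[Int] =
    readA.flatMap(a => readB.map(b => a + b))
}
```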

kubukoz · 2021-04-02T19:30:22Z

docs/io.md

+
+We build our program description by using various `IO` constructs, most importantly flatmapping different `IO` values into a chain of computations:
+
+```scala


Prefer `scala mdoc` as the fence language so that your snippets are compiled and checked for errors (you can run that check with `docs/mdoc` in sbt).

kubukoz · 2021-04-02T19:31:08Z

docs/io.md

+Every fiber consists of a sequence of such flatmapped `IO` values.
+More information on fibers can be found in the [tutorial](tutorial.md), but in a nutshell, that's what a fiber really is. The continuation (sequence of flatmapped `IO` values) is then later assigned by the scheduler to run on one of the available threads. 
+Don't forget, the `IO` itself merely holds a description of the program - no database calls have yet been made, no HTTP requests sent, nothing. 
+Only once the runtime interprets it "at the end of the world" (see previous section) will our `IO` chain be materialized on the heap in the form of a fiber, ready to be selected by the scheduler and submitted for execution on a thread.


Suggested change

-Only once the runtime interprets it "at the end of the world" (see previous section) will our `IO` chain be materialized on the heap in the form of a fiber, ready to be selected by the scheduler and submitted for execution on a thread.
+Only once we pass that IO to the runtime, which interprets it "at the end of the world" (see previous section), will our `IO` chain be materialized on the heap in the form of a fiber, ready to be selected by the scheduler and submitted for execution on a thread.

Also, this might be confusing with respect to threads: a single fiber can switch threads at any point.

And every IO is on the heap anyway, whether you run it or not, so it would be clearer to say that passing an IO to the runtime starts the fibers (no need to mention the heap really, it's a low-level detail of the JVM).

kubukoz · 2021-04-02T19:33:26Z

docs/io.md

+Having many more fibers than we have threads is thus OK, because a) fibers reside fully in memory, don't block any system resources and have very little context switching overhead, and b) starvation is prevented by cooperative yielding.
+This is also the reason why it's OK to "block a fiber" - it's only semantically blocking, because all that's really happening is that the fiber gets taken off its thread. The thread itself is not blocked, and it will happily run another fiber.
+
+Now, if we wanted to run the database query on one fiber, send the request on another, and use a third one for logging, all we'd have to do is append `.start` to each `IO`.


I wouldn't recommend `start` this early in the documentation. It's quite a low-level construct, and most concurrent stuff can be done with combinators like `race`, `both` etc.

Kind of agreed. I wouldn't necessarily hide it from the reader, but I agree it should not be in the spotlight either.
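As a rough sketch of the combinator-first approach, assuming Cats Effect 3 (`Combinators`, `query` and `request` are hypothetical names, and the sleeps merely simulate work), `IO.both` and `IO.race` cover the common cases without handling fibers by hand:

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import scala.concurrent.duration._

object Combinators {
  // Simulated effects: a slow database query and a fast HTTP request.
  val query: IO[String] = IO.sleep(500.millis).as("rows")
  val request: IO[Int]  = IO.sleep(10.millis).as(200)

  // Runs both concurrently and waits for both results.
  val together: IO[(String, Int)] = IO.both(query, request)

  // Runs both concurrently, keeps the first result and cancels the loser.
  val fastest: IO[Either[String, Int]] = IO.race(query, request)
}
```

With these timings the request is expected to win the race, so `fastest` should normally produce a `Right`.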

kubukoz · 2021-04-02T19:37:54Z

@slouc thanks for the contribution! I think we can work with that :) see attached comments and let me know what you think. Also curious about @RaasAhsan @djspiewak @vasilmkd et al.'s opinions

slouc · 2021-04-02T20:47:20Z

> I think there's quite a lot of text/theory/history in here without concrete examples that would showcase the abilities of IO. While that's not bad per se (the explanations are quite good), I think that shouldn't be the main focus of the page.
>
> There's also a lot of information written in a quite dense way, and I feel like each paragraph could easily be a couple paragraphs in a section explaining e.g. what the runtime means, how fibers are represented / how they use threads, what Ref/Deferred are etc. - hoping to see other maintainers' opinions on this, I'd like to avoid having a very long page like in CE2 but still not have too much information in just a couple paragraphs of dense text.

I'm afraid that if we just show some examples using race, both, ref, deferred etc. then it basically becomes the showcase page for stuff from Spawn / Concurrent. But that being said, I totally get what you mean about dense text packed in a small page. I'd be happy to split it up and add more text in each section.

For example:

- IO:
  - History / theory
  - Runtime (unsafe / IOApp etc)
  - Supported operations

- Fibers
  - Overview (lightweight, semantic blocking)
  - Continuations / run loop
  - Scheduling
  - Cooperative yielding

It did cross my mind as well that I might be cramming several pages into one, and I considered going for something like the above instead, but then I realised that parts of it (fibers in particular) are already covered in the tutorial. So I didn't want to go and submit a bunch of stuff nobody asked for - instead, I wanted to just cover IO because it's an open issue, and I figured I'd do it by shedding some light on it from several angles.

I think documentation is super important so I'm OK with a bit of back-and-forth and some nitpicking until we all agree 100% on what we want the docs to look like. If we want to go for one long tutorial and short example-based explanations on the side, that's fine, but feels a bit arbitrary to have an IO section and not have a, say, Fibers section. 🤷 BTW I skimmed through your suggestions and they seem fine, but let's first agree on the big picture. 👍

slouc · 2021-04-03T09:55:36Z

@kubukoz Okay, I adopted your comments and spread the text into several different sections. Also marked the PR as draft because we're still sketching the layout.

I didn't introduce new pages yet, everything is still IO, but it's been stretched a bit more, so that we can more easily decide what goes in a different page and what goes away. You'll notice the TODOs on the methods section - these are the blanks that shouldn't be too hard to fill out once we know what we want.

Thanks for all the feedback, it was very useful. 👍 I'm hoping to hear other opinions on this as well.

kubukoz

I need to give it more thought but don't really have a mind for this today. I'll try to look again soon

docs/io.md

slouc · 2021-04-19T13:41:57Z

@kubukoz @djspiewak Hi, I'm going to bump this PR a bit.

My impression is that the general vision for the documentation is not super clear, and I'd love to help, but the only way to converge to something is to have a discussion. Perhaps it's only unclear to me, and there's already an established convention on what we want the docs to look like, but I'm not aware of one. 🤷

So here's a concrete question: How do we envision the documentation page for the IO type?

1. Listing some of the most important methods available on IO and explaining what they do via examples (the "ScalaDoc" approach)
2. Telling a story about how IO works, what it means to be a "monad that captures the description of an effectful program" (the "blogpost" approach)
3. Something in between

The downside of the ScalaDoc approach is, well, we're just repeating the stuff from ScalaDoc. Here's an example of racePair just to illustrate my point. Repeating the information from ScalaDoc is not a big deal in itself, but I feel that people who need information on available methods for things like racing fibers concurrently are usually the ones that are already writing some code, and there's a higher chance that they'll be looking for this info in the ScalaDoc on available IO methods, rather than browsing through docs (I know I do).

On the other hand, beginners will often come to the documentation page first, before they ever lay their eyes (and SBTs) on the library itself. But now we're in a pickle; what if I'm wrong, and what if the biggest proportion of people who read our docs are actually people who are already familiar with the idea of an IO monad and they just want to jump into the technical nitty-gritty that's specific to Cats Effect? They definitely don't need over-verbose explanations of theoretical concepts, they just need "the meat". Your car's technical documentation doesn't tell you anything about combustion engines, does it?

Finally, the downside of the mixed approach is that we could end up with the best of both worlds, but also the worst. For example, the theoretical part might be too concise and packed with dense concepts in a very small space, and therefore not really beginner-friendly, or it could become too big and keep people from finding more concrete library-related information about available methods and best practices.

Maybe I'm overthinking this, but I'd really like to help make CE more approachable and "consumable", and I think these conversations are important. Feel free to suggest some other way forward (for starters, maybe this discussion belongs on Gitter rather than in a PR?). Also, as I said in the beginning, let me know if I'm the only one who's in the dark here, and the consensus already exists - in that case, all I need is a couple of pointers and I can then rework this PR into a mergeable version.

slouc · 2021-04-19T13:50:09Z

Also, to nudge us further in some constructive direction, I'm going to suggest how I'd do it - number 3, but with separate sections. We currently have:

- Overview
- Typeclasses
- Standard Library

One could argue that typeclasses correspond to kernel, standard library corresponds to std, and we could/should have a separate big section for core:

- Overview
- Typeclasses
- Standard Library
- IO

Then we could pack at least three subpages in there:

- A theoretical and perhaps even somewhat historical description (I'm fine with omitting the historical part though)
- A more technical one that resembles a user's manual with listed methods and examples
- And possibly even a third one with some best practices specific to IO (e.g. methods that are not in the type class hierarchy, maybe some caveats etc.)

djspiewak · 2021-04-26T17:43:48Z

Sorry for the manic delay on this! 👀

Add IO documentation

9523dac

kubukoz reviewed Apr 2, 2021


rework the IO docs to reduce density and adopt some PR comments

fc2b40f

slouc marked this pull request as draft April 3, 2021 09:49

make IO docs code snippet compile

7b1a725

kubukoz reviewed Apr 3, 2021


djspiewak added the 📚 docs label Apr 10, 2021

djspiewak added this to the v3.3.0 milestone Aug 30, 2021

i10416 mentioned this pull request Oct 8, 2021

get sidebars-todo.json up-to-date #2398

Merged

djspiewak modified the milestones: v3.3.0, v3.3.1 Nov 15, 2021

djspiewak closed this Sep 19, 2022
