Step 3.11: Async I/O, futures and actors

While threads represent a solution for CPU-bound problems, for I/O-bound problems, traditionally, the solution is async (non-blocking) I/O.

As of now, Rust has no async I/O primitives in its standard library, so "by default" std I/O works synchronously (it blocks the current thread). However, the standard library does provide the core abstractions for building them, which ecosystem crates (like tokio) use to implement primitives for async I/O.

It's important to note that the async story in Rust is still maturing. That's why things can be quite cumbersome at the moment, often causing frustration (especially when it comes to abstractions). wg-async (the async working group) is working on making async Rust simpler, more ergonomic, and more powerful in the future.

Future

The basic primitive of the async story in Rust is the future abstraction (often called a "promise" in other programming languages). Two major concepts distinguish Rust's implementation of futures from those of other programming languages:

  1. Futures are poll-based rather than push-based. This means that after creation, a future is not executed automatically in-place, but must be explicitly driven by some executor (a runtime/event loop for futures). A future does nothing unless polled, so it generally represents a lazy computation.
  2. Futures are zero cost. This means that code written with futures compiles down to something equivalent to (or better than) a “hand-rolled” implementation, which would typically use manual state machines and careful memory management.
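
The poll-based, lazy behaviour described above can be observed with nothing but the standard library. The following is a minimal sketch, not how a real executor is written: `NoopWaker`, `poll_once`, and `demo_laziness` are hypothetical names introduced only for this illustration, and the future is polled by hand exactly once.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for a future that is ready at once.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future a single time, by hand, the way an executor would.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Returns whether the body had run (before polling, after polling).
fn demo_laziness() -> (bool, bool) {
    let ran = Cell::new(false);
    // Creating the future runs none of its body.
    let mut fut = Box::pin(async {
        ran.set(true);
        42
    });
    let before = ran.get();
    // Only polling drives the computation forward.
    let result = poll_once(fut.as_mut());
    assert!(matches!(result, Poll::Ready(42)));
    (before, ran.get())
}

fn main() {
    let (before, after) = demo_laziness();
    println!("body ran before polling: {before}, after polling: {after}");
}
```

The flag stays unset after the `async` block is created and is only set by the explicit `poll()` call.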

Rust provides only basic trait definitions in the std::future module of its standard library. To use futures to their full power, consider using the futures crate (and/or similar ones like futures-lite, futures-time, etc.).

To understand Rust futures concepts and design better, read through the following articles:

It's important to mention that before the futures design was stabilized, the Rust ecosystem used the futures@0.1 crate for quite a long time, so a big part of the ecosystem was built on top of it. By now, only a few outdated or dead crates still use futures@0.1, and, fortunately, they can still be used together with the modern std::future-based ecosystem via the compatibility layer.

async/.await

The async/.await keywords make async programming much more intuitive and ergonomic, and solve numerous problems with types and borrows (which may be quite tricky when using raw futures).

Use async in front of a fn, closure, or block to turn the marked code into a Future. As such, the code will not run immediately, but will only be evaluated when the returned Future is .awaited.

Rust automatically desugars async functions and blocks into the ones returning a Future, applying the correct lifetime capturing and elision rules for the syntax ergonomics.
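
As a rough illustration of this desugaring, the sketch below puts an `async fn` next to an approximately equivalent hand-written function returning `impl Future`. The names `double`, `double_desugared`, and `run_ready` are hypothetical, the real compiler output is an anonymous state machine type rather than this exact code, and `run_ready` only handles futures that complete on their first poll.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: runs a future that never actually suspends.
fn run_ready<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("this sketch only handles immediately ready futures"),
    }
}

// The sugared form: the elided lifetime of `x` is captured by the future.
async fn double(x: &u32) -> u32 {
    *x * 2
}

// Roughly the desugared form: a plain function returning an anonymous
// `Future` whose type is tied to the lifetime of the borrowed argument.
fn double_desugared<'a>(x: &'a u32) -> impl Future<Output = u32> + 'a {
    async move { *x * 2 }
}

fn main() {
    let n = 21;
    println!("{}", run_ready(double(&n)));
    println!("{}", run_ready(double_desugared(&n)));
}
```

Both calls return a value that does nothing until polled, and both borrow `n` for as long as the returned future lives.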

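Since Rust 1.75, the async keyword is supported in trait methods natively. However, traits with such methods cannot be used as trait objects (dyn Trait). For these cases, there is the async-trait crate, which allows async methods in traits by desugaring them into a Boxed Future (the main downside of which is being non-transparent over auto-traits like Send/Sync).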

To better understand the design, desugaring, usage, and features of the async/.await keywords, read through the following articles:

Tasks and Waker

Besides the future abstraction itself, it's important to understand what an asynchronous task is:

Each time a future is polled, it is polled as part of a "task". Tasks are the top-level futures that have been submitted to an executor.

When a task is suspended while waiting for some non-blocking operation to complete (such a task is usually called "parked"), there should be a way to signal the executor to continue polling this task once the operation finishes. The Waker (provided in the task::Context) serves exactly this purpose:

Waker provides a wake() method that can be used to tell the executor that the associated task should be awoken. When wake() is called, the executor knows that the task associated with the Waker is ready to make progress, and its future should be polled again.
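
To make the role of the Waker more concrete, here is a minimal std-only sketch of a hand-written future together with a toy executor. `Delay`, `Shared`, `ThreadWaker`, and `block_on` are hypothetical names for this illustration only. Real runtimes use timers and I/O readiness notifications rather than one sleeping thread per future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// State shared between the future and the thread that completes it.
#[derive(Default)]
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// A hypothetical future that becomes ready after a delay.
struct Delay {
    shared: Arc<Mutex<Shared>>,
}

impl Delay {
    fn new(dur: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared::default()));
        let for_thread = Arc::clone(&shared);
        thread::spawn(move || {
            thread::sleep(dur);
            let mut s = for_thread.lock().unwrap();
            s.done = true;
            // Tell the executor that the task can make progress again.
            if let Some(waker) = s.waker.take() {
                waker.wake();
            }
        });
        Delay { shared }
    }
}

impl Future for Delay {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.shared.lock().unwrap();
        if s.done {
            Poll::Ready(())
        } else {
            // Store the latest waker so the timer thread can wake the task.
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Wakes a task by unparking the thread that runs `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// A toy single-task executor: polls, then parks the thread until woken.
/// Returns the output together with the number of polls it took.
fn block_on<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
        thread::park();
    }
}

fn main() {
    let ((), polls) = block_on(Delay::new(Duration::from_millis(50)));
    println!("finished after {polls} polls");
}
```

The executor does not spin: between polls its thread is parked, and only the `wake()` call (or a spurious wakeup, which merely causes an extra poll) makes it poll the future again.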

To better understand the Waker design, usage, and features, read through the following articles:

More reading

Async I/O

Async I/O in Rust is possible due to two main ingredients: non-blocking I/O operations provided by the operating system, and an asynchronous runtime, which wraps those operations into usable asynchronous abstractions and provides an event loop for executing and driving them to completion.

To better understand the design, concepts, usage, and features of mio and tokio, read through the following articles:

Non-blocking I/O

Async programming is not possible without support for non-blocking I/O, which is exposed through different APIs on different operating systems, for example: epoll on Linux (or the promising io_uring), kqueue on macOS/iOS, IOCP on Windows.

The low-level crates, like mio (powering tokio) and polling (powering async-std), provide a single unified multi-platform interface to the majority of those APIs. There are also low-level crates specialized for a single API, like io-uring.
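
The difference from blocking I/O can be seen even with the standard library alone. The sketch below (with the hypothetical helper name `accept_would_block`) switches a socket into non-blocking mode and shows that an operation which cannot complete yet returns an error right away instead of suspending the thread. It assumes a loopback interface is available.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Returns `true` if `accept()` reported `WouldBlock` instead of blocking.
fn accept_would_block() -> std::io::Result<bool> {
    // Port 0 asks the OS for any free port; nobody connects to it here.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // Switch the socket into non-blocking mode.
    listener.set_nonblocking(true)?;
    match listener.accept() {
        // A connection was already pending (not expected in this sketch).
        Ok(_) => Ok(false),
        // No connection yet: the call returns immediately instead of waiting.
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(true),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    println!("accept would block: {}", accept_would_block()?);
    Ok(())
}
```

The readiness APIs listed above (epoll, kqueue, and so on) exist so that a program can learn when retrying such an operation makes sense, rather than calling it in a busy loop.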

Runtime

The high-level crates, like tokio (the pioneer and by far the most mature) and async-std (don't be misled by its name: it's neither official nor std-related, it's just the name chosen by its authors), provide not only an executor for running Futures, but also high-level APIs for non-blocking I/O, timers, and synchronization primitives for use in asynchronous contexts (the usual blocking synchronization primitives should not be held across .await points, as they may block the whole executor on its current thread).

All Rust asynchronous runtimes for Futures implement the idea of cooperative multitasking, meaning that tasks (Futures in our case) yield control back to their runtime voluntarily (at .await points in our case), in contrast with preemptive multitasking, where the runtime can suspend a task and take control back whenever it decides to (like with OS threads or in the Erlang VM). This gives the benefit of precise control over what is executed and how, but has the disadvantage of requiring great care in how asynchronous tasks are organized (like avoiding blocking them with synchronous or CPU-bound operations, and yielding manually in busy loops).
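
Cooperative scheduling can be sketched with a toy single-threaded round-robin executor. Everything here (`YieldNow`, `worker`, `run_all`, `interleaving`) is a hypothetical illustration, and the executor simply re-polls pending tasks instead of honouring wakers as a real runtime would.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that returns `Pending` once, giving other tasks a turn.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

type Log = Rc<RefCell<Vec<String>>>;
type Task = Pin<Box<dyn Future<Output = ()>>>;

async fn worker(name: &'static str, log: Log) {
    for step in 0..2 {
        log.borrow_mut().push(format!("{name}{step}"));
        // Voluntary yield point: control returns to the executor here.
        YieldNow(false).await;
    }
}

/// Toy round-robin executor: every pending task goes to the back of the queue.
fn run_all(mut tasks: VecDeque<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while let Some(mut task) = tasks.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            tasks.push_back(task);
        }
    }
}

/// Runs two workers and returns the order in which their steps executed.
fn interleaving() -> Vec<String> {
    let log: Log = Rc::default();
    let mut tasks: VecDeque<Task> = VecDeque::new();
    tasks.push_back(Box::pin(worker("a", log.clone())));
    tasks.push_back(Box::pin(worker("b", log.clone())));
    run_all(tasks);
    let out = log.borrow().clone();
    out
}

fn main() {
    println!("{:?}", interleaving()); // prints ["a0", "b0", "a1", "b1"]
}
```

The steps interleave only because each worker awaits `YieldNow`. Without that yield point, the first worker would run to completion before the second one ever started, which is exactly the hazard of long CPU-bound sections in async tasks.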

It's also important to classify Rust asynchronous runtimes in the following manner:

Unfortunately, at the moment, there is no meaningful way to abstract over multiple asynchronous runtimes in Rust. That's why authors of the libraries using non-blocking I/O either stick with a single concrete runtime only (tokio, mostly), or support multiple runtimes via Cargo features.

For better understanding, read through the following articles:

Actors

The actor model is another widespread and well-known concurrent programming paradigm. It fits quite well for solving major concurrent communication problems, so many languages have adopted it as their main concurrency paradigm (the most famous implementations are Akka and Erlang).

Actor model was put forth by Carl Hewitt in 1973 and it adopts the philosophy that everything is an actor. This is similar to the everything is an object philosophy used by some object-oriented programming languages.

It is inherently asynchronous, a message sender will not block whether the reader is ready to pull from the mailbox or not, instead the message goes into a queue usually called a "mailbox". Which is convenient, but it's a bit harder to reason about and mailboxes potentially have to hold a lot of messages.

Each process has a single mailbox, messages are put into the receiver's mailbox by the sender, and fetched by the receiver.

It's very similar to, and largely interchangeable with, the Communicating Sequential Processes (CSP) model, as it operates on the same level of abstraction, but the main difference can be described like this:

Actors model represents identifiable processes (actors) with non-identifiable communication (message delivery), while CSP model represents non-identifiable processes with identifiable communication (channels). To deliver a message in actors model we should "name" the actor, while in CSP model we should "name" the channel.

In Rust, the actor abstraction is mainly useful for expressing some long-lived state to communicate with (like a background worker or a WebSocket connection, for example).
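
A minimal sketch of this idea, using only a thread and a std channel as the mailbox, might look as follows. The `Msg` enum and the functions `spawn_counter` and `counter_total` are hypothetical names for this illustration. Actor libraries such as actix add addressing, supervision, and async message handling on top of the same principle.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Messages understood by the hypothetical counter actor.
enum Msg {
    Add(u64),
    /// Asks for the current total; the answer is sent to the enclosed sender.
    Get(Sender<u64>),
}

/// Spawns the actor and returns its "address" (the sending half of the mailbox).
fn spawn_counter() -> Sender<Msg> {
    let (tx, mailbox) = mpsc::channel::<Msg>();
    thread::spawn(move || {
        // The state is owned by the actor alone, so no locks are needed.
        let mut total = 0;
        // Messages are processed one at a time, in arrival order.
        for msg in mailbox {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(total);
                }
            }
        }
        // The loop (and the actor) ends once every sender is dropped.
    });
    tx
}

/// Sends all values to a fresh counter actor and asks it for the total.
fn counter_total(values: &[u64]) -> u64 {
    let counter = spawn_counter();
    for &v in values {
        // Sending never blocks: the message just lands in the mailbox.
        counter.send(Msg::Add(v)).expect("actor is alive");
    }
    let (reply_tx, reply_rx) = mpsc::channel();
    counter.send(Msg::Get(reply_tx)).expect("actor is alive");
    reply_rx.recv().expect("actor replied")
}

fn main() {
    println!("total = {}", counter_total(&[1, 2, 3])); // prints "total = 6"
}
```

The sender only needs the actor's address to communicate with the long-lived state, and the request/reply pattern is built by passing a reply channel inside the message.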

The most famous actor implementation in Rust is actix. At the time it was designed, it also served as a "glue" to unite the sync and async worlds, providing both sync and async actor implementations. Nowadays, however, using spawn_blocking() is usually a more convenient alternative for this.

quickwit-actors is another simple implementation of actors, with its own advantages, built specifically for Quickwit's needs.

More general-purpose and complex actor system implementations (similar to Akka) are bastion and riker.

To better understand actor design, concepts, usage, and implementations, read through the following articles:

Multithreading vs Async

Multithreaded programming is all about concurrent execution of different functions. Async programming is about non-blocking execution between functions, and we can apply async with either single-threaded or multithreaded programming.

So, multithreading is one form of asynchronous programming.

Let’s take a simple analogy:

  • Synchronous: you cook the eggs, then you cook the toast.
  • Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
  • Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.

From that analogy, we can conclude that multithreading is about workers, while async is about tasks.

Synchronous vs Async vs Multithreading

Task

Estimated time: 2 days

Implement an async-driven CLI tool, which downloads specified web pages:

cargo run -p step_3_11 -- [--max-threads=<number>] <file>

It must read a list of links from the <file>, and then concurrently download the content of each link into a separate .html file (named after the link).

The --max-threads argument must control the maximum number of simultaneously running threads in the program (it should default to the number of CPUs).

Questions

After completing everything above, you should be able to answer (and understand why) the following questions:

  • What is asynchronous programming? How does it relate to multithreading? Which problems does it solve? What are the prerequisites for it to exist?
  • How does non-blocking I/O work? How does it differ from blocking I/O?
  • What is a Future? Why do we need it? How does it work in Rust and how do its semantics differ from other programming languages? What makes it zero-cost?
  • What is async/.await? How do they desugar into a Future? Why are they vital for ergonomics?
  • What is an asynchronous task? How does it compare to a Future?
  • What is a Waker? How does it work? Why is it required?
  • What is an asynchronous runtime? Which parts does it usually consist of?
  • What kind of multitasking is represented by Futures in Rust? Which advantages and disadvantages does it have?
  • What kinds of asynchronous runtimes exist in Rust regarding multithreading? Which advantages and disadvantages does each one have?
  • Why is blocking an asynchronous runtime bad? How can it be avoided in practice?
  • What are the key points of the actor model concurrency paradigm? How may it be useful in Rust?