Add a schematic state machine implementing Future (#2048)

@fw-immunant generated something like this on the fly while teaching the course, and I thought it was great. I think having a _schematic_ understanding of what's going on here helps students through some of the pitfalls. In particular, it motivates `Pin`, which is where @fw-immunant did this derivation.
2025-07-04 05:40:29 +02:00 · 2025-06-03 08:45:22 -04:00
parent 2f37846e44
commit f9e58d9596
4 changed files with 131 additions and 15 deletions
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -407,6 +407,7 @@
 - [Async Basics](concurrency/async.md)
  - [`async`/`await`](concurrency/async/async-await.md)
  - [Futures](concurrency/async/futures.md)
+  - [State Machine](concurrency/async/state-machine.md)
  - [Runtimes](concurrency/async/runtimes.md)
    - [Tokio](concurrency/async/runtimes/tokio.md)
  - [Tasks](concurrency/async/tasks.md)
--- a/src/concurrency/async-pitfalls/pin.md
+++ b/src/concurrency/async-pitfalls/pin.md
@@ -4,13 +4,10 @@ minutes: 20

 # `Pin`

-Async blocks and functions return types implementing the `Future` trait. The
-type returned is the result of a compiler transformation which turns local
-variables into data stored inside the future.
-
-Some of those variables can hold pointers to other local variables. Because of
-that, the future should never be moved to a different memory location, as it
-would invalidate those pointers.
+Recall that an async function or block creates a type that implements `Future`
+and contains all of its local variables. Some of those variables can hold
+references (pointers) to other local variables. To ensure those remain valid,
+the future can never be moved to a different memory location.

 To prevent moving the future type in memory, it can only be polled through a
 pinned pointer. `Pin` is a wrapper around a reference that disallows all
--- a/src/concurrency/async/futures.md
+++ b/src/concurrency/async/futures.md
@@ -36,14 +36,11 @@ pause until that Future is ready, and then evaluates to its output.
 - The `Future` and `Poll` types are implemented exactly as shown; click the
  links to show the implementations in the docs.

- We will not get to `Pin` and `Context`, as we will focus on writing async
-  code, rather than building new async primitives. Briefly:
-
-  - `Context` allows a Future to schedule itself to be polled again when an
-    event occurs.
+- `Context` allows a Future to schedule itself to be polled again when an event
+  such as a timeout occurs.

 - `Pin` ensures that the Future isn't moved in memory, so that pointers into
-    that future remain valid. This is required to allow references to remain
-    valid after an `.await`.
+  that future remain valid. This is required to allow references to remain valid
+  after an `.await`. We will address `Pin` in the "Pitfalls" segment.

 </details>
--- a/src/concurrency/async/state-machine.md
+++ b/src/concurrency/async/state-machine.md
@@ -0,0 +1,121 @@
+---
+minutes: 10
+---
+
+# State Machine
+
+Rust transforms an async function or block to a hidden type that implements
+`Future`, using a state machine to track the function's progress. The details of
+this transform are complex, but it helps to have a schematic understanding of
+what is happening. The following function
+
+```rust,compile_fail
+/// Sum two D10 rolls plus a modifier.
+async fn two_d10(modifier: u32) -> u32 {
+    let first_roll = roll_d10().await;
+    let second_roll = roll_d10().await;
+    first_roll + second_roll + modifier
+}
+```
+
+is transformed to something like
+
+```rust,editable,compile_fail
+use std::future::Future;
+use std::pin::Pin;
+use std::task::{Context, Poll};
+
+/// Sum two D10 rolls plus a modifier.
+fn two_d10(modifier: u32) -> TwoD10 {
+    TwoD10::Init { modifier }
+}
+
+enum TwoD10 {
+    // Function has not begun yet.
+    Init { modifier: u32 },
+    // Waiting for first `.await` to complete.
+    FirstRoll { modifier: u32, fut: RollD10Future },
+    // Waiting for second `.await` to complete.
+    SecondRoll { modifier: u32, first_roll: u32, fut: RollD10Future },
+}
+
+impl Future for TwoD10 {
+    type Output = u32;
+    fn poll(mut self: Pin<&mut Self>, ctx: &mut Context) -> Poll<Self::Output> {
+        loop {
+            match *self {
+                TwoD10::Init { modifier } => {
+                    // Create future for first dice roll.
+                    let fut = roll_d10();
+                    *self = TwoD10::FirstRoll { modifier, fut };
+                }
+                TwoD10::FirstRoll { modifier, ref mut fut } => {
+                    // Poll sub-future for first dice roll (assumes it is `Unpin`).
+                    if let Poll::Ready(first_roll) = Pin::new(fut).poll(ctx) {
+                        // Create future for second roll.
+                        let fut = roll_d10();
+                        *self = TwoD10::SecondRoll { modifier, first_roll, fut };
+                    } else {
+                        return Poll::Pending;
+                    }
+                }
+                TwoD10::SecondRoll { modifier, first_roll, ref mut fut } => {
+                    // Poll sub-future for second dice roll (assumes it is `Unpin`).
+                    if let Poll::Ready(second_roll) = Pin::new(fut).poll(ctx) {
+                        return Poll::Ready(first_roll + second_roll + modifier);
+                    } else {
+                        return Poll::Pending;
+                    }
+                }
+            }
+        }
+    }
+}
+```
+
+<details>
+
+This example is illustrative, and isn't an accurate representation of the Rust
+compiler's transformation. The important things to notice here are:
+
+- Calling an async function does nothing but construct and return a future.
+- All local variables are stored in the function's future, using an enum to
+  identify where execution is currently suspended.
+- An `.await` in the async function is translated into a new state containing
+  all live variables and the awaited future. The `loop` then handles that
+  updated state, polling the future until it returns `Poll::Ready`.
+- Execution continues eagerly until a `Poll::Pending` occurs. In this simple
+  example, every future is ready immediately.
+- `main` contains a naïve executor, which just busy-loops until the future is
+  ready. We will discuss real executors shortly.
+
+# More to Explore
+
+Imagine the `Future` data structure for a deeply nested stack of async
+functions. Each function's `Future` contains the `Future` structures for the
+functions it calls. This can result in unexpectedly large compiler-generated
+`Future` types.
+
+This also means that recursive async functions are challenging. Compare this to
+the common error of building a recursive type, such as
+
+```rust,compile_fail
+enum LinkedList<T> {
+    Node { value: T, next: LinkedList<T> },
+    Nil,
+}
+```
+
+The fix for a recursive type is to add a layer of indirection, such as with
+`Box`. Similarly, a recursive async function must box the recursive future:
+
+```rust,editable
+async fn count_to(n: u32) {
+    if n > 0 {
+        Box::pin(count_to(n - 1)).await;
+        println!("{n}");
+    }
+}
+```
+
+</details>
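
The speaker notes in `state-machine.md` mention a `main` with a naïve executor, but the listing in this diff defines neither `main` nor `roll_d10`. Below is a minimal runnable sketch of how the pieces could fit together. Everything not in the diff is an assumption made for illustration: `RollD10Future` is a hypothetical stub that is pending on its first poll and then yields a fixed roll of 5, `NoopWaker` and `block_on` are hypothetical helpers, and `Pin::new` is used on the sub-future because the stub is `Unpin`. It is not the compiler's actual transformation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the future behind `roll_d10`: pending on the
/// first poll, then ready with a fixed value.
struct RollD10Future {
    polled_once: bool,
}

impl Future for RollD10Future {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _ctx: &mut Context<'_>) -> Poll<u32> {
        if self.polled_once {
            Poll::Ready(5) // Fixed "roll" so the result is deterministic.
        } else {
            self.polled_once = true;
            Poll::Pending
        }
    }
}

fn roll_d10() -> RollD10Future {
    RollD10Future { polled_once: false }
}

/// Same states as in the diff: one per suspension point, plus the start.
enum TwoD10 {
    Init { modifier: u32 },
    FirstRoll { modifier: u32, fut: RollD10Future },
    SecondRoll { modifier: u32, first_roll: u32, fut: RollD10Future },
}

/// Calling the "async function" only constructs the future.
fn two_d10(modifier: u32) -> TwoD10 {
    TwoD10::Init { modifier }
}

impl Future for TwoD10 {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match *self {
                TwoD10::Init { modifier } => {
                    *self = TwoD10::FirstRoll { modifier, fut: roll_d10() };
                }
                TwoD10::FirstRoll { modifier, ref mut fut } => {
                    // `Pin::new` is only valid because the stub is `Unpin`.
                    match Pin::new(fut).poll(ctx) {
                        Poll::Ready(first_roll) => {
                            let fut = roll_d10();
                            *self = TwoD10::SecondRoll { modifier, first_roll, fut };
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                TwoD10::SecondRoll { modifier, first_roll, ref mut fut } => {
                    return match Pin::new(fut).poll(ctx) {
                        Poll::Ready(second_roll) => {
                            Poll::Ready(first_roll + second_roll + modifier)
                        }
                        Poll::Pending => Poll::Pending,
                    };
                }
            }
        }
    }
}

/// Waker that does nothing; a busy-looping executor never waits to be woken.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor: polls in a busy loop and reports how many polls it took.
fn block_on<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut ctx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut fut).poll(&mut ctx) {
            return (output, polls);
        }
    }
}

fn main() {
    let (total, polls) = block_on(two_d10(3));
    println!("total = {total}, polls = {polls}");
}
```

With this stub, each sub-future suspends once, so the executor has to poll the outer future more than once and the state machine resumes from the stored variant each time. A real executor would park until the waker is invoked instead of spinning.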