mirror of https://github.com/google/comprehensive-rust.git synced 2025-08-08 00:12:51 +02:00

rework the initial typestate no-generic content

Glen De Cauwsemaecker
2025-08-02 11:46:36 +02:00
parent 4b0870eb35
commit 14cc136c3e
3 changed files with 156 additions and 63 deletions


@@ -438,6 +438,7 @@
- [Parse, Don't Validate](idiomatic/leveraging-the-type-system/newtype-pattern/parse-don-t-validate.md)
- [Is It Encapsulated?](idiomatic/leveraging-the-type-system/newtype-pattern/is-it-encapsulated.md)
- [Typestate Pattern](idiomatic/leveraging-the-type-system/typestate-pattern.md)
- [Typestate Pattern Example](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-example.md)
- [Typestate Pattern with Generics](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics.md)
---


@@ -1,94 +1,88 @@
---
minutes: 15
minutes: 30
---
## Typestate Pattern
## Typestate Pattern: Problem
Typestate is the practice of encoding a part of the state of the value in its type, preventing incorrect or inapplicable operations from being called on the value.
How can we ensure that only valid operations are allowed on a value based on its
current state?
```rust,editable
use std::fmt::Write as _;
```rust
# use std::fmt::Write;
#[derive(Default)]
struct Serializer { output: String }
struct SerializeStruct { serializer: Serializer }
impl Serializer {
fn serialize_struct(mut self, name: &str) -> SerializeStruct {
let _ = writeln!(&mut self.output, "{name} {{");
SerializeStruct { serializer: self }
}
struct Serializer {
output: String,
}
impl SerializeStruct {
fn serialize_field(mut self, key: &str, value: &str) -> Self {
let _ = writeln!(&mut self.serializer.output, " {key}={value};");
self
impl Serializer {
fn serialize_struct_start(&mut self, name: &str) {
let _ = writeln!(&mut self.output, "{name} {{");
}
fn finish_struct(mut self) -> Serializer {
self.serializer.output.push_str("}\n");
self.serializer
fn serialize_struct_field(&mut self, key: &str, value: &str) {
let _ = writeln!(&mut self.output, " {key}={value};");
}
fn serialize_struct_end(&mut self) {
self.output.push_str("}\n");
}
fn finish(self) -> String {
self.output
}
}
fn main() {
let serializer = Serializer::default()
.serialize_struct("User")
.serialize_field("id", "42")
.serialize_field("name", "Alice")
.finish_struct();
println!("{}", serializer.output);
let mut serializer = Serializer::default();
serializer.serialize_struct_start("User");
serializer.serialize_struct_field("id", "42");
serializer.serialize_struct_field("name", "Alice");
// serializer.serialize_struct_end(); // ← Oops! Forgotten
println!("{}", serializer.finish());
}
```
<details>
- This example is inspired by
[Serde's `Serializer` trait](https://docs.rs/serde/latest/serde/ser/trait.Serializer.html).
For a deeper explanation of how Serde models serialization as a state machine,
see <https://serde.rs/impl-serializer.html>.
- The typestate pattern allows us to model state machines using Rust’s type
system. In this case, the state machine is a simple serializer.
- The key idea is that at each state in the process, we can only
do the actions which are valid for that state. Transitions between
states happen by consuming one value and producing another.
- This `Serializer` is meant to write a structured value. The expected usage
follows this sequence:
```bob
+------------+ serialize struct +-----------------+
| Serializer +-------------------->| SerializeStruct |<-------+
+------------+ +-+-----+---------+ |
^ | | |
| finish struct | | serialize field |
+-----------------------------+ +------------------+
serialize struct start
-+---------------------
|
+--> serialize struct field
-+---------------------
|
+--> serialize struct field
-+---------------------
|
+--> serialize struct end
```
- In the example above:
- However, in this example we forgot to call `serialize_struct_end()` before
`finish()`. As a result, the serialized output is incomplete or syntactically
incorrect.
- Once we begin serializing a struct, the `Serializer` is moved into the
`SerializeStruct` state. At that point, we no longer have access to the
original `Serializer`.
- One approach to fix this would be to track internal state manually, and return
a `Result` from methods like `serialize_struct_field()` or `finish()` if the
current state is invalid.
- While in the `SerializeStruct` state, we can only call methods related to
writing fields. We cannot use the same instance to serialize a tuple, list,
or primitive. Those constructors simply do not exist here.
- But this has downsides:
- Only after calling `finish_struct` do we get the `Serializer` back. At that
point, we can inspect the output or start a new serialization session.
- It is easy to get wrong as an implementer. Rust’s type system cannot help
enforce the correctness of our state transitions.
- If we forget to call `finish_struct` and drop the `SerializeStruct` instead,
the original `Serializer` is lost. This ensures that incomplete or invalid
output can never be observed.
- It also places an unnecessary burden on the user, who must handle `Result`
values at runtime for misuse that is already visible in the source code.
- By contrast, if all methods were defined on `Serializer` itself, nothing would
prevent users from mixing serialization modes or leaving a struct unfinished.
- A better solution is to model the valid state transitions directly in the type
system.
- This pattern avoids such misuse by making it **impossible to represent invalid
transitions**.
- One downside of typestate modeling is potential code duplication between
states. In the next section, we will see how to use **generics** to reduce
duplication while preserving correctness.
In the next slide, we will apply the **typestate pattern** to enforce correct
usage at compile time and make invalid states unrepresentable.
</details>
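
The manual state-tracking approach mentioned in the notes could look roughly like the sketch below. It is only an illustration of that idea, not code from the course: the `State` and `StateError` types and the `state` field are hypothetical names introduced here. Every method now has to check the current state and return a `Result`, which shows both downsides at once: the checks are easy to get wrong, and callers must handle errors for mistakes that are already visible in the source.

```rust
use std::fmt::Write as _;

// Hypothetical types for this sketch; they do not appear in the slides.
#[derive(Debug, Default, PartialEq)]
enum State {
    #[default]
    Idle,
    InStruct,
}

#[derive(Debug, PartialEq)]
enum StateError {
    NotInStruct,
    AlreadyInStruct,
    UnfinishedStruct,
}

#[derive(Default)]
struct Serializer {
    output: String,
    state: State, // tracked by hand; the compiler cannot check it for us
}

impl Serializer {
    fn serialize_struct_start(&mut self, name: &str) -> Result<(), StateError> {
        if self.state != State::Idle {
            return Err(StateError::AlreadyInStruct);
        }
        let _ = writeln!(&mut self.output, "{name} {{");
        self.state = State::InStruct;
        Ok(())
    }

    fn serialize_struct_field(&mut self, key: &str, value: &str) -> Result<(), StateError> {
        if self.state != State::InStruct {
            return Err(StateError::NotInStruct);
        }
        let _ = writeln!(&mut self.output, " {key}={value};");
        Ok(())
    }

    fn serialize_struct_end(&mut self) -> Result<(), StateError> {
        if self.state != State::InStruct {
            return Err(StateError::NotInStruct);
        }
        self.output.push_str("}\n");
        self.state = State::Idle;
        Ok(())
    }

    fn finish(self) -> Result<String, StateError> {
        if self.state != State::Idle {
            return Err(StateError::UnfinishedStruct);
        }
        Ok(self.output)
    }
}

fn main() {
    let mut serializer = Serializer::default();
    serializer.serialize_struct_start("User").unwrap();
    serializer.serialize_struct_field("id", "42").unwrap();
    // `serialize_struct_end()` is forgotten again, but `finish` now reports it
    // at runtime instead of silently returning incomplete output.
    match serializer.finish() {
        Ok(output) => println!("{output}"),
        Err(err) => println!("error: {err:?}"),
    }
}
```

The mistake is still only detected when the program runs, which is the gap the typestate pattern closes.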


@@ -0,0 +1,98 @@
## Typestate Pattern: Example
The typestate pattern encodes part of a value’s runtime state into its type.
This allows us to prevent invalid or inapplicable operations at compile time.
```rust,editable
use std::fmt::Write as _;

#[derive(Default)]
struct Serializer {
    output: String,
}

struct SerializeStruct {
    serializer: Serializer,
}

impl Serializer {
    fn serialize_struct(mut self, name: &str) -> SerializeStruct {
        let _ = writeln!(&mut self.output, "{name} {{");
        SerializeStruct { serializer: self }
    }

    fn finish(self) -> String {
        self.output
    }
}

impl SerializeStruct {
    fn serialize_field(mut self, key: &str, value: &str) -> Self {
        let _ = writeln!(&mut self.serializer.output, " {key}={value};");
        self
    }

    fn finish_struct(mut self) -> Serializer {
        self.serializer.output.push_str("}\n");
        self.serializer
    }
}

fn main() {
    let serializer = Serializer::default()
        .serialize_struct("User")
        .serialize_field("id", "42")
        .serialize_field("name", "Alice")
        .finish_struct();
    println!("{}", serializer.finish());
}
```
<details>
- This example is inspired by Serde’s
[`Serializer` trait](https://docs.rs/serde/latest/serde/ser/trait.Serializer.html).
Its methods, such as `serialize_struct`, return dedicated types like
`SerializeStruct`, a typestate-style design that ensures serialization follows
a valid structure. For more, see <https://serde.rs/impl-serializer.html>.
- The key idea behind typestate is that state transitions happen by consuming a
value and producing a new one. At each step, only operations valid for that
state are available.
```bob
+------------+  serialize struct   +-----------------+
| Serializer +-------------------->| SerializeStruct |<-------+
+--+---------+                     +-+-----+---------+        |
   |   ^                             |     |                  |
   |   |        finish struct        |     | serialize field  |
   |   +-----------------------------+     +------------------+
   |
   +---> finish
```
- In this example:
- We begin with a `Serializer`, which only allows us to start serializing a
struct.
- Once we call `.serialize_struct(...)`, ownership moves into a
`SerializeStruct` value. From that point on, we can only call methods
related to serializing struct fields.
- The original `Serializer` is no longer accessible — preventing us from
mixing modes (like writing a tuple or primitive mid-struct) or calling
`finish()` too early.
- Only after calling `.finish_struct()` do we receive the `Serializer` back.
At that point, the output can be finalized or reused.
- If we forget to call `finish_struct()` and drop the `SerializeStruct` early,
the `Serializer` is dropped along with it, so its incomplete output can
never be retrieved.
- By contrast, if we had implemented everything on `Serializer` directly, as
seen on the previous slide, nothing would stop someone from skipping important
steps or mixing serialization flows.
</details>
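
To illustrate the point that the `Serializer` can be reused once `finish_struct()` hands it back, here is a small sketch built on the same two types as the example. The `serialize_two_structs` helper and the `Group` data are hypothetical, added only for this illustration. The commented-out line in `main` shows a call that the compiler would reject, since `SerializeStruct` has no `finish` method.

```rust
use std::fmt::Write as _;

#[derive(Default)]
struct Serializer {
    output: String,
}

struct SerializeStruct {
    serializer: Serializer,
}

impl Serializer {
    fn serialize_struct(mut self, name: &str) -> SerializeStruct {
        let _ = writeln!(&mut self.output, "{name} {{");
        SerializeStruct { serializer: self }
    }

    fn finish(self) -> String {
        self.output
    }
}

impl SerializeStruct {
    fn serialize_field(mut self, key: &str, value: &str) -> Self {
        let _ = writeln!(&mut self.serializer.output, " {key}={value};");
        self
    }

    fn finish_struct(mut self) -> Serializer {
        self.serializer.output.push_str("}\n");
        self.serializer
    }
}

// Hypothetical helper: serializes two structs with one `Serializer`.
fn serialize_two_structs() -> String {
    let serializer = Serializer::default()
        .serialize_struct("User")
        .serialize_field("id", "42")
        .finish_struct();

    // `finish_struct` returned the `Serializer`, so a second struct can start.
    let serializer = serializer
        .serialize_struct("Group")
        .serialize_field("name", "admins")
        .finish_struct();

    serializer.finish()
}

fn main() {
    // This would not compile: `SerializeStruct` has no `finish` method, so an
    // unfinished struct can never be turned into output.
    // let output = Serializer::default().serialize_struct("User").finish();

    print!("{}", serialize_two_structs());
}
```

Because each transition consumes `self`, the only way to reach `finish()` is through a `Serializer` value, and the only way to get one back after `serialize_struct` is `finish_struct()`.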