mirror of https://github.com/google/comprehensive-rust.git synced 2025-11-25 23:53:12 +02:00

"borrow checker invariants" section of the "leveraging the type system" chapter (#2867)

Adds materials on the "leveraging the type system/borrow checker
invariants" subject.

I'm still calibrating what's expected subject-and-style wise, so do
spell out things where I've drifted off mark.

---------

Co-authored-by: tall-vase <fiona@mainmatter.com>
Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>
This commit is contained in:
tall-vase
2025-11-07 13:12:59 +00:00
committed by GitHub
parent 03cd040dc2
commit bb4db3d7b8
9 changed files with 749 additions and 0 deletions


@@ -455,6 +455,14 @@
- [Serializer: implement Struct](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics/struct.md)
- [Serializer: implement Property](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics/property.md)
- [Serializer: Complete implementation](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics/complete.md)
- [Borrow checking invariants](idiomatic/leveraging-the-type-system/borrow-checker-invariants.md)
- [Lifetimes and Borrows: the Abstract Rules](idiomatic/leveraging-the-type-system/borrow-checker-invariants/generalizing-ownership.md)
- [Single-use values](idiomatic/leveraging-the-type-system/borrow-checker-invariants/single-use-values.md)
- [Mutually Exclusive References / "Aliasing XOR Mutability"](idiomatic/leveraging-the-type-system/borrow-checker-invariants/aliasing-xor-mutability.md)
- [PhantomData and Types](idiomatic/leveraging-the-type-system/borrow-checker-invariants/phantomdata-01-types.md)
- [PhantomData and Types (implementation)](idiomatic/leveraging-the-type-system/borrow-checker-invariants/phantomdata-02-types-implemented.md)
- [PhantomData: Lifetimes for External Resources](idiomatic/leveraging-the-type-system/borrow-checker-invariants/phantomdata-03-lifetimes.md)
- [PhantomData: OwnedFd & BorrowedFd](idiomatic/leveraging-the-type-system/borrow-checker-invariants/phantomdata-04-borrowedfd.md)
- [Token Types](idiomatic/leveraging-the-type-system/token-types.md)
- [Permission Tokens](idiomatic/leveraging-the-type-system/token-types/permission-tokens.md)
- [Token Types with Data: Mutex Guards](idiomatic/leveraging-the-type-system/token-types/mutex-guard.md)


@@ -0,0 +1,116 @@
---
minutes: 15
---
# Using the Borrow checker to enforce Invariants
The borrow checker, while added to enforce memory ownership, can model other
problems and prevent API misuse.
```rust,editable
/// Doors can be open or closed, and you need the right key to lock or unlock
/// one. Modelled with a Shared key and Owned door.
pub struct DoorKey {
pub key_shape: u32,
}
pub struct LockedDoor {
lock_shape: u32,
}
pub struct OpenDoor {
lock_shape: u32,
}
fn open_door(key: &DoorKey, door: LockedDoor) -> Result<OpenDoor, LockedDoor> {
if door.lock_shape == key.key_shape {
Ok(OpenDoor { lock_shape: door.lock_shape })
} else {
Err(door)
}
}
fn close_door(key: &DoorKey, door: OpenDoor) -> Result<LockedDoor, OpenDoor> {
if door.lock_shape == key.key_shape {
Ok(LockedDoor { lock_shape: door.lock_shape })
} else {
Err(door)
}
}
fn main() {
let key = DoorKey { key_shape: 7 };
let closed_door = LockedDoor { lock_shape: 7 };
let opened_door = open_door(&key, closed_door);
if let Ok(opened_door) = opened_door {
println!("Opened the door with key shape '{}'", key.key_shape);
} else {
eprintln!(
"Door wasn't opened! Your key only opens locks with shape '{}'",
key.key_shape
);
}
}
```
<details>
- We've seen the borrow checker prevent memory safety bugs (use-after-free, data
races).
- We've also used types to shape and restrict APIs already using
[the Typestate pattern](../leveraging-the-type-system/typestate-pattern.md).
- Language features are often introduced for a specific purpose.
Over time, users may find ways of using a feature that were not predicted
when it was introduced.
Java 5 introduced Generics in 2004 with the
[main stated purpose of enabling type-safe collections](https://jcp.org/en/jsr/detail?id=14).
Adoption was slow at first, but some new projects began designing their APIs
around generics from the beginning.
Since then, users and developers of the language expanded the use of generics
to other areas of type-safe API design:
- Class information can be held onto via Java's `Class<T>` or Guava's
`TypeToken<T>`.
- The Builder pattern can be implemented using Recursive Generics.
We aim to do something similar here: Even though the borrow checker was
introduced to prevent use-after-free and data races, we treat it as just
another API design tool.
It can be used to model program properties that have nothing to do with
preventing memory safety bugs.
- To use the borrow checker as a problem solving tool, we will need to "forget"
that the original purpose of it is to prevent mutable aliasing in the context
of preventing use-after-frees and data races.
We should imagine working within situations where the rules are the same but
the meaning is slightly different.
- This example uses ownership and borrowing to model the state of a physical
door.
`open_door` **consumes** a `LockedDoor` and returns a new `OpenDoor`. The old
`LockedDoor` value is no longer available.
If the wrong key is used, the door is left locked. It is returned as an `Err`
case of the `Result`.
It is a compile-time error to try and use a door that has already been opened.
- Similarly, `close_door` consumes an `OpenDoor`, preventing closing the door
twice at compile time.
- The rules of the borrow checker exist to prevent memory safety bugs, but the
underlying logical system does not "know" what memory is.
All the borrow checker does is enforce a specific set of rules of how users
can order operations.
This is just one case of piggy-backing onto the rules of the borrow checker to
design APIs to be harder or impossible to misuse.
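
A minimal sketch of the consume-and-return flow described above, reusing the slide's types. The helper `open_then_close` and the second key are hypothetical, added only to show that a failed attempt hands the same door back and that a moved door cannot be reused.

```rust
pub struct DoorKey {
    pub key_shape: u32,
}
pub struct LockedDoor {
    lock_shape: u32,
}
pub struct OpenDoor {
    lock_shape: u32,
}

fn open_door(key: &DoorKey, door: LockedDoor) -> Result<OpenDoor, LockedDoor> {
    if door.lock_shape == key.key_shape {
        Ok(OpenDoor { lock_shape: door.lock_shape })
    } else {
        Err(door)
    }
}

fn close_door(key: &DoorKey, door: OpenDoor) -> Result<LockedDoor, OpenDoor> {
    if door.lock_shape == key.key_shape {
        Ok(LockedDoor { lock_shape: door.lock_shape })
    } else {
        Err(door)
    }
}

/// Hypothetical helper: try a wrong key first, then the right one.
fn open_then_close(
    wrong: &DoorKey,
    right: &DoorKey,
    door: LockedDoor,
) -> Option<LockedDoor> {
    // A failed attempt returns the same door in `Err`, so we can retry.
    let door = match open_door(wrong, door) {
        Ok(_) => return None, // the "wrong" key fit after all
        Err(still_locked) => still_locked,
    };
    let open = open_door(right, door).ok()?;
    // `door` was moved into `open_door`, so using it again is rejected:
    // open_door(right, door); // error[E0382]: use of moved value: `door`
    close_door(right, open).ok()
}

fn main() {
    let wrong = DoorKey { key_shape: 3 };
    let right = DoorKey { key_shape: 7 };
    let door = LockedDoor { lock_shape: 7 };
    assert!(open_then_close(&wrong, &right, door).is_some());
}
```

Uncommenting the marked line is a quick way to show the compile error during the session.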
</details>


@@ -0,0 +1,108 @@
---
minutes: 15
---
# Mutually Exclusive References / "Aliasing XOR Mutability"
We can use the mutual exclusion of `&T` and `&mut T` references to prevent data
from being used before it is ready.
```rust,editable
pub struct QueryResult;
pub struct DatabaseConnection {/* fields omitted */}
impl DatabaseConnection {
pub fn new() -> Self {
Self {}
}
pub fn results(&self) -> &[QueryResult] {
&[] // fake results
}
}
pub struct Transaction<'a> {
connection: &'a mut DatabaseConnection,
}
impl<'a> Transaction<'a> {
pub fn new(connection: &'a mut DatabaseConnection) -> Self {
Self { connection }
}
pub fn query(&mut self, _query: &str) {
// Send the query over, but don't wait for results.
}
pub fn commit(self) {
// Finish executing the transaction and retrieve the results.
}
}
fn main() {
let mut db = DatabaseConnection::new();
// The transaction `tx` mutably borrows `db`.
let mut tx = Transaction::new(&mut db);
tx.query("SELECT * FROM users");
// This won't compile because `db` is already mutably borrowed by `tx`.
// let results = db.results(); // ❌🔨
// The borrow of `db` ends when `tx` is consumed by `commit()`.
tx.commit();
// Now it is possible to borrow `db` again.
let results = db.results();
}
```
<details>
- Motivation: In this database API queries are kicked off for asynchronous
execution and the results are only available once the whole transaction is
finished.
A user might think that queries are executed immediately, and try to read
results before they are made available. This API misuse could make the app
read incomplete or incorrect data.
While an obvious misunderstanding, situations such as this can happen in
practice.
Ask: Has anyone misunderstood an API by not reading the docs for proper use?
Expect: Examples of early-career or in-university mistakes and
misunderstandings.
As an API grows in size and user base, a smaller percentage of users has deep
knowledge of the system the API represents.
- This example shows how we can use Aliasing XOR Mutability to prevent this kind
of misuse.
- The code might read results before they are ready if the programmer assumes
that the queries execute immediately rather than being kicked off for
asynchronous execution.
- The constructor for the `Transaction` type takes a mutable reference to the
database connection, and stores it in the returned `Transaction` value.
The explicit lifetime here doesn't have to be intimidating; it just means
"`Transaction` is outlived by the `DatabaseConnection` that was passed to it"
in this case.
The reference is mutable to completely lock out the `DatabaseConnection` from
other usage, such as starting further transactions or reading the results.
- While a `Transaction` exists, we can't touch the `DatabaseConnection` variable
that it was created from.
Demonstrate: uncomment the `db.results()` line. Doing so will result in a
compile error, as `db` is already mutably borrowed.
- Note: Keeping the query results private and exposing them only through a
getter function lets us enforce the invariant "users can only look at query
results if there are no active transactions."
If the query results were placed in a public struct field, this invariant
could be violated.
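
A sketch of how the slide's API might look with the bodies filled in, so the "results only appear after `commit()`" behaviour can be observed. The `results` and `pending` fields, the contents of `QueryResult`, and the `run_two_queries` helper are assumptions made for illustration; the slide omits these details.

```rust
pub struct QueryResult(pub String);

pub struct DatabaseConnection {
    // Private: only reachable through `results()`.
    results: Vec<QueryResult>,
}

impl DatabaseConnection {
    pub fn new() -> Self {
        Self { results: Vec::new() }
    }

    pub fn results(&self) -> &[QueryResult] {
        &self.results
    }
}

pub struct Transaction<'a> {
    connection: &'a mut DatabaseConnection,
    pending: Vec<String>,
}

impl<'a> Transaction<'a> {
    pub fn new(connection: &'a mut DatabaseConnection) -> Self {
        Self { connection, pending: Vec::new() }
    }

    pub fn query(&mut self, query: &str) {
        // Only record the query; nothing is visible on the connection yet.
        self.pending.push(query.to_owned());
    }

    pub fn commit(self) {
        let Transaction { connection, pending } = self;
        for query in pending {
            connection.results.push(QueryResult(format!("rows for `{query}`")));
        }
    }
}

fn run_two_queries() -> usize {
    let mut db = DatabaseConnection::new();
    let mut tx = Transaction::new(&mut db);
    tx.query("SELECT * FROM users");
    tx.query("SELECT * FROM orders");
    // db.results(); // error[E0502]: `db` is still mutably borrowed by `tx`
    tx.commit();
    db.results().len()
}

fn main() {
    assert_eq!(run_two_queries(), 2);
}
```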
</details>


@@ -0,0 +1,72 @@
---
minutes: 10
---
# Lifetimes and Borrows: the Abstract Rules
```rust,editable
// An internal data type to have something to hold onto.
pub struct Internal;
// The "outer" data.
pub struct Data(Internal);
fn shared_use(value: &Data) -> &Internal {
&value.0
}
fn exclusive_use(value: &mut Data) -> &mut Internal {
&mut value.0
}
fn deny_future_use(value: Data) {}
fn demo_exclusive() {
let mut value = Data(Internal);
let shared = shared_use(&value);
// let exclusive = exclusive_use(&mut value); // ❌🔨
let shared_again = &shared;
}
fn demo_denied() {
let value = Data(Internal);
deny_future_use(value);
// let shared = shared_use(&value); // ❌🔨
}
# fn main() {}
```
<details>
- This example re-frames the borrow checker rules away from references and
towards semantic meaning in non-memory-safety settings.
Nothing is being mutated, nothing is being sent across threads.
- In Rust, the borrow checker gives us three different ways of "taking" a
value:
- Owned value `T`. The value is dropped when the scope ends, unless it is
moved or returned to another scope.
- Shared Reference `&T`. Allows aliasing but prevents mutable access while
shared references are in use.
- Mutable Reference `&mut T`. Only one of these is allowed to exist for a
value at any one point, but can be used to create shared references.
- Ask: The two commented-out lines in the `demo` functions would cause
compilation errors. Why?
`demo_exclusive`: Because the `shared` reference is still in use (through
`shared_again`) after the `exclusive` reference would be taken.
`demo_denied`: Because `value` is consumed on the line before the `shared`
reference would be taken from `&value`.
- Remember that every `&T` and `&mut T` has a lifetime, just one the user
doesn't have to annotate or think about most of the time.
We rarely specify lifetimes because the Rust compiler allows us to _elide_
them in most cases. See:
[Lifetime Elision](../../../lifetimes/lifetime-elision.md)
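
As a contrast to the two failing lines, here is a sketch of an ordering the borrow checker accepts. It assumes `Internal` carries a `u32` (the slide's `Internal` is empty) so that there is something to observe; `demo_reordered` is a hypothetical name.

```rust
// Assumption: `Internal` holds a number so the effect is visible.
pub struct Internal(pub u32);
pub struct Data(Internal);

fn shared_use(value: &Data) -> &Internal {
    &value.0
}

fn exclusive_use(value: &mut Data) -> &mut Internal {
    &mut value.0
}

fn deny_future_use(value: Data) -> u32 {
    let Data(Internal(number)) = value;
    number
}

fn demo_reordered() -> u32 {
    let mut value = Data(Internal(1));

    let shared = shared_use(&value);
    let seen = shared.0; // Last use of `shared`: the shared borrow ends here.

    // Allowed, because no shared reference is used after this point.
    let exclusive = exclusive_use(&mut value);
    exclusive.0 += seen;
    // println!("{}", shared.0); // would keep `shared` alive: error[E0502]

    // Consumes `value`; no borrow of it may be used afterwards.
    deny_future_use(value)
}

fn main() {
    assert_eq!(demo_reordered(), 2);
}
```

The rules are unchanged; only the order of operations differs from the slide.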
</details>


@@ -0,0 +1,48 @@
---
minutes: 5
---
# PhantomData 1/4: De-duplicating Same Data & Semantics
The newtype pattern can sometimes come up against the DRY principle. How do we
solve this?
<!-- dprint-ignore-start -->
```rust,editable,compile_fail
pub struct UserId(u64);
impl ChatUser for UserId { /* ... */ }
pub struct PatronId(u64);
impl ChatUser for PatronId { /* ... */ }
pub struct ModeratorId(u64);
impl ChatUser for ModeratorId { /* ... */ }
impl ChatModerator for ModeratorId { /* ... */ }
pub struct AdminId(u64);
impl ChatUser for AdminId { /* ... */ }
impl ChatModerator for AdminId { /* ... */ }
impl ChatAdmin for AdminId { /* ... */ }
// And so on ...
fn main() {}
```
<!-- dprint-ignore-end -->
<details>
- Problem: We want to use the newtype pattern to differentiate permissions, but
we're having to implement the same traits over and over again for the same
data.
- Ask: Assume the details of each implementation here are the same between
types, what are ways we can avoid repeating ourselves?
Expect:
- Make this an enum, not distinct data types.
- Bundle the user ID with permission tokens like
`struct Admin(u64, UserPermission, ModeratorPermission, AdminPermission);`
- Adding a type parameter which encodes permissions.
- Mentioning `PhantomData` ahead of schedule (it's in the title).
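
For comparison, the first expected answer (an enum) could be sketched as below. The names `Role`, `can_moderate`, and `delete_message` are hypothetical. Note the trade-off: permissions become a runtime check instead of a compile-time guarantee.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Patron,
    Moderator,
    Admin,
}

/// One ID type for everyone; the role is ordinary runtime data.
pub struct ChatId {
    id: u64,
    role: Role,
}

impl ChatId {
    pub fn new(id: u64, role: Role) -> Self {
        Self { id, role }
    }

    pub fn can_moderate(&self) -> bool {
        matches!(self.role, Role::Moderator | Role::Admin)
    }

    /// Every privileged operation has to check the role at runtime.
    pub fn delete_message(&self, message_id: u64) -> Result<(), String> {
        if self.can_moderate() {
            Ok(())
        } else {
            Err(format!(
                "user {} may not delete message {message_id}",
                self.id
            ))
        }
    }
}

fn main() {
    let patron = ChatId::new(1, Role::Patron);
    let admin = ChatId::new(2, Role::Admin);
    assert!(patron.delete_message(10).is_err());
    assert!(admin.delete_message(10).is_ok());
}
```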
</details>


@@ -0,0 +1,91 @@
---
minutes: 10
---
# PhantomData 2/4: Type-level tagging
Let's solve the problem from the previous slide by adding a type parameter.
<!-- dprint-ignore-start -->
```rust,editable
// use std::marker::PhantomData;
pub struct ChatId<T> { id: u64, tag: T }
pub struct UserTag;
pub struct AdminTag;
pub trait ChatUser {/* ... */}
pub trait ChatAdmin {/* ... */}
impl ChatUser for UserTag {/* ... */}
impl ChatUser for AdminTag {/* ... */} // Admins are users
impl ChatAdmin for AdminTag {/* ... */}
// impl <T> Debug for ChatId<T> {/* ... */}
// impl <T> PartialEq for ChatId<T> {/* ... */}
// impl <T> Eq for ChatId<T> {/* ... */}
// And so on ...
impl <T: ChatUser> ChatId<T> {/* All functionality for users and above */}
impl <T: ChatAdmin> ChatId<T> {/* All functionality for only admins */}
fn main() {}
```
<!-- dprint-ignore-end -->
<details>
- Here we're using a type parameter and gating permissions behind "tag" types
that implement different permission traits.
Tag types, or marker types, are zero-sized types that have some semantic
meaning to users and API designers.
- Ask: What issues does having it be an actual instance of that type pose?
Answer: If it's not a zero-sized type (like `()` or `struct MyTag;`), then
we're allocating more memory than we need to when all we care for is type
information that is only relevant at compile-time.
- Demonstrate: remove the `tag` value entirely, then compile!
This won't compile, as there's an unused (phantom) type parameter.
This is where `PhantomData` comes in!
- Demonstrate: Uncomment the `PhantomData` import, and make `ChatId<T>` the
following:
```rust,compile_fail
pub struct ChatId<T> {
id: u64,
tag: PhantomData<T>,
}
```
- `PhantomData<T>` is a zero-sized type with a type parameter. We can construct
values of it like other ZSTs with
`let phantom: PhantomData<UserTag> = PhantomData;` or with the
`PhantomData::default()` implementation.
Demonstrate: implement `From<u64>` for `ChatId<T>`, emphasizing the
construction of `PhantomData`
```rust,compile_fail
impl<T> From<u64> for ChatId<T> {
fn from(value: u64) -> Self {
ChatId {
id: value,
// Or `PhantomData::default()`
tag: PhantomData,
}
}
}
```
- `PhantomData` can be used as part of the Typestate pattern to have data with
the same structure but different methods, e.g., have `TaggedData<Start>`
implement methods or trait implementations that `TaggedData<End>` doesn't.
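
Putting the demonstration steps together, a possible end state of the slide looks like this. The methods `mention` and `ban` are hypothetical placeholders for "functionality for users" and "functionality for admins".

```rust
use std::marker::PhantomData;
use std::mem::size_of;

pub struct UserTag;
pub struct AdminTag;

pub trait ChatUser {}
pub trait ChatAdmin {}

impl ChatUser for UserTag {}
impl ChatUser for AdminTag {} // Admins are users
impl ChatAdmin for AdminTag {}

pub struct ChatId<T> {
    id: u64,
    tag: PhantomData<T>,
}

impl<T> From<u64> for ChatId<T> {
    fn from(value: u64) -> Self {
        ChatId { id: value, tag: PhantomData }
    }
}

// Available to every tag that implements `ChatUser`.
impl<T: ChatUser> ChatId<T> {
    pub fn mention(&self) -> String {
        format!("@user{}", self.id)
    }
}

// Available only to tags that implement `ChatAdmin`.
impl<T: ChatAdmin> ChatId<T> {
    pub fn ban(&self, target: &ChatId<UserTag>) -> String {
        format!("user{} banned user{}", self.id, target.id)
    }
}

fn main() {
    let user: ChatId<UserTag> = ChatId::from(1);
    let admin: ChatId<AdminTag> = ChatId::from(2);

    assert_eq!(admin.mention(), "@user2");
    assert_eq!(admin.ban(&user), "user2 banned user1");
    // user.ban(&user); // error[E0599]: `UserTag: ChatAdmin` is not satisfied

    // The tag costs nothing at runtime.
    assert_eq!(size_of::<ChatId<AdminTag>>(), size_of::<u64>());
}
```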
</details>


@@ -0,0 +1,114 @@
---
minutes: 15
---
# PhantomData 3/4: Lifetimes for External Resources
The invariants of external resources often match what we can do with lifetime
rules.
```rust,editable
// use std::marker::PhantomData;
/// Direct FFI to a database library in C.
/// We got this API as is, we have no influence over it.
mod ffi {
pub type DatabaseHandle = u8; // maximum 255 databases open at the same time
fn database_open(name: *const std::os::raw::c_char) -> DatabaseHandle {
unimplemented!()
}
// ... etc.
}
struct DatabaseConnection(ffi::DatabaseHandle);
struct Transaction<'a>(&'a mut DatabaseConnection);
impl DatabaseConnection {
fn new_transaction(&mut self) -> Transaction<'_> {
Transaction(self)
}
}
fn main() {}
```
<details>
- Remember the transaction API from the
[Aliasing XOR Mutability](./aliasing-xor-mutability.md) example.
We held onto a mutable reference to the database connection within the
transaction type to lock out the database while a transaction is active.
In this example, we want to implement a `Transaction` API on top of an
external, non-Rust API.
We start by defining a `Transaction` type that holds onto
`&mut DatabaseConnection`.
- Ask: What are the limits of this implementation? Assume the `u8` is accurate
implementation-wise and enough information for us to use the external API.
Expect:
- Indirection takes up 7 bytes more than we need to on a 64-bit platform, as
well as costing a pointer dereference at runtime.
- Problem: We want the transaction to borrow the database connection that
created it, but we don't want the `Transaction` object to store a real
reference.
- Ask: What happens when we remove the mutable reference in `Transaction` while
keeping the lifetime parameter?
Expect: Unused lifetime parameter!
- Like with the type tagging from the previous slides, we can bring in
`PhantomData` to capture this unused lifetime parameter for us.
The difference is that we will need to use the lifetime alongside another
type, but that other type does not matter too much.
- Demonstrate: change `Transaction` to the following:
```rust,compile_fail
pub struct Transaction<'a> {
connection: DatabaseConnection,
_phantom: PhantomData<&'a mut ()>,
}
```
Update the `DatabaseConnection::new_transaction()` method:
```rust,compile_fail
fn new_transaction<'a>(&'a mut self) -> Transaction<'a> {
Transaction { connection: DatabaseConnection(self.0), _phantom: PhantomData }
}
```
This gives an owned database connection that is tied to the
`DatabaseConnection` that created it, but with a smaller runtime memory
footprint than the store-a-reference version had.
Because `PhantomData` is a zero-sized type (like `()` or
`struct MyZeroSizedType;`), the size of `Transaction` is now the same as `u8`.
The implementation that held onto a reference instead was as large as a
`usize`.
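
The size claim can be checked with a short sketch. `RefTransaction` is a hypothetical name for the store-a-reference version, `handle_of` is an illustrative helper, and the `ffi` module is replaced by a plain type alias to keep the example self-contained.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Stand-in for `ffi::DatabaseHandle` from the slide.
type DatabaseHandle = u8;

struct DatabaseConnection(DatabaseHandle);

/// The version that stores a real reference.
struct RefTransaction<'a>(&'a mut DatabaseConnection);

/// The version that only borrows at the type level.
struct Transaction<'a> {
    connection: DatabaseConnection,
    _phantom: PhantomData<&'a mut ()>,
}

impl DatabaseConnection {
    fn new_transaction<'a>(&'a mut self) -> Transaction<'a> {
        Transaction {
            connection: DatabaseConnection(self.0),
            _phantom: PhantomData,
        }
    }
}

fn handle_of(tx: &Transaction<'_>) -> DatabaseHandle {
    tx.connection.0
}

fn main() {
    // One byte for the handle, zero bytes for the phantom borrow.
    assert_eq!(size_of::<Transaction<'static>>(), size_of::<u8>());
    // A thin reference is pointer-sized.
    assert_eq!(size_of::<RefTransaction<'static>>(), size_of::<usize>());

    let mut db = DatabaseConnection(42);
    let tx = db.new_transaction();
    // let second = db.new_transaction(); // error[E0499]: `db` is already mutably borrowed
    assert_eq!(handle_of(&tx), 42);
}
```

Even though no reference is stored, `tx` still keeps `db` mutably borrowed for as long as it is used.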
## More to Explore
- This way of encoding relationships between types and values is very powerful
when combined with `unsafe`, as the ways one can manipulate lifetimes become
almost arbitrary. This is also dangerous, but when combined with tools like
external, mechanically-verified proofs we can safely encode
cyclic/self-referential types while encoding lifetime & safety expectations in
the relevant data types.
- The [GhostCell (2021)](https://plv.mpi-sws.org/rustbelt/ghostcell/) paper and
its [relevant implementation](https://gitlab.mpi-sws.org/FP/ghostcell) show
this kind of work off. While the borrow checker is restrictive, there are
still ways to use escape hatches and then _show that the ways you used those
escape hatches are consistent and safe._
</details>


@@ -0,0 +1,112 @@
---
minutes: 10
---
# PhantomData 4/4: OwnedFd & BorrowedFd
`BorrowedFd` is a prime example of `PhantomData` in action.
<!--
This code has to define a fake libc module even though libc works fine on
rust playground because the CI does not currently support dependencies.
TODO: Once we can use libc as a dependency in rust tests, replace the
faux libc code with appropriate imports & `O_WRONLY | O_CREAT` permissions.
-->
```rust,editable
use std::marker::PhantomData;
use std::os::raw::c_int;
mod libc_ffi {
use std::os::raw::{c_char, c_int};
pub unsafe fn open(path: *const c_char, oflag: c_int) -> c_int {
3
}
pub unsafe fn close(fd: c_int) {}
}
struct OwnedFd {
fd: c_int,
}
impl OwnedFd {
fn try_from_fd(fd: c_int) -> Option<Self> {
if fd < 0 {
return None;
}
Some(OwnedFd { fd })
}
fn as_fd<'a>(&'a self) -> BorrowedFd<'a> {
BorrowedFd { fd: self.fd, _phantom: PhantomData }
}
}
impl Drop for OwnedFd {
fn drop(&mut self) {
unsafe { libc_ffi::close(self.fd) };
}
}
struct BorrowedFd<'a> {
fd: c_int,
_phantom: PhantomData<&'a ()>,
}
fn main() {
// Create a file with a raw syscall with write-only and create permissions.
// On x86-64 Linux, `O_WRONLY | O_CREAT` is `0o1 | 0o100`, i.e. `0o101`.
let fd = unsafe { libc_ffi::open(c"c_str.txt".as_ptr(), 0o101) };
// Pass the ownership of an integer file descriptor to an `OwnedFd`.
// `OwnedFd::drop()` closes the file descriptor.
let owned_fd =
OwnedFd::try_from_fd(fd).expect("Could not open file with syscall!");
// Create a `BorrowedFd` from an `OwnedFd`.
// Dropping a `BorrowedFd` does not close the file because it doesn't own it!
let borrowed_fd: BorrowedFd<'_> = owned_fd.as_fd();
// std::mem::drop(owned_fd); // ❌🔨
std::mem::drop(borrowed_fd);
let second_borrowed = owned_fd.as_fd();
// owned_fd will be dropped here, and the file will be closed.
}
```
<details>
- A file descriptor represents a specific process's access to a file.
Reminder: Device and OS-specific features are exposed as if they were files on
unix-style systems.
- [`OwnedFd`](https://rust-lang.github.io/rfcs/3128-io-safety.html#ownedfd-and-borrowedfdfd)
is an owned wrapper type for a file descriptor. It _owns_ the file descriptor,
and closes it when dropped.
Note: We have our own implementation of it here, draw attention to the
explicit `Drop` implementation.
`BorrowedFd` is its borrowed counterpart; it does not need to close the file
when it is dropped.
Note: We have not explicitly implemented `Drop` for `BorrowedFd`.
- `BorrowedFd` uses a lifetime captured with a `PhantomData` to enforce the
invariant "if this file descriptor exists, the OS file descriptor is still
open even though it is not responsible for closing that file descriptor."
The lifetime parameter of `BorrowedFd` demands that there exists another value
in your program that lasts as long as that specific `BorrowedFd` or outlives
it (in this case an `OwnedFd`).
Demonstrate: Uncomment the `std::mem::drop(owned_fd)` line and try to compile
to show that `borrowed_fd` relies on the lifetime of `owned_fd`.
This has been encoded by the API designers to mean _that other value is what
keeps the access to the file open_.
Because Rust's borrow checker enforces this relationship where one value must
last at least as long as another, users of this API do not need to worry about
getting the file descriptor aliasing and closing logic right themselves.
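
For reference, the same relationship can be seen with the standard library's own types. This is a Unix-only sketch that assumes a toolchain where `std::os::fd` is available; the socket pair is used only as a convenient way to get a file descriptor without touching the filesystem, and `raw_number` and `borrowed_matches_owned` are hypothetical helpers.

```rust
use std::os::fd::{AsFd, AsRawFd, BorrowedFd, OwnedFd};
use std::os::unix::net::UnixStream;

/// Accepts any borrowed descriptor; cannot close it or outlive its owner.
fn raw_number(fd: BorrowedFd<'_>) -> i32 {
    fd.as_raw_fd()
}

fn borrowed_matches_owned() -> std::io::Result<bool> {
    let (left, _right) = UnixStream::pair()?;

    // Transfer ownership of the descriptor to an `OwnedFd`.
    let owned: OwnedFd = left.into();

    // The borrow is tied to `owned` through the lifetime parameter.
    let borrowed: BorrowedFd<'_> = owned.as_fd();
    // drop(owned); // error[E0505]: cannot move out of `owned` because it is borrowed

    let same = raw_number(borrowed) == owned.as_raw_fd();
    Ok(same)
    // `owned` is dropped here, which closes the descriptor.
}

fn main() -> std::io::Result<()> {
    assert!(borrowed_matches_owned()?);
    Ok(())
}
```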
</details>


@@ -0,0 +1,80 @@
---
minutes: 10
---
# Single-use values
Sometimes we want values that _can only be used once_. One critical example of
this is in cryptography: A "Nonce."
```rust,editable
pub struct Key(/* specifics omitted */);
/// A single-use number suitable for cryptographic purposes.
pub struct Nonce(u32);
/// A cryptographically sound random generator function.
pub fn new_nonce() -> Nonce {
Nonce(4) // chosen by a fair dice roll, https://xkcd.com/221/
}
/// Consume a nonce, but not the key or the data.
pub fn encrypt(nonce: Nonce, key: &Key, data: &[u8]) {}
fn main() {
let nonce = new_nonce();
let data_1: [u8; 4] = [1, 2, 3, 4];
let data_2: [u8; 4] = [4, 3, 2, 1];
let key = Key(/* specifics omitted */);
// The key and data can be re-used, copied, etc. but the nonce cannot.
encrypt(nonce, &key, &data_1);
// encrypt(nonce, &key, &data_2); // ❌🔨
}
```
<details>
- Problem: How can we guarantee a value is used only once?
- Motivation: A nonce is a piece of random, unique data used in cryptographic
protocols to prevent replay attacks.
Background: In practice people have ended up accidentally re-using nonces.
Most commonly, this causes the cryptographic protocol to completely break down
and stop fulfilling its function.
Depending on the specifics of nonce reuse and cryptography at hand, private
keys can also become computable by attackers.
- Rust has an obvious tool for achieving the invariant "Once you use this, you
can't use it again": passing a value as an _owned argument_.
- Highlight: the `encrypt` function takes `nonce` by value (an owned argument),
but `key` and `data` by reference.
- The technique for single-use values is as follows:
- Keep constructors private, so a user can't construct values with the same
inner value twice.
- Don't implement `Clone`/`Copy` traits or equivalent methods, so a user can't
duplicate data we want to keep unique.
- Make the interior type opaque (like with the newtype pattern), so the user
cannot modify an existing value on their own.
- Ask: What are we missing from the newtype pattern in the slide's code?
Expect: Module boundary.
Demonstrate: Without a module boundary a user can construct a nonce on their
own.
Fix: Put `Key`, `Nonce`, and `new_nonce` behind a module.
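
One way the fix could look is sketched below. The module name `crypto`, `Key::generate`, the counter-based `new_nonce`, and the XOR body of `encrypt` are all placeholders chosen for illustration; none of this is real cryptography.

```rust
mod crypto {
    use std::sync::atomic::{AtomicU32, Ordering};

    pub struct Key(());

    /// The field is private, so code outside this module cannot build,
    /// copy, or modify a `Nonce`.
    pub struct Nonce(u32);

    // Placeholder source of unique values within one process.
    static NEXT_NONCE: AtomicU32 = AtomicU32::new(1);

    impl Key {
        pub fn generate() -> Key {
            Key(())
        }
    }

    pub fn new_nonce() -> Nonce {
        Nonce(NEXT_NONCE.fetch_add(1, Ordering::Relaxed))
    }

    /// Consumes the nonce. The XOR is a stand-in, not encryption.
    pub fn encrypt(nonce: Nonce, _key: &Key, data: &[u8]) -> Vec<u8> {
        data.iter().map(|byte| byte ^ (nonce.0 as u8)).collect()
    }
}

fn main() {
    let key = crypto::Key::generate();
    let nonce = crypto::new_nonce();

    let first = crypto::encrypt(nonce, &key, &[1, 2, 3, 4]);
    // crypto::encrypt(nonce, &key, &[4, 3, 2, 1]); // error[E0382]: use of moved value
    // let forged = crypto::Nonce(4); // error: the constructor is private

    // A second message needs a fresh nonce.
    let second = crypto::encrypt(crypto::new_nonce(), &key, &[1, 2, 3, 4]);
    assert_ne!(first, second);
}
```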
## More to Explore
- Cryptography Nuance: A nonce might still be used twice if it was created
through a pseudo-random process with no actual randomness. That can't be
prevented through this method. This API design prevents one kind of nonce
duplication, but not all logic bugs.
</details>