mirror of https://github.com/google/comprehensive-rust.git synced 2025-08-08 08:22:52 +02:00

Add Unsafe Rust Deep Dive (#2806)

Adds the start of an unsafe deep dive to Comprehensive Rust. 

The `unsafe` keyword is easy to type, but hard to master. When used
appropriately, it forms a useful and indeed essential part of the Rust
programming language.

By the end of this deep dive, you'll know how to work with `unsafe` code, 
review others' changes that include the `unsafe` keyword, and produce 
your own.

What you'll learn:

- What the terms undefined behavior, soundness, and safety mean
- Why the `unsafe` keyword exists in the Rust language
- How to write your own code using `unsafe` safely
- How to review `unsafe` code

Here is a tentative outline of a 10h (2 day) treatment:

Day 1: Using and Reviewing Unsafe

- Welcome
- Motivations: explain why the `unsafe` keyword exists
- Foundations: provide background knowledge; what is soundness? what is
undefined behavior? what is validity with respect to pointers?
- Mechanics: what a safe `unsafe` block should look like
- Representations and Interoperability: explore how data is laid out in
memory and how that can be sent across the wire and/or stored on disk.
- Reviewing unsafe
- Patterns for safer unsafe: Encapsulating unsafe code in safe-to-use
abstractions, such as marking a type's constructor as `unsafe` so that
invariants only need to be enforced once by the programmer.

Day 2: Deploying Unsafe to Build Abstractions

- Welcome
- Validity in detail: A refresher. Emphasis on the details of the
invariants that are being upheld by a “typical” unsafe block, such as
aliasing, alignment, data validity, padding.
- Concurrency and thread safety: understanding `Send` and `Sync`,
knowing how to implement them on a user-defined type
- Case study: Small string optimization
- Case study: Zero-copy parsing
- Review

---------

Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>
This commit is contained in:
Tim McNamara
2025-07-17 14:03:31 +12:00
committed by GitHub
parent 0a485b5a4c
commit 22d6af4abd
15 changed files with 674 additions and 0 deletions

View File

@ -440,6 +440,23 @@
---
# Unsafe
- [Welcome](unsafe-deep-dive/welcome.md)
- [Setup](unsafe-deep-dive/setup.md)
- [Motivations](unsafe-deep-dive/motivations.md)
- [Interoperability](unsafe-deep-dive/motivations/interop.md)
- [Data Structures](unsafe-deep-dive/motivations/data-structures.md)
- [Performance](unsafe-deep-dive/motivations/performance.md)
- [Foundations](unsafe-deep-dive/foundations.md)
- [What is unsafe?](unsafe-deep-dive/foundations/what-is-unsafe.md)
- [When is unsafe used?](unsafe-deep-dive/foundations/when-is-unsafe-used.md)
- [Data structures are safe](unsafe-deep-dive/foundations/data-structures-are-safe.md)
- [Actions might not be](unsafe-deep-dive/foundations/actions-might-not-be.md)
- [Less powerful than it seems](unsafe-deep-dive/foundations/less-powerful.md)
---
# Final Words
- [Thanks!](thanks.md)

View File

@ -82,6 +82,15 @@ You should be familiar with the material in
{{%course outline Idiomatic Rust}}
### Unsafe (Work in Progress)
The [Unsafe](../unsafe-deep-dive/welcome.md) deep dive is a two-day class on the
_unsafe_ Rust language. It covers the fundamentals of Rust's safety guarantees,
the motivation for `unsafe`, the review process for `unsafe` code, FFI basics,
and building data structures that the borrow checker would normally reject.
{{%course outline Unsafe}}
## Format
The course is meant to be very interactive and we recommend letting the

View File

@ -0,0 +1,5 @@
# Foundations
Some fundamental concepts and terms.
{{%segment outline}}

View File

@ -0,0 +1,19 @@
---
minutes: 2
---
# ... but actions on them might not be
```rust
fn main() {
    let n: i64 = 12345;
    let safe = &n as *const _;
    println!("{safe:p}");
}
```
<details>
Modify the example to de-reference `safe` without an `unsafe` block.
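One possible end state for this exercise is sketched below. It assumes the pointer is still derived from the live variable `n`; the helper name `read_through_pointer` is hypothetical and not part of the slide. Uncommenting the plain dereference should be rejected with error E0133.

```rust
/// Hypothetical helper: creates a raw pointer in safe code, then reads through it.
fn read_through_pointer() -> i64 {
    let n: i64 = 12345;
    let safe = &n as *const i64;

    // let value = *safe; // rejected: dereferencing a raw pointer requires `unsafe`

    // SAFETY: `safe` points to `n`, which is live, aligned and initialized.
    unsafe { *safe }
}

fn main() {
    println!("{}", read_through_pointer());
}
```

Creating the pointer needed no `unsafe`; only the read through it did.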
</details>

View File

@ -0,0 +1,25 @@
---
minutes: 2
---
# Data structures are safe ...
Data structures are inert. They cannot do any harm by themselves.
Safe Rust code can create raw pointers:
```rust
fn main() {
    let n: i64 = 12345;
    let safe = &raw const n;
    println!("{safe:p}");
}
```
<details>
Consider a raw pointer to an integer, i.e., the value `safe` has the raw pointer
type `*const i64`. Raw pointers can be out-of-bounds, misaligned, or null, but
the `unsafe` keyword is not required to create them.
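To make the point concrete, here is a minimal sketch that builds a null, a misaligned and a dangling pointer using only safe code. The helper name `questionable_pointers` is hypothetical, and the misalignment claim assumes a target where `i64` has an alignment greater than 1.

```rust
/// Hypothetical helper: builds three questionable raw pointers in safe code.
fn questionable_pointers() -> (*const i64, *const i64, *const i64) {
    let null: *const i64 = std::ptr::null();

    let n: i64 = 12345;
    let aligned = &raw const n;

    // Step one byte forward. Assuming `i64` has an alignment greater than 1,
    // the result cannot be correctly aligned for an `i64`.
    let misaligned = aligned.cast::<u8>().wrapping_add(1).cast::<i64>();

    // `n` stops existing when this function returns, so the caller receives
    // a dangling pointer.
    let dangling = aligned;

    (null, misaligned, dangling)
}

fn main() {
    let (null, misaligned, dangling) = questionable_pointers();
    // Printing the addresses is fine; dereferencing any of them would not be.
    println!("{null:p} {misaligned:p} {dangling:p}");
}
```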
</details>

View File

@ -0,0 +1,52 @@
---
minutes: 10
---
# Less powerful than it seems
The `unsafe` keyword does not allow you to break Rust.
```rust,ignore
use std::mem::transmute;
let orig = b"RUST";
let n: i32 = unsafe { transmute(orig) };
println!("{n}")
```
<details>
## Suggested outline
- Request that someone explains what `std::mem::transmute` does
- Discuss why it doesn't compile
- Fix the code
## Expected compiler output
```ignore
Compiling playground v0.0.1 (/playground)
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
--> src/main.rs:5:27
|
5 | let n: i32 = unsafe { transmute(orig) };
| ^^^^^^^^^
|
= note: source type: `&[u8; 4]` (64 bits)
= note: target type: `i32` (32 bits)
```
## Suggested change
```diff
- let n: i32 = unsafe { transmute(orig) };
+ let n: i64 = unsafe { transmute(orig) };
```
## Notes on less familiar Rust
- the `b` prefix on a string literal makes it a byte string literal. Its type is
a reference to a fixed-size byte array (here `&[u8; 4]`, as the compiler output
shows) rather than a string slice (`&str`)
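The suggested change above makes the sizes match on 64-bit targets, but it transmutes the reference, so the printed number is derived from an address. If the group wants to reinterpret the four bytes themselves, an alternative sketch (not the course's official fix; `bytes_to_i32` is a hypothetical name) is to transmute the array by value and compare the result with the safe `i32::from_ne_bytes`:

```rust
use std::mem::transmute;

/// Hypothetical helper: reinterprets four bytes as an `i32` in native byte order.
fn bytes_to_i32(bytes: [u8; 4]) -> i32 {
    // SAFETY: `[u8; 4]` and `i32` have the same size, and every bit pattern
    // is a valid `i32`.
    unsafe { transmute::<[u8; 4], i32>(bytes) }
}

fn main() {
    let orig = b"RUST";
    // Dereferencing copies the array out of the `&[u8; 4]`.
    let n = bytes_to_i32(*orig);
    // The safe standard library function produces the same value.
    assert_eq!(n, i32::from_ne_bytes(*orig));
    println!("{n}");
}
```

This also gives an opening to ask why the safe function should be preferred whenever one exists.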
</details>

View File

@ -0,0 +1,98 @@
---
minutes: 6
---
# What is “unsafety”?
Unsafe Rust is a superset of Safe Rust.
Let's create a list of things that are enabled by the `unsafe` keyword.
<details>
## Definitions from authoritative docs:
From the [unsafe keyword's documentation](https://doc.rust-lang.org/std/keyword.unsafe.html):
> Code or interfaces whose memory safety cannot be verified by the type system.
>
> ...
>
> Here are the abilities Unsafe Rust has in addition to Safe Rust:
>
> - Dereference raw pointers
> - Implement unsafe traits
> - Call unsafe functions
> - Mutate statics (including external ones)
> - Access fields of unions
From the [reference](https://doc.rust-lang.org/reference/unsafety.html):
> The following language level features cannot be used in the safe subset of
> Rust:
>
> - Dereferencing a raw pointer.
> - Reading or writing a mutable or external static variable.
> - Accessing a field of a union, other than to assign to it.
> - Calling an unsafe function (including an intrinsic or foreign function).
> - Calling a safe function marked with a target_feature from a function that
> does not have a target_feature attribute enabling the same features (see
> attributes.codegen.target_feature.safety-restrictions).
> - Implementing an unsafe trait.
> - Declaring an extern block.
> - Applying an unsafe attribute to an item.
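If the class needs something concrete to anchor these lists, the sketch below exercises three of the listed abilities: reading a union field, dereferencing a raw pointer, and calling an unsafe function. The union `IntOrFloat` and the function `three_abilities` are hypothetical names chosen for illustration.

```rust
/// Hypothetical union used only for this illustration.
union IntOrFloat {
    int: u32,
    float: f32,
}

/// Hypothetical helper that uses three abilities unlocked by `unsafe`.
fn three_abilities() -> (u32, i64, &'static str) {
    // Creating a union value is safe; reading a field is not.
    let value = IntOrFloat { float: 1.0 };
    // SAFETY: both fields are 4 bytes and every bit pattern is a valid `u32`.
    let bits = unsafe { value.int };

    let n: i64 = 12345;
    let ptr = &raw const n;
    // SAFETY: `ptr` points to the live, aligned, initialized local `n`.
    let through_ptr = unsafe { *ptr };

    // SAFETY: the byte string literal contains only ASCII, so it is valid UTF-8.
    let text = unsafe { std::str::from_utf8_unchecked(b"ok") };

    (bits, through_ptr, text)
}

fn main() {
    println!("{:?}", three_abilities());
}
```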
## Group exercise
> You may have a group of learners who are not familiar with each other yet.
> This is a way for you to gather some data about their confidence levels and
> the psychological safety that they're feeling.
### Part 1: Informal definition
> Use this to gauge the confidence level of the group. If they are uncertain,
> then tailor the next section to be more directed.
Ask the class: **By raising your hand, indicate if you would feel comfortable
defining unsafe?**
If anyone's feeling confident, allow them to try to explain.
### Part 2: Evidence gathering
Ask the class to spend 3-5 minutes.
- Find a use of the unsafe keyword. What contract/invariant/pre-condition is
being established or satisfied?
- Write down terms that need to be defined (unsafe, memory safety, soundness,
undefined behavior)
### Part 3: Write a working definition
### Part 4: Remarks
Mention that we'll be reviewing our definition at the end of the day.
## Note: Avoid detailed discussion about precise semantics of memory safety
It's possible that the group will slide into a discussion about the precise
semantics of what memory safety actually is and how to define pointer validity.
This isn't a productive line of discussion. It can undermine the confidence of
less experienced learners.
Perhaps refer people who wish to discuss this to the discussion within the
official [documentation for pointer types] (excerpt below) as a place for
further research.
> Many functions in [this module] take raw pointers as arguments and read from
> or write to them. For this to be safe, these pointers must be _valid_ for the
> given access.
>
> ...
>
> The precise rules for validity are not determined yet.
[this module]: https://doc.rust-lang.org/std/ptr/index.html
[documentation for pointer types]: https://doc.rust-lang.org/std/ptr/index.html#safety
</details>

View File

@ -0,0 +1,48 @@
---
minutes: 2
---
# When is unsafe used?
The unsafe keyword indicates that the programmer is responsible for upholding
Rust's safety guarantees.
The keyword has two roles:
- define pre-conditions that must be satisfied
- assert to the compiler (= promise) that those defined pre-conditions are
satisfied
## Further references
- [The unsafe keyword chapter of the Rust Reference](https://doc.rust-lang.org/reference/unsafe-keyword.html)
<details>
Places where pre-conditions can be defined (Role 1)
- [unsafe functions] (`unsafe fn foo() { ... }`). Example: `get_unchecked`
method on slices, which requires callers to verify that the index is
in-bounds.
- [unsafe traits] (`unsafe trait`). Examples: [`Send`] and [`Sync`] marker traits
in the standard library.
Places where pre-conditions must be satisfied (Role 2)
- unsafe blocks (`unsafe { ... }`)
- implementing unsafe traits (`unsafe impl`)
- accessing external items (`unsafe extern`)
- adding
[unsafe attributes](https://doc.rust-lang.org/reference/attributes.html) to an
item. Examples: [`export_name`], [`link_section`] and [`no_mangle`]. Usage:
`#[unsafe(no_mangle)]`
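The two roles can be seen side by side in the following minimal sketch. All names (`nth_unchecked`, `ZeroIsValid`, `first_or_zero`) are hypothetical and exist only for illustration; the standard library's `get_unchecked` is the only real API used.

```rust
/// Role 1: an unsafe function defines a pre-condition.
///
/// # Safety
///
/// `index` must be less than `values.len()`.
unsafe fn nth_unchecked(values: &[i32], index: usize) -> i32 {
    // SAFETY: the caller promises that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

/// Role 1: an unsafe trait defines a pre-condition for implementors.
///
/// # Safety
///
/// The all-zero bit pattern must be a valid value of the implementing type.
unsafe trait ZeroIsValid {}

// Role 2: the implementor promises that the trait's pre-condition holds.
// SAFETY: zero is a valid `u32`.
unsafe impl ZeroIsValid for u32 {}

/// Role 2: an unsafe block promises that the callee's pre-condition holds.
fn first_or_zero(values: &[i32]) -> i32 {
    if values.is_empty() {
        0
    } else {
        // SAFETY: the slice is non-empty, so index 0 is in bounds.
        unsafe { nth_unchecked(values, 0) }
    }
}

fn main() {
    println!("{}", first_or_zero(&[7, 8, 9]));
}
```

Note that `first_or_zero` is a safe function: it checks the pre-condition itself, so its callers never see `unsafe`.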
[unsafe functions]: https://doc.rust-lang.org/reference/unsafe-keyword.html#unsafe-functions-unsafe-fn
[unsafe traits]: https://doc.rust-lang.org/reference/unsafe-keyword.html#unsafe-traits-unsafe-trait
[`export_name`]: https://doc.rust-lang.org/reference/abi.html#the-export_name-attribute
[`link_section`]: https://doc.rust-lang.org/reference/abi.html#the-link_section-attribute
[`no_mangle`]: https://doc.rust-lang.org/reference/abi.html#the-no_mangle-attribute
[`Send`]: https://doc.rust-lang.org/std/marker/trait.Send.html
[`Sync`]: https://doc.rust-lang.org/std/marker/trait.Sync.html
</details>

View File

@ -0,0 +1,24 @@
---
minutes: 1
---
# Motivations
We know that writing code without the guarantees that Rust provides ...
> “Use-after-free (UAF), integer overflows, and out of bounds (OOB) reads/writes
> comprise 90% of vulnerabilities with OOB being the most common.”
>
> -- **Jeff Vander Stoep and Chong Zhang**, Google.
> "[Queue the Hardening Enhancements](https://security.googleblog.com/2019/05/queue-hardening-enhancements.html)"
... so why is `unsafe` part of the language?
{{%segment outline}}
<details>
The `unsafe` keyword exists because there is no compiler technology available
today that makes it obsolete. Compilers cannot verify everything.
</details>

View File

@ -0,0 +1,30 @@
---
minutes: 5
---
# Data Structures
Some families of data structures are impossible to create in safe Rust.
- graphs
- bit twiddling
- self-referential types
- intrusive data structures
<details>
Graphs: General-purpose graphs cannot be created as they may need to represent
cycles. Cycles are impossible for the type system to reason about.
Bit twiddling: Overloading bits with multiple meanings. Examples include using
the NaN bits in `f64` for some other purpose or the higher-order bits of
pointers on `x86_64` platforms. This is somewhat common when writing language
interpreters to keep representations within the word size of the target
platform.
Self-referential types are too hard for the borrow checker to verify.
Intrusive data structures: store structural metadata (like pointers to other
elements) inside the elements themselves, which requires careful handling of
aliasing.
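As a small illustration of bit twiddling, the sketch below stores a boolean flag in the lowest bit of a pointer. This is a simpler variant than the NaN or high-bit tricks mentioned above, and it assumes that `u64` values are at least 2-byte aligned so that bit 0 of the address is always zero. The function names are hypothetical.

```rust
/// Packs a flag into the lowest bit of a pointer to a `u64`.
/// Assumes `u64` is at least 2-byte aligned, so bit 0 of the address is free.
fn tag(ptr: *const u64, flag: bool) -> usize {
    debug_assert!((ptr as usize) & 1 == 0);
    (ptr as usize) | (flag as usize)
}

/// Splits a tagged word back into the pointer and the flag.
fn untag(tagged: usize) -> (*const u64, bool) {
    ((tagged & !1) as *const u64, (tagged & 1) == 1)
}

/// Hypothetical round trip: tag a pointer, untag it, and read through it.
fn tag_round_trip() -> (u64, bool) {
    let value: u64 = 42;
    let tagged = tag(&raw const value, true);
    let (ptr, flag) = untag(tagged);
    // SAFETY: `ptr` holds the original address of `value`, which is still live.
    (unsafe { *ptr }, flag)
}

fn main() {
    println!("{:?}", tag_round_trip());
}
```

Packing and unpacking are safe integer operations. The `unsafe` block is needed only at the end, because the compiler cannot know that the recovered integer is still a usable pointer.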
</details>

View File

@ -0,0 +1,245 @@
---
minutes: 5
---
> TODO: Refactor this content into multiple slides as this slide is intended as
> an introduction to the motivations only, rather than to be an elaborate
> discussion of the whole problem.
# Interoperability
Language interoperability allows you to:
- Call functions written in other languages from Rust
- Write functions in Rust that are callable from other languages
However, this requires unsafe.
```rust,editable,ignore
unsafe extern "C" {
    safe fn random() -> libc::c_long;
}

fn main() {
    let a = random() as i64;
    println!("{a:?}");
}
```
<details>
The Rust compiler can't enforce any safety guarantees for programs that it
hasn't compiled, so it delegates that responsibility to you through the unsafe
keyword.
The code example we're seeing shows how to call the random function provided by
libc within Rust. libc is available to scripts in the Rust Playground.
This uses Rust's _foreign function interface_.
This isn't the only style of interoperability; however, it is the method that's
needed if you want to work between Rust and some other language in a zero-cost
way. Another important strategy is message passing.
Message passing avoids unsafe, but serialization, allocation, data transfer and
parsing all take energy and time.
## Answers to questions
- _Where does "random" come from?_\
libc is dynamically linked to Rust programs by default, allowing our code to
rely on its symbols, including `random`, being available to our program.
- _What is the "safe" keyword?_\
It allows callers to call the function without needing to wrap that call in
`unsafe`. The [`safe` function qualifier] was introduced in the 2024 edition
of Rust and can only be used within `extern` blocks. It was introduced because
`unsafe` became a mandatory qualifier for `extern` blocks in that edition.
- _What is the [`std::ffi::c_long`] type?_\
According to the C standard, an integer that's at least 32 bits wide. On
today's systems, it's an `i32` on Windows and an `i64` on 64-bit Linux.
[`safe` function qualifier]: https://doc.rust-lang.org/reference/safe-keyword.html
[`std::ffi::c_long`]: https://doc.rust-lang.org/std/ffi/type.c_long.html
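To let learners check the width of `c_long` on their own machine, a short sketch along these lines may help. The helper `widen` is hypothetical; it uses `i64::from` instead of an `as` cast, which only compiles on targets where the conversion is lossless.

```rust
use std::ffi::c_long;

/// Hypothetical helper: widens a `c_long` without a potentially lossy `as` cast.
fn widen(value: c_long) -> i64 {
    // Compiles where `c_long` is `i32` or `i64`, i.e. where no bits can be lost.
    i64::from(value)
}

fn main() {
    // `c_long` is a type alias, so the integer constants are available on it.
    println!("c_long is {} bits wide on this target", c_long::BITS);
    println!("{}", widen(-1));
}
```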
## Consideration: type safety
Modify the code example to remove the need for type casting later. Discuss the
potential UB - long's width is defined by the target.
```rust
unsafe extern "C" {
    safe fn random() -> i64;
}

fn main() {
    let a = random();
    println!("{a:?}");
}
```
> Changes from the original:
>
> ```diff
> unsafe extern "C" {
> - safe fn random() -> libc::c_long;
> + safe fn random() -> i64;
> }
>
> fn main() {
> - let a = random() as i64;
> + let a = random();
> println!("{a:?}");
> }
> ```
It's also possible to completely ignore the intended type and create undefined
behavior in multiple ways. The code below may produce output, but it often ends
in a stack overflow instead. It may also produce illegal `char` values. Although
`char` is represented in 4 bytes (32 bits),
[not all bit patterns are permitted as a `char`][char].
Stress that the Rust compiler will trust that the wrapper is telling the truth.
[char]: https://doc.rust-lang.org/std/primitive.char.html#validity-and-layout
<!-- TODO(timclicks): add libc to the mdbook build system so that the example can be tested -->
```rust,ignore
unsafe extern "C" {
    safe fn random() -> [char; 2];
}

fn main() {
    let a = random();
    println!("{a:?}");
}
```
> Changes from the original:
>
> ```diff
> unsafe extern "C" {
> - safe fn random() -> libc::c_long;
> + safe fn random() -> [char; 2];
> }
>
> fn main() {
> - let a = random() as i64;
> + let a = random();
> println!("{a:?}");
> }
> ```
> Attempting to print a `[char; 2]` from randomly generated input will often
> produce strange output, including:
>
> ```ignore
> thread 'main' panicked at library/std/src/io/stdio.rs:1165:9:
> failed printing to stdout: Bad address (os error 14)
> ```
>
> ```ignore
> thread 'main' has overflowed its stack
> fatal runtime error: stack overflow, aborting
> ```
Mention that type safety is generally not a large concern in practice. Tools
that produce wrappers automatically, e.g. bindgen, are excellent at reading
header files and producing declarations with the correct types.
## Consideration: Ownership and lifetime management
While libc's `random` function doesn't use pointers, many C functions do. This
creates many more possibilities for unsoundness.
- both sides might attempt to free the memory (double free)
- both sides can attempt to write to the data
For example, some C libraries expose functions that write to static buffers that
are re-used between calls.
<!--
TODO(timclicks): consider adding a safety comment in the docstring that discusses thread safety and the ownership of the returned pointer.
See <https://github.com/google/comprehensive-rust/pull/2806#discussion_r2207171041>.
-->
<!-- TODO(timclicks): add libc to the mdbook build system so that the example can be tested -->
```rust,ignore
use std::ffi::{CStr, c_char};
use std::time::{SystemTime, UNIX_EPOCH};

unsafe extern "C" {
    /// Create a formatted time based on time `t`, including trailing newline.
    /// See `man 3 ctime` for details.
    fn ctime(t: *const libc::time_t) -> *const c_char;
}

unsafe fn format_timestamp<'a>(t: u64) -> &'a str {
    let t = t as libc::time_t;
    unsafe {
        let fmt_ptr = ctime(&t);
        CStr::from_ptr(fmt_ptr).to_str().unwrap()
    }
}

fn main() {
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
    let now = now.as_secs();
    let now_fmt = unsafe { format_timestamp(now) };
    print!("now (1): {}", now_fmt);

    let future = now + 60;
    let future_fmt = unsafe { format_timestamp(future) };
    print!("future: {}", future_fmt);
    // `now_fmt` still points into the buffer that `ctime` has since overwritten.
    print!("now (2): {}", now_fmt);
}
```
> Aside: Lifetimes in the `format_timestamp()` function
>
> Neither `'a` nor `'static` correctly describes the lifetime of the string
> that's returned. Rust treats it as an immutable reference, but subsequent
> calls to `ctime` will overwrite the static buffer that the string occupies.
Bonus points: can anyone spot the lifetime bug? `format_timestamp()` should
return a `&'static str`.
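One common way to sidestep the lifetime question is to copy the text into an owned `String` before the foreign buffer can be reused. The sketch below shows the idea without calling into libc: a local array stands in for C's reusable static buffer (an assumption made so the example runs anywhere), and the names `copy_c_string` and `reuse_buffer` are hypothetical.

```rust
use std::ffi::{CStr, c_char};

/// Copies a C string into an owned `String`.
///
/// # Safety
///
/// `ptr` must point to a nul-terminated string that stays valid and unmodified
/// for the duration of this call.
unsafe fn copy_c_string(ptr: *const c_char) -> String {
    // SAFETY: guaranteed by the caller.
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

/// Hypothetical demo: the array stands in for a C static buffer that is
/// overwritten between calls.
fn reuse_buffer() -> (String, String) {
    let mut buffer = *b"first\0";
    // SAFETY: `buffer` is nul-terminated and not modified during the call.
    let first = unsafe { copy_c_string(buffer.as_ptr().cast()) };

    // The "library" reuses its buffer for the next result.
    buffer.copy_from_slice(b"later\0");
    // SAFETY: as above.
    let second = unsafe { copy_c_string(buffer.as_ptr().cast()) };

    // `first` is unaffected because it owns its own copy of the bytes.
    (first, second)
}

fn main() {
    println!("{:?}", reuse_buffer());
}
```

The copy costs an allocation, which is the trade-off for returning a value whose validity no longer depends on the foreign library's behavior.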
## Consideration: Representation mismatch
Different programming languages have made different design decisions and this
can create impedance mismatches between different domains.
Consider string handling. C++ defines `std::string`, whose memory layout is
incompatible with Rust's `String` type. `String` also requires text to be
encoded as UTF-8, whereas `std::string` does not. In C, text is represented by a
null-terminated sequence of bytes (`char*`).
```rust
fn main() {
    let c_repr = b"Hello, C\0";
    let rust_repr = (b"Hello, Rust", 11);

    let c: &str = unsafe {
        // `c_char` is `i8` on some targets and `u8` on others, so use the alias.
        let ptr = c_repr.as_ptr() as *const std::ffi::c_char;
        std::ffi::CStr::from_ptr(ptr).to_str().unwrap()
    };
    println!("{c}");

    let rust: &str = unsafe {
        let ptr = rust_repr.0.as_ptr();
        let bytes = std::slice::from_raw_parts(ptr, rust_repr.1);
        std::str::from_utf8_unchecked(bytes)
    };
    println!("{rust}");
}
```
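For comparison, the standard library offers checked conversions between the two representations that need no `unsafe` at all. The sketch below is illustrative only; the helper names `to_c_string` and `from_c_bytes` are hypothetical.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: prepares Rust text for a C API that expects `char*`.
fn to_c_string(text: &str) -> Option<CString> {
    // Fails if `text` contains an interior nul byte, which C would treat as
    // the end of the string.
    CString::new(text).ok()
}

/// Hypothetical helper: validates nul-terminated bytes and views them as `&str`.
fn from_c_bytes(bytes: &[u8]) -> Option<&str> {
    // Checks for exactly one nul byte, at the end, and then for valid UTF-8.
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}

fn main() {
    let owned = to_c_string("Hello, C").expect("no interior nul bytes");
    println!("{:?}", owned.as_bytes_with_nul());
    println!("{:?}", from_c_bytes(b"Hello, C\0"));
}
```

The checks cost a scan over the bytes. The unchecked versions in the example above skip that scan and shift the burden of proof to the programmer.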
</details>

View File

@ -0,0 +1,10 @@
---
minutes: 5
---
# Performance
> TODO: Stub for now
It's easy to think of performance as the main reason for unsafe, but high
performance code makes up the minority of unsafe blocks.

View File

@ -0,0 +1,46 @@
---
minutes: 2
---
# Setting Up
## Local Rust installation
You should have a Rust compiler installed that supports the 2024 edition of the
language, which is any version of rustc from 1.85 onwards.
```console
$ rustc --version
rustc 1.87
```
<!--
TODO (tim): Adding this for later while I'm here.
TODO (tim): We should be able to avoid this by just relying on the `cc` crate
We recommend that you install the [Bazel build system](https://bazel.build/install).
This will allow you to easily compile projects that combine multiple languages.
-->
## (Optional) Create a local instance of the course
```console
$ git clone --depth=1 https://github.com/google/comprehensive-rust.git
Cloning into 'comprehensive-rust'...
...
$ cd comprehensive-rust
$ cargo install-tools
...
$ cargo serve # then open http://127.0.0.1:3000/ in a browser
```
<details>
Ask everyone to confirm that they are able to execute `rustc` and that it
reports a version that supports the 2024 edition (1.85 or newer).
For those who cannot, tell them that we'll resolve that in the break.
</details>

View File

@ -0,0 +1,46 @@
---
course: Unsafe
session: Day 1 Morning
target_minutes: 300
---
# Welcome to Unsafe Rust
> IMPORTANT: THIS MODULE IS IN AN EARLY STAGE OF DEVELOPMENT
>
> Please do not consider this module of Comprehensive Rust to be complete. With
> that in mind, your feedback, comments, and especially your concerns, are very
> welcome.
>
> To comment on this module's development, please use the
> [GitHub issue tracker].
[GitHub issue tracker]: https://github.com/google/comprehensive-rust/issues
The `unsafe` keyword is easy to type, but hard to master. When used
appropriately, it forms a useful and indeed essential part of the Rust
programming language.
By the end of this deep dive, you'll know how to work with `unsafe` code, review
others' changes that include the `unsafe` keyword, and produce your own.
What you'll learn:
- What the terms undefined behavior, soundness, and safety mean
- Why the `unsafe` keyword exists in the Rust language
- How to write your own code using `unsafe` safely
- How to review `unsafe` code
## Links to other sections of the course
The `unsafe` keyword is also covered in:
- _Rust Fundamentals_, the main module of Comprehensive Rust, includes a session
on [Unsafe Rust] in its last day.
- _Rust in Chromium_ discusses how to [interoperate with C++]. Consult that
material if you are looking into FFI.
- _Bare Metal Rust_ uses unsafe heavily to interact with the underlying host,
among other things.
[interoperate with C++]: ../chromium/interoperability-with-cpp.md
[Unsafe Rust]: ../unsafe-rust.html