mirror of https://github.com/google/comprehensive-rust.git synced 2025-06-26 02:31:00 +02:00

Comprehensive Rust v2 (#1073)

I've taken some work by @fw-immunant and others on the new organization
of the course and condensed it into a form amenable to a text editor and
some computational analysis. You can see the inputs in `course.py` but
the interesting bits are the output: `outline.md` and `slides.md`.

The idea is to break the course into more, smaller segments with
exercises at the ends and breaks in between. So `outline.md` lists the
segments, their duration, and sums those durations up per-day. It shows
we're about an hour too long right now! There are more details of the
segments in `slides.md`, or you can see mostly the same stuff in
`course.py`.

This now contains all of the content from the v1 course, ensuring both
that we've covered everything and that we'll have somewhere to redirect
every page.

Fixes #1082.
Fixes #1465.

---------

Co-authored-by: Nicole LeGare <dlegare.1001@gmail.com>
Co-authored-by: Martin Geisler <mgeisler@google.com>
Dustin J. Mitchell
2023-11-29 10:39:24 -05:00
committed by GitHub
parent ea204774b6
commit 6d19292f16
309 changed files with 6807 additions and 4281 deletions


@ -0,0 +1,12 @@
[package]
name = "unsafe-rust"
version = "0.1.0"
edition = "2021"
publish = false
[dependencies]
tempfile = "*"
[[bin]]
name = "listdir"
path = "exercise.rs"


@ -0,0 +1,58 @@
---
minutes: 10
---
# Dereferencing Raw Pointers
Creating pointers is safe, but dereferencing them requires `unsafe`:
```rust,editable
fn main() {
let mut s = String::from("careful!");
let r1 = &mut s as *mut String;
let r2 = r1 as *const String;
// Safe because r1 and r2 were obtained from references and so are
// guaranteed to be non-null and properly aligned, the objects underlying
// the references from which they were obtained are live throughout the
// whole unsafe block, and they are not accessed either through the
// references or concurrently through any other pointers.
unsafe {
println!("r1 is: {}", *r1);
*r1 = String::from("uhoh");
println!("r2 is: {}", *r2);
}
// NOT SAFE. DO NOT DO THIS.
/*
let r3: &String = unsafe { &*r1 };
drop(s);
println!("r3 is: {}", *r3);
*/
}
```
<details>
It is good practice (and required by the Android Rust style guide) to write a comment for each
`unsafe` block explaining how the code inside it satisfies the safety requirements of the unsafe
operations it is doing.
In the case of pointer dereferences, this means that the pointers must be
[_valid_](https://doc.rust-lang.org/std/ptr/index.html#safety), i.e.:
* The pointer must be non-null.
* The pointer must be _dereferenceable_ (within the bounds of a single allocated object).
* The object must not have been deallocated.
* There must not be concurrent accesses to the same location.
* If the pointer was obtained by casting a reference, the underlying object must be live and no
reference may be used to access the memory.
In most cases the pointer must also be properly aligned.
The "NOT SAFE" section gives an example of a common kind of UB bug: `*r1` has
the `'static` lifetime, so `r3` has type `&'static String`, and thus outlives
`s`. Creating a reference from a pointer requires _great care_.
</details>
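
One way to limit the risk described above is to keep the reference created from a pointer as short-lived as possible. The following is a minimal sketch, not part of the course material: the helper name `copy_out` is hypothetical, and it assumes the caller can promise that a non-null pointer refers to a live, unaliased `String`. It uses `as_ref` to handle the null case and returns an owned copy, so no reference escapes the function.

```rust
/// Hypothetical helper: copies the `String` behind `ptr`, or returns `None`
/// if `ptr` is null.
///
/// # Safety
///
/// If `ptr` is non-null, it must be properly aligned and point to a live
/// `String` that is not mutated for the duration of the call.
unsafe fn copy_out(ptr: *const String) -> Option<String> {
    // SAFETY: the caller guarantees that a non-null `ptr` is valid; `as_ref`
    // handles the null case by returning `None`.
    let borrowed: Option<&String> = unsafe { ptr.as_ref() };
    // Clone immediately so the borrowed reference does not outlive this call.
    borrowed.cloned()
}

fn main() {
    let s = String::from("careful!");
    let p = &s as *const String;
    // SAFETY: `p` was obtained from a reference to `s`, which is live and not
    // mutated while `copy_out` runs.
    let copy = unsafe { copy_out(p) };
    println!("copy: {copy:?}");
    // SAFETY: a null pointer is explicitly allowed by `copy_out`.
    let none = unsafe { copy_out(std::ptr::null()) };
    println!("null: {none:?}");
}
```

Because the result is owned, dropping `s` afterwards cannot invalidate it, unlike `r3` in the "NOT SAFE" block.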


@ -0,0 +1,74 @@
---
minutes: 30
---
# Safe FFI Wrapper
Rust has great support for calling functions through a _foreign function
interface_ (FFI). We will use this to build a safe wrapper for the `libc`
functions you would use from C to read the names of files in a directory.
You will want to consult the manual pages:
* [`opendir(3)`](https://man7.org/linux/man-pages/man3/opendir.3.html)
* [`readdir(3)`](https://man7.org/linux/man-pages/man3/readdir.3.html)
* [`closedir(3)`](https://man7.org/linux/man-pages/man3/closedir.3.html)
You will also want to browse the [`std::ffi`] module. There you find a number of
string types which you need for the exercise:
| Types | Encoding | Use |
|----------------------------|----------------|--------------------------------|
| [`str`] and [`String`] | UTF-8 | Text processing in Rust |
| [`CStr`] and [`CString`] | NUL-terminated | Communicating with C functions |
| [`OsStr`] and [`OsString`] | OS-specific | Communicating with the OS |
You will convert between all these types:
- `&str` to `CString`: you need to allocate space for a trailing `\0` character,
- `CString` to `*const c_char` (`*const i8` on x86-64): you need a pointer to call C functions,
- `*const c_char` to `&CStr`: you need something which can find the trailing `\0` character,
- `&CStr` to `&[u8]`: a slice of bytes is the universal interface for "some unknown data",
- `&[u8]` to `&OsStr`: `&OsStr` is a step towards `OsString`, use
[`OsStrExt`](https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStrExt.html)
to create it,
- `&OsStr` to `OsString`: you need to clone the data in `&OsStr` to be able to return it and call
`readdir` again.
The [Nomicon] also has a very useful chapter about FFI.
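
The conversion chain listed above can be sketched in one place. This is an illustrative round trip only, not the exercise solution: the function name `round_trip` is hypothetical, no C function is called, and the code assumes a Unix target because it relies on `std::os::unix::ffi::OsStrExt`.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};
use std::os::raw::c_char;
use std::os::unix::ffi::OsStrExt;

/// Hypothetical helper: walks a name through every string type in the table.
fn round_trip(name: &str) -> Result<OsString, String> {
    // &str -> CString: allocates a copy with a trailing NUL byte.
    // This fails if `name` contains an interior NUL.
    let c_string =
        CString::new(name).map_err(|err| format!("Invalid name: {err}"))?;
    // CString -> *const c_char: the pointer a C function would receive.
    let ptr: *const c_char = c_string.as_ptr();
    // *const c_char -> &CStr.
    // SAFETY: `ptr` points into `c_string`, which is NUL-terminated and
    // stays alive until the end of this function.
    let c_str: &CStr = unsafe { CStr::from_ptr(ptr) };
    // &CStr -> &[u8]: the bytes without the trailing NUL.
    let bytes: &[u8] = c_str.to_bytes();
    // &[u8] -> &OsStr: provided by `OsStrExt` on Unix.
    let os_str: &OsStr = OsStr::from_bytes(bytes);
    // &OsStr -> OsString: an owned copy that can outlive `c_string`.
    Ok(os_str.to_owned())
}

fn main() {
    println!("{:?}", round_trip("foo.txt"));
    println!("{:?}", round_trip("bad\0name"));
}
```

In the exercise, the pointer handed to `CStr::from_ptr` comes from `readdir` rather than from a `CString`, so the safety argument there has to rest on what the man page guarantees about `d_name`.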
[`std::ffi`]: https://doc.rust-lang.org/std/ffi/
[`str`]: https://doc.rust-lang.org/std/primitive.str.html
[`String`]: https://doc.rust-lang.org/std/string/struct.String.html
[`CStr`]: https://doc.rust-lang.org/std/ffi/struct.CStr.html
[`CString`]: https://doc.rust-lang.org/std/ffi/struct.CString.html
[`OsStr`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html
[`OsString`]: https://doc.rust-lang.org/std/ffi/struct.OsString.html
[Nomicon]: https://doc.rust-lang.org/nomicon/ffi.html
Copy the code below to <https://play.rust-lang.org/> and fill in the missing
functions and methods:
```rust,should_panic
// TODO: remove this when you're done with your implementation.
#![allow(unused_imports, unused_variables, dead_code)]
{{#include exercise.rs:ffi}}
{{#include exercise.rs:DirectoryIterator}}
unimplemented!()
}
}
{{#include exercise.rs:Iterator}}
unimplemented!()
}
}
{{#include exercise.rs:Drop}}
unimplemented!()
}
}
{{#include exercise.rs:main}}
```

src/unsafe-rust/exercise.rs Normal file

@ -0,0 +1,179 @@
// Copyright 2022 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ANCHOR: solution
// ANCHOR: ffi
mod ffi {
use std::os::raw::{c_char, c_int};
#[cfg(not(target_os = "macos"))]
use std::os::raw::{c_long, c_uchar, c_ulong, c_ushort};
// Opaque type. See https://doc.rust-lang.org/nomicon/ffi.html.
#[repr(C)]
pub struct DIR {
_data: [u8; 0],
_marker: core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>,
}
// Layout according to the Linux man page for readdir(3), where ino_t and
// off_t are resolved according to the definitions in
// /usr/include/x86_64-linux-gnu/{sys/types.h, bits/typesizes.h}.
#[cfg(not(target_os = "macos"))]
#[repr(C)]
pub struct dirent {
pub d_ino: c_ulong,
pub d_off: c_long,
pub d_reclen: c_ushort,
pub d_type: c_uchar,
pub d_name: [c_char; 256],
}
// Layout according to the macOS man page for dir(5).
#[cfg(target_os = "macos")]
#[repr(C)]
pub struct dirent {
pub d_fileno: u64,
pub d_seekoff: u64,
pub d_reclen: u16,
pub d_namlen: u16,
pub d_type: u8,
pub d_name: [c_char; 1024],
}
extern "C" {
pub fn opendir(s: *const c_char) -> *mut DIR;
#[cfg(not(all(target_os = "macos", target_arch = "x86_64")))]
pub fn readdir(s: *mut DIR) -> *const dirent;
// See https://github.com/rust-lang/libc/issues/414 and the section on
// _DARWIN_FEATURE_64_BIT_INODE in the macOS man page for stat(2).
//
// "Platforms that existed before these updates were available" refers
// to macOS (as opposed to iOS / watchOS / etc.) on Intel and PowerPC.
#[cfg(all(target_os = "macos", target_arch = "x86_64"))]
#[link_name = "readdir$INODE64"]
pub fn readdir(s: *mut DIR) -> *const dirent;
pub fn closedir(s: *mut DIR) -> c_int;
}
}
use std::ffi::{CStr, CString, OsStr, OsString};
use std::os::unix::ffi::OsStrExt;
#[derive(Debug)]
struct DirectoryIterator {
path: CString,
dir: *mut ffi::DIR,
}
// ANCHOR_END: ffi
// ANCHOR: DirectoryIterator
impl DirectoryIterator {
fn new(path: &str) -> Result<DirectoryIterator, String> {
// Call opendir and return an Ok value if that worked,
// otherwise return Err with a message.
// ANCHOR_END: DirectoryIterator
let path = CString::new(path).map_err(|err| format!("Invalid path: {err}"))?;
// SAFETY: path.as_ptr() cannot be NULL.
let dir = unsafe { ffi::opendir(path.as_ptr()) };
if dir.is_null() {
Err(format!("Could not open {:?}", path))
} else {
Ok(DirectoryIterator { path, dir })
}
}
}
// ANCHOR: Iterator
impl Iterator for DirectoryIterator {
type Item = OsString;
fn next(&mut self) -> Option<OsString> {
// Keep calling readdir until we get a NULL pointer back.
// ANCHOR_END: Iterator
// SAFETY: self.dir is never NULL.
let dirent = unsafe { ffi::readdir(self.dir) };
if dirent.is_null() {
// We have reached the end of the directory.
return None;
}
// SAFETY: dirent is not NULL and dirent.d_name is NUL
// terminated.
let d_name = unsafe { CStr::from_ptr((*dirent).d_name.as_ptr()) };
let os_str = OsStr::from_bytes(d_name.to_bytes());
Some(os_str.to_owned())
}
}
// ANCHOR: Drop
impl Drop for DirectoryIterator {
fn drop(&mut self) {
// Call closedir as needed.
// ANCHOR_END: Drop
if !self.dir.is_null() {
// SAFETY: self.dir is not NULL.
if unsafe { ffi::closedir(self.dir) } != 0 {
panic!("Could not close {:?}", self.path);
}
}
}
}
// ANCHOR: main
fn main() -> Result<(), String> {
let iter = DirectoryIterator::new(".")?;
println!("files: {:#?}", iter.collect::<Vec<_>>());
Ok(())
}
// ANCHOR_END: main
#[cfg(test)]
mod tests {
use super::*;
use std::error::Error;
#[test]
fn test_nonexisting_directory() {
let iter = DirectoryIterator::new("no-such-directory");
assert!(iter.is_err());
}
#[test]
fn test_empty_directory() -> Result<(), Box<dyn Error>> {
let tmp = tempfile::TempDir::new()?;
let iter = DirectoryIterator::new(
tmp.path().to_str().ok_or("Non UTF-8 character in path")?,
)?;
let mut entries = iter.collect::<Vec<_>>();
entries.sort();
assert_eq!(entries, &[".", ".."]);
Ok(())
}
#[test]
fn test_nonempty_directory() -> Result<(), Box<dyn Error>> {
let tmp = tempfile::TempDir::new()?;
std::fs::write(tmp.path().join("foo.txt"), "The Foo Diaries\n")?;
std::fs::write(tmp.path().join("bar.png"), "<PNG>\n")?;
std::fs::write(tmp.path().join("crab.rs"), "//! Crab\n")?;
let iter = DirectoryIterator::new(
tmp.path().to_str().ok_or("Non UTF-8 character in path")?,
)?;
let mut entries = iter.collect::<Vec<_>>();
entries.sort();
assert_eq!(entries, &[".", "..", "bar.png", "crab.rs", "foo.txt"]);
Ok(())
}
}


@ -0,0 +1,43 @@
---
minutes: 5
---
# Mutable Static Variables
It is safe to read an immutable static variable:
```rust,editable
static HELLO_WORLD: &str = "Hello, world!";
fn main() {
println!("HELLO_WORLD: {HELLO_WORLD}");
}
```
However, since data races can occur, it is unsafe to read and write mutable
static variables:
```rust,editable
static mut COUNTER: u32 = 0;
fn add_to_counter(inc: u32) {
unsafe { COUNTER += inc; } // Potential data race!
}
fn main() {
add_to_counter(42);
unsafe { println!("COUNTER: {COUNTER}"); } // Potential data race!
}
```
<details>
- The program here is safe because it is single-threaded. However, the Rust compiler is conservative
and will assume the worst. Try removing the `unsafe` and see how the compiler explains that it is
undefined behavior to mutate a static from multiple threads.
- Using a mutable static is generally a bad idea, but there are some cases where it might make sense
in low-level `no_std` code, such as implementing a heap allocator or working with some C APIs.
</details>
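
When a shared counter is all that is needed, an atomic type avoids `static mut` entirely. The following is a minimal sketch of that alternative, assuming `Ordering::Relaxed` is sufficient because only the final total matters; the thread count and increment are arbitrary illustration values.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static with interior mutability: no `unsafe` is required.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_counter(inc: u32) {
    // `fetch_add` is a single atomic read-modify-write, so concurrent calls
    // cannot race with each other.
    COUNTER.fetch_add(inc, Ordering::Relaxed);
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| add_to_counter(10)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("COUNTER: {}", COUNTER.load(Ordering::Relaxed));
}
```

Unlike the `static mut` version, this program stays free of data races even though several threads update the counter.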


@ -0,0 +1,5 @@
# Solution
```rust,editable
{{#include exercise.rs:solution}}
```

src/unsafe-rust/unions.md Normal file

@ -0,0 +1,32 @@
---
minutes: 5
---
# Unions
Unions are like enums, but you need to track the active field yourself:
```rust,editable
#[repr(C)]
union MyUnion {
i: u8,
b: bool,
}
fn main() {
let u = MyUnion { i: 42 };
println!("int: {}", unsafe { u.i });
println!("bool: {}", unsafe { u.b }); // Undefined behavior!
}
```
<details>
Unions are very rarely needed in Rust as you can usually use an enum. They are occasionally needed
for interacting with C library APIs.
If you just want to reinterpret bytes as a different type, you probably want
[`std::mem::transmute`](https://doc.rust-lang.org/stable/std/mem/fn.transmute.html) or a safe
wrapper such as the [`zerocopy`](https://crates.io/crates/zerocopy) crate.
</details>
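
"Tracking the active field yourself" usually means storing a tag next to the union, which is how C APIs tend to model it. The following is a minimal sketch under that assumption; the names `Payload`, `Tag`, and `Tagged` are hypothetical and chosen only for illustration.

```rust
#[repr(C)]
union Payload {
    i: u8,
    b: bool,
}

/// Records which field of `Payload` was written last.
#[derive(Clone, Copy)]
enum Tag {
    Int,
    Bool,
}

struct Tagged {
    tag: Tag,
    payload: Payload,
}

impl Tagged {
    fn from_int(i: u8) -> Self {
        Tagged { tag: Tag::Int, payload: Payload { i } }
    }

    fn from_bool(b: bool) -> Self {
        Tagged { tag: Tag::Bool, payload: Payload { b } }
    }

    fn as_int(&self) -> Option<u8> {
        match self.tag {
            // SAFETY: the tag says `i` is the field that was initialized.
            Tag::Int => Some(unsafe { self.payload.i }),
            Tag::Bool => None,
        }
    }

    fn as_bool(&self) -> Option<bool> {
        match self.tag {
            // SAFETY: the tag says `b` is the field that was initialized, so
            // the stored byte is a valid `bool`.
            Tag::Bool => Some(unsafe { self.payload.b }),
            Tag::Int => None,
        }
    }
}

fn main() {
    let value = Tagged::from_int(42);
    println!("int: {:?}", value.as_int());
    println!("bool: {:?}", value.as_bool());
}
```

The fields are private to the wrapper's methods, so safe callers cannot read the wrong variant. An ordinary Rust `enum` gives the same guarantee with no `unsafe` at all, which is why it is usually the better choice.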


@ -0,0 +1,98 @@
---
minutes: 5
---
# Unsafe Functions
## Calling Unsafe Functions
A function or method can be marked `unsafe` if it has extra preconditions you
must uphold to avoid undefined behaviour:
```rust,editable
extern "C" {
fn abs(input: i32) -> i32;
}
fn main() {
let emojis = "🗻∈🌏";
// Safe because the indices are in the correct order, within the bounds of
// the string slice, and lie on UTF-8 sequence boundaries.
unsafe {
println!("emoji: {}", emojis.get_unchecked(0..4));
println!("emoji: {}", emojis.get_unchecked(4..7));
println!("emoji: {}", emojis.get_unchecked(7..11));
}
println!("char count: {}", count_chars(unsafe { emojis.get_unchecked(0..7) }));
unsafe {
// Undefined behavior if abs misbehaves.
println!("Absolute value of -3 according to C: {}", abs(-3));
}
// Not upholding the UTF-8 encoding requirement breaks memory safety!
// println!("emoji: {}", unsafe { emojis.get_unchecked(0..3) });
// println!("char count: {}", count_chars(unsafe { emojis.get_unchecked(0..3) }));
}
fn count_chars(s: &str) -> usize {
s.chars().count()
}
```
## Writing Unsafe Functions
You can mark your own functions as `unsafe` if they require particular conditions to avoid undefined
behaviour.
```rust,editable
/// Swaps the values pointed to by the given pointers.
///
/// # Safety
///
/// The pointers must be valid and properly aligned.
unsafe fn swap(a: *mut u8, b: *mut u8) {
let temp = *a;
*a = *b;
*b = temp;
}
fn main() {
let mut a = 42;
let mut b = 66;
// Safe because ...
unsafe {
swap(&mut a, &mut b);
}
println!("a = {}, b = {}", a, b);
}
```
<details>
## Calling Unsafe Functions
`get_unchecked`, like most `_unchecked` functions, is unsafe, because it can
create UB if the range is incorrect. `abs` is unsafe for a different reason:
it is an external function (FFI). Calling external functions is usually only a
problem when those functions do things with pointers which might violate Rust's
memory model, but in general any C function might have undefined behaviour
under any arbitrary circumstances.
The `"C"` in this example is the ABI;
[other ABIs are available too](https://doc.rust-lang.org/reference/items/external-blocks.html).
## Writing Unsafe Functions
We wouldn't actually use pointers for a `swap` function - it can be done safely
with references.
Note that unsafe code is allowed within an unsafe function without an `unsafe`
block. We can prohibit this with `#[deny(unsafe_op_in_unsafe_fn)]`. Try adding
it and see what happens. This will likely change in a future Rust edition.
</details>
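
The two notes on writing unsafe functions can be illustrated side by side. This is a sketch rather than course material: `swap_raw` and `swap_safe` are hypothetical names, and the lint is applied as an attribute on the function instead of crate-wide.

```rust
/// Swaps the values pointed to by the given pointers.
///
/// # Safety
///
/// Both pointers must be valid for reads and writes and properly aligned.
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn swap_raw(a: *mut u8, b: *mut u8) {
    // With the lint enabled, each unsafe operation needs its own block even
    // inside an `unsafe fn`.
    // SAFETY: the caller guarantees both pointers are valid and aligned.
    unsafe {
        let temp = *a;
        *a = *b;
        *b = temp;
    }
}

/// The same operation with references: no preconditions, no `unsafe`.
fn swap_safe(a: &mut u8, b: &mut u8) {
    std::mem::swap(a, b);
}

fn main() {
    let mut a = 42;
    let mut b = 66;
    // SAFETY: both pointers come from live, distinct `&mut u8` references.
    unsafe {
        swap_raw(&mut a, &mut b);
    }
    println!("after swap_raw: a = {a}, b = {b}");
    swap_safe(&mut a, &mut b);
    println!("after swap_safe: a = {a}, b = {b}");
}
```

Removing the inner `unsafe` block from `swap_raw` turns the pointer dereferences into compile errors under the lint, which makes the unsafe operations easy to spot during review.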


@ -0,0 +1,41 @@
---
minutes: 5
---
# Implementing Unsafe Traits
As with functions, you can mark a trait as `unsafe` if the implementation must guarantee
particular conditions to avoid undefined behaviour.
For example, the `zerocopy` crate has an unsafe trait that looks
[something like this](https://docs.rs/zerocopy/latest/zerocopy/trait.AsBytes.html):
```rust,editable
use std::mem::size_of_val;
use std::slice;
/// ...
/// # Safety
/// The type must have a defined representation and no padding.
pub unsafe trait AsBytes {
fn as_bytes(&self) -> &[u8] {
unsafe {
slice::from_raw_parts(self as *const Self as *const u8, size_of_val(self))
}
}
}
// Safe because u32 has a defined representation and no padding.
unsafe impl AsBytes for u32 {}
```
<details>
There should be a `# Safety` section on the Rustdoc for the trait explaining the requirements for
the trait to be safely implemented.
The actual safety section for `AsBytes` is rather longer and more complicated.
The built-in `Send` and `Sync` traits are unsafe.
</details>
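
Implementing such a trait for your own type means checking the safety requirements by hand. The following sketch repeats the simplified `AsBytes` trait from above so that it compiles on its own (it is not the real `zerocopy` trait) and implements it for a hypothetical `Rgb` struct.

```rust
use std::mem::size_of_val;
use std::slice;

/// Simplified stand-in for the trait shown above.
///
/// # Safety
///
/// The type must have a defined representation and no padding.
pub unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: implementors promise a defined layout without padding, so
        // every byte of `self` is initialized and may be read as `u8`.
        unsafe {
            slice::from_raw_parts(self as *const Self as *const u8, size_of_val(self))
        }
    }
}

/// Hypothetical example type: three `u8` fields in declaration order.
#[repr(C)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

// SAFETY: `Rgb` is `#[repr(C)]` and consists only of `u8` fields, which have
// alignment 1, so the layout is defined and contains no padding.
unsafe impl AsBytes for Rgb {}

fn main() {
    let colour = Rgb { r: 255, g: 128, b: 0 };
    println!("{:?}", colour.as_bytes());
}
```

A struct such as `#[repr(C)] struct Mixed { a: u8, b: u32 }` would not qualify: the compiler inserts padding after `a`, and reading padding bytes through `as_bytes` would be undefined behaviour.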

src/unsafe-rust/unsafe.md Normal file

@ -0,0 +1,36 @@
---
minutes: 5
---
# Unsafe Rust
The Rust language has two parts:
* **Safe Rust:** memory safe, no undefined behavior possible.
* **Unsafe Rust:** can trigger undefined behavior if preconditions are violated.
We have seen mostly safe Rust in this course, but it's important to know
what Unsafe Rust is.
Unsafe code is usually small and isolated, and its correctness should be carefully
documented. It is usually wrapped in a safe abstraction layer.
Unsafe Rust gives you access to five new capabilities:
* Dereference raw pointers.
* Access or modify mutable static variables.
* Access `union` fields.
* Call `unsafe` functions, including `extern` functions.
* Implement `unsafe` traits.
We will briefly cover unsafe capabilities next. For full details, please see
[Chapter 19.1 in the Rust Book](https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html)
and the [Rustonomicon](https://doc.rust-lang.org/nomicon/).
<details>
Unsafe Rust does not mean the code is incorrect. It means that developers have
turned off some compiler safety features and have to write correct code by
themselves: inside `unsafe` code, the compiler no longer enforces all of Rust's
memory-safety rules.
</details>
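
The idea of a small unsafe core behind a safe interface can be sketched with a slice-splitting function. The name `split_in_two` is hypothetical, and the standard library already offers `split_at_mut` for this purpose; the sketch only shows how a bounds check in safe code justifies the `unsafe` block.

```rust
use std::slice;

/// Splits a mutable slice into two non-overlapping mutable halves.
///
/// Panics if `mid` is greater than the length of `values`.
fn split_in_two(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    // Checking the precondition here is what makes the function safe to call.
    assert!(mid <= len, "mid is out of bounds");
    let ptr = values.as_mut_ptr();
    // SAFETY: `mid <= len`, so both ranges lie within the original slice, and
    // they do not overlap, so the two mutable slices never alias.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut numbers = [1, 2, 3, 4, 5];
    let (left, right) = split_in_two(&mut numbers, 2);
    left[0] = 10;
    right[0] = 30;
    println!("{numbers:?}");
}
```

Callers never write `unsafe` themselves, and the borrow checker still ensures that `numbers` is not touched while `left` and `right` are in use.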