mirror of
https://github.com/google/comprehensive-rust.git
synced 2025-07-17 19:37:48 +02:00
Move slices and strings to references section (#1898)
This PR moves the slides for slices and strings into the day 1 section on references. This seems like the more natural place to introduce slices since slices are a type of reference. It then also made sense to me to follow that with the introduction of `&str` and `String`, since students now have the context to understand what a "string slice" is. I also removed the strings slide from the types and values section since it didn't make sense to cover the same topic twice in the same day. I tested this new organization in my class on Wednesday and it didn't cause day 1 to take too long.
12
src/lifetimes/Cargo.toml
Normal file
@ -0,0 +1,12 @@
[package]
name = "lifetimes"
version = "0.1.0"
edition = "2021"
publish = false

[dependencies]
thiserror = "*"

[[bin]]
name = "protobuf"
path = "exercise.rs"
64
src/lifetimes/exercise.md
Normal file
@ -0,0 +1,64 @@
---
minutes: 30
---

# Exercise: Protobuf Parsing

In this exercise, you will build a parser for the
[protobuf binary encoding](https://protobuf.dev/programming-guides/encoding/).
Don't worry, it's simpler than it seems! This illustrates a common parsing
pattern, passing slices of data. The underlying data itself is never copied.

Fully parsing a protobuf message requires knowing the types of the fields,
indexed by their field numbers. That is typically provided in a `proto` file. In
this exercise, we'll encode that information into `match` statements in
functions that get called for each field.

We'll use the following proto:

```proto
message PhoneNumber {
  optional string number = 1;
  optional string type = 2;
}

message Person {
  optional string name = 1;
  optional int32 id = 2;
  repeated PhoneNumber phones = 3;
}
```

A proto message is encoded as a series of fields, one after the next. Each is
implemented as a "tag" followed by the value. The tag contains a field number
(e.g., `2` for the `id` field of a `Person` message) and a wire type defining
how the payload should be determined from the byte stream.
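
As a rough illustration of the tag layout described above (a sketch, not part of
the exercise code): the low three bits of a decoded tag hold the wire type and
the remaining bits hold the field number. The helper name `split_tag` is
hypothetical; the given code provides `unpack_tag` for the same purpose.

```rust
/// Split a decoded tag into (field number, raw wire type).
/// `split_tag` is a hypothetical helper used only for this illustration.
fn split_tag(tag: u64) -> (u64, u64) {
    (tag >> 3, tag & 0x7)
}

fn main() {
    // 0x10 = 0b0001_0000: field number 2, wire type 0 (VARINT).
    assert_eq!(split_tag(0x10), (2, 0));
    // 0x0a = 0b0000_1010: field number 1, wire type 2 (LEN).
    assert_eq!(split_tag(0x0a), (1, 2));
    // 0x1a = 0b0001_1010: field number 3, wire type 2 (LEN).
    assert_eq!(split_tag(0x1a), (3, 2));
}
```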

Integers, including the tag, are represented with a variable-length encoding
called VARINT. Luckily, `parse_varint` is defined for you below. The given code
also defines callbacks to handle `Person` and `PhoneNumber` fields, and to parse
a message into a series of calls to those callbacks.
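
To make the VARINT encoding concrete, here is a minimal standalone sketch. Each
byte carries 7 bits of payload, least-significant group first, and a set high
bit means another byte follows. The name `decode_varint` and its `Option`
return type are assumptions made for this snippet; the exercise itself uses the
provided `parse_varint`.

```rust
/// Decode a VARINT from the front of `data`, returning the value and the
/// number of bytes consumed. `decode_varint` is a hypothetical stand-in for
/// the exercise's `parse_varint`, kept separate so this snippet runs alone.
fn decode_varint(data: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // Mirror the 7-byte limit used by the given `parse_varint`.
    for (i, byte) in data.iter().enumerate().take(7) {
        // The low 7 bits carry payload, least-significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the final byte.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    // 150 is encoded as [0x96, 0x01]: 0x16 | (0x01 << 7) = 22 + 128.
    assert_eq!(decode_varint(&[0x96, 0x01]), Some((150, 2)));
    // Values below 128 fit in a single byte; later bytes are left untouched.
    assert_eq!(decode_varint(&[0x2a, 0xff]), Some((42, 1)));
    // A continuation bit with no following byte is an error.
    assert_eq!(decode_varint(&[0x80]), None);
}
```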

What remains for you is to implement the `parse_field` function and the
`ProtoMessage` trait for `Person` and `PhoneNumber`.

<!-- compile_fail because `mdbook test` does not allow use of `thiserror` -->

```rust,editable,compile_fail
{{#include exercise.rs:preliminaries }}


{{#include exercise.rs:parse_field }}
        _ => todo!("Based on the wire type, build a Field, consuming as many bytes as necessary.")
    };
    todo!("Return the field, and any un-consumed bytes.")
}

{{#include exercise.rs:parse_message }}

{{#include exercise.rs:message_types}}

// TODO: Implement ProtoMessage for Person and PhoneNumber.

{{#include exercise.rs:main }}
```
263
src/lifetimes/exercise.rs
Normal file
@ -0,0 +1,263 @@
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// ANCHOR: solution
// ANCHOR: preliminaries
use std::convert::TryFrom;
use thiserror::Error;

#[derive(Debug, Error)]
enum Error {
    #[error("Invalid varint")]
    InvalidVarint,
    #[error("Invalid wire-type")]
    InvalidWireType,
    #[error("Unexpected EOF")]
    UnexpectedEOF,
    #[error("Invalid length")]
    InvalidSize(#[from] std::num::TryFromIntError),
    #[error("Unexpected wire-type")]
    UnexpectedWireType,
    #[error("Invalid string (not UTF-8)")]
    InvalidString,
}

/// A wire type as seen on the wire.
enum WireType {
    /// The Varint WireType indicates the value is a single VARINT.
    Varint,
    //I64, -- not needed for this exercise
    /// The Len WireType indicates that the value is a length represented as a
    /// VARINT followed by exactly that number of bytes.
    Len,
    /// The I32 WireType indicates that the value is precisely 4 bytes in
    /// little-endian order containing a 32-bit signed integer.
    I32,
}

#[derive(Debug)]
/// A field's value, typed based on the wire type.
enum FieldValue<'a> {
    Varint(u64),
    //I64(i64), -- not needed for this exercise
    Len(&'a [u8]),
    I32(i32),
}

#[derive(Debug)]
/// A field, containing the field number and its value.
struct Field<'a> {
    field_num: u64,
    value: FieldValue<'a>,
}

trait ProtoMessage<'a>: Default + 'a {
    fn add_field(&mut self, field: Field<'a>) -> Result<(), Error>;
}

impl TryFrom<u64> for WireType {
    type Error = Error;

    fn try_from(value: u64) -> Result<WireType, Error> {
        Ok(match value {
            0 => WireType::Varint,
            //1 => WireType::I64, -- not needed for this exercise
            2 => WireType::Len,
            5 => WireType::I32,
            _ => return Err(Error::InvalidWireType),
        })
    }
}

impl<'a> FieldValue<'a> {
    fn as_string(&self) -> Result<&'a str, Error> {
        let FieldValue::Len(data) = self else {
            return Err(Error::UnexpectedWireType);
        };
        std::str::from_utf8(data).map_err(|_| Error::InvalidString)
    }

    fn as_bytes(&self) -> Result<&'a [u8], Error> {
        let FieldValue::Len(data) = self else {
            return Err(Error::UnexpectedWireType);
        };
        Ok(data)
    }

    fn as_u64(&self) -> Result<u64, Error> {
        let FieldValue::Varint(value) = self else {
            return Err(Error::UnexpectedWireType);
        };
        Ok(*value)
    }
}

/// Parse a VARINT, returning the parsed value and the remaining bytes.
fn parse_varint(data: &[u8]) -> Result<(u64, &[u8]), Error> {
    for i in 0..7 {
        let Some(b) = data.get(i) else {
            return Err(Error::InvalidVarint);
        };
        if b & 0x80 == 0 {
            // This is the last byte of the VARINT, so convert it to
            // a u64 and return it.
            let mut value = 0u64;
            for b in data[..=i].iter().rev() {
                value = (value << 7) | (b & 0x7f) as u64;
            }
            return Ok((value, &data[i + 1..]));
        }
    }

    // More than 7 bytes is invalid.
    Err(Error::InvalidVarint)
}

/// Convert a tag into a field number and a WireType.
fn unpack_tag(tag: u64) -> Result<(u64, WireType), Error> {
    let field_num = tag >> 3;
    let wire_type = WireType::try_from(tag & 0x7)?;
    Ok((field_num, wire_type))
}
// ANCHOR_END: preliminaries

// ANCHOR: parse_field
/// Parse a field, returning the field and the remaining bytes.
fn parse_field(data: &[u8]) -> Result<(Field, &[u8]), Error> {
    let (tag, remainder) = parse_varint(data)?;
    let (field_num, wire_type) = unpack_tag(tag)?;
    let (fieldvalue, remainder) = match wire_type {
        // ANCHOR_END: parse_field
        WireType::Varint => {
            let (value, remainder) = parse_varint(remainder)?;
            (FieldValue::Varint(value), remainder)
        }
        WireType::Len => {
            let (len, remainder) = parse_varint(remainder)?;
            let len: usize = len.try_into()?;
            if remainder.len() < len {
                return Err(Error::UnexpectedEOF);
            }
            let (value, remainder) = remainder.split_at(len);
            (FieldValue::Len(value), remainder)
        }
        WireType::I32 => {
            if remainder.len() < 4 {
                return Err(Error::UnexpectedEOF);
            }
            let (value, remainder) = remainder.split_at(4);
            // This `unwrap` cannot fail because `value` is exactly 4 bytes long.
            let value = i32::from_le_bytes(value.try_into().unwrap());
            (FieldValue::I32(value), remainder)
        }
    };
    Ok((Field { field_num, value: fieldvalue }, remainder))
}

// ANCHOR: parse_message
/// Parse a message in the given data, calling `T::add_field` for each field in
/// the message.
///
/// The entire input is consumed.
fn parse_message<'a, T: ProtoMessage<'a>>(mut data: &'a [u8]) -> Result<T, Error> {
    let mut result = T::default();
    while !data.is_empty() {
        let parsed = parse_field(data)?;
        result.add_field(parsed.0)?;
        data = parsed.1;
    }
    Ok(result)
}
// ANCHOR_END: parse_message

// ANCHOR: message_types
#[derive(Debug, Default)]
struct PhoneNumber<'a> {
    number: &'a str,
    type_: &'a str,
}

#[derive(Debug, Default)]
struct Person<'a> {
    name: &'a str,
    id: u64,
    phone: Vec<PhoneNumber<'a>>,
}
// ANCHOR_END: message_types

impl<'a> ProtoMessage<'a> for Person<'a> {
    fn add_field(&mut self, field: Field<'a>) -> Result<(), Error> {
        match field.field_num {
            1 => self.name = field.value.as_string()?,
            2 => self.id = field.value.as_u64()?,
            3 => self.phone.push(parse_message(field.value.as_bytes()?)?),
            _ => {} // skip everything else
        }
        Ok(())
    }
}

impl<'a> ProtoMessage<'a> for PhoneNumber<'a> {
    fn add_field(&mut self, field: Field<'a>) -> Result<(), Error> {
        match field.field_num {
            1 => self.number = field.value.as_string()?,
            2 => self.type_ = field.value.as_string()?,
            _ => {} // skip everything else
        }
        Ok(())
    }
}

// ANCHOR: main
fn main() {
    let person: Person = parse_message(&[
        0x0a, 0x07, 0x6d, 0x61, 0x78, 0x77, 0x65, 0x6c, 0x6c, 0x10, 0x2a, 0x1a,
        0x16, 0x0a, 0x0e, 0x2b, 0x31, 0x32, 0x30, 0x32, 0x2d, 0x35, 0x35, 0x35,
        0x2d, 0x31, 0x32, 0x31, 0x32, 0x12, 0x04, 0x68, 0x6f, 0x6d, 0x65, 0x1a,
        0x18, 0x0a, 0x0e, 0x2b, 0x31, 0x38, 0x30, 0x30, 0x2d, 0x38, 0x36, 0x37,
        0x2d, 0x35, 0x33, 0x30, 0x38, 0x12, 0x06, 0x6d, 0x6f, 0x62, 0x69, 0x6c,
        0x65,
    ])
    .unwrap();
    println!("{:#?}", person);
}
// ANCHOR_END: main

// ANCHOR: tests
#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn as_string() {
        assert!(FieldValue::Varint(10).as_string().is_err());
        assert!(FieldValue::I32(10).as_string().is_err());
        assert_eq!(FieldValue::Len(b"hello").as_string().unwrap(), "hello");
    }

    #[test]
    fn as_bytes() {
        assert!(FieldValue::Varint(10).as_bytes().is_err());
        assert!(FieldValue::I32(10).as_bytes().is_err());
        assert_eq!(FieldValue::Len(b"hello").as_bytes().unwrap(), b"hello");
    }

    #[test]
    fn as_u64() {
        assert_eq!(FieldValue::Varint(10).as_u64().unwrap(), 10u64);
        assert!(FieldValue::I32(10).as_u64().is_err());
        assert!(FieldValue::Len(b"hello").as_u64().is_err());
    }
}
// ANCHOR_END: tests
64
src/lifetimes/lifetime-annotations.md
Normal file
@ -0,0 +1,64 @@
---
minutes: 10
---

# Lifetime Annotations

A reference has a _lifetime_, which must not "outlive" the value it refers to.
This is verified by the borrow checker.

The lifetime can be implicit - this is what we have seen so far. Lifetimes can
also be explicit: `&'a Point`, `&'document str`. Lifetimes start with `'` and
`'a` is a typical default name. Read `&'a Point` as "a borrowed `Point` which is
valid for at least the lifetime `a`".

Lifetimes are always inferred by the compiler: you cannot assign a lifetime
yourself. Explicit lifetime annotations create constraints where there is
ambiguity; the compiler verifies that there is a valid solution.

Lifetimes become more complicated when considering passing values to and
returning values from functions.

<!-- The multi-line formatting by rustfmt in left_most is apparently
     intentional: https://github.com/rust-lang/rustfmt/issues/1908 -->

```rust,editable,compile_fail
#[derive(Debug)]
struct Point(i32, i32);

fn left_most(p1: &Point, p2: &Point) -> &Point {
    if p1.0 < p2.0 {
        p1
    } else {
        p2
    }
}

fn main() {
    let p1: Point = Point(10, 10);
    let p2: Point = Point(20, 20);
    let p3 = left_most(&p1, &p2); // What is the lifetime of p3?
    println!("p3: {p3:?}");
}
```

<details>

In this example, the compiler does not know what lifetime to infer for `p3`.
Looking inside the function body shows that it can only safely assume that
`p3`'s lifetime is the shorter of `p1` and `p2`. But just like types, Rust
requires explicit annotations of lifetimes on function arguments and return
values.

Add `'a` appropriately to `left_most`:

```rust,ignore
fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
```

This says, "given p1 and p2 which both outlive `'a`, the return value lives for
at least `'a`."
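
For reference, a complete version with the annotation applied might look like
the sketch below. The inner block in `main` is an addition for illustration: it
shows that the result may only be used while both arguments are alive, so the
coordinates are copied out before `p2` goes out of scope.

```rust
#[derive(Debug)]
struct Point(i32, i32);

// Both inputs and the output share the lifetime `'a`, so the result is only
// usable while both arguments are still valid.
fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
    if p1.0 < p2.0 {
        p1
    } else {
        p2
    }
}

fn main() {
    let p1 = Point(10, 10);
    let p3 = {
        let p2 = Point(20, 20);
        // Copy the coordinates out while `p2` is still alive. Returning the
        // reference itself from this block would be rejected, because `p2`
        // is dropped at the end of the block.
        let p = left_most(&p1, &p2);
        Point(p.0, p.1)
    };
    println!("p3: {p3:?}");
}
```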

In common cases, lifetimes can be elided, as described on the next slide.

</details>
77
src/lifetimes/lifetime-elision.md
Normal file
@ -0,0 +1,77 @@
---
minutes: 5
---

# Lifetimes in Function Calls

Lifetimes for function arguments and return values must be fully specified, but
Rust allows lifetimes to be elided in most cases with
[a few simple rules](https://doc.rust-lang.org/nomicon/lifetime-elision.html).
This is not inference -- it is just a syntactic shorthand.

- Each argument which does not have a lifetime annotation is given one.
- If there is only one argument lifetime, it is given to all un-annotated return
  values.
- If there are multiple argument lifetimes, but the first one is for `self`,
  that lifetime is given to all un-annotated return values.
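
The rules above can be seen by comparing elided signatures with their
written-out forms. This is a small sketch; `Buffer`, `first_word`, and `tail`
are hypothetical names chosen for the illustration, and the expanded signatures
appear only in comments.

```rust
struct Buffer {
    data: Vec<u8>,
}

// One input lifetime, so it is given to the return value.
// Written out in full: `fn first_word<'a>(s: &'a str) -> &'a str`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

impl Buffer {
    // Two input lifetimes, but one is for `self`, so the output borrows
    // from `self` and not from `_hint`. Written out in full:
    // `fn tail<'s, 'h>(&'s self, _hint: &'h str) -> &'s [u8]`.
    fn tail(&self, _hint: &str) -> &[u8] {
        let start = self.data.len().saturating_sub(2);
        &self.data[start..]
    }
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    let buf = Buffer { data: vec![1, 2, 3] };
    assert_eq!(buf.tail("unused"), &[2u8, 3][..]);
}
```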

<!-- mdbook-xgettext: skip -->

```rust,editable
#[derive(Debug)]
struct Point(i32, i32);

fn cab_distance(p1: &Point, p2: &Point) -> i32 {
    (p1.0 - p2.0).abs() + (p1.1 - p2.1).abs()
}

fn nearest<'a>(points: &'a [Point], query: &Point) -> Option<&'a Point> {
    let mut nearest = None;
    for p in points {
        if let Some((_, nearest_dist)) = nearest {
            let dist = cab_distance(p, query);
            if dist < nearest_dist {
                nearest = Some((p, dist));
            }
        } else {
            nearest = Some((p, cab_distance(p, query)));
        };
    }
    nearest.map(|(p, _)| p)
}

fn main() {
    println!(
        "{:?}",
        nearest(
            &[Point(1, 0), Point(1, 0), Point(-1, 0), Point(0, -1),],
            &Point(0, 2)
        )
    );
}
```

<details>

In this example, `cab_distance` is trivially elided.

The `nearest` function provides another example of a function with multiple
references in its arguments that requires explicit annotation.

Try adjusting the signature to "lie" about the lifetimes returned:

```rust,ignore
fn nearest<'a, 'q>(points: &'a [Point], query: &'q Point) -> Option<&'q Point> {
```

This won't compile, demonstrating that the annotations are checked for validity
by the compiler. Note that this is not the case for raw pointers (unsafe), and
this is a common source of errors with unsafe Rust.

Students may ask when to use lifetimes. Rust borrows _always_ have lifetimes.
Most of the time, elision and type inference mean these don't need to be written
out. In more complicated cases, lifetime annotations can help resolve ambiguity.
Often, especially when prototyping, it's easier to just work with owned data by
cloning values where necessary.

</details>
7
src/lifetimes/solution.md
Normal file
@ -0,0 +1,7 @@
# Solution

<!-- compile_fail because `mdbook test` does not allow use of `thiserror` -->

```rust,editable,compile_fail
{{#include exercise.rs:solution}}
```
43
src/lifetimes/struct-lifetimes.md
Normal file
@ -0,0 +1,43 @@
---
minutes: 5
---

# Lifetimes in Data Structures

If a data type stores borrowed data, it must be annotated with a lifetime:

```rust,editable
#[derive(Debug)]
struct Highlight<'doc>(&'doc str);

fn erase(text: String) {
    println!("Bye {text}!");
}

fn main() {
    let text = String::from("The quick brown fox jumps over the lazy dog.");
    let fox = Highlight(&text[4..19]);
    let dog = Highlight(&text[35..43]);
    // erase(text);
    println!("{fox:?}");
    println!("{dog:?}");
}
```

<details>

- In the above example, the annotation on `Highlight` enforces that the data
  underlying the contained `&str` lives at least as long as any instance of
  `Highlight` that uses that data.
- If `text` is consumed before the end of the lifetime of `fox` (or `dog`), the
  borrow checker reports an error.
- Types with borrowed data force users to hold on to the original data. This can
  be useful for creating lightweight views, but it generally makes them somewhat
  harder to use.
- When possible, make data structures own their data directly.
- Some structs with multiple references inside can have more than one lifetime
  annotation. This can be necessary if there is a need to describe lifetime
  relationships between the references themselves, in addition to the lifetime
  of the struct itself. Those are very advanced use cases.
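
As a hedged sketch of the last point, a struct with two lifetime parameters lets
one borrowed field outlive the other. `Excerpt` and its fields are hypothetical
names for this illustration. With a single shared lifetime, the slice returned
by `text()` could not leave the inner block, because it would also be tied to
the shorter-lived `label`.

```rust
// `Excerpt` and its field names are hypothetical.
struct Excerpt<'doc, 'label> {
    text: &'doc str,
    label: &'label str,
}

impl<'doc, 'label> Excerpt<'doc, 'label> {
    // The returned slice is tied only to the document, not to the label.
    fn text(&self) -> &'doc str {
        self.text
    }
}

fn main() {
    let document = String::from("The quick brown fox jumps over the lazy dog.");
    let text = {
        let label = String::from("animal");
        let excerpt = Excerpt { text: &document[16..19], label: &label };
        println!("{}: {}", excerpt.label, excerpt.text);
        let borrowed = excerpt.text();
        borrowed // still valid after `label` and `excerpt` are dropped
    };
    assert_eq!(text, "fox");
}
```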

</details>