mirror of https://github.com/google/comprehensive-rust.git synced 2025-07-17 19:37:48 +02:00

Move slices and strings to references section (#1898)

This PR moves the slides for slices and strings into the day 1 section
on references. This seems like the more natural place to introduce
slices since slices are a type of reference. It then also made sense to
me to follow that with the introduction of `&str` and `String`, since
students now have the context to understand what a "string slice" is. I
also removed the strings slide from the types and values section since
it didn't make sense to cover the same topic twice in the same day. I
tested this new organization in my class on Wednesday and it didn't
cause day 1 to take too long.
Nicole L
2024-03-14 13:21:15 -07:00
committed by GitHub
parent 4b27e28e7f
commit 7cd25c0262
15 changed files with 46 additions and 88 deletions

12
src/lifetimes/Cargo.toml Normal file

@ -0,0 +1,12 @@
[package]
name = "lifetimes"
version = "0.1.0"
edition = "2021"
publish = false
[dependencies]
thiserror = "*"
[[bin]]
name = "protobuf"
path = "exercise.rs"

64
src/lifetimes/exercise.md Normal file

@ -0,0 +1,64 @@
---
minutes: 30
---
# Exercise: Protobuf Parsing
In this exercise, you will build a parser for the
[protobuf binary encoding](https://protobuf.dev/programming-guides/encoding/).
Don't worry, it's simpler than it seems! This illustrates a common parsing
pattern, passing slices of data. The underlying data itself is never copied.
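
As a rough illustration of the slice-passing pattern, the sketch below splits a byte slice into a head and a remainder without copying. The helper name `take` is hypothetical and not part of the exercise code.

```rust
/// Split off the first `n` bytes, returning `(head, rest)`.
/// Both halves borrow from `data`; no bytes are copied.
/// (`take` is a hypothetical helper, not part of the exercise.)
fn take(data: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if data.len() < n {
        return None;
    }
    Some(data.split_at(n))
}

fn main() {
    let input = [0x01, 0x02, 0x03, 0x04, 0x05];
    let (head, rest) = take(&input, 2).expect("input has at least 2 bytes");
    println!("head = {head:?}, rest = {rest:?}");
    // `head` points into `input` itself rather than into a copy.
    assert!(std::ptr::eq(head.as_ptr(), input.as_ptr()));
}
```

A parser built this way repeatedly consumes the front of the slice and passes the remainder on to the next step.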
Fully parsing a protobuf message requires knowing the types of the fields,
indexed by their field numbers. That is typically provided in a `proto` file. In
this exercise, we'll encode that information into `match` statements in
functions that get called for each field.
We'll use the following proto:
```proto
message PhoneNumber {
optional string number = 1;
optional string type = 2;
}
message Person {
optional string name = 1;
optional int32 id = 2;
repeated PhoneNumber phones = 3;
}
```
A proto message is encoded as a series of fields, one after the next. Each is
implemented as a "tag" followed by the value. The tag contains a field number
(e.g., `2` for the `id` field of a `Person` message) and a wire type defining
how the payload should be determined from the byte stream.
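
To make the tag layout concrete, here is a minimal sketch that splits a tag into its two parts. It assumes the same layout the provided `unpack_tag` uses (wire type in the low three bits). The name `split_tag` is made up for this illustration.

```rust
/// Split a tag into `(field number, wire type number)`.
/// The low three bits hold the wire type; the remaining bits hold the
/// field number. (`split_tag` is a hypothetical name for illustration.)
fn split_tag(tag: u64) -> (u64, u64) {
    (tag >> 3, tag & 0x7)
}

fn main() {
    // 0x10 is the tag of the `id` field of `Person`: field 2, wire type 0 (VARINT).
    println!("{:?}", split_tag(0x10));
    // 0x0a is the tag of the `name` field: field 1, wire type 2 (LEN).
    println!("{:?}", split_tag(0x0a));
}
```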
Integers, including the tag, are represented with a variable-length encoding
called VARINT. Luckily, `parse_varint` is defined for you below. The given code
also defines callbacks to handle `Person` and `PhoneNumber` fields, and to parse
a message into a series of calls to those callbacks.
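
For intuition about VARINT, the following sketch decodes one value from the front of a slice. It is an independent illustration, not the provided `parse_varint`. The name `decode_varint` and its return type are assumptions made for this example.

```rust
/// Decode a VARINT from the front of `data`, returning the value and the
/// number of bytes consumed, or `None` if no terminating byte is found.
/// Each byte carries 7 payload bits, least-significant group first; a set
/// high bit means "more bytes follow".
/// (`decode_varint` is a hypothetical helper. It stops after 7 bytes to
/// match the limit used by the exercise's `parse_varint`.)
fn decode_varint(data: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in data.iter().enumerate().take(7) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    // 0x96 0x01: payload groups 0x16 and 0x01, so 0x16 + (0x01 << 7) = 150.
    println!("{:?}", decode_varint(&[0x96, 0x01]));
    // A single byte with the high bit clear is its own value.
    println!("{:?}", decode_varint(&[0x2a]));
}
```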
What remains for you is to implement the `parse_field` function and the
`ProtoMessage` trait for `Person` and `PhoneNumber`.
<!-- compile_fail because `mdbook test` does not allow use of `thiserror` -->
```rust,editable,compile_fail
{{#include exercise.rs:preliminaries }}
{{#include exercise.rs:parse_field }}
_ => todo!("Based on the wire type, build a Field, consuming as many bytes as necessary.")
};
todo!("Return the field, and any un-consumed bytes.")
}
{{#include exercise.rs:parse_message }}
{{#include exercise.rs:message_types}}
// TODO: Implement ProtoMessage for Person and PhoneNumber.
{{#include exercise.rs:main }}
```

263
src/lifetimes/exercise.rs Normal file

@ -0,0 +1,263 @@
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ANCHOR: solution
// ANCHOR: preliminaries
use std::convert::TryFrom;
use thiserror::Error;
#[derive(Debug, Error)]
enum Error {
#[error("Invalid varint")]
InvalidVarint,
#[error("Invalid wire-type")]
InvalidWireType,
#[error("Unexpected EOF")]
UnexpectedEOF,
#[error("Invalid length")]
InvalidSize(#[from] std::num::TryFromIntError),
#[error("Unexpected wire-type")]
UnexpectedWireType,
#[error("Invalid string (not UTF-8)")]
InvalidString,
}
/// A wire type as seen on the wire.
enum WireType {
/// The Varint WireType indicates the value is a single VARINT.
Varint,
//I64, -- not needed for this exercise
/// The Len WireType indicates that the value is a length represented as a
/// VARINT followed by exactly that number of bytes.
Len,
/// The I32 WireType indicates that the value is precisely 4 bytes in
/// little-endian order containing a 32-bit signed integer.
I32,
}
#[derive(Debug)]
/// A field's value, typed based on the wire type.
enum FieldValue<'a> {
Varint(u64),
//I64(i64), -- not needed for this exercise
Len(&'a [u8]),
I32(i32),
}
#[derive(Debug)]
/// A field, containing the field number and its value.
struct Field<'a> {
field_num: u64,
value: FieldValue<'a>,
}
trait ProtoMessage<'a>: Default + 'a {
fn add_field(&mut self, field: Field<'a>) -> Result<(), Error>;
}
impl TryFrom<u64> for WireType {
type Error = Error;
fn try_from(value: u64) -> Result<WireType, Error> {
Ok(match value {
0 => WireType::Varint,
//1 => WireType::I64, -- not needed for this exercise
2 => WireType::Len,
5 => WireType::I32,
_ => return Err(Error::InvalidWireType),
})
}
}
impl<'a> FieldValue<'a> {
fn as_string(&self) -> Result<&'a str, Error> {
let FieldValue::Len(data) = self else {
return Err(Error::UnexpectedWireType);
};
std::str::from_utf8(data).map_err(|_| Error::InvalidString)
}
fn as_bytes(&self) -> Result<&'a [u8], Error> {
let FieldValue::Len(data) = self else {
return Err(Error::UnexpectedWireType);
};
Ok(data)
}
fn as_u64(&self) -> Result<u64, Error> {
let FieldValue::Varint(value) = self else {
return Err(Error::UnexpectedWireType);
};
Ok(*value)
}
}
/// Parse a VARINT, returning the parsed value and the remaining bytes.
fn parse_varint(data: &[u8]) -> Result<(u64, &[u8]), Error> {
for i in 0..7 {
let Some(b) = data.get(i) else {
return Err(Error::InvalidVarint);
};
if b & 0x80 == 0 {
// This is the last byte of the VARINT, so convert it to
// a u64 and return it.
let mut value = 0u64;
for b in data[..=i].iter().rev() {
value = (value << 7) | (b & 0x7f) as u64;
}
return Ok((value, &data[i + 1..]));
}
}
// More than 7 bytes is invalid.
Err(Error::InvalidVarint)
}
/// Convert a tag into a field number and a WireType.
fn unpack_tag(tag: u64) -> Result<(u64, WireType), Error> {
let field_num = tag >> 3;
let wire_type = WireType::try_from(tag & 0x7)?;
Ok((field_num, wire_type))
}
// ANCHOR_END: preliminaries
// ANCHOR: parse_field
/// Parse a field, returning the field and the remaining bytes.
fn parse_field(data: &[u8]) -> Result<(Field, &[u8]), Error> {
let (tag, remainder) = parse_varint(data)?;
let (field_num, wire_type) = unpack_tag(tag)?;
let (fieldvalue, remainder) = match wire_type {
// ANCHOR_END: parse_field
WireType::Varint => {
let (value, remainder) = parse_varint(remainder)?;
(FieldValue::Varint(value), remainder)
}
WireType::Len => {
let (len, remainder) = parse_varint(remainder)?;
let len: usize = len.try_into()?;
if remainder.len() < len {
return Err(Error::UnexpectedEOF);
}
let (value, remainder) = remainder.split_at(len);
(FieldValue::Len(value), remainder)
}
WireType::I32 => {
if remainder.len() < 4 {
return Err(Error::UnexpectedEOF);
}
let (value, remainder) = remainder.split_at(4);
// Unwrapping cannot fail because `value` is exactly 4 bytes long.
let value = i32::from_le_bytes(value.try_into().unwrap());
(FieldValue::I32(value), remainder)
}
};
Ok((Field { field_num, value: fieldvalue }, remainder))
}
// ANCHOR: parse_message
/// Parse a message in the given data, calling `T::add_field` for each field in
/// the message.
///
/// The entire input is consumed.
fn parse_message<'a, T: ProtoMessage<'a>>(mut data: &'a [u8]) -> Result<T, Error> {
let mut result = T::default();
while !data.is_empty() {
let parsed = parse_field(data)?;
result.add_field(parsed.0)?;
data = parsed.1;
}
Ok(result)
}
// ANCHOR_END: parse_message
// ANCHOR: message_types
#[derive(Debug, Default)]
struct PhoneNumber<'a> {
number: &'a str,
type_: &'a str,
}
#[derive(Debug, Default)]
struct Person<'a> {
name: &'a str,
id: u64,
phone: Vec<PhoneNumber<'a>>,
}
// ANCHOR_END: message_types
impl<'a> ProtoMessage<'a> for Person<'a> {
fn add_field(&mut self, field: Field<'a>) -> Result<(), Error> {
match field.field_num {
1 => self.name = field.value.as_string()?,
2 => self.id = field.value.as_u64()?,
3 => self.phone.push(parse_message(field.value.as_bytes()?)?),
_ => {} // skip everything else
}
Ok(())
}
}
impl<'a> ProtoMessage<'a> for PhoneNumber<'a> {
fn add_field(&mut self, field: Field<'a>) -> Result<(), Error> {
match field.field_num {
1 => self.number = field.value.as_string()?,
2 => self.type_ = field.value.as_string()?,
_ => {} // skip everything else
}
Ok(())
}
}
// ANCHOR: main
fn main() {
let person: Person = parse_message(&[
0x0a, 0x07, 0x6d, 0x61, 0x78, 0x77, 0x65, 0x6c, 0x6c, 0x10, 0x2a, 0x1a,
0x16, 0x0a, 0x0e, 0x2b, 0x31, 0x32, 0x30, 0x32, 0x2d, 0x35, 0x35, 0x35,
0x2d, 0x31, 0x32, 0x31, 0x32, 0x12, 0x04, 0x68, 0x6f, 0x6d, 0x65, 0x1a,
0x18, 0x0a, 0x0e, 0x2b, 0x31, 0x38, 0x30, 0x30, 0x2d, 0x38, 0x36, 0x37,
0x2d, 0x35, 0x33, 0x30, 0x38, 0x12, 0x06, 0x6d, 0x6f, 0x62, 0x69, 0x6c,
0x65,
])
.unwrap();
println!("{:#?}", person);
}
// ANCHOR_END: main
// ANCHOR: tests
#[cfg(test)]
mod test {
use super::*;
#[test]
fn as_string() {
assert!(FieldValue::Varint(10).as_string().is_err());
assert!(FieldValue::I32(10).as_string().is_err());
assert_eq!(FieldValue::Len(b"hello").as_string().unwrap(), "hello");
}
#[test]
fn as_bytes() {
assert!(FieldValue::Varint(10).as_bytes().is_err());
assert!(FieldValue::I32(10).as_bytes().is_err());
assert_eq!(FieldValue::Len(b"hello").as_bytes().unwrap(), b"hello");
}
#[test]
fn as_u64() {
assert_eq!(FieldValue::Varint(10).as_u64().unwrap(), 10u64);
assert!(FieldValue::I32(10).as_u64().is_err());
assert!(FieldValue::Len(b"hello").as_u64().is_err());
}
}
// ANCHOR_END: tests


@ -0,0 +1,64 @@
---
minutes: 10
---
# Lifetime Annotations
A reference has a _lifetime_, which must not "outlive" the value it refers to.
This is verified by the borrow checker.
The lifetime can be implicit, which is what we have seen so far. Lifetimes can
also be explicit: `&'a Point`, `&'document str`. Lifetimes start with `'` and
`'a` is a typical default name. Read `&'a Point` as "a borrowed `Point` which is
valid for at least the lifetime `a`".
Lifetimes are always inferred by the compiler: you cannot assign a lifetime
yourself. Explicit lifetime annotations create constraints where there is
ambiguity; the compiler verifies that there is a valid solution.
Lifetimes become more complicated when considering passing values to and
returning values from functions.
<!-- The multi-line formatting by rustfmt in left_most is apparently
intentional: https://github.com/rust-lang/rustfmt/issues/1908 -->
```rust,editable,compile_fail
#[derive(Debug)]
struct Point(i32, i32);
fn left_most(p1: &Point, p2: &Point) -> &Point {
if p1.0 < p2.0 {
p1
} else {
p2
}
}
fn main() {
let p1: Point = Point(10, 10);
let p2: Point = Point(20, 20);
let p3 = left_most(&p1, &p2); // What is the lifetime of p3?
println!("p3: {p3:?}");
}
```
<details>
In this example, the compiler does not know what lifetime to infer for `p3`.
Looking inside the function body shows that it can only safely assume that
`p3`'s lifetime is the shorter of `p1` and `p2`. But just like types, Rust
requires explicit annotations of lifetimes on function arguments and return
values.
Add `'a` appropriately to `left_most`:
```rust,ignore
fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
```
This says, "given p1 and p2 which both outlive `'a`, the return value lives for
at least `'a`."
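
A minimal sketch of the annotated version in use, showing how the shared `'a` limits where the result may be used. The scoping in `main` is an illustrative assumption and not part of the slide.

```rust
#[derive(Debug)]
struct Point(i32, i32);

fn left_most<'a>(p1: &'a Point, p2: &'a Point) -> &'a Point {
    if p1.0 < p2.0 { p1 } else { p2 }
}

fn main() {
    let p1 = Point(10, 10);
    let p3;
    {
        let p2 = Point(20, 20);
        // `'a` can be no longer than the shorter of the two borrows,
        // which here is the borrow of `p2`.
        p3 = left_most(&p1, &p2);
        println!("p3: {p3:?}");
    }
    // Using `p3` after this point would be rejected by the borrow checker,
    // because as far as the signature says, `p3` may borrow from `p2`:
    // println!("p3: {p3:?}");
}
```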
In common cases, lifetimes can be elided, as described on the next slide.
</details>


@ -0,0 +1,77 @@
---
minutes: 5
---
# Lifetimes in Function Calls
Lifetimes for function arguments and return values must be fully specified, but
Rust allows lifetimes to be elided in most cases with
[a few simple rules](https://doc.rust-lang.org/nomicon/lifetime-elision.html).
This is not inference -- it is just a syntactic shorthand.
- Each argument which does not have a lifetime annotation is given one.
- If there is only one argument lifetime, it is given to all un-annotated return
values.
- If there are multiple argument lifetimes, but the first one is for `self`,
that lifetime is given to all un-annotated return values.
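
The rules above can be sketched by writing each elided signature next to its explicit expansion. The `Document` type and the function names below are hypothetical and exist only for this illustration.

```rust
/// Hypothetical type used only to illustrate the `self` rule.
struct Document {
    text: String,
}

impl Document {
    // Elided: there are two argument lifetimes, but one belongs to `&self`,
    // so the return value gets the lifetime of `&self`.
    fn after(&self, prefix: &str) -> &str {
        self.text.strip_prefix(prefix).unwrap_or(self.text.as_str())
    }

    // The same signature with the elided lifetimes written out.
    fn after_explicit<'a, 'b>(&'a self, prefix: &'b str) -> &'a str {
        self.text.strip_prefix(prefix).unwrap_or(self.text.as_str())
    }
}

// Elided: a single argument lifetime is given to the return value.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the elided lifetime written out.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let doc = Document { text: String::from("Dr. Ferris") };
    println!("{}", doc.after("Dr. "));
    println!("{}", doc.after_explicit("Dr. "));
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));
}
```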
<!-- mdbook-xgettext: skip -->
```rust,editable
#[derive(Debug)]
struct Point(i32, i32);
fn cab_distance(p1: &Point, p2: &Point) -> i32 {
(p1.0 - p2.0).abs() + (p1.1 - p2.1).abs()
}
fn nearest<'a>(points: &'a [Point], query: &Point) -> Option<&'a Point> {
let mut nearest = None;
for p in points {
if let Some((_, nearest_dist)) = nearest {
let dist = cab_distance(p, query);
if dist < nearest_dist {
nearest = Some((p, dist));
}
} else {
nearest = Some((p, cab_distance(p, query)));
};
}
nearest.map(|(p, _)| p)
}
fn main() {
println!(
"{:?}",
nearest(
&[Point(1, 0), Point(1, 0), Point(-1, 0), Point(0, -1),],
&Point(0, 2)
)
);
}
```
<details>
In this example, the lifetimes in `cab_distance` are trivially elided.
The `nearest` function provides another example of a function with multiple
references in its arguments that requires explicit annotation.
Try adjusting the signature to "lie" about the lifetimes returned:
```rust,ignore
fn nearest<'a, 'q>(points: &'a [Point], query: &'q Point) -> Option<&'q Point> {
```
This won't compile, demonstrating that the annotations are checked for validity
by the compiler. Note that this is not the case for raw pointers (unsafe), and
this is a common source of errors with unsafe Rust.
Students may ask when to use lifetimes. Rust borrows _always_ have lifetimes.
Most of the time, elision and type inference mean these don't need to be written
out. In more complicated cases, lifetime annotations can help resolve ambiguity.
Often, especially when prototyping, it's easier to just work with owned data by
cloning values where necessary.
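
As a sketch of the owned-data alternative, the variant below returns a clone of the nearest point, so its signature needs no lifetime annotation. The name `nearest_owned` and the `Clone` and `PartialEq` derives are assumptions added for this illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Point(i32, i32);

fn cab_distance(p1: &Point, p2: &Point) -> i32 {
    (p1.0 - p2.0).abs() + (p1.1 - p2.1).abs()
}

/// Hypothetical owned variant of `nearest`: the result is a clone, so it
/// does not borrow from `points` and needs no lifetime annotation.
fn nearest_owned(points: &[Point], query: &Point) -> Option<Point> {
    points.iter().min_by_key(|p| cab_distance(p, query)).cloned()
}

fn main() {
    let nearest;
    {
        let points = vec![Point(5, 5), Point(0, 1), Point(-4, 0)];
        nearest = nearest_owned(&points, &Point(0, 2));
    } // `points` is dropped here; the cloned result is still usable.
    println!("{nearest:?}");
}
```

The trade-off is an extra copy of each returned value, which is usually acceptable for small types or while prototyping.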
</details>


@ -0,0 +1,7 @@
# Solution
<!-- compile_fail because `mdbook test` does not allow use of `thiserror` -->
```rust,editable,compile_fail
{{#include exercise.rs:solution}}
```


@ -0,0 +1,43 @@
---
minutes: 5
---
# Lifetimes in Data Structures
If a data type stores borrowed data, it must be annotated with a lifetime:
```rust,editable
#[derive(Debug)]
struct Highlight<'doc>(&'doc str);
fn erase(text: String) {
println!("Bye {text}!");
}
fn main() {
let text = String::from("The quick brown fox jumps over the lazy dog.");
let fox = Highlight(&text[4..19]);
let dog = Highlight(&text[35..43]);
// erase(text);
println!("{fox:?}");
println!("{dog:?}");
}
```
<details>
- In the above example, the annotation on `Highlight` enforces that the data
underlying the contained `&str` lives at least as long as any instance of
`Highlight` that uses that data.
- If `text` is consumed before the end of the lifetime of `fox` (or `dog`), the
borrow checker reports an error.
- Types with borrowed data force users to hold on to the original data. This can
be useful for creating lightweight views, but it generally makes them somewhat
harder to use.
- When possible, make data structures own their data directly.
- Some structs with multiple references inside can have more than one lifetime
annotation. This can be necessary if there is a need to describe lifetime
relationships between the references themselves, in addition to the lifetime
of the struct itself. Those are very advanced use cases.
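
A small sketch of a struct with two lifetime annotations, in case students ask what that looks like. The `Match` type and the `find_match` function are hypothetical and only illustrate the idea.

```rust
/// Hypothetical search hit: `text` borrows from the document,
/// `query` borrows from the search string.
struct Match<'doc, 'q> {
    text: &'doc str,
    query: &'q str,
}

impl<'doc, 'q> Match<'doc, 'q> {
    /// The returned slice is tied to the document only, not to the query.
    fn text(&self) -> &'doc str {
        self.text
    }
}

/// Find the first occurrence of `query` in `doc`.
fn find_match<'doc, 'q>(doc: &'doc str, query: &'q str) -> Option<Match<'doc, 'q>> {
    let start = doc.find(query)?;
    Some(Match { text: &doc[start..start + query.len()], query })
}

fn main() {
    let document = String::from("The quick brown fox jumps over the lazy dog.");
    let found;
    {
        let query = String::from("fox");
        let hit = find_match(&document, &query).expect("query occurs in document");
        println!("matched {:?} for query {:?}", hit.text, hit.query);
        found = hit.text();
    } // `query` is dropped here, but `found` only borrows from `document`.
    println!("found: {found}");
}
```

With a single lifetime shared by both fields, the compiler would have to pick the shorter borrow for both, and `found` could not be used after `query` is dropped.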
</details>