Commit a450ede

blog post: recent & future pattern matching improvements (#529)
1 parent b48c507 commit a450ede

1 file changed: +236 -0 lines changed

@@ -0,0 +1,236 @@
---
layout: post
title: "Recent and future pattern matching improvements"
author: Mazdak "Centril" Farrokhzad
description: "Reviewing recent pattern matching improvements"
team: the language <https://www.rust-lang.org/governance/teams/lang> and compiler <https://www.rust-lang.org/governance/teams/compiler> teams
---

[ch_6]: https://doc.rust-lang.org/book/ch06-00-enums.html
[ch_18]: https://doc.rust-lang.org/book/ch18-00-patterns.html
[ref_match]: https://doc.rust-lang.org/reference/expressions/match-expr.html
[ref_pat]: https://doc.rust-lang.org/reference/patterns.html
[ref_place]: https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions

Much of writing software revolves around checking if some data has some shape ("pattern"), extracting information from it, and then reacting if there was a match. To facilitate this, many modern languages, Rust included, support what is known as "pattern matching".

> If you are new to Rust or want to refresh your knowledge, you may first want to read chapters [6, Enums and Pattern Matching][ch_6] and [18, Patterns and Matching][ch_18] in the book, or read more about [`match` expressions][ref_match] and [patterns][ref_pat] in the reference.

Pattern matching in Rust works by checking if a [*place*][ref_place] in memory (the "data") matches a certain *pattern*. In this post, we will look at some recent improvements to patterns that will soon be available in stable Rust, as well as some more that are already available in nightly.

If you are familiar with the nightly features discussed and would like to help out with the efforts to drive them to stable, jump ahead to [*How can I help?*](#how-can-i-help).

## Subslice patterns, `[head, tail @ ..]`

[fixed_slice]: https://blog.rust-lang.org/2018/05/10/Rust-1.26.html#basic-slice-patterns
[recover_attrs_no_item]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.recover_attrs_no_item
[pr_subslice]: https://github.com/rust-lang/rust/pull/67712

Lists are one of the most basic and common data structures found in software. In Rust, lists are usually a contiguous sequence of elements in memory, or a *slice*.

Since slices are so commonplace, it is important that working with them is easy. To that end, we stabilized [*fixed-length slice patterns* in Rust 1.26.0][fixed_slice]. It is now possible to write, e.g., `let [a, b, c] = my_array;` to destructure an array of 3 elements. Oftentimes, however, we're working with a slice of unknown length, so given only fixed-length slice patterns, we have to provide a fallback `match` arm with, e.g., `_` as the pattern.

In Rust 1.42.0, [we are stabilizing *subslice patterns*][pr_subslice]. To introduce a subslice pattern, we use `..`, which denotes a variable-length gap, matching as many elements as possible not matched by the patterns before and after the `..`. For example, in a parser, we would like to error when a list of attributes, `attrs`, is not followed by an item, [so we write][recover_attrs_no_item]:

```rust
/// Recover if we parsed attributes and expected an item but there was none.
fn recover_attrs_no_item(&mut self, attrs: &[Attribute]) -> PResult<'a, ()> {
    let (start, end) = match attrs {
        [] => return Ok(()),
        [x0] => (x0, x0),
        [x0, .., xn] => (x0, xn),
    };
    let msg = if end.is_doc_comment() {
        "expected item after doc comment"
    } else {
        "expected item after attributes"
    };
    let mut err = self.struct_span_err(end.span, msg);
    if end.is_doc_comment() {
        err.span_label(end.span, "this doc comment doesn't document anything");
    }
    if let [.., penultimate, _] = attrs {
        err.span_label(start.span.to(penultimate.span), "other attributes here");
    }
    Err(err)
}
```

Here we have two subslice patterns. The first one is `[x0, .., xn]`, which binds `x0`, the first element, and `xn`, the last element, and ignores everything in the middle, matching a slice with at least two elements in total. Meanwhile, `[]` and `[x0]` match cases with fewer than two elements, so the compiler knows that we have covered all possibilities. The second subslice pattern, `[.., penultimate, _]`, extracts the `penultimate` element of the slice, which, as the name suggests, also requires that the slice has at least two elements.

We can also bind a subslice to a variable. For example, suppose we want to disallow `...` in all but the last parameter of a function. If so, we can write:

```rust
match &*fn_decl.inputs {
    // ... other arms
    [ps @ .., _] => {
        for Param { ty, span, .. } in ps {
            if let TyKind::CVarArgs = ty.kind {
                self.err_handler().span_err(
                    *span,
                    "`...` must be the last argument of a C-variadic function",
                );
            }
        }
    }
}
```

Here, `ps @ ..` will bind the initial elements of the slice to `ps` and ignore the last element.

After more than 7 years of baking in nightly, with many twists and turns, subslice patterns will finally be stable. To get here, we've had to redesign the feature, plug soundness holes in the borrow checker, and substantially refactor the exhaustiveness checker. For more on how we got here, [read the stabilization report][pr_subslice], [Thomas Hartmann's blog post][thomas_subslice], and stay tuned for the 1.42.0 release announcement on the 12th of March.

[thomas_subslice]: https://thomashartmann.dev/blog/feature(slice_patterns)/
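
The subslice patterns from this section can also be tried outside of the compiler's own code base. The following is a minimal sketch, assuming Rust 1.42.0 or later; the helper names `first_and_last`, `sum`, and `all_but_last` are made up for illustration and do not come from rustc.

```rust
/// Returns the first and last elements of a slice, if it has any elements.
fn first_and_last(xs: &[i32]) -> Option<(i32, i32)> {
    match xs {
        [] => None,
        [x] => Some((*x, *x)),
        // `..` skips however many elements sit between the first and the last.
        [first, .., last] => Some((*first, *last)),
    }
}

/// Sums a slice by repeatedly splitting off its head.
fn sum(xs: &[i32]) -> i32 {
    match xs {
        [] => 0,
        // `tail @ ..` binds the remaining elements as a subslice.
        [head, tail @ ..] => *head + sum(tail),
    }
}

/// Returns every element except the last one.
fn all_but_last(xs: &[i32]) -> &[i32] {
    match xs {
        [] => xs,
        [front @ .., _] => front,
    }
}
```

Note that none of the `match` expressions needs a `_` fallback arm: the empty case plus a subslice pattern already cover every possible length.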

## Nested OR-patterns

[tracking_or_pats]: https://github.com/rust-lang/rust/issues/54883

When pattern matching on an `enum`, the logic for some of the variants may be exactly the same. To avoid repeating ourselves, the `|` separator in `match`, `if let`, or `while let` expressions can be used to say that the branch should be taken if any of the `|`-separated patterns match. For example, we may write:

```rust
// Any local node that may call something in its body block should be explored.
fn should_explore(tcx: TyCtxt<'_>, hir_id: hir::HirId) -> bool {
    match tcx.hir().find(hir_id) {
        Some(Node::Item(..))
        | Some(Node::ImplItem(..))
        | Some(Node::ForeignItem(..))
        | Some(Node::TraitItem(..))
        | Some(Node::Variant(..))
        | Some(Node::AnonConst(..))
        | Some(Node::Pat(..)) => true,
        _ => false,
    }
}
```

This is serviceable, but `Some(_)` is still repeated several times. With [`#![feature(or_patterns)]`][tracking_or_pats], which recently became usable on nightly, this repetition can be avoided:

```rust
// Any local node that may call something in its body block should be explored.
fn should_explore(tcx: TyCtxt<'_>, hir_id: hir::HirId) -> bool {
    match tcx.hir().find(hir_id) {
        Some(
            Node::Item(..)
            | Node::ImplItem(..)
            | Node::ForeignItem(..)
            | Node::TraitItem(..)
            | Node::Variant(..)
            | Node::AnonConst(..)
            | Node::Pat(..),
        ) => true,
        _ => false,
    }
}
```

Previously, when using `|` in a `match` expression, the `|` syntax was part of `match` itself. With `or_patterns`, this is now part of patterns themselves, so you can nest OR-patterns arbitrarily, and use them in `let` statements too:

```rust
let Ok(x) | Err(x) = foo();
```

An OR-pattern covers the *union* of all the `|`-ed ("or-ed") patterns. To ensure that all bindings are consistent and initialized no matter which alternative matched, each or-ed pattern must include the exact same set of bindings, with the same types, and the same binding modes.
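
To make the binding rule concrete, here is a small sketch. It assumes a toolchain recent enough to accept or-patterns without the feature gate (on an older nightly, add `#![feature(or_patterns)]`), and it uses the parenthesized form of the `let` pattern, which is what current stable compilers accept for a top-level or-pattern in `let`. The `Node` enum is a hypothetical stand-in for the compiler's real `Node` type.

```rust
// Hypothetical, simplified stand-in for the compiler's `Node` type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node {
    Item,
    ImplItem,
    Expr,
}

/// Nested or-pattern: `Some(..)` is written once instead of once per variant.
fn should_explore(node: Option<Node>) -> bool {
    match node {
        Some(Node::Item | Node::ImplItem) => true,
        _ => false,
    }
}

/// Both alternatives bind `x` with the same type and binding mode,
/// so together they cover every `Result<i32, i32>` and the `let` is irrefutable.
fn unwrap_either(res: Result<i32, i32>) -> i32 {
    let (Ok(x) | Err(x)) = res;
    x
}
```

If one alternative bound `x` and the other did not, or if the two bound `x` at different types, the compiler would reject the pattern.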

## Bindings after `@`

[#16053]: https://github.com/rust-lang/rust/pull/16053
[MIR]: https://rust-lang.github.io/rustc-guide/mir/index.html
[rip_ast_borrowck]: https://github.com/rust-lang/rust/pull/64790
[tracking_at]: https://github.com/rust-lang/rust/issues/65490

When matching on a certain substructure, you sometimes want to hold on to the whole. For example, given `Some(Expr { .. })`, you would like to bind the outer `Some(_)` layer. In Rust, this can be done using, e.g., `expr @ Some(Expr { .. })`, which binds the matched place to `expr` while also ensuring that it matches `Some(Expr { .. })`.

Suppose also that `Expr` has a field `span` that you would like to use as well. In ancient times, that is, before Rust 1.0, this was possible, but today it results in an error:

```text
error[E0303]: pattern bindings are not allowed after an `@`
 --> src/lib.rs:L:C
  |
L |         bar @ Some(Expr { span }) => {}
  |                           ^^^^ not allowed after `@`
```

This was turned into an error in [#16053], mainly due to the difficulties of encoding borrow checking rules in a sound way in the old AST-based borrow checker.

Since then, we have [removed the old borrow checker][rip_ast_borrowck] in favor of one based on [MIR], which is a simpler and more appropriate data structure for borrow checking. Specifically, in the case of a statement like `let ref x @ ref y = a;`, we would get roughly the same MIR as if we had used `let x = &a; let y = &a;`.

So now that having bindings to the right of `@` is handled uniformly and correctly by the borrow checker (e.g., the compiler won't allow `ref x @ ref mut y`), we have decided to allow them under [`#![feature(bindings_after_at)]`][tracking_at], now available on nightly. With the feature gate enabled, you may for example write:

```rust
#![feature(bindings_after_at)]

fn main() {
    if let x @ Some(y) = Some(0) {
        dbg!(x, y);
    }
}
```

Our hope is that by providing this feature, we remove one surprising corner of the language.
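
The `Expr`/`span` scenario from the start of this section can be sketched as follows. This assumes a toolchain that accepts bindings after `@` without the feature gate (on an older nightly, add `#![feature(bindings_after_at)]`). The `Span` and `Expr` types here are hypothetical, simplified stand-ins, and they derive `Copy` so that both bindings can copy out of the matched value.

```rust
// Hypothetical, simplified stand-ins for the types mentioned in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: u32,
    hi: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Expr {
    span: Span,
}

/// Binds the whole `Option<Expr>` and its inner `span` in a single pattern.
fn whole_and_span(opt: Option<Expr>) -> Option<(Option<Expr>, Span)> {
    match opt {
        // Both bindings copy out of `opt`, since `Expr` and `Span` are `Copy`.
        whole @ Some(Expr { span }) => Some((whole, span)),
        None => None,
    }
}

/// `ref x @ ref y` yields two shared references to the same place,
/// roughly like `let x = &*a; let y = &*a;`.
fn same_place(a: &Expr) -> bool {
    let ref x @ ref y = *a;
    std::ptr::eq(x, y)
}
```

With a non-`Copy` type, the by-value form would try to move the value into `whole` while also binding part of it, which the borrow checker rejects; using `ref` on both sides, as in `same_place`, avoids the move.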

## Combining by-move and by-`ref` bindings

[tracking_move_ref]: https://github.com/rust-lang/rust/pull/68376

For reasons similar to those noted in the case of bindings after `@`, Rust does not currently allow you to combine normal by-move bindings with those that are by-`ref`. For example, should you write...:

```rust
fn main() {
    let tup = ("foo".to_string(), 0);
    let (x, ref y) = tup;
}
```

... you would get an error:

```text
error[E0009]: cannot bind by-move and by-ref in the same pattern
 --> src/main.rs:3:10
  |
3 |     let (x, ref y) = tup;
  |          ^  ----- by-ref pattern here
  |          |
  |          by-move pattern here
```

At the same time, however, the compiler is perfectly happy to allow...:

```rust
fn main() {
    let tup = ("foo".to_string(), 0);
    let x = tup.0;
    let ref y = tup.1;
}
```

... even though there is no semantic difference between these programs.

Now that we have moved to the new borrow checker, as outlined in the previous section, we have relaxed this restriction on nightly as well, so under [`#![feature(move_ref_pattern)]`][tracking_move_ref], you may write:

```rust
#![feature(move_ref_pattern)]

fn main() {
    let tup = ("foo".to_string(), 0);
    let (x, ref y) = tup;
}
```
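
To see what each binding ends up holding, the pattern can be wrapped in a small function. This is a sketch under the assumption that the toolchain accepts the pattern without the feature gate (on an older nightly, add `#![feature(move_ref_pattern)]`); the function name `split` is made up for illustration.

```rust
/// Moves the `String` out of the tuple while only borrowing the integer.
fn split(tup: (String, i32)) -> (String, i32) {
    // `x` takes ownership of `tup.0`; `y` is a `&i32` pointing into `tup.1`.
    let (x, ref y) = tup;
    // `tup.1` was only borrowed, not moved, so it can still be read here.
    let doubled = *y + tup.1;
    (x, doubled)
}
```

This mirrors the two-statement version above: the first field is moved, and the second field stays in place behind a shared reference.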

## How can I help?

[F-or_patterns]: https://github.com/rust-lang/rust/labels/F-or_patterns

To recap, we have three unstable features, all improving pattern matching in different ways:

- `#![feature(or_patterns)]`, which allows you to arbitrarily nest or-patterns, e.g., `Some(Foo | Bar)`
- `#![feature(bindings_after_at)]`, which allows, e.g., `ref x @ Some(ref y)`
- `#![feature(move_ref_pattern)]`, which allows, e.g., `(x, ref y)` where `x` is by-move and `y` is by-reference

To transition these features over to stable Rust, we need your help to ensure that they meet the expected quality standards. To help out, consider:

- Using the features in your code where applicable, if a nightly compiler is something you are OK with, and reporting any bugs, problems, deficiencies in diagnostics, etc. as issues.
- Looking through the reported issues under the feature gate labels (e.g., [`F-or_patterns`][F-or_patterns]) and seeing if you can help out with any of them.
  - In particular, if you can help out with writing tests, that is appreciated.

Thanks for reading, and happy pattern matching!
