Commit 01ce3e9

Overhaul comments in collect_tokens_trailing_token.
Adding details, clarifying lots of little things, etc. In particular, the commit adds details of an example. I find this very helpful, because it's taken me a long time to understand how this code works.
1 parent d50da1e commit 01ce3e9

3 files changed: +129 -75 lines changed
compiler/rustc_parse/src/parser/attr.rs

+5-8
Original file line number | Diff line number | Diff line change
@@ -303,13 +303,9 @@ impl<'a> Parser<'a> {
303303
None
304304
};
305305
if let Some(attr) = attr {
306-
// If we are currently capturing tokens, mark the location of this inner attribute.
307-
// If capturing ends up creating a `LazyAttrTokenStream`, we will include
308-
// this replace range with it, removing the inner attribute from the final
309-
// `AttrTokenStream`. Inner attributes are stored in the parsed AST note.
310-
// During macro expansion, they are selectively inserted back into the
311-
// token stream (the first inner attribute is removed each time we invoke the
312-
// corresponding macro).
306+
// If we are currently capturing tokens (i.e. we are within a call to
307+
// `Parser::collect_tokens_trailing_token`) record the token positions of this
308+
// inner attribute, for possible later processing in a `LazyAttrTokenStream`.
313309
if let Capturing::Yes = self.capture_state.capturing {
314310
let end_pos = self.num_bump_calls;
315311
let range = start_pos..end_pos;
@@ -463,7 +459,8 @@ impl<'a> Parser<'a> {
463459
}
464460
}
465461

466-
/// The attributes are complete if all attributes are either a doc comment or a builtin attribute other than `cfg_attr`
462+
/// The attributes are complete if all attributes are either a doc comment or a
463+
/// builtin attribute other than `cfg_attr`.
467464
pub fn is_complete(attrs: &[ast::Attribute]) -> bool {
468465
attrs.iter().all(|attr| {
469466
attr.is_doc_comment()
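
Note on `is_complete`: the reflowed doc comment states the rule the function checks. The sketch below models that rule in isolation. The `Attr` enum and the `BUILTIN` list are hypothetical stand-ins invented for illustration; they are not rustc's `ast::Attribute` or its real builtin-attribute table.

```rust
/// Hypothetical stand-in for `ast::Attribute` (not the rustc type).
#[derive(Debug, Clone, PartialEq)]
pub enum Attr {
    /// A doc comment such as `/// text`.
    DocComment,
    /// Any other attribute, identified only by its name.
    Named(&'static str),
}

/// Illustrative subset of builtin attribute names (an assumption for this sketch).
const BUILTIN: &[&str] = &["inline", "cold", "cfg", "cfg_attr", "allow"];

/// Returns `true` when every attribute is a doc comment or a builtin
/// attribute other than `cfg_attr`, mirroring the doc comment's wording.
pub fn is_complete(attrs: &[Attr]) -> bool {
    attrs.iter().all(|attr| match attr {
        Attr::DocComment => true,
        Attr::Named(name) => *name != "cfg_attr" && BUILTIN.contains(name),
    })
}
```

Under this model an empty attribute list is complete, while a single `cfg_attr` or an unknown (possibly proc-macro) attribute makes the list incomplete, which is the case where tokens may be needed.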

compiler/rustc_parse/src/parser/attr_wrapper.rs

+123-67
Original file line number | Diff line number | Diff line change
@@ -17,12 +17,12 @@ use std::{iter, mem};
1717
///
1818
/// This wrapper prevents direct access to the underlying `ast::AttrVec`.
1919
/// Parsing code can only get access to the underlying attributes
20-
/// by passing an `AttrWrapper` to `collect_tokens_trailing_tokens`.
20+
/// by passing an `AttrWrapper` to `collect_tokens_trailing_token`.
2121
/// This makes it difficult to accidentally construct an AST node
2222
/// (which stores an `ast::AttrVec`) without first collecting tokens.
2323
///
2424
/// This struct has its own module, to ensure that the parser code
25-
/// cannot directly access the `attrs` field
25+
/// cannot directly access the `attrs` field.
2626
#[derive(Debug, Clone)]
2727
pub struct AttrWrapper {
2828
attrs: AttrVec,
@@ -76,14 +76,13 @@ fn has_cfg_or_cfg_attr(attrs: &[Attribute]) -> bool {
7676
})
7777
}
7878

79-
// Produces a `TokenStream` on-demand. Using `cursor_snapshot`
80-
// and `num_calls`, we can reconstruct the `TokenStream` seen
81-
// by the callback. This allows us to avoid producing a `TokenStream`
82-
// if it is never needed - for example, a captured `macro_rules!`
83-
// argument that is never passed to a proc macro.
84-
// In practice token stream creation happens rarely compared to
85-
// calls to `collect_tokens` (see some statistics in #78736),
86-
// so we are doing as little up-front work as possible.
79+
// From a value of this type we can reconstruct the `TokenStream` seen by the
80+
// `f` callback passed to a call to `Parser::collect_tokens_trailing_token`, by
81+
// replaying the getting of the tokens. This saves us producing a `TokenStream`
82+
// if it is never needed, e.g. a captured `macro_rules!` argument that is never
83+
// passed to a proc macro. In practice, token stream creation happens rarely
84+
// compared to calls to `collect_tokens` (see some statistics in #78736) so we
85+
// are doing as little up-front work as possible.
8786
//
8887
// This also makes `Parser` very cheap to clone, since
8988
// there is no intermediate collection buffer to clone.
@@ -163,44 +162,53 @@ impl ToAttrTokenStream for LazyAttrTokenStreamImpl {
163162
}
164163

165164
impl<'a> Parser<'a> {
166-
/// Records all tokens consumed by the provided callback,
167-
/// including the current token. These tokens are collected
168-
/// into a `LazyAttrTokenStream`, and returned along with the result
169-
/// of the callback.
165+
/// Parses code with `f`. If appropriate, it records the tokens (in
166+
/// `LazyAttrTokenStream` form) that were parsed in the result, accessible
167+
/// via the `HasTokens` trait.
170168
///
171169
/// The `attrs` passed in are in `AttrWrapper` form, which is opaque. The
172170
/// `AttrVec` within is passed to `f`. See the comment on `AttrWrapper` for
173171
/// details.
174172
///
175-
/// Note: If your callback consumes an opening delimiter
176-
/// (including the case where you call `collect_tokens`
177-
/// when the current token is an opening delimiter),
178-
/// you must also consume the corresponding closing delimiter.
173+
/// Note: If your callback consumes an opening delimiter (including the
174+
/// case where `self.token` is an opening delimiter on entry to this
175+
/// function), you must also consume the corresponding closing delimiter.
176+
/// E.g. you can consume `something ([{ }])` or `([{}])`, but not `([{}]`.
177+
/// This restriction isn't a problem in practice, because parsed AST items
178+
/// always have matching delimiters.
179179
///
180-
/// That is, you can consume
181-
/// `something ([{ }])` or `([{}])`, but not `([{}]`
182-
///
183-
/// This restriction shouldn't be an issue in practice,
184-
/// since this function is used to record the tokens for
185-
/// a parsed AST item, which always has matching delimiters.
180+
/// The following example code will be used to explain things in comments
181+
/// below. It has an outer attribute and an inner attribute. Parsing it
182+
/// involves two calls to this method, one of which is indirectly
183+
/// recursive.
184+
/// ```
185+
/// #[cfg_eval] // token pos
186+
/// mod m { // 0.. 3
187+
/// #[cfg_attr(linux, inline)] // 3..12
188+
/// fn g() { // 12..17
189+
/// #![cfg_attr(unix, cold)] // 17..27
190+
/// let _x = 3; // 27..32
191+
/// } // 32..33
192+
/// } // 33..34
193+
/// ```
186194
pub fn collect_tokens_trailing_token<R: HasAttrs + HasTokens>(
187195
&mut self,
188196
attrs: AttrWrapper,
189197
force_collect: ForceCollect,
190198
f: impl FnOnce(&mut Self, ast::AttrVec) -> PResult<'a, (R, TrailingToken)>,
191199
) -> PResult<'a, R> {
192-
// We only bail out when nothing could possibly observe the collected tokens:
193-
// 1. We cannot be force collecting tokens (since force-collecting requires tokens
194-
// by definition
200+
// Skip collection when nothing could observe the collected tokens, i.e.
201+
// all of the following conditions hold.
202+
// - We are not force collecting tokens (because force collection
203+
// requires tokens by definition).
195204
if matches!(force_collect, ForceCollect::No)
196-
// None of our outer attributes can require tokens (e.g. a proc-macro)
205+
// - None of our outer attributes require tokens.
197206
&& attrs.is_complete()
198-
// If our target supports custom inner attributes, then we cannot bail
199-
// out early, since we may need to capture tokens for a custom inner attribute
200-
// invocation.
207+
// - Our target doesn't support custom inner attributes (custom
208+
// inner attribute invocation might require token capturing).
201209
&& !R::SUPPORTS_CUSTOM_INNER_ATTRS
202-
// Never bail out early in `capture_cfg` mode, since there might be `#[cfg]`
203-
// or `#[cfg_attr]` attributes.
210+
// - We are not in `capture_cfg` mode (which requires tokens if
211+
// the parsed node has `#[cfg]` or `#[cfg_attr]` attributes).
204212
&& !self.capture_cfg
205213
{
206214
return Ok(f(self, attrs.attrs)?.0);
@@ -212,51 +220,62 @@ impl<'a> Parser<'a> {
212220
let has_outer_attrs = !attrs.attrs.is_empty();
213221
let replace_ranges_start = self.capture_state.replace_ranges.len();
214222

223+
// We set and restore `Capturing::Yes` on either side of the call to
224+
// `f`, so we can distinguish the outermost call to
225+
// `collect_tokens_trailing_token` (e.g. parsing `m` in the example
226+
// above) from any inner (indirectly recursive) calls (e.g. parsing `g`
227+
// in the example above). This distinction is used below and in
228+
// `Parser::parse_inner_attributes`.
215229
let (mut ret, trailing) = {
216230
let prev_capturing = mem::replace(&mut self.capture_state.capturing, Capturing::Yes);
217231
let ret_and_trailing = f(self, attrs.attrs);
218232
self.capture_state.capturing = prev_capturing;
219233
ret_and_trailing?
220234
};
221235

222-
// When we're not in `capture-cfg` mode, then bail out early if:
223-
// 1. Our target doesn't support tokens at all (e.g we're parsing an `NtIdent`)
224-
// so there's nothing for us to do.
225-
// 2. Our target already has tokens set (e.g. we've parsed something
226-
// like `#[my_attr] $item`). The actual parsing code takes care of
227-
// prepending any attributes to the nonterminal, so we don't need to
228-
// modify the already captured tokens.
229-
// Note that this check is independent of `force_collect`- if we already
230-
// have tokens, or can't even store them, then there's never a need to
231-
// force collection of new tokens.
236+
// When we're not in `capture_cfg` mode, then skip collecting and
237+
// return early if either of the following conditions hold.
238+
// - `None`: Our target doesn't support tokens at all (e.g. `NtIdent`).
239+
// - `Some(Some(_))`: Our target already has tokens set (e.g. we've
240+
// parsed something like `#[my_attr] $item`). The actual parsing code
241+
// takes care of prepending any attributes to the nonterminal, so we
242+
// don't need to modify the already captured tokens.
243+
//
244+
// Note that this check is independent of `force_collect`. There's no
245+
// need to collect tokens when we don't support tokens or already have
246+
// tokens.
232247
if !self.capture_cfg && matches!(ret.tokens_mut(), None | Some(Some(_))) {
233248
return Ok(ret);
234249
}
235250

236-
// This is very similar to the bail out check at the start of this function.
237-
// Now that we've parsed an AST node, we have more information available.
251+
// This is similar to the "skip collection" check at the start of this
252+
// function, but now that we've parsed an AST node we have more
253+
// information available. (If we return early here that means the
254+
// setup, such as cloning the token cursor, was unnecessary. That's
255+
// hard to avoid.)
256+
//
257+
// Skip collection when nothing could observe the collected tokens, i.e.
258+
// all of the following conditions hold.
259+
// - We are not force collecting tokens.
238260
if matches!(force_collect, ForceCollect::No)
239-
// We now have inner attributes available, so this check is more precise
240-
// than `attrs.is_complete()` at the start of the function.
241-
// As a result, we don't need to check `R::SUPPORTS_CUSTOM_INNER_ATTRS`
261+
// - None of our outer *or* inner attributes require tokens.
262+
// (`attrs` was just outer attributes, but `ret.attrs()` is outer
263+
// and inner attributes. That makes this check more precise than
264+
// `attrs.is_complete()` at the start of the function, and we can
265+
// skip the subsequent check on `R::SUPPORTS_CUSTOM_INNER_ATTRS`.
242266
&& crate::parser::attr::is_complete(ret.attrs())
243-
// Subtle: We call `has_cfg_or_cfg_attr` with the attrs from `ret`.
244-
// This ensures that we consider inner attributes (e.g. `#![cfg]`),
245-
// which require us to have tokens available
246-
// We also call `has_cfg_or_cfg_attr` at the beginning of this function,
247-
// but we only bail out if there's no possibility of inner attributes
248-
// (!R::SUPPORTS_CUSTOM_INNER_ATTRS)
249-
// We only capture about `#[cfg]` or `#[cfg_attr]` in `capture_cfg`
250-
// mode - during normal parsing, we don't need any special capturing
251-
// for those attributes, since they're builtin.
252-
&& !(self.capture_cfg && has_cfg_or_cfg_attr(ret.attrs()))
267+
// - We are not in `capture_cfg` mode, or we are but there are no
268+
// `#[cfg]` or `#[cfg_attr]` attributes. (During normal
269+
// non-`capture_cfg` parsing, we don't need any special capturing
270+
// for those attributes, because they're builtin.)
271+
&& (!self.capture_cfg || !has_cfg_or_cfg_attr(ret.attrs()))
253272
{
254273
return Ok(ret);
255274
}
256275

257276
let replace_ranges_end = self.capture_state.replace_ranges.len();
258277

259-
// Capture a trailing token if requested by the callback 'f'
278+
// Capture a trailing token if requested by `f`.
260279
let captured_trailing = match trailing {
261280
TrailingToken::None => false,
262281
TrailingToken::Gt => {
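
Note on the second early return: besides the comment changes, this hunk rewrites the last term from `!(self.capture_cfg && has_cfg_or_cfg_attr(..))` to `(!self.capture_cfg || !has_cfg_or_cfg_attr(..))`. The two forms should be equivalent by De Morgan's law. The sketch below checks that exhaustively over plain booleans; the function names are hypothetical.

```rust
/// The term as written before this commit.
pub fn old_cfg_term(capture_cfg: bool, has_cfg_or_cfg_attr: bool) -> bool {
    !(capture_cfg && has_cfg_or_cfg_attr)
}

/// The term as written after this commit.
pub fn new_cfg_term(capture_cfg: bool, has_cfg_or_cfg_attr: bool) -> bool {
    !capture_cfg || !has_cfg_or_cfg_attr
}

/// Compares both forms on all four input combinations.
pub fn terms_agree() -> bool {
    [false, true].iter().all(|&a| {
        [false, true]
            .iter()
            .all(|&b| old_cfg_term(a, b) == new_cfg_term(a, b))
    })
}
```

The only combination that blocks the early return is `capture_cfg` being set while a `#[cfg]` or `#[cfg_attr]` attribute is present, which matches the new comment's wording.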
@@ -285,7 +304,10 @@ impl<'a> Parser<'a> {
285304

286305
let num_calls = end_pos - start_pos;
287306

288-
// Take the captured ranges for any inner attributes that we parsed.
307+
// Take the captured ranges for any inner attributes that we parsed in
308+
// `Parser::parse_inner_attributes`, and pair them in a `ReplaceRange`
309+
// with `None`, which means the relevant tokens will be removed. (More
310+
// details below.)
289311
let mut inner_attr_replace_ranges = Vec::new();
290312
for inner_attr in ret.attrs().iter().filter(|a| a.style == ast::AttrStyle::Inner) {
291313
if let Some(attr_range) = self.capture_state.inner_attr_ranges.remove(&inner_attr.id) {
@@ -301,9 +323,9 @@ impl<'a> Parser<'a> {
301323
if replace_ranges_start == replace_ranges_end && inner_attr_replace_ranges.is_empty() {
302324
Box::new([])
303325
} else {
304-
// Grab any replace ranges that occur *inside* the current AST node.
305-
// We will perform the actual replacement when we convert the `LazyAttrTokenStream`
306-
// to an `AttrTokenStream`.
326+
// Grab any replace ranges that occur *inside* the current AST node. We will
327+
// perform the actual replacement only when we convert the `LazyAttrTokenStream` to
328+
// an `AttrTokenStream`.
307329
self.capture_state.replace_ranges[replace_ranges_start..replace_ranges_end]
308330
.iter()
309331
.cloned()
@@ -312,6 +334,28 @@ impl<'a> Parser<'a> {
312334
.collect()
313335
};
314336

337+
// What is the status here when parsing the example code at the top of this method?
338+
//
339+
// When parsing `g`:
340+
// - `start_pos..end_pos` is `12..33` (`fn g { ... }`, excluding the outer attr).
341+
// - `inner_attr_replace_ranges` has one entry (`5..15`, when counting from `fn`), to
342+
// delete the inner attr's tokens.
343+
// - This entry is put into the lazy tokens for `g`, i.e. deleting the inner attr from
344+
// those tokens (if they get evaluated).
345+
// - Those lazy tokens are also put into an `AttrsTarget` that is appended to `self`'s
346+
// replace ranges at the bottom of this function, for processing when parsing `m`.
347+
// - `replace_ranges_start..replace_ranges_end` is empty.
348+
//
349+
// When parsing `m`:
350+
// - `start_pos..end_pos` is `0..34` (`mod m`, excluding the `#[cfg_eval]` attribute).
351+
// - `inner_attr_replace_ranges` is empty.
352+
// - `replace_ranges_start..replace_ranges_end` has two entries.
353+
// - One to delete the inner attribute (`17..27`), obtained when parsing `g` (see above).
354+
// - One `AttrsTarget` (the one from parsing `g`) to replace all of `g` (`3..33`,
355+
// including its outer attribute), with:
356+
// - `attrs`: includes the outer and the inner attr.
357+
// - `tokens`: lazy tokens for `g` (with its inner attr deleted).
358+
315359
let tokens = LazyAttrTokenStream::new(LazyAttrTokenStreamImpl {
316360
start_token,
317361
num_calls,
@@ -335,15 +379,27 @@ impl<'a> Parser<'a> {
335379
{
336380
assert!(!self.break_last_token, "Should not have unglued last token with cfg attr");
337381

338-
// Replace the entire AST node that we just parsed, including attributes, with
339-
// `target`. If this AST node is inside an item that has `#[derive]`, then this will
340-
// allow us to cfg-expand this AST node.
382+
// What is the status here when parsing the example code at the top of this method?
383+
//
384+
// When parsing `g`, we add two entries:
385+
// - The `start_pos..end_pos` (`3..33`) entry has a new `AttrsTarget` with:
386+
// - `attrs`: includes the outer and the inner attr.
387+
// - `tokens`: lazy tokens for `g` (with its inner attr deleted).
388+
// - `inner_attr_replace_ranges` contains the one entry to delete the inner attr's
389+
// tokens (`17..27`).
390+
//
391+
// When parsing `m`, we do nothing here.
392+
393+
// Set things up so that the entire AST node that we just parsed, including attributes,
394+
// will be replaced with `target` in the lazy token stream. This will allow us to
395+
// cfg-expand this AST node.
341396
let start_pos = if has_outer_attrs { attrs.start_pos } else { start_pos };
342397
let target = AttrsTarget { attrs: ret.attrs().iter().cloned().collect(), tokens };
343398
self.capture_state.replace_ranges.push((start_pos..end_pos, Some(target)));
344399
self.capture_state.replace_ranges.extend(inner_attr_replace_ranges);
345400
} else if matches!(self.capture_state.capturing, Capturing::No) {
346-
// Only clear the ranges once we've finished capturing entirely.
401+
// Only clear the ranges once we've finished capturing entirely, i.e. we've finished
402+
// the outermost call to this method.
347403
self.capture_state.replace_ranges.clear();
348404
self.capture_state.inner_attr_ranges.clear();
349405
}
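
Note on the worked example: the new comments list the replace ranges that exist when parsing `g` and `m`, and say the replacement itself happens only when the lazy stream is converted. The sketch below is a simplified model of applying such ranges to a flat token list, using the token positions from the example. `Slot`, `ReplaceRange` and `apply_replace_ranges` are hypothetical names, the filler-slot approach is an assumption of this sketch, and targets are reduced to a string label rather than a real `AttrsTarget`.

```rust
use std::ops::Range;

/// One position in the flattened token list of this model.
#[derive(Debug, Clone, PartialEq)]
pub enum Slot {
    /// A plain token, identified by its original position.
    Token(usize),
    /// Stands in for an `AttrsTarget` that replaces a whole node.
    Target(&'static str),
    /// Filler that keeps later indices valid while ranges are applied.
    Empty,
}

/// A range plus `None` (delete the tokens) or `Some(label)` (replace them).
pub type ReplaceRange = (Range<usize>, Option<&'static str>);

pub fn apply_replace_ranges(len: usize, mut ranges: Vec<ReplaceRange>) -> Vec<Slot> {
    let mut slots: Vec<Slot> = (0..len).map(Slot::Token).collect();
    // Sort by start and walk backwards, so a nested range is applied
    // before the enclosing range that starts earlier.
    ranges.sort_by_key(|(r, _)| r.start);
    for (range, target) in ranges.into_iter().rev() {
        assert!(!range.is_empty(), "this model assumes non-empty ranges");
        // Same length as the range, so no indices shift.
        let mut replacement = vec![Slot::Empty; range.len()];
        if let Some(label) = target {
            replacement[0] = Slot::Target(label);
        }
        drop(slots.splice(range, replacement));
    }
    // Fillers carry no information once all ranges are applied.
    slots.into_iter().filter(|s| *s != Slot::Empty).collect()
}
```

With the two entries described for `m` (`17..27` deleted, `3..33` replaced by the target for `g`) over 34 tokens, the model leaves tokens 0, 1 and 2, the target, and token 33. With the single entry described for `g` (`5..15` deleted, counting from `fn`) over 21 tokens, it leaves 11 tokens.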

compiler/rustc_parse/src/parser/mod.rs

+1
Original file line number | Diff line number | Diff line change
@@ -231,6 +231,7 @@ enum Capturing {
231231
Yes,
232232
}
233233

234+
// This state is used by `Parser::collect_tokens_trailing_token`.
234235
#[derive(Clone, Debug)]
235236
struct CaptureState {
236237
capturing: Capturing,
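
Note on the `capturing` field: the comments added in `attr_wrapper.rs` explain that `Capturing::Yes` is set and restored around the call to `f`, so that the outermost call can be told apart from nested ones. A minimal sketch of that save-and-restore pattern follows. `MiniParser`, `collect` and `parse_example` are hypothetical; the real `Parser` and `CaptureState` carry far more state.

```rust
use std::mem;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Capturing {
    No,
    Yes,
}

/// Hypothetical pared-down parser that holds only the capturing flag.
pub struct MiniParser {
    pub capturing: Capturing,
    /// Records (node name, was this the outermost call?) for inspection.
    pub log: Vec<(&'static str, bool)>,
}

impl MiniParser {
    pub fn new() -> Self {
        MiniParser { capturing: Capturing::No, log: Vec::new() }
    }

    /// Models the set-and-restore around the callback `f`.
    pub fn collect(&mut self, name: &'static str, f: impl FnOnce(&mut Self)) {
        let prev = mem::replace(&mut self.capturing, Capturing::Yes);
        f(self);
        self.capturing = prev;
        // After restoring, `No` means no enclosing call is still capturing.
        let outermost = self.capturing == Capturing::No;
        self.log.push((name, outermost));
    }
}

/// Mirrors the example: parsing `m` indirectly recurses into parsing `g`.
pub fn parse_example() -> Vec<(&'static str, bool)> {
    let mut p = MiniParser::new();
    p.collect("m", |p| p.collect("g", |_| {}));
    p.log
}
```

In this model the nested call for `g` finishes first and sees `Capturing::Yes` after restoring, while the call for `m` sees `Capturing::No`, which is the condition the diff uses before clearing the replace ranges.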
