diff --git a/src/type-inference.md b/src/type-inference.md index 0fc99252a..6e2032fee 100644 --- a/src/type-inference.md +++ b/src/type-inference.md @@ -1 +1,227 @@ # Type inference + +The type inference is based on standard HM-type inference, but +extended in various way to accommodate subtyping, region inference, +and higher-ranked types. + +## A note on terminology + +We use the notation `?T` to refer to inference variables, also called +existential variables. + +We use the term "region" and "lifetime" interchangeably. Both refer to +the `'a` in `&'a T`. + +The term "bound region" refers to regions bound in a function +signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is +"free" if it is not bound. + +## Creating an inference context + +You create and "enter" an inference context by doing something like +the following: + +```rust +tcx.infer_ctxt().enter(|infcx| { + // use the inference context `infcx` in here +}) +``` + +Each inference context creates a short-lived type arena to store the +fresh types and things that it will create, as described in +[the README in the ty module][ty-readme]. This arena is created by the `enter` +function and disposed after it returns. + +[ty-readme]: src/librustc/ty/README.md + +Within the closure, the infcx will have the type `InferCtxt<'cx, 'gcx, +'tcx>` for some fresh `'cx` and `'tcx` -- the latter corresponds to +the lifetime of this temporary arena, and the `'cx` is the lifetime of +the `InferCtxt` itself. (Again, see [that ty README][ty-readme] for +more details on this setup.) + +The `tcx.infer_ctxt` method actually returns a build, which means +there are some kinds of configuration you can do before the `infcx` is +created. See `InferCtxtBuilder` for more information. + +## Inference variables + +The main purpose of the inference context is to house a bunch of +**inference variables** -- these represent types or regions whose precise +value is not yet known, but will be uncovered as we perform type-checking. + +If you're familiar with the basic ideas of unification from H-M type +systems, or logic languages like Prolog, this is the same concept. If +you're not, you might want to read a tutorial on how H-M type +inference works, or perhaps this blog post on +[unification in the Chalk project]. + +[Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/ + +All told, the inference context stores four kinds of inference variables as of this +writing: + +- Type variables, which come in three varieties: + - General type variables (the most common). These can be unified with any type. + - Integral type variables, which can only be unified with an integral type, and + arise from an integer literal expression like `22`. + - Float type variables, which can only be unified with a float type, and + arise from a float literal expression like `22.0`. +- Region variables, which represent lifetimes, and arise all over the dang place. + +All the type variables work in much the same way: you can create a new +type variable, and what you get is `Ty<'tcx>` representing an +unresolved type `?T`. Then later you can apply the various operations +that the inferencer supports, such as equality or subtyping, and it +will possibly **instantiate** (or **bind**) that `?T` to a specific +value as a result. + +The region variables work somewhat differently, and are described +below in a separate section. + +## Enforcing equality / subtyping + +The most basic operations you can perform in the type inferencer is +**equality**, which forces two types `T` and `U` to be the same. The +recommended way to add an equality constraint is using the `at` +method, roughly like so: + +``` +infcx.at(...).eq(t, u); +``` + +The first `at()` call provides a bit of context, i.e., why you are +doing this unification, and in what environment, and the `eq` method +performs the actual equality constraint. + +When you equate things, you force them to be precisely equal. Equating +returns a `InferResult` -- if it returns `Err(err)`, then equating +failed, and the enclosing `TypeError` will tell you what went wrong. + +The success case is perhaps more interesting. The "primary" return +type of `eq` is `()` -- that is, when it succeeds, it doesn't return a +value of any particular interest. Rather, it is executed for its +side-effects of constraining type variables and so forth. However, the +actual return type is not `()`, but rather `InferOk<()>`. The +`InferOk` type is used to carry extra trait obligations -- your job is +to ensure that these are fulfilled (typically by enrolling them in a +fulfillment context). See the [trait README] for more background here. + +[trait README]: ../traits/README.md + +You can also enforce subtyping through `infcx.at(..).sub(..)`. The same +basic concepts apply as above. + +## "Trying" equality + +Sometimes you would like to know if it is *possible* to equate two +types without error. You can test that with `infcx.can_eq` (or +`infcx.can_sub` for subtyping). If this returns `Ok`, then equality +is possible -- but in all cases, any side-effects are reversed. + +Be aware though that the success or failure of these methods is always +**modulo regions**. That is, two types `&'a u32` and `&'b u32` will +return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the +"two-phase" nature of how we solve region constraints. + +## Snapshots + +As described in the previous section on `can_eq`, often it is useful +to be able to do a series of operations and then roll back their +side-effects. This is done for various reasons: one of them is to be +able to backtrack, trying out multiple possibilities before settling +on which path to take. Another is in order to ensure that a series of +smaller changes take place atomically or not at all. + +To allow for this, the inference context supports a `snapshot` method. +When you call it, it will start recording changes that occur from the +operations you perform. When you are done, you can either invoke +`rollback_to`, which will undo those changes, or else `confirm`, which +will make the permanent. Snapshots can be nested as long as you follow +a stack-like discipline. + +Rather than use snapshots directly, it is often helpful to use the +methods like `commit_if_ok` or `probe` that encapsulate higher-level +patterns. + +## Subtyping obligations + +One thing worth discussing are subtyping obligations. When you force +two types to be a subtype, like `?T <: i32`, we can often convert those +into equality constraints. This follows from Rust's rather limited notion +of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`. + +However, in some cases we have to be more careful. For example, when +regions are involved. So if you have `?T <: &'a i32`, what we would do +is to first "generalize" `&'a i32` into a type with a region variable: +`&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then +relate this new variable with the original bound: + + &'?b i32 <: &'a i32 + +This will result in a region constraint (see below) of `'?b: 'a`. + +One final interesting case is relating two unbound type variables, +like `?T <: ?U`. In that case, we can't make progress, so we enqueue +an obligation `Subtype(?T, ?U)` and return it via the `InferOk` +mechanism. You'll have to try again when more details about `?T` or +`?U` are known. + +## Region constraints + +Regions are inferred somewhat differently from types. Rather than +eagerly unifying things, we simply collect constraints as we go, but +make (almost) no attempt to solve regions. These constraints have the +form of an outlives constraint: + + 'a: 'b + +Actually the code tends to view them as a subregion relation, but it's the same +idea: + + 'b <= 'a + +(There are various other kinds of constriants, such as "verifys"; see +the `region_constraints` module for details.) + +There is one case where we do some amount of eager unification. If you have an equality constraint +between two regions + + 'a = 'b + +we will record that fact in a unification table. You can then use +`opportunistic_resolve_var` to convert `'b` to `'a` (or vice +versa). This is sometimes needed to ensure termination of fixed-point +algorithms. + +## Extracting region constraints + +Ultimately, region constraints are only solved at the very end of +type-checking, once all other constraints are known. There are two +ways to solve region constraints right now: lexical and +non-lexical. Eventually there will only be one. + +To solve **lexical** region constraints, you invoke +`resolve_regions_and_report_errors`. This will "close" the region +constraint process and invoke the `lexical_region_resolve` code. Once +this is done, any further attempt to equate or create a subtyping +relationship will yield an ICE. + +Non-lexical region constraints are not handled within the inference +context. Instead, the NLL solver (actually, the MIR type-checker) +invokes `take_and_reset_region_constraints` periodically. This +extracts all of the outlives constraints from the region solver, but +leaves the set of variables intact. This is used to get *just* the +region constraints that resulted from some particular point in the +program, since the NLL solver needs to know not just *what* regions +were subregions but *where*. Finally, the NLL solver invokes +`take_region_var_origins`, which "closes" the region constraint +process in the same way as normal solving. + +## Lexical region resolution + +Lexical region resolution is done by initially assigning each region +variable to an empty value. We then process each outlives constraint +repeatedly, growing region variables until a fixed-point is reached. +Region variables can be grown using a least-upper-bound relation on +the region lattice in a fairly straight-forward fashion.