Why Rust needs scoped generics
Tags: rust programming memory | Written by Alex Smith, 2024-08-26
Introduction
Although most of the posts in this blog are about programming in C, more recently I've been using Rust for the bulk of my programming. For many years, there weren't any languages which were "better enough" than C to persuade me to switch (there were plenty that were slightly better, but switching has costs); but Rust has cleared that bar by some distance.
One particular attraction of Rust is the extent to which it is capable of proving, at compile time, that your program doesn't contain certain classes of bug. The highest-profile aspect of that is that it can entirely prevent many categories of security bugs (although, obviously, not all of them); but in some cases, it accomplishes that by replacing them with runtime crashes, which makes your programs more secure but not any more reliable. The more exciting aspect, to me, is the range of cases in which it not only prevents bugs from having a security impact, it proves that they won't happen at all, and often without any runtime overhead. Sound code is one thing – but it's even better to have code that's sound and panic-free, and yet still runs efficiently.
Rust already goes quite some distance to helping programs be reliable in this sense, which makes the places where it falls short feel like more of a disappointment, especially in places where I'd expect the language to be able to help. In particular, there's one problem that comes up in pretty much every program I try to write (and which therefore is probably coming up in programs that other people are writing, too), and which has a variety of negative effects: memory is wasted storing the same values repetitively, and you then end up either adding runtime checks that all the copies really are the same (even though that should be possible to determine at compile time), or omitting the checks and just hoping that you wrote the code correctly (knowing that a mistake wouldn't be caught by the compiler and would produce garbage results, and that the compiler might have to add checks anyway to prevent this from leading to security bugs).
The problem is hard to define precisely, and I've dedicated a substantial portion of this blog post to it (although a summary would be "you have multiple values that need extra context to make sense of and/or use safely, but the context is, or at least is supposed to be, the same for all of them or for a subset that can be tracked at compile time"). It also has a range of possible solutions, none of which really work.
This blog post is about two main things. One is to explain the problem to the reader, along with the existing solutions and the reasons why none of them really work to solve it. The other is to explain how a relatively-easy-to-implement (and backwards-compatible) change to the Rust programming language could more or less eliminate the problem entirely, via adding a new language feature which I'm tentatively calling "scoped generics", and to try to convince the reader that implementing such a change would be worthwhile. Along the way, I cover the mysterious topic of "generativity" (which is a partial but incomplete solution to the problem, and which scoped generics would make much more ergonomic to use), recontextualise some of Rust's existing behaviour (it turns out that a lifetime is actually just a zero-sized scoped generic!), and end up discovering that the new feature would make it possible to (sort of) implement a standard library API entirely using safe code, even though it's one of those APIs for which that should be clearly impossible.
(A note about the intended audience: this post assumes that the reader already understands Rust – it's already long enough as it is, and I'm expecting most of the people who are interested in improvements to the language will already know the language I'm trying to improve.)
Defining the problem with some examples
Before I can start to explain a solution, I'll need to explain the problem I'm trying to solve. Because it's such a commonly occurring problem, the easy way to do that is to take examples where it actually occurred, either in code I've already written or in code I was considering writing. There were quite a lot of examples, but hopefully three examples will be enough to demonstrate the nature of the problem – these examples were chosen to be easy to understand (hopefully), and to illustrate a range of different aspects of the issue.
Example 1: arbitrary-precision integers in varying bases
One of my hobbies is studying "Turing tarpits" – programming languages that can in theory do the same computations as any program in any other language that we know how to implement, but that aren't very good at it. These languages can be interesting either because they are exceedingly simple (e.g. FRACTRAN or the I/D machine), or because they were discovered naturally in something not meant to be programmed (e.g. the rules and card pool of Magic: the Gathering form a Turing tarpit, as do the rules and card pool of Netrunner). To work out what a program in a Turing tarpit does, the obvious approach is to write an interpreter for the language with a few debugging features, and run the program in the interpreter to see what happens.
However, there's a problem: programs in Turing tarpits are sometimes incredibly inefficient, because the languages frequently don't have any sensible means of data storage. As a simple example, it's not uncommon for the easiest (or perhaps even the only viable) way to store a number n to be to represent it as a string with length 2^n, and trying to execute a program that uses those sorts of numbers one instruction at a time is far too slow in practice. So instead, practically usable interpreters for Turing tarpits generally have to do optimisations, doing things like detecting simple loops within the program they're interpreting and working out what the effect of the loop as a whole would be rather than trying to interpret each iteration individually.
In one of my more recent experiments into Turing tarpits, I was working on analysing languages which had a tendency to store data by encoding it into very large numbers, produced by adding and subtracting small multiples of powers of a given base: one program might use numbers like 2^60318−11, whereas another might use numbers like 2^11−2^6−2^4. As such, I wanted to write an interpreter that could handle that sort of number efficiently, via storing a number as a small set of exponents and multipliers in a given base: the latter number could for example be represented as {(11, 1), (6, -1), (4, -1)} in base 2.
The obvious way to express this in Rust is to use a data structure that stores a sparse map from exponent to multiplier, along with the base. I came up with something that was somewhat similar to this (with a few irrelevant details elided for clarity):
pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>,
}
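To make the representation concrete, here's a small sketch using that definition; the eval helper is my own and purely illustrative (it expands the sparse form back into an ordinary integer, which is of course only feasible for small values):

```rust
use std::collections::BTreeMap;

pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>,
}

// illustrative helper: expand the sparse representation back into a plain
// integer, by summing multiplier * base^exponent over all stored digits
fn eval(n: &SparseBigInt) -> i64 {
    n.digits
        .iter()
        .map(|(&exp, &mult)| (n.base as i64).pow(exp as u32) * mult as i64)
        .sum()
}

fn main() {
    // 2^11 - 2^6 - 2^4, stored as {(11, 1), (6, -1), (4, -1)} in base 2
    let n = SparseBigInt {
        base: 2,
        digits: BTreeMap::from([(11, 1), (6, -1), (4, -1)]),
    };
    assert_eq!(eval(&n), 2048 - 64 - 16);
}
```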
This definition worked, to the extent that it was possible to write a working program using it (and I haven't revisited the program since, so it is still defining SparseBigInt with this sort of definition). However, it was clear that something was off with the way that base was being implemented.
The first problem came when I was trying to write the implementations of the arithmetic instructions; the problem impacted all of them, but let's talk about addition, because it's the first one I tried to implement. There's a pretty simple, efficient algorithm for adding unbounded integers of this sort (basically the same long-addition algorithm that most schoolchildren learn, except that you can speed it up by skipping over places where both numbers have a long run of zeroes, so the algorithm runs in linear time in the number of nonzero digits). Unfortunately, that doesn't work when the two numbers have different bases (and the addition algorithm when the bases differ is incredibly slow by comparison, and would ruin the performance of my program if I ever had to use it). Fortunately, this case was never going to come up in practice: with the tarpits I was analysing, all the numbers would naturally tend to be stored in the same base (which I could work out near the start of the program, before performing any arithmetic); and even more fortunately, Add and AddAssign are allowed to panic (they can do so even on normal data types like i32, if overflow checks are turned on). So all I had to do was to add a check that the bases were equal, and panic if they weren't.
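A minimal sketch of the shape that check took, using the simplified definition from earlier (carrying between digits is elided here, so this handles only the digit-merging part of the algorithm):

```rust
use std::collections::BTreeMap;
use std::ops::AddAssign;

pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>,
}

impl AddAssign<&SparseBigInt> for SparseBigInt {
    fn add_assign(&mut self, rhs: &SparseBigInt) {
        // the runtime check discussed above: panic on a base mismatch,
        // just as i32 addition panics on overflow with checks enabled
        assert_eq!(self.base, rhs.base, "cannot add numbers with different bases");
        for (&exp, &mult) in &rhs.digits {
            // merge the digit, dropping entries that cancel to zero so the
            // map stays sparse; a full implementation would also carry when
            // a digit's magnitude reaches the base (elided here)
            let digit = self.digits.get(&exp).copied().unwrap_or(0) + mult;
            if digit == 0 {
                self.digits.remove(&exp);
            } else {
                self.digits.insert(exp, digit);
            }
        }
    }
}

fn main() {
    // (2^4) + (2^6 - 2^4) = 2^6: the cancelled digit disappears from the map
    let mut a = SparseBigInt { base: 2, digits: BTreeMap::from([(4, 1)]) };
    let b = SparseBigInt { base: 2, digits: BTreeMap::from([(4, -1), (6, 1)]) };
    a += &b;
    assert_eq!(a.digits, BTreeMap::from([(6, 1)]));
}
```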
Still, all this seemed a bit inelegant; every panic in your program is a warning of a place where bugs might exist (because it indicates both "this is something that, if it happened, indicates a bug in the program", and "it was impossible for the compiler to verify that this never happens" – the former is inherent in the meaning of a panic, and in the latter case the panic wouldn't be required). I am fairly sure that no base mismatches exist in the program that I wrote, but I am planning to reuse SparseBigInt for the analysis of other Turing tarpits in the future, and the requirement for the bases to match is the sort of thing that could easily catch me out then – it isn't documented in any machine-readable way, just in the implementation and in doc comments.
(Looking back at my code, I did in fact end up implementing the inefficient-by-comparison algorithm for performing arithmetic on numbers in different bases, just so that the arithmetic operators worked in all cases! This still ends up creating a potential performance pitfall, though, if it ever ends up happening, even though there is no longer a correctness pitfall. It might also have meant that the program would now contain quite a bit of code that is "supposed to be dead" and run only by the test suite; fortunately, the algorithm in question ended up being needed for an unrelated purpose, so the time spent writing the algorithm wasn't wasted, but there's no reason to expect that sort of luck to happen in similar cases in the future.)
This wasn't the only way in which my existing solution ended up inelegant. When writing a new type in Rust, it's usual to implement as many of the standard traits on it as possible, because this increases the chance that an operation you need will have been implemented in the standard library already, saving you from having to write your own. As a numeric type capable of representing 0, there is an obvious and useful default value for a SparseBigInt, so I naturally tried to implement Default on it. Unfortunately, this wasn't possible: the default digits is simple enough (an empty map), but what should the default base be? The only valid option I could think of at the time was "use some invalid sentinel value for base, then when performing an arithmetic operation on the number that makes it non-zero, change its base to match that of the other number" – but that would slow down every arithmetic operation in the program (due to the need to check for the sentinel value), and that wasn't worth it simply to make Default work. ("Change the base of zero to match", which I thought of only while writing this blog post, would also work, but would have similar downsides.) So instead, I ended up going without.
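For illustration, the sentinel option I decided against might have looked something like this (the choice of 0 as the sentinel, and the unify_base helper, are hypothetical names of mine, not anything from the real program):

```rust
use std::collections::BTreeMap;

pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>,
}

impl Default for SparseBigInt {
    fn default() -> Self {
        // 0 is never a valid base, so (in this sketch) it serves as a
        // sentinel meaning "base not yet known"
        SparseBigInt { base: 0, digits: BTreeMap::new() }
    }
}

impl SparseBigInt {
    // every arithmetic operation would have to start with a check like this,
    // which is the per-operation cost that made the sentinel not worth it
    fn unify_base(&mut self, other: &SparseBigInt) {
        if self.base == 0 {
            self.base = other.base;
        } else if other.base != 0 {
            assert_eq!(self.base, other.base, "base mismatch");
        }
    }
}

fn main() {
    let mut zero = SparseBigInt::default();
    let n = SparseBigInt { base: 2, digits: BTreeMap::from([(4, 1)]) };
    zero.unify_base(&n);
    // the defaulted number adopts the other operand's base on first use
    assert_eq!(zero.base, 2);
}
```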
I did end up finishing the program in question; it helped me solve the question about Turing tarpits that I was trying to solve in the first place, so it did its job correctly. But it still left me wondering whether something was wrong with the way that I'd implemented base – even though my existing implementation seemed like it was the only available option.
Example 2: indexes that are used to represent something else
Recently, I've been interested in the problem of "reachability": given a large number of statements of the form "you can go from A to B", "you can go from B to D", etc., the goal is to determine whether it's possible to go from some given point to some other given point, possibly indirectly via a chain of other points. (The connections are one-way, so being able to go from A to B doesn't necessarily mean you can go from B to A.) It's pretty easy to do a single reachability query "efficiently" via spending time proportional to the number of statements, but in the situations where I need to solve this problem, I need to do a large number of reachability queries on a single reachability graph, and doing that efficiently seems to still be an unsolved problem of algorithmic graph theory (and one for which I don't yet have a solution that I'm happy with).
Anyway, although for the reachability problem in general I'm still trying to find a good algorithm, I've already started implementing various easier special cases (e.g. it's possible to efficiently create a data structure that can efficiently answer queries like "is it possible to go from A to B and back again, and if so, what path makes that possible?" – only having to handle circular paths makes the algorithm much simpler). The data structure in question basically contains a map, where the keys are the various points, and the values are information about the corresponding point (which includes, among other things, references to other points).
How to implement something like this in Rust? Well, given that for any given data structure the graph it corresponds to is fixed, the obvious solution is to represent the points as integers from 0 inclusive up to the number of points exclusive – that makes it possible to implement the map efficiently by using an array-like object, and it's possible to store a reference to a point in the map values via just storing the appropriate integer (thus avoiding issues with circular references, which are generally very difficult to handle in Rust). A simplified version of the data structure might look like this:
struct PointInformation {
    // all of these `usize`s are point indexes
    root: usize,
    path_to_root: usize,
    path_from_root: usize,
}

pub struct ReachabilityData {
    point_info: Box<[PointInformation]>,
}
Just like in the previous example, this worked, to the extent that I was able to create a working program (to solve the special case of finding paths from A to B and back to A); and just like the previous example, there were various issues that made the solution feel unsatisfying.
The first issue is that usize is not very type-safe as types go. A usize, in the abstract, could mean more or less anything. This means that if a typo causes the wrong variable to be used – a different usize with an entirely different meaning, perhaps – it won't be caught by the type system. There are a couple of solutions in Rust that can help to ease this issue; the most common is probably to use a wrapper type, typically implemented with the same representation as usize, but treated by the compiler as incompatible with it:
pub struct PointIndex(usize);
That solution prevents point indexes from being confused with other uses of usize. Unfortunately, I found myself needing to store data for multiple different reachability problems simultaneously, and this definition of the PointIndex type makes it global – although it confirms that a value is a point index for some reachability problem, it doesn't confirm that the value is a point index for the specific reachability problem whose solution is being looked up. So this solution would still fail to detect situations in which I accidentally passed a point index from one reachability problem to a data structure designed for a different problem; and doing that would not even cause a panic, it would just silently return meaningless results. As a result, this would still produce an API in which a fairly straightforward thinko or typo could lead to a hard-to-detect bug (which might not necessarily even be caught in testing, depending on whether the two reachability data structures happened to return the same value for the indexes that were tested).
Another, possibly larger, issue is related to the fact that the point indexes are used as indexes. This means that basically any use of them is going to end up indexing into an array (such as point_info in the example above) – and that in turn means that it needs to be in bounds. If the point indexes are used correctly – i.e. all the point indexes given as function arguments, and all the point indexes stored within the PointInformation, are actually point indexes for the appropriate reachability problem – then they will of course always be in bounds. However, the compiler doesn't know that, and will insert bounds checks on every use of an index. The bounds checks don't cost all that much time (an error-handling branch that is never taken will be very fast on modern processors, because they speculatively run the success case in parallel with checking whether the error case has occurred). However, they will still take some amount of time and cause some instruction cache pressure, so it would be better if the checks weren't necessary. It also isn't possible to use unsafe code in order to skip the bounds checks without the reachability library as a whole becoming unsound, because due to the previous problem, it isn't possible to guarantee that the indexing will always be in bounds – someone might have passed a point index from a different reachability problem as an argument, and then it genuinely could be out of bounds.
These two problems together mean that the documentation for methods on ReachabilityData tends to get really cluttered; pretty much every method needs to have a disclaimer like this:
Panics
If the given point index does not belong to this reachability problem, this method may return incorrect results or panic.
And that is very close to saying "this method has undefined behaviour" – it isn't fully undefined because it can't violate memory safety, but that just means that the consequences of a bug will be smaller, it doesn't do anything to prevent the bug happening in the first place.
A separate issue arose in the same program when I started writing display code. For efficiency and implementation convenience, ReachabilityData and PointInformation use point indexes: but the end user isn't thinking in terms of point indexes, but in terms of the actual points that they specified. Suppose that the code has found a path from A to B and back to A, and wants to display it to the user. One obvious solution is to have some sort of object that represents the path, with a Display implementation that displays the path – and that in turn would logically Display each point on the path. However, if the points are stored as point indexes, then Display won't have enough context to know what to print: Display::fmt has no arguments other than self and a formatter (which doesn't help), so when displaying, e.g., a PointIndex(5), Display::fmt has no idea what that corresponds to from the user's point of view.
One solution that works for Display (and that I have used in cases like this before) is to have a structure that groups together a point index, and a reference to a structure that describes which points exist and what their names are. Perhaps something like this:
struct PointIndexDisplay<'a> {
    point_names: &'a [String],
    point_index: usize,
}
This creates an intriguing possibility for solving the API problems described earlier in this section: it might be possible for the various methods on ReachabilityData to take a (suitably named) PointIndexDisplay rather than a PointIndex or usize. The idea is that each problem would correspond to a unique set of points, and thus have a different point_names reference. If the point_names reference were also stored as a field in ReachabilityData, it could check at runtime whether or not a method argument corresponded to the correct reachability problem by comparing its point_names to the stored point_names (to see whether they are slices with the same start address and same length). This could be augmented by checking whether the index is in bounds in the PointIndexDisplay constructor, refusing to construct an object with an out-of-bounds index; that would then guarantee that the references into point_info were always in bounds, even without a separate bounds check.
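Here's a sketch of that idea; the new constructor and the same_problem helper are hypothetical names of mine, but the identity check is exactly the address-plus-length comparison described above (std::ptr::eq compares both for slice references):

```rust
use std::fmt;

struct PointIndexDisplay<'a> {
    point_names: &'a [String],
    point_index: usize,
}

impl<'a> PointIndexDisplay<'a> {
    // hypothetical constructor: refuse to build an out-of-bounds index,
    // so later indexing into per-point arrays is guaranteed in bounds
    fn new(point_names: &'a [String], point_index: usize) -> Option<Self> {
        (point_index < point_names.len())
            .then_some(PointIndexDisplay { point_names, point_index })
    }

    // the runtime "same problem?" check: for slice references, ptr::eq
    // compares both the start address and the length
    fn same_problem(&self, other_names: &[String]) -> bool {
        std::ptr::eq(self.point_names, other_names)
    }
}

impl fmt::Display for PointIndexDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // the stored slice gives Display the context a bare index lacks
        write!(f, "{}", self.point_names[self.point_index])
    }
}

fn main() {
    let names = vec!["A".to_string(), "B".to_string(), "C".to_string()];
    let other = vec!["X".to_string(), "Y".to_string()];
    let p = PointIndexDisplay::new(&names, 1).unwrap();
    assert_eq!(p.to_string(), "B");
    assert!(p.same_problem(&names));
    assert!(!p.same_problem(&other)); // index from a different problem
    assert!(PointIndexDisplay::new(&names, 99).is_none());
}
```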
Does this technique solve all the problems? Sadly, no. Even if it makes it possible to save a bounds check, it's just been replaced with a check that the point_names are equal, so there's no actual saving there. Additionally, it doesn't really make sense to store PointIndexDisplay values within the PointInformation; it would technically work, but would cause huge memory bloat, because every single index would be accompanied by a slice reference (typically 16 bytes on modern computers), and all of those references would be the same, so it would be wasting a large amount of memory to store a lot of repetitive data. Thus, although the technique is somewhat helpful for catching mistakes (messing up the point index now gives you a deterministic panic rather than potentially silently producing the wrong result), and makes a Display implementation possible, it can't be used more pervasively because the comparisons make it too inefficient, and the extra error detection happens at runtime (whereas it's preferable to catch bugs at compile time if possible).
Still, it seems like there might be a seed of a good idea here, even if this alternative solution is also unsatisfying.
Example 3: local allocators
Here's an example from some code that I haven't written – but which lots of people want to write.
Some programs need to allocate a lot of short-lived data on the heap. In the general case, heap allocations have a lot of overhead; if data is being allocated and deallocated frequently, this can often be a major performance bottleneck.
One potential solution to the problem is to use a garbage collector (because garbage collectors can be optimized to have pretty good performance in this scenario), but those can be hard to integrate into programs written in systems programming languages like Rust. An alternative solution, commonly seen in games development, is the use of local allocators that allow for more optimized allocations of objects that exist only within a particular scope (such as the body of a particular function or method); as these only have to handle a limited number of allocation points, they can be specialised to work as fast as possible for the allocation patterns actually used with those objects via sacrificing functionality that happens not to be needed. (As a simple example, you can imagine an allocator operating under the conditions "all the objects allocated by this allocator will have the same type, and all of them will be deallocated simultaneously"; such an allocator can be orders of magnitude simpler and faster than a general-purpose allocator, because the calculations of where to allocate each object are incredibly simple and require only very minimal bookkeeping.)
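As a sketch of how simple such an allocator can be, here's a toy "pool" operating under exactly those conditions (every object has the same type, and everything is deallocated simultaneously); the handle-based API is my own simplification for illustration, not a real allocator interface:

```rust
// Toy pool allocator: all objects share one type, and all of them are
// freed at once, so allocation needs no free lists, size classes, or
// per-object headers -- it's just a push onto a vector.
struct Pool<T> {
    items: Vec<T>,
}

impl<T> Pool<T> {
    fn new() -> Self {
        Pool { items: Vec::new() }
    }

    // allocation: append the value and hand back its index as a handle
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1
    }

    fn get(&self, handle: usize) -> &T {
        &self.items[handle]
    }

    // deallocation: a single operation that frees every object at once
    fn free_all(&mut self) {
        self.items.clear();
    }
}

fn main() {
    let mut pool = Pool::new();
    let a = pool.alloc("first");
    let b = pool.alloc("second");
    assert_eq!(*pool.get(a), "first");
    assert_eq!(*pool.get(b), "second");
    pool.free_all();
    assert!(pool.items.is_empty());
}
```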
Let's think about what a local allocator would have to do in order to interoperate with normal-looking Rust code. The main tasks of an allocator are allocating memory and deallocating memory, and the main problem with this is a visibility problem: the code that interacts with the allocator needs to actually be able to reference the allocator, in order to ask it for memory, and in order to let it know when it can reclaim the memory from an allocated object.
Allocation is very simple: it's possible to just pass a reference to the allocator around to every function/method that needs it (even though conceptually a mutable reference is needed, Rust "reborrows" the reference with every call to prevent it being moved into the function being called). This can be a little obnoxious sometimes, because all the functions will have the same parameter and all the calls will have to mention it, but isn't particularly problematic (and the same problem can come up in situations mostly unrelated to the situation discussed in this blog post – I may end up writing more about this in future). So this side of the problem has a "sufficiently good" solution already, and isn't really blocking the writing of any code. The solution would look something like this:
// the allocator exists in this function's scope
fn function_that_uses_a_local_allocator() {
    // overly simple allocator creation, but just assume this works for now
    let mut allocator = MyLocalAllocator::new();
    do_something(&mut allocator, /* other args */);
}

fn do_something(allocator: &mut MyLocalAllocator<'_>, /* other args */) {
    /* ... */
    // sugar for: do_something_else(&mut *allocator, /* other args */);
    do_something_else(allocator, /* other args */);
    /* ... */
}

fn do_something_else(allocator: &mut MyLocalAllocator<'_>, /* other args */) {
    /* ... */
    // Box::new_in is a real function, but doesn't actually work like this
    let my_object = Box::new_in(/* value to box */, allocator);
    /* ... */
}
Unfortunately, attempts to use the same solution for deallocation run into trouble. It's conceptually very appealing to have deallocation mirror allocation:
let my_object = Box::new_in(/* value to box */, allocator);
/* ... */
// I made this method up; Rust doesn't have it for reasons explained below
Box::drop_in(my_object, allocator);
Something like this is actually seen in practice sometimes in languages like C. Unfortunately, it has a tendency to lead to security holes; although it's perfectly secure if written correctly, in practice programs that use multiple allocators often seem to end up accidentally deallocating things with the wrong allocator, which is generally considered undefined behaviour and frequently a security bug. If allocators in Rust worked like that, you would have a soundness hole there, too:
fn main() {
    let mut allocator1 = MyLocalAllocator::new();
    let mut allocator2 = MyLocalAllocator::new();
    wrong(&mut allocator1, &mut allocator2);
}

fn wrong<'a, 'b>(
    allocator1: &'a mut MyLocalAllocator<'b>,
    allocator2: &'a mut MyLocalAllocator<'b>,
) {
    let my_object = Box::new_in(0u64, allocator1);
    Box::drop_in(my_object, allocator2); /* oops */
}
The type system has unfortunately not been able to stop the attempt to deallocate the object using the wrong allocator, because there were two allocators with identical types (allocator1 and allocator2 are alive for the same lifetime, and the references to them also have the same lifetime) – there are a lot of things that can be checked by the type system (e.g. that my_object was allocated before it gets dropped, and isn't used afterwards), but "this is the right allocator" is not one of them.
Even ignoring that problem (e.g. by making Box::drop_in unsafe and leaving it up to the programmer to manually verify that they're using the right allocator), there is a second problem. Dropping objects manually isn't very common in Rust (although you can do it); normally, objects are dropped only in a destructor. What happens when you try to write one of those? Things start to go wrong even in the very simplest case, a struct that contains just one Box, and which tries to free it in the destructor:
struct BoxedError<'a> {
    my_field: Box<dyn Error, MyLocalAllocator<'a>>,
}

impl<'a> Drop for BoxedError<'a> {
    fn drop(&mut self) {
        Box::drop_in(self.my_field, /* ??? */);
    }
}
The technique of passing around the allocator to every function and method that needs it works by adding an extra parameter to all those methods, but unfortunately, this isn't very compatible with standard traits like Drop, which have a fixed parameter list.
All this means that in practice, an alternative implementation is needed: instead of the allocator being known by every function that deallocates, it's tracked by every object that it allocates. This makes deallocation very easy, in destructors or otherwise: just ask the object which allocator allocated it, and then tell that allocator to deallocate it. For example, the current unstable implementation of non-global allocators in Rust uses this approach: if you call Box::new_in in order to create a Box using a specific allocator, the Box will store both a reference to the allocation, and the allocator itself, with <Box as Drop>::drop using the stored allocator to do the deallocation.
Unfortunately, this alternative approach, although it technically works, has at least two problems which mean that it isn't particularly usable in practice for local allocators. The more obvious is a performance issue: the primary reason to use a local allocator in the first place is for the performance gains, but if the allocator needs to be stored alongside every object it allocates, that can use up a substantial amount of extra memory. (The "natural" way to represent the allocator would be as a reference to memory describing it; but that ends up doubling the size of Box, because it now contains two references rather than one. In some programs, it's not uncommon for data structures to be made almost entirely of Boxes, so this could double the memory usage of the program.) All that extra memory (and all the extra time spent reading and writing it) is conceptually not really required in this case: it isn't conceptually too difficult to track which allocator to use for deallocations, with the stored allocator being useful only as a) a check that it's the right allocator / to catch situations where the wrong allocator is used by mistake, and b) a way to access the allocator in unusual contexts, such as from inside a destructor.
The less obvious problem is to do with verifying that the allocator itself is sound. Conceptually, allocations and deallocations mutate the allocator (because it needs to change its internal state to remember what memory has been allocated), and the natural/standard way to do that in Rust is to make the allocations and deallocations take a mutable/unique reference to the allocator. But that conflicts with the notion of "store the allocator in every Box it allocates": you can't store the allocator itself because it's stateful, so you need to store a reference to the allocator, but in Rust you can't have two references to the same thing if either of them is mutable, making it unclear what exactly is supposed to be stored. Rust's actual implementation of Box::new_in contains a workaround for the problem, which leads to calls to it looking pretty weird:
// Allocator is implemented on a shared reference, not the allocator itself
impl<'a> Allocator for &'a MyLocalAllocator<'a> { /* ... */ }

// the allocator exists in this function's scope
fn function_that_uses_a_local_allocator() {
    let mut allocator = MyLocalAllocator::new();
    do_something(&allocator, /* other args */);
}

// do_something could be given a specific type,
// but this generic type is probably more instructive
fn do_something<'a, T>(allocator: &'a T, /* other args */)
where
    &'a T: Allocator,
{
    /* ... */
    // Box::new_in actually works like this
    let my_object = Box::new_in(/* value to box */, allocator);
    /* ... */
}
In other words, the value that's used to allocate and deallocate (and implements
Allocator
and is stored in the Box
) is just a shared reference to the real
allocator. (It doesn't technically have to be a literal shared reference, just
an object that acts the same way, e.g. it could be a zero-sized type that acts
like a "reference" to a global variable. But if it doesn't implement Copy
, it
gets moved into the Box
and the first allocation thus prevents any further use
of the allocator, so in practice it has to be shareable.) This makes the
Allocator
trait very awkward to implement correctly, because there are no
compile-time checks for things like "the allocator isn't being used
recursively", which in turn means that the allocator is not given unique access
to any of its own internal state or to the memory that it's trying to allocate.
This problem is solvable even in safe code, by using Cell
, but then the code
will end up running checks at runtime to ensure it isn't being used recursively
(all the safe variants of Cell
prevent the value inside being
double-mutably-borrowed via one means or another, but it generally involves
panicking or returning unusable dummy values rather than rejecting attempts at
compile time), so you end up with an allocator full of potential panic points
that the compiler can't prove aren't necessary.
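To make the tradeoff concrete, here's a toy bump allocator (a hypothetical sketch, not an implementation of the real Allocator trait): Cell lets it allocate through a shared reference, but in exchange the compiler can no longer prove anything about who else is mutating its state, so correctness rests entirely on runtime behaviour.

```rust
use std::cell::Cell;

// A minimal bump-allocator sketch (invented names; not the real Allocator
// trait). Interior mutability via Cell lets it hand out allocations through
// a shared reference, at the cost of losing the compiler's proof that
// nothing else is mutating `next` concurrently.
struct BumpAllocator {
    next: Cell<usize>,
    capacity: usize,
}

impl BumpAllocator {
    fn new(capacity: usize) -> Self {
        BumpAllocator { next: Cell::new(0), capacity }
    }

    // Returns an offset into the arena, or None when the arena is full.
    // Note the &self: this works only because of the Cell.
    fn alloc(&self, size: usize) -> Option<usize> {
        let start = self.next.get();
        if start + size > self.capacity {
            return None;
        }
        self.next.set(start + size);
        Some(start)
    }
}

fn main() {
    let arena = BumpAllocator::new(16);
    assert_eq!(arena.alloc(8), Some(0));
    assert_eq!(arena.alloc(8), Some(8));
    assert_eq!(arena.alloc(1), None); // arena exhausted
}
```

A real allocator would also need deallocation and would have to defend (at runtime) against recursive use, which is exactly where the panic points described above creep in.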
In summary, local allocators just don't really work very well in current Rust. The efficient solution, of passing the allocator as an argument to every function that needs to interact with it, using it for both allocation and deallocation, can't obviously be checked for soundness by the compiler and doesn't mix well with destructors, which is particularly frustrating because it almost works and would produce pretty efficient code if it did. The alternative solution, of storing the allocator in every object, is inefficient and creates so many references to the allocator that it becomes hard for the compiler to prove that the allocator itself is implemented correctly, which tends to mean runtime checks (or unsafe code) and even slower performance. In order to prevent the idea of using local allocators to make Rust programs more performant turning into a dead end, some other solution is needed.
A survey of solutions
Hopefully, the examples above are enough to get an idea of the sort of situation
that Rust has problems representing at the moment: there is some type of
information that's needed to make sense of an object and/or to use it safely
(the base, the list of which indexes correspond to which points, the allocator
used to allocate it), and that information needs to be the same across a range
of related objects (so that they can be added efficiently, or to ensure the
correct array is referenced, or that the correct allocator is used to deallocate
them). Often it becomes difficult to correctly implement a trait (the first
example struggles with Default
, the second Display
, the third Drop
).
In this section, I'd like to look at some potential solutions, none of which are ideal in the general case – but it's instructive to see the ways in which they fail, and will help to narrow down the problem and to be able to precisely state the requirements on the solution. (One possible solution has been omitted intentionally; it's covered later in this blog post. Let me know if I've missed any others.) I'll start with the two most obvious solutions – both of which were attempted at least twice in the discussion above – and then move on to some less obvious possibilities.
Incomplete objects
The approach that is probably most obvious to a programmer (such as me!) who
learned systems programming in low-level languages like C is what I'll call the
"incomplete object" approach. In this approach, the information that's common
to a range of objects is not stored within any of the objects themselves;
rather, it's stored separately by the code that uses those objects. Examples
include the PointIndex
in the second example (which represents a point, but is
missing the information on what indexes represent which points), and the memory
returned by the allocator in the third example (which needs to be deallocated by
the same allocator, but which doesn't itself store information on what allocator
that is).
This approach is generally pretty efficient, as the information only needs to be
stored once for the entire set of objects. It complicates the signatures of
methods that use those objects; that's a downside when using a single object,
but it can turn into an advantage when two objects need to have the same
information to be used together. For example, if the SparseBigInt
objects
from the first example were represented in this style via omitting the base
,
the resulting incomplete object would just be a digit list, and an addition
function would look like this:
fn add_bigint(x: DigitList, y: DigitList, base: i32) -> DigitList { ... }
The signature makes it very clear that two integers can be added only if they're in the same base: it takes two different digit lists, but the base only once.
Unfortunately, this approach has two big downsides. One of them is that it has no way to prevent the common information accidentally being specified incorrectly. Depending on the context, this could produce incorrect results (e.g. interpreting the digits of an integer as the wrong base), or even be a soundness hole (e.g. indexing beyond the end of an array, or deallocating an object with the wrong allocator).
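A minimal sketch of this failure mode (names and representation invented here): the wrong-base call compiles without complaint and silently computes garbage.

```rust
// "Incomplete object" style: the digit list alone, with the base supplied
// separately at every call site. digits[i] holds the digit with place
// value base^i, least significant digit first.
fn add_digit_lists(x: &[i32], y: &[i32], base: i32) -> Vec<i32> {
    let mut result = Vec::new();
    let mut carry = 0;
    for i in 0..x.len().max(y.len()) {
        let sum = x.get(i).copied().unwrap_or(0)
            + y.get(i).copied().unwrap_or(0)
            + carry;
        result.push(sum % base);
        carry = sum / base;
    }
    if carry > 0 {
        result.push(carry);
    }
    result
}

fn main() {
    let x = vec![9, 9]; // 99 in base 10
    let y = vec![1];    // 1 in base 10
    assert_eq!(add_digit_lists(&x, &y, 10), vec![0, 0, 1]); // 100, correct

    // Nothing stops a caller passing the wrong base: this compiles fine
    // and silently produces a wrong answer.
    assert_eq!(add_digit_lists(&x, &y, 16), vec![10, 9]);
}
```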
The other downside is that it makes it very hard to implement traits. In my
first example above, I didn't use this approach; if I had tried, I would have
discovered that I couldn't implement Add
(because trying to add one
DigitList
to another using +
would discover that there was nowhere to
specify the base, which is information required to do carries correctly). In
the second example, Display
couldn't be implemented on the incomplete
PointIndex
objects, and I needed to resort to a separate PointIndexDisplay
;
and incomplete objects can't usefully implement Drop
, which was a major
problem in the third example.
Complete objects
The opposite approach is to store the common information in every object that
needs it: a full SparseBigInt
rather than a DigitList
, a PointIndexDisplay
rather than a PointIndex
, Rust's current unstable Box
that stores the
allocator. This can be considered something of a "brute force" solution to
visibility problems: in order to ensure that something is always visible, just
store it, or a reference to it, in everything that might possibly need it!
With SparseBigInt
, this approach doesn't struggle too badly. The common
information, the base
(an i32
), is pretty easy to store: it doesn't use up
very much memory, it's Copy
so there are no lifetime problems, and it doesn't
change over time so there are no issues with ensuring that the correct version
is stored. Even so, the problem with the huge proliferation of copies of
base
is that when there are multiple sources of truth for a particular value,
it's generally only possible to ensure correctness by proving that they must all
be the same. Adding two SparseBigInt
s involves checking that they have the
same base
, and this approach makes it impossible to do that at compile time.
(There is an interesting symmetry with the previous approach: there, base
was
specified once, causing problems if it were specified incorrectly but statically
guaranteeing that there's only one value; here, the way base
is stored means
that it cannot be specified incorrectly, but there is still no proof that the
two base
values match and things go wrong if they don't.)
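In today's Rust, that impossible-at-compile-time check ends up as a runtime assertion, as in this simplified sketch (illustrative types, not the post's actual code):

```rust
use std::ops::Add;

// "Complete object" style: every integer carries its own copy of the base,
// so addition must check at runtime that the two copies agree.
#[derive(Clone, Debug, PartialEq)]
struct BigInt {
    base: i32,
    digits: Vec<i32>, // least significant digit first
}

impl Add for BigInt {
    type Output = BigInt;
    fn add(self, other: BigInt) -> BigInt {
        // The runtime check described above: the compiler cannot prove the
        // bases match, so we either assert (a panic point) or corrupt data.
        assert_eq!(self.base, other.base, "mismatched bases");
        let mut digits = Vec::new();
        let mut carry = 0;
        for i in 0..self.digits.len().max(other.digits.len()) {
            let sum = self.digits.get(i).copied().unwrap_or(0)
                + other.digits.get(i).copied().unwrap_or(0)
                + carry;
            digits.push(sum % self.base);
            carry = sum / self.base;
        }
        if carry > 0 {
            digits.push(carry);
        }
        BigInt { base: self.base, digits }
    }
}

fn main() {
    let x = BigInt { base: 10, digits: vec![5, 2] }; // 25
    let y = BigInt { base: 10, digits: vec![7] };    // 7
    assert_eq!((x + y).digits, vec![2, 3]);          // 32
}
```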
Unfortunately, nothing is guaranteeing that the common information will be
small, or Copy
, or even cloneable. In the second example, the point_names
is a shared reference to a list of strings – and that was a cut-down example,
and in practice the common information in that sort of program can often be more
complex than that. In the third example, the common information is conceptually
a mutable reference to an allocator, and storing two of those is known to lead
to trouble; Rust's current unstable allocator implementation has to work around
the problem by representing it with a type with significantly different
semantics, which in turn means that the compiler has trouble reasoning about the
resulting allocator and can't prove very many safety properties. When using
incomplete objects, none of this is a problem – you simply use one copy of the
common information and borrow it as necessary. But it can create somewhat
insurmountable problems here.
As seen in the first example above, this approach also doesn't play well with
traits that contain methods that construct new objects, such as Default
and
From
; for most traits, it can expose the common information to the trait
implementation via the self
parameter, but for methods that don't have one,
the visibility problems remain.
Constant common information
Here's an approach suitable for people who don't mind recompiling their code a lot – and one that I've commonly seen used in practice in C code (and which is used even more pervasively in older programming languages like FORTRAN). It isn't very general, but when it does work, it solves all the problems on its own.
The idea is that sometimes, only one piece of common information is needed
throughout the entire program. For example, when I was analysing Turing
tarpits, I generally only needed one base for all the SparseBigInt
s in my
program (e.g. the analysis I was most interested in used base 260). In the case
where all that's needed is to be able to run the program on one single input,
why not just make the common information a constant (e.g. in Rust, either a
const
or an immutable static
)? This has lots of advantages:
No per-object overhead, because the constant is compiled into the code rather than stored in individual objects;
No syntactic overhead when calling functions, because the constants used by a function don't have to be mentioned in its signature;
Because there's only the one constant, and it's always the same, it's impossible for the common information to differ between two different objects or between two uses of the same object, avoiding the potential unsoundnesses mentioned in the previous sections;
Constants are (or at least can be) globally visible, so they're accessible even inside implementations of standard traits.
The obvious disadvantages are a) you can only have one piece of common information for the whole program, and b) you have to recompile the program whenever it changes. Still, if you're writing a program for yourself, rather than writing a library, or writing a program that you plan to distribute for general use in a range of contexts, the latter isn't as much of a problem as it seems.
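A minimal sketch of this style (the constant's value and all names are invented for illustration): the base appears in no struct and no signature, only as a single program-wide constant.

```rust
// "Constant common information": one base for the whole program, baked in
// at compile time. Changing it means recompiling.
const BASE: i32 = 260;

// No base field, no base parameter: every function just reads the constant,
// so two digit lists cannot possibly disagree about the base.
fn add_digits(x: &[i32], y: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut carry = 0;
    for i in 0..x.len().max(y.len()) {
        let sum = x.get(i).copied().unwrap_or(0)
            + y.get(i).copied().unwrap_or(0)
            + carry;
        out.push(sum % BASE);
        carry = sum / BASE;
    }
    if carry > 0 {
        out.push(carry);
    }
    out
}

fn main() {
    // 259 + 1 = 260 = [0, 1] in base 260 (least significant digit first)
    assert_eq!(add_digits(&[259], &[1]), vec![0, 1]);
}
```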
Interestingly, this approach comes up in Rust itself on occasion. In the allocator example above, I was primarily discussing the current state of the unstable allocator API – but there's a stable allocator API too. It looks like this:
#[global_allocator]
static GLOBAL_ALLOCATOR: MyAllocator = /* ... */;
In order to make sure that there is no unsoundness with allocator mismatches,
but without spending bytes storing references to the allocator, stable Rust
simply has one constant allocator for the whole program: whenever you use Box
,
Vec
and friends, that's the allocator that's going to allocate the memory, and
whenever you drop them, that's the allocator to which the memory is returned.
Somewhat simplistic, but it's powering the vast majority of production-ready
Rust code at the moment.
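Here's a small runnable example of that stable API (the counting behaviour and all names are invented for illustration): a custom global allocator that delegates to the system allocator, which every Box and Vec in the program then uses implicitly.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// A sketch of a real global allocator: it delegates to the system
// allocator and counts how many allocations have been made.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// The one constant allocator for the whole program.
#[global_allocator]
static GLOBAL_ALLOCATOR: CountingAllocator = CountingAllocator;

fn main() {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let v = vec![1, 2, 3]; // allocated by CountingAllocator
    assert!(ALLOCATIONS.load(Ordering::Relaxed) > before);
    drop(v); // and returned to it on drop
}
```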
Storing the common information in a global variable
The previous solution used a compile-time constant, but it's possible to do something very similar using a mutable global variable instead. The basic idea is to initialise the global variable with the common information before any of the relevant objects are created, and then ensure that it keeps that value for as long as the objects are still alive.
In cases where only one piece of common information is needed for the lifetime
of the program, such as was the case for me with the SparseBigInt
example,
this can be done pretty simply using sync::OnceLock (the thread-safe
counterpart of cell::OnceCell, which a static requires). There are two soundness
requirements (that the global variable is initialised in time, and that it keeps
its value for long enough); and OnceLock
trivially satisfies the second
requirement (because it keeps its value forever). Unfortunately, the first
requirement leads to some inefficiency, because the compiler cannot statically
verify that the global variable is initialized before the first object is
created, and the obvious implementation will end up instead checking whether the
common information has been initialized every time it is requested. (As far as
I can tell, the use of unsafe
code would be the only way to eliminate this
particular overhead entirely.)
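A minimal sketch of that pattern (names invented), showing where the per-access runtime check lives:

```rust
use std::sync::OnceLock;

// Global common information, initialised once at startup. OnceLock is the
// thread-safe counterpart of cell::OnceCell, which a static requires.
static BASE: OnceLock<i32> = OnceLock::new();

fn base() -> i32 {
    // This is the runtime check described above: every access must confirm
    // that the global has already been initialised.
    *BASE.get().expect("BASE not initialised yet")
}

fn main() {
    BASE.set(260).expect("BASE already set");
    assert_eq!(base(), 260);
}
```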
Using global variables for the common information might also be possible in cases where it changes (but you would need one global variable for each piece of common information that might exist at a time; that implies in particular that this could not work with recursive code, which might have arbitrarily many nested "common information lifetimes"). It seems hard to get much help from the compiler when checking that the globals are initialized with the right value and stay alive for long enough (Rust doesn't have much in the way of compile-time checks of the status of global or interior-mutable variables). On the other hand, I am not convinced that it's impossible: consider a scheme like this:
Each global variable in question has a corresponding zero-sized type, which behaves like a "reference to the global variable"; it has a lifetime parameter that specifies how long the reference is valid for, and implements a trait method (perhaps even
Deref
) that accesses the global variable (without needing to store a reference in it, because the reference is stored in the code of the method that retrieves it);
The zero-sized types can only be constructed by "locking and writing a value to" the global variable, ensuring that it is always initialized, and are given a lifetime that ensures that they cannot exist past the point at which it is unlocked.
Because the types are zero-sized, they can be included as a field in the object – in the style of the "complete objects" solution above – but without using up extra memory; and because they are types, it's possible to verify at compile time that two objects must be using the same common information if they have the same type.
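Here's a rough sketch of such a scheme in today's Rust (all names are invented; an atomic flag stands in for a real lock, and a production version would need considerably more care):

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};

// BASE_SLOT is the global; BaseToken is the zero-sized "reference" to it.
static BASE_SLOT: AtomicI32 = AtomicI32::new(0);
static BASE_LOCKED: AtomicBool = AtomicBool::new(false);

// Zero-sized: carries only a lifetime, no data. The slot's address is
// baked into the code of `get`, not stored in the token.
struct BaseToken<'a>(PhantomData<&'a ()>);

impl<'a> BaseToken<'a> {
    fn get(&self) -> i32 {
        BASE_SLOT.load(Ordering::Relaxed)
    }
}

// Tokens can only be constructed by locking and writing the slot, so they
// always see an initialised value; their lifetime keeps them from escaping
// the closure and outliving the "lock".
fn with_base<R>(value: i32, f: impl FnOnce(BaseToken<'_>) -> R) -> R {
    assert!(!BASE_LOCKED.swap(true, Ordering::Acquire), "slot already in use");
    BASE_SLOT.store(value, Ordering::Relaxed);
    let result = f(BaseToken(PhantomData));
    BASE_LOCKED.store(false, Ordering::Release);
    result
}

fn main() {
    // The "reference" costs no memory at all...
    assert_eq!(std::mem::size_of::<BaseToken<'static>>(), 0);
    // ...yet code holding a token can always read the common information.
    let v = with_base(260, |token| token.get());
    assert_eq!(v, 260);
}
```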
This seems like, although it might need unsafe
code to implement the locking
scheme, it might be able to create a sound and safe interface for the code that
uses it – this could possibly be a good basis on which to write a new library
crate! It would be quite different from typical code, though, with these
zero-sized types behaving in quite an unusual way; the closest thing that exists
in Rust at the moment is &sync::RwLockReadGuard
, but that isn't zero-sized,
and a zero-sized version (that relied on hardcoded locations for storing its
data) would probably need to be written from scratch.
It also doesn't solve all the problems with the "complete objects" solution, and
in fact makes one worse; that solution had trouble with using mutable references
as the common information, whereas this one struggles with using even shared
references (storing references in global variables is generally really difficult
because they generally have to be 'static
). This is a serious downside, given
that the common information frequently turns out to be a reference in practice.
Everything mentioned in this section could work with thread-local variables too, but it isn't clear that they have any significant advantages over global variables for the purpose.
Const generics
The zero-sized types in the previous example point to another possible solution:
if the common information were known at compile time, then instead of carrying a
reference to a global variable that stores the value, why not make them carry
the value itself? Instead of implementing a trait that dereferences by
accessing a global variable, you could instead hardcode the dereferenced value.
That could look something like this, using the SparseBigInt
example:
#[derive(Default)]
pub struct SparseBigInt<Base: Deref<Target = i32>> {
    base: Base,
    digits: BTreeMap<i64, i32>,
}

fn main() {
    #[derive(Default)]
    struct Base260 {}
    impl Deref for Base260 {
        type Target = i32;
        fn deref(&self) -> &i32 {
            &260
        }
    }
    let zero_in_base_260: SparseBigInt<Base260> = Default::default();
}
However, this implementation doesn't guarantee that the base always dereferences
to the same value (nothing is forcing deref
to be deterministic).
Fortunately, Rust already has a feature that solves this particular problem: a
const generic:
#[derive(Default)]
pub struct SparseBigInt<const BASE: i32> {
    digits: BTreeMap<i64, i32>,
}

fn main() {
    let zero_in_base_260: SparseBigInt<260> = Default::default();
}
There are a lot of things to like about this solution:
It's type-safe and can be checked at compile time: e.g. the implementation of Add can be impl<const BASE: i32> Add<SparseBigInt<BASE>> for SparseBigInt<BASE>, and the Rust compiler will allow the addition of two SparseBigInts only if they have the same base;
There are no visibility issues with trait methods: impl<const BASE: i32> Default for SparseBigInt<BASE> is perfectly acceptable Rust code, and the value of BASE will be visible inside the implementation of default;
There's no per-object overhead;
The syntax is really nice compared to the alternatives (given that this is something that the language supports directly).
In other words, the value of the common information isn't being specified in a field of the object, or being given as a parameter to methods called on the object; instead, it's being stored in a generic.
It's therefore something of a pity that this solution isn't usable in the majority of cases. There are two major problems. One is that a const generic has to be a compile-time constant, whereas it's frequently the case that the common information isn't known until runtime. The other is that there are currently extreme restrictions on what type a const parameter can have (it has to be a primitive, integer-like type); there are plans to ease some of these restrictions in the future, but even after that, the common information will frequently be of a type that's inappropriate for use in a const generic (such as with the allocator example above, which conceptually wants the common information to be a mutable reference to a value created at runtime).
Arriving at a solution
With none of Rust's existing language features providing a perfect solution to the problem, it therefore makes sense to think about how a new language feature could solve it (especially as, in this blog post, I'm planning to argue that one should be added). Having seen some partial solutions above, what does the perfect solution look like?
The two main things that have to be decided, when implementing any new language feature, are a) the semantics (i.e. how do programs that use the feature behave?), and b) the syntax (i.e. what do programs that use the feature look like?). Out of the attempts in the previous section, one syntax stands out as much neater than the others: the const generics looked really nice and were easy to use and understand. That indicates that the language feature we're looking for should be some new kind of generic that indicates the common information. That way, structures, enums, etc. that need to be interpreted using a particular piece of common information would declare it as a generic parameter; and the functions and methods that operated on them would be made generic over the common information in the same way.
What about the semantics? One important property of the common information is that it's common to multiple objects, and there's often a need to prove at compile time that it's the same across multiple objects. As such, when the common information is provided via a generic parameter, it needs to stay the same within all uses of a particular object with that generic parameter, during the entirety of a call of a function with that generic parameter, and so on – it therefore has to be treated like a constant in those cases (a problem that the const generics solved via being an actual constant, holding its value throughout the entire execution of the program). On the other hand, unlike an actual constant, it might not have a value at some points in the program where it isn't actively being used, and might, e.g., have a different value on every iteration of a loop.
This suggests that the correct solution is something along the lines of a "scoped constant" – instead of being constant for the entire program, it's constant only within a particular region of code. The basic idea is that the code should be able to declare "within this block, and all the functions it calls, etc., I am going to give this value this name, and the name will continue to correspond to that value throughout the block" – and then it should be able to use that name as though it were a const generic, when creating objects, calling functions, and the like.
Here's an example of what that might look like syntactically:
#[derive(Default, Debug)]
pub struct SparseBigInt<const 'a BASE: i32> {
    digits: BTreeMap<i64, i32>,
}

// this would probably be implemented as a trait implementation, but I'm
// writing it as a function as an example
fn add_sparse_big_ints<const 'a BASE: i32>(
    x: &SparseBigInt<BASE>,
    y: &SparseBigInt<BASE>,
) -> SparseBigInt<BASE> {
    /* addition algorithm, can mention BASE and use it like a regular constant */
}

fn main() {
    let base = /* read a base from user input */;
    {
        let const BASE: i32 = base;
        let zero: SparseBigInt<BASE> = Default::default();
        // sugar for: let twice_zero = add_sparse_big_ints::<BASE>(zero, zero);
        let twice_zero = add_sparse_big_ints(zero, zero);
        println!("{:?}", twice_zero);
    }
    // now BASE is undefined again, and we know there are no allocated objects
    // or currently executing functions that use it
}
The basic idea is that let const BASE: i32 = base;
defines a new scoped
constant that lasts for the current scope, in much the same way as let
base_copy: i32 = base;
would define a new immutable variable that lasts for the
current scope. After that, the scoped constant can be used as a generic
parameter in much the same way as a regular constant; the only difference is
that the constant "has a lifetime" in the sense that it will no longer be valid
after its scope ends, so objects that use the constant as a const generic must
be destroyed before the end of the constant's scope, functions that use the
constant as a constant generic must finish executing before the end of the
constant's scope, and so on. (To indicate that the constant has a lifetime, I
wrote the generic using const 'a
rather than just const
– this limits the
lifetime of the object to at most 'a
in the same way that containing a
reference &'a
would.)
One other difference from normal const generics is that scoped constants are
considered equal in the sense of type equality only if they were created by the
same let const
statement. You can't create two different scoped constants
that happen to have the same value, then try to, e.g., store two objects, one
using each of the constants, in the same array. (This shouldn't be surprising
given that they have a value that's expected to vary at runtime – the power of
type systems typically doesn't go to the extent of "prove these
runtime-calculated values are always equal".) Because "these were produced by
the same declaration" is used as the definition of equality, rather than an ==
comparison, the scoped constants don't actually need to implement Eq
(unlike
regular constants, which do).
As usual, there are a few restrictions on what types can be used with this sort
of "scoped generic". The primary restriction is on the type of the scoped
constants; the simple case is when the type is Copy
, perhaps because it's a
shared reference. It is also possible to use a mutable reference, but this
comes with additional restrictions (see the next subsection).
There also probably needs to be a restriction on using objects with scoped generics as trait objects (and maybe also using traits with scoped generics as trait objects), and on using functions/methods with a scoped generic as function pointers. Unlike most uses of scoped generics, I have not been able to prove that these uses are sound, and suspect they may be unsound in some cases (you would probably have to at least limit the lifetime of the trait object to the lifetime of the scoped constant, but I'm not sure that's enough). Eventually it might possibly be worthwhile to determine exactly which uses are sound and allow them; but as these are relatively fringe uses, disallowing them entirely in an initial implementation would be reasonable.
Mutable references as scoped constants
When a function or method has a scoped generic parameter, it can act a lot like a normal method parameter:
// the normal way to write a "square an i32" function
fn square(x: i32) -> i32 {
    x * x
}

// writing the same function using scoped generics
fn square(x: i32) -> i32 {
    let const X: i32 = x;
    square_generic::<X>()
}

fn square_generic<const 'a NUM: i32>() -> i32 {
    NUM * NUM
}
That's the primary reason why scoped constants usually have to be Copy
–
they're effectively being passed to a method as though they were parameters, and
values of most types become moved (and thus unusable) upon using them as a
method parameter (but scoped generics inherently need to be usable multiple
times).
However, Rust has one other type that can be passed as a method parameter without consuming it: the mutable reference. It'd be very useful to be able to use those as scoped generics too. For example, an allocator API intended to be implementable in safe Rust, but also compatible with existing unsafe allocators, might look something like this:
/// An allocator that allocates objects with lifetime `'a`.
struct MyAllocator<'a> { /* ... */ }

/// An allocation of type `T`, produced by `ALLOC`, lasting for `'a`.
struct MyAllocation<const 'a ALLOC: &mut MyAllocator<'a>, T> {
    // `allocation` is always Some except while dropping it; the `Option`
    // wrapper exists to work around a known limitation of the Drop trait
    allocation: Option<&'a mut MaybeUninit<T>>,
}

impl<const 'a ALLOC: &mut MyAllocator<'a>, T> MyAllocation<ALLOC, T> {
    fn allocate() -> Result<Self, AllocError> {
        // an example implementation using alloc::Layout
        let layout = Layout::new::<T>();
        // this gives a mutable reference to `ALLOC` that lasts for at most
        // the current function, "reborrowed from the scoped generic"
        let allocator: &mut MyAllocator<'a> = ALLOC;
        let allocation = allocator.allocate(layout)?;
        /* ... create a MyAllocation using allocation ... */
    }
}

impl<const 'a ALLOC: &mut MyAllocator<'a>, T> Drop for MyAllocation<ALLOC, T> {
    fn drop(&mut self) {
        let allocation = self.allocation.take();
        /* ... return the allocation to ALLOC, which mutates ALLOC ... */
    }
}
MyAllocation
is efficient due to being just a single reference (the
MyAllocator
is a scoped generic rather than a field, so it doesn't need to be
stored in memory along with the reference). The scoped generic on
MyAllocation
ensures that when an allocation is dropped, it will always be
returned back to the allocator that produced it, not some other allocator with
the same allocation lifetime (allowing unsafe code to soundly make that
assumption) – and tying the lifetime of the allocation to the lifetime of the
scoped constant guarantees that the allocations must necessarily all be dropped
before the allocator itself is.
In order for this sort of thing to be sound, a couple of extra restrictions are needed. One is on uses of a scoped constant (or value of a scoped generic) of mutable reference type: instead of getting a reference that lasts for the entire lifetime of the scoped constant, you get a reference that lasts at most for the duration of the current function or method, and the scoped constant counts as being "mutably borrowed" until the reference is dropped. Although this is a bit more limiting than a typical (non-scoped-constant) mutable reference, it's still powerful enough to allow for a wide range of useful programs.
This is the reason why the type of the scoped generic was written as &mut
MyAllocator<'a>
with no lifetime on the &mut
:
when creating the scoped constant, the lifetime can be any lifetime that's at least as long as the scoped constant's;
and when using the scoped constant, the lifetime of the reborrow is limited to the current function or method;
so in neither case is there a fixed lifetime on the value of the scoped constant that makes any sense to track separately from that of the scoped constant itself. (Meanwhile, if the scoped constant were a shared reference, it would make sense to track the lifetime of its value, which could meaningfully be longer than the lifetime of the reference itself.)
A second restriction is needed in order to make it impossible to get at two copies of the same mutable reference via using two different generic parameters (which could happen using either generic type parameters or scoped generics). A sufficient restriction, and the one I'd recommend for an initial implementation, is "while there is a mutable borrow of a scoped generic, it is impossible to make any further use of generic type parameters or scoped generics until that mutable borrow is dropped, other than the type of the scoped generic itself and its generic parameters" (lifetime and const generics are still usable). That means that you can't use them as generics, but also can't use them as, e.g., the type of an object on which to call a trait method – another way to think about the restriction is "all the types used while the borrow exists are either the type of the scoped generic itself, types contained within that type, or types that are hardcoded into the function/method" (although you can still do things with type generics that don't involve running code on them, like moving/copying them and converting references to raw pointers). This restriction is slightly stricter than it needs to be, but has the advantage of being easy to understand and implement, and could potentially be relaxed in the future – and it isn't quite as restrictive as it sounds, because often the lifetime of the borrow is very short and thus doesn't restrict very much code.
Why this is the correct solution
It's possible to bikeshed the fine details of the syntax I've written above
(e.g. perhaps we could use a regular let
rather than needing a separate let
const
? maybe const
is the wrong keyword for the generic, as it isn't constant
over the whole program?). However, at least semantically I am fairly confident
that this solution is the correct one, for several different reasons. This
section will cover one of the main reasons, and some of the others are discussed
later on.
One thing that's worth doing when evaluating a language feature that solves a problem that comes up all the time, across many different programs, is to wonder "why didn't this problem need solving earlier?". In many cases, you discover that it was in fact solved earlier, but only in a special case, in a way that didn't deal with the general problem as a whole – and it can be instructive to look at the solution to the special case, to see how the general case compares.
With a problem as general as the one in this blog post, it is unsurprising that at least one core Rust feature is, in effect, a solution of a special case of the "scoped generic problem". Just think about the difference between a reference and a pointer:
#[derive(Default)]
struct Thing { /* ... */ }

let my_thing = Thing::default();
let reference: &Thing = &my_thing;                     // a complete object
let pointer: *const Thing = reference as *const Thing; // an incomplete object
In terms of what's stored in the computer's memory during execution, there is no
distinction between these two values! They're both just going to store the
address of my_thing
in memory. The difference is that raw pointers are
unsound to use unless you take precautions with them (e.g. to dereference them
soundly, you must ensure that the target is allocated, correctly initialized,
still alive, not currently mutably borrowed, etc.). Meanwhile, references carry
a piece of additional information (their lifetime) that a) specifies the
conditions under which using them is sound, and b) places a limit on how long
the reference can be used for. The nature of the lifetime becomes a little
clearer if you write the references using a type alias, rather than using the
usual abbreviated syntax:
type ThingRef<'a> = &'a Thing;

fn pick_a_thing<'a>(x: &'a Thing, y: &'a Thing) -> &'a Thing { /* ... */ }
This lifetime is a) a generic, b) lasts for a particular scope, and c) can be compared across different objects. This seems incredibly similar to the common information that's the difference between a complete and incomplete object; and indeed, the lifetime is precisely the difference between a reference and a pointer.
In fact, basically the only difference between Rust's existing lifetimes, and the scoped constants I defined above, is that the scoped constants have a nontrivial value which exists at runtime, whereas the lifetime is only a compile-time construct. In a way, Rust's existing lifetimes can be seen as "scoped constants, except restricted to being zero-sized types". (Similarly, Rust's existing constants can be seen as "scoped constants, but with 'static lifetime".)
One of the things that makes me confident that scoped generics are the correct solution to the problem, therefore, is that they're just a simple generalisation of something that already exists in Rust. In effect, a good way to think about it is "a scoped generic that's a zero-sized type is just a lifetime, and thus it could be used to implement a lifetime" – and when you do that, you end up with Rust's existing implementation of lifetimes, which is an encouraging sign.
Scoped generics and generativity
Earlier in this document, I said I was listing all the solutions to the "scoped generic problem" I could think of, except one – and this section is about the remaining solution. If you read a lot of blog posts about Rust that discuss how to do various things soundly, you'll occasionally see people say things along the lines of "this code does runtime checks to ensure soundness and will panic if the wrong arguments are given; you could ensure that everything works correctly at compile time by using generativity", together with an implication that the compile-time solution is too difficult, complicated or confusing to use in practice. (It certainly seems to be the case that people don't generally use it in practice!) The original source of the technique seems to be Gankra's thesis (PDF), section 6.3 (and even it contains a disclaimer, "It should be noted that this trick is very delicate, and I don't expect it to see much use").
In this section, I'm going to be discussing how generativity works, why it hasn't really caught on as a solution to the problem, and how it would be possible to implement scoped generics in the existing Rust compiler via doing something very similar.
Defining "generativity"
So what is generativity? The blog posts make it sound like it's a programming technique, but it's actually a property that abstraction mechanisms can either have or not have, i.e. they can either be generative or non-generative (with neither being clearly better – they're useful in different situations). The basic idea is that an abstraction mechanism is generative if you can use it to abstract the same thing multiple times (e.g. by calling it in a loop) and end up with results that are treated as different by the compiler (i.e. the "these things are the same" relationship got abstracted away, in addition to the other things being abstracted).
For example, in Rust, impl Trait is non-generative:
trait DebugGenerator {
    fn gen_debug(&self) -> impl Debug; // abstracts the type being returned
}

fn debug_generator_test<T: DebugGenerator>(t: &T) -> impl Debug + '_ {
    [t.gen_debug(), t.gen_debug()] // but it's the same type every time
}
You can't store two objects in an array unless they have the same type as each other, so the impl Debug abstraction, while it has hidden the exact type that gen_debug returns, has not hidden the fact that it's the same type every time (i.e. the guarantees are "this method returns some type that implements Debug, and it's always the same type if called on the same type of object").
Meanwhile, lifetimes have behaviour that's very close to the definition of generativity:
#[derive(Clone, Copy, Debug)]
struct Foo<'a>(
    PhantomData<*mut &'a ()> // makes Foo invariant over 'a
);

impl<'a> Foo<'a> {
    fn accept(&self, _ref: &'a i32) {}
}

let x = Foo(PhantomData);
for y in 0..10 {
    x.accept(&y); // "borrowed value does not live long enough"
}
Syntactically, there's only one call to accept here; and the data it's being given are a) the memory location of y, and b) the lifetime of the reference to y. However, it's being called in a loop, and each time round the loop, the lifetime is different, despite conceptually being more like a type than like a value (given that it exists only at compile time and is zero-sized). This means that the code fails to compile, because x has type Foo<'a>, where there's no valid lifetime for 'a (for each of the ten loop iterations, it has to fit entirely within that loop iteration, and those requirements are impossible to satisfy simultaneously).
I'm not sure whether the original definition of the term "generativity" was supposed to cover this sort of behaviour (a lifetime isn't really an abstraction mechanism), but it's generally used that way in the Rust community, so let's use it here too.
Using generativity for compile-time guarantees
The mentions of "generativity" in the Rust literature are generally in the context of observing that you can use that behaviour of lifetimes (and in particular invariant lifetime generics) to, effectively, create values of a unique type. Here's what it would look like when used to implement the common information, together with an example of the compiler catching an attempt to use two different pieces of common information with a function that requires two of the same:
#[derive(Debug, Clone, Copy)]
// `*mut &'a ()` is the simplest type that's invariant over 'a
struct CommonInfo<'a, T>(T, PhantomData<*mut &'a ()>);

fn let_const<T, R>(x: T, f: impl for<'a> Fn(CommonInfo<'a, T>) -> R) -> R {
    f(CommonInfo(x, PhantomData))
}

fn assert_type_eq<T>(_x: T, _y: T) {}

let_const(1, |sg1| {
    let_const(2, |sg2| {
        assert_type_eq(sg1, sg2); // "borrowed data escapes out of closure"
        dbg!(sg1, sg2);
    });
});
The basic idea is that let_const creates a CommonInfo object with a lifetime 'a that is different from that of all other CommonInfo objects. Doing this in Rust is actually surprisingly difficult (it tries very hard to make the lifetimes in your program work correctly, often by making multiple lifetimes the same) – I made a lot of failed attempts to create a unique lifetime. This particular implementation does it by generating the lifetime using a higher-ranked trait bound (for<'a> Fn) – f has to be a function that is able to accept common information with arbitrary lifetimes, and thus has to assume that any lifetime it might be given could potentially be different from any other specific lifetime. Unfortunately, this comes with the cost that the end of the scope has to be marked explicitly, rather than being inferred by the compiler (as is normal with let bindings). Apparently a better solution exists, but it's even more complicated and relies on the exact details of the way destructors work in Rust.
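The let_const pattern above is valid in current stable Rust. Here's a self-contained sketch of the happy path – within a single scope the branded value works normally; it's only mixing brands from different scopes that's rejected, and a compile error can't appear in a runnable example:

```rust
use std::marker::PhantomData;

// A value tagged with an invariant lifetime brand, as in the example above.
#[derive(Debug, Clone, Copy)]
struct CommonInfo<'a, T>(T, PhantomData<*mut &'a ()>);

fn let_const<T, R>(x: T, f: impl for<'a> Fn(CommonInfo<'a, T>) -> R) -> R {
    f(CommonInfo(x, PhantomData))
}

fn main() {
    let sum = let_const(10, |info| {
        // Copies of `info` share the same brand, so they're compatible
        // with each other (unlike values from a different let_const call).
        let a = info;
        let b = info;
        a.0 + b.0
    });
    assert_eq!(sum, 20);
    println!("sum = {}", sum);
}
```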
In order to make CommonInfos with different values incomparable, the structure is made invariant over 'a, using PhantomData to change the variance. (A PhantomData field gives the containing type the same variance as the type it wraps; *mut &'a () is invariant over 'a, so the PhantomData field makes CommonInfo invariant over 'a too.) Normally, if Rust sees two different lifetimes that are supposed to be the same, it merges them into a single lifetime in a way that depends on their variance, but the invariance of CommonInfo prevents it from doing that.
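The merging behaviour for non-invariant lifetimes can be seen in a small runnable example (my own illustration, not from the original post): with a covariant type like &str, the compiler silently shortens the longer lifetime so that the two arguments match, which is exactly what invariance forbids.

```rust
// Both arguments are forced to share the single lifetime 'a.
fn pick<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let long_lived = String::from("long-lived");
    let result;
    {
        let short_lived = String::from("short");
        // The two references have different lifetimes, but `&str` is
        // covariant in its lifetime, so the longer one is shrunk to the
        // shorter one and the call type-checks.
        result = pick(&long_lived, &short_lived).len();
    }
    assert_eq!(result, 10); // "long-lived" is the longer string
    println!("longer string has {} bytes", result);
}
```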
The resulting technique can be used to improve either the "incomplete objects" or "complete objects" solution:
// A complete object
pub struct SparseBigInt<'a> {
    base: CommonInfo<'a, i32>,
    digits: BTreeMap<i64, i32>
}

// now this only works on two SparseBigInts with the same base
impl<'a> Add for SparseBigInt<'a> { /* ... */ }

// An incomplete object
pub struct DigitList<'a> {
    digits: BTreeMap<i64, i32>,
    // a PhantomData field is needed so that 'a is used (and invariant)
    marker: PhantomData<*mut &'a ()>
}

impl<'a> DigitList<'a> {
    fn new(base: CommonInfo<'a, i32>) -> Self { /* ... */ }

    // this will only compile if the base is correct for both `self` and `inc`
    fn add_bigint(self, inc: Self, base: CommonInfo<'a, i32>) -> Self { /* ... */ }
}
The gain is in solving the "the common information doesn't match" problem. In the case of the complete objects, it's now impossible to mix two objects that have different bases – the Add trait, by default, can only add two objects of the same type (it's possible to override this, but the above code doesn't). In the case of the incomplete objects, although the caller still has to specify the common information, the code will no longer compile if they get it wrong; given any DigitList<'a>, there is only one CommonInfo<'a, _> with the same lifetime 'a, so it must be the same one that was specified when the DigitList<'a> was created.
This generativity-based solution is, in a sense, a way to use scoped generics to implement scoped generics – Rust currently has only zero-sized scoped generics (lifetimes), but it's still possible to use those for their "these two types are incompatible" properties to create a range of values that each have a unique type, and use those as non-zero-sized scoped generics.
All this is clearly better than either the complete or incomplete object solutions in the sense of compile-time safety; you basically end up with the same thing as the original solution, but with a little extra help from the compiler to reject incorrect code. Unfortunately, there are still several problems it doesn't solve: most notably, it doesn't help with implementing traits (the complete objects still can't implement Default or From even with help from the lifetime generic, and likewise the generic doesn't help the incomplete objects to implement Add, Display or Drop).
Additionally, the ergonomics are pretty bad: the let_const function I have above is somewhat ugly to use, and yet I couldn't find anything better; and there are lifetime parameters on the structures which mostly don't act like lifetimes normally do in Rust, which could be confusing. So, despite being safer than the alternatives, I'm not surprised that programmers typically don't bother with this in practice.
How to implement scoped generics
Generativity might not be a full solution to the scoped generics problem – which is why a new language feature is required – but it does provide something of a roadmap to implementing one. With the key insights of "lifetimes are basically zero-sized scoped generics" and "invariant lifetimes can be used to tag types to ensure the types match", it turns out to be fairly easy to implement scoped generics via extending the compiler. There are two main things that need to be implemented: let const (or whatever the name gets bikeshedded into), and the implementation of the generics themselves. Both can be implemented via getting the compiler to desugar the code primarily into things it already knows how to implement (but the desugaring can't be implemented directly by the programmer, as it can affect code in other crates).
let const is fairly simple, and works just as with the generativity solution. A scoped generic basically just desugars into a pair where one element of the pair is a variable or function/method parameter, and the other is a lifetime. Creating the variable half of things is just the pre-existing let statement; and that needs to be paired with a unique lifetime. The "create a unique lifetime, that cannot be equal to any other lifetime" part of things is hard to write in current Rust code, but should be an easy primitive to add to the compiler (it seems somewhat easier to implement than most of the things the compiler already does with lifetimes).
A scoped generic parameter on types desugars into an invariant lifetime – as with the "incomplete objects" solution, the value of the generic is not stored with the object itself (because doing so is inefficient – as it can require storing many copies of the same data – and is unsound in cases where the value is a mutable reference).
In order to compensate for this, a scoped generic parameter on a method or function desugars into two things: a) a generic lifetime parameter, as with the objects; but b) an additional function/method parameter, which is used to tell the method or function the value of the scoped generic. Here's an example of code that the programmer might write, using scoped generics:
#[derive(Default, Debug)]
pub struct SparseBigInt<const 'a BASE: i32> {
    digits: BTreeMap<i64, i32>
}

fn double<const 'a BASE: i32>(x: &SparseBigInt<BASE>) -> SparseBigInt<BASE> {
    // sugar for: add::<BASE>(x, x)
    add(x, x)
}
and the code it would desugar into:
#[derive(Default, Debug)]
pub struct SparseBigInt<'a> {
    digits: BTreeMap<i64, i32>
    // code to make the structure invariant over 'a
}

fn double<'a>(x: &SparseBigInt<'a>, BASE: i32) -> SparseBigInt<'a> {
    add::<'a>(x, x, BASE)
}
This desugaring is applied even if the scoped generic parameter is not written directly, but implied via a type generic (when the type in turn has a scoped generic) – and trait methods are effectively "generic in the type of their receiver" for this purpose. For example:
// a standard library trait
pub trait Drop {
    fn drop(&mut self);
}

impl<const 'a BASE: i32> Drop for SparseBigInt<BASE> {
    fn drop(&mut self) {
        eprintln!("Dropping a SparseBigInt with base {}", BASE);
    }
}
desugars into:
// desugaring applied post-monomorphization, where trait methods have been
// replaced with functions that operate on one specific type that implements
// that trait
fn SparseBigInt::drop<'a>(self: &mut SparseBigInt<'a>, BASE: i32) {
    eprintln!("Dropping a SparseBigInt with base {}", BASE);
}
Implying an extra parameter on a standard library trait method, like this, is something that Rust code simply can't do at the moment. However, it's the capability that makes the whole thing work. The callers of the trait methods – in this case, everywhere that the program tries to drop a SparseBigInt value – will always have access to the correct value of BASE (because they can't manipulate a SparseBigInt<BASE> without having both the lifetime associated with BASE and the value of BASE – they would either be directly available from the let const or provided as parameters) – and the desugaring will therefore just fill the parameter in.
Mutable-reference scoped generics are implemented in almost exactly the same way (the only difference is that a use of the scoped generic as a value has a reborrow inserted, &mut *REF). As long as the compiler merges together uses of the same scoped generic into a single parameter (which can always be detected because they have the same lifetime, and different scoped generics always have different lifetimes when using this implementation method), the restrictions on what is allowed while a scoped generic is mutably borrowed will automatically ensure that only one borrow of the value exists at a time (basically because as long as the first borrow exists, all the language constructs that could name or otherwise refer to the value being borrowed are banned, so there's no way to create a second borrow).
One of the reasons I feel confident that this implementation of scoped generics is the correct one is that it fits together in such a neat way – it's basically a combination of implying extra parameters onto methods in order to handle visibility problems, plus a soundness analysis that is the same as the existing soundness analysis for lifetimes. Additionally, because it effectively desugars into existing safe Rust code, there is little risk of accidentally introducing a soundness hole; expressing something in safe Rust is one way to demonstrate that it's sound, and because the scoped generics mechanism can be desugared into safe Rust (with sufficient help from the compiler), that demonstrates that it is sound, too.
A little bit of magic
One final check is worthwhile. New language features often add new capabilities to the language, and those let you write programs you couldn't have written before. However, it's not uncommon to make a mistake in defining the feature, which lets you go too far and do things that weren't meant to be possible, or are even potentially unsound. So, it makes sense, when defining a new language feature, to attempt to do the impossible, and see where you get stuck.
In this case, let's look at the most fundamental thing that's meant to be impossible in safe Rust (barring the use of standard library APIs that use unsafe internally): making something that's both mutable and shareable at the same time. Generally in Rust, you have to choose between a shared reference & and a mutable reference &mut, and you can't have both sorts of reference existing to the same data at the same time.
In several programs, this restriction tends to get in the way, and one commonly attempted workaround is to represent references by using indexes into a vector that gets passed around from function to function by mutable reference. When a function needs to do the equivalent of dereferencing a reference, it just accesses the element of the vector at the appropriate index. That acts like a mutable reference, in that you can change the element – but it also acts like a shared reference, in that you can store the index in multiple places (after all, it's just an integer).
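Here's a minimal runnable sketch of that workaround (the names Handle and alloc are my own, purely illustrative):

```rust
// A "reference" represented as an index into a shared arena vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

// Allocate a value in the arena and hand back its index.
fn alloc(arena: &mut Vec<i32>, value: i32) -> Handle {
    arena.push(value);
    Handle(arena.len() - 1)
}

fn main() {
    let mut arena: Vec<i32> = Vec::new();
    let h = alloc(&mut arena, 1);

    // Acts like a shared reference: the handle is freely copyable...
    let h2 = h;

    // ...but also like a mutable reference: we can write through it,
    // as long as the arena is in scope.
    arena[h.0] = 99;
    assert_eq!(arena[h2.0], 99);
    println!("value = {}", arena[h2.0]);
}
```

Note that nothing ties a Handle to the particular vector it came from at compile time; pass the wrong vector and you silently read the wrong element (or panic on an out-of-bounds index), which is exactly the mismatch problem that scoped generics are meant to catch.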
This might induce a suspicion that scoped generics have gone too far in terms of power – after all, the vector here is acting very much like common information, and a mutable reference to a vector is something that is a permitted type for a scoped generic. So let's try to use a scoped generic that does this in an attempt to break the rules of Rust.
A first attempt is to create a "reference" type that's actually an index into a vector, with the vector being a scoped generic:
struct OmniReference<T, const 'a VEC: &mut Vec<T>>(usize, PhantomData<&'a mut T>);

// Rust doesn't support #[derive(Copy)] for types with non-Copy generics,
// but it does let you implement it manually:
impl<T, const 'a VEC: &mut Vec<T>> Clone for OmniReference<T, VEC> {
    fn clone(&self) -> Self { *self }
}
impl<T, const 'a VEC: &mut Vec<T>> Copy for OmniReference<T, VEC> {}

impl<T, const 'a VEC: &mut Vec<T>> OmniReference<T, VEC> {
    fn new(t: T) -> Self {
        let reference = VEC.len();
        VEC.push(t);
        OmniReference(reference, PhantomData)
    }

    fn set(self, t: T) {
        VEC[self.0] = t;
    }

    fn get(self) -> T where T: Copy {
        VEC[self.0]
    }
}
So far, we haven't broken any of the scoped generic rules (VEC.push(t) references the generic T while VEC is mutably borrowed, but this is allowed because T is mentioned in the type of VEC). There's an interesting T: Copy restriction on the get method, which happens because we need to be able to copy the T out of the vector; we can't just return a reference because when a &mut reference is used as a scoped generic, the lifetime of the scoped generic is limited to the lifetime of the method referring to it. This API also looks very familiar, in that I'm pretty sure I've seen a "reference that you can write to and copy out of, but not take references to the value inside" before. Doesn't Cell work like that?
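It does. For comparison, here's the equivalent usage of the real std::cell::Cell, which has the same shape: shareable, writable through any alias, and readable only by copying the value out:

```rust
use std::cell::Cell;

fn main() {
    let cell = Cell::new(1i32);

    // Like OmniReference, a &Cell can be freely shared...
    let alias_a = &cell;
    let alias_b = &cell;

    // ...written through either alias...
    alias_a.set(2);

    // ...and copied out of (get() requires T: Copy, like the
    // restriction on OmniReference::get above).
    assert_eq!(alias_b.get(), 2);
    println!("cell = {}", cell.get());
}
```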
Let's try to break things, by doing things that Cell can't do. One possible workaround to the Copy problem might be to use a callback function, that can act within the lifetime of the vector element. Or maybe we could work around the lack of Copy by cloning the value:
impl<T, const 'a VEC: &mut Vec<T>> OmniReference<T, VEC> {
    fn map(self, mapper: impl FnOnce(&mut T)) {
        (mapper)(&mut VEC[self.0])
    }

    fn get_clone(self) -> T where T: Clone {
        VEC[self.0].clone()
    }
}
One of these gets rejected by the compiler. In map, the argument-position impl Trait is shorthand for a type generic, with the signature desugaring to this:
fn map<F: FnOnce(&mut T)>(self, mapper: F)
but there's an attempt to call a method on mapper while VEC is borrowed, which would require using the type generic F, and you can't use a function's unrelated type and scoped generics while it borrows a mutable-reference scoped generic (F is not mentioned in the type of VEC).
get_clone, however, is more interesting. It is calling a method on a type generic, but in this case, the type generic is mentioned within the type of the scoped generic itself, so this is permitted. Normally, cloning the contents of a Cell is disallowed, in case the implementation of Clone has a shared borrow of the same cell and attempts to write to it while it's being cloned. However, in this case, that particular failure case can't happen: if T has any scoped generics, they must have a lifetime that starts strictly before 'a did (because any scoped generics mentioned in T must have been alive at the time that the let const VEC statement was run), so none of them can contain a reference to VEC, and there's no mechanism other than scoped generics to get at the value of VEC (because it's mutably borrowed by the scoped generic). So, get_clone is sound, and can be proven to be sound, even though the corresponding operation on Cell wouldn't be.
The upshot of all this is that the resulting OmniReference is a) sound, and b) basically Cell, but with the compiler better able to understand which operations on it would be sound, because it can use the scoped generics as a guide to know what might potentially cause a double mutable reference. Being able to implement something more powerful (if only slightly more powerful) than Cell is fun, and might potentially be a useful addition to Rust's arsenal (beginners to Rust often try to use mutable structures with lots of Rc and RefCell, and being able to avoid the runtime checks of RefCell seems like it could help make that sort of program more reliable by reducing the number of opportunities for runtime panics). It is, of course, less efficient due to the extra layer of indirection.
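For contrast, here's a runnable illustration of the runtime check in question (my own example): RefCell defers the borrow rules to runtime, so a second mutable borrow becomes a panic rather than a compile error.

```rust
use std::cell::RefCell;
use std::panic;

fn main() {
    let cell = RefCell::new(0i32);

    // RefCell enforces the borrow rules at runtime rather than compile
    // time: a second mutable borrow is only detected when the program runs.
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
        let _first = cell.borrow_mut();
        let _second = cell.borrow_mut(); // panics: already mutably borrowed
    }));
    assert!(result.is_err());
    println!("second borrow_mut panicked, as expected");
}
```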
Something much more dramatic has been managed here, though: this is an implementation of Cell entirely in safe code. That isn't something that's "intended" to be impossible, in the sense that it inherently opens up soundness holes. It is, however, something that most Rust programmers probably wouldn't expect to be possible in the language (and which, as far as I can tell, isn't possible without scoped generics or some language feature like them).
It would be nice if this could be used directly to replace Cell – it would remove a lot of complexity from the language (there are quite a lot of UnsafeCell-related special cases needed for the compiler to correctly reason about code that uses Cell), perhaps meaning that a scoped generics feature saves more than it gains in terms of complexity creep. Unfortunately, it probably isn't actually capable of doing that, both because it's less efficient, and because it would involve adding a scoped generic to every function, method and type in the entire program that uses Cells (which would be pretty ugly and annoying to deal with). Still, it might serve as a starting point to think about ways to simplify the language – I prefer it when programming languages are based on a few deep ideas that solve a wide range of issues, rather than special-case fixes like UnsafeCell that act differently from the rest of the language and need special handling.
Future directions
Here are some ideas about ways that the scoped generics mechanism could be improved in the future.
Generics in Rust can lead to a lot of syntactic noise sometimes; and scoped generics would introduce generics in places which previously wouldn't use them. Some people might not find the gain in safety to be worth the increase in syntax complexity, and as such, it's probably worth thinking about ways to make the syntax for generics nicer (especially as those would also benefit programs that don't use scoped generics).
The restrictions I've listed above on mutable-reference scoped generics are stricter than they really need to be; it would be safe to use unrelated generics as long as they can be proven to be entirely unrelated (rather than apparently unrelated but able to access the same scoped constant as another generic). That would most likely involve inventing some new sort of where clause that says "where these two generics have no scoped constants in common" – that way, maybe map would work on the scoped-generic cells too.
On another note, there's one particularly intriguing future direction for scoped generics which would open up a huge rabbit hole, and definitely shouldn't be included in an initial implementation because the implications are so far-reaching and because it would likely be very difficult to implement, but which might turn out to make safe Rust capable of a much wider range of programs and which (for someone who already understands scoped generics) would be very easy to learn and understand. Look at this hypothetical code:
// define a scoped constant, so SIZE is being treated as constant in this scope
let const SIZE: usize = /* some size input by the user */;

// SIZE is being treated as constant, so it can be used as an array dimension
let array = [0u32; SIZE];

// and although this array is dynamically sized, it still implements Sized
println!("Array size in bytes: {}", mem::size_of::<[u32; SIZE]>());

// let's create a variable-sized structure that still implements Sized
#[derive(Default)]
struct Stretchy<const 'a SIZE: usize> {
    a: [bool; SIZE],
    b: [f32; SIZE],
}
let stretchy = Stretchy::<SIZE>::default();
This doesn't work based on the above definition of scoped generics: an array size is a const generic, not a scoped generic (so it has to be an actual constant, not a scoped constant). Still, the whole thing seems to work in terms of soundness, and might well make code more efficient (because a slice reference is twice the size of an array reference, so if two or more variable-sized arrays are using the same scoped constant for their size, that saves memory compared to using two slice references). It would also provide easy solutions to problems like "how do you do an alloca in Rust" (meaning that you wouldn't need a separate "unsized locals" feature for that – just make them sized), and would make C code easier to port to Rust (in C, it's legal to simply declare a variable-length array like the one above, although the relative lack of compile-time checks there makes it less difficult to implement). It does seem like it might be very difficult to implement, though (with all sorts of things that were previously compile-time constants suddenly becoming able to change at runtime).
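The size claim about slice references is easy to verify in current Rust (a small illustrative check; the scoped-constant part remains hypothetical and appears only as a comment):

```rust
use std::mem::size_of;

fn main() {
    // A slice reference is a (pointer, length) pair, so it's twice the
    // size of a reference to a fixed-size array, which is just a pointer.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<&[u32; 4]>());

    // With hypothetical scoped-constant array sizes, several arrays could
    // share one stored length instead of each &[u32] carrying its own.
    let n = 4usize; // imagine this were `let const N: usize = ...`
    let v = vec![0u32; n];
    let slice: &[u32] = &v;
    assert_eq!(slice.len(), n);

    println!("&[u32] is {} bytes, &[u32; 4] is {} bytes",
             size_of::<&[u32]>(), size_of::<&[u32; 4]>());
}
```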
Conclusion
Rust needs scoped generics. They solve a problem which comes up very frequently – it's an issue for basically every Rust program I write – and currently has no good solutions; and they solve it in a way that's consistent with the rest of the language (with lifetimes and const generics being special cases of scoped generics). They also solve the same problem that "using generativity" is intended to solve, but in a way that's sufficiently simple and ergonomic that people might actually use it. And all those advantages come at a relatively low cost – compared to most language features that have this sort of scope and power, they should be fairly easy to implement (given how similar they are to lifetimes, which Rust already implements).