Why Rust needs scoped generics

Tags: rust programming memory | Written by Alex Smith, 2024-08-26

Introduction

Although most of the posts in this blog are about programming in C, more recently I've been using Rust for the bulk of my programming. For many years, there weren't any languages which were "better enough" than C to persuade me to switch (there were plenty that were slightly better, but switching has costs); but Rust has cleared that bar by some distance.

One particular attraction of Rust is the extent to which it is capable of proving, at compile time, that your program doesn't contain certain classes of bug. The highest-profile aspect of that is that it can entirely prevent many categories of security bugs (although, obviously, not all of them); but in some cases, it accomplishes that by replacing them with runtime crashes, which makes your programs more secure but not any more reliable. The more exciting aspect, to me, is the range of cases in which it not only prevents bugs from having a security impact, it proves that they won't happen at all, and often without any runtime overhead. Sound code is one thing – but it's even better to have code that's sound and panic-free, and yet still runs efficiently.

Rust already goes quite some distance to helping programs be reliable in this sense, which makes the places where it falls short feel like more of a disappointment, especially in places where I'd expect the language to be able to help. In particular, there's one problem that comes up in pretty much every program I try to write (and which therefore is probably coming up in programs that other people are writing, too), and which has a variety of negative effects: memory wasted storing the same values repetitively. That in turn forces a choice between runtime checks that all the copies actually are the same (even though that should be possible to determine at compile time), and omitting the checks and just hoping that you wrote the code correctly (knowing that a mistake wouldn't be caught by the compiler and would produce garbage results, and that the compiler might have to add checks anyway to prevent this leading to security bugs).

The problem is hard to define precisely, and I've dedicated a substantial portion of this blog post to it (although a summary would be "you have multiple values that need extra context to make sense of and/or use safely, but the context is, or at least is supposed to be, the same for all of them or for a subset that can be tracked at compile time"). It also has a range of possible solutions, none of which really work.

This blog post is about two main things. One is to explain the problem to the reader, along with the existing solutions and the reasons why none of them really work to solve it. The other is to explain how a relatively-easy-to-implement (and backwards-compatible) change to the Rust programming language could more or less eliminate the problem entirely, via adding a new language feature which I'm tentatively calling "scoped generics", and to try to convince the reader that implementing such a change would be worthwhile. Along the way, I cover the mysterious topic of "generativity" (which is a partial but incomplete solution to the problem, and which scoped generics would make much more ergonomic to use), recontextualise some of Rust's existing behaviour (it turns out that a lifetime is actually just a zero-sized scoped generic!), and end up discovering that the new feature would make it possible to (sort of) implement a standard library API entirely using safe code, even though it's one of those APIs for which that should be clearly impossible.

(A note about the intended audience: this post assumes that the reader already understands Rust – it's already long enough as it is, and I'm expecting most of the people who are interested in improvements to the language will already know the language I'm trying to improve.)

Defining the problem with some examples

Before I can start to explain a solution, I'll need to explain the problem I'm trying to solve. Because it's such a commonly occurring problem, the easy way to do that is to take examples where it actually occurred, either in code I've already written or in code I was considering writing. There were quite a lot of examples, but hopefully three will be enough to demonstrate the nature of the problem – the examples below were chosen to be easy to understand, and to illustrate a range of different aspects of the issue.

Example 1: arbitrary-precision integers in varying bases

One of my hobbies is studying "Turing tarpits" – programming languages that can in theory do the same computations as any program in any other language that we know how to implement, but that aren't very good at it. These languages can be interesting either because they are exceedingly simple (e.g. FRACTRAN or the I/D machine), or because they were discovered naturally in something not meant to be programmed (e.g. the rules and card pool of Magic: the Gathering form a Turing tarpit, as do the rules and card pool of Netrunner). To work out what a program in a Turing tarpit does, the obvious approach is to write an interpreter for the language with a few debugging features, and run the program in the interpreter to see what happens.

However, there's a problem: programs in Turing tarpits are sometimes incredibly inefficient, because the languages frequently don't have any sensible means of data storage. As a simple example, it's not uncommon for the easiest (or perhaps even the only viable) way to store a number n to be to represent it as a string with length 2n, and trying to execute a program that uses those sorts of numbers one instruction at a time is far too slow in practice. So instead, practically usable interpreters for Turing tarpits generally have to do optimisations, doing things like detecting simple loops within the program they're interpreting and working out what the effect of the loop as a whole would be rather than trying to interpret each iteration individually.

In one of my more recent experiments into Turing tarpits, I was working on analysing languages which had a tendency to store data by encoding it into very large numbers, produced by adding and subtracting small multiples of powers of a given base: one program might use numbers like 2^60318 − 11, whereas another might use numbers like 2^11 − 2^6 − 2^4. As such, I wanted to write an interpreter that could handle that sort of number efficiently, via storing a number as a small set of exponents and multipliers in a given base: the latter number could for example be represented as {(11, 1), (6, -1), (4, -1)} in base 2.

The obvious way to express this in Rust is to use a data structure that stores a sparse map from exponent to multiplier, along with the base. I came up with something that was somewhat similar to this (with a few irrelevant details elided for clarity):

pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>
}

This definition worked, to the extent that it was possible to write a working program using it (and I haven't revisited the program since, so it still defines SparseBigInt in essentially this way). However, it was clear that something was off with the way that base was being implemented.


The first problem came when I was trying to write the implementations of the arithmetic instructions; the problem impacted all of them, but let's talk about addition, because it's the first one I tried to implement. There's a pretty simple, efficient algorithm for adding unbounded integers of this sort (basically the same long-addition algorithm that most schoolchildren learn, except that you can speed it up by skipping over places where both numbers have a long run of zeroes, so the algorithm runs in linear time in the number of nonzero digits). Unfortunately, that doesn't work when the two numbers have different bases (and the addition algorithm when the bases differ is incredibly slow by comparison, and would ruin the performance of my program if I ever had to use it). Fortunately, this case was never going to come up in practice: with the tarpits I was analysing, all the numbers would naturally tend to be stored in the same base (which I could work out near the start of the program, before performing any arithmetic); and even more fortunately, Add and AddAssign are allowed to panic (they can do so even on normal data types like i32, if overflow checks are turned on). So all I had to do was to add a check that the bases were equal, and panic if they weren't.
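To make the shape of that check concrete, here's a minimal sketch of the panicking addition. The struct matches the simplified definition above, but add_assign_sparse is a made-up method name, and carries between digits are deliberately omitted:

```rust
use std::collections::BTreeMap;

// Simplified SparseBigInt, repeated here so the sketch is self-contained.
pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>,
}

impl SparseBigInt {
    pub fn add_assign_sparse(&mut self, other: &SparseBigInt) {
        // The runtime check discussed above: the fast digit-by-digit
        // algorithm is only correct when the bases match, so panic if not.
        assert_eq!(self.base, other.base,
                   "base mismatch in SparseBigInt addition");
        for (&exponent, &multiplier) in &other.digits {
            let sum =
                self.digits.get(&exponent).copied().unwrap_or(0) + multiplier;
            if sum == 0 {
                // Keep the map sparse: never store zero digits.
                self.digits.remove(&exponent);
            } else {
                self.digits.insert(exponent, sum);
            }
        }
        // (A real implementation would also carry when a digit's magnitude
        // reaches the base; that's elided here.)
    }
}
```

The loop only visits the nonzero digits of the right-hand operand, which is what makes the algorithm linear in the number of nonzero digits rather than in the magnitude of the numbers.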

Still, all this seemed a bit inelegant; every panic in your program is a warning of a place where bugs might exist (because it indicates both "this is something that, if it happened, would indicate a bug in the program" and "it was impossible for the compiler to verify that this never happens" – the former is inherent in the meaning of a panic, and the latter follows because if the compiler could verify it, the panic wouldn't be required). I am fairly sure that no base mismatches exist in the program that I wrote, but I am planning to reuse SparseBigInt for the analysis of other Turing tarpits in the future, and the requirement for the bases to match is the sort of thing that could easily catch me out then – it isn't documented in any machine-readable way, just in the implementation and in doc comments.

(Looking back at my code, I did in fact end up implementing the inefficient-by-comparison algorithm for performing arithmetic on numbers in different bases, just so that the arithmetic operators worked in all cases! This still leaves a potential performance pitfall if a base mismatch ever does happen, even though there is no longer a correctness pitfall. It might also have meant that the program would now contain quite a bit of code that is "supposed to be dead" and run only by the test suite; fortunately, the algorithm in question ended up being needed for an unrelated purpose, so the time spent writing the algorithm wasn't wasted, but there's no reason to expect that sort of luck in similar cases in the future.)


This wasn't the only way in which my existing solution ended up inelegant. When writing a new type in Rust, it's usual to implement as many of the standard traits on it as possible, because this increases the chance that an operation you need will have been implemented in the standard library already, saving you from having to write your own. As a numeric type capable of representing 0, there is an obvious and useful default value for a SparseBigInt, so I naturally tried to implement Default on it. Unfortunately, this wasn't possible: the default for digits is simple enough (an empty map), but what should the default base be? The only valid option I could think of at the time was "use some invalid sentinel value for base, then when performing an arithmetic operation on the number that makes it non-zero, change its base to match that of the other number" – but that would slow down every arithmetic operation in the program (due to the need to check for the sentinel value), and that wasn't worth it simply to make Default work. ("Change the base of zero to match", which I thought of only while writing this blog post, would also work, but would have similar downsides.) So instead, I ended up going without.
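For illustration, the rejected sentinel approach would have looked something like this (0 is used here as a hypothetical "base not yet known" marker, and unify_base is an invented helper name):

```rust
use std::collections::BTreeMap;

// Simplified SparseBigInt, repeated so the sketch is self-contained.
pub struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>,
}

// The rejected option: 0 is not a meaningful base, so use it as a
// sentinel meaning "this number is zero and its base is not yet known".
impl Default for SparseBigInt {
    fn default() -> Self {
        SparseBigInt { base: 0, digits: BTreeMap::new() }
    }
}

impl SparseBigInt {
    // Every arithmetic operation would then need a check like this
    // before touching the digits – the per-operation overhead that made
    // the idea not worth it just to get a Default implementation.
    fn unify_base(&mut self, other: &SparseBigInt) {
        if self.base == 0 {
            self.base = other.base;
        }
        assert_eq!(self.base, other.base, "base mismatch");
    }
}
```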

I did end up finishing the program in question; it helped me solve the question about Turing tarpits that I was trying to solve in the first place, so it did its job correctly. But it still left me wondering whether something was wrong with the way that I'd implemented base – even though my existing implementation seemed like it was the only available option.

Example 2: indexes that are used to represent something else

Recently, I've been interested in the problem of "reachability": given a large number of statements of the form "you can go from A to B", "you can go from B to D", etc., the goal is to determine whether it's possible to go from some given point to some other given point, possibly indirectly via a chain of other points. (The connections are one-way, so being able to go from A to B doesn't necessarily mean you can go from B to A.) It's pretty easy to do a single reachability query "efficiently" via spending time proportional to the number of statements, but in the situations where I need to solve this problem, I need to do a large number of reachability queries on a single reachability graph, and doing that efficiently still seems to be an unsolved problem of algorithmic graph theory (and one for which I don't yet have a solution I'm happy with).

Anyway, although for the reachability problem in general I'm still trying to find a good algorithm, I've already started implementing various easier special cases (e.g. it's possible to create a data structure that can efficiently answer queries like "is it possible to go from A to B and back again, and if so, what path makes that possible?" – only having to handle circular paths makes the algorithm much simpler). The data structure in question basically contains a map, where the keys are the various points, and the values are information about the corresponding point (which includes, among other things, references to other points).

How to implement something like this in Rust? Well, given that for any given data structure the graph it corresponds to is fixed, the obvious solution is to represent the points as integers from 0 inclusive up to the number of points exclusive – that makes it possible to implement the map efficiently by using an array-like object, and it's possible to store a reference to a point in the map values via just storing the appropriate integer (thus avoiding issues with circular references, which are generally very difficult to handle in Rust). A simplified version of the data structure might look like this:

struct PointInformation {
    // all of these `usize`s are point indexes
    root: usize,
    path_to_root: usize,
    path_from_root: usize,
}
pub struct ReachabilityData {
    point_info: Box<[PointInformation]>
}

Just like in the previous example, this worked, to the extent that I was able to create a working program (to solve the special case of finding paths from A to B and back to A); and just like the previous example, there were various issues that made the solution feel unsatisfying.


The first issue is that usize is not very type-safe as types go. A usize, in the abstract, could mean more or less anything. This means that if a typo causes the wrong variable to be used – a different usize with an entirely different meaning, perhaps – it won't be caught by the type system. There are a couple of solutions in Rust that can help to ease this issue; the most common is probably to use a wrapper type, typically implemented with the same representation as usize, but treated by the compiler as incompatible with it:

pub struct PointIndex(usize);

That solution prevents point indexes from being confused with other uses of usize. Unfortunately, I found myself needing to store data for multiple different reachability problems simultaneously, and this definition of the PointIndex type makes it global – although it confirms that a value is a point index for some reachability problem, it doesn't confirm that the value is a point index for the specific reachability problem whose solution is being looked up. So this solution would still fail to detect situations in which I accidentally passed a point index from one reachability problem to a data structure designed for a different problem; and doing that would not even cause a panic, it would just silently return meaningless results. As a result, this would still produce an API in which a fairly straightforward thinko or typo could lead to a hard-to-detect bug (which might not necessarily even be caught in testing, depending on whether the two reachability data structures happened to return the same value for the indexes that were tested).
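Here's a hypothetical sketch of the failure mode (root_of and the roots field are made up for illustration): an index created for one reachability problem can be used with another, and the type system accepts it without complaint:

```rust
// Newtype wrapper: stops a PointIndex being confused with other usizes,
// but says nothing about *which* reachability problem it belongs to.
pub struct PointIndex(pub usize);

pub struct ReachabilityData {
    // stand-in for the real per-problem data: the root of each point
    roots: Vec<usize>,
}

impl ReachabilityData {
    // hypothetical query method
    pub fn root_of(&self, point: PointIndex) -> usize {
        self.roots[point.0]
    }
}

// Nothing below is a type error, even though it is a logic error:
//
//     let problem_a = ReachabilityData { roots: vec![0, 0, 2, 2] };
//     let problem_b = ReachabilityData { roots: vec![0, 1] };
//     let index_from_a = PointIndex(3);
//     problem_b.root_of(index_from_a); // wrong problem: compiles anyway
//
// Here the lookup happens to panic (out of bounds); with a larger
// problem_b it would silently return a meaningless answer.
```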


Another, possibly larger, issue is related to the fact that the point indexes are used as indexes. This means that basically any use of them is going to end up indexing into an array (such as point_info in the example above) – and that in turn means that it needs to be in bounds. If the point indexes are used correctly – i.e. all the point indexes given as function arguments, and all the point indexes stored within the PointInformation, are actually point indexes for the appropriate reachability problem – then they will of course always be in bounds. However, the compiler doesn't know that, and will insert bounds checks on every use of an index. The bounds checks don't cost all that much time (an error-handling branch that is never taken will be very fast on modern processors, because they speculatively run the success case in parallel with checking whether the error case has occurred). However, they will still take some amount of time and cause some instruction cache pressure, so it would be better if the checks weren't necessary. It also isn't possible to use unsafe code in order to skip the bounds checks without the reachability library as a whole becoming unsound, because due to the previous problem, it isn't possible to guarantee that the indexing will always be in bounds – someone might have passed a point index from a different reachability problem as an argument, and then it genuinely could be out of bounds.

These two problems together mean that the documentation for methods on ReachabilityData tends to get really cluttered; pretty much every method needs to have a disclaimer like this:

Panics

If the given point index does not belong to this reachability problem, this method may return incorrect results or panic.

And that is very close to saying "this method has undefined behaviour" – it isn't fully undefined because it can't violate memory safety, but that just means that the consequences of a bug will be smaller, it doesn't do anything to prevent the bug happening in the first place.


A separate issue arose in the same program when I started writing display code. For efficiency and implementation convenience, ReachabilityData and PointInformation use point indexes: but the end user isn't thinking in terms of point indexes, but in terms of the actual points that they specified. Suppose that the code has found a path from A to B and back to A, and wants to display it to the user. One obvious solution is to have some sort of object that represents the path, with a Display implementation that displays the path – and that in turn would logically Display each point on the path. However, if the points are stored as point indexes, then Display won't have enough context to know what to print: Display::fmt has no arguments other than self and a formatter (which doesn't help), so when displaying, e.g., a PointIndex(5), then Display::fmt has no idea what that corresponds to from the user's point of view.

One solution that works for Display (and that I have used in cases like this before) is to have a structure that groups together a point index, and a reference to a structure that describes which points exist and what their names are. Perhaps something like this:

struct PointIndexDisplay<'a> {
    point_names: &'a [String],
    point_index: usize,
}

This creates an intriguing possibility for solving the API problems described earlier in this section: it might be possible for the various methods on ReachabilityData to take a (suitably named) PointIndexDisplay rather than a PointIndex or usize. The idea is that each problem would correspond to a unique set of points, and thus have a different point_names reference. If the point_names reference were also stored as a field in ReachabilityData, it could check at runtime whether or not a method argument corresponded to the correct reachability problem by comparing its point_names to the stored point_names (to see whether they are slices with the same start address and same length). This could be augmented by checking that the index is in bounds in the PointIndexDisplay constructor, refusing to construct an object with an out-of-bounds index; that would then guarantee that the indexes into point_info were always in bounds, even without a separate bounds check.
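A sketch of that runtime identity check (the method names here are invented; note that std::ptr::eq on slice references compares both the start address and the length, which is exactly the comparison described):

```rust
struct PointIndexDisplay<'a> {
    point_names: &'a [String],
    point_index: usize,
}

impl<'a> PointIndexDisplay<'a> {
    // Bounds-check once, at construction; afterwards the index is known
    // to be valid for this particular point_names slice.
    fn new(point_names: &'a [String], point_index: usize) -> Option<Self> {
        if point_index < point_names.len() {
            Some(PointIndexDisplay { point_names, point_index })
        } else {
            None
        }
    }
}

struct ReachabilityData<'a> {
    point_names: &'a [String],
    // point_info etc. elided
}

impl<'a> ReachabilityData<'a> {
    // Hypothetical helper that each method would call on its arguments.
    fn same_problem(&self, point: &PointIndexDisplay<'a>) -> bool {
        // Fat-pointer comparison: true only if both slices have the same
        // start address *and* the same length.
        std::ptr::eq(self.point_names, point.point_names)
    }
}
```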

Does this technique solve all the problems? Sadly, no. Even if it makes it possible to save a bounds check, it's just been replaced with a check that the point_names are equal, so there's no actual saving there. Additionally, it doesn't really make sense to store PointIndexDisplay values within the PointInformation; it would technically work, but would cause huge memory bloat, because every single index would be accompanied by a slice reference (typically 16 bytes on modern computers), and all of those references would be the same, so it would be wasting a large amount of memory to store a lot of repetitive data. Thus, although the technique is somewhat helpful for catching mistakes (messing up the point index now gives you a deterministic panic rather than potentially silently producing the wrong result), and makes a Display implementation possible, it can't be used more pervasively because the comparisons make it too inefficient, and the extra error detection happens at runtime (whereas it's preferable to catch bugs at compile time if possible).

Still, it seems like there might be a seed of a good idea here, even if this alternative solution is also unsatisfying.

Example 3: local allocators

Here's an example from some code that I haven't written – but which lots of people want to write.

Some programs need to allocate a lot of short-lived data on the heap. In the general case, heap allocations have a lot of overhead; if data is being allocated and deallocated frequently, this can often be a major performance bottleneck.

One potential solution to the problem is to use a garbage collector (because garbage collectors can be optimized to have pretty good performance in this scenario), but those can be hard to integrate into programs written in systems programming languages like Rust. An alternative solution, commonly seen in games development, is the use of local allocators that allow for more optimized allocations of objects that exist only within a particular scope (such as the body of a particular function or method); as these only have to handle a limited number of allocation points, they can be specialised to work as fast as possible for the allocation patterns actually used with those objects via sacrificing functionality that happens not to be needed. (As a simple example, you can imagine an allocator operating under the conditions "all the objects allocated by this allocator will have the same type, and all of them will be deallocated simultaneously"; such an allocator can be orders of magnitude simpler and faster than a general-purpose allocator, because the calculations of where to allocate each object are incredibly simple and require only very minimal bookkeeping.)

Let's think about what a local allocator would have to do in order to interoperate with normal-looking Rust code. The main tasks of an allocator are allocating memory and deallocating memory, and the main problem with this is a visibility problem: the code that interacts with the allocator needs to actually be able to reference the allocator, in order to ask it for memory, and in order to let it know when it can reclaim the memory from an allocated object.

Allocation is very simple: it's possible to just pass a reference to the allocator around to every function/method that needs it (even though conceptually a mutable reference is needed, Rust "reborrows" the reference with every call to prevent it being moved into the function being called). This can be a little obnoxious sometimes, because all the functions will have the same parameter and all the calls will have to mention it, but isn't particularly problematic (and the same problem can come up in situations mostly unrelated to the situation discussed in this blog post – I may end up writing more about this in future). So this side of the problem has a "sufficiently good" solution already, and isn't really blocking the writing of any code. The solution would look something like this:

// the allocator exists in this function's scope
fn function_that_uses_a_local_allocator() {
    // overly simple allocator creation, but just assume this works for now
    let mut allocator = MyLocalAllocator::new();

    do_something(&mut allocator, /* other args */);
}

fn do_something(
    allocator: &mut MyLocalAllocator<'_>, /* other args */) {
    /* ... */
    // sugar for: do_something_else(&mut *allocator, /* other args */);
    do_something_else(allocator, /* other args */);
    /* ... */
}

fn do_something_else(
    allocator: &mut MyLocalAllocator<'_>, /* other args */) {

    /* ... */
    // Box::new_in is a real function, but doesn't actually work like this
    let my_object = Box::new_in(/* value to box */, allocator);
    /* ... */
}

Unfortunately, attempts to use the same solution for deallocation run into trouble. It's conceptually very appealing to have deallocation mirror allocation:

let my_object = Box::new_in(/* value to box */, allocator);
/* ... */
// I made this method up; Rust doesn't have it for reasons explained below
Box::drop_in(my_object, allocator);

Something like this is actually seen in practice sometimes in languages like C. Unfortunately, it has a tendency to lead to security holes; although it's perfectly secure if written correctly, in practice programs that use multiple allocators often seem to end up accidentally deallocating things with the wrong allocator, which is generally considered undefined behaviour and frequently a security bug. If allocators in Rust worked like that, you would have a soundness hole there, too:

fn main() {
    let mut allocator1 = MyLocalAllocator::new();
    let mut allocator2 = MyLocalAllocator::new();

    wrong(&mut allocator1, &mut allocator2);
}
fn wrong<'a, 'b>(allocator1: &'a mut MyLocalAllocator<'b>,
                 allocator2: &'a mut MyLocalAllocator<'b>) {
    let my_object = Box::new_in(0u64, allocator1);
    Box::drop_in(my_object, allocator2); /* oops */
}

The type system has unfortunately not been able to stop the attempt to deallocate the object using the wrong allocator, because there were two allocators with identical types (allocator1 and allocator2 are alive for the same lifetime, and the references to them also have the same lifetime) — there are a lot of things that can be checked by the type system (e.g. that my_object was allocated before it gets dropped, and isn't used afterwards), but "this is the right allocator" is not one of them.

Even ignoring that problem (e.g. by making Box::drop_in unsafe and leaving it up to the programmer to manually verify that they're using the right allocator), there is a second problem. Dropping objects manually isn't very common in Rust (although you can do it); normally, objects are dropped only in a destructor. What happens when you try to write one of those? Things start to go wrong even in the very simplest case, a struct that contains just one Box, and which tries to free it in the destructor:

struct BoxedError<'a> {
    my_field: Box<dyn Error, MyLocalAllocator<'a>>
}
impl<'a> Drop for BoxedError<'a> {
    fn drop(&mut self) {
        Box::drop_in(self.my_field, /* ??? */);
    }
}

The technique of passing around the allocator to every function and method that needs it works by adding an extra parameter to all those methods, but unfortunately, this isn't very compatible with standard traits like Drop, which have a fixed parameter list.


All this means that in practice, an alternative implementation is needed: instead of the allocator being known by every function that deallocates, it's tracked by every object that it allocates. This makes deallocation very easy, in destructors or otherwise: just ask the object which allocator allocated it, and then tell that allocator to deallocate it. For example, the current unstable implementation of non-global allocators in Rust uses this approach: if you call Box::new_in in order to create a Box using a specific allocator, the Box will store both a reference to the allocation, and the allocator itself, with <Box as Drop>::drop using the stored allocator to do the deallocation.

Unfortunately, this alternative approach, although it technically works, has at least two problems which mean that it isn't particularly usable in practice for local allocators. The more obvious is a performance issue: the primary reason to use a local allocator in the first place is for the performance gains, but if the allocator needs to be stored alongside every object it allocates, that can use up a substantial amount of extra memory. (The "natural" way to represent the allocator would be as a reference to memory describing it; but that ends up doubling the size of Box, because it now contains two references rather than one. In some programs, it's not uncommon for data structures to be made almost entirely of Boxes, so this could double the memory usage of the program.) All that extra memory (and all the extra time spent reading and writing it) is conceptually not really required in this case: it isn't conceptually too difficult to track which allocator to use for deallocations, with the stored allocator being useful only as a) a check that it's the right allocator / to catch situations where the wrong allocator is used by mistake, and b) a way to access the allocator in unusual contexts, such as from inside a destructor.

The less obvious problem is to do with verifying that the allocator itself is sound. Conceptually, allocations and deallocations mutate the allocator (because it needs to change its internal state to remember what memory has been allocated), and the natural/standard way to do that in Rust is to make the allocations and deallocations take a mutable/unique reference to the allocator. But that conflicts with the notion of "store the allocator in every Box it allocates": you can't store the allocator itself because it's stateful, so you need to store a reference to the allocator – but in Rust you can't have two references to the same thing if either of them is mutable, making it unclear what exactly is supposed to be stored. Rust's actual implementation of Box::new_in contains a workaround for the problem, which leads to calls to it looking pretty weird:

// Allocator is implemented on a shared reference, not the allocator itself
impl<'a> Allocator for &'a MyLocalAllocator<'a> {
    /* ... */
}

// the allocator exists in this function's scope
fn function_that_uses_a_local_allocator() {
    let mut allocator = MyLocalAllocator::new();

    do_something(&allocator, /* other args */);
}

// do_something could be given a specific type,
// but this generic type is probably more instructive
fn do_something<'a, T>(allocator: &'a T, /* other args */)
    where &'a T: Allocator {

    /* ... */
    // Box::new_in actually works like this
    let my_object = Box::new_in(/* value to box */, allocator);
    /* ... */
}

In other words, the value that's used to allocate and deallocate (and implements Allocator and is stored in the Box) is just a shared reference to the real allocator. (It doesn't technically have to be a literal shared reference, just an object that acts the same way, e.g. it could be a zero-sized type that acts like a "reference" to a global variable. But if it doesn't implement Copy, it gets moved into the Box and the first allocation thus prevents any further use of the allocator, so in practice it has to be shareable.) This makes the Allocator trait very awkward to implement correctly, because there are no compile-time checks for things like "the allocator isn't being used recursively", which in turn means that the allocator is not given unique access to any of its own internal state or to the memory that it's trying to allocate. This problem is solvable even in safe code, by using Cell, but then the code will end up running checks at runtime to ensure it isn't being used recursively (all the safe variants of Cell prevent the value inside being double-mutably-borrowed via one means or another, but it generally involves panicking or returning unusable dummy values rather than rejecting attempts at compile time), so you end up with an allocator full of potential panic points that the compiler can't prove aren't necessary.
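As a concrete illustration of the &self-plus-Cell pattern, here's a toy bump allocator. It's entirely made up for this post (it hands out offsets into an imaginary arena rather than real memory, and doesn't implement the actual unstable Allocator trait), but it shows why the allocation method ends up taking a shared reference:

```rust
use std::cell::Cell;

// A toy bump allocator: a fixed-capacity arena whose only state is a
// cursor. It hands out offsets rather than real pointers, purely to
// show the shape of the API.
struct BumpAllocator {
    capacity: usize,
    next: Cell<usize>,
}

impl BumpAllocator {
    fn new(capacity: usize) -> Self {
        BumpAllocator { capacity, next: Cell::new(0) }
    }

    // &self, not &mut self: Cell lets a shared reference update the
    // cursor, which is what makes it possible for every Box (and every
    // function) to hold a copy of &BumpAllocator simultaneously. The
    // price is that "out of memory" is a runtime check that the
    // compiler can't prove away.
    fn allocate(&self, size: usize) -> Option<usize> {
        let start = self.next.get();
        if self.capacity - start < size {
            return None;
        }
        self.next.set(start + size);
        Some(start) // offset of the new allocation within the arena
    }
}
```

Because allocate only needs &self, any number of shared references to the arena can coexist, mirroring the &'a MyLocalAllocator<'a> pattern above; the trade-off is exactly the one described in the text, with correctness enforced by runtime checks rather than the borrow checker.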

In summary, local allocators just don't really work very well in current Rust. The efficient solution, of passing the allocator as an argument to every function that needs to interact with it, using it for both allocation and deallocation, can't obviously be checked for soundness by the compiler and doesn't mix well with destructors, which is particularly frustrating because it almost works and would produce pretty efficient code if it did. The alternative solution, of storing the allocator in every object, is inefficient and creates so many references to the allocator that it becomes hard for the compiler to prove that the allocator itself is implemented correctly, which tends to mean runtime checks (or unsafe code) and even slower performance. To stop the idea of using local allocators to make Rust programs more performant from turning into a dead end, some other solution is needed.

A survey of solutions

Hopefully, the examples above are enough to get an idea of the sort of situation that Rust has problems representing at the moment: there is some type of information that's needed to make sense of an object and/or to use it safely (the base, the list of which indexes correspond to which points, the allocator used to allocate it), and that information needs to be the same across a range of related objects (so that they can be added efficiently, or to ensure the correct array is referenced, or that the correct allocator is used to deallocate them). Often it becomes difficult to correctly implement a trait (the first example struggles with Default, the second Display, the third Drop).

In this section, I'd like to look at some potential solutions, none of which are ideal in the general case – but it's instructive to see the ways in which they fail, and doing so will help to narrow down the problem and make it possible to state the requirements on a solution precisely. (One possible solution has been omitted intentionally; it's covered later in this blog post. Let me know if I've missed any others.) I'll start with the two most obvious solutions – both of which were attempted at least twice in the discussion above – and then move on to some less obvious possibilities.

Incomplete objects

The approach that is probably most obvious to a programmer (such as me!) who learned systems programming in low-level languages like C is what I'll call the "incomplete object" approach. In this approach, the information that's common to a range of objects is not stored within any of the objects themselves; rather, it's stored separately by the code that uses those objects. Examples include the PointIndex in the second example (which represents a point, but is missing the information on what indexes represent which points), and the memory returned by the allocator in the third example (which needs to be deallocated by the same allocator, but which doesn't itself store information on what allocator that is).

This approach is generally pretty efficient, as the information only needs to be stored once for the entire set of objects. It complicates the signatures of methods that use those objects; that's a downside when using a single object, but it can turn into an advantage when two objects need to have the same information to be used together. For example, if the SparseBigInt objects from the first example were represented in this style via omitting the base, the resulting incomplete object would just be a digit list, and an addition function would look like this:

fn add_bigint(x: DigitList, y: DigitList, base: i32) -> DigitList { ... }

The signature makes it very clear that two integers can be added only if they're in the same base: it takes two different digit lists, but the base only once.

Unfortunately, this approach has two big downsides. One of them is that it has no way to prevent the common information accidentally being specified incorrectly. Depending on the context, this could produce incorrect results (e.g. interpreting the digits of an integer as the wrong base), or even be a soundness hole (e.g. indexing beyond the end of an array, or deallocating an object with the wrong allocator).
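The first downside can be made concrete with a runnable sketch of the incomplete-object style (assuming `DigitList` is just a vector of digits, least significant first, which is one plausible representation): nothing stops a caller supplying the wrong base, and the result is silently wrong rather than rejected.

```rust
// Hypothetical incomplete-object representation: the digits alone,
// least significant digit first; the base must be supplied at every call.
type DigitList = Vec<i32>;

fn add_bigint(x: &DigitList, y: &DigitList, base: i32) -> DigitList {
    let mut result = Vec::new();
    let mut carry = 0;
    for i in 0..x.len().max(y.len()) {
        let sum = x.get(i).unwrap_or(&0) + y.get(i).unwrap_or(&0) + carry;
        result.push(sum % base);
        carry = sum / base;
    }
    if carry > 0 {
        result.push(carry);
    }
    result
}

fn main() {
    let x = vec![9, 9]; // 99, if interpreted in base 10
    let y = vec![1];    // 1, if interpreted in base 10
    println!("{:?}", add_bigint(&x, &y, 10)); // [0, 0, 1], i.e. 100: correct
    // Nothing prevents passing the wrong base: same digits, garbage result
    println!("{:?}", add_bigint(&x, &y, 16)); // [10, 9]: silently wrong
}
```

Nothing in the type system connects the `base` argument to the digit lists, so the mistake in the last call is invisible to the compiler.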

The other downside is that it makes it very hard to implement traits. In my first example above, I didn't use this approach; if I had tried, I would have discovered that I couldn't implement Add (because trying to add one DigitList to another using + would discover that there was nowhere to specify the base, which is information required to do carries correctly). In the second example, Display couldn't be implemented on the incomplete PointIndex objects, and I needed to resort to a separate PointIndexDisplay; and incomplete objects can't usefully implement Drop, which was a major problem in the third example.

Complete objects

The opposite approach is to store the common information in every object that needs it: a full SparseBigInt rather than a DigitList, a PointIndexDisplay rather than a PointIndex, Rust's current unstable Box that stores the allocator. This can be considered something of a "brute force" solution to visibility problems: in order to ensure that something is always visible, just store it, or a reference to it, in everything that might possibly need it!

With SparseBigInt, this approach doesn't struggle too badly. The common information, the base (an i32), is pretty easy to store: it doesn't use up very much memory, it's Copy so there are no lifetime problems, and it doesn't change over time so there are no issues with ensuring that the correct version is stored. Even so, the problem with the huge proliferation of copies of base is that when there are multiple sources of truth for a particular value, it's generally only possible to ensure correctness by proving that they must all be the same. Adding two SparseBigInts involves checking that they have the same base, and this approach makes it impossible to do that at compile time. (There is an interesting symmetry with the previous approach: there, base was specified once, causing problems if it were specified incorrectly but statically guaranteeing that there's only one value; here, the way base is stored means that it cannot be specified incorrectly, but there is still no proof that the two base values match and things go wrong if they don't.)
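Here's a runnable sketch of the complete-object style for SparseBigInt (the carry-handling details are illustrative): the `assert_eq!` is exactly the runtime check that a compile-time solution would eliminate.

```rust
use std::collections::BTreeMap;

// Complete-object sketch: every integer carries its own copy of the base,
// so operations must check at runtime that the copies agree.
struct SparseBigInt {
    base: i32,
    digits: BTreeMap<i64, i32>, // position -> digit; zero digits omitted
}

impl SparseBigInt {
    fn add(&self, other: &SparseBigInt) -> SparseBigInt {
        // the check the compiler can't discharge at compile time
        assert_eq!(self.base, other.base, "base mismatch");
        let mut digits = self.digits.clone();
        for (&pos, &d) in &other.digits {
            let mut pos = pos;
            let mut add = d;
            while add != 0 {
                let entry = digits.entry(pos).or_insert(0);
                *entry += add;
                add = *entry / self.base; // carry into the next position
                *entry %= self.base;
                if *entry == 0 {
                    digits.remove(&pos); // keep the representation sparse
                }
                pos += 1;
            }
        }
        SparseBigInt { base: self.base, digits }
    }
}

fn main() {
    let x = SparseBigInt { base: 10, digits: [(0, 9), (1, 9)].into_iter().collect() };
    let y = SparseBigInt { base: 10, digits: [(0, 1)].into_iter().collect() };
    let sum = x.add(&y); // runtime check passes: both are base 10
    println!("{:?}", sum.digits); // {2: 1}, i.e. 99 + 1 = 100 in base 10
}
```

Had `y` been constructed with `base: 16`, the call would panic at runtime rather than fail to compile – which is precisely the problem.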

Unfortunately, nothing guarantees that the common information will be small, or Copy, or even cloneable. In the second example, the common information, point_names, is a shared reference to a list of strings – and that was a cut-down example; in practice the common information in that sort of program can often be more complex than that. In the third example, the common information is conceptually a mutable reference to an allocator, and storing two of those is known to lead to trouble; Rust's current unstable allocator implementation has to work around the problem by representing it with a type with significantly different semantics, which in turn means that the compiler has trouble reasoning about the resulting allocator and can't prove very many safety properties. When using incomplete objects, none of this is a problem – you simply use one copy of the common information and borrow it as necessary. But here it can create insurmountable problems.

As seen in the first example above, this approach also doesn't play well with traits that contain methods that construct new objects, such as Default and From; for most traits, it can expose the common information to the trait implementation via the self parameter, but for methods that don't have one, the visibility problems remain.

Constant common information

Here's an approach suitable for people who don't mind recompiling their code a lot – and one that I've commonly seen used in practice in C code (and which is used even more pervasively in older programming languages like FORTRAN). It isn't very general, but when it does work, it solves all the problems on its own.

The idea is that sometimes, only one piece of common information is needed throughout the entire program. For example, when I was analysing Turing tarpits, I generally only needed one base for all the SparseBigInts in my program (e.g. the analysis I was most interested in used base 260). In the case where all that's needed is to be able to run the program on one single input, why not just make the common information a constant (e.g. in Rust, either a const or an immutable static)? This has lots of advantages:

The obvious disadvantages are a) you can only have one piece of common information for the whole program, and b) you have to recompile the program whenever it changes. Still, if you're writing a program for yourself, rather than writing a library, or writing a program that you plan to distribute for general use in a range of contexts, the latter isn't as much of a problem as it seems.

Interestingly, this approach comes up in Rust itself on occasion. In the allocator example above, I was primarily discussing the current state of the unstable allocator API – but there's a stable allocator API too. It looks like this:

#[global_allocator]
static GLOBAL_ALLOCATOR: MyAllocator = /* ... */;

In order to make sure that there is no unsoundness with allocator mismatches, but without spending bytes storing references to the allocator, stable Rust simply has one constant allocator for the whole program: whenever you use Box, Vec and friends, that's the allocator that's going to allocate the memory, and whenever you drop them, that's the allocator to which the memory is returned. Somewhat simplistic, but it's powering the vast majority of production-ready Rust code at the moment.
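For concreteness, here's a complete, compilable use of that stable API: a global allocator that delegates to the system allocator while counting allocations (the counting exists only to make the one-allocator-for-everything behaviour observable).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// The whole program's single allocator: delegates to the system allocator,
// counting each allocation as it goes.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL_ALLOCATOR: CountingAllocator = CountingAllocator;

fn main() {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let v = vec![1, 2, 3]; // allocated by CountingAllocator
    assert!(ALLOCATIONS.load(Ordering::Relaxed) > before);
    drop(v); // and returned to the very same allocator
}
```

Because there is only one allocator in the entire program, there is no allocator-mismatch question to answer: every deallocation trivially goes back to the allocator that produced the memory.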

Storing the common information in a global variable

The previous solution used a compile-time constant, but it's possible to do something very similar using a mutable global variable instead. The basic idea is to initialise the global variable with the common information before any of the relevant objects are created, and then ensure that it keeps that value for as long as the objects are still alive.

In cases where only one piece of common information is needed for the lifetime of the program, such as was the case for me with the SparseBigInt example, this can be done pretty simply using sync::OnceLock (the thread-safe counterpart of cell::OnceCell; a static has to be Sync). There are two soundness requirements (that the global variable is initialised in time, and that it keeps its value for long enough); and OnceLock trivially satisfies the second requirement (because it keeps its value forever). Unfortunately, the first requirement leads to some inefficiency, because the compiler cannot statically verify that the global variable is initialised before the first object is created, and the obvious implementation will end up instead checking whether the common information has been initialised every time it is requested. (As far as I can tell, the use of unsafe code would be the only way to eliminate this particular overhead entirely.)
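A minimal sketch of that pattern (the names are assumed for illustration): every read of the common information repeats the "has it been initialised yet?" check, and panics if initialisation was somehow missed.

```rust
use std::sync::OnceLock;

// Common information for the whole program, initialised once at startup.
static BASE: OnceLock<i32> = OnceLock::new();

fn base() -> i32 {
    // every call re-checks initialisation at runtime: the overhead
    // described above, plus a potential panic point
    *BASE.get().expect("BASE not initialised")
}

fn main() {
    // must run before any object that relies on BASE is created
    BASE.set(260).expect("BASE already initialised");
    println!("{}", base()); // 260
}
```

The soundness requirement "initialised in time" has been converted into a runtime check plus a panic, rather than being proved at compile time.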

Using global variables for the common information might also be possible in cases where it changes (but you would need one global variable for each piece of common information that might exist at a time; that implies in particular that this could not work with recursive code, which might have arbitrarily many nested "common information lifetimes"). It seems hard to get much help from the compiler when checking that the globals are initialized with the right value and stay alive for long enough (Rust doesn't have much in the way of compile-time checks of the status of global or interior-mutable variables). On the other hand, I am not convinced that it's impossible: consider a scheme like this:

It seems that, although the locking scheme might need unsafe code to implement, it could present a sound and safe interface to the code that uses it – this could possibly be a good basis on which to write a new library crate! It would be quite different from typical code, though, with these zero-sized types behaving in quite an unusual way; the closest thing that exists in Rust at the moment is &sync::RwLockReadGuard, but that isn't zero-sized, and a zero-sized version (that relied on hardcoded locations for storing its data) would probably need to be written from scratch.

It also doesn't solve all the problems with the "complete objects" solution, and in fact makes one worse: that solution had trouble with using mutable references as the common information, whereas this one struggles even with shared references (storing references in global variables is difficult because the references generally have to be 'static). This is a serious downside, given that the common information frequently turns out to be a reference in practice.

Everything mentioned in this section could work with thread-local variables too, but it isn't clear that they have any significant advantages over global variables for the purpose.
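For completeness, here's what the thread-local variant looks like (a minimal sketch, with the names assumed): each thread sees its own copy of the common information, but the runtime-checking and lifetime problems are otherwise the same.

```rust
use std::cell::Cell;

thread_local! {
    // per-thread copy of the common information; no synchronisation needed,
    // but every access still goes through the thread-local machinery
    static BASE: Cell<i32> = Cell::new(0);
}

fn main() {
    BASE.with(|b| b.set(260)); // must happen before BASE is relied upon
    println!("{}", BASE.with(|b| b.get())); // 260
}
```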

Const generics

The zero-sized types in the previous example point to another possible solution: if the common information were known at compile time, then instead of carrying a reference to a global variable that stores the value, why not make them carry the value itself? Instead of implementing a trait that dereferences by accessing a global variable, you could instead hardcode the dereferenced value. That could look something like this, using the SparseBigInt example:

#[derive(Default)]
pub struct SparseBigInt<Base: Deref<Target=i32>> {
    base: Base,
    digits: BTreeMap<i64, i32>
}

fn main() {
    #[derive(Default)]
    struct Base260 {}
    impl Deref for Base260 {
        type Target = i32;
        fn deref(&self) -> &i32 { &260 }
    }

    let zero_in_base_260: SparseBigInt<Base260> = Default::default();
}

However, this implementation doesn't guarantee that the base always dereferences to the same value (nothing is forcing deref to be deterministic). Fortunately, Rust already has a feature that solves this particular problem: a const generic:

#[derive(Default)]
pub struct SparseBigInt<const BASE: i32> {
    digits: BTreeMap<i64, i32>
}

fn main() {
    let zero_in_base_260: SparseBigInt<260> = Default::default();
}

There are a lot of things to like about this solution:

In other words, the value of the common information isn't being specified in a field of the object, or being given as a parameter to methods called on the object; instead, it's being stored in a generic.

It's therefore something of a pity that this solution isn't usable in the majority of cases. There are two major problems. One is that a const generic has to be a compile-time constant, whereas it's frequently the case that the common information isn't known until runtime. The other is that there are currently extreme restrictions on what type a const parameter can have (it has to be a primitive, integer-like type); there are plans to ease some of these restrictions in the future, but even after that, the common information will frequently be of a type that's inappropriate for use in a const generic (such as with the allocator example above, which conceptually wants the common information to be a mutable reference to a value created at runtime).
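Still, within its limits the const-generic approach really does move the check to compile time. Here's a runnable sketch (the from_value/value helpers are illustrative): BASE is usable as an ordinary constant inside the methods, and adding two integers with different bases is a type error rather than a runtime panic.

```rust
use std::collections::BTreeMap;

// The base lives in the type, so it occupies no memory in the object and
// cannot mismatch between two objects of the same type.
#[derive(Default, Debug)]
pub struct SparseBigInt<const BASE: i32> {
    digits: BTreeMap<i64, i32>, // position -> digit; zero digits omitted
}

impl<const BASE: i32> SparseBigInt<BASE> {
    // BASE is available here as an ordinary constant
    fn from_value(mut v: i64) -> Self {
        let mut digits = BTreeMap::new();
        let mut pos = 0;
        while v != 0 {
            let d = (v % BASE as i64) as i32;
            if d != 0 {
                digits.insert(pos, d);
            }
            v /= BASE as i64;
            pos += 1;
        }
        SparseBigInt { digits }
    }

    fn value(&self) -> i64 {
        self.digits
            .iter()
            .map(|(&pos, &d)| d as i64 * (BASE as i64).pow(pos as u32))
            .sum()
    }
}

// both arguments must share the same BASE, checked at compile time
fn add_sparse_big_ints<const BASE: i32>(
    x: &SparseBigInt<BASE>, y: &SparseBigInt<BASE>) -> SparseBigInt<BASE> {
    SparseBigInt::from_value(x.value() + y.value())
}

fn main() {
    let x = SparseBigInt::<10>::from_value(99);
    let y = SparseBigInt::<10>::from_value(1);
    println!("{}", add_sparse_big_ints(&x, &y).value()); // 100
    // let z = SparseBigInt::<16>::from_value(1);
    // add_sparse_big_ints(&x, &z); // compile error: mismatched BASE
}
```

This compiles on stable Rust today – the problem is only that BASE must be a compile-time constant of a primitive type, which is what the rest of this post sets out to fix.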

Arriving at a solution

With none of Rust's existing language features providing a perfect solution to the problem, it therefore makes sense to think about how a new language feature could solve it (especially as, in this blog post, I'm planning to argue that one should be added). Having seen some partial solutions above, what does the perfect solution look like?

The two main things that have to be decided, when implementing any new language feature, are a) the semantics (i.e. how do programs that use the feature behave?), and b) the syntax (i.e. what do programs that use the feature look like?). Out of the attempts in the previous section, one syntax stands out as much neater than the others: the const generics looked really nice and were easy to use and understand. That indicates that the language feature we're looking for should be some new kind of generic that indicates the common information. That way, structures, enums, etc. that need to be interpreted using a particular piece of common information would declare it as a generic parameter; and the functions and methods that operated on them would be made generic over the common information in the same way.

What about the semantics? One important property of the common information is that it's common to multiple objects, and there's often a need to prove at compile time that it's the same across multiple objects. As such, when the common information is provided via a generic parameter, it needs to stay the same within all uses of a particular object with that generic parameter, during the entirety of a call of a function with that generic parameter, and so on – it therefore has to be treated like a constant in those cases (a problem that the const generics solved via being an actual constant, holding its value throughout the entire execution of the program). On the other hand, unlike an actual constant, it might not have a value at some points in the program where it isn't actively being used, and might, e.g., have a different value on every iteration of a loop.

This suggests that the correct solution is something along the lines of a "scoped constant" – instead of being constant for the entire program, it's constant only within a particular region of code. The basic idea is that the code should be able to declare "within this block, and all the functions it calls, etc., I am going to give this value this name, and the name will continue to correspond to that value throughout the block" – and then it should be able to use that name as though it were a const generic, when creating objects, calling functions, and the like.

Here's an example of what that might look like syntactically:

#[derive(Default, Debug)]
pub struct SparseBigInt<const 'a BASE: i32> {
    digits: BTreeMap<i64, i32>
}

// this would probably be implemented as a trait implementation, but I'm writing
// it as a function as an example
fn add_sparse_big_ints<const 'a BASE: i32>(
    x: &SparseBigInt<BASE>, y: &SparseBigInt<BASE>) -> SparseBigInt<BASE> {
    /* addition algorithm, can mention BASE and use it like a regular constant */
}

fn main() {
    let base = /* read a base from user input */;

    {
        let const BASE: i32 = base;
        let zero: SparseBigInt<BASE> = Default::default();

        // sugar for: let twice_zero = add_sparse_big_ints::<BASE>(&zero, &zero);
        let twice_zero = add_sparse_big_ints(&zero, &zero);
        println!("{:?}", twice_zero);
    }
    // now BASE is undefined again, and we know there are no allocated objects
    // or currently executing functions that use it
}

The basic idea is that let const BASE: i32 = base; defines a new scoped constant that lasts for the current scope, in much the same way as let base_copy: i32 = base; would define a new immutable variable that lasts for the current scope. After that, the scoped constant can be used as a generic parameter in much the same way as a regular constant; the only difference is that the constant "has a lifetime" in the sense that it will no longer be valid after its scope ends, so objects that use the constant as a const generic must be destroyed before the end of the constant's scope, functions that use the constant as a constant generic must finish executing before the end of the constant's scope, and so on. (To indicate that the constant has a lifetime, I wrote the generic using const 'a rather than just const – this limits the lifetime of the object to at most 'a in the same way that containing a reference &'a would.)

One other difference from normal const generics is that scoped constants are considered equal in the sense of type equality only if they were created by the same let const statement. You can't create two different scoped constants that happen to have the same value, then try to, e.g., store two objects, one using each of the constants, in the same array. (This shouldn't be surprising given that they have a value that's expected to vary at runtime – the power of type systems typically doesn't go to the extent of "prove these runtime-calculated values are always equal".) Because "these were produced by the same declaration" is used as the definition of equality, rather than an == comparison, the scoped constants don't actually need to implement Eq (unlike regular constants, which do).

As usual, there are a few restrictions on what types can be used with this sort of "scoped generic". The primary restriction is on the type of the scoped constants; the simple case is when the type is Copy, perhaps because it's a shared reference. It is also possible to use a mutable reference, but this comes with additional restrictions (see the next subsection).

There also probably needs to be a restriction on using objects with scoped generics as trait objects (and maybe also using traits with scoped generics as trait objects), and on using functions/methods with a scoped generic as function pointers. Unlike most uses of scoped generics, I have not been able to prove that these uses are sound, and suspect they may be unsound in some cases (you would probably have to at least limit the lifetime of the trait object to the lifetime of the scoped constant, but I'm not sure that's enough). Eventually it might possibly be worthwhile to determine exactly which uses are sound and allow them; but as these are relatively fringe uses, disallowing them entirely in an initial implementation would be reasonable.

Mutable references as scoped constants

When a function or method has a scoped generic parameter, it can act a lot like a normal method parameter:

// the normal way to write a "square an i32" function
fn square(x: i32) -> i32 { x * x }

// writing the same function using scoped generics
fn square(x: i32) -> i32 {
    let const X: i32 = x;
    square_generic::<X>()
}
fn square_generic<const 'a NUM: i32>() -> i32 { NUM * NUM }

That's the primary reason why scoped constants usually have to be Copy – they're effectively being passed to a method as though they were parameters, and values of most types are moved (and thus become unusable) when used as a method parameter (but scoped generics inherently need to be usable multiple times).

However, Rust has one other type that can be passed as a method parameter without consuming it: the mutable reference. It'd be very useful to be able to use those as scoped generics too. For example, an allocator API intended to be implementable in safe Rust, but also compatible with existing unsafe allocators, might look something like this:

/// An allocator that allocates objects with lifetime `'a`.
struct MyAllocator<'a> { /* ... */ }
/// An allocation of type `T`, produced by `ALLOC`, lasting for `'a`.
struct MyAllocation<const 'a ALLOC: &mut MyAllocator<'a>, T> {
    // `allocation` is always Some except while dropping it; the `Option`
    // wrapper exists to work around a known limitation of the Drop trait
    allocation: Option<&'a mut MaybeUninit<T>>
}

impl<const 'a ALLOC: &mut MyAllocator<'a>, T> MyAllocation<ALLOC, T> {
    fn allocate() -> Result<Self, AllocError> {
        // an example implementation using alloc::Layout
        let layout = Layout::new::<T>();
        // this gives a mutable reference to `ALLOC` that lasts for at most the
        // current function, "reborrowed from the scoped generic"
        let allocator: &mut MyAllocator<'a> = ALLOC;
        let allocation = allocator.allocate(layout)?;
        /* ... create a MyAllocation using allocation ... */
    }
}
impl<const 'a ALLOC: &mut MyAllocator<'a>, T> Drop for MyAllocation<ALLOC, T> {
    fn drop(&mut self) {
        let allocation = self.allocation.take();
        /* ... return the allocation to ALLOC, which mutates ALLOC ... */
    }
}

MyAllocation is efficient due to being just a single reference (the MyAllocator is a scoped generic rather than a field, so it doesn't need to be stored in memory along with the reference). The scoped generic on MyAllocation ensures that when an allocation is dropped, it will always be returned back to the allocator that produced it, not some other allocator with the same allocation lifetime (allowing unsafe code to soundly make that assumption) – and tying the lifetime of the allocation to the lifetime of the scoped constant guarantees that the allocations must necessarily all be dropped before the allocator itself is.

In order for this sort of thing to be sound, a couple of extra restrictions are needed. One is on uses of a scoped constant (or value of a scoped generic) of mutable reference type: instead of getting a reference that lasts for the entire lifetime of the scoped constant, you get a reference that lasts at most for the duration of the current function or method, and the scoped constant counts as being "mutably borrowed" until the reference is dropped. Although this is a bit more limiting than a typical (non-scoped-constant) mutable reference, it's still powerful enough to allow for a wide range of useful programs.

This is the reason why the type of the scoped generic was written as &mut MyAllocator<'a> with no lifetime on the &mut:

so in neither case is there a fixed lifetime on the value of the scoped constant that makes any sense to track separately from that of the scoped constant itself. (Meanwhile, if the scoped constant were a shared reference, it would make sense to track the lifetime of its value, which could meaningfully be longer than the lifetime of the reference itself.)

A second restriction is needed in order to make it impossible to get at two copies of the same mutable reference via using two different generic parameters (which could happen using either generic type parameters or scoped generics). A sufficient restriction, and the one I'd recommend for an initial implementation, is "while there is a mutable borrow of a scoped generic, it is impossible to make any further use of generic type parameters or scoped generics until that mutable borrow is dropped, other than the type of the scoped generic itself and its generic parameters" (lifetime and const generics are still usable). That means that you can't use them as generics, but also can't use them as, e.g., the type of an object on which to call a trait method – another way to think about the restriction is "all the types used while the borrow exists are either the type of the scoped generic itself, types contained within that type, or types that are hardcoded into the function/method" (although you can still do things with type generics that don't involve running code on them, like moving/copying them and converting references to raw pointers). This restriction is slightly stricter than it needs to be, but has the advantage of being easy to understand and implement, and could potentially be relaxed in the future – and it isn't quite as restrictive as it sounds, because often the lifetime of the borrow is very short and thus doesn't restrict very much code.

Why this is the correct solution

It's possible to bikeshed the fine details of the syntax I've written above (e.g. perhaps we could use a regular let rather than needing a separate let const? maybe const is the wrong keyword for the generic, as it isn't constant over the whole program?). However, at least semantically I am fairly confident that this solution is the correct one, for several different reasons. This section will cover one of the main reasons, and some of the others are discussed later on.

One thing that's worth doing when evaluating a language feature that solves a problem that comes up all the time, across many different programs, is to wonder "why didn't this problem need solving earlier?". In many cases, you discover that it was in fact solved earlier, but only in a special case, in a way that didn't deal with the general problem as a whole – and it can be instructive to look at the solution to the special case, to see how the general case compares.

With a problem as general as the one in this blog post, it is unsurprising that at least one core Rust feature is, in effect, a solution of a special case of the "scoped generic problem". Just think about the difference between a reference and a pointer:

#[derive(Default)]
struct Thing { /* ... */ }
let my_thing = Thing::default();
let reference: &Thing = &my_thing;                     // a complete object
let pointer: *const Thing = reference as *const Thing; // an incomplete object

In terms of what's stored in the computer's memory during execution, there is no distinction between these two values! They're both just going to store the address of my_thing in memory. The difference is that raw pointers are unsound to use unless you take precautions with them (e.g. to dereference them soundly, you must ensure that the target is allocated, correctly initialized, still alive, not currently mutably borrowed, etc.). Meanwhile, references carry a piece of additional information (their lifetime) that a) specifies the conditions under which using them is sound, and b) places a limit on how long the reference can be used for. The nature of the lifetime becomes a little clearer if you write the references using a type alias, rather than using the usual abbreviated syntax:

type ThingRef<'a> = &'a Thing;
fn pick_a_thing<'a>(x: &'a Thing, y: &'a Thing) -> &'a Thing { /* ... */ }

This lifetime is a) a generic, b) lasts for a particular scope, and c) can be compared across different objects. This seems incredibly similar to the common information that's the difference between a complete and incomplete object; and indeed, the lifetime is precisely the difference between a reference and a pointer.

In fact, basically the only difference between Rust's existing lifetimes, and the scoped constants I defined above, is that the scoped constants have a nontrivial value which exists at runtime, whereas the lifetime is only a compile-time construct. In a way, Rust's existing lifetimes can be seen as "scoped constants, except restricted to being zero-sized types". (Similarly, Rust's existing constants can be seen as "scoped constants, but with 'static lifetime".)

One of the things that makes me confident that scoped generics are the correct solution to the problem, therefore, is that they're just a simple generalisation of something that already exists in Rust. In effect, a good way to think about it is "a scoped generic that's a zero-sized type is just a lifetime, and thus it could be used to implement a lifetime" – and when you do that, you end up with Rust's existing implementation of lifetimes, which is an encouraging sign.

Scoped generics and generativity

Earlier in this document, I said I was listing all the solutions to the "scoped generic problem" I could think of, except one – and this section is about the remaining solution. If you read a lot of blog posts about Rust that discuss how to do various things soundly, you'll occasionally see people say things along the lines of "this code does runtime checks to ensure soundness and will panic if the wrong arguments are given; you could ensure that everything works correctly at compile time by using generativity", together with an implication that the compile-time solution is too difficult, complicated or confusing to use in practice. (It certainly seems to be the case that people don't generally use it in practice!) The original source of the technique seems to be Gankra's thesis (PDF), section 6.3 (and even it contains a disclaimer, "It should be noted that this trick is very delicate, and I don't expect it to see much use").

In this section, I'm going to be discussing how generativity works, why it hasn't really caught on as a solution to the problem, and how it would be possible to implement scoped generics in the existing Rust compiler via doing something very similar.

Defining "generativity"

So what is generativity? The blog posts make it sound like it's a programming technique, but it's actually a property that abstraction mechanisms can either have or not have, i.e. they can either be generative or non-generative (with neither being clearly better – they're useful in different situations). The basic idea is that an abstraction mechanism is generative if you can use it to abstract the same thing multiple times (e.g. by calling it in a loop) and end up with results that are treated as different by the compiler (i.e. the "these things are the same" relationship got abstracted away, in addition to the other things being abstracted).

For example, in Rust, impl Trait is non-generative:

trait DebugGenerator {
    fn gen_debug(&self) -> impl Debug; // abstracts the type being returned
}

fn debug_generator_test<T: DebugGenerator>(t: &T) -> impl Debug + '_ {
    [t.gen_debug(), t.gen_debug()]     // but it's the same type every time
}

You can't store two objects in an array unless they have the same type as each other, so the impl Debug abstraction, while it has hidden the exact type that gen_debug returns, has not hidden the fact that it's the same type every time (i.e. the guarantees are "this method returns some type that implements Debug, and it's always the same type if called on the same type of object").
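To make that concrete, here's a runnable sketch of the same idea using a free function rather than a trait method (the function and its return value are illustrative, not from any real library) – the opaque type behind impl Debug is the same on every call, so two results can share an array:

```rust
use std::fmt::Debug;

// `impl Debug` hides the concrete return type, but the compiler still
// knows every call returns the *same* hidden type.
fn gen_debug(n: i32) -> impl Debug {
    (n, "hidden")
}

fn main() {
    // Both elements have the same opaque type, so the array is legal.
    let pair = [gen_debug(1), gen_debug(2)];
    println!("{:?}", pair);
}
```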

Meanwhile, lifetimes have behaviour that's very close to the definition of generativity:

#[derive(Clone, Copy, Debug)]
struct Foo<'a>(
    PhantomData<*mut &'a ()> // makes Foo invariant over 'a
);
impl<'a> Foo<'a> {
    fn accept(&self, _ref: &'a i32) {}
}
let x = Foo(PhantomData);

for y in 0..10
{
    x.accept(&y);  // "borrowed value does not live long enough"
}

Syntactically, there's only one call to accept here; and the data it's being given are a) the memory location of y, and b) the lifetime of the reference to y. However, it's being called in a loop, and each time round the loop, the lifetime is different, despite conceptually being more like a type than like a value (given that it exists only at compile time and is zero-sized). This means that the code fails to compile, because x has type Foo<'a>, where there's no valid lifetime for 'a (for each of the ten loop iterations, it has to fit entirely within that loop iteration, and those requirements are impossible to satisfy simultaneously).
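For contrast, a covariant version of the same structure compiles fine, because Rust is free to shrink 'a to fit each loop iteration – here's a runnable sketch (the same shape as the code above, minus the invariance trick):

```rust
use std::marker::PhantomData;

#[derive(Clone, Copy, Debug)]
struct Foo<'a>(PhantomData<&'a ()>); // covariant over 'a

impl<'a> Foo<'a> {
    fn accept(&self, r: &'a i32) -> i32 {
        *r
    }
}

fn main() {
    let x = Foo(PhantomData);
    let mut total = 0;
    for y in 0..10 {
        // Fine: covariance lets 'a shrink to each loop iteration.
        total += x.accept(&y);
    }
    assert_eq!(total, 45);
}
```

This is why the PhantomData in the original example uses `*mut &'a ()` rather than `&'a ()`: only the invariant version forces all ten iterations onto a single lifetime, producing the compile error.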

I'm not sure whether the original definition of the term "generativity" was supposed to cover this sort of behaviour (a lifetime isn't really an abstraction mechanism), but it's generally used that way in the Rust community, so let's use it here too.

Using generativity for compile-time guarantees

The mentions of "generativity" in the Rust literature are generally in the context of observing that you can use that behaviour of lifetimes (and in particular invariant lifetime generics) to, effectively, create values of a unique type. Here's what it would look like when used to implement the common information, together with an example of the compiler catching an attempt to use two different pieces of common information with a function that requires two of the same:

#[derive(Debug, Clone, Copy)]
// `*mut &'a ()` is the simplest type that's invariant over 'a
struct CommonInfo<'a, T>(T, PhantomData<*mut &'a ()>);
fn let_const<T, R>(x: T, f: impl for<'a> Fn(CommonInfo<'a, T>) -> R) -> R {
    f(CommonInfo(x, PhantomData))
}

fn assert_type_eq<T>(_x: T, _y: T) {}
let_const(1, |sg1| {
    let_const(2, |sg2| {
        assert_type_eq(sg1, sg2);  // "borrowed data escapes out of closure"
        dbg!(sg1, sg2);
    });
});

The basic idea is that let_const creates a CommonInfo object with a lifetime 'a that is different from that of all other CommonInfo objects. Doing this in Rust is actually surprisingly difficult (the compiler tries very hard to make the lifetimes in your program work correctly, often by making multiple lifetimes the same) – I made a lot of failed attempts to create a unique lifetime. This particular implementation does it by generating the lifetime using a higher-ranked trait bound (for<'a> Fn) – f has to be a function that is able to accept common information with arbitrary lifetimes, and thus has to assume that any lifetime it might be given could potentially be different from any other specific lifetime. Unfortunately, this comes with the cost that the end of the scope has to be marked explicitly, rather than being inferred by the compiler (as is normal with let bindings). Apparently a better solution exists, but it's even more complicated and relies on the exact details of the way destructors work in Rust.

In order to make CommonInfos created by different let_const calls incomparable, the structure is made invariant over 'a, via using PhantomData to change the variance. (A PhantomData field constrains the containing type's variance to be compatible with that of the given type; *mut &'a () is invariant over 'a, so PhantomData<*mut &'a ()> makes CommonInfo invariant over 'a too.) Normally, if Rust sees two different lifetimes that are supposed to be the same, it merges them into a single lifetime in a way that depends on their variance, but the invariance of CommonInfo prevents it from doing that.
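As a sanity check that the happy path still compiles, here's a runnable sketch using the same let_const definition: within a single let_const scope, every copy of the common information carries the same lifetime brand, so comparing copies type-checks fine.

```rust
use std::marker::PhantomData;

// Same definitions as in the article, reproduced so the sketch stands alone.
#[derive(Debug, Clone, Copy)]
struct CommonInfo<'a, T>(T, PhantomData<*mut &'a ()>);

fn let_const<T, R>(x: T, f: impl for<'a> Fn(CommonInfo<'a, T>) -> R) -> R {
    f(CommonInfo(x, PhantomData))
}

fn assert_type_eq<T>(_x: T, _y: T) {}

fn main() {
    let v = let_const(10, |sg| {
        let copy = sg; // CommonInfo is Copy
        assert_type_eq(sg, copy); // same brand: compiles fine
        sg.0 + copy.0
    });
    assert_eq!(v, 20);
}
```

Only when two *different* let_const scopes are nested, as in the example above, do the brands differ and the comparison get rejected.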

The resulting technique can be used to improve either the "incomplete objects" or "complete objects" solution:

// A complete object
pub struct SparseBigInt<'a> {
    base: CommonInfo<'a, i32>,
    digits: BTreeMap<i64, i32>
}
// now this only works on two SparseBigInts with the same base
impl<'a> Add for SparseBigInt<'a> { /* ... */ }

// An incomplete object
pub struct DigitList<'a> {
    digits: BTreeMap<i64, i32>,
    // plus a PhantomData field to make the struct invariant over 'a
    // (Rust rejects a lifetime parameter that's otherwise unused)
}
impl<'a> DigitList<'a> {
    fn new(base: CommonInfo<'a, i32>) -> Self { /* ... */ }

    // this will only compile if the base is correct for both `self` and `inc`
    fn add_bigint(self, inc: Self, base: CommonInfo<'a, i32>) -> Self {
        /* ... */
    }
}

The gain is in solving the "the common information doesn't match" problem. In the case of the complete objects, it's now impossible to mix two objects that have different bases – the Add trait, by default, can only add two objects of the same type (it's possible to override this, but the above code doesn't). In the case of the incomplete objects, although the caller still has to specify the common information, the code will no longer compile if they get it wrong; given any DigitList<'a>, there is only one CommonInfo<'a, _> with the same lifetime 'a, so it must be the same one that was specified when the DigitList<'a> was created.
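To make the pattern concrete, here's a runnable, deliberately simplified version of the branded incomplete object (using a Vec of digits and i32 values rather than a real bignum, and the let_const from earlier); the methods only accept the CommonInfo with the matching brand:

```rust
use std::marker::PhantomData;

#[derive(Clone, Copy)]
struct CommonInfo<'a, T>(T, PhantomData<*mut &'a ()>);

fn let_const<T, R>(x: T, f: impl for<'a> Fn(CommonInfo<'a, T>) -> R) -> R {
    f(CommonInfo(x, PhantomData))
}

// An "incomplete object": the base is not stored, but the brand ties the
// list to the CommonInfo it was created with.
struct DigitList<'a> {
    digits: Vec<i32>, // least significant digit first
    _brand: PhantomData<*mut &'a ()>,
}

impl<'a> DigitList<'a> {
    fn new(base: CommonInfo<'a, i32>, mut n: i32) -> Self {
        let mut digits = Vec::new();
        while n > 0 {
            digits.push(n % base.0);
            n /= base.0;
        }
        DigitList { digits, _brand: PhantomData }
    }

    // Only a CommonInfo with the same brand as `self` is accepted here.
    fn value(&self, base: CommonInfo<'a, i32>) -> i32 {
        self.digits.iter().rev().fold(0, |acc, d| acc * base.0 + d)
    }
}

fn main() {
    let v = let_const(10, |base| {
        let dl = DigitList::new(base, 321);
        dl.value(base)
    });
    assert_eq!(v, 321);
}
```

Passing a CommonInfo from a different let_const scope to `value` would fail to compile, which is exactly the compile-time check the section describes.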

This generativity-based solution is, in a sense, a way to use scoped generics to implement scoped generics – Rust currently has only zero-sized scoped generics (lifetimes), but it's still possible to use those for their "these two types are incompatible" properties to create a range of values that each have a unique type, and use those as non-zero-sized scoped generics.

All this is clearly better than either the complete or incomplete object solutions in the sense of compile-time safety; you basically end up with the same thing as the original solution, but with a little extra help from the compiler to reject incorrect code. Unfortunately, there are still several problems it doesn't solve: most notably, it doesn't help with implementing traits (the complete objects still can't implement Default or From even with help from the lifetime generic, and likewise the generic doesn't help the incomplete objects to implement Add, Display or Drop). Additionally, the ergonomics are pretty bad: the let_const function I have above is somewhat ugly to use, and yet I couldn't find anything better; and there are lifetime parameters on the structures which mostly don't act like lifetimes normally do in Rust, which could be confusing. So, despite being safer than the alternatives, I'm not surprised that programmers typically don't bother with this in practice.

How to implement scoped generics

Generativity might not be a full solution to the scoped generics problem – which is why a new language feature is required – but it does provide something of a roadmap to implementing one. With the key insights of "lifetimes are basically zero-sized scoped generics" and "invariant lifetimes can be used to tag types to ensure the types match", it turns out to be fairly easy to implement scoped generics via extending the compiler. There are two main things that need to be implemented: let const (or whatever the name gets bikeshedded into), and the implementation of the generics themselves. Both can be implemented via getting the compiler to desugar the code primarily into things it already knows how to implement (but the desugaring can't be implemented directly by the programmer, as it can affect code in other crates).

let const is fairly simple, and works just as in the generativity solution. A scoped generic basically desugars into a pair: one half is a variable or function/method parameter, and the other is a lifetime. Creating the variable half of things is just the pre-existing let statement; that then needs to be paired with a unique lifetime. The "create a unique lifetime that cannot be equal to any other lifetime" part is hard to write in current Rust code, but should be an easy primitive to add to the compiler (it seems somewhat easier to implement than most of the things the compiler already does with lifetimes).

A scoped generic parameter on types desugars into an invariant lifetime – as with the "incomplete objects" solution, the value of the generic is not stored with the object itself (because doing so is inefficient – as it can require storing many copies of the same data – and is unsound in cases where the value is a mutable reference).

In order to compensate for this, a scoped generic parameter on a method or function desugars into two things: a) a generic lifetime parameter, as with the objects; but b) an additional function/method parameter, which is used to tell the method or function the value of the scoped generic. Here's an example of code that the programmer might write, using scoped generics:

#[derive(Default, Debug)]
pub struct SparseBigInt<const 'a BASE: i32> {
    digits: BTreeMap<i64, i32>
}

fn double<const 'a BASE: i32>(x: &SparseBigInt<BASE>) -> SparseBigInt<BASE> {
    // sugar for: add::<BASE>(x, x)
    add(x, x)
}

and the code it would desugar into:

#[derive(Default, Debug)]
pub struct SparseBigInt<'a> {
    digits: BTreeMap<i64, i32>,
    // code to make the structure invariant over 'a
}

fn double<'a>(x: &SparseBigInt<'a>, BASE: i32) -> SparseBigInt<'a> {
    add::<'a>(x, x, BASE)
}

This desugaring is applied even if the scoped generic parameter is not written directly, but implied via a type generic (when the type in turn has a scoped generic) – and trait methods are effectively "generic in the type of their receiver" for this purpose. For example:

// a standard library trait
pub trait Drop {
    fn drop(&mut self);
}
impl<const 'a BASE: i32> Drop for SparseBigInt<BASE> {
    fn drop(&mut self) {
        eprintln!("Dropping a SparseBigInt with base {}", BASE);
    }
}

desugars into:

// desugaring applied post-monomorphization, where trait methods have been
// replaced with functions that operate on one specific type that implements
// that trait
fn SparseBigInt::drop<'a>(self: &mut SparseBigInt<'a>, BASE: i32) {
    eprintln!("Dropping a SparseBigInt with base {}", BASE);
}

Implying an extra parameter on a standard library trait method, like this, is something that Rust code simply can't do at the moment – but it's the capability that makes the whole thing work. The callers of the trait methods – in this case, everywhere that the program tries to drop a SparseBigInt value – will always have access to the correct value of BASE: they can't manipulate a SparseBigInt<BASE> without having both the lifetime associated with BASE and the value of BASE, which are either directly available from the let const or provided as parameters. The desugaring can therefore just fill the parameter in.

Mutable-reference scoped generics are implemented in almost exactly the same way; the only difference is that each use of the scoped generic as a value has a reborrow inserted (&mut *REF). The compiler does need to merge uses of the same scoped generic into a single parameter – which is always possible, because uses of the same scoped generic share a lifetime, and different scoped generics always have different lifetimes with this implementation method. Given that, the restrictions on what is allowed while a scoped generic is mutably borrowed will automatically ensure that only one borrow of the value exists at a time: as long as the first borrow exists, all the language constructs that could name or otherwise refer to the value being borrowed are banned, so there's no way to create a second borrow.

One of the reasons I feel confident that this implementation of scoped generics is the correct one is that it can be implemented in such a neat way – it's basically a combination of implying extra parameters onto methods in order to handle visibility problems, plus a soundness analysis that is the same as the existing soundness analysis for lifetimes. Additionally, because it effectively desugars into existing safe Rust code, there is little risk of accidentally introducing a soundness hole; expressing something in safe Rust is one way to demonstrate that it's sound, and because the scoped generics mechanism can be desugared into safe Rust (with sufficient help from the compiler), that demonstrates that it is sound, too.

A little bit of magic

One final check is worthwhile. New language features often add new capabilities to the language, and those let you write programs you couldn't have written before. However, it's not uncommon to make a mistake in defining the feature, which lets you go too far and do things that weren't meant to be possible, or are even potentially unsound. So, it makes sense, when defining a new language feature, to attempt to do the impossible, and see where you get stuck.

In this case, let's look at the most fundamental thing that's meant to be impossible in safe Rust (barring the use of standard library APIs that use unsafe internally): making something that's both mutable and shareable at the same time. Generally in Rust, you have to choose between a shared reference (&) and a mutable reference (&mut), and you can't have both sorts of reference to the same data at the same time.

In several programs, this restriction tends to get in the way, and one commonly attempted workaround is to represent references by using indexes into a vector that gets passed around from function to function by mutable reference. When a function needs to do the equivalent of dereferencing a reference, it just accesses the element of the vector at the appropriate index. That acts like a mutable reference, in that you can change the element – but it also acts like a shared reference, in that you can store the index in multiple places (after all, it's just an integer).
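Here's a minimal runnable sketch of that workaround in today's Rust (the Handle type and the arena functions are illustrative names, not from any real library):

```rust
// A copyable "reference": just an index into the arena vector.
#[derive(Clone, Copy)]
struct Handle(usize);

fn alloc(arena: &mut Vec<i32>, value: i32) -> Handle {
    arena.push(value);
    Handle(arena.len() - 1)
}

fn set(arena: &mut Vec<i32>, h: Handle, value: i32) {
    arena[h.0] = value;
}

fn get(arena: &[i32], h: Handle) -> i32 {
    arena[h.0]
}

fn main() {
    let mut arena = Vec::new();
    let h = alloc(&mut arena, 1);
    let h2 = h; // "shared": the handle is freely copyable
    set(&mut arena, h, 42); // "mutable": writes through the one &mut Vec
    assert_eq!(get(&arena, h2), 42); // both copies observe the update
}
```

The handles behave like shared references (copy them anywhere) while the single &mut Vec provides mutability – which is exactly the combination the borrow checker normally forbids for real references.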

This might induce a suspicion that scoped generics have gone too far in terms of power – after all, the vector here is acting very much like common information, and a mutable reference to a vector is a permitted type for a scoped generic. So let's try to use a scoped generic that does this in an attempt to break the rules of Rust.

A first attempt is to create a "reference" type that's actually an index into a vector, with the vector being a scoped generic:

struct OmniReference<T, const 'a VEC: &mut Vec<T>>(usize, PhantomData<&'a mut T>);
// Rust doesn't support #[derive(Copy)] for types with non-Copy generics
// but it does let you implement it manually:
impl<T, const 'a VEC: &mut Vec<T>> Clone for OmniReference<T, VEC> {
    fn clone(&self) -> Self { *self }
}
impl<T, const 'a VEC: &mut Vec<T>> Copy for OmniReference<T, VEC> {}

impl<T, const 'a VEC: &mut Vec<T>> OmniReference<T, VEC> {
    fn new(t: T) -> Self {
        let reference = VEC.len();
        VEC.push(t);
        OmniReference(reference, PhantomData)
    }
    fn set(self, t: T) {
        VEC[self.0] = t;
    }
    fn get(self) -> T where T: Copy {
        VEC[self.0]
    }
}

So far, we haven't broken any of the scoped generic rules (VEC.push(t) references the generic T while VEC is mutably borrowed, but this is allowed because T is mentioned in the type of VEC). There's an interesting T: Copy restriction on the get method, which happens because we need to be able to copy the T out of the vector; we can't just return a reference because when a &mut reference is used as a scoped generic, the lifetime of the scoped generic is limited to the lifetime of the method referring to it. This API also looks very familiar, in that I'm pretty sure I've seen a "reference that you can write to and copy out of, but not take references to the value inside" before. Doesn't Cell work like that?

Let's try to break things, by doing things that Cell can't do. One possible workaround to the Copy problem might be to use a callback function, that can act within the lifetime of the vector element. Or maybe we could work around the lack of Copy by cloning the value:

impl<T, const 'a VEC: &mut Vec<T>> OmniReference<T, VEC> {
    fn map(self, mapper: impl FnOnce(&mut T)) {
        (mapper)(&mut VEC[self.0])
    }
    fn get_clone(self) -> T where T: Clone {
        VEC[self.0].clone()
    }
}

One of these gets rejected by the compiler. In map, the argument position impl Trait is shorthand for a type generic, with the signature desugaring to this:

fn map<F: FnOnce(&mut T)>(self, mapper: F)

but there's an attempt to call a method on mapper while VEC is borrowed, which would require using the type generic F, and you can't use a function's unrelated type and scoped generics while it borrows a mutable-reference scoped generic (F is not mentioned in the type of VEC).

get_clone, however, is more interesting. It is calling a method on a type generic, but in this case, the type generic is mentioned within the type of the scoped generic itself, so this is permitted. Normally, cloning the contents of a Cell is disallowed, in case the implementation of Clone has a shared borrow of the same cell and attempts to write to it while it's being cloned. However, in this case, that particular failure case can't happen: if T has any scoped generics, they must have a lifetime that starts strictly before 'a did (because any scoped generics mentioned in T must have been alive at the time that the let const VEC statement was run), so none of them can contain a reference to VEC, and there's no mechanism other than scoped generics to get at the value of VEC (because it's mutably borrowed by the scoped generic). So, get_clone is sound, and can be proven to be sound, even though the corresponding operation on Cell wouldn't be.

The upshot of all this is that the resulting OmniReference is a) sound, and b) basically Cell, but with the compiler better able to understand which operations on it would be sound, because it can use the scoped generics as a guide to know what might potentially cause a double mutable reference. Being able to implement something more powerful (if only slightly more powerful) than Cell is fun, and might potentially be a useful addition to Rust's arsenal (beginners to Rust often try to use mutable structures with lots of Rc and RefCell, and being able to avoid the runtime checks of RefCell seems like it could help make that sort of program more reliable by reducing the number of opportunities for runtime panics). It is, of course, less efficient due to the extra layer of indirection.
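For reference, here's the behaviour of the standard library's Cell that the OmniReference sketch mirrors – shared aliases coexisting with mutation, but no safe way to get a reference into the contents:

```rust
use std::cell::Cell;

fn main() {
    let c = Cell::new(1);
    let alias = &c; // shared references to the Cell can coexist...
    c.set(5); // ...with mutation through any of them
    assert_eq!(alias.get(), 5);
    // There is no safe method returning a `&i32` into the Cell, and
    // `get` requires `T: Copy` – the same restriction the OmniReference
    // sketch arrived at, though Cell has no `get_clone` at all.
}
```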

Something much more dramatic has been managed here, though: this is an implementation of Cell entirely in safe code. That isn't something that's "intended" to be impossible, in the sense that it inherently opens up soundness holes. It is, however, something that most Rust programmers probably wouldn't expect to be possible in the language (and which, as far as I can tell, isn't possible without scoped generics or some language feature like them).

It would be nice if this could be used directly to replace Cell – it would remove a lot of complexity from the language (there are quite a lot of UnsafeCell-related special cases needed for the compiler to correctly reason about code that uses Cell), perhaps meaning that a scoped generics feature would remove more complexity than it adds. Unfortunately, it probably isn't actually capable of doing that, both because it's less efficient, and because it would involve adding a scoped generic to every function, method and type in the entire program that uses Cells (which would be pretty ugly and annoying to deal with). Still, it might serve as a starting point to think about ways to simplify the language – I prefer it when programming languages are based on a few deep ideas that solve a wide range of issues, rather than special-case fixes like UnsafeCell that act differently from the rest of the language and need special handling.

Future directions

Here are some ideas about ways that the scoped generics mechanism could be improved in the future.

Generics in Rust can lead to a lot of syntactic noise sometimes; and scoped generics would introduce generics in places which previously wouldn't use them. Some people might not find the gain in safety to be worth the increase in syntax complexity, and as such, it's probably worth thinking about ways to make the syntax for generics nicer (especially as those would also benefit programs that don't use scoped generics).

The restrictions I've listed above on mutable-reference scoped generics are stricter than they really need to be; it would be safe to use unrelated generics as long as they can be proven to be entirely unrelated (rather than apparently unrelated but able to access the same scoped constant as another generic). That would most likely involve inventing some new sort of where clause that says "where these two generics have no scoped constants in common" – that way, maybe map would work on the scoped-generic cells too.

On another note, there's one particularly intriguing future direction for scoped generics. It would open up a huge rabbit hole, and definitely shouldn't be included in an initial implementation, because the implications are so far-reaching and because it would likely be very difficult to implement. But it might turn out to make safe Rust capable of expressing a much wider range of programs, and (for someone who already understands scoped generics) it would be very easy to learn and understand. Look at this hypothetical code:

// define a const generic, so SIZE is being treated as constant in this scope
let const SIZE: usize = /* some size input by the user */;
// SIZE is being treated as constant, so it can be used as an array dimension
let array = [0u32; SIZE];
// and although this array is dynamically sized, it still implements Sized
println!("Array size in bytes: {}", mem::size_of::<[u32; SIZE]>());
// let's create a variable-sized structure that still implements Sized
#[derive(Default)]
struct Stretchy<const 'a SIZE: usize> {
    a: [bool; SIZE],
    b: [f32; SIZE],
}
let stretchy = Stretchy::<SIZE>::default();

This doesn't work based on the above definition of scoped generics: an array size is a const generic, not a scoped generic (so it has to be an actual constant, not a scoped constant). Still, the whole thing seems to work in terms of soundness, and might well make code more efficient: a slice reference is twice the size of an array reference, so if two or more variable-sized arrays use the same scoped constant for their size, that saves memory compared to using two slice references. It would also provide easy solutions to problems like "how do you do an alloca in Rust" (meaning that you wouldn't need a separate "unsized locals" feature for that – just make them sized), and would make C code easier to port to Rust (in C, it's legal to simply declare a variable-length array like the one above, although the relative lack of compile-time checks there makes it less difficult to implement). It does seem like it might be very difficult to implement, though, with all sorts of things that were previously compile-time constants suddenly becoming able to change at runtime.
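For comparison, today's const generics already deliver the "shared size" benefit whenever the size is a compile-time constant – this runnable sketch shows a statically shared array length and the thin-versus-fat reference width difference mentioned above (scoped generics would extend the same shape to sizes chosen at runtime):

```rust
use std::mem;

// N is statically shared between both arrays, so no length checks are
// needed to know the two inputs match.
fn sum<const N: usize>(a: &[u32; N], b: &[u32; N]) -> u32 {
    a.iter().chain(b.iter()).sum()
}

fn main() {
    const SIZE: usize = 4;
    let a = [1u32; SIZE];
    let b = [2u32; SIZE];
    assert_eq!(sum(&a, &b), 12);
    // An array reference is a thin pointer; a slice reference is a fat
    // (pointer, length) pair, twice the size.
    assert_eq!(mem::size_of::<&[u32; SIZE]>(), mem::size_of::<usize>());
    assert_eq!(mem::size_of::<&[u32]>(), 2 * mem::size_of::<usize>());
}
```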

Conclusion

Rust needs scoped generics. They solve a problem which comes up very frequently – it's an issue for basically every Rust program I write – and currently has no good solutions; and they solve it in a way that's consistent with the rest of the language (with lifetimes and const generics being special cases of scoped generics). They also solve the same problem that "using generativity" is intended to solve, but in a way that's sufficiently simple and ergonomic that people might actually use it. And all those advantages come at a relatively low cost – compared to most language features that have this sort of scope and power, they should be fairly easy to implement (given how similar they are to lifetimes, which Rust already implements).