Getting A Handle On Things

Today we will take a very simple intrusive linked list written in Rust and make it safe. Kind of, anyway.

Before we start making something safe we need an unsafe thing to make safe. Let’s not pretend that what we are doing here is the least bit useful, let us instead do it just for the fun of it. (What we are doing actually is useful, the explanation of which this margin is too narrow to contain.)

Any programmer will have come across linked lists at some point. Linked lists are great because elements can be inserted at any position very quickly, and similarly we can remove elements at any position with very little cost. Intrusive lists are linked lists as well, but unlike the normal linked lists the Rust stdlib provides an intrusive list stores its links in the elements it contains rather than outside of them. (Often this kind of thing is done to avoid memory allocations, or for exposition. Linked lists depicted in computer science textbooks for example are always intrusive.)

The Gathering

If we want to make a safe intrusive linked list from scratch, we must first invent the universean unsafe linked list. To do so we need a few requirements to guide our implementation. Let’s collect a few at random.

  1. An Element stores a u32 value.
  2. An Element may be added only to the front of the list.
  3. An Element may be removed regardless of its position in the list, discarding the stored value.
  4. An Element may be removed from the beginning of the list, returning the stored value.

To make this happen we need two structures: one, the Element, for the elements of the list, and another, the List, for the list itself. We also need three operations: push to add an element, pop to remove an element and return its value, and unlink to remove an element.

To make push possible our List will need a link to the first element of the list, and our Element will need a link to its successor. Unfortunately we cannot use plain references because removing an element from the list would then be impossible–as long as the element is in the list it is borrowed and cannot be mutated. We also cannot use Cells with references inside because the List would borrow any Element for longer than the element itself lives:

struct Element<'a> {
    value: u32,
    next: Option<&'a Element<'a>>
}

struct List<'a> {
    head: Option<&'a Element<'a>>,
}

fn main() {
    let mut list = List(None);
    let elem = Element(None);
    list.0 = Some(&element); // boom
}

We really want to be able to add elements to a list though rather than replacing the list with an entirely new version (as we would do in functional languages). References are out, then. The only thing we have left now are pointers, which are unsafe. So at least we have the unsafety nailed down already, even though we don’t have any operations yet. The pointers will have to be mutable because adding and removing elements mutates their successors in the list.

1
2
3
4
5
6
7
8
struct Element {
    value: u32,
    next: *mut Element,
}

struct List {
    head: *mut Element,
}

We can now implement the push and pop operations, but we can’t implement unlink yet. To implement unlink with this setup we need to know which collection the Element is part of, search for the element, and then remove it from its position using the next and head links. This is rather inefficient, so we would like to do better. We can do better if every element knows not only the next element in the list, but also knows which link leads to itself from the previous element or the List instance, whichever it might be.

1
2
3
4
5
6
7
8
9
struct Element {
    value: u32,
    next: *mut Element,
    ptr_to_self: *mut *mut Element,
}

struct List {
    head: *mut Element,
}

With this setup we can now implement all three operations. All three are rather simple, moving only a bit of data around between pointers. We have no idea how long any element of the list will live, or even the list itself will live, thus all our operations must be unsafe. Nevertheless we can apply some codpiece engineering (“If you can’t make it safe, at least make it so that nobody gets hurt”) and not attempt to unlink elements that are not in the list (or not in the list anymore).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
use std::ptr;

struct Element {
    value: u32,
    next: *mut Element,
    ptr_to_self: *mut *mut Element,
}

impl Element {
    pub fn new(value: u32) -> Self {
        Element{
            value,
            next: ptr::null_mut(),
            ptr_to_self: ptr::null_mut(),
        }
    }

    pub unsafe fn unlink(&mut self) {
        if !self.ptr_to_self.is_null() {
            (*self.ptr_to_self) = self.next;
            if !self.next.is_null() {
                (*self.next).ptr_to_self = self.ptr_to_self;
            }
        }

        self.ptr_to_self = ptr::null_mut();
    }
}

struct List {
    head: *mut Element,
}

impl List {
    pub fn new() -> Self {
        List{ head: ptr::null_mut() }
    }

    pub unsafe fn push(&mut self, e: &mut Element) {
        if !self.head.is_null() {
            (*self.head).ptr_to_self = &mut e.next;
        }
        e.next = self.head;
        e.ptr_to_self = &mut self.head;
        self.head = e;
    }

    pub unsafe fn pop(&mut self) -> Option<u32> {
        if self.head.is_null() {
            None
        } else {
            let head = self.head;
            (*head).unlink();
            Some((*head).value)
        }
    }
}

fn main() {
    let mut l = List::new();
    let mut e1 = Element::new(1);

    unsafe {
        l.push(&mut e1);
        e1.unlink();
        assert!(l.pop().is_none());
    }
}

This looks reasonably good. It might be unsafe, but we’re prepared to deal with that. There’s just one problem: unlink violates the reference model. Nothing prevents an application using this data structure from keeping a mutable reference to an element around and popping the element from the list while the reference is held. Popping the element will then create another mutable reference to the element, but having two mutable references to the same object is not allowed. Replacing the *mut _ fields with Cell<*const _> would not solve that problem: we might no longer have two mutable references, but we would still have two references of which one is mutable. Which is also not allowed.

Nothing short of disallowing mutable references to elements in lists can solve this particular problem. As long as an element is a member of a list there must be no mutable references to this element. None at all. Not even within the implementation of the list, since otherwise unlink would not be allowed to have a reference to the element.

But before we tackle that let’s replace the pointers with Cells. The result will still be incorrect, but it will probably be a step closer to a solution.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
use std::cell::Cell;
use std::ptr;

struct Element {
    value: u32,
    next: Cell<*const Element>,
    ptr_to_self: Cell<*const Cell<*const Element>>,
}

impl Element {
    pub fn new(value: u32) -> Self {
        Element{
            value,
            next: Cell::new(ptr::null_mut()),
            ptr_to_self: Cell::new(ptr::null_mut()),
        }
    }

    pub unsafe fn unlink(&self) {
        if self.ptr_to_self.get().is_null() {
            return;
        }

        (*self.ptr_to_self.get()).set(self.next.get());
        if !self.next.get().is_null() {
            (*self.next.get()).ptr_to_self.set(self.ptr_to_self.get());
        }

        self.ptr_to_self.set(ptr::null_mut());
    }
}

struct List {
    head: Cell<*const Element>,
}

impl List {
    pub fn new() -> Self {
        List{ head: Cell::new(ptr::null_mut()) }
    }

    pub unsafe fn push(&mut self, e: &mut Element) {
        if !self.head.get().is_null() {
            (*self.head.get()).ptr_to_self.set(&e.next);
        }
        e.next.set(self.head.get());
        e.ptr_to_self.set(&self.head);
        self.head.set(e);
    }

    pub unsafe fn pop(&mut self) -> Option<u32> {
        if self.head.get().is_null() {
            None
        } else {
            let head = self.head.get();
            (*head).unlink();
            Some((*head).value)
        }
    }
}

fn main() {
    let mut l = List::new();
    let mut e1 = Element::new(1);

    unsafe {
        l.push(&mut e1);
        e1.unlink();
        assert!(l.pop().is_none());
    }
}

The original problem remains unsolved, but its severity has been reduced slightly. How can we know that no part of the transformation we just applied could have solved the problem?

Contracts and capabilities

We know because we have not changed the signature of push. Adding an item to the list required a mutable reference before the transformation, and after the transformation it still does. Both before and after the transformation we can take a mutable reference on an element, and call push with it, at any time. But adding an element to a list not only changes the element, it also allows the list to change the element later on.

Since our implementation does not allocate memory we cannot take ownership of elements during push. Likewise we want to add elements to the list without having to put them into a Box first because that too would be an allocation. If we could transfer ownership all of our safety problems would solve themselves, but ownership we cannot transfer. Luckily we don’t have to transfer ownership, we only have to exclude all the same operations transferring ownership would exclude.

Let us examine push once again. push receives a mutable reference. Anyone with a mutable reference to an element can also add that element to a list. Extending the lifetime of the reference excludes a second push to the list for the extended lifetime. Once the extended lifetime has elapsed the capability to push is once again available, thus to be safe the element must no longer be in a list when the extended lifetime ends.

To summarize: we must extend the lifetime of the reference given to push and remove the element before the lifetime ends. Removing the element can be done by a Drop implementation on some struct we have yet to invent. This leaves only lifetime extension to be figured out.

To extend the lifetime from the function call to something larger we must bind it to something. Anything within the function body will of course not live long enough, so the something must live in the caller of push. Using a variable in the caller and passing it to push as well will not suffice though: the lifetime must not only enter the function, it must leave it as well. If the lifetime enters the function and never leaves, it can be tightened to encompass only the function invocation itself. Only if it leaves and is bound to a name live after the function is called can the lifetime be extended.

Since now our goal is both to extend the lifetime and to remove the element before the lifetime ends we need both the lifetime and a struct to hold it. To remove the element we also need a reference to the element. Putting these three things together gives us a struct that owns the capability to modify an elements’ list membership: a handle to the element in the list. Adding an element to the list returns a handle to the element, dropping the handle removes the element and allows a new set of operations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
use std::cell::Cell;
use std::ptr;

struct Element {
    value: u32,
    next: Cell<*const Element>,
    ptr_to_self: Cell<*const Cell<*const Element>>,
}

impl Element {
    pub fn new(value: u32) -> Self {
        Element{
            value,
            next: Cell::new(ptr::null_mut()),
            ptr_to_self: Cell::new(ptr::null_mut()),
        }
    }

    unsafe fn unlink(&self) {
        if self.ptr_to_self.get().is_null() {
            return;
        }

        (*self.ptr_to_self.get()).set(self.next.get());
        if !self.next.get().is_null() {
            (*self.next.get()).ptr_to_self.set(self.ptr_to_self.get());
        }

        self.ptr_to_self.set(ptr::null_mut());
    }
}

struct Handle<'a>(&'a Element);

impl<'a> Drop for Handle<'a> {
    fn drop(&mut self) {
        unsafe {
            self.0.unlink();
        }
    }
}

struct List {
    head: Cell<*const Element>,
}

impl List {
    pub fn new() -> Self {
        List{ head: Cell::new(ptr::null_mut()) }
    }

    pub unsafe fn push<'a>(&mut self, e: &'a mut Element) -> Handle<'a> {
        if !self.head.get().is_null() {
            (*self.head.get()).ptr_to_self.set(&e.next);
        }
        e.next.set(self.head.get());
        e.ptr_to_self.set(&self.head);
        self.head.set(e);
        Handle(e)
    }

    pub fn pop(&mut self) -> Option<u32> {
        if self.head.get().is_null() {
            None
        } else {
            let head = self.head.get();
            unsafe {
                (*head).unlink();
                Some((*head).value)
            }
        }
    }
}

fn main() {
    let mut l = List::new();
    let mut e1 = Element::new(1);

    unsafe {
        l.push(&mut e1);
    }
    assert!(l.pop().is_none());
}

Note how pop became safe. Since we can only call pop on a list that exists and the Handles guarantee that no list can point to elements that no longer exist we can derive the guarantee that no pop operation will ever attempt to remove a dead element.

On the other side of the API push is not safe yet. While the handles protect list from dying (or moved) elements, nothing protect elements from dying (or moved) lists. As the implementation stands we may drop a list before all of its elements have been removed, leaving behind dangling pointers for elements to find when they are dropped.

There are two ways to solve this: either the list removes all elements when it is dropped, or all handles must have been dropped before the list is dropped. Clearing the list during drop is straightforward, so let us investigate the other variant instead. To protect the list from dying elements we must ensure that the list outlives all handles–after a handle has been dropped we no longer care about the element itself. In push we already know the lifetime of the handle. All we have to do is constrain the lifetime of the list to be just as long or longer, then the list will always be empty when dropped.

-    pub unsafe fn push<'a>(&mut self, e: &'a mut Element) -> Handle<'a> {
+    pub unsafe fn push<'a>(&'a mut self, e: &'a mut Element) -> Handle<'a> {

This change come with an unfortunate side effect: the list can only contain one element because it too, like the element, will be immutable as long as the handle lives. Having turned the head link into a Cell though we do not need the mutable reference for push and pop, we simply drop the mut and the list works again.

Our list is safe! … as long as no Handle is leaked. Once handles are leaked (eg with mem::forget) all lifetime guarantees are invalidated, allowing elements to be dropped while they are still part of a list. Removing an item from the list when item itself is dropped is equally unhelpful since the item itself may be leaked just as well as its handle. The Rust standard library goes through great pains to avoid breaking any invariants when handles are dropped, even though it does not guarantee your program will continue to make forward progress afterwards (eg when a MutexGuard instance is leaked the lock will never be available again).

Whether we can provide more safety with our implementation depends greatly on the constraints under which we intend to use our list. If the list must outlive its elements there is no way we can make it intrisically safe by design of the list interface alone: handles may be leaked and never remove their item from the list, items placed on the stack may be forgotten by mem::forget and become invalid. Macros can be used to solve this, encapsulating the unsafe interface into a safe exterior–as long as we do not rely on drop to do our cleanup but explictly write the cleanup out (maybe automatically). Under different constraints, for example when the list need not outlive its elements, other interfaces with other restrictions and guarantees are possible.

Epilogue

Resources handles, the main technique we used to make our list as safe as possible, are a very powerful tool. They cannot solve every problem that requires some sort of cleanup, but for many problems they can go long way towards a good solution. Most resource handles used in daily life bind the lifecycle of operating system objects (such as network sockets or files) or synchronization scopes to control flow in the program, neither of which break semantics of the program when they are leaked–in worst case scenarios the program may stop working, but it should never crash. Using resource handles to bind lifetimes of references to control flow is a lot uncommon, but not unheard of. BTreeMap::entry and its relatives in the other maps are a good example of this in the standard library.

Once we leave the cozy confines of userspace programming on large (or large-ish) machines we see patterns like this become more and more common. Many resource restrictions we can statically reason about, such as “no allocations for list items” here, can also be phrased as a resource applications. (Not all of them can, of course, take for example the restriction “will not exceed 1GB memory usage”. Static reasoning about such restrictions is often possible for real-world application, but we very soon leave compilers behind and work mostly in deeply formalized logic and proof assistants.) The more severe our restriction is, the less likely it becomes that resource handles alone can solve the problem of enforcing those restrictions.

All of these handles follow the same pattern though: let the compiler and the libraries do stuff for you whenever it is possible (and safe) to do so. Some assembly may be required, some warnings might be issued. But, if possible, embrace your new RAII overlords. They mean well.

Acknowledgements

User Diggsey on /r/rust poked me to clarify that handles are unsafe when leaked.