Ferrous Oxide For Jaguars And Incremented Crocodiles

Caveat lector: the primary purpose of the article is to introduce a reader proficient in one of the popular object-oriented languages how not to program in Rust. While each feature of the language will be briefly introduced where it is used, no great efforts will be made to explain the feature in detail. Links to the Rust book should provide that.

That being said, let’s jump into the pit.

Let’s assume, for the sake of argument, that we want to write a Rust library that will add numbers. The library should not just add numbers, like an ordinary example library would, with some add function it provides as its only interface. We want something better. We want something that will add any number of numbers, in any combination: we want to evaluate expressions. And not just constant expressions either, we want to evaluate expressions with constants and variables.

“Aha!” our observant mind says, “I will need an abstract base class with an evaluate function and I’m done!” We then notice that Rust has no abstract base classes. It has no classes at all. It doesn’t even have abstract, really. What it does have though are types and traits. Types and traits, used together, are not quite enough to implement our idea. We will also need trait objects, which behave a lot like interfaces.

“Okay, so I’ll do an interface-based design”, we say. Since we want to add constants and variables, in any combination, we will need variables, constants, and combinations. Variables will be one struct that contains the variable (somehow), constants will be a struct that contain the constant, and combinations will be another struct that contains any of the three. Easy, right?

First steps

After some quiet hacking, we might have come up with something like this.

trait Expression {
  fn eval(&self) -> i32;
}

struct Sum(Expression, Expression);

impl Expression for Sum {
  fn eval(&self) -> i32 {
    self.0.eval() + self.1.eval()
  }
}

struct Constant(i32);

impl Expression for Constant {
  fn eval(&self) -> i32 { self.0 }
}

fn main() {
  let expression = Sum(
    Constant(1),
    Sum(Constant(2), Constant(3)));

  println!("value is {}", expression.eval());
}

That looks good so far, much like we’d do it in any other language. Unfortunately it doesn’t work. The compiler complains in line 5, about the first member of the struct: does not have a constant size known at compile-time. Remember that trait objects are similar to interfaces. Interface have to point somewhere, they are not objects on their own, because we cannot know what kind of data their implementation may contain. We only know which methods it has. Rust is no different here: trait objects are effectively pointers. And what we’ve written down so far treats them like they were objects, which they are not. So we’ll have to put the trait objects onto the heap, by putting them into boxes.

trait Expression {
  fn eval(&self) -> i32;
}

struct Sum(Box<Expression>, Box<Expression>);

impl Expression for Sum {
  fn eval(&self) -> i32 {
    self.0.eval() + self.1.eval()
  }
}

struct Constant(i32);

impl Expression for Constant {
  fn eval(&self) -> i32 { self.0 }
}

fn main() {
  let expression = Sum(
    Box::new(Constant(1)),
    Box::new(Sum(Box::new(Constant(2)), Box::new(Constant(3)))));

  println!("value is {}", expression.eval());
}

This version works, but it still does not have variables. In any other language we’d implement variables much like constants are implemented here: we create a few variables and insert them into the expression wherever they are needed by passing a reference. We keep the original reference around to modify the variable struct, and all is well.

But rust won’t let us do that so easily. We could keep the object on the stack and pass it as a reference, if we were to implement Expression for the reference to our variable. But we can’t modify the object anymore, because we put a reference to it into the box. We could also put a reference to a rust variable into our Variable and create multiple Variables, just like constants. But we can’t modify the rust variable anymore, because we put a reference to it into the Variable. Those are just the rules.

Clearly we need to get around this reference business somehow and write the variable anyway. Luckily for us, rust provides a few types that let us mutate a variable through a simple reference, Cell for example. We might want to put a reference to that Cell into our variable type somehow. But no matter how we write it down, the compiler complains about missing lifetimes or things having to be 'static, whatever that means. We then search the internet, and the internet says “use Rc”. So we’ll put the Cell onto the heap too, with an Rc! Feels just like home, just so much more verbose.

After a lot more hacking (and maybe cursing) we arrive at something that works.

It’s alive!

use std::cell::Cell;
use std::rc::Rc;

trait Expression {
  fn eval(&self) -> i32;
}

struct Sum(Box<Expression>, Box<Expression>);

impl Expression for Sum {
  fn eval(&self) -> i32 {
    self.0.eval() + self.1.eval()
  }
}

struct Constant(i32);

impl Expression for Constant {
  fn eval(&self) -> i32 { self.0 }
}

struct Variable(Rc<Cell<i32>>);

impl Expression for Variable {
  fn eval(&self) -> i32 { self.0.get() }
}

fn main() {
  let v = Rc::new(Cell::new(0));

  let expression = Sum(
    Box::new(Variable(v.clone())),
    Box::new(Sum(Box::new(Constant(2)), Box::new(Constant(3)))));

  println!("v=0, value is {}", expression.eval());

  v.set(4);
  println!("v=4, value is {}", expression.eval());
}

Wow, finally it works! But it’s so … verbose. And long. And has way too many wrappers for things nobody else would ever wrap. Clearly, rust is not a good language to do even the most simples of things.

Or is it? Let me explain.

So far we have built our program like a programmer coming from Java or C++ might build it, but in doing so we’ve built a rather bad rust program. It contains interfaces because abstract bases are not a thing, it mutates state that is actually shared between multiple users, and it is very much afraid of the lifetime system.

Let’s clear those up one by one.

References and lifetimes

References and lifetimes are probably the most tricky part of rust for any novice. We won’t go into detail on their inner workings here, because the rust book has a fairly complete description of ownership, references and borrows, and lifetimes. What we will do though is explain the transformation we do, and why it is safe.

First of all, what do we want to transform at all? Let’s start with the Rc. We introduced it originally because the compiler complained about lifetimes, so that’s probably a good place to start. Rust complains about references whenever it cannot prove that the reference does not outlive the thing being referenced. For this the compiler uses two pieces of information: the scope of the thing that is being referenced (including lifetimes, where applicable), and the lifetime of the reference itself. In our example we will have to add a lifetime to Variable struct if we want to store a reference in it. Let’s just start there.

use std::cell::Cell;

trait Expression {
  fn eval(&self) -> i32;
}

struct Sum(Box<Expression>, Box<Expression>);

impl Expression for Sum {
  fn eval(&self) -> i32 {
    self.0.eval() + self.1.eval()
  }
}

struct Constant(i32);

impl Expression for Constant {
  fn eval(&self) -> i32 { self.0 }
}

struct Variable<'a>(&'a Cell<i32>);

impl<'a> Expression for Variable<'a> {
  fn eval(&self) -> i32 { self.0.get() }
}

fn main() {
  let v = Cell::new(0);

  let expression = Sum(
    Box::new(Variable(&v)),
    Box::new(Sum(Box::new(Constant(2)), Box::new(Constant(3)))));

  println!("v=0, value is {}", expression.eval());

  v.set(4);
  println!("v=4, value is {}", expression.eval());
}

That, unfortunately, does not work at all. The compiler still complains about v not living long enough: v has to be valid for the static lifetime. That’s because we never specified how long any references within the trait object Expression must be valid. We haven’t specified anything, so rust assumed the only safe thing: 'static, which means “forever”.

To specify this kind of bound rust provides us with some syntax: writing Expression + 'lifetime means that any reference contained in the object backing the Expression must be valid for 'lifetime. We cannot name any lifetime that is not 'static though, so we will have to introduce a lifetime parameter on Sum as well and let the compiler infer appropriate lifetimes from the scopes of the values used. This is actually an important thing to remember: in rust, lifetimes and scopes can often be treated as being the same thing.

use std::cell::Cell;

trait Expression {
  fn eval(&self) -> i32;
}

struct Sum<'a>(Box<Expression + 'a>, Box<Expression + 'a>);

impl<'a> Expression for Sum<'a> {
  fn eval(&self) -> i32 {
    self.0.eval() + self.1.eval()
  }
}

struct Constant(i32);

impl Expression for Constant {
  fn eval(&self) -> i32 { self.0 }
}

struct Variable<'a>(&'a Cell<i32>);

impl<'a> Expression for Variable<'a> {
  fn eval(&self) -> i32 { self.0.get() }
}

fn main() {
  let v = Cell::new(0);

  let expression = Sum(
    Box::new(Variable(&v)),
    Box::new(Sum(Box::new(Constant(2)), Box::new(Constant(3)))));

  println!("v=0, value is {}", expression.eval());

  v.set(4);
  println!("v=4, value is {}", expression.eval());
}

On “interfaces”

We created the Expression trait because we wanted something like an abstract base class, which rust can’t provide, so we went for the next closest thing instead. Using traits in such a way is often not a good idea, and often it is in fact impossible. Rust instead encourages the use of its powerful generics system. These generics, like C++ templates and Java generics, are checked at compile time. Unlike Java generics, there is no runtime cost associated with rust generics. Unlike C++ templates, the compiler gives very good error messages if a generic type is instantiated in a way the type does not expect.

Say, for example, we want to write a global function evaluate that can evaluate all kinds of expressions. Using trait objects we can write this down succinctly;

1
fn evaluate(e: &Expression) -> i32 { /* ... */ }

This does not capture our full intent though. We do not want to evaluate an expression trait object, we want to evaluate anything that is an expression. This is actually a type bound: we are prepared to evaluate an object of any type the user wants to give us, as long as that type has an implementation for Expression. Making the parameter e generic and at the same time bounding it to be an Expression gives us the best of both worlds: we have full type safety, good error messages when we mess up somewhere down the line, and we do not introduce runtime overhead because the type of e is known at compile time.

fn evaluate<E: Expression>(e: &E) -> i32 { /* ... */ }

But in our case, we don’t even need generics. In fact, we could not use them if we wanted to because Sum is inherently recursive: any Sum may contain any number of more Sums, so we cannot know the full type of our final expression at compile time—unless we give up our ability to dynamically create the sums, which would make them quite pointless to have. Luckily we know all possible parts of any summation beforehand: it’s either a Constant, a Variable or a Sum. That looks suspiciously like an enum

So let’s rewrite our program and use an enum instead of structs, using match to destructure the enum instead of dynamic dispatch.

use std::cell::Cell;

enum Sum<'a> {
  Sum(Box<Sum<'a>>, Box<Sum<'a>>),
  Constant(i32),
  Variable(&'a Cell<i32>),
}

impl<'a> Sum<'a> {
  pub fn eval(&self) -> i32 {
    match *self {
      Sum::Sum(ref l, ref r) => l.eval() + r.eval(),
      Sum::Constant(c) => c,
      Sum::Variable(v) => v.get(),
    }
  }
}

fn main() {
  let v = Cell::new(0);

  let expression = Sum::Sum(
    Box::new(Sum::Variable(&v)),
    Box::new(
      Sum::Sum(
        Box::new(Sum::Constant(2)),
        Box::new(Sum::Constant(3)))));

  println!("v=0, value is {}", expression.eval());

  v.set(4);
  println!("v=4, value is {}", expression.eval());
}

Note that we still need to put the left- and right-hand sides of an addition into boxes, otherwise we’d end up with same dilemma we had with generics: the type would be impossible to write down. Also note that when evaluating an addition we match the l and r parts of the addition by ref. If we didn’t do that rust would assume that we want to take ownership of the two boxes, attempt to move them, and fail because we don’t own the entire sum.

This looks a lot nicer already. But we still have the Cell to take care of.

Mutating shared state

Using Cell is generally frowned upon.

Mutating shared state, which Cell provides, is often a sign of a design wanting to be improved. Other languages rely heavily on mutation of shared state and the premise of “the programmer knows what they are doing”. Which totally makes sense, for those languages. Rust, however, was designed to be memory safe. No rust program should ever act on memory in such a way that afterwards memory is somehow bad. This could be caused by any number of things: writing to a pointer that no longer points to valid memory, two threads writing to the same variable simultaneously, one thread reading what another is writing, et cetera.

What we have in our program is one thread reading what another is writing, or rather a variant thereof. We cannot let Variable carry a reference to its value and still mutate the value afterwards because rust enforces, in the compiler, a critical difference between constant and mutable references: if a mutable reference to a variable exists, no user of an immutable reference can know that anything read from the reference is valid. At all.

Without this rule bad things could happen, for example when referencing “large” objects:

1
2
3
4
5
6
7
8
fn main() {
  let b = Rc::new(0i32);

  let br = &b;
  put_into_shared_object(br);

  b = Rc::new(1i32);
}

We have placed a reference to the object b into a shared object. Another thread may read this reference from the shared object, and subsequently attempt to access the referenced object. If this access happens while main is writing to b—the referenced object—the other thread might read some part of the object before it was changed, and some parts after. In the case of Rc this could mean that the other thread reads a pointer to an object that no longer exists, which violates memory safety.

But how does this relate to the expressions we wanted to evaluate?

Rust only enforces these rules on its own references. There are many kinds of references, and references to memory are only one kind. Instead of these memory references we could also use an integer index into an array, and provide the array to eval on every call. This index will not change over time, is always linked to (or references) the same item in a given array, and our expression does not hold a reference to the array. As a consequence we are free to modify the array between calls to eval, and we can thus drop the Cells for variables. Since we no longer have any references in our Sum type we can also get rid of the lifetime 'a.

enum Sum {
  Sum(Box<Sum>, Box<Sum>),
  Constant(i32),
  Variable(usize),
}

impl Sum {
  pub fn eval(&self, variables: &[i32]) -> i32 {
    match *self {
      Sum::Sum(ref l, ref r) => l.eval(variables) + r.eval(variables),
      Sum::Constant(c) => c,
      Sum::Variable(v) => variables[v],
    }
  }
}

fn main() {
  let mut variables = [0];

  let expression = Sum::Sum(
    Box::new(Sum::Variable(0)),
    Box::new(
      Sum::Sum(
        Box::new(Sum::Constant(2)),
        Box::new(Sum::Constant(3)))));

  println!("v=0, value is {}", expression.eval(&variables));

  variables[0] = 4;
  println!("v=4, value is {}", expression.eval(&variables));
}

Now it looks quite neat, doesn’t it?