Rust/Leetcode Primer: Trees

Lately I've been brushing up on data structures and algorithms by working through problems on leetcode. I've solved about 150 or so problems using Python and now I've started to tackle them with Rust. I wanted to write down some notes about Rust-specific idioms as I come across them, as the rules about ownership and mutability can sometimes make problems that are easy in Python or C++ tricky to solve in Rust.

Trees

First, some definitions. When I mention 'trees', I'm talking about the Tree data structure. To be precise, trees are connected, acyclic graphs with N vertices and N-1 edges. We usually think about trees going in one direction. You start with a vertex (the root), and then navigate to that vertex's children. A vertex with no children is called a leaf. On leetcode there are many questions which ask about binary trees, where each vertex has 0-2 children (left and right), which is what we'll look at here. Most of the ideas are easily generalized to N-ary trees.

A typical binary tree (Source: Wikipedia, Public Domain)

There are several flavors of binary trees, like 'balanced', 'almost balanced', 'skewed', etc. that can change how you solve the problem, but the problems on leetcode will usually explain precisely what the terms mean (with examples).

Luckily (and sometimes unluckily), leetcode defines the TreeNode implementation for you in all the problems I've seen, and it goes like this:

// Definition for a binary tree node.
#[derive(Debug, PartialEq, Eq)]
pub struct TreeNode {
  pub val: i32,
  pub left: Option<Rc<RefCell<TreeNode>>>,
  pub right: Option<Rc<RefCell<TreeNode>>>,
}

impl TreeNode {
  #[inline]
  pub fn new(val: i32) -> Self {
    TreeNode {
      val,
      left: None,
      right: None
    }
   }
 }

An Option models the binary tree perfectly, as we have left and right children which could be null (None), or else contain a subtree (Some(TreeNode)). In practice, having a child be owned by its parent in the Rust sense is not very easy to work with, so it's wrapped by Rc<RefCell<TreeNode>>, which offers shared ownership of nodes via Rc and interior mutability (we can break the rules about borrowing) with RefCell.

A typical tree problem will require you to push nodes onto stacks or queues, so it just wouldn't really work without Rc<RefCell<TreeNode>> in safe Rust (it depends, see Real World Trees below).

Let's see how to work with these things.

Example - Working with the tree immutably

Let's work through an easy-level problem, Same Tree (#100). You can go read the definition and try to solve it, if you like.

For this problem, we want to check if two binary trees are 'the same', meaning they have all the same values in the same tree structure.

The key observation to solving this problem is that you should traverse both trees at the same time and compare pairs of nodes (one from tree A, and one from tree B). As long as you traverse both trees in the same way, any pair you find that doesn't match indicates the trees are not the same.

Here's my (accepted) solution using a stack (depth-first search).

use std::rc::Rc;
use std::cell::RefCell;

type MaybeNode = Option<Rc<RefCell<TreeNode>>>;

impl Solution {
  pub fn is_same_tree(p: MaybeNode, q: MaybeNode) -> bool {
      let mut stack = vec![];
      stack.push((p, q));
      while !stack.is_empty() {
          // unwrapping is safe because of our while condition
          let pair = stack.pop().unwrap();
          match pair {
              (Some(p), Some(q)) if p == q => {
                  stack.push((p.borrow().left.clone(), q.borrow().left.clone()));
                  stack
                      .push((p.borrow().right.clone(), q.borrow().right.clone()));
              }
              (None, None) => {}
              _ => {
                  return false;
              }
          }
      }
      true
  }
}

As you can see, this is similar to how you would write this in your typical imperative language. I created a type alias to avoid typing Option<Rc<RefCell<TreeNode>>> all the time. The only real ugly bit is dealing with the Rc<RefCell>>s... can we make that better?

To start off, we can borrow p and q once each.

match pair {
    (Some(p), Some(q)) if p == q => {
        let p = p.borrow();
        let q = q.borrow();
        stack.push((p.left.clone(), q.left.clone()));
        stack.push((p.right.clone(), q.right.clone()));
    },
    (None, None) => {},
    _ => { return false; }
}

That's looking better already. Do we really need to clone them?

match pair {
    (Some(p), Some(q)) if p == q => {
        let p = p.borrow();
        let q = q.borrow();
        stack.push((p.left, q.left));
        stack.push((p.right, q.right));
    },
    (None, None) => {},
    _ => { return false; }
}

error[E0507]: cannot move out of dereference of `std::cell::Ref<'_, TreeNode>`

Right... I have to borrow it somehow.

match pair {
    (Some(p), Some(q)) if p == q => {
        let p = p.borrow();
        let q = q.borrow();
        stack.push((&p.left, &q.left));
        stack.push((&p.right, &q.right));
    },
    (None, None) => {},
    _ => { return false; }
}

error[E0597]: `p` does not live long enough

No dice. Same thing if you try to as_ref() the Options. The problem is our references get dropped at the end of the match, so we have to clone to put them on the stack.

What if we traded our stack for the call stack, and used recursion?

use std::rc::Rc;
use std::cell::RefCell;

type MaybeNode = Option<Rc<RefCell<TreeNode>>>;

impl Solution {
    pub fn is_same_tree(p: MaybeNode, q: MaybeNode) -> bool {
        fn helper(p: &MaybeNode, q: &MaybeNode) -> bool {
            match (p, q) {
                (Some(p), Some(q)) if p == q => {
                    let p = p.borrow();
                    let q = q.borrow();

                    helper(&p.left, &q.left) && helper(&p.right, &q.right)
                },
                (None, None) => true,
                _ => false
            }
        }

        helper(&p, &q)
    }
}

This looks a lot nicer! We're able to dodge most of the Rc<RefCell> stuff (just one immutable borrow) because the compiler is able to reason about lifetimes more easily when we store the references on the call stack like this.

That's a bunch of words about how to work with the trees immutably. What if we want to mutate them?

Another example - Mutating the trees

Let's work through mutating trees in place in the context of another problem, Invert Binary Tree (#226).

For this problem we have to mutate the tree in place to achieve the desired result. You could build a new tree as you go, but you'd be wasting some memory. The only real trick here is to traverse it correctly and make sure you don't swap the same subtree twice.

Here's my (accepted) solution:

use std::rc::Rc;
use std::cell::RefCell;
type MaybeNode = Option<Rc<RefCell<TreeNode>>>;
impl Solution {
    pub fn invert_tree(mut root: MaybeNode) -> MaybeNode {
        fn helper(node: &mut MaybeNode) {
            if let Some(_node) = node {
                let mut _node = _node.borrow_mut();

                match (_node.left.take(), _node.right.take()) {
                    (None, None) => {}
                    (l, r) => {
                        _node.left = r;
                        _node.right = l;
                    }
                }
                helper(&mut _node.left);
                helper(&mut _node.right);
            }
        }
        helper(&mut root);
        root
    }
}

Recursion again makes for a nice, easy Rust solution. We can just borrow the RefCell mutably, then take the option values, which leaves a None in place and swap them. Bonus points to someone that can figure out how to use std::mem::swap here, the other idiomatic way to swap variables in Rust (I tried, and failed).

Tree Traversal

So far we've used helper functions and recursion to do tree traversal, which works fine for a lot of problems. In the previous section we did a preorder traversal (root, left, right) to ensure we mutate the tree correctly. In fact, I've found this template works fine for problems that require you to visit all of the nodes in a given tree:

fn traverse(root: &MaybeNode) {
    if let Some(node) = root {

        let node = node.borrow();

        // PREORDER -- do something here

        traverse(&node.left);

        // INORDER -- do something here

        traverse(&node.right);

        // POSTORDER -- do something here

        // optionally, return a value or something
    }
}

Iterative approaches should also work fine at the (minimal) cost of extra cloning.

A more idiomatic approach would probably create an IterMut object that lets us iterate over the tree nodes mutably (and can hide all the reference counting stuff). I'm still working out the details here, so more on this in the future.

Real World Trees

Does this way of modeling trees match what you might see in a real crate? It depends. The problem with Rc<RefCell<TreeNode>> is that it can panic at runtime. For small, controlled problems like what we go through here, you can usually "prove" (by thinking really hard) that your code doesn't panic because the surface area is small. For crates that create and work with big, messy trees (like parsers, for example), it's better to stay in the land of compile-time safety. I might reach for something like indextree, which does arena allocation using a single Vec as a backing store and uses indices to keep track of all the edges -- skipping RefCell altogether!

The big (and probably only) advantage of using Rc<RefCell> is that tree nodes can outlive the graph or tree that they come from. This can come in handy for leetcode problems where you want to quickly throw nodes into some other data structure to implement your algorithm. A more measured approach that might appear in real software would allocate the data structure required and ensure the lifetimes match up with the arena-allocated tree nodes so that we could freely store their references without the need for reference counting pointers.

It's worth pointing out, too, that for simple use cases you can model trees with Box<T>. Boxes are more straightforward to work with but don't allow shared ownership, and won't work for general graphs that might have cycles (which can also cause problems for Rc!).