Drowning in Monad Transformers

This post assumes you’re familiar with the following Scalaz concepts:

At Avention, we have a significant amount of backend code running in Akka. Most of this code runs in the following State monad:

case class PipelineState(...)

type PipelineMonad[+A] = State[PipelineState, A]

The details of PipelineState aren’t relevant here. Briefly, it allows us to easily track statistics about our backend code. These statistics are written to a database for monitoring and debugging purposes. PipelineState allows us to do things like identify key spots in the code and track how many times those spots succeed or fail. It also allows us to easily track execution times for key blocks of code. The details of PipelineState might be a good subject for another blog post, but for now the important point is that most of our backend code runs in PipelineMonad, which is a state monad using PipelineState.

For convenience, we’ve also defined an option transformer wrapped around PipelineMonad. We use this option transformer extensively throughout our code. Here’s the definition:

type PipelineMonadOT[+A] = OptionT[PipelineMonad, A]
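To make the option transformer concrete, here is a self-contained sketch that does not depend on Scalaz. The State and PipelineMonadOT classes are simplified stand-ins written for illustration, not Scalaz's definitions. The checkpoints field on PipelineState is hypothetical, and so are the liftOT, fromOption and tick helpers. The sketch shows the behavior OptionT[PipelineMonad, A] gives us: the state keeps threading through the computation, but the first None short-circuits the steps after it.

```scala
object OptionTSketch {
  // Hypothetical stand-in for PipelineState: just a count of checkpoints reached.
  final case class PipelineState(checkpoints: Int)

  // Minimal state monad: a function from the current state to (next state, value).
  final case class State[S, A](run: S => (S, A)) {
    def map[B](f: A => B): State[S, B] =
      State[S, B] { s => val (s2, a) = run(s); (s2, f(a)) }
    def flatMap[B](f: A => State[S, B]): State[S, B] =
      State[S, B] { s => val (s2, a) = run(s); f(a).run(s2) }
  }

  type PipelineMonad[A] = State[PipelineState, A]

  // Hand-rolled OptionT specialised to PipelineMonad: a PipelineMonad[Option[A]]
  // whose flatMap stops at the first None but still returns the threaded state.
  final case class PipelineMonadOT[A](run: PipelineMonad[Option[A]]) {
    def map[B](f: A => B): PipelineMonadOT[B] =
      PipelineMonadOT(run.map[Option[B]](_.map(f)))
    def flatMap[B](f: A => PipelineMonadOT[B]): PipelineMonadOT[B] =
      PipelineMonadOT(run.flatMap[Option[B]] {
        case Some(a) => f(a).run
        case None    => State[PipelineState, Option[B]](s => (s, None))
      })
  }

  // Hypothetical helpers: lift a plain PipelineMonad, or a bare Option.
  def liftOT[A](m: PipelineMonad[A]): PipelineMonadOT[A] =
    PipelineMonadOT(m.map[Option[A]](a => Some(a)))
  def fromOption[A](o: Option[A]): PipelineMonadOT[A] =
    PipelineMonadOT(State[PipelineState, Option[A]](s => (s, o)))

  // Records that a checkpoint was reached.
  val tick: PipelineMonad[Unit] =
    State[PipelineState, Unit](s => (s.copy(checkpoints = s.checkpoints + 1), ()))

  def program(input: Option[Int]): PipelineMonadOT[Int] = for {
    _ <- liftOT(tick)
    n <- fromOption(input)
    _ <- liftOT(tick) // skipped when input is None
  } yield n

  val failed = program(None).run.run(PipelineState(0))
  val succeeded = program(Some(7)).run.run(PipelineState(0))
}
```

Running the failing case leaves one checkpoint recorded and a None result; the successful case records two checkpoints and yields Some(7).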

Adding More State

What happens when part of our application needs to use more state than just what’s in PipelineState? While in practice this extra state could be some complex data structure, for simple illustrative purposes let’s just use an Int. The traditional functional approach is to define a state transformer monad that mixes our Int state in with PipelineState:

type IntStateT[M[+_], +A] = StateT[M, Int, A]

type PipelineMonadWithInt[+A] =
  StateT[PipelineMonad, Int, A]

We’ll also need to add an option transformer:

type PipelineMonadWithIntOT[+A] =
  OptionT[PipelineMonadWithInt, A]

We end up with 3 levels of nested monads. The lowest level is a state monad for PipelineState. Wrapped around that is a state transformer for adding the Int state. Wrapped around that is an option transformer. Transformers allow us to nest our monads arbitrarily deep. As we’ll see below, 3 levels is deep enough to cause a lot of confusion.

Let’s look at how we can get different types of values into our PipelineMonadWithIntOT monad. For clarity, I’m putting explicit types on all variables. You wouldn’t do this in practice.

To get a simple non-monadic value into PipelineMonadWithIntOT, you just point it into the monad:

val m1: PipelineMonadWithIntOT[String] =
  "hello".point[PipelineMonadWithIntOT]

To get an Option into PipelineMonadWithIntOT, you need to first wrap the Option in PipelineMonadWithInt using the point() method. Then you can wrap the result in a PipelineMonadWithIntOT using OptionT.optionT():

val m2: Option[String] = "hello".some

val m3: PipelineMonadWithIntOT[String] =
  OptionT.optionT(m2.point[PipelineMonadWithInt])

If you already have a value wrapped in PipelineMonadWithInt, you can wrap it in PipelineMonadWithIntOT using the liftM() method:

val m4: PipelineMonadWithInt[String] =
  "hello".point[PipelineMonadWithInt]

val m5: PipelineMonadWithIntOT[String] =
  m4.liftM[OptionT]

Finally, if you have a value wrapped in PipelineMonad, you have to go through a couple of steps. First, you have to use liftM() to wrap the PipelineMonad in PipelineMonadWithInt. Then, you have to use liftM() again to wrap that in PipelineMonadWithIntOT.

val m6: PipelineMonad[String] =
  "hello".point[PipelineMonad]

val m7: PipelineMonadWithIntOT[String] =
  m6.liftM[IntStateT].liftM[OptionT]
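To see why m6 needs two lifts, it can help to unfold each layer of the stack into a plain function. The sketch below is self-contained and does not use Scalaz, which wraps these same shapes in its StateT (S => F[(S, A)]) and OptionT (F[Option[A]]) classes. The liftIntStateT and liftOptionT functions are hypothetical hand-written equivalents of liftM for each layer, and the PipelineState field is made up for illustration.

```scala
object LiftSketch {
  // Hypothetical stand-in for PipelineState.
  final case class PipelineState(checkpoints: Int)

  // Each layer of the stack written out as a plain function type.
  type PipelineMonad[A] = PipelineState => (PipelineState, A)
  type PipelineMonadWithInt[A] = Int => PipelineMonad[(Int, A)]
  type PipelineMonadWithIntOT[A] = PipelineMonadWithInt[Option[A]]

  def point[A](a: A): PipelineMonad[A] = s => (s, a)

  // liftM for the state transformer: thread the Int through unchanged.
  def liftIntStateT[A](m: PipelineMonad[A]): PipelineMonadWithInt[A] =
    i => s => {
      val (s2, a) = m(s)
      (s2, (i, a))
    }

  // liftM for the option transformer: every successful value becomes Some.
  def liftOptionT[A](m: PipelineMonadWithInt[A]): PipelineMonadWithIntOT[A] =
    i => s => {
      val (s2, (i2, a)) = m(i)(s)
      (s2, (i2, Some(a)))
    }

  val m6: PipelineMonad[String] = point("hello")
  // One lift per layer, innermost first, as in m6.liftM[IntStateT].liftM[OptionT].
  val m7: PipelineMonadWithIntOT[String] = liftOptionT(liftIntStateT(m6))

  // Running m7 needs both an initial Int and an initial PipelineState.
  val result: (PipelineState, (Int, Option[String])) = m7(42)(PipelineState(0))
}
```

The result type, (PipelineState, (Int, Option[String])), makes the three levels visible: PipelineState on the outside of the tuple, the Int state next, and the Option innermost.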

That covers all the types of things you would want to wrap in PipelineMonadWithIntOT. But there is a problem. Let’s try to remove the explicit typing on variable m7:

val m7 = m6.liftM[IntStateT].liftM[OptionT]

error: kinds of the type arguments (scalaz.Unapply[scalaz.Monad,IntStateT[PipelineMonad,String]]{type M[X] = IntStateT[PipelineMonad,X]; type A = String}#M,String) do not conform to the expected kinds of the type parameters (type F,type A) in class OptionT.
scalaz.Unapply[scalaz.Monad,IntStateT[PipelineMonad,String]]{type M[X] = IntStateT[PipelineMonad,X]; type A = String}#M's type parameters do not match type F's expected parameters:
type X is invariant, but type _ is declared covariant
       val m7 = m6.liftM[IntStateT].liftM[OptionT]
           ^

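This code should work, but the Scala compiler gets confused and reports an error. The last line of the message points at the cause: the type constructor that Scalaz’s Unapply machinery infers for the intermediate value is invariant in its parameter, while OptionT declares its monad parameter as covariant. We’ve uncovered a bug in the way Scalaz and the compiler interact. (We’re using Scala 2.10.4; I haven’t tested whether this issue exists in later versions.) We need to give the compiler a hint, either by putting an explicit type on m7 or by ascribing a type to the intermediate value, as in the following: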

val m7 = (
  m6.liftM[IntStateT]: PipelineMonadWithInt[String]
).liftM[OptionT]

As you can see, once we start putting transformers inside transformers, wrapping values becomes non-trivial. We also start pushing the compiler to its limits. We’re drowning in transformers.

Looking for a Simpler Solution

Can we find a simpler solution that doesn’t involve nested transformers?

We could throw up our hands, say to heck with functional coding, and use a mutable variable to hold our Int. Let’s look for a better solution than that.

Rather than using a state transformer to manage our Int, we could just pass the current Int value into each of our functions. Then our functions could return the next Int value. But this approach could easily cause the problems with tracking intermediate states that I talked about at the beginning of my blog post about the state monad. That is, passing our Int state in and out of functions could lead to ugly, brittle code. The state monad was created explicitly to solve these problems. It would be a shame not to be able to take advantage of it.

We could add our Int as a new field inside the PipelineState case class. That would keep our code nice and simple since all we’d need is PipelineMonad and PipelineMonadOT (the option transformer that wraps PipelineMonad). But this is hacky because it violates separation of concerns. PipelineState and PipelineMonad exist at the lowest levels of our infrastructure. They should have no knowledge about how they are used. Besides, we might have a bunch of different parts of our code that each need to add their own type of state. So, we wouldn’t just be adding an Int to PipelineState; we’d be adding 10 or 20 distinct types of state used by the different sections of our code. Yuck.

Let’s see if we can find a less hacky way to add a new field inside the PipelineState class. We could add a new data field that’s of type Any:

case class PipelineState(data: Any, ...)

We could then stick our Int state into that data field. This approach solves our separation of concerns problem; PipelineState and PipelineMonad have no idea how they’re being used or what type of extra data is stored in them. Also, different parts of our code could pack different types of state into the single data field. Unfortunately, we have to typecast data whenever we pull a value out of it. That’s a pain in the neck. We also lose compile-time type checking. Let’s switch from using Any to using a type parameter:

case class PipelineStateEx[D](data: D, ...)

Much cleaner. Note that we’ve switched the name of the class to PipelineStateEx. We’ll see why in a bit.
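The difference between the two versions of the data field shows up clearly in a small self-contained sketch. The checkpoints field is a hypothetical stand-in for the fields the post elides with "...", and UntypedPipelineState is a made-up name for the Any-based version.

```scala
object DataFieldSketch {
  // checkpoints is a hypothetical stand-in for the fields elided as "...".
  final case class UntypedPipelineState(data: Any, checkpoints: Int)
  final case class PipelineStateEx[D](data: D, checkpoints: Int)

  val untyped = UntypedPipelineState(data = 3, checkpoints = 0)
  // With Any, every read needs a cast...
  val untypedSquare: Int =
    untyped.data.asInstanceOf[Int] * untyped.data.asInstanceOf[Int]
  // ...and a wrong cast compiles fine but fails at run time.
  val badCastFails: Boolean =
    scala.util.Try(untyped.data.asInstanceOf[String].length).isFailure

  val typed = PipelineStateEx(data = 3, checkpoints = 0)
  // With a type parameter, data is statically an Int: no cast is needed,
  // and a mistake such as typed.data.length is rejected at compile time.
  val typedSquare: Int = typed.data * typed.data
}
```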

Defining Monads Around PipelineStateEx

Now we need to build up a state monad and an option transformer around PipelineStateEx[D]. Before we do that, let’s take a deeper look at how some of our code was defined before we added the data field to PipelineState:

object PipelineStateMgr {
  type PipelineMonad[+A] = State[PipelineState, A]

  type PipelineMonadOT[+A] = OptionT[PipelineMonad, A]

  def checkpoint[A](...): PipelineMonad[Option[A]] = ...

  def checkpointOT[A](...): PipelineMonadOT[A] = ...

  def currentCheckpointId(): PipelineMonad[Option[String]] = ...

  def currentCheckpointIdOT(): PipelineMonadOT[Option[String]] = ...

  def setCurrentCheckpointId(...): PipelineMonad[Unit] = ...

  def setCurrentCheckpointIdOT(...): PipelineMonadOT[Unit] = ...

  def pushContext(...): PipelineMonad[Unit] = ...

  def pushContextOT(...): PipelineMonadOT[Unit] = ...

  def popContext(): PipelineMonad[Unit] = ...

  def popContextOT(): PipelineMonadOT[Unit] = ...

  ...
}

Don’t worry about what each function does. The key point is that after we define our monads, we define a bunch of helper functions that run in our monads. The rest of our code generally doesn’t access the PipelineState object directly; instead, we access PipelineState through these helper functions. For convenience, we have two versions of each helper: one version that runs in PipelineMonad and one that runs in PipelineMonadOT. Under the covers, the PipelineMonadOT versions just call the corresponding PipelineMonad versions and wrap the results in an option transformer.

Let’s try to modify PipelineStateMgr to use PipelineStateEx[D] with the new data field:

// THE FOLLOWING DOES NOT WORK!
object PipelineStateMgr {
  type PipelineMonad[+A, D] = State[PipelineStateEx[D], A]

  type PipelineMonadOT[+A, D] =
    OptionT[PipelineMonad[A, D], A] // fails: OptionT needs a one-parameter type constructor

  ...
}

We’ve got a problem: PipelineMonad[+A, D] is not a monad. A monad’s type constructor must take exactly one type parameter, but PipelineMonad takes two: A and D. This bites us when we try to use this non-monad with OptionT, whose first type parameter must be a one-parameter type constructor with a Monad instance. Since PipelineMonad doesn’t have that shape, we can’t use it with OptionT. Put another way, the new D parameter is screwing everything up.

We can fix this problem by changing PipelineStateMgr to be a trait that takes D as a type parameter:

trait ManagesPipelineState[D] {
  type PipelineState = PipelineStateEx[D]

  type PipelineMonad[+A] = State[PipelineState, A]

  type PipelineMonadOT[+A] = OptionT[PipelineMonad, A]

  def checkpoint[A](...): PipelineMonad[Option[A]] = ...

  def checkpointOT[A](...): PipelineMonadOT[A] = ...

  def currentCheckpointId(): PipelineMonad[Option[String]] = ...

  def currentCheckpointIdOT(): PipelineMonadOT[Option[String]] = ...

  def setCurrentCheckpointId(...): PipelineMonad[Unit] = ...

  def setCurrentCheckpointIdOT(...): PipelineMonadOT[Unit] = ...

  def pushContext(...): PipelineMonad[Unit] = ...

  def pushContextOT(...): PipelineMonadOT[Unit] = ...

  def popContext(): PipelineMonad[Unit] = ...

  def popContextOT(): PipelineMonadOT[Unit] = ...

  ...
}

As a convenience, we’ve added a type PipelineState which is just a parameterless version of PipelineStateEx[D]. Once that type is defined, the rest of the trait is identical to our original PipelineStateMgr object. Note that PipelineMonad and PipelineMonadOT now take only a single type parameter each. So they are now valid monads, and everything works. The trick to making things work was to move the type parameter from the definitions of PipelineMonad and PipelineMonadOT up to the trait.
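The same trick works in a self-contained sketch without Scalaz. State and OptionT below are simplified stand-ins, not Scalaz’s classes, and the checkpoints field is hypothetical. The currentData and currentDataOT helpers are also hypothetical; they follow the post’s pattern of pairing a PipelineMonad function with a PipelineMonadOT version that calls it and wraps the result. Because D is fixed at the trait level, PipelineMonad has exactly one type parameter and can be passed wherever OptionT expects a type constructor F[_].

```scala
object TraitSketch {
  // Hypothetical fields; the real class elides them as "...".
  final case class PipelineStateEx[D](data: D, checkpoints: Int)

  // Minimal state monad stand-in (only map is needed here).
  final case class State[S, A](run: S => (S, A)) {
    def map[B](f: A => B): State[S, B] =
      State[S, B] { s => val (s2, a) = run(s); (s2, f(a)) }
  }

  // Minimal option transformer stand-in: F must take exactly one type parameter.
  final case class OptionT[F[_], A](run: F[Option[A]])

  trait ManagesPipelineState[D] {
    type PipelineState = PipelineStateEx[D]
    type PipelineMonad[A] = State[PipelineState, A] // one parameter: fits F[_]
    type PipelineMonadOT[A] = OptionT[PipelineMonad, A]

    // By contrast, given type PipelineMonad2[A, E] = State[PipelineStateEx[E], A],
    // OptionT[PipelineMonad2, A] is rejected because PipelineMonad2 takes two parameters.

    // Hypothetical helper: read the extra data without changing the state.
    def currentData: PipelineMonad[D] =
      State[PipelineState, D](s => (s, s.data))

    // Hypothetical OT counterpart: call the PipelineMonad version and wrap in Some.
    def currentDataOT: PipelineMonadOT[D] =
      OptionT[PipelineMonad, D](currentData.map[Option[D]](d => Some(d)))
  }

  object IntPipelineStateMgr extends ManagesPipelineState[Int]
  object StringPipelineStateMgr extends ManagesPipelineState[String]

  val intResult =
    IntPipelineStateMgr.currentDataOT.run.run(PipelineStateEx(5, 0))
  val stringResult =
    StringPipelineStateMgr.currentDataOT.run.run(PipelineStateEx("x", 0))
}
```

Each object that extends the trait gets its own one-parameter PipelineMonad, so the helpers are written once and reused for every extra-state type.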

With this trait in place, we can define the following pipeline state manager for code that needs to add an Int to the state:

object IntPipelineStateMgr extends ManagesPipelineState[Int]

We could just as easily create a pipeline state manager for some other data type:

case class Foo(...)
object FooPipelineStateMgr extends ManagesPipelineState[Foo]

We could even create a pipeline state manager for code that doesn’t need any extra state:

object UnitPipelineStateMgr
  extends ManagesPipelineState[Unit]

Let’s see how we can use IntPipelineStateMgr to access or change the current Int when we’re running inside PipelineMonad:

import IntPipelineStateMgr._

...

for {
  ...

  squared <- gets { state: PipelineState =>
    val currentNum = state.data
    currentNum * currentNum
  }

  ...

  _ <- modify { state: PipelineState =>
    state.copy(data = state.data + 1)
  }

  ...
} yield ...

This is just standard state monad stuff. The gets() call squares the current Int stored in the state without modifying the state. The modify() call replaces the current state with a copy in which our Int has been incremented by one.
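Here is a self-contained sketch of the same for-comprehension that runs without Scalaz. The minimal State class and the gets and modify helpers are stand-ins that mimic the Scalaz functions of the same names, and the checkpoints field is hypothetical. Starting from an Int of 3, the program yields 9 and leaves 4 in the state.

```scala
object GetsModifySketch {
  // Hypothetical fields; the real class elides them as "...".
  final case class PipelineStateEx[D](data: D, checkpoints: Int)

  // Minimal state monad stand-in.
  final case class State[S, A](run: S => (S, A)) {
    def map[B](f: A => B): State[S, B] =
      State[S, B] { s => val (s2, a) = run(s); (s2, f(a)) }
    def flatMap[B](f: A => State[S, B]): State[S, B] =
      State[S, B] { s => val (s2, a) = run(s); f(a).run(s2) }
  }

  // Stand-ins for Scalaz's gets and modify.
  def gets[S, A](f: S => A): State[S, A] = State[S, A](s => (s, f(s)))
  def modify[S](f: S => S): State[S, Unit] = State[S, Unit](s => (f(s), ()))

  // The aliases IntPipelineStateMgr would provide, written out directly.
  type PipelineState = PipelineStateEx[Int]
  type PipelineMonad[A] = State[PipelineState, A]

  val program: PipelineMonad[Int] = for {
    // Read: square the current Int, leaving the state unchanged.
    squared <- gets { (state: PipelineState) => state.data * state.data }
    // Write: replace the state with a copy whose Int is one larger.
    _ <- modify { (state: PipelineState) => state.copy(data = state.data + 1) }
  } yield squared

  val (finalState, result) =
    program.run(PipelineStateEx(data = 3, checkpoints = 0))
}
```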

If we want to use PipelineMonadOT, we need to call liftM[OptionT] on the results of gets() and modify(). Unfortunately, due to a type inference bug or limitation in the compiler, we have to provide a bit of type information to make things work:

  squared <- {
    gets { state: PipelineState =>
      val currentNum = state.data
      currentNum * currentNum
    }: PipelineMonad[Int]
  }.liftM[OptionT]

Because that’s a bit ugly, let’s add a wrapOT() method to our ManagesPipelineState[D] trait:

trait ManagesPipelineState[D] {
  type PipelineState = PipelineStateEx[D]

  type PipelineMonad[+A] = State[PipelineState, A]

  type PipelineMonadOT[+A] = OptionT[PipelineMonad, A]

  def wrapOT[A](m: PipelineMonad[A]): PipelineMonadOT[A] =
    m.liftM[OptionT]

  ...
}

The wrapOT() method just lifts a PipelineMonad into a PipelineMonadOT. Our code for reading the current Int from the state inside PipelineMonadOT now simplifies to:

  squared <- wrapOT {
    gets { state: PipelineState =>
      val currentNum = state.data
      currentNum * currentNum
    }
  }

We’ve now found a solution that lets us add arbitrary data to PipelineState so that we don’t have to use state transformers at all. Our wrapOT() helper method helps us get past some deficiencies in the compiler and helps keep our code cleaner.

Reexamining State Transformers

Let’s take a second look at state transformer monads. Our main objection was that wrapping values gets ugly, especially given the compiler issues. But we could write wrapper methods similar to wrapOT() to hide all the ugly wrapping. The only issue then is that we’ve got a lot of boilerplate code. Every time we want to add a different type of state on top of PipelineMonad, we’ve got to add the following:

  • A type equivalent to IntStateT
  • A type equivalent to PipelineMonadWithInt
  • A type equivalent to PipelineMonadWithIntOT
  • A handful of helper wrap methods

That’s a bit of a pain. Instead of doing all that boilerplate code, we can build a trait that takes care of all that for us. The following trait assumes the original version of PipelineMonad, that is, the one without the extra data field in PipelineState.

trait ExtendsPipelineState[D] {
  type ExtStateT[M[+_], +A] = StateT[M, D, A]

  type ExtPipelineMonad[+A] = StateT[PipelineMonad, D, A]

  type ExtPipelineMonadOT[+A] = OptionT[ExtPipelineMonad, A]

  def wrapPipelineMonad[A]
    (m: PipelineMonad[A]): ExtPipelineMonad[A] =
    m.liftM[ExtStateT]

  def wrapExtStateMonad[A]
    (m: State[D, A]): ExtPipelineMonad[A] =
    m.lift[PipelineMonad]

  def wrapToOT[A](a: A): ExtPipelineMonadOT[A] =
    a.point[ExtPipelineMonadOT]

  def wrapOptionToOT[A]
    (m: Option[A]): ExtPipelineMonadOT[A] =
    OptionT.optionT(m.point[ExtPipelineMonad])

  def wrapExtPipelineMonadToOT[A]
    (m: ExtPipelineMonad[A]): ExtPipelineMonadOT[A] =
    m.liftM[OptionT]

  def wrapPipelineMonadToOT[A]
    (m: PipelineMonad[A]): ExtPipelineMonadOT[A] =
    wrapExtPipelineMonadToOT(wrapPipelineMonad(m))

  def wrapExtStateMonadToOT[A]
    (m: State[D, A]): ExtPipelineMonadOT[A] =
    wrapExtPipelineMonadToOT(wrapExtStateMonad(m))
}

Building state transformers and option transformers on top of PipelineMonad is now easy:

object IntPipelineStateExtender extends ExtendsPipelineState[Int]
import IntPipelineStateExtender._

val m1: ExtPipelineMonadOT[String] =
  wrapToOT("hello")

val m2: ExtPipelineMonadOT[String] =
  wrapOptionToOT("hello".some)

val m3: ExtPipelineMonadOT[String] =
  wrapExtPipelineMonadToOT("hello".point[ExtPipelineMonad])

val m4: ExtPipelineMonadOT[String] =
  wrapPipelineMonadToOT("hello".point[PipelineMonad])

val m5: ExtPipelineMonadOT[Int] =
  wrapExtStateMonadToOT { gets { n: Int => n * n } }

The boilerplate is gone, and we’ve got nice clean wrappers. If we want to extend with a type other than Int, we just have to do this:

case class Foo(...)
object FooPipelineStateExtender extends ExtendsPipelineState[Foo]
import FooPipelineStateExtender._

Super easy. Using state transformers to add the state doesn’t seem so bad now.
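To see what ExtendsPipelineState[D] does underneath, here is a self-contained sketch of the same trait that uses plain functions instead of Scalaz’s StateT and OptionT. The function encodings, the SimpleState alias, and the PipelineState field are assumptions made for illustration; only the method names mirror the trait above. Note how wrapExtStateMonad runs the inner State[D, A] on the extra state while leaving PipelineState untouched.

```scala
object ExtenderSketch {
  // Hypothetical stand-in for PipelineState.
  final case class PipelineState(checkpoints: Int)

  // Plain-function encodings (assumptions, not Scalaz's StateT/OptionT classes).
  type PipelineMonad[A] = PipelineState => (PipelineState, A)
  type SimpleState[D, A] = D => (D, A)

  trait ExtendsPipelineState[D] {
    // StateT[PipelineMonad, D, A] unfolded: extra state in, PipelineMonad out.
    type ExtPipelineMonad[A] = D => PipelineMonad[(D, A)]
    // OptionT[ExtPipelineMonad, A] unfolded: the value becomes optional.
    type ExtPipelineMonadOT[A] = ExtPipelineMonad[Option[A]]

    // Like liftM: run the PipelineMonad and pass the extra state through unchanged.
    def wrapPipelineMonad[A](m: PipelineMonad[A]): ExtPipelineMonad[A] =
      d => s => { val (s2, a) = m(s); (s2, (d, a)) }

    // Like lift: run the State[D, A] on the extra state; PipelineState is untouched.
    def wrapExtStateMonad[A](m: SimpleState[D, A]): ExtPipelineMonad[A] =
      d => s => (s, m(d))

    def wrapToOT[A](a: A): ExtPipelineMonadOT[A] =
      d => s => (s, (d, Some(a)))

    def wrapOptionToOT[A](o: Option[A]): ExtPipelineMonadOT[A] =
      d => s => (s, (d, o))

    def wrapExtPipelineMonadToOT[A](m: ExtPipelineMonad[A]): ExtPipelineMonadOT[A] =
      d => s => { val (s2, (d2, a)) = m(d)(s); (s2, (d2, Some(a))) }

    def wrapPipelineMonadToOT[A](m: PipelineMonad[A]): ExtPipelineMonadOT[A] =
      wrapExtPipelineMonadToOT(wrapPipelineMonad(m))

    def wrapExtStateMonadToOT[A](m: SimpleState[D, A]): ExtPipelineMonadOT[A] =
      wrapExtPipelineMonadToOT(wrapExtStateMonad(m))
  }

  object IntPipelineStateExtender extends ExtendsPipelineState[Int]
  import IntPipelineStateExtender._

  // Equivalent of gets { n: Int => n * n }: read the Int without changing it.
  val square: SimpleState[Int, Int] = n => (n, n * n)

  val m5: ExtPipelineMonadOT[Int] = wrapExtStateMonadToOT(square)
  val m2: ExtPipelineMonadOT[String] = wrapOptionToOT(Option.empty[String])

  // Run both with extra state 4 and an initial PipelineState.
  val squared = m5(4)(PipelineState(0))
  val missing = m2(4)(PipelineState(0))
}
```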

Summary

When you have code already running in a state monad, adding extra state can be tricky. You can add a transformer around the base state, but the wrapping code gets ugly quickly. Deficiencies in the Scala compiler make things even worse.

We examined two approaches to solving this problem. For the first approach, we added a data field of arbitrary type to the state underlying our base state monad. We moved our state monad definition from an object to the trait ManagesPipelineState[D], where D is the type of the extra state. Then, we created objects that extend the trait, such as IntPipelineStateMgr and FooPipelineStateMgr. These objects allow us to easily and cleanly work with Int and Foo data directly in the base state monad. There’s no need to resort to state transformers with this approach.

For our second approach, we created trait ExtendsPipelineState[D] to make state transformers easier to use. This trait encapsulates all the boilerplate type definitions and wrapper methods needed to build a state transformer on top of PipelineMonad. Using ExtendsPipelineState[D], it’s easy to layer state transformers and option transformers on top of PipelineMonad. Just create objects like IntPipelineStateExtender and FooPipelineStateExtender that implement the trait. With those in place, wrapping different types into the monads is easy. Plus you don’t have to do any boilerplate type definitions.
