Abstraction is a fundamental concept in software development. Identifying and building abstractions wellsuited to the problem at hand can make the difference between clear, maintainable code and a teetering, Jengalike monolith ducttaped together by a grotesque ballet of tight coupling and special case handling. While a welldesigned abstraction can shield us from detail, it can also suffer from leakage, failing to behave as expected or specified and causing problems for code built on top of it. Ensuring the reliability of our abstractions is therefore of paramount concern.
In previous blog posts, we’ve separately discussed the benefits of using type classes in Scala to model abstractions, and using randomized property testing in Scala to improve tests. In this post we discuss how to combine these ideas in order to build more reliable abstractions for use in your code. If you find these ideas interesting please be sure to check out the references at the end of this post.
Type classes for fun and profit
Type classes allow us to easily build additional behaviors around data types in a typesafe way. One simple and useful example is associated with the monoid abstraction, which represents a set of items which can be combined with one another (such as the integers, which can be combined by addition). Loosely[1], a monoid consists of

a collection of objects (e.g., integers)

a binary operation for combining these objects to yield a new object of the same type (e.g., addition)

an identity object whose combination leaves an object unchanged (e.g., the number 0)
This abstraction is captured by the scalaz trait Monoid[F]:
1
2
3
4

trait Monoid[F] { def zero: F def append(f1: F, f2: => F): F }

The utility of this machinery is that it gives us a generalized way to use types that support some notion of “addition” or “combination”, for example[2]:
1
2
3
4
5
6
7
8
9
10

def addItUp[F : Monoid](items: Seq[F]): F = { // Combine a bunch of items val m = implicitly[Monoid[F]] items.foldLeft(m.zero){case (total, next) => m.append(total,next)} } scala> addItUp(Seq("day ", "after ", "day")) res1: String = "day after day" scala> addItUp(Seq(1,2,3)) res2: Int = 6

As described in our earlier machine learning example, this can be more convenient than requiring that the data types themselves subtype or inherit from some kind of “Addable” interface.
In Scala, the Monoid[F] trait definition (combined with the compiler typechecking) buys us some important sanity checks with respect to behavior. For example, the function signature append(x: F, y: F): F guarantees that we’re never going to get a nonF result[3].
However, there are additional properties that an implementation of Monoid[F] must satisfy in order to truly conform to the conceptual definition of a monoid, but which are not easily encoded into the type system. For example, the monoid binary operation must satisfy left and right identity with respect to the “zero” element. For integers under addition the zero element is 0, and we do indeed have x + 0 = 0 + x = x for any integer x.
We can codify this requirement in something called type class law. When defining a particular type class, we can add some formal properties or invariants which we expect implementations to obey. The codification of these constraints can then be kept alongside the type class definition. Again returning to scalaz Monoid[4], we have
1
2
3
4
5
6
7
8
9
10

trait Monoid[F] extends Semigroup[F] { ... trait MonoidLaw extends SemigroupLaw { def leftIdentity(a: F)(implicit F: Equal[F]) = F.equal(a, append(zero, a)) def rightIdentity(a: F)(implicit F: Equal[F]) = F.equal(a, append(a, zero)) } ... }

An interesting observation is that this implementation depends upon another type class instance Equal[F] which simply supplies an equal() function for determining whether two instances of F are indeed equal. Of course, Equal[F] comes supplied with its own type class laws for properties any welldefined notion of equality must satisfy such as commutativity (x==y iff y==x), reflexivity (x==x), and transitivity (if a==b and b==c then a==c).
A machine learning example
We now consider an example machine learning application where we are evaluating some binary classifier (like a decision tree) over test data. We run our evaluation over different sets of data, and for each set we produce a very simple output indicating how many predictions were made, and of those, how many were correct:
1

case class Evaluation(total: Int, correct: Int)

We can implement Monoid[Evaluation] [5] in order to combine the our experimental results across multiple datasets:
1
2
3
4
5

object EvaluationMonoid extends Monoid[Evaluation] { def zero = Evaluation(0,0) def append(x: Evaluation, y: => Evaluation) = Evaluation(x.total + y.total, x.correct + y.correct) }

We’d like to ensure that our implementation satisfies the relevant type class laws. We could write a handful of unit tests against one or more handcoded examples, for example using ScalaTest:
1
2
3
4
5
6
7
8
9
10
11

"Evaluation Monoid" should { import EvaluationMonoid._ implicit val eq = Equal.equalA[Evaluation] val testEvaluation = Evaluation(3, 2) "obey Monoid typeclass Law" in { Monoid.monoidLaw.leftIdentity(testEval) should be (true) Monoid.monoidLaw.rightIdentity(testEval) should be (true) } }

However, this merely gives us an existence result. That is, there exists some value for which our the desired property holds. We’d like something a little stronger. This is where we can use ScalaCheck to do property testing, randomly generating as many arbitrary instances of Evaluation as we’d like. If the law holds for all [6] generated instances, we can have a higher degree confidence in the correctness of our implementation. To accomplish this we simply need to supply a means of generating random Evaluation instances via ScalaCheck Gen:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

val evalGen = for {total < Gen.choose(0, 1000); correct < Gen.choose(0, total)} yield Evaluation(total,correct) "Evaluation Monoid" should { import EvaluationMonoid._ implicit val eq = Equal.equalA[Evaluation] "obey Monoid typeclass Law" in { forAll (evalGen) { testEval => { Monoid.monoidLaw.leftIdentity(testEval) should be (true) Monoid.monoidLaw.rightIdentity(testEval) should be (true) }} } }

Now that’s an abstraction we can believe in!
This level of confidence becomes important when we begin to compose type class instances, mixing and matching this machinery to achieve our desired effects. Returning to our Evaluation example, we may want to evaluate different models over these datasets, storing the results for each dataset in a Map[String,Evaluation] where the keys refer to which model was used to obtain the results. In scalaz, we get the Monoid[Map[String,Evaluation]] instance “for free”, given an instance of Monoid[Evaluation]:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

scala> implicit val em = EvaluationMonoid em: EvaluationMonoid.type = EvaluationMonoid$@34f5b235 scala> implicit val mm = mapMonoid[String,Evaluation] mm: scalaz.Monoid[Map[String,Evaluation]] = scalaz.std.MapInstances$$anon$4@13105b09 scala> val dataset1 = Map("modelA" > Evaluation(3,2),  "modelB" > Evaluation(4,1)) dataset1: scala.collection.immutable.Map[String,Evaluation] = Map(modelA > Evaluation(3,2), modelB > Evaluation(4,1)) scala> val dataset2 = Map("modelA" > Evaluation(5,4)) dataset2: scala.collection.immutable.Map[String,Evaluation] = Map(modelA > Evaluation(5,4)) scala> mm.append(dataset1,dataset2) res3: Map[String,Evaluation] = Map(modelA > Evaluation(8,6), modelB > Evaluation(4,1))

Conclusion and references
If you are using the scalaz library, many of the provided type classes come “batteries included” with type class laws. Even if you are not, these ideas can help you to build more reliable type class instances which can be composed and extended with confidence. See below for some additional references and readings on this subject:
Footnotes
[1] Omitting associativity and explicit discussion of closure.
[2] For brevity, these code snippets do not show library (scalaz, ScalaTest, ScalaCheck) imports.
[3] Excluding the unfortunate possibilities of null return values or thrown Exceptions.
[4] A semigroup is a more general concept than a monoid, which is modeled in scalaz by having Monoid[F] extend Semigroup[F].
[5] This implementation has a bit of a boilerplate flavor, this post describes how we could automagically derive our Monoid[Evaluation] instance.
[6] As implied by the ScalaCheck project’s appropriate logo.