Abstraction is a fundamental concept in software development. Identifying and building abstractions well-suited to the problem at hand can make the difference between clear, maintainable code and a teetering, Jenga-like monolith duct-taped together by a grotesque ballet of tight coupling and special case handling. While a well-designed abstraction can shield us from detail, it can also suffer from leakage, failing to behave as expected or specified and causing problems for code built on top of it. Ensuring the reliability of our abstractions is therefore of paramount concern.
In previous blog posts, we’ve separately discussed the benefits of using type classes in Scala to model abstractions, and using randomized property testing in Scala to improve tests. In this post we discuss how to combine these ideas in order to build more reliable abstractions for use in your code. If you find these ideas interesting please be sure to check out the references at the end of this post.
Type classes for fun and profit
Type classes allow us to easily build additional behaviors around data types in a type-safe way. One simple and useful example is associated with the monoid abstraction, which represents a set of items which can be combined with one another (such as the integers, which can be combined by addition). Loosely, a monoid consists of
a collection of objects (e.g., integers)
a binary operation for combining these objects to yield a new object of the same type (e.g., addition)
an identity object whose combination leaves an object unchanged (e.g., the number 0)
The utility of this machinery is that it gives us a generalized way to use types that support some notion of “addition” or “combination”, for example:
As described in our earlier machine learning example, this can be more convenient than requiring that the data types themselves subtype or inherit from some kind of “Addable” interface.
In Scala, the Monoid[F] trait definition (combined with the compiler type-checking) buys us some important sanity checks with respect to behavior. For example, the function signature append(x: F, y: F): F guarantees that we’re never going to get a non-F result.
However, there are additional properties that an implementation of Monoid[F] must satisfy in order to truly conform to the conceptual definition of a monoid, but which are not easily encoded into the type system. For example, the monoid binary operation must satisfy left and right identity with respect to the “zero” element. For integers under addition the zero element is 0, and we do indeed have x + 0 = 0 + x = x for any integer x.
We can codify this requirement in something called type class law. When defining a particular type class, we can add some formal properties or invariants which we expect implementations to obey. The codification of these constraints can then be kept alongside the type class definition. Again returning to scalaz Monoid, we have
An interesting observation is that this implementation depends upon another type class instance Equal[F] which simply supplies an equal() function for determining whether two instances of F are indeed equal. Of course, Equal[F] comes supplied with its own type class laws for properties any well-defined notion of equality must satisfy such as commutativity (x==y iff y==x), reflexivity (x==x), and transitivity (if a==b and b==c then a==c).
A machine learning example
We now consider an example machine learning application where we are evaluating some binary classifier (like a decision tree) over test data. We run our evaluation over different sets of data, and for each set we produce a very simple output indicating how many predictions were made, and of those, how many were correct:
We can implement Monoid[Evaluation]  in order to combine the our experimental results across multiple datasets:
We’d like to ensure that our implementation satisfies the relevant type class laws. We could write a handful of unit tests against one or more hand-coded examples, for example using ScalaTest:
However, this merely gives us an existence result. That is, there exists some value for which our the desired property holds. We’d like something a little stronger. This is where we can use ScalaCheck to do property testing, randomly generating as many arbitrary instances of Evaluation as we’d like. If the law holds for all  generated instances, we can have a higher degree confidence in the correctness of our implementation. To accomplish this we simply need to supply a means of generating random Evaluation instances via ScalaCheck Gen:
Now that’s an abstraction we can believe in!
This level of confidence becomes important when we begin to compose type class instances, mixing and matching this machinery to achieve our desired effects. Returning to our Evaluation example, we may want to evaluate different models over these datasets, storing the results for each dataset in a Map[String,Evaluation] where the keys refer to which model was used to obtain the results. In scalaz, we get the Monoid[Map[String,Evaluation]] instance “for free”, given an instance of Monoid[Evaluation]:
Conclusion and references
If you are using the scalaz library, many of the provided type classes come “batteries included” with type class laws. Even if you are not, these ideas can help you to build more reliable type class instances which can be composed and extended with confidence. See below for some additional references and readings on this subject:
- Law Enforcement using Discipline
- Haskell’s Type Classes: We Can Do Better
 A semigroup is a more general concept than a monoid, which is modeled in scalaz by having Monoid[F] extend Semigroup[F].
 This implementation has a bit of a boilerplate flavor, this post describes how we could automagically derive our Monoid[Evaluation] instance.
 As implied by the ScalaCheck project’s appropriate logo.