Type class derivation with Shapeless: An introduction
You must be shapeless, formless, like water.
At Basement Crowd we use Shapeless for a couple of problems. The most common is the automatic derivation of JSON de/serialisers for our APIs (spray-json-shapeless/circe), but it also comes into play for test data generation (scalacheck-shapeless), and we make use of its polymorphic functions.
This article gives an introduction to some aspects of Shapeless and how it can help with automatic type class derivation (a feature that is at the heart of how circe and scalacheck can automatically handle case classes). To discuss this, we will consider the problem of generating test data for unit tests. This is already solved for us by scalacheck, but it gives us an understandable problem context in which to discuss the solution.
Generating test data with type-classes
Let’s say we have some case classes in our code, and we need to generate a bunch of random instances of them so we can use them in unit tests. We could approach this by defining a type-class for our generator and have an implementation of the generator for each of our case classes. For example:
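A minimal sketch of such a type class might look like the following. The Capybara fields are taken from the case class discussed later in this article; the way each field is randomised (an 8-character name, an age below 20) is an assumption made purely for illustration.

```scala
import scala.util.Random

case class Capybara(name: String, age: Int, awesome: Boolean)

// The type class: something that can produce a random A.
trait Generator[A] {
  def generate: A
}

object Generator {
  // Entry point: looks up whichever implicit Generator[A] is in scope.
  def generate[A](implicit gen: Generator[A]): A = gen.generate

  // Hand-written instance for Capybara, with the field generation inlined.
  // The name length and age range are illustrative assumptions.
  implicit val capybaraGenerator: Generator[Capybara] = new Generator[Capybara] {
    def generate: Capybara = Capybara(
      name = Random.alphanumeric.take(8).mkString,
      age = Random.nextInt(20),
      awesome = Random.nextBoolean()
    )
  }
}
```

Because the instance lives in the companion object of Generator, a call to Generator.generate[Capybara] finds it without any extra imports.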
As you can see, we define our parameterised trait Generator[A] with a single generate method, and then in our companion object we have a helpful entry-point method “generate” that will look for implicit type-class implementations available in scope. We have also defined our Capybara implementation of the type-class, which means that if we call Generator.generate[Capybara] we get a randomly generated Capybara instance!
The type-class approach provides a pretty good pattern for generating test data, which is all well and good, but it’s pretty verbose and will mean a lot of boilerplate once we add all our case classes.
An obvious and easy first step to reduce the boilerplate is to pull the generation of simple types out into their own Generator implementations, so they are common and can be re-used across case classes. For example:
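One way this refactoring could look is sketched below. The `instance` helper and the fields of Dog are hypothetical (the article does not define them), and the value ranges are arbitrary.

```scala
import scala.util.Random

case class Capybara(name: String, age: Int, awesome: Boolean)
case class Dog(name: String, age: Int) // hypothetical fields, for illustration only

trait Generator[A] {
  def generate: A
}

object Generator {
  def generate[A](implicit gen: Generator[A]): A = gen.generate

  // Hypothetical helper to cut down on anonymous-class noise.
  def instance[A](f: => A): Generator[A] = new Generator[A] {
    def generate: A = f
  }

  // Shared generators for simple types.
  implicit val stringGenerator: Generator[String] =
    instance(Random.alphanumeric.take(8).mkString)
  implicit val intGenerator: Generator[Int] = instance(Random.nextInt(100))
  implicit val booleanGenerator: Generator[Boolean] = instance(Random.nextBoolean())

  // Case class generators now only wire the simple generators together,
  // but we still need one of these per case class.
  implicit val capybaraGenerator: Generator[Capybara] =
    instance(Capybara(generate[String], generate[Int], generate[Boolean]))
  implicit val dogGenerator: Generator[Dog] =
    instance(Dog(generate[String], generate[Int]))
}
```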
Now that we have generators defined for our simple types, we can easily define a generator for our case classes Capybara and Dog, but that’s still boilerplate that is going to grow linearly with the number of case classes we have.
To avoid this increase in boilerplate, we would ideally have some clever method that lets us pass in any case class and, as long as we have implicit Generators in scope for all the member types, generates an instance of that class without the need for the explicit boilerplate linking them up.
A case class represents a product (in the functional programming, algebraic data type sense). That is, case class Capybara(name: String, age: Int, awesome: Boolean) is the product of name AND age AND awesome: every time you have an instance of that case class you will always have a name AND an age AND an awesome. Of course, as well as being a product of its members, a case class also carries semantic meaning in the type itself. For any instance, as well as having these three attributes, we also know an additional piece of information: that this is a Capybara.
This is of course super helpful, and central to the idea of a type system, but sometimes (as in this case) we want to be able to operate generically on a case class without being tied to its specific type, and that’s what Shapeless provides. It offers a structure called a HList (a Heterogeneous List) that allows you to define a type as a list of different types, but in a compile-time checked way.
Unlike a standard List in Scala, where we define the single type that the list will contain, with a HList we can define the list as a sequence of specific types, for example:
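A small sketch of the difference, using values invented for illustration:

```scala
import shapeless.{::, HNil}

object HListExample {
  // A standard List has a single element type.
  val names: List[String] = List("Caplin", "Rob")

  // A HList records the type of every position.
  val capybaraLike: String :: Int :: Boolean :: HNil =
    "Caplin" :: 7 :: true :: HNil

  // Each element keeps its own static type, so no casting is needed.
  val name: String = capybaraLike.head
  val age: Int = capybaraLike.tail.head
}
```

Swapping two elements of different types, or leaving one out, would make the type annotation on capybaraLike fail to compile.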
The above example allows us to define our HList with specific types, which provides our compile-time safety. The :: notation is some syntactic sugar provided by Shapeless to mirror normal Scala list behaviour, and HNil is the type for an empty HList.
Hopefully it is already becoming clear how this can help: if we can convert any case class into a common generic format that represents the member types, then we can use that format to attack our boilerplate.
Fortunately, Shapeless provides a class called Generic[A]. Shapeless generates instances of this class through compile time macros for all case classes (and sealed traits), so with the addition of the correct import you can bring into scope a Generic[A] for any arbitrary case class you have, for example:
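As a sketch, summoning the instance for Capybara could look like this. The explicit Generic.Aux annotation is optional; it is included here only to show the HList type that Shapeless derives.

```scala
import shapeless.{::, Generic, HNil}

case class Capybara(name: String, age: Int, awesome: Boolean)

object GenericExample {
  // Generic[Capybara] asks the compiler (via Shapeless's macro) for the instance.
  // Generic.Aux[Capybara, R] states that its HList representation type is R.
  val capybaraGeneric: Generic.Aux[Capybara, String :: Int :: Boolean :: HNil] =
    Generic[Capybara]
}
```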
From there, we can use this class to convert any instance of our case class to a HList and back again:
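The round trip can be sketched as follows, using the `to` and `from` methods on Generic (the example values are made up):

```scala
import shapeless.{::, Generic, HNil}

case class Capybara(name: String, age: Int, awesome: Boolean)

object RoundTrip {
  val gen = Generic[Capybara]

  // Case class -> HList of its member values.
  val repr: String :: Int :: Boolean :: HNil =
    gen.to(Capybara("Caplin", 7, true))

  // HList -> case class.
  val back: Capybara = gen.from(repr)
}
```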
Our generic type-class
Now we have our Generator[A] implementations for our simple types, and thanks to Shapeless we have Generic implementations to convert any case class into a HList of member types, so how can we put that together?
The easiest thing is probably to start with the code and talk through what is happening:
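A sketch of the case class instance is shown below. The name genericGenerator is an assumption, and the StandIn object is a temporary, hand-written placeholder for the HList generator so that the example can run on its own.

```scala
import shapeless.{::, Generic, HList, HNil}

case class Capybara(name: String, age: Int, awesome: Boolean)

trait Generator[A] {
  def generate: A
}

object Generator {
  def generate[A](implicit gen: Generator[A]): A = gen.generate

  // For any type T that has a Generic with HList representation L,
  // and for which a Generator[L] exists, we can build a Generator[T].
  implicit def genericGenerator[T, L <: HList](
      implicit generic: Generic.Aux[T, L],
      lGen: Generator[L]
  ): Generator[T] = new Generator[T] {
    def generate: T = generic.from(lGen.generate)
  }
}

object StandIn {
  // Placeholder only: a fixed generator for Capybara's HList representation.
  implicit val capybaraRepr: Generator[String :: Int :: Boolean :: HNil] =
    new Generator[String :: Int :: Boolean :: HNil] {
      def generate: String :: Int :: Boolean :: HNil = "Caplin" :: 7 :: true :: HNil
    }
}
```

With `import StandIn._` in scope, Generator.generate[Capybara] resolves genericGenerator, generates the HList through the placeholder and converts it back to a Capybara.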
To start, we are going to need a Generator instance that will apply to our (or any) case class. So what’s going on here?
There are two type parameters T and L – T represents the type of the case class and L is defined as a HList.
In the implicit arguments to the function you will see we are looking for an implicit Generic that can translate type T to type L. As we have already established, Shapeless provides Generic instances for all case classes (type T), and L just gives us a handle on the resulting HList type that we need in the next implicit argument, Generator[L]. (The representation type is a type member, Repr, of Generic[T], so we use the Aux pattern here to expose it as the type parameter L – you can read more about the Aux pattern here!) All the generate method then does is use the implicit Generator[L] to generate a new HList, and then use the Generic instance to translate that HList back to type T.
So now we have type-class implementations for our simple types (String, Int, Boolean) and we have a type-class implementation for any case class (as long as we have Shapeless to provide the Generic instances) – however, as we mentioned above, all the case class type-class does is translate the case class to a HList and cross its fingers that there is a Generator[A] in scope that can handle HLists!
As you might have guessed, the final piece of the puzzle is to add in type-class instances to support HLists:
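The two HList instances might be sketched like this. The instance names, the `instance` helper and the simple-type generators are assumptions carried along so the example is self-contained.

```scala
import scala.util.Random
import shapeless.{::, HList, HNil}

trait Generator[A] {
  def generate: A
}

object Generator {
  def generate[A](implicit gen: Generator[A]): A = gen.generate

  // Hypothetical helper to build instances concisely.
  def instance[A](f: => A): Generator[A] = new Generator[A] {
    def generate: A = f
  }

  implicit val stringGenerator: Generator[String] =
    instance(Random.alphanumeric.take(8).mkString)
  implicit val intGenerator: Generator[Int] = instance(Random.nextInt(100))
  implicit val booleanGenerator: Generator[Boolean] = instance(Random.nextBoolean())

  // Base case: the empty HList.
  implicit val hnilGenerator: Generator[HNil] = instance(HNil)

  // Recursive case: generate the head, generate the tail, and cons them together.
  implicit def hconsGenerator[H, T <: HList](
      implicit headGen: Generator[H],
      tailGen: Generator[T]
  ): Generator[H :: T] =
    instance(headGen.generate :: tailGen.generate)
}
```

Asking for Generator.generate[String :: Int :: Boolean :: HNil] makes the compiler chain hconsGenerator three times and finish with hnilGenerator.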
These two implementations are hopefully simpler to understand. The first one handles the case of an empty HList (HNil), so it just returns HNil.
The second one looks a little more complex, but in reality it’s just an instance of the Generator type-class for a non-empty HList. This is captured by two type parameters, H and T, representing the head of the list and the tail of the list (which is itself a HList). Through its implicit arguments, the method ensures that we have access to Generator instances for both the head and the tail types, and it simply delegates to those instances to handle the generation.
Now we have a Generator that can accept any case class and convert it to a common format (HList), and a Generator that can handle HLists. Once we implement the Generators for all the simple types (String, Int, Boolean, UUID, etc.), we are set up to generate data for any case class – goodbye boilerplate!
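Putting the pieces together, a complete sketch could look like the following. As before, the instance names, the `instance` helper, the value ranges and the fields of Dog are assumptions; only flat case classes made of simple types are covered here.

```scala
import scala.util.Random
import shapeless.{::, Generic, HList, HNil}

case class Capybara(name: String, age: Int, awesome: Boolean)
case class Dog(name: String, age: Int) // hypothetical fields

trait Generator[A] {
  def generate: A
}

object Generator {
  def generate[A](implicit gen: Generator[A]): A = gen.generate

  def instance[A](f: => A): Generator[A] = new Generator[A] {
    def generate: A = f
  }

  // Simple types.
  implicit val stringGenerator: Generator[String] =
    instance(Random.alphanumeric.take(8).mkString)
  implicit val intGenerator: Generator[Int] = instance(Random.nextInt(100))
  implicit val booleanGenerator: Generator[Boolean] = instance(Random.nextBoolean())

  // HLists.
  implicit val hnilGenerator: Generator[HNil] = instance(HNil)
  implicit def hconsGenerator[H, T <: HList](
      implicit headGen: Generator[H],
      tailGen: Generator[T]
  ): Generator[H :: T] =
    instance(headGen.generate :: tailGen.generate)

  // Any case class, via its Generic representation.
  implicit def genericGenerator[T, L <: HList](
      implicit generic: Generic.Aux[T, L],
      lGen: Generator[L]
  ): Generator[T] =
    instance(generic.from(lGen.generate))
}

object Demo {
  // No Capybara- or Dog-specific generator is defined anywhere.
  val capybara: Capybara = Generator.generate[Capybara]
  val dog: Dog = Generator.generate[Dog]
}
```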
Before joining, Rob spent 4 years working as a senior engineer at an investment management start-up in London, building a cutting edge trading platform and market place for money managers. Prior to that, Rob spent 6 years working as a technical consultant at Accenture, working on a range of technology-driven, client-facing projects based across Europe. Rob holds a BSc in Artificial Intelligence and Computer Science from the University of Birmingham.