Scala Tuples and Case Classes
Over the years of writing Scala code, my relationship with tuples and case classes has evolved. As a beginner coming from a Java background, I found both a little weird. Now, after 4–5 years and tens of thousands of lines of code, I’ve come to understand and appreciate their usefulness. In fact, I’ve found that they can lead, in unexpected ways, to much cleaner and more flexible code.
Let’s start with a quick review.
Exploring Tuples
A tuple is used when you want to represent something in a simple multi-part data structure. For example, maybe you’re dealing with some complex numbers or maybe some geographical latitude/longitude coordinates.
val complex: List[(Double, Double)] =
List( (1.1, -2.3), (3.2, 0.0) ) // (real, imaginary)
val denverGeo = (39.74, -104.99) // (lat, long)
A tuple’s elements can have different types. For example, if I wanted to make a quick list of people’s names paired with their ages, I could represent those by a tuple of type (String, Int).
From a beginner’s standpoint, the notation for a tuple is sort of weird. If you want the first element of tuple t, you use the notation t._1 (we don’t use zero indexing!). There are also funny limitations, such as a tuple having a maximum of 22 elements. Often, tuples are very short-lived structures, such as the name/age pairs in this example:
val names = List("Jill", "Doug", "Xavier", "Rosemary")
val ages = List(32, 43, 12, 6)
val sentences = (names zip ages).map(t =>
s"${t._1} is ${t._2} years old, or ${t._2 * 7} in dog years!")
Note here that the tuple t gets created by the zip method. (That is, zipping two collections results in a collection of tuples, where the first element of each tuple comes from the first collection and the second element comes from the second collection.)
Tuples are also useful if you want to write a function that returns more than one value. Here’s an example where we want to take a list and return some quick statistics: the sum, the sum of squares, and the count of the elements.
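Here is a minimal sketch of such a function, assuming a fold-based implementation. The exact signature of stats, the summarize helper, and the sample data are illustrative assumptions. stats walks the list once to build a (sum, sum of squares, count) tuple, and the last line destructures a returned tuple into three vals at once.

```scala
object StatsExample {
  // Assumed shape of `stats`: one pass over the data, returning (sum, sum of squares, count).
  def stats(xs: Seq[Double]): (Double, Double, Int) =
    xs.foldLeft((0.0, 0.0, 0)) { case ((sum, sumSq, n), x) =>
      (sum + x, sumSq + x * x, n + 1)
    }

  // Hypothetical helper: turns the three components into (mean, variance, stdDev).
  // Uses the population variance E[x^2] - (E[x])^2.
  def summarize(s: (Double, Double, Int)): (Double, Double, Double) = {
    val (sum, sumSq, n) = s
    val mean = sum / n
    val variance = sumSq / n - mean * mean
    (mean, variance, math.sqrt(variance))
  }

  val data = List(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)

  // A function returning a tuple populates three vals in a single line.
  val (mean, variance, stdDev) = summarize(stats(data))
}
```

For this sample data the mean is 5.0, the variance 4.0, and the standard deviation 2.0.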
Notice that the stats function returns three things: two Doubles and an Int. (From these components you can calculate the mean, variance, and standard deviation.)
Also note the last line, where we take a function that returns a tuple and use it to populate three variables at once. This is a classic example of how Scala lets you write concise code.
Exploring Case Classes
Let’s take the earlier example of storing a geolocation (latitude/longitude) value as a single entity. Coming from Java-land, we would say this is an ideal use case for a class, and we may do something like this:
Boy, what a pain! In Scala, if we write this using a normal class and a “case class”, our code might look like this:
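Here is a sketch of the two Scala versions. The names Geo2 and Geo3 come from the text. The field names lat and long, the GeoExamples object, and the copy call are illustrative assumptions.

```scala
// Regular class: the `val` prefix turns each constructor parameter into a public, immutable field.
class Geo2(val lat: Double, val long: Double)

// Case class: parameters are public vals automatically, and the compiler also generates
// equals, hashCode, toString, copy, and a companion object with apply/unapply.
case class Geo3(lat: Double, long: Double)

object GeoExamples {
  val g2 = new Geo2(39.74, -104.99) // regular class: `new` is required in Scala 2 (Scala 3 also accepts Geo2(...))
  val g3 = Geo3(39.74, -104.99)     // case class: no `new`, this calls the generated Geo3.apply
  val moved = g3.copy(lat = 40.0)   // copy builds a new instance, changing only `lat`
}
```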
The differences between the regular class Geo2 and the case class Geo3 are:
- When instantiating the case class, you don’t need the new keyword.
- In the regular class, if we didn’t put the val prefix in front of the constructor parameters, those values would not be accessible from the outside. (In other words, plain constructor parameters can be used to initialize internal state, including public accessors, but they don’t become public members of the object. If you add val or var in front of the parameters, then they also become members of the class.)
- For the case class, the parameters are automatically publicly accessible as (immutable) values.
By the way, if you read up on case classes, you’ll find out that they can have up to a maximum of 22 parameters. (Notice something familiar about that number?)
Tuples and Case Classes have some limitations
Okay, before we talk about the cool magic that you get from these things, let’s talk about the limitations of Tuples and Case Classes.
Case classes can participate in some limited forms of inheritance. Essentially, a case class can inherit from a regular class or a trait, but a case class can’t inherit from another case class.
With tuples, the limitations are even more obvious: there’s really no inheritance you can do, and the elements can’t have names, so you’re stuck with that t._1, t._2 notation.
Tuples and Case Classes have Magic Qualities
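A problem with old Java classes (and plain Scala classes) is that you don’t automatically get a concept of value equality: two instances with identical contents are not considered equal unless you override the equals method (and, to match it, hashCode). Take this simple example: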
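Here is a small illustration, assuming two hypothetical classes, PlainGeo and CaseGeo, that hold the same data. The regular class falls back to reference equality, while the case class and a plain tuple compare by value, which also lets duplicates collapse in a Set.

```scala
// Regular class: no equals/hashCode override, so == compares references.
class PlainGeo(val lat: Double, val long: Double)

// Case class: the generated equals and hashCode compare field values.
case class CaseGeo(lat: Double, long: Double)

object EqualityExample {
  val plainEqual = new PlainGeo(39.74, -104.99) == new PlainGeo(39.74, -104.99) // false
  val caseEqual  = CaseGeo(39.74, -104.99) == CaseGeo(39.74, -104.99)           // true

  // Tuples get the same value-based equality.
  val t1 = (39.74, -104.99)
  val t2 = (39.74, -104.99)
  val tupleEqual = t1 == t2 // true

  // Because hashCode agrees with equals, duplicate case class values collapse in a Set.
  val plainSetSize = Set(new PlainGeo(1.0, 2.0), new PlainGeo(1.0, 2.0)).size // 2
  val caseSetSize  = Set(CaseGeo(1.0, 2.0), CaseGeo(1.0, 2.0)).size           // 1
}
```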
The magic of case classes is that Scala automatically generates a bunch of functionality, such as equals and hashCode methods and support for serialization. The same applies to tuples. (The catch is that this only works if the individual parameters of the case class or tuple themselves have equality defined and are serializable.)
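To see why that caveat matters, here is a sketch with two hypothetical case classes that differ only in the type of one field. The generated equals compares each field with ==, and a Scala Array uses reference equality, so the Array-based version loses value equality while the List-based one keeps it.

```scala
// Same shape, different field type: Array vs List.
case class WithArray(name: String, values: Array[Int])
case class WithList(name: String, values: List[Int])

object FieldEqualityExample {
  // Two arrays with the same contents are still different references, so this is false.
  val arrayVersion = WithArray("a", Array(1, 2)) == WithArray("a", Array(1, 2))
  // Lists compare element by element, so this is true.
  val listVersion = WithList("a", List(1, 2)) == WithList("a", List(1, 2))
}
```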
Another cool thing about tuples and case classes is that they can be taken apart in pattern-matching case clauses. Here’s a (somewhat silly) example with case classes:
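Here is a sketch of such an example, assuming a sealed Pet trait extended by Dog and Cat case classes, as the next paragraph describes. The Cat’s lives field, the PetExample object, and the describe method are illustrative assumptions.

```scala
// Sealed: all direct subtypes of Pet must be defined in this same file,
// which lets the compiler check that a match covers every case.
sealed trait Pet
case class Dog(name: String) extends Pet
case class Cat(name: String, lives: Int) extends Pet

object PetExample {
  def describe(p: Pet): String = p match {
    case Dog(n)        => s"$n says woof"            // binds the Dog's name to n
    case Cat(n, lives) => s"$n has $lives lives left" // binds both Cat fields
  }
}
```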
By the way, I’ve added some inheritance to the case classes by having Dog and Cat both extend the Pet trait. So inheritance is allowed, but mostly as a way to declare groupings of classes. (Because the Pet trait is sealed, the compiler will warn you if your case clauses aren’t exhaustive, for example if you forget the Cat.)
But the important thing to notice is how we’re able to deconstruct the Dog, binding the temporary value n to its name. As you write Scala code with more pattern matching, this becomes more and more useful.
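Tuples can be taken apart the same way. Their elements still have no names of their own, but a case clause lets you bind local names to them. Here is a sketch of the earlier name/age example rewritten that way (the TupleMatchExample wrapper is just for illustration):

```scala
object TupleMatchExample {
  val names = List("Jill", "Doug", "Xavier", "Rosemary")
  val ages  = List(32, 43, 12, 6)

  // A pattern-matching anonymous function deconstructs each (name, age) tuple,
  // so the body can use readable names instead of t._1 and t._2.
  val sentences = (names zip ages).map { case (name, age) =>
    s"$name is $age years old, or ${age * 7} in dog years!"
  }
}
```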
Let’s think of these things as “Records”
Okay, if I were a beginner reading all of this, I would shrug my shoulders and ask why this is worth occupying any space in my brain. After all, I can still do a lot of coding with old-style classes. Why should I fuss with tuples and case classes? And when should I be using them?
If you can think of a thing as a sort of “record” made up of a bunch of other things (which could themselves be made up of a bunch of things), then it’s time to use a tuple or a case class.
In modern programming, we’ve seen the power and flexibility that can come from something as simple as JSON. I can express information about myself and my two dogs like this:
{ "name" : "Murray",
"age" : 50,
"pets" : [ { "name" : "Nutmeg", "type": "Dog" },
{ "name" : "Rosemary", "type" : "Dog" } ] }
This maps directly onto tuple form if I throw away the field names: ("Murray", 50, List(("Nutmeg", "Dog"), ("Rosemary", "Dog"))), and because every component is built from primitives, tuples, and collections, this tuple can be compared for equality, serialized, or pattern matched.
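As a quick sketch of that claim (the RecordExample object and its val names are hypothetical), here is the tuple form of the record above, compared against an independently built copy and then pattern matched to pull out the pet names:

```scala
object RecordExample {
  // The JSON record from above, with the field names thrown away.
  val murray = ("Murray", 50, List(("Nutmeg", "Dog"), ("Rosemary", "Dog")))
  val rebuilt = ("Murray", 50, List(("Nutmeg", "Dog"), ("Rosemary", "Dog")))

  // Structural equality: an independently built copy compares equal.
  val same = murray == rebuilt

  // Pattern matching reaches into the nested structure.
  val petNames: List[String] = murray match {
    case (_, _, pets) => pets.map { case (petName, _) => petName }
  }
}
```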
Case classes and tuples don’t have to be limited
Don’t think that just because something is a case class, you can’t write some complex logic and expose useful methods in its body. Just remember that the object should be uniquely defined by its parameters, so that things like equality and serialization/deserialization just work.
You could think of these as classes where you are NOT putting any special logic in the constructor. (Heck, you’re not really writing a constructor at all!) And even then, you can put some constructor-like logic in the companion object’s apply method.
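Here is a minimal sketch of both ideas, assuming a hypothetical GeoPoint case class. An ordinary method lives in the body, and the companion object adds an overloaded apply that parses a "lat, long" string. Overloading on a String parameter sidesteps any clash with the compiler-generated apply(lat, long).

```scala
// A case class with an ordinary method; its identity is still just (lat, long).
case class GeoPoint(lat: Double, long: Double) {
  def hemisphere: String = if (lat >= 0) "Northern" else "Southern"
}

object GeoPoint {
  // Constructor-like logic in the companion: hypothetical parser for strings like "39.74, -104.99".
  def apply(latLong: String): GeoPoint = {
    val parts = latLong.split(",").map(_.trim.toDouble)
    GeoPoint(parts(0), parts(1)) // delegates to the generated apply(lat, long)
  }
}
```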
Naming and adding functionality to Tuples
By the way, there are a number of times when I just use simple tuples but give them names so my code reads more cleanly. Consider the earlier example of representing a geographical coordinate as a tuple of two Doubles (latitude and longitude).
I could use a simple type alias to give the nickname “Geo” to such a tuple. And then I could even add some methods to these tuples with an implicit class.
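Here is a sketch of that approach. The object names GeoSyntax and GeoSyntaxDemo and the methods lat, long, and isNorthern are illustrative assumptions.

```scala
object GeoSyntax {
  // Type alias: Geo is just another name for a (latitude, longitude) tuple.
  type Geo = (Double, Double)

  // Implicit class: any Geo value in scope gains these methods.
  implicit class GeoOps(g: Geo) {
    def lat: Double = g._1
    def long: Double = g._2
    def isNorthern: Boolean = g._1 > 0
  }
}

object GeoSyntaxDemo {
  import GeoSyntax._

  val denverGeo: Geo = (39.74, -104.99)
  val latitude = denverGeo.lat        // reads better than denverGeo._1
  val northern = denverGeo.isNorthern
}
```

At runtime denverGeo is still an ordinary Tuple2, so it keeps the equality and pattern-matching behavior described earlier.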
The first item here, using simple type names as aliases for basic tuples or collections, isn’t a hard concept to understand. And if you’re constantly working with something like lists of coordinates, these type aliases can encapsulate some complexity, as in type ClusterCoordinates = Vector[Option[(Double,Double)]], where wrapping the value in a case class just to get a single-parameter object would otherwise become laborious.
The second thing I’m showing above, tacking logic onto a type with an implicit class, is some cool Scala-ninja stuff. I couldn’t resist the urge to show off this neat Scala feature.
A closing anecdote
The reason I decided to write this article (my first Medium article) was a numeric library I had written that, through some clever trickery, originally worked both in plain Scala and in Spark.
Being in a hurry, I started developing and testing new functionality in the Scala environment and stopped building Spark test cases. The code compiled, so I hoped I would get lucky when trying the Spark side of things.
And that’s where I got bitten. I had complex objects, lambdas, and foldLeft/aggregate operations, and it all worked in Scala when everything ran within the same self-contained environment. But in Spark, you’re often not just running normal code: you’re assembling plans and stages that get sent out to multiple executors. It all looks like simple magic, but there’s a lot of behind-the-scenes transmission of data and logic going on.
Where I had built my code out of tuples and case classes, things just worked. But where I had started making some weird classes and inheritance relationships, everything ground to a halt. The long and short of it is that I have several days of manually rewriting code ahead of me.
Lesson learned!
Addendum
A reader pointed out my misstatement that case classes are limited to 22 fields. That was true before Scala 2.11 was released, and I was clearly remembering something I must have read in an older Scala programming book or tutorial. Even that correction has a caveat, because case classes with more than 22 fields have some limitations, such as not getting an auto-generated unapply method. Perhaps more relevant is that Scala 3 has reworked things so that the “22 limitation” of functions, tuples, and case classes goes away.
If there’s an important point here, it’s how similar case classes and tuples are as really useful building blocks.
One more thing I’ve noticed since first writing this article is that Java 14 introduced records (as a preview feature; they became standard in Java 16), which are designed for simple, immutable data structures and look remarkably like Scala’s tuples and case classes. Hmmm!