Java


I noticed some strange behavior in some Scala code recently. It was rather a mystery. I looked for my error and googled for a solution for the longest time with no success. Eventually I got my answer from the Scala mailing list / Nabble forum. Here’s the class that was causing the trouble.

class ArrayWrapper[A](length: Int) {
  private val array = new Array[A](length)
  def apply(x: Int) = array(x)
  def update(x: Int, value: A) = array(x) = value
  override def toString(): String = array.toString
}

The first thing you’ll notice about this class is that it is extremely simple! There aren’t a lot of moving parts. It’s a simple wrapper that exposes 3 basic array behaviors: apply (a ‘getter’), update (a ‘putter’), and good ol’ toString. Arrays in Scala take a type parameter, and to ensure that this class could wrap an array of any type I used a type parameter, too. Have a good look at the class and make sure you understand how it works. It won’t take long.

How do you expect this class to behave? Let’s play a little fill-in-the-blanks. Here is a Scala interpreter session with some results blanked out.

scala> class ArrayWrapper[A](length: Int) {
    |   private val array = new Array[A](length)
    |   def apply(x: Int) = array(x)
    |   def update(x: Int, value: A) = array(x) = value
    |   override def toString(): String = array.toString
    | }
defined class ArrayWrapper

scala> val a = new ArrayWrapper[Int](5)
??????????????

scala> val x = a(0)
??????????????

scala> x.toString
??????????????

scala> a(0).toString
??????????????

scala> a(0) = 0

scala> a.toString
??????????????

scala> a(0).toString
??????????????

scala>

There are 6 blanks. What do you expect to see in each of those? Well, the first blank follows the creation of a new ArrayWrapper[Int] and its assignment to a val ‘a’. So, according to our overriding definition of toString, it is simply the result of the underlying Array’s toString method. I know from experience how a brand new Array of Ints looks. It looks like this:

scala> new Array[Int](5)
res1: Array[Int] = Array(0, 0, 0, 0, 0)

So that’s what I expect to see here. Anyone expect something different? Here’s what I actually saw in the first blank:

scala> val a = new ArrayWrapper[Int](5)
a: ArrayWrapper[Int] = Array(null, null, null, null, null)

Hmm. That’s not what I expected. Did you predict this? Why is this array full of nulls when a new Array[Int] is usually full of zeros? I was stumped. The array is parameterized, I reasoned, so maybe type erasure was involved. That doesn’t make sense, though. No types should be erased at this point.

Let’s look at the next few lines, 12-16. I called a(0) (the apply method) and assigned the result to x. I then called the toString method on x. What do you expect in these two lines? I would have expected a(0) to return 0 and x.toString to return “0”, but my conviction is shaken by that last result. Will a(0) return null? Will x.toString throw a NullPointerException? Decide what you predict will happen. Here’s the actual result:

scala> val x = a(0)
x: Int = 0

scala> x.toString
res0: java.lang.String = 0

Each line behaves in the “correct” way even though we saw all those nulls in the underlying array. That’s good news, I suppose. Maybe the problem is limited to Array’s toString method. It should be smooth sailing now. Let’s now look at line 18, in which we call a(0).toString. It’s just combining the operations (apply and toString) from the previous two lines without storing the intermediate result in ‘x’. I expected that to return String “0”. You can probably guess by now that what I expected is not what I got. Make your own prediction before you read the actual result below. What will happen when we call a(0).toString?

scala> a(0).toString
java.lang.NullPointerException
       at .<init>(<console>:7)
       at .<clinit>(<console>)
       at RequestResult$.<init>(<console>:3)
       at RequestResult$.<clinit>(<console>)
       at RequestResult$result(<console>)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.i...

Ouch! NullPointerException! This is an unpleasant surprise. The call a(0) returned a zero earlier, and calling toString on that zero returned a String “0”. But now we get this disaster. I’m getting more and more confused. Do you have an explanation for this crazy behavior yet?

Moving along, in line 21 we assign 0 to a(0). Remember that a(0) returned 0 earlier. By the way, behind the scenes the line “a(0) = 0” doesn’t call the apply method, but the ‘update’ method. It succeeds. In lines 23 and 26 we call a.toString and a(0).toString. What will happen in each case? At this point, it’s anybody’s guess. The behavior has been so wacky I can’t even make a sensible prediction. Make a guess of your own, if you dare, and observe the actual result below:

scala> a(0) = 0

scala> a.toString
res3: String = Array(0, null, null, null, null)

scala> a(0).toString
res4: java.lang.String = 0

The underlying Array now appears to contain a zero in addition to the nulls. Also, the a(0).toString, which was throwing a NullPointerException earlier, is now succeeding.

As I say, I puzzled over this problem for some time. I wanted to blame the issue on type erasure in the parameterized type, but that explanation didn’t make sense. I posted a question to the Scala forum on Nabble and got a response back in short order from Daniel Sobral.

The culprit? Drumroll…

Boxing. Well, boxing, unboxing, and a peculiarity of parameterized types. To review, here is our ArrayWrapper class:

class ArrayWrapper[A](length: Int) {
  private val array = new Array[A](length)
  def apply(x: Int) = array(x)
  def update(x: Int, value: A) = array(x) = value
  override def toString(): String = array.toString
}

We declared ‘array’ to be an Array[A], which is to say an Array of who-knows-what. When the Array is defined in this way, with a type parameter of unknown type, the Array must be an array of object references! It cannot be an array of Java int primitives. That’s the peculiarity of parameterized types. That’s why the default values for the members of the array were null instead of 0. The underlying array is actually an array of java.lang.Integer objects.

When we ran ‘val x = a(0)’, Scala retrieved the value at index 0 which was null. The apply method has Int return type in our example, and null is not an legal value of an Int. Int is Scala’s version of the Java int primitive type. So the null was converted (unboxed) to Int value 0. Then it could be stored in val x, etc. Once it’s safely unboxed, it behaves like a normal Scala Int value.

So, why did a(0).toString not work? Shouldn’t the null returned from a(0) be unboxed to Int 0, then re-boxed for the toString call? Apparently it doesn’t work that way. The unboxing hasn’t happened at the time the toString call is executed, so that toString is called on the null, giving us the NullPointerException. I don’t know whether this behavior is imposed by the JVM or the Scala language. Either way, it seems to me like a violation of the Principle of Least Astonishment and an opportunity for improvement.

Once we call a(0) = 0, then the underlying array is populated with a boxed version of 0, which is to say an instance of java.lang.Integer. After it’s populated with a non-null it works normally.

Again, this only happens for Arrays with parameterized types. If we make ArrayWrapper non-parameterized and declare ‘array’ as an Array[Int] then the problem goes away.

scala> class ArrayWrapper(length: Int) {
     |   private val array = new Array[Int](length)
     |   def apply(x: Int) = array(x)
     |   def update(x: Int, value: Int) = array(x) = value
     |   override def toString(): String = array.toString
     | }
defined class ArrayWrapper

scala> val a = new ArrayWrapper(5)
a: ArrayWrapper = Array(0, 0, 0, 0, 0)

scala> a(0).toString
res0: java.lang.String = 0

There are a few lessons in all this for the Scala developer:

  • Be vigilant about Array initialization. Initialize them explicitly, especially when dealing with primitives like Int, Long, Float, Double, Byte and Char. Don’t trust the default values.
  • Beware parameterized Arrays. They are flawed. Consider specifying their type or using another collection instead, such as a List or Map which can’t contain un-initialized values.
  • Unit test all your code, even those parts that look too simple to screw up.

Variance in Java

If you’re a Java developer then you probably know a thing or two about subtyping. B is a subtype of A if B extends or implements A (I’ll use this convention throughout this post). A is the supertype, B is the subtype. But what about arrays? Or generic collections? Is an array of B a subtype of an array of A? Is List<B> a subtype of List<A>? Let’s do some experiments:

Object testObj = null;

String[] arrayB = { "a", "b", "c" };
Object[] arrayA = arrayB;
testObj = arrayA[0];

List<String> listB = new ArrayList<String>();
listB.add("123");
List<Object> listA = (List)listB;
testObj = listA.get(0);

List<Double> listC = new ArrayList<Double>();
listC.add(new Double(10.0));
listB = (List)listC;
System.out.println(listB.get(0));

I’ve left out the boilerplate for brevity. This code compiles and it runs fine, mostly. This tells us three things. First, Java treats an array of B as a subtype of an array of A. We know this because we can use a reference to an array of Objects to refer to an array of Strings. This means an array of Strings “is-a” array of Objects.

Second, we know that an ArrayList<String> is a subtype of a List<String>. This makes sense, because ArrayList implements interface List. An ArrayList<String> “is-a” List<String>. We can even finagle the List<String> into a List<Object>, but we have to make an explicit cast.

Third, we see a ClassCastException during the call to listA.get(0). We use the explicit cast again to assign a List<Double> to a reference to List<String>. The compiler allows this! This kind of sloppy typing is one of the main complaints of Java generics’ detractors (this link includes an example of an erroneous assignment that doesn’t even require an explicit cast!). Now, let’s look at some of the things Java doesn’t allow.

Object[] arrayA = { new Object() };
String[] arrayB = arrayA;

List<String> listB = new ArrayList<String>();
List<Object> listA = listB;

This causes two compiler errors. The first occurs when we try to assign an array of Object to a reference to an array of String. This is sensible. An array of Objects could contain any object: Strings, Doubles, Threads, anything. We can’t treat such an array as an array of Strings.

The second error occurs when we try to assign a list of Strings to a reference to a List of Objects. This is the same code as in the previous snippet, but without the explicit cast. The compiler doesn’t allow this. Since lists are mutable (we can add and remove items) we could add non-String members to listA, which means those non-String members would also be in listB. But if we don’t add any items to listA, the assignment is perfectly safe. Java can’t figure out in all cases when it’s safe to do an assignment and when it is not.

Why do Java generics work the way they do? I think it’s mainly due to two factors: backward compatibility, promoting adoption of the feature. When generics were introduced, there were already millions of lines of code out there that depend on regular, non-generic, mutable collections. To make the new code compatible with legacy code, the type parameters are erased during compilation, and allowing those dangerous casts lets developers work around the fact that List<B> is not a subtype of List<A>. To make it easier to use generics in new code and convert to non-generics for integration with old code, the compiler rules were made fairly permissive.

Variance Terminology

If you’re not familiar with the term “variance”, here’s what it means with respect to Java. Java arrays are covariant. That means that an array of B is a subtype of an array of A, provided that B is a subtype of A. The type-subtype relationship of the arrays follows the relationship of the contents. Lists in Java are invariant (some say nonvariant). A List of Strings has no relationship to a List of Objects. You can explicitly cast a List of Strings to a List of Objects but you can force the conversion the opposite direction, too, (Yuck.) so that doesn’t count. Now consider some hypothetical generic class X such that X<A> is a subtype of X<B> if B is a subtype of A. That’s the opposite of the way arrays work. The hypothetical generic X is contravariant.

One more detail of terminology: A type class could theoretically be covariant with respect to one type parameter, and contravariant with respect to another (not in Java, just theoretically). So, say you have a generic class X that is covariant with respect to its first type parameter and contravariant with respect to its second. So X<B,I> is a subtype of X<A,J> only if B is a subtype of A and J is a subtype of I. Weird, huh? This actually happens in Scala.

Variance in Scala

In Scala, variance is not left to chance. There are very strict rules. Variance with respect to type parameters is spelled out explicitly for each class (or trait). The same conventions are used for Arrays, Lists, or any generic class! The variance system (indeed, the whole type system) in Scala is more complicated and has a steeper learning curve than in Java, but it affords you the ability to write very expressive code that behaves in a more intuitive fashion. Here are three generic classes that use each of the variance types.

class InVar[T]     { override def toString = "InVar" }
class CoVar[+T]     { override def toString = "CoVar" }
class ContraVar[-T] { override def toString = "ContraVar" }
/************ Regular Assignment ************/
val test1: InVar[String] = new InVar[String]
val test2: CoVar[String] = new CoVar[String]
val test3: ContraVar[String] = new ContraVar[String]

The ‘+’ denotes covariance with respect to the type parameter, and ‘-’ denotes contravariance. The class is invariant with respect to type parameters without a plus or minus. If you run this code you can see that when the type parameters are the same on both sides, the assignments work fine. Now, let’s see what happens when we test assignment for different type parameters.

scala> /************ Invariant Subtyping ************/

scala> val test1: InVar[String] = new InVar[AnyRef]
<console>:5: error: type mismatch;
 found   : InVar[AnyRef]
 required: InVar[String]
       val test1: InVar[String] = new InVar[AnyRef]
                                   ^

scala> val test2: InVar[AnyRef] = new InVar[String]
<console>:5: error: type mismatch;
 found   : InVar[String]
 required: InVar[AnyRef]
       val test2: InVar[AnyRef] = new InVar[String]
                                   ^

scala> /************ Covariant Subtyping ************/

scala> val test3: CoVar[String] = new CoVar[AnyRef]
<console>:5: error: type mismatch;
 found   : CoVar[AnyRef]
 required: CoVar[String]
       val test3: CoVar[String] = new CoVar[AnyRef]
                                  ^

scala> val test4: CoVar[AnyRef] = new CoVar[String]
test4: CoVar[AnyRef] = CoVar

scala> /************ Contravariant Subtyping ************/

scala> val test5: ContraVar[String] = new ContraVar[AnyRef]
test5: ContraVar[String] = ContraVar

scala> val test6: ContraVar[AnyRef] = new ContraVar[String]
<console>:5: error: type mismatch;
 found   : ContraVar[String]
 required: ContraVar[AnyRef]
       val test6: ContraVar[AnyRef] = new ContraVar[String]
                                      ^

Now you can see the difference in the three classes. The invariant class doesn’t allow assignment in either direction, regardless of whether their type parameters have a subtype relationship. The covariant class allows an assignment from subtype to supertype. String is a subtype of AnyRef, so CoVar[String] is a subtype of CoVar[AnyRef]. The contravariant class allows an assignment from supertype to subtype. String, again, is a subtype of AnyRef, so ContraVar[AnyRef] is a subtype of ContraVar[String].

So, it’s as simple as that, right? Sorry. There’s a little more to it. Once you’ve declared a type parameter as covariant or contravariant there are some restrictions on where this type can be used. Why? Scala is not a purely functional langage in that it allows objects to alter their internal state. It allows mutability. Mutability throws a monkey wrench into variance. Say, for example you had a Scala implementation of a linked list like so:

class LinkedList[+A] {
  private var next: LinkedList[A] = null
  def add(item: A): Unit = { ... }
  def get(index: Int): A = { ... }
}

val strList = new LinkedList[String]
strList.add("str1")
val anyList: LinkedList[Any] = strList
anyList.add(new Double(1.0))
val str: String = strList.get(1)

This code won’t compile. Do you see the problem? This code, if it worked, would allow us to create a list of Strings and then add a Double to that list! If we allow that then there’s a disaster when we get to the last line. A LinkedList of Strings returns a Double. That’s no good. That’s the same problem as we saw in Java. Scala nips this sort of code in the bud by disallowing covariant types in certain places including member function parameter types. Places where covariant types are allowed and contravariant types are forbidden are called covariant positions. And the reverse is true, too. If contravariant types are allowed and covariant types are forbidden in some position, this is called a contravariant position.

The above code causes a compiler error for the add method. You can use covariant types in most other places including constructor parameter types, member val types, and method return types. You can also use covariant types as type parameters, but only where covariant types themselves are allowed. This means, in this example, you could add a method that returns a Set[A], but you could not add a method that takes a Set[A] as a parameter, because you could use that passed-in Set[A] to alter the state.

Contravariant types can be used as constructor parameter types, member function parameters types, and as type parameters in each of those positions.

Here’s a fun exercise for the reader. Experiment with using type parameters of the different kinds of variances in different positions. Below is some example code to get you started. For each usage that fails compilation, why is it not allowed? Can you think of a way that such a usage could cause an inconsistency (such as allowing a Double in a String collection, for example)?

class InVar[T](param1: T) {
  def method1(param2: T) = { }
  def method2: T = { param1 }
  def method3: List[T] = { List[T](param1) }
  def method4[U >: T]: List[U] = { List[U](param1) }
  val val1: T = method2
  val val2: Any = param1
  var var1: T = method2
  var var2: Any = param1
}

class CoVar[+T](param1: T) {
  def method1(param2: T) = { }
  def method2: T = { param1 }
  def method3: List[T] = { List[T](param1) }
  def method4[U >: T]: List[U] = { List[U](param1) }
  val val1: T = method2
  val val2: Any = param1
  var var1: T = method2
  var var2: Any = param1
}

class ContraVar[-T](param1: T) {
  def method1(param2: T) = { }
  def method2: T = { param1 }
  def method3: List[T] = { List[T](param1) }
  def method4[U >: T]: List[U] = { List[U](param1) }
  val val1: T = method2
  val val2: Any = param1
  var var1: T = method2
  var var2: Any = param1
}

Maybe in a future post, I’ll do a more thorough analysis of all the places type parameters of the different variances can be used. If you’d be interested in reading such a thing, please do leave a comment.

Conclusion

What was the point of all that? We still don’t get the mutable, covariant collections we were hoping for. But we do get two things we don’t get in Java. We get invariant, typesafe, mutable collections that won’t wind up holding objects of the wrong type, and we get covariant, typesafe, immutable collections. If you’re new to functional programming (like I am), your first thought might be, “What good is an immutable list? Or an immutable array? If it’s immutable, I can’t add any items to it, right?”

Take the Scala List as an example. It’s declared as a “class List[+A]” and it has a method for adding new items that’s declared “def + [B >: A](x : B) : List[B]“. So List is covariant with respect to its type parameter A. That’s great! So a List[BigInt] “is-a” List[Number] and it “is-a” List[Object]. To add items to a List use the ‘+’ function. It doesn’t change the List it was called on, but it does return a new List.

Also, you can add anything to a List. What you add affects what gets returned. Look at the function definition again: “def + [B >: A](x : B) : List[B]“. It take a parameter of type B, where B is any supertype of A (or A itself), and it returns a List[B]. Let’s consider the simple case, and then something more complex.

scala> var strList = List[String]("abc")
strList: List[String] = List(abc)

scala> strList = strList + "xyz"
strList: List[String] = List(abc, xyz)

scala> var objList = strList + new Object()
objList: List[java.lang.Object] = List(abc, xyz, java.lang.Object@156ee8e)

scala> var anyList = strList + 3.1416
anyList: List[Any] = List(abc, xyz, 3.1416)

First we create a List[String] called strList containing one item. Then we add a second String and store the resulting List back in the strList variable. Then we call the ‘+’ function with an Object parameter. This is allowed because the parameter must have type B where B is a supertype of A, and Object is indeed a superclass of String. The call to ‘+’ returns a List[Object] which contains both the Strings and the Object. Simple.

Then we call the ‘+’ method on testList again. Remember, strList still just contains the two Strings because it’s immutable and we didn’t assign the last result back to strList. We couldn’t have. The result was a List[Object], not a List[String], and List is covariant, not contravariant. This time we supply a parameter of type Double. But Double isn’t even a supertype of String! That’s ok. Scala determines the nearest common ancestor, the type Any, and uses that as B. The ‘+’ function returms a List[Any] that contains the Strings and the Double. That’s handy. And covariant and typesafe.