Entries tagged with “variance”.


As I’ve marveled before, the Scala type system is extremely expressive.  In this post, I want to look at one particular aspect of the type system.  I want to look at a few of the ways to pass behaviors into functions.  Using classes, traits, function types, and a construct that looks like an ad hoc interface (and probably other ways I haven’t learned yet) you can exercise precise control over what units of behavior your functions will accept and use.

First, let’s define a zoo of classes that we will experiment on:

trait TestTrait {
  def func(s1: String): Unit
}

class TestClass1 {
  def func(s1: String): Unit = println("TestClass1: "+s1)
}

class TestClass2 {
  def func(s1: String): Int = { println("TestClass2: "+s1); 999; }
}

class TestClass3 extends TestTrait {
  def func(s1: String) = println("TestClass3: "+s1)
}

class TestClass4 {
  def func(a1: Any) = println("TestClass4: "+a1)
}

class TestClass5 {
  def otherFunc(s1: String) = println("TestClass5: "+s1)
}

class TestClass6 {
  def wrongFunc(list: List[Int]) = println("TestClass6: "+list)
}

There.  These are very simple, but I’ll just point out the salient points of each:

  • TestTrait – Basically, an interface with one method.
  • TestClass1 – One method, func, takes a String and returns Unit (void, to Java coders).
  • TestClass2 – Same as TestClass1, but returns Int.
  • TestClass3 – Same as TestClass1, but also extends TestTrait.
  • TestClass4 – Same as TestClass1, but takes an Any parameter.
  • TestClass5 – Same as TestClass1, but its function has a different name
  • TestClass6 – One method, different name, different parameter type, does notextend TestTrait.

Each of these (except TestTrait) defines some behavior that we might want to pass into a function.  That behavior is trivial in these classes, but I’m trying to keep the examples short.  Let’s now define several functions that might be able to use some of this behavior.

def test1(fn : (String) => Unit) = { fn("test1"); }

def test2(fn : (String) => Int)  = { fn("test2"); }

def test3(obj: { def func(x: String): Unit }) = { obj.func("test3") }

def test4(obj: TestClass1) = { obj.func("test4") }

def test5(obj: TestTrait)  = { obj.func("test5") }

def test6(obj: { def func(x: Any): Unit }) = { obj.func("test6") }

Now we just need to instantiate each one class and start passing parameters!  Create an instance of each class, TestClass1 to TestClass6, and name them tc1 to tc6.  The pass each instance’s function into test1 and test2, and pass the instance itself into test3 through test6.  Let’s do this in the Scala interpreter, rather than compiling.  Like so:

val tc1 = new TestClass1
val tc2 = new TestClass2
val tc3 = new TestClass3
val tc4 = new TestClass4
val tc5 = new TestClass5
val tc6 = new TestClass6

test1(tc1.func)
test1(tc2.func)
test1(tc3.func)
test1(tc4.func)
test1(tc5.otherFunc)
test1(tc6.wrongFunc)

test2(tc1.func)
test2(tc2.func)
test2(tc3.func)
test2(tc4.func)
test2(tc5.otherFunc)
test2(tc6.wrongFunc)

test3(tc1)
test3(tc2)
test3(tc3)
test3(tc4)
test3(tc5)
test3(tc6)

test4(tc1)
test4(tc2)
test4(tc3)
test4(tc4)
test4(tc5)
test4(tc6)

test5(tc1)
test5(tc2)
test5(tc3)
test5(tc4)
test5(tc5)
test5(tc6)

test6(tc1)
test6(tc2)
test6(tc3)
test6(tc4)
test6(tc5)
test6(tc6)

Did you run that?  It didn’t work, did it?  At least not all of it.  Now we we have something to discuss.  Here’s a chart of what did and didn’t work:

test1 test2 test3 test4 test5 test6
tc1 OK FAIL OK OK FAIL FAIL
tc2 OK OK FAIL FAIL FAIL FAIL
tc3 OK FAIL OK FAIL OK FAIL
tc4 OK FAIL FAIL FAIL FAIL OK
tc5 OK FAIL FAIL FAIL FAIL FAIL
tc6 FAIL FAIL FAIL FAIL FAIL FAIL

Taking these results one test function at a time:

test1(fn : (String) => Unit)

All but one of the test objects has at least one member function that can be passed in to this function.  It’s easy to see why this works for tc1, tc3, and tc5.  They each have a function that matches the type of parameter fn exactly.  It’s also easy to see why it doesn’t work for tc6.  TestClass6.wrongFunc takes a List[Int] parameter, which is unrelated to fn’s expected String parameter.

But what about tc2 and tc4?  TestClass2.func takes a String parameter, just like the test1′s fn parameter, but it returns an Int instead of Unit.  But it still works.  Instead of requiring a Unit return type for fn, Scala allows functions that return values.  It simply throws away any returned value and treats the passed in function as though it returned Unit.  Scala’s typesafety requirements in this case are very permissive!  This can be convenient, but be aware that you have very little control of what functions can be passed in when you use this kind of parameter type.

TestClass4.func returns Unit, but it takes a parameter of type Any.  Since test1 knows parameter fn as a function with a String parameter, test1 will only ever pass Strings into fn.  A String “is-a” Any so Scala says it’s ok to pass Strings into TestClass4.func.

This all goes back to variance.  In Scala, functions are contravariant with respect to parameters and covariant with respect to return type.  If you’re not familiar with variance, this means that if function f2′s parameters are simple supertypes (simple, meaning a supertype or the same type) of function f1′s parameters, and if f2′s return type is a simple subtype of f1′s return type then f2 is a subtype of f1 and can be used anywhere f1′s type is required.  The makes sense.  If a function takes type X, then you can pass in any value of type X or a subtype of X.  If a function returns type Y, then the result can be treated as Y or any supertype of Y.  Because Unit is a subtype of Int (and of all reference types) and String is a simple supertype of String, the function type “(String) => Int” is a subtype of “(String) => Unit” and can be used wherever a “(String) => Unit” is required.

test2(fn : (String) => Int)

This one only works for tc2.  TestClass2.func matches the type of parameter fn exactly.  But none of the other functions can be used here.  This is because none of the other functions take a parameter that is a simple subtype of String and return a simple supertype of Int, none of the others will work here.

A quick note about the use of functions in this way:  If you have an instance and pass a member function of that instance as a parameter, the function still has access to that instance’s data.  Moreover, if the passed function changes the state of the object of which it is a member, then function that receives the function as a parameter can mutate the object.  I don’t think I’m being clear.  Here’s some code to demonstrate what I mean.

class TestClass(name: String) {
  var memberStr: String = ""
  def func(str: String) = { memberStr += str }
  override def toString = name + ": " + memberStr
}

val testA = new TestClass("A")
val testB = new TestClass("B")

def test(fn : (String) => Unit, num: Int): Unit = {
  if (num > 0) { fn("X"); test(fn,num-1); }
}

test(testA.func, 5)
test(testB.func, 2)

println(testA + ", " + testB)

See?  Function test just takes a function as a parameter (and an Int).  It does NOT take a TestClass as a parameter, but it’s still able to alter those two instances of TestClass by way of the mutator member function TestClass.func.  So be aware that when you declare a function parameter like this, you never know whether you’re getting a bare function, or a member function.

test3(obj: { def func(x: String): Unit })

Here’s a construct that’s much more strict than the function types used in test1 and test2.  The test3 function take an ad-hoc interface as a parameter.  It’s not a named interface (a Trait), so the parameter’s type doesn’t have to be declared as extending anything.  The parameter is accepted if it has a member function of the given name and type.

Now, in this case, remember, the parameter isn’t the function, it’s the object.  So the permissive behavior that Scala allows in test1 and test2 because of the variance behavior of function types does not apply here.  An object of type A has a function funcA, and object of type B has a function funcB whose type is a subtype of funcA, that does NOT make B a subtype of A.  The names and types of the members declared in the ad-hoc interface must match the parameter exactly.  The parameter can have additional members, but it must at least provide all the members listed in the ad-hoc interface.

In this case, TestClass1 and TestClass3 are the only classes that have functions named func that take a String parameter and return Unit.  The other classes have either the wrong function name, parameter type, or return type.

test4(obj: TestClass1)

This is an easy one.  You know what’s happening here.  Function test4 states explicitly that it requires an object of type TestClass1.  Only a TestClass1 or a subtype of TestClass1 will do.

test5(obj: TestTrait)

Function test5 expects a TestTrait parameter.  Any object that has the trait TestTrait can be passed in.  In this case, TestClass3 is the only class that extends TestTrait.  We could have declared TestCase1 as extending TestTrait, but we didn’t.  Therefor, it doesn’t matter that TestClass1 has a function with the right name, the right parameter types, and the right return type.  It isn’t explicitly defined as extending TestTrait, so it isn’t allowed.

test6(obj: { def func(x: Any): Unit })

This is really just a more extreme example of the phenomenon demonstrated in test3.  This is another ad-hoc interface.  That member function is the supertype of ALL functions that have one parameter and return a reference type.  ALL of them.  You see why?  Any is the superclass of every type and Unit is a subclass of every reference type.  Function types are contravariate with respect to their parameters and covariant with respect to their return types.  But, of course, only our instance of TestClass4 is accepted because this parameter doesn’t have a function type, but an ad-hoc interface type.  It is not the case that all types are covariant with respect to the types of their member functions.  TestClass4 is the only one that has a function with the right name, parameter type, and return type.

Variance in Java

If you’re a Java developer then you probably know a thing or two about subtyping. B is a subtype of A if B extends or implements A (I’ll use this convention throughout this post). A is the supertype, B is the subtype. But what about arrays? Or generic collections? Is an array of B a subtype of an array of A? Is List<B> a subtype of List<A>? Let’s do some experiments:

Object testObj = null;

String[] arrayB = { "a", "b", "c" };
Object[] arrayA = arrayB;
testObj = arrayA[0];

List<String> listB = new ArrayList<String>();
listB.add("123");
List<Object> listA = (List)listB;
testObj = listA.get(0);

List<Double> listC = new ArrayList<Double>();
listC.add(new Double(10.0));
listB = (List)listC;
System.out.println(listB.get(0));

I’ve left out the boilerplate for brevity. This code compiles and it runs fine, mostly. This tells us three things. First, Java treats an array of B as a subtype of an array of A. We know this because we can use a reference to an array of Objects to refer to an array of Strings. This means an array of Strings “is-a” array of Objects.

Second, we know that an ArrayList<String> is a subtype of a List<String>. This makes sense, because ArrayList implements interface List. An ArrayList<String> “is-a” List<String>. We can even finagle the List<String> into a List<Object>, but we have to make an explicit cast.

Third, we see a ClassCastException during the call to listA.get(0). We use the explicit cast again to assign a List<Double> to a reference to List<String>. The compiler allows this! This kind of sloppy typing is one of the main complaints of Java generics’ detractors (this link includes an example of an erroneous assignment that doesn’t even require an explicit cast!). Now, let’s look at some of the things Java doesn’t allow.

Object[] arrayA = { new Object() };
String[] arrayB = arrayA;

List<String> listB = new ArrayList<String>();
List<Object> listA = listB;

This causes two compiler errors. The first occurs when we try to assign an array of Object to a reference to an array of String. This is sensible. An array of Objects could contain any object: Strings, Doubles, Threads, anything. We can’t treat such an array as an array of Strings.

The second error occurs when we try to assign a list of Strings to a reference to a List of Objects. This is the same code as in the previous snippet, but without the explicit cast. The compiler doesn’t allow this. Since lists are mutable (we can add and remove items) we could add non-String members to listA, which means those non-String members would also be in listB. But if we don’t add any items to listA, the assignment is perfectly safe. Java can’t figure out in all cases when it’s safe to do an assignment and when it is not.

Why do Java generics work the way they do? I think it’s mainly due to two factors: backward compatibility, promoting adoption of the feature. When generics were introduced, there were already millions of lines of code out there that depend on regular, non-generic, mutable collections. To make the new code compatible with legacy code, the type parameters are erased during compilation, and allowing those dangerous casts lets developers work around the fact that List<B> is not a subtype of List<A>. To make it easier to use generics in new code and convert to non-generics for integration with old code, the compiler rules were made fairly permissive.

Variance Terminology

If you’re not familiar with the term “variance”, here’s what it means with respect to Java. Java arrays are covariant. That means that an array of B is a subtype of an array of A, provided that B is a subtype of A. The type-subtype relationship of the arrays follows the relationship of the contents. Lists in Java are invariant (some say nonvariant). A List of Strings has no relationship to a List of Objects. You can explicitly cast a List of Strings to a List of Objects but you can force the conversion the opposite direction, too, (Yuck.) so that doesn’t count. Now consider some hypothetical generic class X such that X<A> is a subtype of X<B> if B is a subtype of A. That’s the opposite of the way arrays work. The hypothetical generic X is contravariant.

One more detail of terminology: A type class could theoretically be covariant with respect to one type parameter, and contravariant with respect to another (not in Java, just theoretically). So, say you have a generic class X that is covariant with respect to its first type parameter and contravariant with respect to its second. So X<B,I> is a subtype of X<A,J> only if B is a subtype of A and J is a subtype of I. Weird, huh? This actually happens in Scala.

Variance in Scala

In Scala, variance is not left to chance. There are very strict rules. Variance with respect to type parameters is spelled out explicitly for each class (or trait). The same conventions are used for Arrays, Lists, or any generic class! The variance system (indeed, the whole type system) in Scala is more complicated and has a steeper learning curve than in Java, but it affords you the ability to write very expressive code that behaves in a more intuitive fashion. Here are three generic classes that use each of the variance types.

class InVar[T]     { override def toString = "InVar" }
class CoVar[+T]     { override def toString = "CoVar" }
class ContraVar[-T] { override def toString = "ContraVar" }
/************ Regular Assignment ************/
val test1: InVar[String] = new InVar[String]
val test2: CoVar[String] = new CoVar[String]
val test3: ContraVar[String] = new ContraVar[String]

The ‘+’ denotes covariance with respect to the type parameter, and ‘-’ denotes contravariance. The class is invariant with respect to type parameters without a plus or minus. If you run this code you can see that when the type parameters are the same on both sides, the assignments work fine. Now, let’s see what happens when we test assignment for different type parameters.

scala> /************ Invariant Subtyping ************/

scala> val test1: InVar[String] = new InVar[AnyRef]
<console>:5: error: type mismatch;
 found   : InVar[AnyRef]
 required: InVar[String]
       val test1: InVar[String] = new InVar[AnyRef]
                                   ^

scala> val test2: InVar[AnyRef] = new InVar[String]
<console>:5: error: type mismatch;
 found   : InVar[String]
 required: InVar[AnyRef]
       val test2: InVar[AnyRef] = new InVar[String]
                                   ^

scala> /************ Covariant Subtyping ************/

scala> val test3: CoVar[String] = new CoVar[AnyRef]
<console>:5: error: type mismatch;
 found   : CoVar[AnyRef]
 required: CoVar[String]
       val test3: CoVar[String] = new CoVar[AnyRef]
                                  ^

scala> val test4: CoVar[AnyRef] = new CoVar[String]
test4: CoVar[AnyRef] = CoVar

scala> /************ Contravariant Subtyping ************/

scala> val test5: ContraVar[String] = new ContraVar[AnyRef]
test5: ContraVar[String] = ContraVar

scala> val test6: ContraVar[AnyRef] = new ContraVar[String]
<console>:5: error: type mismatch;
 found   : ContraVar[String]
 required: ContraVar[AnyRef]
       val test6: ContraVar[AnyRef] = new ContraVar[String]
                                      ^

Now you can see the difference in the three classes. The invariant class doesn’t allow assignment in either direction, regardless of whether their type parameters have a subtype relationship. The covariant class allows an assignment from subtype to supertype. String is a subtype of AnyRef, so CoVar[String] is a subtype of CoVar[AnyRef]. The contravariant class allows an assignment from supertype to subtype. String, again, is a subtype of AnyRef, so ContraVar[AnyRef] is a subtype of ContraVar[String].

So, it’s as simple as that, right? Sorry. There’s a little more to it. Once you’ve declared a type parameter as covariant or contravariant there are some restrictions on where this type can be used. Why? Scala is not a purely functional langage in that it allows objects to alter their internal state. It allows mutability. Mutability throws a monkey wrench into variance. Say, for example you had a Scala implementation of a linked list like so:

class LinkedList[+A] {
  private var next: LinkedList[A] = null
  def add(item: A): Unit = { ... }
  def get(index: Int): A = { ... }
}

val strList = new LinkedList[String]
strList.add("str1")
val anyList: LinkedList[Any] = strList
anyList.add(new Double(1.0))
val str: String = strList.get(1)

This code won’t compile. Do you see the problem? This code, if it worked, would allow us to create a list of Strings and then add a Double to that list! If we allow that then there’s a disaster when we get to the last line. A LinkedList of Strings returns a Double. That’s no good. That’s the same problem as we saw in Java. Scala nips this sort of code in the bud by disallowing covariant types in certain places including member function parameter types. Places where covariant types are allowed and contravariant types are forbidden are called covariant positions. And the reverse is true, too. If contravariant types are allowed and covariant types are forbidden in some position, this is called a contravariant position.

The above code causes a compiler error for the add method. You can use covariant types in most other places including constructor parameter types, member val types, and method return types. You can also use covariant types as type parameters, but only where covariant types themselves are allowed. This means, in this example, you could add a method that returns a Set[A], but you could not add a method that takes a Set[A] as a parameter, because you could use that passed-in Set[A] to alter the state.

Contravariant types can be used as constructor parameter types, member function parameters types, and as type parameters in each of those positions.

Here’s a fun exercise for the reader. Experiment with using type parameters of the different kinds of variances in different positions. Below is some example code to get you started. For each usage that fails compilation, why is it not allowed? Can you think of a way that such a usage could cause an inconsistency (such as allowing a Double in a String collection, for example)?

class InVar[T](param1: T) {
  def method1(param2: T) = { }
  def method2: T = { param1 }
  def method3: List[T] = { List[T](param1) }
  def method4[U >: T]: List[U] = { List[U](param1) }
  val val1: T = method2
  val val2: Any = param1
  var var1: T = method2
  var var2: Any = param1
}

class CoVar[+T](param1: T) {
  def method1(param2: T) = { }
  def method2: T = { param1 }
  def method3: List[T] = { List[T](param1) }
  def method4[U >: T]: List[U] = { List[U](param1) }
  val val1: T = method2
  val val2: Any = param1
  var var1: T = method2
  var var2: Any = param1
}

class ContraVar[-T](param1: T) {
  def method1(param2: T) = { }
  def method2: T = { param1 }
  def method3: List[T] = { List[T](param1) }
  def method4[U >: T]: List[U] = { List[U](param1) }
  val val1: T = method2
  val val2: Any = param1
  var var1: T = method2
  var var2: Any = param1
}

Maybe in a future post, I’ll do a more thorough analysis of all the places type parameters of the different variances can be used. If you’d be interested in reading such a thing, please do leave a comment.

Conclusion

What was the point of all that? We still don’t get the mutable, covariant collections we were hoping for. But we do get two things we don’t get in Java. We get invariant, typesafe, mutable collections that won’t wind up holding objects of the wrong type, and we get covariant, typesafe, immutable collections. If you’re new to functional programming (like I am), your first thought might be, “What good is an immutable list? Or an immutable array? If it’s immutable, I can’t add any items to it, right?”

Take the Scala List as an example. It’s declared as a “class List[+A]” and it has a method for adding new items that’s declared “def + [B >: A](x : B) : List[B]“. So List is covariant with respect to its type parameter A. That’s great! So a List[BigInt] “is-a” List[Number] and it “is-a” List[Object]. To add items to a List use the ‘+’ function. It doesn’t change the List it was called on, but it does return a new List.

Also, you can add anything to a List. What you add affects what gets returned. Look at the function definition again: “def + [B >: A](x : B) : List[B]“. It take a parameter of type B, where B is any supertype of A (or A itself), and it returns a List[B]. Let’s consider the simple case, and then something more complex.

scala> var strList = List[String]("abc")
strList: List[String] = List(abc)

scala> strList = strList + "xyz"
strList: List[String] = List(abc, xyz)

scala> var objList = strList + new Object()
objList: List[java.lang.Object] = List(abc, xyz, java.lang.Object@156ee8e)

scala> var anyList = strList + 3.1416
anyList: List[Any] = List(abc, xyz, 3.1416)

First we create a List[String] called strList containing one item. Then we add a second String and store the resulting List back in the strList variable. Then we call the ‘+’ function with an Object parameter. This is allowed because the parameter must have type B where B is a supertype of A, and Object is indeed a superclass of String. The call to ‘+’ returns a List[Object] which contains both the Strings and the Object. Simple.

Then we call the ‘+’ method on testList again. Remember, strList still just contains the two Strings because it’s immutable and we didn’t assign the last result back to strList. We couldn’t have. The result was a List[Object], not a List[String], and List is covariant, not contravariant. This time we supply a parameter of type Double. But Double isn’t even a supertype of String! That’s ok. Scala determines the nearest common ancestor, the type Any, and uses that as B. The ‘+’ function returms a List[Any] that contains the Strings and the Double. That’s handy. And covariant and typesafe.