Archive for August, 2009

In Scala, as in Java, C and many other languages, identifiers may contain a mix of lower and upper case characters. These identifiers are treated in a case sensitive manner. For example “index”, “Index” and “INDEX” would be treated as three separate identifiers. You can define all three in the same scope. That goes for Scala, Java, and most if not all descendants of the C language. In most of these languages, although case is significant in distinguishing identifiers, and although various capitalization schemes are used by convention, case does not alter functionality. Whether you name a variable “index”, “Index” or “INDEX”, as long as you don’t hide an identifier from an enclosing scope, the code will function in exactly the same way.

Scala, though, diverges slightly from this tradition. Here’s an example. Say we have a Pair of two Ints. Say we also have two plain Int values and we want to know whether those two Ints are equal to the values inside the Pair. In the case that they do match, we also want to know in which order they appear in the Pair.

Operations on tuples (such as a Pair) can often be implemented neatly by pattern matching. Here’s one solution to this problem:

def matchPair(x: (Int,Int), A: Int, b: Int): String = 
x match {
  case (A, b) => "Matches (A, b)"
  case (b, A) => "Matches (b, A)"
  case _      => "Matches neither"
}

This is completely unsurprising code except for one little detail. One of the function parameters is upper case while the other two are lower case. Other than that, there’s nothing unusual so far. So let’s try out this code in the Scala interpreter.

scala> def matchPair(x: (Int,Int), A: Int, b: Int): String = 
     | x match {
     |   case (A, b) => "Matches (A, b)"
     |   case (b, A) => "Matches (b, A)"
     |   case _      => "Matches neither"
     | }
matchPair: ((Int, Int),Int,Int)String

scala> val pair = (5, 10)
pair: (Int, Int) = (5,10)

scala> matchPair( pair,  5, 10 )
res1: String = Matches (A, b)

scala> matchPair( pair, 10,  5 )
res2: String = Matches (b, A)

scala> matchPair( pair, 99, 99 )
res3: String = Matches neither

So far so good! It returns the expected value when the values match in order, in reverse order, and when neither value matches. Is this sufficient unit testing? What other tests would you run?

As you may well guess, no, this isn’t sufficient unit testing. Let’s try the case where one Int matches but not the other:

scala> matchPair( pair,  5, 99 )
res4: String = Matches (A, b)

scala> matchPair( pair, 99, 10 )
res5: String = Matches (b, A)

That didn’t work right. Is the matchPair function telling us that ‘pair’ (which is (5, 10)) matches (5, 99) or (99, 10)? That’s what it looks like, but no. Scala does something a little bit surprising here. Do you know why?

As I said before, you can have variables and constants in Scala with upper or lower case names. Both are legal, just as they are in Java. But Scala makes some distinctions that Java doesn’t. Within a pattern (the part between ‘case’ and ‘=>’) Scala treats simple lower case identifiers differently. It uses them as new variables into which matched data is stored, but this is not the case for identifiers that begin with an upper case letter!

If you want to capture results of a pattern match in Scala you must use a lower case identifier, and that identifier will hide any identifiers with the same name from an enclosing scope. So in our example function, “case (A, b)” matches a Pair. The first element of the pattern is “A”, which starts with an upper case letter, so pattern matching results can’t be stored in it. It is used the way we intended, i.e. the pattern is matched if x._1 equals A.

The “b” in “case (A, b)”, though, begins with a lower case letter so it is assigned the value of x._2 (assuming x._1 equals A). It is as if you had typed “val b = x._2” in the function body. Within the case line, the “b” from the pattern hides the parameter named “b”.
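To make the hiding visible, here is a minimal sketch that reports what the pattern variable was bound to. The names PatternBinding and describe are made up for this illustration, and the sketch assumes a reasonably recent Scala compiler.

```scala
object PatternBinding {
  // Same shape as the first matchPair, but the result reports
  // what the lower case pattern variable was bound to.
  def describe(x: (Int, Int), A: Int, b: Int): String =
    x match {
      // A is compared with x._1; b is a new variable bound to x._2,
      // hiding the parameter b inside this case.
      case (A, b) => "first is A, pattern b = " + b
      case _      => "first is not A"
    }

  def main(args: Array[String]): Unit = {
    // The parameter b (99) is never consulted by the pattern.
    println(describe((5, 10), 5, 99))
    println(describe((5, 10), 7, 10))
  }
}
```

Whatever value is passed for the parameter b, the first case reports the second element of the pair, because that is what the pattern variable holds.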

So how can we make this function work the way we want? Here’s one way:

def matchPair(x: (Int,Int), A: Int, B: Int): String = 
x match {
  case (A, B) => "Matches (A, B)"
  case (B, A) => "Matches (B, A)"
  case _      => "Matches neither"
}

Now both the Int parameters start with an upper case letter and are therefore tested against x._1 and x._2. This code passes our tests. Note that the code behaves differently simply based on the parameter names we choose. There’s another way to prevent Scala from using the identifiers for storing pattern results.

def matchPair(x: (Int,Int), a: Int, b: Int): String = 
x match {
  case (`a`, `b`) => "Matches (a, b)"
  case (`b`, `a`) => "Matches (b, a)"
  case _          => "Matches neither"
}

You can use the more traditional lower case parameter names if you quote them using the backquote character. That’s the key to the left of the “1” on my keyboard. This matchPair is equivalent to the one that used capital “A” and “B”.
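As a quick check, the five inputs tried earlier can be run against the backquoted version in one go. This is only a sketch: the object name MatchPairCheck and the helper failures are invented for the illustration, and the expected strings are the results we wanted rather than output copied from an interpreter.

```scala
object MatchPairCheck {
  // The backquoted version of matchPair from above.
  def matchPair(x: (Int, Int), a: Int, b: Int): String =
    x match {
      case (`a`, `b`) => "Matches (a, b)"
      case (`b`, `a`) => "Matches (b, a)"
      case _          => "Matches neither"
    }

  // The five argument pairs tried earlier, with the result we want for each.
  val cases: List[((Int, Int), String)] = List(
    ((5, 10), "Matches (a, b)"),
    ((10, 5), "Matches (b, a)"),
    ((99, 99), "Matches neither"),
    ((5, 99), "Matches neither"),
    ((99, 10), "Matches neither")
  )

  // Returns the argument pairs for which matchPair gives an unwanted result.
  def failures: List[(Int, Int)] = {
    val pair = (5, 10)
    cases
      .filter { case ((a, b), expected) => matchPair(pair, a, b) != expected }
      .map(_._1)
  }

  def main(args: Array[String]): Unit =
    println(if (failures.isEmpty) "all cases pass" else "failing: " + failures)
}
```

Note that the lower case a and b inside the filter are deliberately plain pattern variables: there we do want to capture the tuple elements.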

Another quick example:

scala> val pair = (5, 10)
pair: (Int, Int) = (5,10)

scala> val (a,b) = pair
a: Int = 5
b: Int = 10

scala> val (X,Y) = pair
<console>:5: error: not found: value X
       val (X,Y) = pair
            ^
<console>:5: error: not found: value Y
       val (X,Y) = pair
              ^

You know that you can use the construct from the second command above (“val (a,b) = pair”) to declare and initialize multiple vals or vars using a tuple, right? And you know how that magic is done? Patterns, so the same principle applies here. The capitalized identifiers “X” and “Y” are taken to refer to existing identifiers because they can’t be used to store pattern match results. Since no such identifiers had been defined, you get an error.

If we define these values beforehand then Scala tries to match their values:

scala> val pair = (5,10)
pair: (Int, Int) = (5,10)

scala> val I = 5
I: Int = 5

scala> val J = 10
J: Int = 10

scala> val (I,q) = pair
q: Int = 10

scala> val (J,r) = pair
scala.MatchError: (5,10)
        at .<init>(<console>:6)
        at .<clinit>(<console>)
        at RequestResult$.<init>(<console>:3)
        at RequestResult$.<clinit>(<console>)
        at RequestResult$result(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(...

scala> val (I,J) = pair

In “val (I,q) = pair”, the value I has already been declared and initialized to 5, so it matches the first part of the Pair. The identifier q becomes a new val initialized to the value in the second part of the Pair.

In “val (J,r) = pair”, we do the same thing but we try to match J to the first part of the Pair. This won’t work since pair._1 is 5 and J is 10. A MatchError is thrown.

In the last command, “val (I,J) = pair”, we use both of the capitalized identifiers. They match, but there are no lower case identifiers to make new values out of, so the line does nothing except to confirm (by not throwing an Error) that I equals pair._1 and J equals pair._2.
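Since a val definition with a pattern on the left behaves roughly like a match with a single case, the same checks can be written as explicit matches. The sketch below does that and wraps each match in scala.util.Try so the MatchError can be inspected instead of aborting. The object and method names are invented for the illustration.

```scala
import scala.util.Try

object StableIdentifierVals {
  val pair = (5, 10)
  val I = 5
  val J = 10

  // Roughly what "val (I, q) = pair" does: I is compared, q is bound.
  def bindAfterI: Try[Int] = Try(pair match { case (I, q) => q })

  // Roughly what "val (J, r) = pair" does: J (10) differs from pair._1 (5),
  // so no case applies and a MatchError is thrown.
  def bindAfterJ: Try[Int] = Try(pair match { case (J, r) => r })

  def main(args: Array[String]): Unit = {
    println(bindAfterI)
    println(bindAfterJ.isFailure)
  }
}
```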

Now you know how to match when you want to match and assign results when you want to assign results. I hope that being able to match against already-defined identifiers will make your matching code more powerful.

I noticed some strange behavior in some Scala code recently. It was rather a mystery. I looked for my error and googled for a solution for the longest time with no success. Eventually I got my answer from the Scala mailing list / Nabble forum. Here’s the class that was causing the trouble.

class ArrayWrapper[A](length: Int) {
  private val array = new Array[A](length)
  def apply(x: Int) = array(x)
  def update(x: Int, value: A) = array(x) = value
  override def toString(): String = array.toString
}

The first thing you’ll notice about this class is that it is extremely simple! There aren’t a lot of moving parts. It’s a simple wrapper that exposes 3 basic array behaviors: apply (a ‘getter’), update (a ‘putter’), and good ol’ toString. Arrays in Scala take a type parameter, and to ensure that this class could wrap an array of any type I used a type parameter, too. Have a good look at the class and make sure you understand how it works. It won’t take long.

How do you expect this class to behave? Let’s play a little fill-in-the-blanks. Here is a Scala interpreter session with some results blanked out.

scala> class ArrayWrapper[A](length: Int) {
    |   private val array = new Array[A](length)
    |   def apply(x: Int) = array(x)
    |   def update(x: Int, value: A) = array(x) = value
    |   override def toString(): String = array.toString
    | }
defined class ArrayWrapper

scala> val a = new ArrayWrapper[Int](5)
??????????????

scala> val x = a(0)
??????????????

scala> x.toString
??????????????

scala> a(0).toString
??????????????

scala> a(0) = 0

scala> a.toString
??????????????

scala> a(0).toString
??????????????

scala>

There are 6 blanks. What do you expect to see in each of those? Well, the first blank follows the creation of a new ArrayWrapper[Int] and its assignment to a val ‘a’. So, according to our overriding definition of toString, it is simply the result of the underlying Array’s toString method. I know from experience how a brand new Array of Ints looks. It looks like this:

scala> new Array[Int](5)
res1: Array[Int] = Array(0, 0, 0, 0, 0)

So that’s what I expect to see here. Anyone expect something different? Here’s what I actually saw in the first blank:

scala> val a = new ArrayWrapper[Int](5)
a: ArrayWrapper[Int] = Array(null, null, null, null, null)

Hmm. That’s not what I expected. Did you predict this? Why is this array full of nulls when a new Array[Int] is usually full of zeros? I was stumped. The array is parameterized, I reasoned, so maybe type erasure was involved. That doesn’t make sense, though. No types should be erased at this point.

Let’s look at the next two commands, “val x = a(0)” and “x.toString”. I called a(0) (the apply method) and assigned the result to x. I then called the toString method on x. What do you expect in these two lines? I would have expected a(0) to return 0 and x.toString to return “0”, but my conviction is shaken by that last result. Will a(0) return null? Will x.toString throw a NullPointerException? Decide what you predict will happen. Here’s the actual result:

scala> val x = a(0)
x: Int = 0

scala> x.toString
res0: java.lang.String = 0

Each line behaves in the “correct” way even though we saw all those nulls in the underlying array. That’s good news, I suppose. Maybe the problem is limited to Array’s toString method. It should be smooth sailing now. Let’s now look at the next command, in which we call a(0).toString. It’s just combining the operations (apply and toString) from the previous two lines without storing the intermediate result in ‘x’. I expected that to return String “0”. You can probably guess by now that what I expected is not what I got. Make your own prediction before you read the actual result below. What will happen when we call a(0).toString?

scala> a(0).toString
java.lang.NullPointerException
       at .<init>(<console>:7)
       at .<clinit>(<console>)
       at RequestResult$.<init>(<console>:3)
       at RequestResult$.<clinit>(<console>)
       at RequestResult$result(<console>)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.i...

Ouch! NullPointerException! This is an unpleasant surprise. The call a(0) returned a zero earlier, and calling toString on that zero returned a String “0”. But now we get this disaster. I’m getting more and more confused. Do you have an explanation for this crazy behavior yet?

Moving along, the next command, “a(0) = 0”, assigns 0 to a(0). Remember that a(0) returned 0 earlier. By the way, behind the scenes the line “a(0) = 0” doesn’t call the apply method, but the ‘update’ method. It succeeds. After that we call a.toString and a(0).toString. What will happen in each case? At this point, it’s anybody’s guess. The behavior has been so wacky I can’t even make a sensible prediction. Make a guess of your own, if you dare, and observe the actual result below:

scala> a(0) = 0

scala> a.toString
res3: String = Array(0, null, null, null, null)

scala> a(0).toString
res4: java.lang.String = 0

The underlying Array now appears to contain a zero in addition to the nulls. Also, the a(0).toString, which was throwing a NullPointerException earlier, is now succeeding.
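As an aside on the remark that “a(0) = 0” calls ‘update’ rather than ‘apply’: the sketch below makes that rewriting visible by recording which method was invoked. Recorder and UpdateSugar are hypothetical names used only for this illustration.

```scala
object UpdateSugar {
  // Records each call so we can see which method the compiler picked.
  class Recorder {
    var calls: List[String] = Nil
    def apply(i: Int): Int = { calls = ("apply(" + i + ")") :: calls; 0 }
    def update(i: Int, v: Int): Unit = { calls = ("update(" + i + ", " + v + ")") :: calls }
  }

  def demo(): List[String] = {
    val r = new Recorder
    r(0) = 42        // assignment syntax: rewritten to r.update(0, 42)
    val read = r(0)  // plain application: rewritten to r.apply(0)
    r.calls.reverse  // calls in the order they happened
  }

  def main(args: Array[String]): Unit = println(demo())
}
```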

As I say, I puzzled over this problem for some time. I wanted to blame the issue on type erasure in the parameterized type, but that explanation didn’t make sense. I posted a question to the Scala forum on Nabble and got a response back in short order from Daniel Sobral.

The culprit? Drumroll…

Boxing. Well, boxing, unboxing, and a peculiarity of parameterized types. To review, here is our ArrayWrapper class:

class ArrayWrapper[A](length: Int) {
  private val array = new Array[A](length)
  def apply(x: Int) = array(x)
  def update(x: Int, value: A) = array(x) = value
  override def toString(): String = array.toString
}

We declared ‘array’ to be an Array[A], which is to say an Array of who-knows-what. When the Array is defined in this way, with a type parameter of unknown type, the Array must be an array of object references! It cannot be an array of Java int primitives. That’s the peculiarity of parameterized types. That’s why the default values for the members of the array were null instead of 0. The underlying array is actually an array of java.lang.Integer objects.

When we ran ‘val x = a(0)’, Scala retrieved the value at index 0, which was null. The apply method has Int return type in our example, and null is not a legal value of an Int. Int is Scala’s version of the Java int primitive type. So the null was converted (unboxed) to Int value 0. Then it could be stored in val x, etc. Once it’s safely unboxed, it behaves like a normal Scala Int value.

So, why did a(0).toString not work? Shouldn’t the null returned from a(0) be unboxed to Int 0, then re-boxed for the toString call? Apparently it doesn’t work that way. The unboxing hasn’t happened at the time the toString call is executed, so that toString is called on the null, giving us the NullPointerException. I don’t know whether this behavior is imposed by the JVM or the Scala language. Either way, it seems to me like a violation of the Principle of Least Astonishment and an opportunity for improvement.

Once we call a(0) = 0, then the underlying array is populated with a boxed version of 0, which is to say an instance of java.lang.Integer. After it’s populated with a non-null it works normally.
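The ingredients of this explanation can be observed separately, without the wrapper class. The sketch below is not a reproduction of the original session, only an illustration under the assumption of a current Scala compiler on the JVM: an array of references starts out full of nulls, casting a null reference to Int yields 0, and calling toString on the null reference itself throws. The name NullUnboxing is made up.

```scala
import scala.util.Try

object NullUnboxing {
  // An array of object references starts out full of nulls.
  val boxed = new Array[java.lang.Integer](3)

  // Casting a null reference to Int goes through Scala's unboxing, which yields 0.
  val unboxed: Int = null.asInstanceOf[Int]

  // Calling a method on the null reference itself fails with a NullPointerException.
  val onNull: Try[String] = Try(boxed(0).toString)

  def main(args: Array[String]): Unit = {
    println(boxed(0) == null)
    println(unboxed)
    println(onNull.isFailure)
  }
}
```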

Again, this only happens for Arrays with parameterized types. If we make ArrayWrapper non-parameterized and declare ‘array’ as an Array[Int] then the problem goes away.

scala> class ArrayWrapper(length: Int) {
     |   private val array = new Array[Int](length)
     |   def apply(x: Int) = array(x)
     |   def update(x: Int, value: Int) = array(x) = value
     |   override def toString(): String = array.toString
     | }
defined class ArrayWrapper

scala> val a = new ArrayWrapper(5)
a: ArrayWrapper = Array(0, 0, 0, 0, 0)

scala> a(0).toString
res0: java.lang.String = 0

There are a few lessons in all this for the Scala developer:

  • Be vigilant about Array initialization. Initialize them explicitly, especially when dealing with primitives like Int, Long, Float, Double, Byte and Char. Don’t trust the default values.
  • Beware parameterized Arrays. They are flawed. Consider specifying their type or using another collection instead, such as a List or Map which can’t contain un-initialized values.
  • Unit test all your code, even those parts that look too simple to screw up.
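For readers on a later Scala release, here is a hedged sketch of a parameterized wrapper that avoids the problem. It assumes Scala 2.10 or newer, where scala.reflect.ClassTag is available and creating an Array[A] for an unknown A requires one, so the element type is known when the array is built. The class name TaggedArrayWrapper is invented to keep it apart from the ArrayWrapper above, and toString uses mkString because Array’s own toString may print differently on such versions.

```scala
import scala.reflect.ClassTag

// The ClassTag context bound carries the element type to runtime, so
// new Array[A] can build a primitive array when A is Int.
class TaggedArrayWrapper[A: ClassTag](length: Int) {
  private val array = new Array[A](length)
  def apply(x: Int): A = array(x)
  def update(x: Int, value: A): Unit = array(x) = value
  override def toString: String = array.mkString("Array(", ", ", ")")
}

object TaggedArrayWrapperDemo {
  def main(args: Array[String]): Unit = {
    val a = new TaggedArrayWrapper[Int](5)
    println(a)              // elements default to 0, not null
    println(a(0).toString)  // no NullPointerException
  }
}
```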