The way Scala deals with functions is pretty interesting. If you want to use them as you would use Java functions then they’re not that complicated. You have to learn the syntax, a little about the Scala type system, and bada-bing, you’re in business. But if you start exploring the way functions are implemented in Scala you find some interesting stuff.

First, we’ll briefly describe the basics.  Here are a few very simple Scala function definitions and invocations in the scala interpreter:

scala> def method1() = { println("method1") }
method1: ()Unit

scala> def method2(str: String) = { println("method2: " + str) }
method2: (String)Unit

scala> def method3(str: String): Int = {
     |   println("method3: " + str); str.length;
     | }
method3: (String)Int

scala> def method4(f: (String) => Int) = {
     |   printf("method4: " + f("method4"))
     | }
method4: ((String) => Int)Unit

scala> method1
method1

scala> method2("abc")
method2: abc

scala> method3("abcdefg")
method3: abcdefg
res13: Int = 7

scala> method4(method3)
method3: method4
method4: 7

Very basic:  method1 takes no parameters and returns nothing, method2 takes a single parameter of type String and returns nothing, method3 takes a String parameter and returns an Int, and method4 takes a parameter of type “function that takes a String parameter and returns Int” and returns nothing.

Why are we able to declare functions like this in Scala?  Didn’t I read somewhere that Scala is very object oriented?  Didn’t I read that everything is an object?  Why do we have these bare naked functions defined outside of objects?  The reason is that in Scala, functions really are objects!  Strictly speaking, something defined with “def” is a method rather than an object (the interpreter quietly tucks it inside a wrapper object), but whenever a function value is needed Scala wraps the method in an instance of a special class.  That method4(method3) call above?  method3 was turned into an object on its way in.  I’ll declare method1 again, but this time directly as one of those objects:

scala> val method1 = new Function0[Unit] {
     |   def apply(): Unit = { println("method1") }
     | }
method1: java.lang.Object with () => Unit = <function>

scala> method1
res1: java.lang.Object with () => Unit = <function>

scala> method1.apply
method1

scala> method1()
method1

We instantiate an instance of trait Function0[Unit] and implement its one abstract method, called apply, and assign it to a val named method1.  Now you can see method1 is actually just a plain old Scala object.  When we type in “method1” and hit enter, the interpreter just tells us the resulting value of the statement, which is an Object with trait Function0.  Hmm, that didn’t call anything.  Next we try calling the apply method on the object.  That works!  But it’s just a regular call to a member method.  But when we type “method1()” then Scala knows that we want to use this object as a function, and that we’re not referring to the object itself, so it calls apply for us.  A method declared with “def” behaves the other way around: when you refer to it by name, Scala assumes you want it invoked, not handed back as a function object.  Neat.

That Function0[Unit], by the way, defines a function that takes 0 parameters and returns Unit (which is to say nothing as in Java void (not to be confused with Nothing)).  If you want a function that takes two parameters, an Int and a String, and returns a List of Doubles, you would use Function2[Int, String, List[Double]].  So class FunctionX takes (X+1) type parameters, the first X of which define the function parameter types, and the last of which defines the return type.
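Here's a small sketch of that naming scheme. The object and value names are made up for illustration. It shows that the long-hand Function2[...] name and the arrow syntax describe the same type, and that calling with parentheses is just a call to apply.

```scala
object FunctionTypesDemo {
  // Long-hand: Function2 takes two parameter types plus one return type.
  val lengths: Function2[Int, String, List[Double]] =
    (n: Int, s: String) => List.fill(n)(s.length.toDouble)

  // Short-hand: the arrow syntax names exactly the same type,
  // so this assignment needs no conversion.
  val sameFunction: (Int, String) => List[Double] = lengths

  // Calling with parentheses is sugar for calling apply.
  val viaParens: List[Double] = sameFunction(2, "abc")
  val viaApply: List[Double]  = sameFunction.apply(2, "abc")
}
```

Both viaParens and viaApply end up holding the same list, because they are the same call written two ways.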

So what if we go the other way?  What if we declare a method and then store it in a val?  In this case, Scala gets very picky.  Watch this:

scala> def method2 = { println("method2") }
method2: Unit

scala> val m2: () => Unit = method2
<console>:5: error: type mismatch;
 found   : Unit
 required: () => Unit
       val m2: () => Unit = method2
                            ^

scala> def method2() = { println("method2") }
method2: ()Unit

scala> val m2: () => Unit = method2
m2: () => Unit = <function>

scala> def method2 = { println("method2") }
method2: Unit

scala> val m2: () => Unit = method2 _
m2: () => Unit = <function>

Some strange stuff happens here.  First we just define a method called method2, with no parameter list.  Nothing fancy.  Then we try to assign it to a val of type () => Unit.  It fails.  See the error message?  Found : Unit.  Scala isn’t reading it the way we meant it.  It thinks we’re trying to call method2 and assign the result to m2.  How can we set things straight?  Well, one way is to slightly change the way we define method2.  The only difference between the first and second definitions is the addition of an empty parameter list, that empty pair of parentheses.  When we define the method in this apparently equivalent fashion, a bare “method2” where a () => Unit is expected gets converted to a function object instead of being called, so Scala allows us to assign to m2.  There is another way, though.  In the third definition of method2, we’ve again removed the parentheses.  But this time we assign it successfully to val m2 by following method2 with an underscore.  The underscore just causes Scala to treat method2 as a Function0 object, rather than attempting to invoke it.
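Here is a compact sketch of the ways to turn a method into a function object. The names are invented for the example, and I'm assuming a Scala 2 compiler like the one in the transcripts. Newer versions may complain about the trailing underscore, in which case the lambda form is the safe bet.

```scala
object EtaExpansionDemo {
  def greet(): String = "hello"
  def shout(s: String): String = s.toUpperCase

  // Trailing underscore: explicitly ask for the function object.
  val greetFn: () => String = greet _
  val shoutFn: String => String = shout _

  // A lambda that calls the method makes the wrapping visible.
  val greetLambda: () => String = () => greet()

  // Once they are objects, functions can be stored and passed around.
  val pipeline: List[String => String] = List(shoutFn, (s: String) => s + "!")
  def runAll(input: String): String =
    pipeline.foldLeft(input)((acc, f) => f(acc))
}
```

Calling EtaExpansionDemo.runAll("hi") pushes the string through both stored functions in order.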

So a function is an object.  Who cares?  We’re still just calling functions.  Ah, but use your imagination.  You can do all kinds of tricks once you realize that a function is just an object.  For example:

scala> class TestClass {
     |   def f1(): Unit = { println("f1!!!"); func = f2 }
     |   def f2(): Unit = { println("f2!!!"); func = f3 }
     |   def f3(): Unit = { println("f3!!!"); func = f1 }
     |
     |   var func: () => Unit = f1
     |
     |   def test = { func() }
     | }
defined class TestClass

scala> val tc = new TestClass
tc: TestClass = TestClass@1eff71e

scala> tc.test
f1!!!

scala> tc.test
f2!!!

scala> tc.test
f3!!!

scala> tc.test
f1!!!

See what’s happening here?  We can store a reference to a function object, call the function it refers to, and re-assign it.  So the method “test” actually calls a different function each time.

Can you guess why I added the test method instead of just calling func directly?  func is declared with the var keyword, so if I entered “tc.func” instead of “tc.func()” then the interpreter would think I was referring to the function object.  Just so there’s no confusion, I wrapped the call “func()” inside a regular def-defined method called test.

Let’s see, what other neat tricks can we do?  Here’s something interesting:

scala> def printAll(str1: String, str2: String, str3: String): Unit = {
     |   println( str1 + ":" + str2 + ":" + str3 )
     | }
printAll: (String,String,String)Unit

scala> def fillInStr1(func: (String,String,String) => Unit, str1: String): (String,String) => Unit = {
     |   new Function2[String,String,Unit] {
     |     def apply(str2: String, str3: String) = {
     |       func(str1, str2, str3)
     |     }
     |   }
     | }
fillInStr1: ((String, String, String) => Unit,String)(String, String) => Unit

scala> val newPrint = fillInStr1(printAll _, "test123")
newPrint: (String, String) => Unit = <function>

scala> newPrint("abc","xyz")
test123:abc:xyz

scala> newPrint("123","456")
test123:123:456

First, we define printAll.  It’s just a function that prints out its 3 string parameters.  The next method, fillInStr1, is the interesting one.  The method signature is kind of complex.  It takes 2 parameters, func and str1.  func is a function taking 3 String parameters and returning nothing.  str1 is just a String.  fillInStr1 returns a function taking 2 String parameters and returning nothing.

Inside fillInStr1, it just creates an instance of Function2, a function that takes 2 parameters.  This function object is defined so that the apply method calls the func function and passes str1 as the first parameter.  The other two parameters are the parameters of the Function2’s apply method.  Do you see what it’s doing?  It’s taking a function on 3 strings, and transforming it into a function on only 2 strings.  We can call fillInStr1 by passing in printAll (note the underscore), and a string.  What we get back is a function that behaves just like printAll, except with the first parameter already filled in.  Neat trick!

In fact, this trick is so neat that it has a name and is actually built into the language.  This little demonstration is a very simple, non-generalized example of partial application, a close cousin of currying.  The Code Commit blog has an excellent article on function currying in Scala if you’d like to know more about it.
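For comparison, here is roughly what the built-in versions look like. This is a sketch with made-up names, and join is a stand-in for printAll that returns the string instead of printing it so the result is easy to check. The underscore placeholders do the same job as fillInStr1, and curried turns a three-argument function into a chain of one-argument functions.

```scala
object PartialApplicationDemo {
  def join(a: String, b: String, c: String): String = a + ":" + b + ":" + c

  // Partial application: fix the first argument, leave two holes.
  val withFirst: (String, String) => String = join("test123", _, _)

  // Currying: one argument at a time, each call returning a new function.
  val curried: String => String => String => String = (join _).curried

  // Supplying only the first argument of the curried form
  // gives back a function that still wants the other two.
  val prefixed: String => String => String = curried("test123")
}
```

So withFirst("abc", "xyz") and prefixed("abc")("xyz") build the same string, just with different calling shapes.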

This, of course, isn’t all there is to functions.  There’s a lot more!  But now you know enough to go out there and start experimenting.  See what tricks you can do, what problems you can solve with Scala’s versatile and powerful function objects.

I like to include a copyright notice on my posts, a small one down on the lower right, and a couple of links for RSS and Twitter.  The problem is that I always go back and add it as an afterthought.  So I write a post, proof it, convince myself that it’s perfect, post it, and then I notice that I didn’t put that little footer at the end.

I use wordpress.com to host my site.  I looked around the admin console for something pertaining to post templates but didn’t find anything.  I googled a little to see whether such a feature exists, but I didn’t find anything.

Then it occurred to me.  Matt Malone, you handsome devil, aren’t you a professional software developer?  And don’t you have the gall to write a blog proclaiming yourself such?  Why don’t you write something yourself?  So I did.  I wrote a very simple greasemonkey script.  Here it is below.  Feel free to install the script if you use wordpress.com and think it would be useful.

// ==UserScript==
// @name           WordPress Post Template
// @namespace      oldfashionedsoftware.com
// @description    Inserts some template text in new blog posts
// @include        http://matthewmalone.wordpress.com/wp-admin/post-new.php
// ==/UserScript==
var postTextAreaList, postTextArea;
postTextAreaList = document.evaluate( "//textarea[@name='content']",
    document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
postTextArea = postTextAreaList.snapshotItem(0);
postTextArea.value = "Your template text here";

One of the main complaints you hear about the Scala language is that it’s too complicated compared to Java. The average developer will never be able to achieve a sufficient understanding of the type system, the functional programming idioms, etc. That’s the argument. To support this position, you’ll often hear it pointed out that Scala includes several notions of nothingness (Null, null, Nil, Nothing, None, and Unit) and that you have to know which one to use in each situation. I’ve read an argument like this more than once.

It’s not as bad as all that. Yes, each of those things is part of Scala, and yes, you have to use the right one in the right situation. But the situations are so wildly different it’s not hard to figure out once you know what each of these things means.

Null and null

First, let’s tackle Null and null. Null is a trait, which (if you’re not familiar with traits) is sort of like an abstract class in Java. There exists exactly one instance of Null, and that is null. Not so hard. The literal null serves the same purpose as it does in Java. It is the value of a reference that is not referring to any object. So if you write a method that takes a parameter of type Null, you can only pass in two things: null itself or a reference of type Null. Observe:

scala> def tryit(thing: Null): Unit = { println("That worked!"); }
tryit: (Null)Unit

scala> tryit("hey")
<console>:6: error: type mismatch;
 found   : java.lang.String("hey")
 required: Null
       tryit("hey")
             ^

scala> val someRef: String = null
someRef: String = null

scala> tryit(someRef)
<console>:7: error: type mismatch;
 found   : String
 required: Null
       tryit(someRef)
             ^

scala> tryit(null)
That worked!

scala> val nullRef: Null = null
nullRef: Null = null

scala> tryit(nullRef)
That worked!

In the first call we try to pass in a String, and of course that doesn’t work. Then we try to pass in someRef, which is a null reference, but that doesn’t work either! Why? It’s a null reference to a String. It may be null at run-time, but compile-time type checking says this is a no-no.

But look at the tryit(null) call. We can pass in the literal null. And in the last call we pass in another null reference, nullRef, but this one is actually of type Null. Notice that we initialized nullRef to null. That’s the only value to which we could have initialized it, because null is the sole instance of Null.
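The flip side is that null can be assigned to any reference type, because Null sits underneath all of them in the type hierarchy. Here is a minimal sketch with invented names, assuming default compiler settings.

```scala
object NullDemo {
  // null is accepted wherever a reference type is expected.
  val noString: String = null
  val noList: List[Int] = null

  // Value types are a different story; this line would not compile:
  // val noInt: Int = null

  // A helper that accepts any reference and reports whether it is null.
  def isMissing(ref: AnyRef): Boolean = ref == null
}
```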

Nil

Nil is an easy one. Nil is an object that extends List[Nothing] (we’ll talk about Nothing next). It’s an empty list. Here’s some example code using Nil:

scala> Nil
res4: Nil.type = List()

scala> Nil.length
res5: Int = 0

scala> Nil + "ABC"
res6: List[java.lang.String] = List(ABC)

scala> Nil + Nil
res7: List[object Nil] = List(List())

See? It’s basically a constant encapsulating an empty list of anything. It has zero length. It doesn’t really represent ‘nothingness’ at all. It’s a thing, a List. There are just no contents.
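The “+” calls in that session reflect the Scala version I ran them on. As a sketch of how Nil more commonly shows up (names invented for the example, and assuming the “::” and “:+” list operators of the standard library), it's the end marker when building lists and the base case when taking them apart:

```scala
object NilDemo {
  // Building lists: Nil is what every list is ultimately consed onto.
  val one: List[String] = "ABC" :: Nil
  val two: List[String] = "ABC" :: "DEF" :: Nil

  // Appending to the empty list also works.
  val appended: List[String] = Nil :+ "ABC"

  // Taking lists apart: Nil is the base case of the recursion.
  def sum(xs: List[Int]): Int = xs match {
    case Nil          => 0
    case head :: tail => head + sum(tail)
  }
}
```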

Nothing

If any of these is a little difficult to get, it’s Nothing. Nothing is another trait. It extends class Any. Any is the root type of the entire Scala type system. An Any can refer to object types as well as values such as plain old integers or doubles. There are no instances of Nothing, but (here’s the tricky bit) Nothing is a subtype of everything. Nothing is a subtype of List, it’s a subtype of String, it’s a subtype of Int, it’s a subtype of YourOwnCustomClass.

Remember Nil? It’s a List[Nothing] and it’s empty. Since Nothing is a subtype of everything, Nil can be used as an empty List of Strings, an empty List of Ints, an empty List of Any. So Nothing is useful for defining base cases for collections or other classes that take type parameters. Here’s a snippet of a scala session:

scala> val emptyStringList: List[String] = List[Nothing]()
emptyStringList: List[String] = List()

scala> val emptyIntList: List[Int] = List[Nothing]()
emptyIntList: List[Int] = List()

scala> val emptyStringList: List[String] = List[Nothing]("abc")
<console>:4: error: type mismatch;
 found   : java.lang.String("abc")
 required: Nothing
       val emptyStringList: List[String] = List[Nothing]("abc")

In the first definition we assign a List[Nothing] to a reference to List[String]. A Nothing is a String, so this works. In the second we assign a List[Nothing] to a reference to List[Int]. A Nothing is also an Int, so this works too. A Nothing is a subtype of everything. But both of these List[Nothing] instances contain no members. What happens when we try to create a List[Nothing] containing a String and assign that List to a List[String] reference? It fails because although Nothing is a subtype of everything, it isn’t a supertype of anything, so the String “abc” is not a Nothing. In fact there are no instances of Nothing at all. So any collection of Nothing must necessarily be empty.

One other use of Nothing is as a return type for methods that never return. It makes sense if you think about it. If a method’s return type is Nothing, and there exists absolutely no instance of Nothing, then such a method must never return.
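A quick sketch of that second use, with hypothetical names. Because Nothing is a subtype of Int, the call to fail fits happily into an expression that is supposed to produce an Int.

```scala
object NothingDemo {
  // Never returns normally: it always throws.
  def fail(msg: String): Nothing = throw new IllegalStateException(msg)

  // Both branches must conform to Int. The first has type Nothing,
  // which is a subtype of Int, so the whole expression is an Int.
  def safeDivide(a: Int, b: Int): Int =
    if (b == 0) fail("division by zero") else a / b
}
```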

None

When you’re writing a function in Java and run into a situation where you don’t have a useful value to return, what do you do? There are a few ways to handle it. You could return null, but this causes problems. If the caller isn’t expecting to get a null, he could be faced with a NullPointerException when he tries to use it, or else the caller must check for null. Some functions will definitely never return null, but some may. As a caller, you don’t know. There is a way to declare in the function signature that you might not be able to return a good value, the throws keyword. But there is a cost associated with try/catch blocks, and you usually want to reserve the use of exceptions for truly exceptional situations, not just to signify an ordinary no-result situation.

Scala has a built-in solution to this problem. If you want to return a String, for example, but you know that you may not be able to return a sensible value you can return an Option[String]. Here’s a simple example.

scala> def getAStringMaybe(num: Int): Option[String] = {
     |   if ( num >= 0 ) Some("A positive number!")
     |   else None // A number less than 0?  Impossible!
     | }

getAStringMaybe: (Int)Option[String]

scala> def printResult(num: Int) = {
     |   getAStringMaybe(num) match {
     |     case Some(str) => println(str)
     |     case None => println("No string!")
     |   }
     | }
printResult: (Int)Unit

scala> printResult(100)
A positive number!

scala> printResult(-50)
No string!

The method getAStringMaybe returns Option[String]. Option is an abstract class with exactly two subclasses, class Some and object None. Those are the only two ways to instantiate an Option. So getAStringMaybe returns either a Some[String] or None. Some is a case class and None is a case object, so you can use the handy match/case construct to handle the result. None is an object that signifies no result from the method.

The purpose of an Option[T] return type is to tell callers that the method might return a T in the form of a Some[T], or it might return None to signify no result. This way, the caller supposedly knows when he does and does not need to check for a good return value.

On the other hand, just because a method is declared as returning some non-Option type doesn’t mean it can’t return null. Moreover, a method declared as returning Option can, in fact, return a null. So the technique isn’t perfect.

This is a neat trick, but can you imagine a codebase peppered with Option[This] and Option[That] all over the place, and all those ensuing match blocks? I say use Option sparingly.
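For what it's worth, the match blocks aren't the only way to consume an Option. Here is a sketch (invented names) using the standard map and getOrElse methods, plus the Option(...) factory, which turns a possibly-null reference into Some or None.

```scala
object OptionDemo {
  def describe(num: Int): Option[String] =
    if (num >= 0) Some("non-negative") else None

  // map transforms the value if there is one,
  // getOrElse supplies the fallback if there is not.
  def render(num: Int): String =
    describe(num).map(_.toUpperCase).getOrElse("No string!")

  // Option(x) yields None for null and Some(x) otherwise,
  // which helps at the boundary with null-returning code.
  def fromNullable(s: String): Option[String] = Option(s)
}
```

This doesn't settle how much Option is too much, but it does shrink the ceremony at each call site.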

Unit

This is another easy one. Unit is the type of a method that doesn’t return a value of any sort. Sound familiar? It’s like a void return type in Java. Here’s an example:

scala> def doThreeTimes(fn: (Int) => Unit) = {
     |   fn(1); fn(2); fn(3);
     | }
doThreeTimes: ((Int) => Unit)Unit

scala> doThreeTimes(println)
1
2
3

scala> def specialPrint(num: Int) = {
     |    println(">>>" + num + "<<<")
     | }
specialPrint: (Int)Unit

scala> doThreeTimes(specialPrint)
>>>1<<<
>>>2<<<
>>>3<<<

In the definition of doThreeTimes we specify that the method takes a parameter called fn, which has a type of (Int) => Unit. This means that fn is a function that takes a single parameter of type Int and has a return type of Unit, which is to say fn isn’t supposed to return a useful value at all, just like a Java void method.
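One small difference from void is that Unit is a real type with exactly one value, written (). Here is a sketch with made-up names showing that, along with a Unit-returning function that is only useful for its side effect.

```scala
object UnitDemo {
  var counter = 0

  // Returns Unit: called purely for its side effect on counter.
  def bump(n: Int): Unit = { counter += n }

  def doThreeTimes(fn: Int => Unit): Unit = { fn(1); fn(2); fn(3) }

  // Unlike void, Unit has a value you can hold on to.
  val theOnlyUnit: Unit = ()
}
```

After one call to UnitDemo.doThreeTimes(UnitDemo.bump), counter has been bumped by 1, 2, and 3.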

That’s it. Those are the ‘nothingness’ items in Scala. If you know of any more, please leave a comment! There is admittedly a lot to learn when you’re taking up Scala, but in return you get an incredibly expressive and succinct language.

If you enjoyed my earlier post on parsing in Scala, Stephan Zeiger has a 3-part series at a more technical level.

Yesterday I made a post called Easy Parsing in Scala about using the Scala parsing libraries. I’ve made a couple changes to the code since then.

First, I noticed that the regex method takes a Regex object as its only parameter. Why, I thought to myself, didn’t they just make the method take a String so I don’t have to keep typing “new Regex”. Duh. Big duh. They’re giving me the opportunity to reuse Regex objects instead of stupidly recreating them over and over. So I added three private constant regular expressions that I could reuse: spaceRegex, numberRegex, and wordRegex. I couldn’t make a constant Regex for the one that matches a given number of characters, of course.

Second, I eliminated some repetition by adding a regexAndSpace method that matches a regular expression and then throws away the following whitespace. That’s a job that’s repeated 3 times, so I thought it made sense to factor it out. Without further ado, here’s the updated code:

import scala.util.parsing.combinator._
import scala.util.matching.Regex

object SvnParser extends RegexParsers {
  private val spaceRegex  = new Regex("[ \\n]+")
  private val numberRegex = new Regex("[0-9]+")
  private val wordRegex   = new Regex("[a-zA-Z][a-zA-Z0-9-]*")

  private def space  = regex(spaceRegex)
  private def regexAndSpace(re: Regex) = regex(re) <~ space

  override def skipWhitespace = false

  def number = regexAndSpace(numberRegex)
  def word   = regexAndSpace(wordRegex)
  def string = regex(numberRegex) >> { len => ":" ~> regexAndSpace(new Regex(".{" + len + "}")) }
  def list: Parser[List[Any]] = "(" ~> space ~> ( item * ) <~ ")" <~ space

  def item = ( number | word | string | list )

  def parseItem(str: String) = parse(item, str)
}

SvnParser.parseItem("( 5:abcde 3:abc  \n   20:three separate words     (  abc def     \n\n\n   123 ) ) ") match {
  case SvnParser.Success(result, _) => println(result.toString)
  case _ => println("Could not parse the input string.")
}

I’ve been experimenting with Scala lately.  As a practice project, I started writing some parts of a Subversion client.  Honestly, I don’t intend to create a real finished product, but it’s been a good source of interesting problems to solve.  Subversion systems pass messages back and forth using a fairly simple protocol.  This makes the Subversion protocol an ideal example for a Scala parsing tutorial. Let’s write a parser for this protocol using Scala’s parsing package.  Here are the first few lines from the Subversion protocol spec:

The Subversion protocol is specified in terms of the following
syntactic elements, specified using ABNF [RFC 2234]:

  item   = word / number / string / list
  word   = ALPHA *(ALPHA / DIGIT / "-") space
  number = 1*DIGIT space
  string = 1*DIGIT ":" *OCTET space
         ; digits give the byte count of the *OCTET portion
  list   = "(" space *item ")" space
  space  = 1*(SP / LF)

Here is an example item showing each of the syntactic elements:

  ( word 22 6:string ( sublist ) )

Very simple!  Every message is made up of a number of items.  Each item is either an integer number, a word (made up of letters, digits, and hyphens), a string (which can have any character), or a list of these items, and a list can contain other sub-lists.  Plus, notice that the protocol is further simplified by the fact that each item is followed by a “space”, which is defined as one or more space characters or linefeeds.  First, let’s look at a Scala parser that implements a small subset of this spec:

import scala.util.parsing.combinator._
import scala.util.matching.Regex

object SvnParser extends RegexParsers {
  private def number = regex(new Regex("[0-9]+[ \\n]+"))
  def parseItem(str: String): ParseResult[Any] = parse(number, str)
}

SvnParser.parseItem("123  \n\n  ") match {
  case SvnParser.Success(result, _) => println(result.toString)
  case _ => println("Could not parse the input string.")
}

Assuming you have a passing familiarity with Scala, this looks pretty straightforward.  You have an object (something like a Java singleton) that inherits from trait RegexParsers.  It makes calls to two RegexParsers methods: regex and parse.  It adds only one public method: parseItem.  The second section is a call to the parseItem method followed by a match block for handling the possible outcomes.  If the parse is successful, the parse result is printed.

The interesting part is that call to parse.  The first parameter is number.  In this program, number is just a method that calls method regex which returns an instance of Parser[+T].  And it’s pretty obvious what regex does.  It attempts to match a regular expression.  The regex pattern in this example is “[0-9]+[ \\n]+” which matches 1 or more digits followed by 1 or more spaces or newlines.  Try running this code through the Scala interpreter.  It works!  However, it includes all that whitespace which we don’t really need.  Let’s see if we can match it, but keep it out of the results.  I’ll leave out the imports and test code for brevity.

object SvnParser extends RegexParsers {
  private def number = regex(new Regex("[0-9]+")) ~ regex(new Regex("[ \\n]+"))
  def parseItem(str: String) = parse(number, str)
}

This time, we’re making two regex calls with a tilde (~) in between.  What is that thing?  It’s a function from the Parser[+T] class.  It has a signature like this:

def ~ [U](p : => Parser[U]) : Parser[~[T, U]]

This “type soup” is the kind of thing that scares people away from Scala.  But it’s not that bad.  This signature says that a Parser of type T has a method called “~” that takes a single parameter: a method returning a Parser of type U.  This “~” method returns a Parser of a class called “~” of type T and U.  Again, there’s a method called “~” and class called “~”.  What does that mean for our example?  The first call to regex returns a Parser[T] (T is String in this case) that matches the digits.  That Parser calls its “~” method with the result of a second regex call as its parameter, which returns a Parser[U] (U is also String) that matches the whitespace.  That “~” method returns another Parser of class “~” of T and U.  It turns out that ~[T,U] (more specifically, ~[String,String]) is basically an ordered pair.

Try running this through the interpreter.  What happens?  The parse fails.  Why?  This confused me for 10 or 15 minutes and I resorted to looking into the RegexParsers source code.  Here’s what I found:

  protected val whiteSpace = """\s+""".r

  def skipWhitespace = whiteSpace.toString.length > 0
  protected def handleWhiteSpace(source: java.lang.CharSequence, offset: Int): Int =
        if (skipWhitespace)
          (whiteSpace findPrefixMatchOf (source.subSequence(offset, source.length))) match {
            case Some(matched) => offset + matched.end
            case None => offset
          }
        else
          offset

I don’t know the details of how and why, but it looks like RegexParsers is messing around with our whitespace.  That’s OUR whitespace!  And we’ll handle it as we see fit.  So let’s override the default behavior.  Add a method to our SvnParser like this:  “override def skipWhitespace = false”  This should set things straight.  Whew.  That was a fun diversion.  Try running the code now.  The output string is this:

(123~

  )

That’s the contents of that ~[T,U].   It’s printed as (T~U).  That’s great if we need both parts of the parse, but we don’t.  We only want the digits.  There are two more Parser methods we should look at.  They are “<~” and “~>”.  Why all these funny names?  They’re brief and they’ll make more sense as we go.  Let’s look at the signatures.

def <~ [U](p : => Parser[U]) : Parser[T]
def ~> [U](p : => Parser[U]) : Parser[U]

Again, it looks daunting if you’re not familiar with Scala, but let’s take a closer look.  It’s all the same as the “~” method except the return type.  That means we use these the same way we use “~”, but we get a different result.  The “<~” method returns a Parser[T], the same type as the object calling “<~”.  The “~>” method returns Parser[U], the same type as the parameter.  You see?  “<~” returns the left side and throws away the right.  “~>” returns the right side and throws away the left.  The angle bracket points to the one we want to keep!  Now let’s try using one of these new methods to keep the digits and throw away the whitespace.  Which will we use?  “<~” or “~>”?  Right!  We’ll use “<~” because we want to keep the left side, the digits.  Here’s what the object looks like now:

object SvnParser extends RegexParsers {
  override def skipWhitespace = false
  private def number = regex(new Regex("[0-9]+")) <~ regex(new Regex("[ \\n]+"))
  def parseItem(str: String) = parse(number, str)
}

Look at that “<~”.  It says that in order to parse we must match both sides, but we only want to keep the left.  When we run this through the interpreter we see just the number output.  Great.  Now we’re getting the hang of it.  Let’s take a big leap forward and add parsing of words as well as numbers.  Here’s the new code:

object SvnParser extends RegexParsers {
  override def skipWhitespace = false

  private def space = regex(new Regex("[ \\n]+"))
  private def number = regex(new Regex("[0-9]+")) <~ space
  private def word   = regex(new Regex("[a-zA-Z][a-zA-Z0-9-]*")) <~ space
  private def item = ( number | word )

  def parseItem(str: String) = parse(item, str)
}

New things in this code:  we’re parsing for item, item is defined as a number or a word, word is defined using a new regex, and we’ve factored out the whitespace Parser into a new method.

This is mostly self-explanatory.  The space method just encapsulates the Parser that matches whitespace.  The item method calls method number which returns a Parser.  Then we call yet another Parser method, called “|” (pipe).  The “|” method takes a single parameter, another Parser.  As you’ve probably guessed, it tries the left side first and falls back to the right side if that fails, so it lets us match the left or the right side.  Try changing the test string to some valid and invalid numbers and words.
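If the operators still feel like magic, it may help to see a stripped-down model of them. The sketch below is not the library's implementation. Toy, token, and the other names are hypothetical, and a “parser” here is just a function from the input to an optional (result, remaining input) pair. The four methods do mirror what “~”, “<~”, “~>”, and “|” are described as doing above.

```scala
object ToyCombinators {
  // A toy parser: consume a prefix of the input and return the result
  // plus the unconsumed remainder, or None on failure.
  final case class Toy[+T](run: String => Option[(T, String)]) {
    // Match both sides in sequence and keep both results as a pair.
    def ~[U](next: => Toy[U]): Toy[(T, U)] = Toy[(T, U)] { in =>
      run(in).flatMap { case (left, rest1) =>
        next.run(rest1).map { case (right, rest2) => ((left, right), rest2) }
      }
    }
    // Match both sides, keep only the left result.
    def <~[U](next: => Toy[U]): Toy[T] = Toy[T] { in =>
      (this ~ next).run(in).map { case ((left, _), rest) => (left, rest) }
    }
    // Match both sides, keep only the right result.
    def ~>[U](next: => Toy[U]): Toy[U] = Toy[U] { in =>
      (this ~ next).run(in).map { case ((_, right), rest) => (right, rest) }
    }
    // Try the left side; if it fails, try the right side on the same input.
    def |[U >: T](other: => Toy[U]): Toy[U] = Toy[U] { in =>
      run(in).orElse(other.run(in))
    }
  }

  // Primitive parser: match a regular expression at the start of the input.
  def token(pattern: String): Toy[String] = {
    val compiled = pattern.r
    Toy[String] { in =>
      compiled.findPrefixOf(in).map(m => (m, in.substring(m.length)))
    }
  }

  val space  = token("[ \\n]+")
  val digits = token("[0-9]+")
  val number = digits <~ space
  val word   = token("[a-zA-Z][a-zA-Z0-9-]*") <~ space
  val item   = number | word
  val both   = digits ~ space
}
```

Running ToyCombinators.number.run on an input hands back the digits and whatever is left after the whitespace, while both.run keeps the whitespace as the second half of a pair. The real Parser class does much more (error messages, positions, and so on), but the keep-left and keep-right idea is the same.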

Now let’s add strings.  This one’s tricky.  We have to match an integer number we’ll call N, a “:” literal, then a string of N characters, and finally the trailing whitespace.  The only part we want to keep is the actual string.  Here’s the code:

object SvnParser extends RegexParsers {
  override def skipWhitespace = false

  private def space = regex(new Regex("[ \\n]+"))
  private def number = regex(new Regex("[0-9]+")) <~ space
  private def word   = regex(new Regex("[a-zA-Z][a-zA-Z0-9-]*")) <~ space
  private def string = regex(new Regex("[0-9]+")) >> { len => ":" ~> regex(new Regex(".{" + len + "}")) <~ space }

  private def item = ( number | word | string )

  def parseItem(str: String) = parse(item, str)
}

First, note that we chained another “|” call in the definition of the item method.  The new string method first calls regex which returns a Parser that matches an integer.  Then guess what.  Another Parser method with a funny name.  This one is called “>>”.  Let’s look at the method signature again:

def >> [U](fq : (T) => Parser[U]) : Parser[U]

T is the type of the result of the left Parser.  U is the type of the result of the right Parser.  So this method takes a single parameter:  a method with one parameter of type T returning a Parser[U].  The Parser returned by this fq function is also returned as the result of “>>”.  What does that mean?  We get to do something with the results of the Parser on the left and we return a new Parser so we can keep chaining Parsers if we want.

So the right side of the “>>” call in our example is a closure, a sort of anonymous function.  It takes a parameter called len.  This will be of type String because the left side of “>>” is a regex Parser.  Inside the closure we match a literal “:” and call the “~>” method because we don’t really care about the “:”.  We’re interested in the string on the right of the “:” so the right side is a regex call that returns a Parser that matches the number of characters specified by the len parameter.  So now we’ve passed the string length to the closure (and otherwise thrown it away), we’ve thrown away the “:”, and we’ve matched (and kept) the string data.  Finally, we call the “<~” method with a call to the space method.  So we match the trailing whitespace, but we don’t keep it.  Once again try this code with some different input strings and see how it works.
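To see why “>>” is needed at all, here is the same logic spelled out by hand, with no parsing library involved. The names are hypothetical, and this is only a sketch of the idea. The point is that how much input the second step consumes depends on the value the first step produced, which is exactly the dependency “>>” lets us express.

```scala
object LengthPrefixedDemo {
  private def isSpace(c: Char): Boolean = c == ' ' || c == '\n'

  // Parse "<len>:<payload><space>" from the front of the input.
  // Returns the payload and the unconsumed remainder, or None on failure.
  def parseString(in: String): Option[(String, String)] = {
    // Step 1: read the length prefix.
    val digits = in.takeWhile(c => c >= '0' && c <= '9')
    val afterDigits = in.drop(digits.length)
    // Try covers both an empty prefix and one too large for an Int.
    scala.util.Try(digits.toInt).toOption.flatMap { len =>
      if (!afterDigits.startsWith(":")) None
      else {
        // Step 2: the length decides how many characters to take.
        val (payload, rest) = afterDigits.drop(1).splitAt(len)
        val spaces = rest.takeWhile(isSpace)
        // Fail on a short payload or missing trailing whitespace.
        if (payload.length < len || spaces.isEmpty) None
        else Some((payload, rest.drop(spaces.length)))
      }
    }
  }
}
```

Note that this counts characters, as the regex version does, whereas the spec talks about a byte count. For plain ASCII input the two agree.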

Now we’re only missing the list item.  Remember, we want to match a literal “(”, whitespace, then zero or more items, a literal “)”, and trailing whitespace.  We’re going to make list one of the items, but we’re also going to have items in our list.  You know what that means.  Recursion.

object SvnParser extends RegexParsers {
  override def skipWhitespace = false

  private def space  = regex(new Regex("[ \\n]+"))
  private def number = regex(new Regex("[0-9]+")) <~ space
  private def word   = regex(new Regex("[a-zA-Z][a-zA-Z0-9-]*")) <~ space
  private def string = regex(new Regex("[0-9]+")) >> { len => ":" ~> regex(new Regex(".{" + len + "}")) <~ space }
  private def list: Parser[Any] = "(" ~> space ~> ( item * ) <~ ")" <~ space

  private def item = ( number | word | string | list )

  def parseItem(str: String) = parse(item, str)
}

Easy as pie!  We just add list to the chain of methods in method “item” and define the new list method.  The righthand side is easy to figure out.  Match “(” and throw it away, match whitespace and throw it away, match zero or more items (the “*” is yet another handy Parser method) and keep them, match “)” and throw it away, and then match trailing whitespace and throw it away.  See how easy it is to read those “<~” and “~>” methods?

If you’re not well versed in Scala’s type inference (I’m not) then this is a little confusing at first.  I first tried to run this without the “: Parser[Any]” specified.  The interpreter told me that I had to specify the return type for recursive methods.  Oh yeah!  This is recursive, isn’t it?  Method item calls method list and method list calls method item.  Scala is great at looking at the code and inferring what the types must be, which I love.  Some people don’t like it.  It declutters the code and generally behaves pretty intuitively.  Notice that we don’t have any return types on any of the other methods.  Scala looks at the method body and figures out what the return type is.

So why do we have to specify the return type for recursive methods if we didn’t have to do it for any of the other methods?  Think about it.  In the space method we’re returning whatever is returned from the call to regex (that’s Parser[String]) so Scala infers that the return type of space must also be Parser[String].  But the type of the list method will be whatever is returned by the “*” method, which will be based on the return type of method item.  But the type of method item depends on the type of list.  So we go around in circles.  That’s why we must specify the return type in this case.
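The same rule shows up outside of parsers. Here is a tiny sketch with invented names: the mutually recursive methods carry explicit return types, while the ordinary helper gets by on inference alone.

```scala
object RecursionTypesDemo {
  // Not recursive: the return type (Int) is inferred from the body.
  def double(n: Int) = n * 2

  // Mutually recursive: each body mentions the other method, so the
  // return types are written out to break the cycle. (Assumes n >= 0.)
  def isEven(n: Int): Boolean = if (n == 0) true else isOdd(n - 1)
  def isOdd(n: Int): Boolean  = if (n == 0) false else isEven(n - 1)
}
```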

I was lazy.  I just specified “Parser[Any]”, meaning that the Parser’s result could be anything at all.  It works, but we could be more specific if we desired stronger type safety.  For example, the “*” method always returns a Parser[List[T]] so we could have specified “Parser[List[Any]]” or maybe even more specific if we can narrow down what T can be.
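One way to narrow things down would be to give the four kinds of item their own types. This is only a sketch of what such a result model might look like. None of these names appear in the parser above, which still returns Any. The render method writes an item back out in the wire syntax from the spec.

```scala
object SvnItems {
  // Hypothetical typed model of the four item kinds in the spec.
  sealed trait Item
  final case class Word(value: String) extends Item
  final case class Num(value: Long) extends Item
  final case class Str(value: String) extends Item
  final case class ItemList(items: List[Item]) extends Item

  // The spec's example item: ( word 22 6:string ( sublist ) )
  val example: Item =
    ItemList(List(Word("word"), Num(22), Str("string"), ItemList(List(Word("sublist")))))

  // Write an item back out; every item is followed by a single space.
  // Note: uses the character count as the length, which matches the
  // spec's byte count only for single-byte characters.
  def render(item: Item): String = item match {
    case Word(w)         => w + " "
    case Num(n)          => n.toString + " "
    case Str(s)          => s.length.toString + ":" + s + " "
    case ItemList(items) => "( " + items.map(render).mkString + ") "
  }
}
```

Because Item is sealed, the compiler can check that a match like the one in render covers every case, which is the kind of safety Any can't give us.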

That’s it.  I don’t claim this is the most efficient parser.  If speed is an issue you could surely get better performance from a custom parser, but you get a lot of functionality from a tiny bit of code with Scala’s parsing package.  Try out more test input strings and see how they behave.  Study that Parser[+T] documentation, and even the source code.  It’s those Parser methods that give you the power to do so much with so little code.

This blog will contain my observations, ideas, and discoveries concerning software development.
