Yesterday I made a post called Easy Parsing in Scala about using the Scala parsing libraries. I’ve made a couple of changes to the code since then.

First, I noticed that the regex method takes a Regex object as its only parameter. Why, I thought to myself, didn’t they just make the method take a String so I don’t have to keep typing “new Regex”? Duh. Big duh. They’re giving me the opportunity to reuse Regex objects instead of recreating (and recompiling) them over and over. So I added three private constant regular expressions that I could reuse: spaceRegex, numberRegex, and wordRegex. The one exception is the Regex that matches a given number of characters: its pattern depends on a length that is only known while parsing, so it still has to be built on each use.
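To make the reuse point concrete, here is a minimal sketch that uses only scala.util.matching.Regex from the standard library. The object name RegexReuse and the helpers leadingNumber and fixedLength are made up for illustration and are not part of SvnParser. It assumes nothing beyond the fact that a Regex compiles its pattern when it is constructed, so a val pays that cost once while a pattern built from a runtime length has to be constructed per call.

```scala
import scala.util.matching.Regex

// Hypothetical illustration, not part of SvnParser.
object RegexReuse {
  // Constructed (and compiled) once, when the object is initialized.
  // "[0-9]+".r is shorthand for new Regex("[0-9]+").
  private val numberRegex: Regex = "[0-9]+".r

  // Reuses the same Regex on every call.
  def leadingNumber(input: String): Option[String] =
    numberRegex.findPrefixOf(input)

  // The pattern depends on len, so it cannot be a constant;
  // a new Regex is built for each call.
  def fixedLength(len: Int): Regex =
    new Regex(".{" + len + "}")
}
```

For example, RegexReuse.leadingNumber("123 abc") matches the leading digits, while RegexReuse.fixedLength(5) builds a fresh pattern that matches exactly five characters.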

Second, I eliminated some repetition by adding a regexAndSpace method that matches a regular expression and then throws away the whitespace that follows it. That job is repeated three times, so I thought it made sense to factor it out. Without further ado, here’s the updated code:

import scala.util.parsing.combinator._
import scala.util.matching.Regex

object SvnParser extends RegexParsers {
  // Compiled once and reused by the parsers below.
  private val spaceRegex  = new Regex("[ \\n]+")
  private val numberRegex = new Regex("[0-9]+")
  private val wordRegex   = new Regex("[a-zA-Z][a-zA-Z0-9-]*")

  private def space  = regex(spaceRegex)
  private def regexAndSpace(re: Regex) = regex(re) <~ space

  override def skipWhitespace = false

  def number = regexAndSpace(numberRegex)
  def word   = regexAndSpace(wordRegex)
  def string = regex(numberRegex) >> { len => ":" ~> regexAndSpace(new Regex(".{" + len + "}")) }
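  // rep1(item) is the same as the postfix form (item +), but compiles
  // without enabling scala.language.postfixOps on newer Scala versions.
  def list: Parser[List[Any]] = "(" ~> space ~> rep1(item) <~ ")" <~ space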

  def item = ( number | word | string | list )

  def parseItem(str: String) = parse(item, str)
}

SvnParser.parseItem("( 5:abcde 3:abc  \n   20:three separate words     (  abc def     \n\n\n   123 ) ) ") match {
  case SvnParser.Success(result, _) => println(result.toString)
  case _ => println("Could not parse the input string.")
}
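
The string rule is the least obvious part of the parser: it reads a number, then uses that number to decide how many characters to consume after the colon. As a rough illustration of that idea without the combinator library, here is a hand-rolled sketch in plain Scala. The object LengthPrefixed and its parse method are hypothetical names, and the behavior is an approximation rather than a drop-in replacement: unlike regexAndSpace it does not require whitespace after the payload, and unlike the ".{len}" pattern it lets the payload contain newlines.

```scala
import scala.util.Try

// Hypothetical illustration of the "<len>:<len characters>" rule.
object LengthPrefixed {
  // Captures the digits before the colon.
  private val prefixRegex = "([0-9]+):".r

  // Returns (payload, remaining input with leading spaces/newlines dropped),
  // or None if the input does not start with a valid length-prefixed string.
  def parse(input: String): Option[(String, String)] =
    for {
      m   <- prefixRegex.findPrefixMatchOf(input)
      len <- Try(m.group(1).toInt).toOption // None if the number overflows Int
      start = m.end
      if input.length - start >= len        // enough characters left?
    } yield {
      val payload = input.substring(start, start + len)
      val rest    = input.substring(start + len).dropWhile(c => c == ' ' || c == '\n')
      (payload, rest)
    }
}
```

Calling LengthPrefixed.parse("5:abcde 3:abc") consumes the first string and leaves "3:abc" as the remaining input, which mirrors how the combinator version hands the rest of the input to the next parser.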