Yesterday I made a post called “Easy Parsing in Scala” about using the Scala parsing libraries. I’ve made a couple of changes to the code since then.
First, I noticed that the regex method takes a Regex object as its only parameter. Why, I thought to myself, didn’t they just make the method take a String so I don’t have to keep typing “new Regex”? Duh. Big duh. They’re giving me the opportunity to reuse Regex objects instead of stupidly recreating them over and over. So I added three private constant regular expressions that I could reuse: spaceRegex, numberRegex, and wordRegex. I couldn’t make a constant Regex for the one that matches a given number of characters, of course, because its pattern depends on the length read from the input.
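Here’s a minimal sketch of that difference using only the standard library. The object name RegexReuse and the two helper methods are made up for illustration; they aren’t part of the parser.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, only to contrast the two cases.
object RegexReuse {
  // Compiled once when the object is initialised, then shared by every call.
  private val numberRegex = new Regex("[0-9]+")

  // Reuses the precompiled pattern on each call.
  def leadingNumber(s: String): Option[String] =
    numberRegex.findPrefixOf(s)

  // The pattern depends on n, so a new Regex has to be built per call.
  def firstChars(n: Int, s: String): Option[String] =
    new Regex(".{" + n + "}").findPrefixOf(s)
}
```

The first method is the case where a constant pays off; the second is the length-dependent case where it can’t.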
Second, I eliminated some repetition by adding a regexAndSpace method that matches a regular expression and then throws away the whitespace that follows it. That job was repeated three times, so I thought it made sense to factor it out. Without further ado, here’s the updated code:
import scala.util.parsing.combinator._
import scala.util.matching.Regex

object SvnParser extends RegexParsers {
  // Compiled once and reused by every parser that needs them.
  private val spaceRegex = new Regex("[ \\n]+")
  private val numberRegex = new Regex("[0-9]+")
  private val wordRegex = new Regex("[a-zA-Z][a-zA-Z0-9-]*")

  private def space = regex(spaceRegex)
  // Matches re, then consumes and discards the whitespace that follows it.
  private def regexAndSpace(re: Regex) = regex(re) <~ space

  // Whitespace is handled explicitly, so turn off automatic skipping.
  override def skipWhitespace = false

  def number = regexAndSpace(numberRegex)
  def word = regexAndSpace(wordRegex)
  // Length-prefixed string: read the length, then match exactly that many characters.
  def string = regex(numberRegex) >> { len => ":" ~> regexAndSpace(new Regex(".{" + len + "}")) }
  // rep1(item) is the same as the postfix form (item +), without needing postfix operator syntax.
  def list: Parser[List[Any]] = "(" ~> space ~> rep1(item) <~ ")" <~ space
  def item = ( number | word | string | list )

  def parseItem(str: String) = parse(item, str)
}

SvnParser.parseItem("( 5:abcde 3:abc \n 20:three separate words ( abc def \n\n\n 123 ) ) ") match {
  case SvnParser.Success(result, _) => println(result)
  case _ => println("Could not parse the input string.")
}
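The string parser is the one piece that can’t use a constant Regex, because what it matches depends on the number it has just read. As a rough illustration of that dependent step, here’s a hand-written version that uses only the standard library. The LengthPrefixed object and its parse method are hypothetical names for this sketch, not part of SvnParser, and unlike the “.” in the regex version it doesn’t treat newlines specially.

```scala
import scala.util.matching.Regex

// Hypothetical stand-alone sketch of "read a length, then read that many characters".
object LengthPrefixed {
  private val numberRegex = new Regex("[0-9]+")

  // Returns (payload, unconsumed remainder), or None if the input
  // does not start with "<len>:" followed by at least len characters.
  // Assumes the length fits in an Int.
  def parse(input: String): Option[(String, String)] =
    for {
      digits <- numberRegex.findPrefixOf(input)
      len = digits.toInt
      rest = input.drop(digits.length)
      if rest.startsWith(":") && rest.length >= len + 1
    } yield (rest.substring(1, 1 + len), rest.substring(1 + len))
}
```

The combinator version does the same thing in one line: >> hands the matched length to a function that builds the next parser.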
Copyright © 2008 Matthew Jason Malone
September 22, 2008 at 12:34 am
You can type “new Regex” even less. There’s an “r” method defined on RichString which yields a Regex, so new Regex("[0-9]+") can be written as "[0-9]+".r.
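A minimal sketch of that equivalence, using only the standard library. The object name RShorthand is made up for illustration, and in later Scala versions the r method is, as far as I know, provided through StringOps rather than RichString.

```scala
import scala.util.matching.Regex

// Hypothetical object, only to compare the two spellings.
object RShorthand {
  val explicit: Regex = new Regex("[0-9]+")
  val shorthand: Regex = "[0-9]+".r

  // Both values wrap the same pattern text and match the same input.
  def samePattern: Boolean = explicit.regex == shorthand.regex
}
```

Either value can be passed to the regex method or stored in a private val just like the constants in the post.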
September 22, 2008 at 3:47 am
Thanks for the tip! Very succinct.