Parsing ranges with FParsec

FParsec is an F# adaptation of Parsec, a free monadic parser combinator library for Haskell. It can parse context-sensitive, infinite look-ahead grammars, has complete support for unicode input and large files (> 4 GB) and produces excellent error messages.

You can download and install it from the above link. To use it in an application, you must add a reference to FParsec.dll and FParsecCS.dll.

In this post I will use FParsec to parse ranges of numerical values, such as [1-100], and IPs, such as [10.5.10.0-10.5.255.255].

I will start by defining some types:

type IPAddr = {C1: int64; C2:int64; C3: int64; C4: int64;}
type TValue =
   | Num of int64
   | IP of IPAddr

type Range = {Min : TValue; Max : TValue;}

Having that, I can define a parser for numerical ranges like this:

let p_numrange =
   parse
      {
         do! skipChar '['
         let! a = pint64
         do! skipChar '-'
         let! b = pint64
         do! skipChar ']'
         return {Min = Num a; Max = Num b;}
      }

That can be read like this: make sure it stars with [ byt skip that, parse an numerical value and let that be a, make sure - follows but skip that, parse a second numerical value and let that be b, make sure ] follows but skip that, and it if successful return a Range of a and b.

To print a Range value I’ll use this function:

let printip (ip:IPAddr) =
   printf "%d.%d.%d.%d" ip.C1 ip.C2 ip.C3 ip.C4
   ()

let printrange (r:Range) =
   match (r.Min, r.Max) with
   | Num(l1),Num(l2) -> printfn "Range: %d-%d" l1 l2
   | IP(ip1),IP(ip2) ->
      printf "Range: "
      printip ip1
      printf "-"
      printip ip2
      printfn ""
   | _ -> ()
   ()

Here is a test

   let result = run p_numrange "[1-100]"
   match result with
   | Success r -> printrange r
   | Failure (msg, err) -> printf "%s\n" msg
   ()

That outputs:

Range: 1-100

Using “(1-100)” instead of “[1-100]” yields the following error message:

Error in Ln: 1 Col: 1
(1-100)
^
Expecting: '['

Let's move forward and define an IP parses. Based on what I did so far this should be pretty easy to understand.

let p_ip =
   parse
      {
         let! c1 = pint64
         do! skipChar '.'
         let! c2 = pint64
         do! skipChar '.'
         let! c3 = pint64
         do! skipChar '.'
         let! c4 = pint64
         return {C1 = c1; C2 = c2; C3 = c3; C4 = c4;}
      }

And an IP range parser would look just like the first range parser, except that pint64 would be replaced by p_ip:

let p_iprange =
   parse
      {
         do! skipChar '['
         let! a = p_ip
         do! skipChar '-'
         let! b = p_ip
         do! skipChar ']'
         return {Min = IP a; Max = IP b;}
      }

Here are some helper functions for testing these parses:

let testrange range =
   let result = run (p_numrange .>> eof) range
   match result with
   | Success r -> printrange r
   | Failure (msg, err) -> printf "%s\n" msg
   ()

let testip ip =
   let result = run (p_ip .>> eof) ip
   match result with
   | Success ip -> printip ip; printfn ""
   | Failure (msg, err) -> printf "%s\n" msg
   ()   

let testiprange range =
   let result = run (p_iprange .>> eof) range
   match result with
   | Success r -> printrange r
   | Failure (msg, err) -> printf "%s\n" msg
   ()

You probably noticed the construction “(p_numrange .>> eof)”. That reads: parse a range, and then the end of the file and return the range. That helps us making sure nothing follows the range, like “[1-100] more here”.

Executing

testrange "(1-100)"
testrange "[1-100]"
testrange "[1]"

yields

Error in Ln: 1 Col: 1
(1-100)
^
Expecting: '['

Range: 1-100
Error in Ln: 1 Col: 3
[1]
  ^
Expecting: '-'

Executing

testip "192.168"
testip "192.168.0.1"
testip "192.168.0.1 "

yields

Error in Ln: 1 Col: 8
192.168
       ^
Expecting: '.'

192.168.0.1
Error in Ln: 1 Col: 12
192.168.0.1
           ^
Expecting: end of file

and executing

testiprange "[10.5.0.1-10.5.255.255]"
testiprange "[10-10.5.255.255]"

yields

Range: 10.5.0.1-10.5.255.255
Error in Ln: 1 Col: 4
[10-10.5.255.255]
   ^
Expecting: '.'

But why defining two range parser? Isn’t possible to have just a single one and specify which parser to use for the values? Yes, it is possible, and that would look like this:

let p_range p =
   parse
      {
         do! skipChar '['
         let! a = p
         do! skipChar '-'
         let! b = p
         do! skipChar ']'
         return {Min = a; Max = b;}
      }

Having that, we can re-define the p_numrange and p_iprange like shown bellow:

let p_ip =
   parse
      {
         let! c1 = pint64
         do! skipChar '.'
         let! c2 = pint64
         do! skipChar '.'
         let! c3 = pint64
         do! skipChar '.'
         let! c4 = pint64
         return IP {C1 = c1; C2 = c2; C3 = c3; C4 = c4;}
      }

let p_int =
   parse
      {
         let! v = pint64
         return Num v
      }

let p_numrange = p_range p_int
let p_iprange = p_range p_ip

Everything above would work, except for testip function that needs change to:

let testip ip =
   let result = run (p_ip .>> eof) ip
   match result with
   | Success ip -> match ip with
                   | IP v -> printip v; printfn ""
                   | _ -> printfn "error"
   | Failure (msg, err) -> printf "%s\n" msg
   ()

That’s about it about parsing ranges with FParsec.

Hits for this post: 9610 .
Trackback

2 comments untill now

  1. Gravatar

    Nice article, Marius.

    There’s a small issue with the error messages: the caret (^) isn’t displayed at the right position because the whitespace isn’t significant in HTML. You could fix this by putting pre-tags around the error messages.

  2. Gravatar
    mariusbancila @ 2008-11-07 02:14

    Right. That was a good catch. I fixed that. Than you for pointing it out.

Add your comment now