FParsec is an F# adaptation of Parsec, a free monadic parser combinator library for Haskell. It can parse context-sensitive, infinite look-ahead grammars, has complete support for unicode input and large files (> 4 GB) and produces excellent error messages.
You can download and install it from the above link. To use it in an application, you must add a reference to FParsec.dll and FParsecCS.dll.
In this post I will use FParsec to parse ranges of numerical values, such as [1-100], and IPs, such as [10.5.10.0-10.5.255.255].
I will start by defining some types:
1 2 3 4 5 6 |
type IPAddr = {C1: int64; C2:int64; C3: int64; C4: int64;} type TValue = | Num of int64 | IP of IPAddr type Range = {Min : TValue; Max : TValue;} |
Having that, I can define a parser for numerical ranges like this:
1 2 3 4 5 6 7 8 9 10 |
let p_numrange = parse { do! skipChar '[' let! a = pint64 do! skipChar '-' let! b = pint64 do! skipChar ']' return {Min = Num a; Max = Num b;} } |
That can be read like this: make sure it stars with [ byt skip that, parse an numerical value and let that be a, make sure – follows but skip that, parse a second numerical value and let that be b, make sure ] follows but skip that, and it if successful return a Range of a and b.
To print a Range value I’ll use this function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
let printip (ip:IPAddr) = printf "%d.%d.%d.%d" ip.C1 ip.C2 ip.C3 ip.C4 () let printrange (r:Range) = match (r.Min, r.Max) with | Num(l1),Num(l2) -> printfn "Range: %d-%d" l1 l2 | IP(ip1),IP(ip2) -> printf "Range: " printip ip1 printf "-" printip ip2 printfn "" | _ -> () () |
Here is a test
1 2 3 4 5 |
let result = run p_numrange "[1-100]" match result with | Success r -> printrange r | Failure (msg, err) -> printf "%s\n" msg () |
That outputs:
1 |
Range: 1-100 |
Using “(1-100)” instead of “[1-100]” yields the following error message:
1 2 3 4 |
Error in Ln: 1 Col: 1 (1-100) ^ Expecting: '[' |
Let’s move forward and define an IP parses. Based on what I did so far this should be pretty easy to understand.
1 2 3 4 5 6 7 8 9 10 11 12 |
let p_ip = parse { let! c1 = pint64 do! skipChar '.' let! c2 = pint64 do! skipChar '.' let! c3 = pint64 do! skipChar '.' let! c4 = pint64 return {C1 = c1; C2 = c2; C3 = c3; C4 = c4;} } |
And an IP range parser would look just like the first range parser, except that pint64 would be replaced by p_ip:
1 2 3 4 5 6 7 8 9 10 |
let p_iprange = parse { do! skipChar '[' let! a = p_ip do! skipChar '-' let! b = p_ip do! skipChar ']' return {Min = IP a; Max = IP b;} } |
Here are some helper functions for testing these parses:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
let testrange range = let result = run (p_numrange .>> eof) range match result with | Success r -> printrange r | Failure (msg, err) -> printf "%s\n" msg () let testip ip = let result = run (p_ip .>> eof) ip match result with | Success ip -> printip ip; printfn "" | Failure (msg, err) -> printf "%s\n" msg () let testiprange range = let result = run (p_iprange .>> eof) range match result with | Success r -> printrange r | Failure (msg, err) -> printf "%s\n" msg () |
You probably noticed the construction “(p_numrange .>> eof)”. That reads: parse a range, and then the end of the file and return the range. That helps us making sure nothing follows the range, like “[1-100] more here”.
Executing
1 2 3 |
testrange "(1-100)" testrange "[1-100]" testrange "[1]" |
yields
1 2 3 4 5 6 7 8 9 10 |
Error in Ln: 1 Col: 1 (1-100) ^ Expecting: '[' Range: 1-100 Error in Ln: 1 Col: 3 [1] ^ Expecting: '-' |
Executing
1 2 3 |
testip "192.168" testip "192.168.0.1" testip "192.168.0.1 " |
yields
1 2 3 4 5 6 7 8 9 10 |
Error in Ln: 1 Col: 8 192.168 ^ Expecting: '.' 192.168.0.1 Error in Ln: 1 Col: 12 192.168.0.1 ^ Expecting: end of file |
and executing
1 2 |
testiprange "[10.5.0.1-10.5.255.255]" testiprange "[10-10.5.255.255]" |
yields
1 2 3 4 5 |
Range: 10.5.0.1-10.5.255.255 Error in Ln: 1 Col: 4 [10-10.5.255.255] ^ Expecting: '.' |
But why defining two range parser? Isn’t possible to have just a single one and specify which parser to use for the values? Yes, it is possible, and that would look like this:
1 2 3 4 5 6 7 8 9 10 |
let p_range p = parse { do! skipChar '[' let! a = p do! skipChar '-' let! b = p do! skipChar ']' return {Min = a; Max = b;} } |
Having that, we can re-define the p_numrange and p_iprange like shown bellow:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
let p_ip = parse { let! c1 = pint64 do! skipChar '.' let! c2 = pint64 do! skipChar '.' let! c3 = pint64 do! skipChar '.' let! c4 = pint64 return IP {C1 = c1; C2 = c2; C3 = c3; C4 = c4;} } let p_int = parse { let! v = pint64 return Num v } let p_numrange = p_range p_int let p_iprange = p_range p_ip |
Everything above would work, except for testip function that needs change to:
1 2 3 4 5 6 7 8 |
let testip ip = let result = run (p_ip .>> eof) ip match result with | Success ip -> match ip with | IP v -> printip v; printfn "" | _ -> printfn "error" | Failure (msg, err) -> printf "%s\n" msg () |
That’s about it about parsing ranges with FParsec.
Nice article, Marius.
There’s a small issue with the error messages: the caret (^) isn’t displayed at the right position because the whitespace isn’t significant in HTML. You could fix this by putting pre-tags around the error messages.
Right. That was a good catch. I fixed that. Than you for pointing it out.