Peg Package¶
A Parsing Expression Grammar (PEG) library for Pony. It provides two ways to
define parsers: writing a .peg grammar file that is compiled at runtime, or
building parsers directly in Pony code using combinators.
PEG File Mode¶
Write a grammar in a .peg file, then compile it with PegCompiler:
use "peg"
use "files"

actor Main
  new create(env: Env) =>
    try
      let auth = FileAuth(env.root)
      let source = Source(FilePath(auth, "my_grammar.peg"))?
      match recover val PegCompiler(source) end
      | let parser: Parser val =>
        let input = Source.from_string("some input text")
        match recover val parser.parse(input) end
        | (_, let ast: AST) => env.out.print(recover val Printer(ast) end)
        | (let offset: USize, let r: Parser val) =>
          let e = recover val SyntaxError(input, offset, r) end
          env.out.writev(PegFormatError.console(e))
        end
      | let errors: Array[PegError] val =>
        for e in errors.values() do
          env.out.writev(PegFormatError.console(e))
        end
      end
    end
Use PEG file mode when grammars are user-supplied, loaded from disk, or when you want to iterate on the grammar without recompiling Pony code.
Combinator Mode¶
Build parsers directly in Pony using L (literal), R (unicode range),
Unicode, and operators:
use "peg"

actor Main
  new create(env: Env) =>
    // Build the parser inside one recover block so the finished parser is val.
    let parser =
      recover val
        let digit = R('0', '9')
        let number = digit.many1().term(TNumber)
        let op = (L("+") / L("-") / L("*") / L("/")).term(TOp)
        let expr = (number * op * number).node(TExpr)
        let whitespace = (L(" ") / L("\t")).many1()
        expr.hide(whitespace)
      end
    let source = Source.from_string("42 + 7")
    match recover val parser.parse(source) end
    | (_, let ast: AST) => env.out.print(recover val Printer(ast) end)
    | (let offset: USize, let r: Parser val) =>
      let e = recover val SyntaxError(source, offset, r) end
      env.out.writev(PegFormatError.console(e))
    end

primitive TNumber is Label fun text(): String => "Number"
primitive TOp is Label fun text(): String => "Op"
primitive TExpr is Label fun text(): String => "Expr"
Use combinator mode when the grammar is fixed at compile time and you want full type safety and IDE support.
PEG File Grammar Reference¶
Rules are defined as name <- expression and separated by whitespace.
Comments use //, #, or /* ... */ (nested comments are supported).
Operators¶
| Syntax | Name | Description |
|---|---|---|
| "text" | String literal | Matches the exact string |
| 'c' | Character literal | Matches a single character |
| . | Any | Matches any character (codepoint >= space) |
| 'a'..'z' | Range | Matches a character in the codepoint range |
| e1 e2 | Sequence | Matches e1 followed by e2 |
| e1 / e2 | Ordered choice | Tries e1 first, then e2 if e1 fails |
| e? | Option | Matches e or succeeds with nothing |
| e* | Zero or more | Matches e repeatedly (zero or more times) |
| e+ | One or more | Matches e repeatedly (at least once) |
| !e | Not predicate | Succeeds (consuming nothing) if e fails |
| &e | And predicate | Succeeds (consuming nothing) if e matches |
| -e | Skip (extension) | Matches e but omits it from the parse tree |
| e % sep | Separated list (extension) | Zero or more e separated by sep |
| e %+ sep | Separated list (extension) | One or more e separated by sep |
The -, %, and %+ operators are extensions beyond standard PEG.
Reserved Rule Names¶
- start: required entry point; parsing begins here
- hidden: defines the whitespace/comment channel; tokens matching this rule are automatically skipped between other tokens
Naming Convention¶
- Uppercase rule names (e.g. NUMBER, STRING) produce terminal tokens: the matched text becomes a single Token leaf node
- Lowercase rule names (e.g. value, pair) produce AST nodes with children
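To tie the operators, reserved names, and naming convention together, here is a minimal sketch that compiles a small grammar from a string instead of a file. The grammar itself (a bracketed, comma-separated list of numbers) is hypothetical and written only for illustration. It uses one operator per rule so that it does not rely on grouping or operator precedence, neither of which is described above. It has not been compiled against a specific release of the package.

```pony
use "peg"

actor Main
  new create(env: Env) =>
    // Hypothetical grammar, embedded as a string for illustration only.
    let grammar =
      """
      // Uppercase names produce Token leaves.
      DIGIT <- '0'..'9'
      NUMBER <- DIGIT+

      // Lowercase names produce AST nodes with children.
      items <- NUMBER % ','
      list <- -'[' items -']'

      // Reserved names: the entry point and the skipped channel.
      start <- list
      hidden <- ' '+
      """

    match recover val PegCompiler(Source.from_string(grammar)) end
    | let parser: Parser val =>
      let input = Source.from_string("[1, 2, 3]")
      match recover val parser.parse(input) end
      | (_, let ast: AST) => env.out.print(recover val Printer(ast) end)
      | (let offset: USize, let r: Parser val) =>
        let e = recover val SyntaxError(input, offset, r) end
        env.out.writev(PegFormatError.console(e))
      end
    | let errors: Array[PegError] val =>
      for e in errors.values() do
        env.out.writev(PegFormatError.console(e))
      end
    end
```

The brackets are prefixed with - so they are matched but left out of the tree, and the spaces in the input are consumed by the hidden rule rather than by list or items.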
Combinator API¶
The combinators mirror the PEG file operators:
| PEG file | Pony combinator |
|---|---|
| "text" / 'c' | L("text") |
| . | R(' ') (Unicode matches all codepoints) |
| 'a'..'z' | R('a', 'z') |
| e1 e2 | e1 * e2 |
| e1 / e2 | e1 / e2 |
| e? | e.opt() |
| e* | e.many() |
| e+ | e.many1() |
| !e | not e |
| &e | not not e |
| -e | -e |
| e % sep | e.many(sep) |
| e %+ sep | e.many1(sep) |
| Terminal rule | e.term(MyLabel) |
| Non-terminal rule | e.node(MyLabel) (on Sequence or Many) |
| Hidden channel | e.hide(hidden_parser) |
| End of file | e.eof() |
| Recursive rule | Forward + update() |
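As a rough illustration of how several rows of this table combine, the sketch below builds a parser for a bracketed, comma-separated list of numbers. The labels TInt, TItems, and TList are hypothetical names introduced for this example, and the order in which eof() and hide() are chained is an assumption rather than something the table prescribes.

```pony
use "peg"

actor Main
  new create(env: Env) =>
    // Build everything in one recover block so the result is a Parser val.
    let parser =
      recover val
        // '0'..'9'+ wrapped as a terminal rule
        let number = R('0', '9').many1().term(TInt)
        // number % ',' labeled as a non-terminal (node() on Many)
        let items = number.many(L(",")).node(TItems)
        // -'[' items -']' labeled as a non-terminal (node() on Sequence)
        let list = (-L("[") * items * -L("]")).node(TList)
        // ' '+ used as the hidden channel
        let whitespace = L(" ").many1()
        list.eof().hide(whitespace)
      end

    let source = Source.from_string("[1, 2, 3]")
    match recover val parser.parse(source) end
    | (_, let ast: AST) => env.out.print(recover val Printer(ast) end)
    | (let offset: USize, let r: Parser val) =>
      let e = recover val SyntaxError(source, offset, r) end
      env.out.writev(PegFormatError.console(e))
    end

// Hypothetical labels for this example.
primitive TInt is Label fun text(): String => "Int"
primitive TItems is Label fun text(): String => "Items"
primitive TList is Label fun text(): String => "List"
```

Constructing the combinators inside the recover block keeps the mutable intermediate parsers local, so only the finished parser escapes as a val.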
Parse Results¶
Calling Parser.parse() returns a ParseResult, which is
(USize, (ParseOK | Parser)):
- Success: the second element is one of:
  - AST: a labeled tree node with children
  - Token: a leaf node with matched source text
  - NotPresent: returned by Option when the optional parse is absent
  - Skipped: returned by Skip (and Not on success)
- Failure: the second element is the Parser that failed, and the USize is the byte offset where it failed. Wrap in SyntaxError and format with PegFormatError.console() for display.
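The result variants listed above can be handled in a single match. The following is a minimal sketch, not a definitive pattern: the report helper and the TWord label are hypothetical, the explicit val annotations are a conservative assumption about the capabilities of the recovered result, and the example only prints a fixed message for the non-AST cases because no accessor methods on Token are assumed.

```pony
use "peg"

actor Main
  new create(env: Env) =>
    // A single terminal rule: one or more lowercase letters.
    let word = recover val R('a', 'z').many1().term(TWord) end
    report(env, word, "hello")
    report(env, word, "123")

  // Hypothetical helper: parses `text` and prints which kind of result came back.
  fun report(env: Env, parser: Parser val, text: String) =>
    let source = Source.from_string(text)
    match recover val parser.parse(source) end
    | (_, let ast: AST val) =>
      env.out.print(recover val Printer(ast) end)
    | (_, let token: Token val) =>
      env.out.print("'" + text + "': matched a single Token")
    | (_, NotPresent) =>
      env.out.print("'" + text + "': optional parse was absent")
    | (_, Skipped) =>
      env.out.print("'" + text + "': matched but skipped")
    | (let offset: USize, let r: Parser val) =>
      // Failure: r is the parser that failed at byte offset `offset`.
      let e = recover val SyntaxError(source, offset, r) end
      env.out.writev(PegFormatError.console(e))
    end

// Hypothetical label for this example.
primitive TWord is Label fun text(): String => "Word"
```

Since word is a terminal rule, the first call is expected to reach the Token arm and the second the failure arm; the NotPresent and Skipped arms are included only to show the full shape of ParseOK.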
Forward References¶
Use Forward to create mutually recursive grammars. Create the Forward
first, use it in other rules, then assign the real rule with update():
let expr = Forward
let group = -L("(") * expr * -L(")")
expr() = group / some_other_rule // () is sugar for update()
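For a complete picture, here is a small sketch of a recursive grammar for a number wrapped in any depth of parentheses. The labels TInt and TGroup are hypothetical names used only in this example, and the code has not been compiled against a specific release of the package.

```pony
use "peg"

actor Main
  new create(env: Env) =>
    let parser =
      recover val
        // Declare the recursive rule first so other rules can refer to it.
        let expr = Forward
        let number = R('0', '9').many1().term(TInt)
        let group = (-L("(") * expr * -L(")")).node(TGroup)
        // Now give expr its real definition; () is sugar for update().
        expr() = group / number
        expr.eof()
      end

    let source = Source.from_string("((42))")
    match recover val parser.parse(source) end
    | (_, let ast: AST) => env.out.print(recover val Printer(ast) end)
    | (let offset: USize, let r: Parser val) =>
      let e = recover val SyntaxError(source, offset, r) end
      env.out.writev(PegFormatError.console(e))
    end

// Hypothetical labels for this example.
primitive TInt is Label fun text(): String => "Int"
primitive TGroup is Label fun text(): String => "Group"
```

The Forward is created and updated inside the same recover block, so the cycle between expr and group is closed before the parser is frozen as a val.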
Built-in Parsers¶
- JsonParser: a JSON parser built from combinators, with comment support
- PegParser: the parser for .peg grammar files (used by PegCompiler)
See the examples/ directory for a CLI tool that demonstrates both.
Public Types¶
- class AST
- type ASTChild
- class Choice
- type Defs
- class DuplicateDefinition
- class EndOfFile
- class Forward
- class Hidden
- primitive JsonParser
- type L
- trait Label
- primitive Lex
- class Literal
- primitive MalformedAST
- class Many
- type Marker
- class MissingDefinition
- primitive NoLabel
- primitive NoParser
- primitive NoStartDefinition
- class Not
- primitive NotPresent
- class Option
- type ParseOK
- type ParseResult
- trait Parser
- primitive PegAnd
- primitive PegAny
- primitive PegChar
- primitive PegChoice
- primitive PegCompiler
- primitive PegDef
- trait PegError
- primitive PegFormatError
- primitive PegGrammar
- primitive PegIdent
- class PegLabel
- primitive PegMany
- primitive PegMany1
- primitive PegNot
- primitive PegOpt
- primitive PegParser
- primitive PegRange
- primitive PegSep
- primitive PegSep1
- primitive PegSeq
- primitive PegSkip
- primitive PegString
- primitive Position
- primitive Printer
- type R
- class Sequence
- class Skip
- primitive Skipped
- class Source
- class SyntaxError
- primitive TArray
- primitive TBool
- primitive TNull
- primitive TNumber
- primitive TObject
- primitive TPair
- primitive TString
- class Terminal
- class Token
- primitive Unicode
- class UnicodeRange
- class UnknownNodeLabel