Generating a parser
parsy.generate()
generate converts a generator function (one that uses the yield keyword) into a parser. The generator function must yield parsers. These parsers are applied successively, and their results are sent back to the generator using the .send() protocol. The generator function should return the final result of the parsing. Alternatively, it can return another parser, which is equivalent to applying that parser and returning its result.
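To make the .send() protocol concrete, here is a toy model of the loop a generate-style decorator runs. It is a simplified sketch written for illustration, not parsy's actual source: the helper names literal and run are hypothetical, a "parser" here is just a function returning (value, new_index) or None on failure, and the case where the generator returns a parser is left out.

```python
# Toy model of driving a generator with send(); NOT parsy's implementation.
# A "parser" is any function (text, index) -> (value, new_index), or None on failure.
from typing import Callable, Generator, Optional, Tuple

ToyParser = Callable[[str, int], Optional[Tuple[object, int]]]


def literal(expected: str) -> ToyParser:
    """Hypothetical helper: match an exact string at the current index."""
    def parse(text: str, index: int) -> Optional[Tuple[object, int]]:
        if text.startswith(expected, index):
            return expected, index + len(expected)
        return None
    return parse


def run(gen_fn: Callable[[], Generator], text: str) -> object:
    gen = gen_fn()
    index = 0
    value = None
    try:
        while True:
            parser = gen.send(value)   # the first send(None) starts the generator
            outcome = parser(text, index)
            if outcome is None:
                return None            # a yielded parser failed; stop early
            value, index = outcome     # sent back into the generator on the next loop
    except StopIteration as stop:
        return stop.value              # the generator's return value is the result


def greeting():
    hello = yield literal("hello")
    yield literal(" ")
    name = yield literal("world")
    return (hello, name)


print(run(greeting, "hello world"))  # ('hello', 'world')
print(run(greeting, "hello there"))  # None
```

The first gen.send(None) starts the generator; every later send(value) becomes the value of the yield expression the generator is paused on.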
Motivation and examples
Constructing parsers by using combinators and Parser methods to make larger parsers works well for many simpler cases. However, for more complex cases the generate function decorator is both more readable and more powerful.
Alternative syntax to combinators
The first example shows a different way of building a parser that could easily have been built using combinators:
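```python
from parsy import generate, regex, string

# Minimal token parsers assumed for this example (not defined in the original):
lparen = string("(")
rparen = string(")")
expr = regex(r"\w+") << regex(r"\s*")  # a word, then any trailing whitespace

@generate("form")
def form():
    """
    Parse an s-expression form, like (a b c).
    An equivalent to lparen >> expr.many() << rparen
    """
    yield lparen
    exprs = yield expr.many()
    yield rparen
    return exprs
```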
In the example above, the parser was given the string name "form", which has the same effect as calling Parser.desc(). Naming is optional; the later examples omit it.
Note that there is no guarantee that the entire function is executed: if any of the yielded parsers fails, the function will not complete, and parsy will try to backtrack to an alternative parser if there is one.
Building complex objects
The second example shows how you can use multiple parse results to build up a complex object:
```python
from datetime import date

from parsy import generate, regex, string

@generate
def date_parser():
    """
    Parse a date in the format YYYY-MM-DD
    """
    year = yield regex("[0-9]{4}").map(int)
    yield string("-")
    month = yield regex("[0-9]{2}").map(int)
    yield string("-")
    day = yield regex("[0-9]{2}").map(int)
    # date is datetime.date; naming the parser date_parser avoids shadowing it.
    return date(year, month, day)
```
This could also have been achieved using seq() and Parser.combine().
Using values already parsed
The third example shows how an earlier parsed value can influence the subsequent parsing. This example parses Hollerith constants. Hollerith constants are a way of specifying an arbitrary set of characters by first writing the integer that specifies the length, followed by the character H, followed by the set of characters. For example, pancakes would be written 8Hpancakes.
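```python
from parsy import any_char, generate, regex, string

@generate
def hollerith():
    num = yield regex(r'[0-9]+').map(int)
    yield string('H')
    # Returning a parser: parsy applies it and uses its result,
    # here the next num characters joined into a single string.
    return any_char.times(num).concat()
```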
(You may want to compare this with an implementation of Hollerith constants that uses pyparsing, originally by John Shipman from his pyparsing docs.)
The tutorial contains more complex examples of using the generate decorator to create parsers whose logic is conditional upon earlier parsed values.
Implementing recursive definitions
A fourth example shows how you can use this syntax for grammars that you would like to define recursively (or mutually recursively).
Say we want to parse an s-expression-like syntax that uses parentheses to group items into a tree structure, like the following:
(0 1 (2 3) (4 5 6) 7 8)
A naive approach would be:
```python
from parsy import regex, string

simple = regex('[0-9]+').map(int)
group = string('(') >> expr.sep_by(string(' ')) << string(')')  # NameError: expr
expr = simple | group
```
The problem is that the definition of group raises a NameError, because expr is not defined yet.
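The fix relies on an ordinary Python rule, shown here without parsy: an expression at module level is evaluated immediately, while a name used inside a function body is looked up only when the function is called. The name later_name is purely illustrative.

```python
# 1. Module-level code runs immediately, so a name assigned later is missing
#    (this is what breaks the naive grammar).
try:
    eager = later_name
except NameError as err:
    eager_error = str(err)


# 2. Inside a function body the same name is looked up only at call time,
#    by which point it exists. A generator wrapped by @generate runs only
#    when the parser is used, so it benefits from the same rule.
def lazy():
    return later_name


later_name = "defined later"
print(eager_error)  # the NameError message
print(lazy())       # defined later
```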
Using the @generate syntax introduces a level of laziness in resolving expr that allows things to work:
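```python
from parsy import generate, regex, string

simple = regex('[0-9]+').map(int)

@generate
def group():
    # expr is looked up only when group runs, by which time it is defined.
    return (yield string('(') >> expr.sep_by(string(' ')) << string(')'))

expr = simple | group
```
```
>>> expr.parse("(0 1 (2 3) (4 5 6) 7 8)")
[0, 1, [2, 3], [4, 5, 6], 7, 8]
```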