Overview
Parsy is an easy way to combine simple, small parsers into complex, larger parsers.
If it means anything to you, it’s a monadic parser combinator library for LL(infinity) grammars in the spirit of Parsec, Parsnip, and Parsimmon.
If that means nothing, rest assured that parsy is a very straightforward and Pythonic solution for parsing text that doesn’t require knowing anything about monads.
Parsy differentiates itself from other solutions with the following:
- it is not a parser generator, but a combinator-based parsing library.
- it has a very clean implementation, only a few hundred lines long, that borrows from the best of recent combinator libraries.
- it has free, good quality documentation, all in one place. (Please raise an issue on GitHub if you have any problems, or find the documentation lacking in any way.)
- it avoids mutability, and therefore a ton of related bugs.
- it has monadic binding with a nice syntax. In plain English:
  - we can easily handle cases where later parsing depends on the value of something parsed earlier, e.g. Hollerith constants.
  - it’s easy to build up complex result objects, rather than returning lists of lists etc., which then need to be further processed.
  - there is no need for things like pyparsing’s Forward class.
- it has a minimalist philosophy. It doesn’t include built-in helpers for any specific grammars or languages, but provides building blocks for making these.
Basic usage looks like this:
Example 1 - parsing a set of alternatives:
>>> from parsy import string
>>> parser = (string('Dr.') | string('Mr.') | string('Mrs.')).desc("title")
>>> parser.parse('Mrs.')
'Mrs.'
>>> parser.parse('Mr.')
'Mr.'
>>> parser.parse('Joe')
ParseError: expected title at 0:0
>>> parser.parse_partial('Dr. Who')
('Dr.', ' Who')
Example 2 - parsing a dd-mm-yy date:
>>> from parsy import string, regex
>>> from datetime import date
>>> ddmmyy = regex(r'[0-9]{2}').map(int).sep_by(string("-"), min=3, max=3).combine(
... lambda d, m, y: date(2000 + y, m, d))
>>> ddmmyy.parse('06-05-14')
datetime.date(2014, 5, 6)
To learn how to use parsy, you should continue with:
- the tutorial, especially if you are not familiar with this type of parser library.
- the parser generator decorator
- the builtin parser primitives
- the method and combinator reference
Other Python projects
- pyparsing. Also a combinator approach, but in general much less cleanly implemented and with rather scattered documentation, although it provides more builtin utilities for certain parsing tasks.
- PLY. A pure Python implementation of the classic lex/yacc parsing tools. It is well suited to large grammars that would be found in typical programming languages.
- funcparserlib. The most similar to parsy. It differs from parsy mainly in normally using a separate tokenization phase, lacking the convenience of the generate() method for creating parsers, and having documentation that relies on understanding Haskell type annotations.
- Lark. With Lark you write a grammar definition in a separate mini-language as a string, and have a parser generated for you, rather than writing the grammar in Python. It has the advantage of speed and being able to use different parsing algorithms.