Overview

Parsy is an easy way to combine simple, small parsers into complex, larger parsers.

If it means anything to you, it’s a monadic parser combinator library for LL(infinity) grammars in the spirit of Parsec, Parsnip, and Parsimmon.

If that means nothing, rest assured that parsy is a very straightforward and Pythonic solution for parsing text that doesn’t require knowing anything about monads.

Parsy differentiates itself from other solutions with the following:

  • it is not a parser generator, but a combinator-based parsing library.

  • it has a very clean implementation of only a few hundred lines that borrows from the best of recent combinator libraries.

  • it produces fairly terse code with an embedded-DSL feel, not too far from things like EBNF notation or Haskell’s Parsec.

  • it has free, good-quality documentation, all in one place. (Please raise an issue on GitHub if you have any problems or find the documentation lacking in any way.)

  • it avoids mutability, and therefore a ton of related bugs.

  • it has monadic binding with a nice syntax. In plain English:

    • we can easily handle cases where later parsing depends on the value of something parsed earlier, e.g. Hollerith constants, where a leading count determines how many characters follow.

    • it’s easy to build up complex result objects, rather than returning lists of lists, etc., which then need further processing.

  • it has a minimalist philosophy. It doesn’t include built-in helpers for any specific grammars or languages, but provides building blocks for making these.

Basic usage looks like this:

Example 1 - Parsing a set of alternatives:

>>> from parsy import string
>>> title = (string('Dr.') | string('Mr.') | string('Mrs.')).desc("title")
>>> title.parse('Mrs.')
'Mrs.'
>>> title.parse('Mr.')
'Mr.'

>>> title.parse('Joe')
ParseError: expected title at 0:0

>>> title.parse_partial('Dr. Who')
('Dr.', ' Who')

Example 2 - Parsing a dd-mm-yy date:

>>> from parsy import string, regex
>>> from datetime import date
>>> ddmmyy = regex(r'[0-9]{2}').map(int).sep_by(string("-"), min=3, max=3).combine(
...                lambda d, m, y: date(2000 + y, m, d))
>>> ddmmyy.parse('06-05-14')
datetime.date(2014, 5, 6)

To learn how to use parsy, continue with the rest of this documentation.

Other Python projects

This library isn’t for everyone or for every project. It excels at letting you quickly write easy-to-read parsers for relatively small languages, and it’s great if you are a relative newcomer to the subject of parsing but want something better than str.split. If you have demanding needs in terms of performance or error-message quality, you may need to look elsewhere. Below are some other Python libraries you might consider:

  • PLY. A pure Python implementation of the classic lex/yacc parsing tools. It is well suited to large grammars that would be found in typical programming languages.

  • Lark. With Lark you write a grammar definition in a separate mini-language as a string, and have a parser generated for you, rather than writing the grammar in Python. It has the advantage of speed and being able to use different parsing algorithms.

  • pyparsing. Also a combinator approach, but in general much less cleanly implemented and with rather scattered documentation, although it has more built-in functionality in terms of provided utilities for certain parsing tasks.

  • funcparserlib. The most similar to parsy. It differs from parsy mainly in normally using a separate tokenization phase and in lacking the convenience of the generate decorator for creating parsers.