Extensions
Formulaic was designed to be extensible from day one, and nearly all of its core functionality is implemented as "plugins"/"modules" that you can use as examples of how extensions could be written. This document provides a high-level overview of the components of Formulaic that can be extended.
An important consideration is that while Formulaic offers extensible APIs, and effort will be made not to break extension APIs without reason (and never in patch releases), the safest place for your extensions is in Formulaic itself, where they can be kept up to date and maintained (assuming the extension is not overly bespoke). If you think your extensions might help others, feel free to reach out via the issue tracker and/or open a pull request.
Transforms
Transforms are likely the most commonly extended feature of Formulaic, and also likely the least valuable to upstream (since transforms are often domain specific). Implementing transforms is described in detail in the Transforms user guide.
Materializers
Materializers are responsible for translating formulae into model matrices as documented in the How it works user guide. You need to implement a new materializer if you want to add support for new input and/or output types.
Implementing a new materializer is as simple as subclassing the abstract class formulaic.materializers.FormulaMaterializer (or one of its subclasses). This base class defines the API expected by the rest of the Formulaic system. Example implementations include pandas and pyarrow.
During subclassing, the new class is registered according to the various REGISTER_* attributes if REGISTER_NAME is specified. This registration allows the materializer to be looked up by name through the model_matrix() and .get_model_matrix() functions. You can always pass in your materializer class explicitly, without relying on this registration.
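The register-on-subclass pattern described above can be sketched in plain Python. This is a simplified illustration only, not Formulaic's implementation: SketchMaterializer, REGISTRY, for_name, DictMaterializer and UnregisteredMaterializer are hypothetical names invented for this example, and Formulaic's own mechanism and lookup methods may differ. Only the REGISTER_NAME attribute name is taken from the text above.

```python
from typing import Dict, Optional, Type


class SketchMaterializer:
    """Toy stand-in for a materializer base class (hypothetical, not Formulaic's)."""

    # Subclasses set this to opt in to registration by name.
    REGISTER_NAME: Optional[str] = None

    # Shared name -> class registry (hypothetical).
    REGISTRY: Dict[str, Type["SketchMaterializer"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register only classes that declare REGISTER_NAME in their own body,
        # so an inherited name never silently replaces the parent's entry.
        name = cls.__dict__.get("REGISTER_NAME")
        if name is not None:
            SketchMaterializer.REGISTRY[name] = cls

    @classmethod
    def for_name(cls, name: str) -> Type["SketchMaterializer"]:
        """Look up a registered subclass by name (hypothetical helper)."""
        if name not in cls.REGISTRY:
            raise LookupError(f"No materializer registered as {name!r}")
        return cls.REGISTRY[name]


class DictMaterializer(SketchMaterializer):
    REGISTER_NAME = "dict"  # registered as soon as the class body is executed


class UnregisteredMaterializer(SketchMaterializer):
    pass  # no REGISTER_NAME, so it must be passed around explicitly
```

In this sketch, defining the subclass is enough to make it discoverable by name, while a class without REGISTER_NAME stays usable only when handed over directly, which mirrors the two options described above.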
Parsers
Parsers translate a formula string to a set of terms and factors that are then evaluated and assembled into the model matrix, as documented in the How it works user guide. Writing a custom parser is unlikely to be necessary very often, but it can be used to add formula operators or to change the behavior of existing ones.
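To make "terms and factors" concrete, here is a deliberately tiny sketch of the idea. It is not Formulaic's parser and does not use its API: toy_parse and Term are hypothetical names, and the sketch assumes a grammar with only the ~, + and : operators, with no implicit intercept, parentheses, or other operators.

```python
from typing import FrozenSet, List, Tuple

# In this sketch a term is an unordered set of factor names.
Term = FrozenSet[str]


def toy_parse(formula: str) -> Tuple[List[Term], List[Term]]:
    """Split 'lhs ~ rhs' into (lhs_terms, rhs_terms); supports only '+' and ':'."""
    lhs, sep, rhs = formula.partition("~")
    if not sep:
        # No '~' present: treat the whole string as the right-hand side.
        lhs, rhs = "", lhs

    def parse_side(side: str) -> List[Term]:
        terms: List[Term] = []
        for chunk in side.split("+"):
            # ':' joins factors into a single interaction term.
            factors = frozenset(f.strip() for f in chunk.split(":") if f.strip())
            # Skip empty chunks and drop duplicate terms, keeping first-seen order.
            if factors and factors not in terms:
                terms.append(factors)
        return terms

    return parse_side(lhs), parse_side(rhs)
```

For example, toy_parse("y ~ a + b + a:b") yields one left-hand term containing the factor y, and three right-hand terms: a, b, and the interaction of a and b. A custom parser would change how strings map onto such terms, for instance by recognising an additional operator.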
Formula parsers are expected to implement the API of formulaic.parser.types.FormulaParser. The default implementation can be seen here.
You can pass in custom parsers to Formula() via the parser and nested_parser options (see inline documentation for more details).
If you are considering extending the parser, please do reach out via the issue tracker.