Introduction

Formulaic is a high-performance implementation of Wilkinson formulas for Python. Wilkinson formulas are a compact notation for transforming dataframes into the model matrices that modelling frameworks ingest, especially for linear regression.

It provides:

  • high-performance dataframe to model-matrix conversions.
  • support for reusing the encoding choices made during conversion of one dataset on other datasets.
  • extensible formula parsing.
  • extensible data input/output plugins, with implementations for:
    • input:
      • pandas.DataFrame
      • pyarrow.Table
    • output:
      • pandas.DataFrame
      • numpy.ndarray
      • scipy.sparse.csc_matrix
  • support for symbolic differentiation of formulas (and hence model matrices).

with more to come!