
Structure

Dependencies

Sapsan is a Python-based framework. Its dependencies can be associated with the project's logical modules. The core module has no particular dependencies, as all of its classes are implemented in native Python. The lib modules rely heavily on PyTorch with a Catalyst wrapper, as well as on scikit-learn for regression-based ML models. The CLI module depends on the click library to implement command-line interfaces.
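Because each module pulls in different third-party packages, it can help to check which ones are importable before running a given part of the framework. The sketch below assumes the grouping described in the paragraph above; the `DEPENDENCIES` mapping, the extra `tracking` group for MLflow, and the `check_dependencies` helper are hypothetical illustrations, not part of Sapsan. It only uses `importlib.util.find_spec`, so nothing is actually imported.

```python
import importlib.util

# Hypothetical grouping that mirrors the prose above; it is NOT read from
# Sapsan's own packaging metadata. "sklearn" is scikit-learn's import name.
DEPENDENCIES = {
    "core": [],  # native Python only
    "lib": ["torch", "catalyst", "sklearn"],
    "cli": ["click"],
    "tracking": ["mlflow"],  # MLflow integration, grouped here for convenience
}


def check_dependencies(groups=DEPENDENCIES):
    """Return {module: {package: importable?}} without importing any package."""
    report = {}
    for module, packages in groups.items():
        # find_spec returns None when a top-level package cannot be found.
        report[module] = {
            pkg: importlib.util.find_spec(pkg) is not None for pkg in packages
        }
    return report


if __name__ == "__main__":
    for module, status in check_dependencies().items():
        missing = [pkg for pkg, ok in status.items() if not ok]
        print(f"{module}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```

Checking with `find_spec` rather than `import` keeps the check cheap and avoids side effects from loading large packages such as PyTorch.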

Sapsan is integrated with MLflow to provide easy, automatic tracking during the experiment stage and to save the trained model. This gives direct access to run history and performance, which in turn lets the user analyze and further tweak their model.

Structure & flexibility

To make the project flexible and scalable, a number of abstraction classes were introduced. The core abstractions include:

| Core Abstraction | Description |
|---|---|
| Experiment | main abstraction that encapsulates the execution of algorithms, experiments, tests, etc. |
| Algorithm | base class that all models extend |
| BaseAlgorithm | base class for all algorithms that do not need to be trained and have only a run method |
| Estimator | an algorithm that has train and predict methods, such as a regression model or classifier |
| Dataset | high-level wrapper over dataset loaders |
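The relationships among the core abstractions can be sketched as a small class hierarchy. This is a minimal illustration assuming plain `abc` base classes; method names follow the descriptions in the table (`run`, `train`, `predict`), while `Dataset.load`, `Experiment.run`, and the concrete classes `ListDataset`, `Doubler`, and `RunExperiment` are hypothetical and not Sapsan's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Algorithm(ABC):
    """Root of the hierarchy: every model extends this class."""


class BaseAlgorithm(Algorithm):
    """An algorithm that needs no training and exposes only run()."""

    @abstractmethod
    def run(self, inputs: Any) -> Any: ...


class Estimator(Algorithm):
    """An algorithm with train() and predict(), e.g. a regressor or classifier."""

    @abstractmethod
    def train(self, inputs: Any, targets: Any) -> "Estimator": ...

    @abstractmethod
    def predict(self, inputs: Any) -> Any: ...


class Dataset(ABC):
    """High-level wrapper over a concrete data loader (load() is assumed)."""

    @abstractmethod
    def load(self) -> Any: ...


class Experiment(ABC):
    """Encapsulates the execution of an algorithm, experiment, or test."""

    @abstractmethod
    def run(self) -> Dict[str, Any]: ...


# --- Hypothetical concrete classes, for illustration only ---

class ListDataset(Dataset):
    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def load(self) -> List[float]:
        return self.values


class Doubler(BaseAlgorithm):
    """Untrained algorithm: doubles every input value."""

    def run(self, inputs: List[float]) -> List[float]:
        return [2 * x for x in inputs]


class RunExperiment(Experiment):
    """Loads the dataset and runs an untrained algorithm on it."""

    def __init__(self, algorithm: BaseAlgorithm, dataset: Dataset) -> None:
        self.algorithm = algorithm
        self.dataset = dataset

    def run(self) -> Dict[str, Any]:
        data = self.dataset.load()
        return {"output": self.algorithm.run(data)}


result = RunExperiment(Doubler(), ListDataset([1, 2, 3])).run()
print(result)  # {'output': [2, 4, 6]}
```

Keeping `Experiment` separate from the algorithm means the same experiment logic can drive any `BaseAlgorithm` or `Estimator`, which is the flexibility the abstractions are meant to provide.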

Sapsan also has utility abstractions responsible for all things tracking:

| Utility Abstraction | Description |
|---|---|
| Metric | a single instance of a metric emitted during the experiment run |
| Parameter | a parameter used in the experiment |
| Artifact | artifacts for an algorithm (model weights, images, etc.) |
| TrackingBackend | adapter for tracking systems to save metrics, parameters, and artifacts |
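The tracking abstractions follow the adapter pattern: experiments emit plain `Metric`, `Parameter`, and `Artifact` values, and a `TrackingBackend` decides where they go. The sketch below is a hypothetical illustration; the field names, the `log_*` method names, and the `InMemoryBackend` class are assumptions, not Sapsan's actual signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Metric:
    """A single metric value emitted during an experiment run."""
    name: str
    value: float
    step: int = 0


@dataclass(frozen=True)
class Parameter:
    """A parameter used in the experiment (e.g. a learning rate)."""
    name: str
    value: Any


@dataclass(frozen=True)
class Artifact:
    """A file produced by an algorithm: model weights, images, etc."""
    path: str


class TrackingBackend(ABC):
    """Adapter that forwards tracking data to some tracking system."""

    @abstractmethod
    def log_metric(self, metric: Metric) -> None: ...

    @abstractmethod
    def log_parameter(self, parameter: Parameter) -> None: ...

    @abstractmethod
    def log_artifact(self, artifact: Artifact) -> None: ...


class InMemoryBackend(TrackingBackend):
    """Hypothetical toy backend that keeps everything in memory (handy in tests)."""

    def __init__(self) -> None:
        self.metrics: List[Metric] = []
        self.parameters: Dict[str, Any] = {}
        self.artifacts: List[Artifact] = []

    def log_metric(self, metric: Metric) -> None:
        self.metrics.append(metric)

    def log_parameter(self, parameter: Parameter) -> None:
        # Later values for the same parameter name overwrite earlier ones.
        self.parameters[parameter.name] = parameter.value

    def log_artifact(self, artifact: Artifact) -> None:
        self.artifacts.append(artifact)


backend = InMemoryBackend()
backend.log_parameter(Parameter("lr", 0.01))
backend.log_metric(Metric("loss", 0.5, step=1))
backend.log_artifact(Artifact("model.pt"))
```

An MLflow-backed adapter could implement the same three methods by forwarding to `mlflow.log_metric`, `mlflow.log_param`, and `mlflow.log_artifact`; code that only talks to `TrackingBackend` would not need to change.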

The project is built around these abstractions to make it easier to reason about. To extend the project with new models or algorithms, the user inherits from Estimator (or BaseAlgorithm) and implements the required methods.
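As an illustration of that extension point, the sketch below adds a new model by subclassing an `Estimator` and implementing `train` and `predict`. To keep it self-contained, it defines minimal stand-ins for `Algorithm` and `Estimator` (assumptions, not Sapsan's real classes), and the `SimpleLinearRegression` model, a one-variable ordinary least-squares fit, is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Algorithm(ABC):
    """Stand-in for Sapsan's Algorithm base class (illustration only)."""


class Estimator(Algorithm):
    """Stand-in for Sapsan's Estimator: subclasses must provide train() and predict()."""

    @abstractmethod
    def train(self, inputs: Sequence[float], targets: Sequence[float]) -> "Estimator": ...

    @abstractmethod
    def predict(self, inputs: Sequence[float]) -> List[float]: ...


class SimpleLinearRegression(Estimator):
    """Hypothetical new model: fits y = slope * x + intercept by least squares."""

    def __init__(self) -> None:
        self.slope = 0.0
        self.intercept = 0.0

    def train(self, inputs, targets):
        n = len(inputs)
        if n == 0 or n != len(targets):
            raise ValueError("inputs and targets must be non-empty and equal length")
        mean_x = sum(inputs) / n
        mean_y = sum(targets) / n
        var_x = sum((x - mean_x) ** 2 for x in inputs)
        if var_x == 0:
            raise ValueError("inputs must not all be equal")
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x
        return self  # returning self allows chaining train(...).predict(...)

    def predict(self, inputs):
        return [self.slope * x + self.intercept for x in inputs]


model = SimpleLinearRegression().train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(model.predict([4.0]))  # [9.0]
```

Because `train` and `predict` are abstract, a subclass that forgets one of them raises `TypeError` on instantiation, so missing methods surface early rather than mid-experiment.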