Structure
Dependencies
Sapsan is a Python-based framework. Its dependencies map onto the logical modules of the project. The core module has no particular dependencies, as all of its classes are implemented in native Python. The lib modules rely heavily on PyTorch with a Catalyst wrapper, as well as on scikit-learn for regression-based ML models. The CLI module depends on the click library to implement its command line interface.
Sapsan is integrated with MLflow to provide easy, automatic tracking during the experiment stage and to save the trained model. This gives direct access to run history and performance, which in turn gives the user the ability to analyze and further tweak their model.
Structure & flexibility
To keep the project flexible and scalable, a number of abstraction classes were introduced. The core abstractions are:
Core Abstraction | Description |
---|---|
Experiment | main abstraction which encapsulates execution of algorithms, experiments, tests, etc. |
Algorithm | base class from which all models are derived |
BaseAlgorithm | base class for all algorithms that do not need to be trained and have only a run method |
Estimator | an algorithm that has train and predict methods, such as a regression model or a classifier |
Dataset | high-level wrapper over dataset loaders |
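The relationships in the table above can be sketched as a small class hierarchy. This is only an illustration under assumptions: the class names come from the table, but the method signatures (`run(data)`, `train(inputs, targets)`, `predict(inputs)`, `load()`) are hypothetical and may not match Sapsan's actual code.

```python
from abc import ABC, abstractmethod


class Algorithm(ABC):
    """Common root that every model extends (per the table above)."""


class BaseAlgorithm(Algorithm):
    """An algorithm that needs no training and exposes only run()."""

    @abstractmethod
    def run(self, data):  # signature is an assumption
        ...


class Estimator(Algorithm):
    """A trainable algorithm exposing train() and predict()."""

    @abstractmethod
    def train(self, inputs, targets):  # signature is an assumption
        ...

    @abstractmethod
    def predict(self, inputs):  # signature is an assumption
        ...


class Dataset(ABC):
    """High-level wrapper over a dataset loader."""

    @abstractmethod
    def load(self):  # hypothetical method name
        ...


class Experiment(ABC):
    """Encapsulates the execution of an algorithm, experiment, or test."""

    @abstractmethod
    def run(self):  # hypothetical method name
        ...
```

Because the methods are declared abstract, Python refuses to instantiate a subclass that has not implemented all of them, which catches incomplete extensions early.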
Next, Sapsan has utility abstractions responsible for everything related to tracking:
Utility Abstraction | Description |
---|---|
Metric | a single metric value emitted during the experiment run |
Parameter | a parameter used in the experiment |
Artifact | artifacts for an algorithm (model weights, images, etc.) |
TrackingBackend | adapter for tracking systems to save metrics, parameters, and artifacts |
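As a rough sketch of how these tracking abstractions could fit together, the example below models them with dataclasses and an adapter interface. The field names, the `log_*` method names, and the `InMemoryBackend` class are all hypothetical and used only for illustration; Sapsan's real backend targets MLflow and may be shaped differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    value: float


@dataclass(frozen=True)
class Parameter:
    name: str
    value: object


@dataclass(frozen=True)
class Artifact:
    name: str
    path: str  # location of model weights, an image, etc.


class TrackingBackend(ABC):
    """Adapter interface; the method names here are assumptions."""

    @abstractmethod
    def log_metric(self, metric: Metric) -> None: ...

    @abstractmethod
    def log_parameter(self, parameter: Parameter) -> None: ...

    @abstractmethod
    def log_artifact(self, artifact: Artifact) -> None: ...


class InMemoryBackend(TrackingBackend):
    """Hypothetical backend that keeps everything in lists."""

    def __init__(self) -> None:
        self.metrics = []
        self.parameters = []
        self.artifacts = []

    def log_metric(self, metric: Metric) -> None:
        self.metrics.append(metric)

    def log_parameter(self, parameter: Parameter) -> None:
        self.parameters.append(parameter)

    def log_artifact(self, artifact: Artifact) -> None:
        self.artifacts.append(artifact)


backend = InMemoryBackend()
backend.log_parameter(Parameter("n_epochs", 10))
backend.log_metric(Metric("loss", 0.25))
backend.log_artifact(Artifact("weights", "model.pt"))
```

Keeping the backend behind an adapter means the experiment code only calls the `log_*` methods, so the tracking system can be swapped without touching the model.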
The project is built around these abstractions to make it easier to reason about. To extend the project with new models or algorithms, the user inherits from Estimator (or BaseAlgorithm) and implements the required methods.
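The extension pattern described above might look like the following minimal sketch. The `Estimator` class here is a stand-in defined locally so the example runs on its own; it is not imported from Sapsan, and its `train`/`predict` signatures are assumptions. `MeanEstimator` is a hypothetical toy model.

```python
from abc import ABC, abstractmethod
from statistics import fmean


class Estimator(ABC):
    """Local stand-in for Sapsan's Estimator; real signatures may differ."""

    @abstractmethod
    def train(self, inputs, targets):
        ...

    @abstractmethod
    def predict(self, inputs):
        ...


class MeanEstimator(Estimator):
    """Toy model that always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None

    def train(self, inputs, targets):
        # The inputs are ignored; only the target average is stored.
        self.mean_ = fmean(targets)
        return self

    def predict(self, inputs):
        if self.mean_ is None:
            raise RuntimeError("call train() before predict()")
        # One prediction per input sample.
        return [self.mean_ for _ in inputs]


model = MeanEstimator().train([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
```

An algorithm that needs no training would follow the same pattern with BaseAlgorithm, implementing only its run method.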