API Reference

Glossary

Variable	Definition
N	# of Batches
C_in	# of input channels (i.e. features)
D or D_b	Data or Batch depth (z)
H or H_b	Data or Batch height (y)
W or W_b	Data or Batch width (x)

Train/Evaluate

Train

CLASS

sapsan.lib.experiments.train.Train(model: Estimator, data_parameters: dict, backend = FakeBackend(), show_log = True, run_name = 'train')

Call Train to set up your run

Parameters

Name	Type	Description	Default
`model`	object	model to use for training
`data_parameters`	dict	data parameters from the data loader, necessary for tracking
`backend`	object	backend to track the experiment	FakeBackend()
`show_log`	bool	show the loss vs. epoch progress plot (it will be saved in MLflow in either case)	True
`run_name`	str	'run name' tag as recorded under MLflow	train

sapsan.lib.experiments.train.Train.run()

Run the model

Return

Type	Description
pytorch or sklearn or custom type	trained model

Evaluate

CLASS

sapsan.lib.experiments.evaluate.Evaluate(model: Estimator, data_parameters: dict, backend = FakeBackend(), cmap: str = 'plasma', run_name: str = 'evaluate', **kwargs)

Call Evaluate to set up the testing of the trained model. Don't forget to update estimator.loaders with the new data for testing.

Parameters

Name	Type	Description	Default
`model`	object	model to use for testing
`data_parameters`	dict	data parameters from the data loader, necessary for tracking
`backend`	object	backend to track the experiment	FakeBackend()
`cmap`	str	matplotlib colormap to use for slice plots	plasma
`run_name`	str	'run name' tag as recorded under MLflow	evaluate
`pdf_xlim`	tuple	x-axis limits for the PDF plot
`pdf_ylim`	tuple	y-axis limits for the PDF plot

sapsan.lib.experiments.evaluate.Evaluate.run()

Run the evaluation of the trained model

Return

Type	Description
dict{'target' : np.ndarray, 'predict' : np.ndarray}	target and predicted data

Estimators

CNN3d

CLASS

sapsan.lib.estimator.CNN3d(loaders, config, model)

A model based on Pytorch's 3D Convolutional Neural Network

Parameters

Name	Type	Description	Default
`loaders`	dict	contains input and target data (loaders['train'], loaders['valid']). Datasets themselves have to be torch.tensor(s)
`config`	class	configuration to use for the model	CNN3dConfig()
`model`	class	the model itself - should not be adjusted	CNN3dModel()

sapsan.lib.estimator.CNN3d.save(path: str)

Saves model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively

sapsan.lib.estimator.CNN3d.load(path: str, estimator, load_saved_config = False)

Loads model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup, changing `n_epochs` to keep training the model further.
`load_saved_config`	bool	updates config parameters from `{path}/params.json`.	False

Return

Type	Description
pytorch model	loaded model

CLASS

sapsan.lib.estimator.CNN3dConfig(n_epochs, patience, min_delta, logdir, lr, min_lr, *args, **kwargs)

Configuration for the CNN3d - based on pytorch and catalyst libraries

Parameters

Name	Type	Description	Default
`n_epochs`	int	number of epochs	1
`patience`	int	number of epochs with no improvement after which training will be stopped	10
`min_delta`	float	minimum change in the monitored metric to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement	1e-5
`log_dir`	str	path to store the logs	./logs/
`lr`	float	learning rate	1e-3
`min_lr`	float	a lower bound of the learning rate for ReduceLROnPlateau	lr*1e-2
`device`	str	specify the device to run the model on	cuda (or switch to cpu)
`loader_key`	str	the loader to use for early stop: train or valid	first loader provided*, which is usually 'train'
`metric_key`	str	the metric to use for early stop	'loss'
`ddp`	bool	turn on Distributed Data Parallel (DDP) to distribute the data and train the model across multiple GPUs. This is passed to Catalyst to activate the `ddp` flag in `runner` (see the Distributed Training Tutorial for more; the `runner` is set up in pytorch_estimator.py). Note: Jupyter notebooks are not supported - prepare a script!	False
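
The interplay of `patience` and `min_delta` can be illustrated with a minimal sketch in plain Python. This is not the Catalyst callback that CNN3d relies on; the helper name `should_stop` and the loss history are hypothetical, and the sketch assumes a metric that is minimized, such as 'loss'.

```python
def should_stop(losses, patience=10, min_delta=1e-5):
    """Return the epoch index at which training would stop, or None.

    Assumption: the monitored metric is minimized (e.g. 'loss').
    """
    best = float("inf")
    stale_epochs = 0  # epochs since the last qualifying improvement
    for epoch, loss in enumerate(losses):
        if best - loss >= min_delta:
            # Improvement of at least min_delta: remember it, reset the counter
            best = loss
            stale_epochs = 0
        else:
            # A change smaller than min_delta counts as no improvement
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return None


# Hypothetical loss history: improves for three epochs, then stalls
history = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]
stop_epoch = should_stop(history, patience=2, min_delta=1e-5)
```

With `patience=2`, the run above stops at the second stalled epoch; a history that keeps improving never triggers the stop.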

PIMLTurb

CLASS

sapsan.lib.estimator.PIMLTurb(activ, loss, loaders, ks_stop, ks_frac, ks_scale, l1_scale, l1_beta, sigma, config, model)

Physics-informed machine learning model to predict Reynolds-like stress tensor, \(Re\), for turbulence modeling. Learn more on the wiki: PIMLTurb

A custom loss function was developed for this model combining spatial (SmoothL1) and statistical (Kolmogorov-Smirnov) losses.

Parameters

Name	Type	Description	Default
`activ`	str	activation function to use from PyTorch	Tanhshrink
`loss`	str	loss function to use; accepts only custom	SmoothL1_KSLoss
`loaders`	dict	contains input and target data (loaders['train'], loaders['valid']). Datasets themselves have to be torch.tensor(s)
`ks_stop`	float	early-stopping condition based on the KS loss value alone	0.1
`ks_frac`	float	fraction the KS loss contributes to the total loss	0.5
`ks_scale`	float	scale factor to prioritize KS loss over SmoothL1 (should not be altered)	1
`l1_scale`	float	scale factor to prioritize SmoothL1 loss over KS	1
`l1_beta`	float	\(\beta\) threshold for smoothing the L1 loss	1
`sigma`	float	\(\sigma\) for the last layer of the network that performs a filtering operation using a Gaussian kernel	1
`config`	class	configuration to use for the model	PIMLTurbConfig()
`model`	class	the model itself - should not be adjusted	PIMLTurbModel()
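
To make the roles of `ks_frac`, `ks_scale`, `l1_scale` and `l1_beta` more concrete, here is a rough sketch in plain Python. The exact weighting inside SmoothL1_KSLoss is not spelled out in this reference, so the weighted sum in `combined_loss` is an assumption, and all helper names are hypothetical. The SmoothL1 term uses the standard piecewise form (quadratic below \(\beta\), linear above), and the KS term is the usual two-sample statistic.

```python
from bisect import bisect_right


def smooth_l1(pred, target, beta=1.0):
    """Mean SmoothL1 loss: quadratic for |error| < beta, linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def combined_loss(pred, target, ks_frac=0.5, ks_scale=1.0,
                  l1_scale=1.0, l1_beta=1.0):
    # ASSUMPTION: a simple weighted sum; the actual loss may combine terms differently
    l1 = l1_scale * smooth_l1(pred, target, beta=l1_beta)
    ks = ks_scale * ks_statistic(pred, target)
    return ks_frac * ks + (1.0 - ks_frac) * l1


# Hypothetical flattened prediction and target values
pred = [0.0, 1.0, 2.0, 3.0]
target = [0.5, 1.5, 2.5, 6.0]
loss = combined_loss(pred, target)
```

The spatial term penalizes point-by-point errors, while the statistical term only compares the two value distributions, which is why a mix of both is used.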

sapsan.lib.estimator.PIMLTurb.save(path: str)

Saves model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively

sapsan.lib.estimator.PIMLTurb.load(path: str, estimator, load_saved_config = False)

Loads model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup, changing `n_epochs` to keep training the model further.
`load_saved_config`	bool	updates config parameters from `{path}/params.json`.	False

Return

Type	Description
pytorch model	loaded model

CLASS

sapsan.lib.estimator.PIMLTurbConfig(n_epochs, patience, min_delta, logdir, lr, min_lr, *args, **kwargs)

Configuration for the PIMLTurb - based on pytorch (catalyst is not used)

Parameters

Name	Type	Description	Default
`n_epochs`	int	number of epochs	1
`patience`	int	number of epochs with no improvement after which training will be stopped (not used)	10
`min_delta`	float	minimum change in the monitored metric to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement (not used)	1e-5
`log_dir`	str	path to store the logs	./logs/
`lr`	float	learning rate	1e-3
`min_lr`	float	a lower bound of the learning rate for ReduceLROnPlateau	lr*1e-2
`device`	str	specify the device to run the model on	cuda (or switch to cpu)

PIMLTurb1D

CLASS

sapsan.lib.estimator.PIMLTurb1D(activ, loss, loaders, ks_stop, ks_frac, ks_scale, l1_scale, l1_beta, sigma, config, model)

Physics-informed machine learning model to predict Reynolds-like stress tensor, \(Re\), for turbulence modeling. Learn more on the wiki: PIMLTurb

A custom loss function was developed for this model combining spatial (SmoothL1) and statistical (Kolmogorov-Smirnov) losses.

Parameters

Name	Type	Description	Default
`activ`	str	activation function to use from PyTorch	Tanhshrink
`loss`	str	loss function to use; accepts only custom	SmoothL1_KSLoss
`loaders`	dict	contains input and target data (loaders['train'], loaders['valid']). Datasets themselves have to be torch.tensor(s)
`ks_stop`	float	early-stopping condition based on the KS loss value alone	0.1
`ks_frac`	float	fraction the KS loss contributes to the total loss	0.5
`ks_scale`	float	scale factor to prioritize KS loss over SmoothL1 (should not be altered)	1
`l1_scale`	float	scale factor to prioritize SmoothL1 loss over KS	1
`l1_beta`	float	\(\beta\) threshold for smoothing the L1 loss	1
`sigma`	float	\(\sigma\) for the last layer of the network that performs a filtering operation using a Gaussian kernel	1
`config`	class	configuration to use for the model	PIMLTurb1DConfig()
`model`	class	the model itself - should not be adjusted	PIMLTurb1DModel()

sapsan.lib.estimator.PIMLTurb1D.save(path: str)

Saves model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively

sapsan.lib.estimator.PIMLTurb1D.load(path: str, estimator, load_saved_config = False)

Loads model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup, changing `n_epochs` to keep training the model further.
`load_saved_config`	bool	updates config parameters from `{path}/params.json`.	False

Return

Type	Description
pytorch model	loaded model

CLASS

sapsan.lib.estimator.PIMLTurb1DConfig(n_epochs, patience, min_delta, logdir, lr, min_lr, *args, **kwargs)

Configuration for the PIMLTurb1D - based on pytorch (catalyst is not used)

Parameters

Name	Type	Description	Default
`n_epochs`	int	number of epochs	1
`patience`	int	number of epochs with no improvement after which training will be stopped (not used)	10
`min_delta`	float	minimum change in the monitored metric to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement (not used)	1e-5
`log_dir`	str	path to store the logs	./logs/
`lr`	float	learning rate	1e-3
`min_lr`	float	a lower bound of the learning rate for ReduceLROnPlateau	lr*1e-2
`device`	str	specify the device to run the model on	cuda (or switch to cpu)

PICAE

CLASS

sapsan.lib.estimator.PICAE(loaders, config, model)

Convolutional Auto Encoder with Divergence-Free Kernel and with periodic padding. Further details can be found on the PICAE page

Parameters

Name	Type	Description	Default
`loaders`	dict	contains input and target data (loaders['train'], loaders['valid']). Datasets themselves have to be torch.tensor(s)
`config`	class	configuration to use for the model	PICAEConfig()
`model`	class	the model itself - should not be adjusted	PICAEModel()

sapsan.lib.estimator.PICAE.save(path: str)

Saves model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively

sapsan.lib.estimator.PICAE.load(path: str, estimator, load_saved_config = False)

Loads model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup, changing `n_epochs` to keep training the model further.
`load_saved_config`	bool	updates config parameters from `{path}/params.json`	False

Return

Type	Description
pytorch model	loaded model

CLASS

sapsan.lib.estimator.PICAEConfig(n_epochs, patience, min_delta, logdir, lr, min_lr, weight_decay, nfilters, kernel_size, enc_nlayers, dec_nlayers, *args, **kwargs)

Configuration for the PICAE - based on pytorch and catalyst libraries

Parameters

Name	Type	Description	Default
`n_epochs`	int	number of epochs	1
`batch_dim`	int	dimension of a batch in each axis	64
`patience`	int	number of epochs with no improvement after which training will be stopped	10
`min_delta`	float	minimum change in the monitored metric to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement	1e-5
`log_dir`	str	path to store the logs	./logs/
`lr`	float	learning rate	1e-3
`min_lr`	float	a lower bound of the learning rate for ReduceLROnPlateau	lr*1e-2
`weight_decay`	float	weight decay (L2 penalty)	1e-5
`nfilters`	int	the output dim for each convolutional layer, which is the number of "filters" learned by that layer	6
`kernel_size`	tuple	size of the convolutional kernel	(3,3,3)
`enc_layers`	int	number of encoding layers	3
`dec_layers`	int	number of decoding layers	3
`device`	str	specify the device to run the model on	cuda (or switch to cpu)
`loader_key`	str	the loader to use for early stop: train or valid	first loader provided*, which is usually 'train'
`metric_key`	str	the metric to use for early stop	'loss'
`ddp`	bool	turn on Distributed Data Parallel (DDP) to distribute the data and train the model across multiple GPUs. This is passed to Catalyst to activate the `ddp` flag in `runner` (see the Distributed Training Tutorial for more; the `runner` is set up in pytorch_estimator.py). Note: Jupyter notebooks are not supported - prepare a script!	False

KRR

CLASS

sapsan.lib.estimator.KRR(loaders, config, model)

A model based on sk-learn Kernel Ridge Regression

Parameters

Name	Type	Description	Default
`loaders`	list	contains input and target data
`config`	class	configuration to use for the model	KRRConfig()
`model`	class	the model itself - should not be adjusted	KRRModel()

sapsan.lib.estimator.KRR.save(path: str)

Saves the model

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively

sapsan.lib.estimator.KRR.load(path: str, estimator, load_saved_config = False)

Loads the model

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup, changing `n_epochs` to keep training the model further.
`load_saved_config`	bool	updates config parameters from `{path}/params.json`	False

Return

Type	Description
sklearn model	loaded model

CLASS

sapsan.lib.estimator.KRRConfig(alpha, gamma)

Configuration for the KRR model

Parameters

Name	Type	Description	Default
`alpha`	float	regularization term, hyperparameter	None
`gamma`	float	full-width at half-max for the RBF kernel, hyperparameter	None

load_estimator

CLASS

sapsan.lib.estimator.load_estimator()

Dummy estimator to call load() to load the saved pytorch models

sapsan.lib.estimator.load_estimator.load(path: str, estimator, load_saved_config = False)

Loads model and optimizer states, as well as final epoch and loss

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup, changing `n_epochs` to keep training the model further
`load_saved_config`	bool	updates config parameters from `{path}/params.json`	False

Return

Type	Description
pytorch model	loaded model

load_sklearn_estimator

CLASS

sapsan.lib.estimator.load_sklearn_estimator()

Dummy estimator to call load() to load the saved sklearn models

sapsan.lib.estimator.load_sklearn_estimator.load(path: str, estimator, load_saved_config = False)

Loads model

Parameters

Name	Type	Description	Default
`path`	str	save path of the model and its config parameters, `{path}/model.pt` and `{path}/params.json` respectively
`estimator`	estimator	need to provide an initialized model for which to load the weights. The estimator can include a new config setup to keep training the model further
`load_saved_config`	bool	updates config parameters from `{path}/params.json`	False

Return

Type	Description
sklearn model	loaded model

Torch Modules

Gaussian

CLASS

sapsan.lib.estimator.torch_modules.Gaussian(sigma: int)

[1D, 3D] Applies a Gaussian filter as a torch layer through a series of 3 separable 1D convolutions, utilizing torch.nn.functional.conv3d. CUDA is supported.

Parameters

Name	Type	Description	Default
`sigma`	int	standard deviation \(\sigma\) for a Gaussian kernel	2

sapsan.lib.estimator.torch_modules.Gaussian.forward(tensor: torch.tensor)

Parameters

Name	Type	Description	Default
`tensor`	torch.tensor	input torch tensor of shape [N, C_in, D_in, H_in, W_in]

Return

Type	Description
torch.tensor	filtered 3D torch data
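
The separability that this layer relies on can be checked in plain Python: a multi-dimensional Gaussian kernel equals the outer product of 1D kernels, which is why a sequence of 1D convolutions reproduces the full filter. The sketch below is independent of the torch implementation; the function name and the kernel radius of 3 sigma are assumptions made for illustration, and only the 2D case is shown to keep it short.

```python
import math


def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1D Gaussian kernel; radius defaults to 3*sigma (assumption)."""
    if radius is None:
        radius = int(3 * sigma)
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


sigma = 2
k1d = gaussian_kernel_1d(sigma)
radius = len(k1d) // 2

# Outer product of the 1D kernel with itself
k2d = [[a * b for b in k1d] for a in k1d]

# 2D Gaussian evaluated directly on the same grid, normalized to unit sum
direct = [[math.exp(-0.5 * (i * i + j * j) / sigma ** 2)
           for j in range(-radius, radius + 1)]
          for i in range(-radius, radius + 1)]
direct_sum = sum(map(sum, direct))
direct = [[v / direct_sum for v in row] for row in direct]

# Largest element-wise difference between the two kernels
max_diff = max(abs(k2d[i][j] - direct[i][j])
               for i in range(len(k1d)) for j in range(len(k1d)))
```

The two kernels agree to floating-point precision, so filtering along each axis in turn is equivalent to one full convolution at a much lower cost.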

Interp1d

CLASS

sapsan.lib.estimator.torch_modules.Interp1d()

Linear 1D interpolation done in native PyTorch. CUDA is supported. Forked from @aliutkus

sapsan.lib.estimator.torch_modules.Interp1d.forward(x: torch.tensor, y: torch.tensor, xnew: torch.tensor, out: torch.tensor)

Parameters

Name	Type	Description	Default
`x`	torch.tensor	1D or 2D tensor
`y`	torch.tensor	1D or 2D tensor; the length of `y` along its last dimension must be the same as that of `x`
`xnew`	torch.tensor	1D or 2D tensor of real values. `xnew` can only be 1D if both `x` and `y` are 1D. Otherwise, its length along the first dimension must be the same as that of whichever of `x` and `y` is 2D.
`out`	torch.tensor	Tensor for the output	If None, allocated automatically

Return

Type	Description
torch.tensor	interpolated tensor
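
For reference, the arithmetic behind linear 1D interpolation can be sketched in plain Python for the simplest case, where `x`, `y` and `xnew` are all 1D. This mirrors the math only, not the batched torch implementation; the function below is a hypothetical stand-in, and it assumes `x` is sorted in ascending order. Its handling of points outside the range of `x` (extrapolating from the nearest segment) is an assumption as well.

```python
from bisect import bisect_right


def interp1d(x, y, xnew):
    """Linearly interpolate the points (x, y) at the positions xnew.

    Assumptions: x is sorted ascending with at least two distinct values;
    positions outside [x[0], x[-1]] are extrapolated from the nearest segment.
    """
    out = []
    for xn in xnew:
        # Index i of the segment [x[i], x[i+1]] used for xn
        i = bisect_right(x, xn) - 1
        i = min(max(i, 0), len(x) - 2)
        slope = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
        out.append(y[i] + slope * (xn - x[i]))
    return out


x = [0.0, 1.0, 2.0]
y = [0.0, 10.0, 0.0]
ynew = interp1d(x, y, [0.5, 1.0, 1.75])
```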

Data Loaders

HDF5Dataset

CLASS

sapsan.lib.data.hdf5_dataset.HDF5Dataset( path: str, features: List[str], target: List[str], checkpoints: List[int], batch_size: int = None, input_size: int = None, sampler: Optional[Sampling] = None, time_granularity: float = 1, features_label: Optional[List[str]] = None, target_label: Optional[List[str]] = None, flat: bool = False, shuffle: bool=False, train_fraction = None)

HDF5 data loader class

Parameters

Name	Type	Description	Default
`path`	str	path to the data in the following format: `"data/t_{checkpoint:1.0f}/{feature}_data.h5"`
`features`	List[str]	list of train features to load	['not_specified_data']
`target`	List[str]	list of target features to load	None
`checkpoints`	List[int]	list of checkpoints to load (they will be appended as batches)
`input_size`	int	dimension of the loaded data in each axis
`batch_size`	int	dimension of a batch in each axis. If batch_size != input_size, the datacube will be evenly split	input_size (doesn't work with sampler)
`batch_num`	int	the number of batches to be loaded at a time	1
`sampler`	object	data sampler to use (ex: EquidistantSampling())	None
`time_granularity`	float	what is the time separation (dt) between checkpoints	1
`features_label`	List[str]	hdf5 data label for the train features	list(file.keys())[-1], i.e. last one in hdf5 file
`target_label`	List[str]	hdf5 data label for the target features	list(file.keys())[-1], i.e. last one in hdf5 file
`flat`	bool	flatten the data into [C_in, DHW]. Required for sk-learn models	False
`shuffle`	bool	shuffle the dataset	False
`train_fraction`	float or int	a fraction of the dataset to be used for training (accessed through loaders['train']). The rest will be used for validation (accessed through loaders['valid']). If int is provided, then that number of batches will be used for training. If float is provided, then it will try to split the data either by batch or by actually slicing the data cube into smaller chunks	None - training data will be used for validation, effectively skipping the latter
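
The three cases of `train_fraction` (None, int, float) can be pictured with a small sketch in plain Python. It only covers splitting by batch; the alternative of slicing the data cube into smaller chunks is not shown. The function name is hypothetical, and rounding a float fraction to the nearest whole batch is an assumption.

```python
def split_batches(batches, train_fraction=None):
    """Split a list of batches into 'train' and 'valid' parts."""
    if train_fraction is None:
        # No split requested: the training data doubles as validation data
        return {"train": batches, "valid": batches}
    if isinstance(train_fraction, int):
        # An int is read as the number of training batches
        n_train = train_fraction
    else:
        # A float is read as the fraction of batches used for training
        n_train = int(round(train_fraction * len(batches)))
    return {"train": batches[:n_train], "valid": batches[n_train:]}


batches = list(range(10))  # stand-ins for 10 batches
by_fraction = split_batches(batches, train_fraction=0.8)
by_count = split_batches(batches, train_fraction=3)
no_split = split_batches(batches)
```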

sapsan.lib.data.hdf5_dataset.HDF5Dataset.load_numpy()

HDF5 data loader method - call it to load the data as a numpy array. If targets are not specified, then only features will be loaded (hence you can load just one dataset at a time).

Return

Type	Description
np.ndarray, np.ndarray	loaded a dataset as a numpy array

sapsan.lib.data.hdf5_dataset.HDF5Dataset.convert_to_torch([x, y])

Splits numpy arrays into batches and converts to torch dataloader

Parameters

Name	Type	Description	Default
`[x, y]`	list or np.ndarray	a list of input datasets to batch and convert to torch loaders

Return

Type	Description
OrderedDict{'train' : DataLoader, 'valid' : DataLoader }	Data in Torch Dataloader format ready for training

sapsan.lib.data.hdf5_dataset.HDF5Dataset.load()

Loads, splits into batches, and converts into torch dataloader. Effectively combines .load_numpy and .convert_to_torch

Return

Type	Description
np.ndarray, np.ndarray	loaded train and target features: x, y

get_loader_shape

sapsan.lib.data.data_functions.get_loader_shape()

Returns the shape of the loaded tensors - the loaded data that has been split into train and valid datasets.

Parameters

Name	Type	Description	Default
`loaders`	torch DataLoader	the loader of tensors passed for training
`name`	str	name of the dataset in the loaders; usually either `train` or `valid`	None - chooses the first entry in loaders

Return

Type	Description
np.ndarray	shape of the tensor

Data Manipulation

EquidistantSampling

CLASS

sapsan.lib.data.sampling.EquidistantSampling(target_dim)

Samples the data to a lower dimension, keeping separation between the data points equally distant

Parameters

Name	Type	Description	Default
`target_dim`	np.ndarray	new shape of the input in the form [D, H, W]

sapsan.lib.data.sampling.EquidistantSampling.sample(data)

Performs sampling of the data

Parameters

Name	Type	Description	Default
`data`	np.ndarray	input data to be sampled - has the shape of [axis, D, H, W]

Return

Type	Description
np.ndarray	Sampled data with the shape [axis, D, H, W]
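
As an illustration of equidistant sampling, the sketch below picks every k-th point along each axis of a small cube built from nested lists. It is a plain-Python approximation of the idea, not the library code: the helper name is hypothetical, a single feature is used instead of the leading `axis` dimension, and each axis length is assumed to be a multiple of the target length.

```python
def equidistant_indices(size, target):
    """Indices of `target` equally spaced points along an axis of length `size`."""
    step = size // target  # assumes size is a multiple of target
    return list(range(0, size, step))[:target]


size, target_dim = (4, 4, 4), (2, 2, 2)

# Each value encodes its own position: 100*d + 10*h + w
cube = [[[100 * d + 10 * h + w for w in range(4)]
         for h in range(4)]
        for d in range(4)]

# One index list per axis, in the order [D, H, W]
idx = [equidistant_indices(s, t) for s, t in zip(size, target_dim)]

sampled = [[[cube[d][h][w] for w in idx[2]]
            for h in idx[1]]
           for d in idx[0]]
```

Because every value encodes its position, it is easy to see which points survive: indices 0 and 2 along each axis.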

split_data_by_batch

sapsan.utils.shapes.split_data_by_batch(data: np.ndarray, size: int, batch_size: int, n_features: int, axis: int)

[2D, 3D]: splits data into smaller cubes or squares of batches

Parameters

Name	Type	Description
`data`	np.ndarray	input 2D or 3D data, [C_in, D, H, W]
`size`	int	dimensionality of the data in each axis
`batch_size`	int	dimensionality of the batch in each axis
`n_features`	int	number of channels of the input data
`axis`	int	number of axes, 2 or 3

Return

Type	Description
np.ndarray	batched data: [N, C_in, D_b, H_b, W_b]

combine_data

sapsan.utils.shapes.combine_data(data: np.ndarray, input_size: tuple, batch_size: tuple, axis: int)

[2D, 3D] - reverse of split_data_by_batch function

Parameters

Name	Type	Description
`data`	np.ndarray	input 2D or 3D data, [N, C_in, D_b, H_b, W_b]
`input_size`	tuple	dimensionality of the original data in each axis
`batch_size`	tuple	dimensionality of the batch in each axis
`axis`	int	number of axes, 2 or 3

Return

Type	Description
np.ndarray	reassembled data: [C_in, D, H, W]
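
The round trip between splitting and recombining can be sketched for the 2D, single-channel case with nested lists. The helpers below are hypothetical and do not reproduce the library's array layout; in particular, the row-major order of the tiles is an assumption.

```python
def split_2d(grid, batch_size):
    """Split a square 2D grid (list of rows) into batch_size x batch_size tiles."""
    size = len(grid)
    tiles = []
    for r in range(0, size, batch_size):
        for c in range(0, size, batch_size):
            tiles.append([row[c:c + batch_size]
                          for row in grid[r:r + batch_size]])
    return tiles


def combine_2d(tiles, size, batch_size):
    """Reassemble the tiles produced by split_2d into the original grid."""
    per_row = size // batch_size  # number of tiles along each axis
    grid = []
    for tile_row in range(per_row):
        row_tiles = tiles[tile_row * per_row:(tile_row + 1) * per_row]
        for r in range(batch_size):
            # Stitch row r of every tile in this band of tiles
            grid.append([v for tile in row_tiles for v in tile[r]])
    return grid


grid = [[4 * r + c for c in range(4)] for r in range(4)]
tiles = split_2d(grid, batch_size=2)
restored = combine_2d(tiles, size=4, batch_size=2)
```

Splitting a 4x4 grid with a batch size of 2 yields N = 4 tiles, and combining them returns the original grid unchanged.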

slice_of_cube

sapsan.utils.shapes.slice_of_cube(data: np.ndarray, feature: Optional[int] = None, n_slice: Optional[int] = None)

Select a slice of a cube (to plot later)

Parameters

Name	Type	Description	Default
`data`	np.ndarray	input 3D data, [C_in, D, H, W]
`feature`	int	feature to take the slice of, i.e. the value of C_in	1
`n_slice`	int	what slice to select, i.e. the value of D	1

Return

Type	Description
np.ndarray	data slice: [H, W]

Filter

spectral

sapsan.utils.filter.spectral(im: np.ndarray, fm: int)

[2D, 3D] apply a spectral filter

Parameters

Name	Type	Description	Default
`im`	np.ndarray	input dataset (ex: [C_in, D, H, W])
`fm`	int	number of Fourier modes to filter down to

Return

Type	Description
np.ndarray	filtered dataset

box

sapsan.utils.filter.box(im: np.ndarray, ksize)

[2D] apply a box filter

Parameters

Name	Type	Description	Default
`im`	np.ndarray	input dataset (ex: [C_in, H, W])
`ksize`	tuple	kernel size (ex: ksize = (2,2))

Return

Type	Description
np.ndarray	filtered dataset

gaussian

sapsan.utils.filter.gaussian(im: np.ndarray, sigma)

[2D, 3D] apply a gaussian filter

Note: the Gaussian filter assumes dx=1 between the points. Adjust sigma accordingly.

Parameters

Name	Type	Description	Default
`im`	np.ndarray	input dataset (ex: [H, W] or [D, H, W])
`sigma`	float or tuple of floats	standard deviation for Gaussian kernel. Sigma can be defined for each axis individually.

Return

Type	Description
np.ndarray	filtered dataset
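
One way to read the note about dx=1 is that a filter width given in physical units has to be converted to grid points before it is passed as `sigma`. The helper below is a hypothetical convenience, not part of the library, and the grid spacings are made-up values.

```python
def sigma_in_grid_units(sigma_physical, dx):
    """Convert a physical filter width to grid units (number of points).

    `dx` may be a single spacing or one spacing per axis.
    """
    if isinstance(dx, (int, float)):
        return sigma_physical / dx
    return tuple(sigma_physical / d for d in dx)


# Hypothetical grid: spacing of 0.1 along D and H, 0.25 along W
sigma = sigma_in_grid_units(0.5, dx=(0.1, 0.1, 0.25))
```

The resulting tuple holds one value per axis, matching the per-axis form of `sigma` described in the table above.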

Backend (Tracking)

MLflowBackend

CLASS

sapsan.lib.backends.mlflow.MLflowBackend(name, host, port)

Initializes MLflow and starts up the MLflow UI at a given host:port

Parameters

Name	Type	Description	Default
`name`	str	name under which to record the experiment	"experiment"
`host`	str	host of mlflow ui	"localhost"
`port`	int	port of mlflow ui	5000

sapsan.lib.backends.mlflow.MLflowBackend.start_ui()

Starts the MLflow UI at a specified host and port

sapsan.lib.backends.mlflow.MLflowBackend.start(run_name: str, nested = False, run_id = None)

Starts a tracking run

Parameters

Name	Type	Description	Default
`run_name`	str	name of the run	"train" for `Train()`, "evaluate" for `Evaluate()`
`nested`	bool	whether or not to nest the recorded run	False for `Train()`, True for `Evaluate()`
`run_id`	str	run id	None - a new one will be generated

Return

Type	Description
str	run_id

sapsan.lib.backends.mlflow.MLflowBackend.resume(run_id, nested = True)

Resumes a previous run, so you can record extra parameters

Parameters

Name	Type	Description	Default
`run_id`	str	id of the run to resume
`nested`	bool	whether or not to nest the recorded run	True, since it will usually be an `Evaluate()` run

sapsan.lib.backends.mlflow.MLflowBackend.log_metric()

Logs a metric

sapsan.lib.backends.mlflow.MLflowBackend.log_parameter()

Logs a parameter

sapsan.lib.backends.mlflow.MLflowBackend.log_artifact()

Logs an artifact (any saved file, e.g. .png, .txt)

sapsan.lib.backends.mlflow.MLflowBackend.log_model()

Log a PyTorch model as an MLflow artifact for the current run. Corresponds to mlflow.pytorch.log_model()

sapsan.lib.backends.mlflow.MLflowBackend.load_model()

Load a PyTorch model from a local file or a run. Corresponds to mlflow.pytorch.load_model()

sapsan.lib.backends.mlflow.MLflowBackend.close_active_run()

Closes all active MLflow runs

sapsan.lib.backends.mlflow.MLflowBackend.end()

Ends the most recent MLflow run

FakeBackend

CLASS

sapsan.lib.backends.fake.FakeBackend()

Pass to Train in order to disable the backend (tracking)

Plotting

plot_params

sapsan.utils.plot.plot_params()

Contains the matplotlib parameters that format all of the plots (font.size, axes.labelsize, etc.)

Return

Type	Description
dict	matplotlib parameters

Default Parameters

def plot_params():
    params = {'font.size': 14, 'legend.fontsize': 14, 
            'axes.labelsize': 20, 'axes.titlesize': 24,
            'xtick.labelsize': 17,'ytick.labelsize': 17,
            'axes.linewidth': 1, 'patch.linewidth': 3, 
            'lines.linewidth': 3,
            'xtick.major.width': 1.5,'ytick.major.width': 1.5,
            'xtick.minor.width': 1.25,'ytick.minor.width': 1.25,
            'xtick.major.size': 7,'ytick.major.size': 7,
            'xtick.minor.size': 4,'ytick.minor.size': 4,
            'xtick.direction': 'in','ytick.direction': 'in',              
            'axes.formatter.limits': [-7, 7],'axes.grid': True, 
            'grid.linestyle': ':','grid.color': '#999999',
            'text.usetex': False,}
    return params

pdf_plot

sapsan.utils.plot.pdf_plot(series: List[np.ndarray], bins: int = 100, label: Optional[List[str]] = None, figsize: tuple, dpi: int, ax: matplotlib.axes, style: str)

Plot a probability density function (PDF) of a single or multiple datasets

Parameters

Name	Type	Description	Default
`series`	List[np.ndarray]	input datasets
`bins`	int	number of bins to use for the dataset to generate the pdf	100
`label`	List[str]	list of names to use as labels in the legend	None
`figsize`	tuple	figure size as passed to matplotlib figure	(6,6)
`dpi`	int	resolution of the figure	60
`ax`	matplotlib.axes	axes object to use for plotting (if you want to define your own figure and subplots)	None - creates a separate figure
`style`	str	accepts matplotlib styles	'tableau-colorblind10'

Return

Type	Description
matplotlib.axes object	ax
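
How the PDF is computed is not detailed in this reference. A common approach, sketched here in plain Python as an assumption rather than a description of `pdf_plot` itself, is a histogram with `bins` equal-width bins normalized so that it integrates to one. The function name and the sample data are hypothetical.

```python
def histogram_density(data, bins=100):
    """Return bin centers and a normalized histogram (integrates to 1).

    Assumes the data contains at least two distinct values.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        # The maximum value is placed into the last bin
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    density = [c / (len(data) * width) for c in counts]
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, density


data = [float(v) for v in range(10)]  # hypothetical dataset
centers, density = histogram_density(data, bins=5)
bin_width = centers[1] - centers[0]
area = sum(density) * bin_width  # total area under the histogram
```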

cdf_plot

sapsan.utils.plot.cdf_plot(series: List[np.ndarray], bins: int = 100, label: Optional[List[str]] = None, figsize: tuple, dpi: int, ax: matplotlib.axes, ks: bool, style: str)

Plot a cumulative distribution function (CDF) of a single or multiple datasets

Parameters

Name	Type	Description	Default
`series`	List[np.ndarray]	input datasets
`bins`	int	number of bins to use for the dataset to generate the pdf	100
`label`	List[str]	list of names to use as labels in the legend	None
`figsize`	tuple	figure size as passed to matplotlib figure	(6,6)
`dpi`	int	resolution of the figure	60
`ax`	matplotlib.axes	axes object to use for plotting (if you want to define your own figure and subplots)	None - creates a separate figure
`ks`	bool	if True, prints the Kolmogorov-Smirnov statistic on the plot itself. It will also be returned along with the ax object	False
`style`	str	accepts matplotlib styles	'tableau-colorblind10'

Return

Type	Description
matplotlib.axes object, float (if ks==True)	ax, ks (if ks==True)

line_plot

sapsan.utils.plot.line_plot(series: List[np.ndarray], bins: int = 100, label: Optional[List[str]] = None, plot_type: str, figsize: tuple, dpi: int, ax: matplotlib.axes, style: str)

Plot linear data of x vs y - same matplotlib formatting will be used as the other plots

Parameters

Name	Type	Description	Default
`series`	List[np.ndarray]	input datasets
`bins`	int	number of bins to use for the dataset to generate the pdf	100
`label`	List[str]	list of names to use as labels in the legend	None
`plot_type`	str	axis type of the matplotlib plot; options = ['plot', 'semilogx', 'semilogy', 'loglog']	'plot'
`figsize`	tuple	figure size as passed to matplotlib figure	(6,6)
`linestyle`	List[str]	list of linestyles to use for each profile for the matplotlib figure	['-'] (solid line)
`dpi`	int	resolution of the figure	60
`ax`	matplotlib.axes	axes object to use for plotting (if you want to define your own figure and subplots)	None - creates a separate figure
`style`	str	accepts matplotlib styles	'tableau-colorblind10'

Return

Type	Description
matplotlib.axes object	ax

slice_plot

sapsan.utils.plot.slice_plot(series: List[np.ndarray], label: Optional[List[str]] = None, cmap = 'plasma', figsize: tuple, dpi: int, ax: matplotlib.axes)

Plot 2D spatial distributions (slices) of your target and prediction datasets. Colorbar limits for both slices are set based on the minimum and maximum of the second (target) provided dataset.

Parameters

Name	Type	Description	Default
`series`	List[np.ndarray]	input datasets
`label`	List[str]	list of names to use as labels in the legend	None
`cmap`	str	matplotlib colormap to use	'viridis'
`figsize`	tuple	figure size as passed to matplotlib figure	(6,6)
`dpi`	int	resolution of the figure	60
`ax`	matplotlib.axes	axes object to use for plotting (if you want to define your own figure and subplots) WARNING: only works if a single image is supplied to `slice_plot()`, otherwise will be ignored	None - creates a separate figure

Return

Type	Description
matplotlib.axes object	ax

log_plot

sapsan.utils.plot.log_plot(show_log = True, log_path = 'logs/logs/train.csv', valid_log_path = 'logs/logs/valid.csv', delimiter=',', train_name = 'train_loss', valid_name = 'valid_loss', train_column = 1, valid_column = 1, epoch_column = 0)

Plots an interactive training log of train_loss vs. epoch with plotly

Parameters

Name	Type	Description	Default
`show_log`	bool	show the loss vs. epoch progress plot (it will be saved in MLflow in either case)	True
`log_path`	str	path to training log produced by the catalyst wrapper	'logs/logs/train.csv'
`valid_log_path`	str	path to validation log produced by the catalyst wrapper	'logs/logs/valid.csv'
`delimiter`	str	delimiter to use for numpy.genfromtxt data loading	','
`train_name`	str	name for the training label	'train_loss'
`valid_name`	str	name for the validation label	'valid_loss'
`train_column`	int	column to load for training data from `log_path`	1
`valid_column`	int	column to load for validation data from `valid_log_path`	1
`epoch_column`	int	column to load the epoch index from `log_path`. If None, the epoch index is generated from the number of entries	0

Return

Type	Description
plotly.express object	plot figure
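
How the column parameters select data can be sketched with `numpy.genfromtxt`, which the `delimiter` description refers to. This is an illustrative sketch only: `load_loss` is a hypothetical helper, the single header row is an assumption about the log layout, and starting the generated epoch index at 0 is also an assumption.

```python
import io
import numpy as np

def load_loss(log, delimiter=",", loss_column=1, epoch_column=0):
    """Read epoch and loss columns from a delimited log (hypothetical helper)."""
    # skip_header=1 assumes the log starts with one header row
    data = np.genfromtxt(log, delimiter=delimiter, skip_header=1)
    data = np.atleast_2d(data)  # keep 2D even for a single-epoch log
    loss = data[:, loss_column]
    if epoch_column is None:
        # no epoch column: build the index from the number of entries
        epoch = np.arange(len(loss))
    else:
        epoch = data[:, epoch_column]
    return epoch, loss

# In-memory stand-in for a training log file
train_log = io.StringIO("epoch,loss\n0,0.9\n1,0.5\n2,0.3\n")
epoch, loss = load_loss(train_log)
```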

model_graph

sapsan.utils.plot.model_graph(model, shape: np.array, transforms)

Creates a graph of the ML model (needs graphviz to be installed). A tutorial is available on the wiki: Model Graph

The method is based on hiddenlayer originally written by Waleed Abdulla.

Parameters

Name	Type	Description	Default
`model`	object	initialized pytorch or tensorflow model
`shape`	np.array	shape of the input array in the form [N, C_in, D_b, H_b, W_b], where C_in=1
`transforms`	list[methods]	a list of hiddenlayer transforms to be applied (Fold, FoldId, Prune, PruneBranch, FoldDuplicates, Rename), defined in transforms.py	See below

Default Parameters

import sapsan.utils.hiddenlayer as hl
transforms = [
    hl.transforms.Fold("Conv > MaxPool > Relu", "ConvPoolRelu"),
    hl.transforms.Fold("Conv > MaxPool", "ConvPool"),
    hl.transforms.Prune("Shape"),
    hl.transforms.Prune("Constant"),
    hl.transforms.Prune("Gather"),
    hl.transforms.Prune("Unsqueeze"),
    hl.transforms.Prune("Concat"),
    hl.transforms.Rename("Cast", to="Input"),
    hl.transforms.FoldDuplicates()
]

Return

Type	Description
graphviz.Digraph object	SVG graph of a model

Physics

ReynoldsStress

sapsan.utils.physics.ReynoldsStress(u, filt, filt_size, only_x_components=False)

Calculates a stress tensor of the form

\[ \tau_{ij} = \widetilde{u_i u_j} - \tilde{u}_i\tilde{u}_j \]

where \(\tilde{u}\) is the filtered \(u\)

Parameters

Name	Type	Description	Default
`u`	np.ndarray	input velocity in 3D - [axis, D, H, W]
`filt`	sapsan.utils.filters	the type of filter to use (spectral, box, gaussian). Pass the filter itself by loading the appropriate one from `sapsan.utils.filters`	gaussian
`filt_size`	int or float	size of the filter to apply. The size is defined differently for each filter type: Spectral - Fourier mode to filter to, Box - k_size (box size), Gaussian - sigma	2 (sigma=2 for gaussian)
`only_x_components`	bool	calculates and outputs only the x components of the tensor in shape [row, D, H, W]; calculating all 9 components can be taxing on memory	False

Return

Type	Description
np.ndarray	stress tensor of shape [column, row, D, H, W]
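
The stress tensor formula can be written out directly with NumPy. The sketch below is an illustration under stated assumptions, not sapsan's code: `box_filter` is a hypothetical periodic box filter standing in for the filters in `sapsan.utils.filters`, and `reynolds_stress` is a hypothetical name.

```python
import numpy as np

def box_filter(field, k_size=3):
    """Periodic box filter: mean over a k_size-wide neighborhood per axis (k_size odd)."""
    half = k_size // 2
    out = field.astype(float)
    for axis in range(field.ndim):
        # average shifted copies along each axis (separable box filter)
        out = sum(np.roll(out, s, axis=axis) for s in range(-half, half + 1)) / k_size
    return out

def reynolds_stress(u, filt, filt_size):
    """tau[i, j] = filt(u_i * u_j) - filt(u_i) * filt(u_j) for u of shape [axis, D, H, W]."""
    n = u.shape[0]
    u_filt = np.stack([filt(u[i], filt_size) for i in range(n)])
    tau = np.empty((n, n) + u.shape[1:])
    for i in range(n):
        for j in range(n):
            tau[i, j] = filt(u[i] * u[j], filt_size) - u_filt[i] * u_filt[j]
    return tau

rng = np.random.default_rng(0)
u = rng.standard_normal((3, 8, 8, 8))
tau = reynolds_stress(u, box_filter, 3)  # shape (3, 3, 8, 8, 8)
```

The tensor is symmetric, so only 6 of the 9 components are independent. A uniform velocity field gives zero stress because filtering leaves it unchanged.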

PowerSpectrum

CLASS

sapsan.utils.physics.PowerSpectrum(u: np.ndarray)

Sets up to produce a power spectrum

Parameters

Name	Type	Description	Default
`u`	np.ndarray	input velocity (first dimension must be the axis=[1, 2, or 3], e.g. the shape for 3D velocity should be: [axis, D, H, W])

sapsan.utils.physics.PowerSpectrum.calculate()

Calculates the power spectrum

Return

Type	Description
np.ndarray, np.ndarray	k_bins (Fourier modes), Ek_bins (E(k))
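
For orientation, a shell-summed kinetic energy spectrum can be computed with `numpy.fft` as sketched below. This assumes a periodic cubic grid; the binning and normalization are illustrative choices and may differ from what `PowerSpectrum.calculate()` does. `power_spectrum` is a hypothetical name.

```python
import numpy as np

def power_spectrum(u):
    """Shell-summed kinetic energy spectrum of u with shape [axis, D, H, W].

    Assumes a periodic cubic grid; binning and normalization are illustrative.
    """
    n = u.shape[1]
    spatial_axes = tuple(range(1, u.ndim))
    # Fourier transform each velocity component, normalized by the number of points
    u_hat = np.fft.fftn(u, axes=spatial_axes) / u[0].size
    # energy per mode, summed over velocity components
    energy = 0.5 * np.sum(np.abs(u_hat) ** 2, axis=0)
    # integer wavenumber magnitude of every mode
    freqs = [np.fft.fftfreq(s, d=1.0 / s) for s in u.shape[1:]]
    k_mag = np.sqrt(sum(k ** 2 for k in np.meshgrid(*freqs, indexing="ij")))
    # sum the energy in shells of unit width centered on integer k
    shell = np.rint(k_mag).astype(int).ravel()
    Ek_bins = np.bincount(shell, weights=energy.ravel())
    k_bins = np.arange(len(Ek_bins))
    # drop the k = 0 mean flow and modes beyond the Nyquist wavenumber
    return k_bins[1 : n // 2 + 1], Ek_bins[1 : n // 2 + 1]

N = 16
x = np.arange(N) / N
u = np.zeros((3, N, N, N))
u[0] = np.cos(2 * np.pi * 3 * x)[None, None, :]  # single mode, k = 3 along W
k_bins, Ek_bins = power_spectrum(u)
```

With this normalization the spectrum of the single cosine mode sums to the mean kinetic energy of the field, which is a convenient sanity check.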

sapsan.utils.physics.PowerSpectrum.spectrum_plot(k_bins, Ek_bins, kolmogorov=True, kl_A)

Plots the calculated power spectrum

Parameters

Name	Type	Description	Default
`k_bins`	np.ndarray	Fourier mode values along x-axis
`Ek_bins`	np.ndarray	energy as a function of k: E(k)
`kolmogorov`	bool	plots scaled Kolmogorov's -5/3 spectrum alongside the calculated one	True
`kl_A`	float	scaling factor of Kolmogorov's law	np.amax(self.Ek_bins)*1e1

Return

Type	Description
matplotlib.axes object	spectrum plot
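
The scaled Kolmogorov reference line can be sketched as a -5/3 power law multiplied by `kl_A`. This is an assumption about the form of the curve, and the default scaling is read here as ten times the peak of E(k); `kolmogorov` is a hypothetical standalone function and the sample values are made up for illustration.

```python
import numpy as np

def kolmogorov(kl_A, k_bins):
    """Kolmogorov reference spectrum E(k) = kl_A * k^(-5/3) (assumed form)."""
    return kl_A * np.asarray(k_bins, dtype=float) ** (-5.0 / 3.0)

# Illustrative values, not real data
k_bins = np.array([1.0, 2.0, 4.0, 8.0])
Ek_bins = np.array([3.0, 1.0, 0.3, 0.1])

kl_A = np.amax(Ek_bins) * 1e1  # scaling built from the peak of E(k)
reference = kolmogorov(kl_A, k_bins)
```

On a log-log plot this reference is a straight line of slope -5/3, offset above the measured spectrum so the two are easy to compare.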

GradientModel

CLASS

sapsan.utils.physics.GradientModel(u: np.ndarray, filter_width, delta_u = 1)

Sets up to apply a gradient turbulence subgrid model:

\[ \tau_{ij} = \frac{1}{12} \Delta^2 \,\delta_k u^*_i \,\delta_k u^*_j \]

where \(\Delta\) is the filter width and \(u^*\) is the filtered \(u\)

Parameters

Name	Type	Description	Default
`u`	np.ndarray	input filtered quantity in 3D - [axis, D, H, W]
`filter_width`	float	width of the filter which was applied onto `u`
`delta_u`	float	distance between the points on the grid to use for scaling	1

sapsan.utils.physics.GradientModel.gradient()

Calculates the gradient of the input data passed to GradientModel

Return

Type	Description
np.ndarray	gradient with shape [column, row, D, H, W]
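
A gradient tensor of this kind can be built with `np.gradient`. The sketch below is illustrative: `velocity_gradient` is a hypothetical name, `delta_u` is treated as the grid spacing, and the index order `du[i, j] = d u_i / d x_j` (with `x_j` following the array axes D, H, W) is an assumption that may not match the [column, row] order used by `GradientModel.gradient()`.

```python
import numpy as np

def velocity_gradient(u, delta_u=1.0):
    """Return du with du[i, j] = d u_i / d x_j for u of shape [axis, D, H, W]."""
    # np.gradient returns one array per spatial axis for each component
    return np.stack([np.stack(np.gradient(u[i], delta_u)) for i in range(u.shape[0])])

# Linear test field: u_0 grows by 2 per grid point along the last axis (W)
d, h, w = np.meshgrid(np.arange(4.0), np.arange(4.0), np.arange(4.0), indexing="ij")
u = np.stack([2.0 * w, np.zeros_like(w), np.zeros_like(w)])

du = velocity_gradient(u, delta_u=1.0)  # shape (3, 3, 4, 4, 4)
```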

sapsan.utils.physics.GradientModel.model()

Calculates the gradient model tensor with shape [column, row, D, H, W]

Return

Type	Description
np.ndarray	gradient model tensor
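
The gradient model formula, with the implied sum over the repeated index k, can be evaluated with `np.einsum`. This is a sketch under assumptions rather than sapsan's implementation: `gradient_model` is a hypothetical name and `delta_u` is treated as the grid spacing passed to `np.gradient`.

```python
import numpy as np

def gradient_model(u, filter_width, delta_u=1.0):
    """tau[i, j] = (1/12) * filter_width^2 * sum_k (d_k u_i) * (d_k u_j)."""
    # du[i, k] = d u_i / d x_k, with x_k following the array axes
    du = np.stack([np.stack(np.gradient(u[i], delta_u)) for i in range(u.shape[0])])
    # contract over the derivative index k, keep the spatial axes
    return filter_width ** 2 / 12.0 * np.einsum("ik...,jk...->ij...", du, du)

# Linear test field: u_0 = 2 * w and u_1 = 3 * w along the last axis
d, h, w = np.meshgrid(np.arange(4.0), np.arange(4.0), np.arange(4.0), indexing="ij")
u = np.stack([2.0 * w, 3.0 * w, np.zeros_like(w)])

tau = gradient_model(u, filter_width=2.0)  # shape (3, 3, 4, 4, 4)
```

For this field the prefactor is 4/12, so the off-diagonal component `tau[0, 1]` equals (1/3) * 2 * 3 = 2 everywhere.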

DynamicSmagorinskyModel

CLASS

sapsan.utils.physics.DynamicSmagorinskyModel(u: np.ndarray, filt, original_filt_size, filt_ratio, du, delta_u)

Sets up to apply a Dynamic Smagorinsky (DS) turbulence subgrid model:

\[ \tau_{ij} = -2(C_s\Delta^*)^2|S^*|S^*_{ij} \]

where \(C_s\) is the Smagorinsky coefficient, \(\Delta^*\) is the filter width, and \(S^*\) is the strain-rate tensor of the filtered \(u\)

Parameters

Name	Type	Description	Default
`u`	np.ndarray	input filtered quantity either in 3D [axis, D, H, W] or 2D [axis, D, H]
`du`	np.ndarray	gradient of `u`	None: if `du` is not provided, it is calculated with `np.gradient()`
`filt`	sapsan.utils.filters	the type of filter to use (spectral, box, gaussian). Pass the filter itself by loading the appropriate one from `sapsan.utils.filters`	spectral
`original_filt_size`	int	width of the filter which was applied onto `u`	15 (spectral, Fourier modes = 15)
`delta_u`	float	distance between the points on the grid to use for scaling	1
`filt_ratio`	float	the ratio of additional filter that will be applied on the data to find the slope for Dynamic Smagorinsky extrapolation over `original_filt_size`	0.5

sapsan.utils.physics.DynamicSmagorinskyModel.model()

Calculates the DS model tensor with shape [column, row, D, H, W]

Return

Type	Description
np.ndarray	DS model tensor
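
The Smagorinsky stress formula can be evaluated for a given coefficient as sketched below. This is a simplified illustration: it uses a fixed `c_s` and does not reproduce the dynamic procedure that extrapolates the coefficient from an additional filter. `smagorinsky_stress` is a hypothetical name, `c_s = 0.1` is an arbitrary illustrative value, and the norm \(|S| = \sqrt{2 S_{ij} S_{ij}}\) is the conventional definition, assumed here.

```python
import numpy as np

def smagorinsky_stress(u, filter_width, c_s, delta_u=1.0):
    """tau[i, j] = -2 * (c_s * filter_width)^2 * |S| * S[i, j] with a fixed c_s."""
    # du[i, j] = d u_i / d x_j, with x_j following the array axes
    du = np.stack([np.stack(np.gradient(u[i], delta_u)) for i in range(u.shape[0])])
    # strain-rate tensor S_ij = (d_j u_i + d_i u_j) / 2
    S = 0.5 * (du + np.swapaxes(du, 0, 1))
    # |S| = sqrt(2 * S_ij * S_ij), one value per grid point
    S_norm = np.sqrt(2.0 * np.einsum("ij...,ij...->...", S, S))
    return -2.0 * (c_s * filter_width) ** 2 * S_norm * S

# Simple shear: u_0 grows by 2 per grid point along axis 1 (H)
d, h, w = np.meshgrid(np.arange(4.0), np.arange(4.0), np.arange(4.0), indexing="ij")
u = np.stack([2.0 * h, np.zeros_like(h), np.zeros_like(h)])

tau = smagorinsky_stress(u, filter_width=2.0, c_s=0.1)  # shape (3, 3, 4, 4, 4)
```

For simple shear with rate g the only nonzero components are `tau[0, 1]` and `tau[1, 0]`, both equal to -(c_s * filter_width)^2 * g^2.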