Estimators

Sapsan includes several models to help you get started.

Convolutional Neural Network (CNN)

Example: cnn_example.ipynb
Estimator: cnn3d_estimator.py

The network is built around Conv3d and MaxPool3d layers, which reduce the spatial dimensions down to 1 while increasing the number of features. To do that, the network iterates over the following NN block:

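```python
import torch

def nn_block(D_in):
    # Doubles the number of features while shrinking each spatial dimension
    return torch.nn.Sequential(torch.nn.Conv3d(D_in, D_in*2, kernel_size=2, stride=2, padding=1),
                               torch.nn.MaxPool3d(kernel_size=2, padding=1))
```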

where D_in is the number of input features (channels).

As the final layers, a ReLU activation function is applied and the data is flattened into a vector. An example model graph for input data with spatial dimensions [16, 16, 16], split into 8 batches, is provided below.
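For illustration, here is a minimal sketch of how such blocks might be stacked for a [16, 16, 16] input with a batch of 8. With padding=1, both Conv3d and MaxPool3d map a spatial size \(n\) to \(\lfloor n/2 \rfloor + 1\), so repeating the block alone stalls at a size of 2. The sketch therefore assumes a final unpadded MaxPool3d to reach 1, followed by ReLU and flattening. `build_cnn3d` is a hypothetical helper, not the actual layer layout of cnn3d_estimator.py.

```python
import torch

def nn_block(D_in):
    # One reduction step: double the features, downsample each spatial dimension
    return torch.nn.Sequential(
        torch.nn.Conv3d(D_in, D_in * 2, kernel_size=2, stride=2, padding=1),
        torch.nn.MaxPool3d(kernel_size=2, padding=1),
    )

def build_cnn3d(D_in, spatial):
    # Hypothetical assembly: repeat nn_block while the spatial size exceeds 2
    layers, channels = [], D_in
    while spatial > 2:
        layers.append(nn_block(channels))
        channels *= 2
        # Conv3d, then MaxPool3d: each maps a size n to n // 2 + 1
        spatial = (spatial // 2 + 1) // 2 + 1
    # Assumed final stage: unpadded pool (2 -> 1), ReLU, flatten to (batch, channels)
    layers += [torch.nn.MaxPool3d(kernel_size=2), torch.nn.ReLU(), torch.nn.Flatten()]
    return torch.nn.Sequential(*layers)

model = build_cnn3d(D_in=1, spatial=16)
out = model(torch.rand(8, 1, 16, 16, 16))
print(out.shape)  # torch.Size([8, 4])
```

Under these assumptions the shapes go from [8, 1, 16, 16, 16] to [8, 2, 5, 5, 5], then [8, 4, 2, 2, 2], and finally [8, 4].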

Physics-Informed CNN for Turbulence Modeling (PIMLTurb)

Example: pimlturb_diagonal_example.ipynb
Estimator: pimlturb_diagonal_estimator.py

The estimator is based on Physics-Informed Machine Learning for Modeling Turbulence in Supernovae by P. I. Karpov et al. (2022). The model is based on a 3D convolutional network with some additions to enforce a realizability constraint (\(Re_{ii} > 0\), where \(Re\) is the Reynolds stress tensor and \(i\) is the component index). Its overall schematic and graph are shown below.
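The exact mechanism used to enforce realizability is described in the paper. As one common way to guarantee positivity, the sketch below assumes a softplus activation on the predicted diagonal components; `PositiveDiagonalHead` is a hypothetical name and not part of Sapsan's API.

```python
import torch

class PositiveDiagonalHead(torch.nn.Module):
    # Hypothetical output head: maps features to the three diagonal Reynolds
    # stress components (Re_11, Re_22, Re_33) and forces them to be positive
    def __init__(self, in_channels):
        super().__init__()
        self.proj = torch.nn.Conv3d(in_channels, 3, kernel_size=1)

    def forward(self, features):
        # softplus(x) = log(1 + exp(x)) > 0 for any moderate x, so Re_ii > 0
        return torch.nn.functional.softplus(self.proj(features))

head = PositiveDiagonalHead(in_channels=4)
re_diag = head(torch.randn(2, 4, 17, 17, 17))
print(bool((re_diag > 0).all()))  # True
```

A smooth activation like softplus keeps gradients non-zero for negative pre-activations, unlike clipping with ReLU, which could output exactly 0.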

The method also utilizes a custom loss that combines statistical (Kolmogorov-Smirnov statistic) and spatial (Smooth L1) losses. The full description can be found in the paper linked above.
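The two loss terms can be sketched as follows. The two-sample Kolmogorov-Smirnov (KS) statistic is the largest distance between the empirical CDFs of the prediction and the target, and Smooth L1 is PyTorch's `smooth_l1_loss`. Combining them as a plain weighted sum with a hypothetical `ks_weight` is an assumption; see the paper for the actual formulation. Because this KS computation relies on sorting and counting, as written it evaluates the metric but does not propagate gradients through the KS term.

```python
import torch
import torch.nn.functional as F

def ks_statistic(pred, target):
    # Two-sample KS statistic: max |ECDF_pred - ECDF_target|
    p = torch.sort(pred.flatten()).values
    t = torch.sort(target.flatten()).values
    grid = torch.cat([p, t])  # the supremum is attained at one of the sample points
    cdf_p = torch.searchsorted(p, grid, right=True).float() / p.numel()
    cdf_t = torch.searchsorted(t, grid, right=True).float() / t.numel()
    return (cdf_p - cdf_t).abs().max()

def combined_loss(pred, target, ks_weight=1.0):
    # Hypothetical weighting: spatial (Smooth L1) + ks_weight * statistical (KS)
    spatial = F.smooth_l1_loss(pred, target)
    return spatial + ks_weight * ks_statistic(pred, target)

target = torch.linspace(0, 1, 100)
print(ks_statistic(target, target).item())       # 0.0
print(ks_statistic(target, target + 10).item())  # 1.0
```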

For the example included in Sapsan, the data comes from the same dataset as the publication, but it has been heavily downsampled (to \(17^3\)). To achieve results comparable to those published, the model will need to be trained for 3000-4000 epochs.

Physics-Informed CNN for 1D Turbulence Modeling (PIMLTurb1D)

Example: pimlturb1d_example.ipynb
Estimator: pimlturb1d_estimator.py

The estimator is based on Machine Learning for Core-Collapse Supernovae: 1D Models by P. I. Karpov et al. (2023, in prep), and it is similar to the 3D implementation above. The model was adapted for 1D data using a 1D convolutional network with some additions to enforce a realizability constraint (\(Re_{ii} > 0\), where \(Re\) is the Reynolds stress tensor and \(i\) is the component index) and a smoothing Gaussian layer.
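A smoothing Gaussian layer can be pictured as a fixed, non-trainable 1D convolution with a normalized Gaussian kernel. This is a minimal sketch; the class name `GaussianSmoothing1d`, the odd `kernel_size`, `sigma`, and the replicate padding at the boundaries are all assumptions rather than the estimator's actual settings.

```python
import torch
import torch.nn.functional as F

class GaussianSmoothing1d(torch.nn.Module):
    # Hypothetical fixed (non-trainable) Gaussian smoothing for 1D profiles
    def __init__(self, kernel_size=5, sigma=1.0):
        super().__init__()
        x = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
        kernel = torch.exp(-x**2 / (2 * sigma**2))
        kernel = kernel / kernel.sum()  # normalized, so constant profiles are unchanged
        # A buffer is saved with the model but is not updated by the optimizer
        self.register_buffer("kernel", kernel.view(1, 1, -1))
        self.pad = kernel_size // 2  # assumes an odd kernel_size

    def forward(self, x):
        # x: (batch, 1, length); replicate padding keeps the output length equal to the input's
        x = F.pad(x, (self.pad, self.pad), mode="replicate")
        return F.conv1d(x, self.kernel)

spike = torch.zeros(1, 1, 21)
spike[0, 0, 10] = 1.0
smoothed = GaussianSmoothing1d(kernel_size=5, sigma=1.0)(spike)
```

Because the kernel sums to 1, a constant profile passes through unchanged, while a sharp feature such as the single spike above is spread over neighboring zones.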

The method also utilizes a custom loss that combines statistical (KS) and spatial (Smooth L1) losses. The full description can be found in the paper linked above.

For the example included in Sapsan, only the 12 \(M_{\odot}\) model, mapped from 3D to 1D, is provided, stripped of all features not used in training. That said, it is exactly the same data used for the publication-ready results for the 12 \(M_{\odot}\) model. To achieve comparable results, the model will need to be trained for ~1000 epochs. The original 3D dataset was provided by Adam Burrows, produced with FORNAX, and published by Burrows et al. (2020).

The trained ML models were used to infer turbulent pressure at runtime in COLLAPSO1D, a 1D Fortran-based CCSN code. To make this work, we integrated a PyTorch wrapper for Fortran, the implementation of which can be found on its GitHub. COLLAPSO1D's documentation also contains instructions for integrating this wrapper into other Fortran codebases.

Physics-Informed Convolutional Autoencoder (PICAE)

Example: picae_example.ipynb
Estimator: picae_estimator.py

Note: The estimator is based on Embedding Hard Physical Constraints in Neural Network Coarse-Graining of 3D Turbulence by A. T. Mohan et al.

The model consists of 2 main parts:
1. Convolutional Auto-Encoder (CAE, trainable)
2. Static layers enforcing the divergence-free condition (constant)

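The static layers treat the CAE output as a vector potential \(A\) and return its curl, \(u = \nabla \times A\), as the predicted velocity field. Since \(\nabla \cdot (\nabla \times A) = 0\) identically, the output is divergence-free by construction, and the CAE portion of the model has to learn an \(A\) whose curl matches the target field. Through this, we are effectively enforcing the conservation of mass for incompressible flow. A schematic of the model is shown below.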
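A minimal sketch of such a parameter-free curl layer, assuming a periodic grid with unit spacing and second-order central differences; `ddx`, `curl`, and `divergence` are hypothetical helper names, not functions from picae_estimator.py. Because central differences along different axes commute, the discrete divergence of the discrete curl also vanishes, up to floating-point round-off.

```python
import torch

def ddx(f, dim, h=1.0):
    # Periodic second-order central difference along tensor dimension `dim`
    return (torch.roll(f, shifts=-1, dims=dim) - torch.roll(f, shifts=1, dims=dim)) / (2 * h)

def curl(A, h=1.0):
    # A: (batch, 3, Nx, Ny, Nz) vector potential; x, y, z are dims 2, 3, 4
    # Slicing with 0:1 keeps the component dimension so results stay 5D
    Ax, Ay, Az = A[:, 0:1], A[:, 1:2], A[:, 2:3]
    ux = ddx(Az, 3, h) - ddx(Ay, 4, h)  # dAz/dy - dAy/dz
    uy = ddx(Ax, 4, h) - ddx(Az, 2, h)  # dAx/dz - dAz/dx
    uz = ddx(Ay, 2, h) - ddx(Ax, 3, h)  # dAy/dx - dAx/dy
    return torch.cat([ux, uy, uz], dim=1)

def divergence(u, h=1.0):
    return ddx(u[:, 0:1], 2, h) + ddx(u[:, 1:2], 3, h) + ddx(u[:, 2:3], 4, h)

A = torch.randn(2, 3, 16, 16, 16, dtype=torch.float64)  # stand-in for the CAE output
u = curl(A)
print(divergence(u).abs().max().item())  # round-off level
```

Since the curl layer has no trainable weights, any field the CAE produces is mapped to a divergence-free velocity, which is what makes the constraint hard rather than a penalty term.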

Kernel Ridge Regression (KRR)

Example: krr_example.ipynb
Estimator: krr_estimator.py

We have included one of the classic regression-based methods used in machine learning: Kernel Ridge Regression. The model has two hyperparameters to be tuned: the regularization term \(\alpha\) and the Gaussian kernel width \(\sigma\) (related to the full width at half maximum by \(\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma\)). KRR has the following form:

\[ y' = y\,(K + \alpha I)^{-1} k \]

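where \(y\) is the vector of training targets, \(K\) is the kernel matrix with entries \(K(x_i, x_j)\) over the training inputs, and \(k\) is the vector of kernel values \(K(x_i, x)\) between the training inputs and a new input \(x\). The kernel is chosen to be the radial basis function (Gaussian):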

\[ K(x, x') = \exp\left( -\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right) \]
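The formula above translates almost line for line into NumPy. This is a minimal sketch for illustration; `rbf_kernel` and `krr_predict` are hypothetical names and do not reflect the internals of krr_estimator.py. Solving the linear system instead of forming \((K + \alpha I)^{-1}\) explicitly is a standard numerical choice.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    # K[i, j] = exp(-||x1_i - x2_j||^2 / (2 sigma^2)) for row-wise samples
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * sigma**2))

def krr_predict(X_train, y_train, X_new, alpha, sigma):
    K = rbf_kernel(X_train, X_train, sigma)  # (n_train, n_train)
    k = rbf_kernel(X_train, X_new, sigma)    # (n_train, n_new)
    # y' = y (K + alpha I)^{-1} k; K is symmetric, so solve for the weights
    weights = np.linalg.solve(K + alpha * np.eye(len(X_train)), y_train)
    return weights @ k

# One training point at x = 0 with target 2, predicted at x = 0
print(krr_predict(np.array([[0.0]]), np.array([2.0]), np.array([[0.0]]), alpha=1.0, sigma=1.0))  # [1.]
```

Here \(K = k = 1\), so the prediction is \(2/(1 + \alpha) = 1\), showing how \(\alpha\) shrinks the fit toward zero.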