Build (`build.py`)¶

This module provides the ModelBuilder abstract base class, which defines a standard interface for building, fitting, saving, and loading PyMC models, inspired by the scikit-learn API.

`ModelBuilder`¶

class ModelBuilder(ABC):
    """Abstract Base Class for building PyMC models with a scikit-learn-like API."""

ModelBuilder gives PyMC models a consistent, scikit-learn-like interface. Subclasses must implement the abstract properties and methods documented below, which cover model definition, data handling, and configuration.

Attributes¶

_model_type (str): Class attribute identifying the specific model type (e.g., “BaseMMM”). Should be overridden by subclasses.
version (str): Class attribute specifying the version of the model class. Should be overridden by subclasses.
X (Optional[pd.DataFrame]): Holds the preprocessed feature data. Initialised as None.
y (Optional[Union[pd.Series, np.ndarray]]): Holds the preprocessed target data. Initialised as None.
model (Optional[pm.Model]): The PyMC model object. Created by the build_model method.
idata (Optional[az.InferenceData]): Stores the results of the model fitting process (posterior samples, observed data, sample stats, etc.). Generated by the fit method.
is_fitted_ (bool): A flag indicating whether the fit method has been successfully called. Initialised as False.
sampler_config (Dict): Dictionary holding the configuration parameters for the PyMC sampler (e.g., draws, tune, chains).
model_config (Dict): Dictionary holding the configuration parameters for the model structure (e.g., priors, hyperparameters).

Initialization¶

    def __init__(
        self,
        model_config: Optional[Dict] = None,
        sampler_config: Optional[Dict] = None,
    ) -> None:

Initialises the ModelBuilder. Sets up the model_config and sampler_config, using defaults defined by abstract properties if None is provided.

Parameters:

model_config (Optional[Dict], optional): Configuration for the model structure. Defaults to self.default_model_config.
sampler_config (Optional[Dict], optional): Configuration for the sampler. Defaults to self.default_sampler_config.
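The fallback to defaults can be sketched roughly as follows. This is a minimal illustration and not the library's actual code: `ConfigDefaultsSketch` and `ToyModel` are hypothetical names, the config contents are made up, and the real `__init__` may merge or validate the configurations differently.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ConfigDefaultsSketch(ABC):
    """Hypothetical stand-in for the config handling in ModelBuilder.__init__."""

    def __init__(
        self,
        model_config: Optional[Dict] = None,
        sampler_config: Optional[Dict] = None,
    ) -> None:
        # Fall back to the subclass-provided defaults when nothing is passed.
        self.model_config = (
            self.default_model_config if model_config is None else model_config
        )
        self.sampler_config = (
            self.default_sampler_config if sampler_config is None else sampler_config
        )

    @property
    @abstractmethod
    def default_model_config(self) -> Dict: ...

    @property
    @abstractmethod
    def default_sampler_config(self) -> Dict: ...


class ToyModel(ConfigDefaultsSketch):
    """Hypothetical subclass supplying illustrative defaults."""

    @property
    def default_model_config(self) -> Dict:
        return {"intercept": {"mu": 0.0, "sigma": 1.0}}

    @property
    def default_sampler_config(self) -> Dict:
        return {"draws": 1000, "tune": 1000, "chains": 4}


default_model = ToyModel()
custom_model = ToyModel(sampler_config={"draws": 200, "tune": 200, "chains": 2})
```

Here `custom_model` keeps the default `model_config` but uses the sampler settings passed by the caller.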

Abstract Properties¶

(These must be implemented by subclasses)

`output_var`¶

    @property
    @abstractmethod
    def output_var(self) -> str:

Should return the name of the target variable within the PyMC model (the observed variable in the likelihood function).

`default_model_config`¶

    @property
    @abstractmethod
    def default_model_config(self) -> Dict:

Should return a dictionary defining the default structure and parameters (e.g., priors) for the specific model implementation.

`default_sampler_config`¶

    @property
    @abstractmethod
    def default_sampler_config(self) -> Dict:

Should return a dictionary defining the default parameters for the PyMC sampler (e.g., draws, tune, chains).

`_serializable_model_config`¶

    @property
    @abstractmethod
    def _serializable_model_config(self) -> Dict:

Should return a version of self.model_config where all values are JSON-serializable (e.g., converting tuples or NumPy arrays to lists). This is used when saving the model’s metadata.
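One possible way to produce such a JSON-serializable copy is a small recursive converter. The helper name `to_serializable` and the example config are assumptions made for illustration; a subclass is free to implement the property differently.

```python
import json
from typing import Any


def to_serializable(value: Any) -> Any:
    """Recursively convert a config value into JSON-friendly types."""
    if isinstance(value, dict):
        return {key: to_serializable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_serializable(item) for item in value]
    if hasattr(value, "tolist"):
        # NumPy arrays and NumPy scalars expose tolist().
        return value.tolist()
    return value


# Illustrative config only; real model configs depend on the subclass.
model_config = {
    "beta": {"dims": ("channel",), "kwargs": {"sigma": (1.0, 2.0)}},
    "intercept": {"kwargs": {"mu": 0.0}},
}
serializable = to_serializable(model_config)
encoded = json.dumps(serializable)
```

The tuples become lists, so `json.dumps` succeeds and the encoded string decodes back to the same structure.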

Abstract Methods¶

(These must be implemented by subclasses)

`_data_setter`¶

    @abstractmethod
    def _data_setter(
        self,
        X: Union[np.ndarray, pd.DataFrame],
        y: Optional[Union[np.ndarray, pd.Series]] = None
    ) -> None:

Should update the data containers within the already built PyMC model (self.model) using pm.set_data(). This is essential for making predictions on new data.

`_generate_and_preprocess_model_data`¶

    @abstractmethod
    def _generate_and_preprocess_model_data(
        self,
        X: Union[np.ndarray, pd.DataFrame],
        y: Union[np.ndarray, pd.Series]
    ) -> None:

Should perform any necessary preprocessing on the input data X and y, store the results (e.g., in self.X, self.y), and define the model’s coordinates using self.model.add_coords() if applicable. This method is called internally by fit before build_model.

`build_model`¶

    @abstractmethod
    def build_model(
        self,
        X: Union[np.ndarray, pd.DataFrame],
        y: Union[np.ndarray, pd.Series],
        **kwargs: Any,
    ) -> None:

Should define the complete PyMC model structure within a pm.Model() context, using the (potentially preprocessed) data X and y, and assign the created model to self.model.

Concrete Methods¶

`_validate_data`¶

    def _validate_data(self, X: Any, y: Optional[Any] = None) -> Union[Tuple[Any, Any], Any]:

Internal helper to validate input data using scikit-learn’s check_X_y or check_array if available, otherwise returns the data unchanged.
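The optional-dependency pattern described here might look like the sketch below. `validate_data` is a hypothetical free function, not the method itself, and the actual helper may pass extra arguments to the scikit-learn validators.

```python
from typing import Any, Optional

try:
    # scikit-learn is treated as an optional dependency in this sketch.
    from sklearn.utils.validation import check_array, check_X_y
except ImportError:
    check_array = None
    check_X_y = None


def validate_data(X: Any, y: Optional[Any] = None) -> Any:
    """Validate with scikit-learn when installed, else pass the data through."""
    if check_X_y is None or check_array is None:
        return X if y is None else (X, y)
    if y is None:
        return check_array(X)
    return check_X_y(X, y)


X_checked, y_checked = validate_data([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.5])
X_only = validate_data([[1.0], [2.0], [3.0]])
```

With scikit-learn installed the results are validated NumPy arrays; without it the original lists come back untouched.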

`set_idata_attrs`¶

    def set_idata_attrs(self, idata: Optional[az.InferenceData] = None) -> az.InferenceData:

Adds standard metadata (model type, version, configurations, ID) to the attributes of an ArviZ InferenceData object.
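As a rough sketch of what this metadata could look like, the example below builds a flat dictionary of strings. The attribute keys and the `build_attrs` helper are assumptions for illustration, and a plain dict stands in for `idata.attrs`. The configurations are JSON-encoded because NetCDF attributes cannot hold nested dictionaries.

```python
import json
from typing import Dict


def build_attrs(
    model_id: str,
    model_type: str,
    version: str,
    model_config: Dict,
    sampler_config: Dict,
) -> Dict[str, str]:
    """Assemble metadata as flat strings (hypothetical key names)."""
    return {
        "id": model_id,
        "model_type": model_type,
        "version": version,
        # Nested dictionaries are JSON-encoded so they survive a save/load cycle.
        "model_config": json.dumps(model_config),
        "sampler_config": json.dumps(sampler_config),
    }


attrs = build_attrs(
    model_id="0123456789abcdef",
    model_type="BaseMMM",
    version="0.1.0",
    model_config={"intercept": {"mu": 0.0}},
    sampler_config={"draws": 1000, "chains": 4},
)
```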

`save`¶

    def save(self, fname: str) -> None:

Saves the InferenceData object (self.idata) to a NetCDF file using idata.to_netcdf(). Requires the model to be fitted first.

`load` (classmethod)¶

    @classmethod
    def load(cls, fname: str) -> "ModelBuilder":

Loads a previously saved model instance from a NetCDF file. It reads the InferenceData, extracts configurations, reinstantiates the model class (cls), rebuilds the PyMC model structure, and sets the idata.

`_model_config_formatting` (classmethod)¶

    @classmethod
    def _model_config_formatting(cls, model_config: Dict) -> Dict:

Internal helper used by load to convert list representations in the loaded JSON configuration back into tuples (for ‘dims’) or NumPy arrays (for other lists).
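The conversion could be sketched as a recursive walk over the loaded dictionary. `format_model_config` is a hypothetical name, the example config is made up, and the real classmethod may handle edge cases (such as nested lists) differently.

```python
from typing import Dict

import numpy as np


def format_model_config(model_config: Dict) -> Dict:
    """Return a copy with 'dims' lists as tuples and other lists as arrays."""
    formatted = {}
    for key, value in model_config.items():
        if isinstance(value, dict):
            formatted[key] = format_model_config(value)
        elif isinstance(value, list):
            formatted[key] = tuple(value) if key == "dims" else np.array(value)
        else:
            formatted[key] = value
    return formatted


# Shape of a config as it might look after json.loads (illustrative only).
loaded = {"beta": {"dims": ["channel"], "kwargs": {"sigma": [1.0, 2.0]}}}
restored = format_model_config(loaded)
```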

`fit`¶

    def fit(
        self,
        X: Union[np.ndarray, pd.DataFrame],
        y: Optional[Union[np.ndarray, pd.Series]] = None,
        progressbar: bool = True,
        random_seed: Optional[int] = None,
        **kwargs: Any,
    ) -> az.InferenceData:

Fits the model to the provided data X and y. The method performs the following steps in order:

Calls _generate_and_preprocess_model_data to prepare data and coordinates.
Calls build_model to define the PyMC model structure.
Calls pm.sample() using sampler_config and any additional kwargs.
Stores the results in self.idata.
Adds the original fitting data to idata under the fit_data group.
Sets standard attributes on idata using set_idata_attrs.
Sets self.is_fitted_ to True.

Returns the InferenceData object.
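The sequence of steps above can be illustrated with a stand-in class that records each call. Nothing here uses PyMC: `FitOrderSketch` and `_sample` are hypothetical, a plain dict replaces the InferenceData object, and the rule that explicit keyword arguments override `sampler_config` is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List


class FitOrderSketch:
    """Hypothetical stand-in that records the order of the steps in fit."""

    def __init__(self, sampler_config: Dict[str, Any]) -> None:
        self.sampler_config = sampler_config
        self.calls: List[str] = []
        self.idata: Dict[str, Any] = {}
        self.is_fitted_ = False

    def _generate_and_preprocess_model_data(self, X: Any, y: Any) -> None:
        self.calls.append("preprocess")
        self.X, self.y = X, y

    def build_model(self, X: Any, y: Any) -> None:
        self.calls.append("build_model")

    def _sample(self, **sampler_kwargs: Any) -> Dict[str, Any]:
        # Placeholder for pm.sample(); a plain dict replaces InferenceData.
        self.calls.append("sample")
        return {"sampler_kwargs": sampler_kwargs}

    def set_idata_attrs(self, idata: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append("set_idata_attrs")
        idata["attrs"] = {"sampler_config": dict(self.sampler_config)}
        return idata

    def fit(self, X: Any, y: Any, **kwargs: Any) -> Dict[str, Any]:
        self._generate_and_preprocess_model_data(X, y)
        self.build_model(self.X, self.y)
        # Assumption: explicit keyword arguments override sampler_config.
        self.idata = self._sample(**{**self.sampler_config, **kwargs})
        self.idata["fit_data"] = {"X": X, "y": y}
        self.set_idata_attrs(self.idata)
        self.is_fitted_ = True
        return self.idata


sketch = FitOrderSketch(sampler_config={"draws": 1000, "chains": 4})
result = sketch.fit([[1.0], [2.0]], [0.1, 0.2], chains=2)
```

After the call, `sketch.calls` lists the steps in the order they ran, and the sampler received `chains=2` from the keyword argument alongside `draws=1000` from the stored configuration.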

`predict`¶

    def predict(
        self,
        X_pred: Union[np.ndarray, pd.DataFrame],
        extend_idata: bool = True,
        **kwargs: Any,
    ) -> np.ndarray:

Generates point predictions for new data X_pred by calculating the mean of the posterior predictive distribution. Calls sample_posterior_predictive internally.
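To make the reduction concrete, the sketch below averages simulated draws over the chain and draw axes. The array is random stand-in data, not output from a fitted model, and the (chain, draw, observation) layout is an assumption about how the samples are arranged.

```python
import numpy as np

# Stand-in for posterior predictive draws of the output variable,
# with shape (chain, draw, observation).
rng = np.random.default_rng(seed=0)
posterior_predictive = rng.normal(loc=10.0, scale=1.0, size=(4, 500, 3))

# Averaging over the chain and draw axes leaves one point prediction
# per observation.
point_predictions = posterior_predictive.mean(axis=(0, 1))
```

The result has one value per row of the prediction data, which matches the `np.ndarray` return type of `predict`.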

`sample_prior_predictive`¶

    def sample_prior_predictive(
        self,
        X_pred: Union[np.ndarray, pd.DataFrame],
        y_pred: Optional[Union[np.ndarray, pd.Series]] = None,
        samples: Optional[int] = None,
        extend_idata: bool = False,
        combined: bool = True,
        **kwargs: Any,
    ) -> xr.DataArray:

Draws samples from the model’s prior predictive distribution given input data X_pred. Builds the model if not already built. Uses _data_setter to provide the new data to the model context.

`sample_posterior_predictive`¶

    def sample_posterior_predictive(
        self,
        X_pred: Union[np.ndarray, pd.DataFrame],
        extend_idata: bool = True,
        combined: bool = True,
        **kwargs: Any
    ) -> xr.DataArray:

Draws samples from the model’s posterior predictive distribution given new input data X_pred. Requires the model to be fitted first. Uses _data_setter to provide the new data to the model context. Returns the entire posterior predictive group from the InferenceData.

`get_params`¶

    def get_params(self, deep: bool = True) -> Dict:

Returns a dictionary containing the model’s initialization parameters (model_config, sampler_config). Compatible with scikit-learn’s API.

`set_params`¶

    def set_params(self, **params: Any) -> "ModelBuilder":

Sets the model’s initialization parameters (model_config, sampler_config). Compatible with scikit-learn’s API. Returns the instance.
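The parameter protocol shared by `get_params` and `set_params` can be sketched with a small stand-in class. `ParamsSketch` is hypothetical and omits the validation a real implementation might perform on unknown parameter names.

```python
from typing import Any, Dict, Optional


class ParamsSketch:
    """Hypothetical stand-in showing the scikit-learn parameter protocol."""

    def __init__(
        self,
        model_config: Optional[Dict] = None,
        sampler_config: Optional[Dict] = None,
    ) -> None:
        self.model_config = model_config or {}
        self.sampler_config = sampler_config or {}

    def get_params(self, deep: bool = True) -> Dict:
        # Keys match the __init__ argument names so the object can be rebuilt.
        return {
            "model_config": self.model_config,
            "sampler_config": self.sampler_config,
        }

    def set_params(self, **params: Any) -> "ParamsSketch":
        for name, value in params.items():
            setattr(self, name, value)
        return self  # returning the instance allows chained calls


builder = ParamsSketch(sampler_config={"draws": 1000})
same_builder = builder.set_params(sampler_config={"draws": 250, "chains": 2})
clone = ParamsSketch(**builder.get_params())
```

Because the keys returned by `get_params` mirror the constructor arguments, a new instance with identical settings can be created from them, which is what scikit-learn utilities rely on when cloning estimators.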

`predict_proba`¶

    def predict_proba(
        self,
        X_pred: Union[np.ndarray, pd.DataFrame],
        extend_idata: bool = True,
        combined: bool = False,
        **kwargs: Any,
    ) -> xr.DataArray:

Alias for predict_posterior. Returns the full posterior predictive distribution samples for the output variable. Unlike predict_posterior, where combined defaults to True, combined defaults to False here.

`predict_posterior`¶

    def predict_posterior(
        self,
        X_pred: Union[np.ndarray, pd.DataFrame],
        extend_idata: bool = True,
        combined: bool = True,
        **kwargs: Any,
    ) -> xr.DataArray:

Generates the full posterior predictive distribution samples for the output_var given new input data X_pred. Calls sample_posterior_predictive internally and extracts the samples for the specific output_var.

Properties (Concrete)¶

`id`¶

    @property
    def id(self) -> str:

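Generates a 16-character ID by truncating a SHA256 hex digest computed from the sorted model_config, version, and _model_type. Instances with the same configuration, version, and model type share the same ID, which makes it useful for identifying specific model configurations.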
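A minimal sketch of such an ID is shown below. How the configuration is serialized before hashing is an assumption (here `json.dumps` with `sort_keys=True`), and `model_id` and the version string are hypothetical, so the values produced will not necessarily match those of the real property.

```python
import hashlib
import json
from typing import Dict


def model_id(model_config: Dict, version: str, model_type: str) -> str:
    """Hash the configuration, version, and model type into a short ID."""
    hasher = hashlib.sha256()
    # sort_keys makes the hash independent of dictionary insertion order.
    hasher.update(json.dumps(model_config, sort_keys=True).encode())
    hasher.update(version.encode())
    hasher.update(model_type.encode())
    # A full SHA256 hex digest has 64 characters; keep the first 16.
    return hasher.hexdigest()[:16]


id_a = model_id({"a": 1, "b": 2}, "0.1.0", "BaseMMM")
id_b = model_id({"b": 2, "a": 1}, "0.1.0", "BaseMMM")
id_c = model_id({"a": 1, "b": 3}, "0.1.0", "BaseMMM")
```

Reordering the keys leaves the ID unchanged, while changing any value produces a different ID.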
