# Abacus Core Mixins: MMM Predict (`mmm_predict.py`)

This module provides the `MMMPredictMixin` class, designed to be inherited by Marketing Mix Model (MMM) classes in Abacus. It focuses on generating predictions and simulating outcomes based on the fitted model, including evaluating the impact of hypothetical future spend scenarios.

::: abacus.core.mixins.mmm_predict

## `MMMPredictMixin` Class

This mixin assumes the inheriting class provides several attributes and methods, including:

- `X`: The original input DataFrame used for training.
- `channel_columns`: List of channel names.
- `adstock_max_lag`: The maximum lag used in adstock transformations.
- `idata`: An ArviZ `InferenceData` object containing the model fitting results (posterior or prior samples).
- `channel_transformer`: The fitted scaler object for channel data (if used).
- `target_transformer`: The fitted scaler object for the target variable (if used).
- `get_target_transformer()`: A method to retrieve the target transformer.
- `output_var`: The name of the target variable in the PyMC model.
- `model`: The PyMC model object (`pm.Model`).
- `_data_setter()`: A method (likely from the base class) to set new data into the PyMC model's data containers.
- `date_column`: Name of the date column in `X`.

### Methods

#### `new_spend_contributions`

```python
def new_spend_contributions(
    self,
    spend: Optional[np.ndarray] = None,
    one_time: bool = True,
    spend_leading_up: Optional[np.ndarray] = None,
    prior: bool = False,
    original_scale: bool = True,
    **sample_posterior_predictive_kwargs: Any,
) -> DataArray:
```

Calculates the expected channel contribution trajectories resulting from a new, hypothetical spend pattern over a future period. This method simulates applying a specified `spend` amount to the channels, considering potential adstock carry-over effects from spend that occurred just before the simulation period (`spend_leading_up`).
It constructs a temporary PyMC model using parameters sampled from the fitted model's posterior (or prior) distribution and calculates the resulting channel contributions over a period determined by `adstock_max_lag`.

**Parameters:**

- `spend` (`np.ndarray`, optional): An array containing the spend amount for *each* channel in the model for the simulation period. If `None`, the mean spend for each channel from the original training data (`self.X`) is used. Defaults to `None`.
- `one_time` (`bool`, optional): If `True` (default), the `spend` is applied only at the first time step (t=0) of the simulation. If `False`, the `spend` is applied continuously for `adstock_max_lag + 1` time steps.
- `spend_leading_up` (`np.ndarray`, optional): An array of spend for each channel during the `adstock_max_lag` periods immediately *preceding* the simulation start (t=0). This initializes the adstock effect. If `None`, it defaults to zero spend.
- `prior` (`bool`, optional): If `True`, uses samples from the model's prior distribution (`self.idata['prior']`). If `False` (default), uses samples from the posterior distribution (`self.idata['posterior']`).
- `original_scale` (`bool`, optional): If `True` (default), the returned contributions are inverse-transformed to the original scale of the target variable using `self.target_transformer`. If `False`, contributions are returned in the potentially transformed scale used during modelling.
- `**sample_posterior_predictive_kwargs`: Additional keyword arguments passed to the underlying `pm.sample_posterior_predictive` call used internally for sampling contributions from the temporary model.

**Returns:**

- `xr.DataArray`: An xarray DataArray containing the calculated channel contributions. Dimensions are (`chain`, `draw`, `time_since_spend`, `channel`), where `time_since_spend` ranges from `-adstock_max_lag` to `+adstock_max_lag`.

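
The interaction between `spend`, `one_time`, and `spend_leading_up` is easier to see as an explicit spend schedule indexed by `time_since_spend`. The following is a minimal NumPy sketch, not the mixin's actual implementation: `build_spend_schedule` is a hypothetical helper, and it assumes `spend_leading_up` holds one value per channel that is repeated for each of the `adstock_max_lag` preceding periods.

```python
import numpy as np


def build_spend_schedule(spend, adstock_max_lag, one_time=True, spend_leading_up=None):
    """Hypothetical helper: lay out spend per channel over time_since_spend.

    Returns (time_since_spend, schedule), where schedule has shape
    (2 * adstock_max_lag + 1, n_channels).
    """
    spend = np.asarray(spend, dtype=float)
    n_channels = spend.shape[0]

    # Assumption: no leading-up spend means zero spend before t=0.
    if spend_leading_up is None:
        spend_leading_up = np.zeros(n_channels)
    spend_leading_up = np.asarray(spend_leading_up, dtype=float)
    if spend_leading_up.shape != spend.shape:
        raise ValueError("spend_leading_up must have one value per channel")

    # Periods -adstock_max_lag .. -1: the leading-up spend, repeated per period.
    before = np.tile(spend_leading_up, (adstock_max_lag, 1))

    # Periods 0 .. +adstock_max_lag: either a single pulse at t=0 or
    # the same spend applied at every one of the adstock_max_lag + 1 steps.
    after = np.zeros((adstock_max_lag + 1, n_channels))
    if one_time:
        after[0] = spend
    else:
        after[:] = spend

    time_since_spend = np.arange(-adstock_max_lag, adstock_max_lag + 1)
    return time_since_spend, np.vstack([before, after])


# Two channels, a one-time pulse at t=0, no spend before the simulation.
time_since_spend, schedule = build_spend_schedule([100.0, 50.0], adstock_max_lag=2)
# schedule has shape (5, 2); only the row at time_since_spend == 0 is non-zero.
```

With `one_time=False`, every row from t=0 onward carries the full `spend` vector, which matches the "applied continuously for `adstock_max_lag + 1` time steps" description above.
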

**Raises:**

- `ValueError`: If the length of the provided `spend` array does not match the number of channels in the model.
- `RuntimeError`: If `spend` is `None` and the original training data `self.X` is unavailable; if the model hasn't been fitted (when `prior=False`); if prior samples are requested but unavailable; or if required channel or target transformers are missing when needed.

---

#### `sample_posterior_predictive`

```python
def sample_posterior_predictive(
    self,
    X_pred: pd.DataFrame,
    extend_idata: bool = True,
    combined: bool = True,
    include_last_observations: bool = False,
    original_scale: bool = True,
    **sample_posterior_predictive_kwargs: Any,
) -> DataArray:
```

Generates samples from the posterior predictive distribution for a given set of future input data (`X_pred`). This method predicts the expected target variable outcome for new data points. It overrides the standard PyMC method to handle specific MMM requirements, such as scaling and accounting for adstock carry-over effects from the end of the training period.

**Parameters:**

- `X_pred` (`pd.DataFrame`): A DataFrame containing the input data for the prediction period. It must include the `date_column` and all `channel_columns`. If control variables or Fourier terms were used in the model, they must also be present (or Fourier terms will be regenerated based on the dates if `include_last_observations` is `False`).
- `extend_idata` (`bool`, optional): If `True` (default), the generated posterior predictive samples are added to the model's `self.idata` object under the `posterior_predictive` group.
- `combined` (`bool`, optional): If `True` (default), the `chain` and `draw` dimensions of the output samples are combined into a single `sample` dimension. This is done *after* potential inverse scaling.
- `include_last_observations` (`bool`, optional): If `True`, the last `adstock_max_lag` observations from the original training data (`self.X`) are automatically prepended to `X_pred` before making predictions. This allows the model to correctly account for adstock carry-over effects, assuming `X_pred` immediately follows the training data time period. Defaults to `False`.
- `original_scale` (`bool`, optional): If `True` (default), the predictions are inverse-transformed back to the original scale of the target variable using `self.target_transformer`. If `False`, predictions remain in the potentially transformed scale.
- `**sample_posterior_predictive_kwargs`: Additional keyword arguments passed directly to `pymc.sample_posterior_predictive`.

**Returns:**

- `xr.DataArray`: An xarray DataArray containing the posterior predictive samples for the target variable (`self.output_var`). Dimensions depend on the `combined` parameter: (`chain`, `draw`, `date`) if `False`, or (`sample`, `date`) if `True`.

**Raises:**

- `RuntimeError`: If the model hasn't been built or fitted, if `self.X` is missing when `include_last_observations=True`, or if required transformers are missing when `original_scale=True`.
- `ValueError`: If `include_last_observations=True` and the training data `self.X` has fewer rows than `adstock_max_lag`, or if `X_pred` is missing required columns for the prediction (including carry-over calculation if applicable).
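
The carry-over handling behind `include_last_observations=True` can be illustrated with plain pandas. This is a rough sketch of the behaviour described above, not the mixin's code: `prepend_last_observations` is a hypothetical helper, and the column names `date` and `tv` are made up for the example.

```python
import pandas as pd


def prepend_last_observations(X_train, X_pred, adstock_max_lag, date_column="date"):
    """Hypothetical helper: prepend the last adstock_max_lag training rows to X_pred."""
    # Mirrors the documented ValueError when the training data is too short.
    if len(X_train) < adstock_max_lag:
        raise ValueError("training data has fewer rows than adstock_max_lag")

    # X_pred needs every column that the prepended training rows provide.
    missing = set(X_train.columns) - set(X_pred.columns)
    if missing:
        raise ValueError(f"X_pred is missing required columns: {sorted(missing)}")

    # Take the most recent rows so adstock can carry spend into the new period.
    tail = X_train.sort_values(date_column).tail(adstock_max_lag)
    return pd.concat([tail, X_pred[list(X_train.columns)]], ignore_index=True)


# Illustrative data: five weekly training rows followed by two prediction rows.
X_train = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=5, freq="W-MON"),
        "tv": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)
X_pred = pd.DataFrame(
    {
        "date": pd.date_range("2024-02-05", periods=2, freq="W-MON"),
        "tv": [60.0, 70.0],
    }
)

# With adstock_max_lag=3, the last three training rows precede the two new rows.
X_full = prepend_last_observations(X_train, X_pred, adstock_max_lag=3)
```

The sketch only makes sense under the same assumption the parameter description states: `X_pred` must start immediately after the training period, otherwise the prepended rows do not represent the spend that actually preceded the predictions.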