# Abacus Core: Legacy Imports (`_legacy_imports.py`)

This module serves as a repository for functions, classes, and enumerations that were originally part of the `pymc-marketing` library or its dependencies. They have been adapted and included directly within Abacus to remove the external dependency while preserving essential functionality for marketing mix modelling, particularly around adstock effects, saturation, and lift test data handling.

::: abacus.core._legacy_imports

## Enumerations

### `ConvMode`

```python
class ConvMode(str, Enum):
    After = "After"
    Before = "Before"
    Overlap = "Overlap"
```

Defines the modes for applying 1D convolution, determining how boundaries are handled:

- `After`: Trailing decay effect (typical adstock).
- `Before`: Leading effect ("excitement" factor).
- `Overlap`: Effect overlaps preceding and succeeding elements.

---

### `WeibullType`

```python
class WeibullType(str, Enum):
    PDF = "PDF"
    CDF = "CDF"
```

Specifies the type of Weibull distribution function to use for adstock calculation:

- `PDF`: Probability Density Function.
- `CDF`: Cumulative Distribution Function (specifically, `1 - CDF` is used for decay).

## Adstock Functions

These functions apply carryover effects to time-series data, commonly used for modelling advertising impact over time.

### `batched_convolution`

```python
def batched_convolution(
    x,
    w,
    axis: int = 0,
    mode: ConvMode | str = ConvMode.After,
):
```

Applies a 1D convolution across multiple batch dimensions in a vectorized manner. This is the core function used by other adstock implementations.

**Parameters:**

- `x`: The array to convolve.
- `w`: The convolution weights (kernel). The last axis determines the number of steps (lag).
- `axis` (`int`): The axis of `x` along which to apply the convolution.
- `mode` (`ConvMode | str`): The convolution mode (`After`, `Before`, `Overlap`).

**Returns:**

- The convolved array, with shape matching `x` (considering broadcasting with `w`).

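
To make the `After` mode of `batched_convolution` concrete, the following is a minimal, dependency-free sketch of a single-series convolution. It assumes that `After` corresponds to a causal convolution, in which each input value affects only its own and later time steps and values before the start of the series count as zero. The name `convolve_after` is hypothetical and is not part of the module; the real function is vectorized over batch dimensions.

```python
def convolve_after(x, w):
    """Causal 1D convolution: y[t] = sum over lag of w[lag] * x[t - lag].

    The output has the same length as x (assumed behaviour of `After` mode).
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for lag, weight in enumerate(w):
            if t - lag >= 0:  # values before the series start count as zero
                total += weight * x[t - lag]
        y.append(total)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 0.5, 0.25]  # three lag steps
carried = convolve_after(impulse, kernel)
print(carried)  # [1.0, 0.5, 0.25, 0.0, 0.0]
```

A single unit impulse simply reproduces the kernel, which is why the kernel weights can be read directly as the carryover profile.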

---

### `geometric_adstock`

```python
def geometric_adstock(
    x,
    alpha: float = 0.0,
    l_max: int = 12,
    normalize: bool = False,
    axis: int = 0,
    mode: ConvMode = ConvMode.After,
):
```

Applies a geometric adstock transformation, where the effect decays geometrically over time.

**Parameters:**

- `x`: Input tensor.
- `alpha` (`float`): Retention rate (0 to 1).
- `l_max` (`int`): Maximum duration of carryover.
- `normalize` (`bool`): Whether to normalize weights to sum to 1.
- `axis` (`int`): Axis to apply convolution along.
- `mode` (`ConvMode`): Convolution mode.

**Returns:**

- Transformed tensor.

---

### `delayed_adstock`

```python
def delayed_adstock(
    x,
    alpha: float = 0.0,
    theta: int = 0,
    l_max: int = 12,
    normalize: bool = False,
    axis: int = 0,
    mode: ConvMode = ConvMode.After,
):
```

Applies a delayed adstock transformation, allowing the peak effect to be delayed.

**Parameters:**

- `x`: Input tensor.
- `alpha` (`float`): Retention rate (0 to 1).
- `theta` (`int`): Delay of the peak effect (0 to `l_max` - 1).
- `l_max` (`int`): Maximum duration of carryover.
- `normalize` (`bool`): Whether to normalize weights.
- `axis` (`int`): Axis to apply convolution along.
- `mode` (`ConvMode`): Convolution mode.

**Returns:**

- Transformed tensor.

---

### `weibull_adstock`

```python
def weibull_adstock(
    x,
    lam=1,
    k=1,
    l_max: int = 12,
    axis: int = 0,
    mode: ConvMode = ConvMode.After,
    type: WeibullType | str = WeibullType.PDF,
):
```

Applies an adstock transformation based on the Weibull distribution (either PDF or CDF).

**Parameters:**

- `x`: Input tensor.
- `lam` (`float`): Scale parameter (lambda > 0) of the Weibull distribution.
- `k` (`float`): Shape parameter (k > 0) of the Weibull distribution.
- `l_max` (`int`): Maximum duration of carryover.
- `axis` (`int`): Axis to apply convolution along.
- `mode` (`ConvMode`): Convolution mode.
- `type` (`WeibullType | str`): Type of Weibull function (`PDF` or `CDF`).

**Returns:**

- Transformed tensor.

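
The geometric and delayed variants differ only in the weights they feed into the convolution. The sketch below builds those weights in plain Python and applies them to a single series. It is an illustration under assumptions, not the module's tensor implementation: the geometric weights are taken to be `alpha ** lag`, and the delayed weights are assumed to follow the commonly used form `alpha ** ((lag - theta) ** 2)`. The helper names `geometric_weights`, `delayed_weights`, and `apply_adstock` are hypothetical.

```python
def geometric_weights(alpha, l_max, normalize=False):
    """Weights alpha**0, alpha**1, ..., alpha**(l_max - 1)."""
    weights = [alpha ** lag for lag in range(l_max)]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


def delayed_weights(alpha, theta, l_max, normalize=False):
    """Assumed form alpha ** ((lag - theta) ** 2), which peaks at lag == theta."""
    weights = [alpha ** ((lag - theta) ** 2) for lag in range(l_max)]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


def apply_adstock(x, weights):
    """Carry each value of x forward: y[t] = sum over lag of weights[lag] * x[t - lag]."""
    return [
        sum(w * x[t - lag] for lag, w in enumerate(weights) if t - lag >= 0)
        for t in range(len(x))
    ]


spend = [100.0, 0.0, 0.0, 0.0]
geo = apply_adstock(spend, geometric_weights(alpha=0.5, l_max=3))
print(geo)  # [100.0, 50.0, 25.0, 0.0]

delayed = delayed_weights(alpha=0.5, theta=1, l_max=3)
print(delayed)  # [0.5, 1.0, 0.5]
```

With `l_max=3` the carryover stops after two further periods, so the fourth value is zero even though `alpha` is positive.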
## Saturation Functions

These functions model the diminishing returns of an input variable, often used for advertising spend.

### `logistic_saturation`

```python
def logistic_saturation(x, lam: npt.NDArray[np.float64] | float = 0.5):
```

Applies a logistic saturation function: `(1 - exp(-lam * x)) / (1 + exp(-lam * x))`.

**Parameters:**

- `x`: Input tensor.
- `lam` (`float` or `array-like`): Saturation parameter.

**Returns:**

- Transformed tensor.

---

### `tanh_saturation`

```python
def tanh_saturation(
    x: pt.TensorLike,
    b: pt.TensorLike = 0.5,
    c: pt.TensorLike = 0.5,
) -> pt.TensorVariable:
```

Applies a hyperbolic tangent (tanh) saturation function: `b * tanh(x / (b * c))`.

**Parameters:**

- `x`: Input tensor.
- `b` (`float`): Saturation level (max effect).
- `c` (`float`): Controls the shape (related to cost per acquisition at the start).

**Returns:**

- Transformed tensor.

---

### `tanh_saturation_baselined`

```python
def tanh_saturation_baselined(
    x: pt.TensorLike,
    x0: pt.TensorLike,
    gain: pt.TensorLike = 0.5,
    r: pt.TensorLike = 0.5,
) -> pt.TensorVariable:
```

Applies a reparameterised tanh saturation function based on a baseline point `x0`.

**Parameters:**

- `x`: Input tensor.
- `x0`: Baseline input value.
- `gain`: Return on Ad Spend (ROAS) at `x0`, defined as `f(x0) / x0`.
- `r`: Overspend fraction at `x0`, defined as `f(x0) / saturation_level`.

**Returns:**

- Transformed tensor.

---

### `michaelis_menten`

```python
def michaelis_menten(
    x: float | np.ndarray | npt.NDArray[np.float64],
    alpha: float | np.ndarray | npt.NDArray[np.float64],
    lam: float | np.ndarray | npt.NDArray[np.float64],
) -> float | Any:
```

Applies the Michaelis-Menten saturation function: `alpha * x / (lam + x)`.

**Parameters:**

- `x`: Input value(s).
- `alpha`: Maximum effect (saturation level).
- `lam`: Michaelis constant (value of `x` at which effect is half of `alpha`).

**Returns:**

- Transformed value(s).

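
The formulas quoted above can be checked with scalar stand-ins written against the standard library. The sketch below is not the module's tensor code; the `*_sketch` names and the two conversion helpers are hypothetical. The conversion between `(b, c)` and `(x0, gain, r)` is derived directly from the stated definitions `gain = f(x0) / x0` and `r = f(x0) / saturation_level`, with `f(x) = b * tanh(x / (b * c))`.

```python
import math


def logistic_sketch(x, lam=0.5):
    """(1 - exp(-lam * x)) / (1 + exp(-lam * x))"""
    return (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))


def tanh_sketch(x, b=0.5, c=0.5):
    """b * tanh(x / (b * c)); approaches b as x grows."""
    return b * math.tanh(x / (b * c))


def michaelis_menten_sketch(x, alpha, lam):
    """alpha * x / (lam + x); equals alpha / 2 when x == lam."""
    return alpha * x / (lam + x)


def to_baselined(b, c, x0):
    """(b, c) -> (x0, gain, r) with gain = f(x0) / x0 and r = f(x0) / b."""
    y0 = tanh_sketch(x0, b, c)
    return x0, y0 / x0, y0 / b


def from_baselined(x0, gain, r):
    """Invert the definitions: b = gain * x0 / r, c = r / (gain * atanh(r))."""
    b = gain * x0 / r
    c = r / (gain * math.atanh(r))
    return b, c


half_effect = michaelis_menten_sketch(x=3.0, alpha=10.0, lam=3.0)
print(half_effect)  # 5.0

x0, gain, r = to_baselined(b=2.0, c=0.8, x0=1.0)
b_back, c_back = from_baselined(x0, gain, r)  # recovers (2.0, 0.8) up to rounding
```

The round trip through `to_baselined` and `from_baselined` shows that the baselined form carries the same information as `(b, c)`, only expressed relative to the reference point `x0`.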
## Tanh Saturation Parameter Containers

### `TanhSaturationParameters`

```python
class TanhSaturationParameters(NamedTuple):
    b: pt.TensorLike
    c: pt.TensorLike
```

A `NamedTuple` to hold the standard parameters (`b`: saturation, `c`: shape/cost) for the `tanh_saturation` function. Includes a method `.baseline(x0)` to convert to `TanhSaturationBaselinedParameters`.

---

### `TanhSaturationBaselinedParameters`

```python
class TanhSaturationBaselinedParameters(NamedTuple):
    x0: pt.TensorLike
    gain: pt.TensorLike
    r: pt.TensorLike
```

A `NamedTuple` to hold the baselined parameters (`x0`: baseline point, `gain`: ROAS at `x0`, `r`: overspend fraction at `x0`) for the `tanh_saturation_baselined` function. Includes methods `.debaseline()` to convert back to standard parameters and `.rebaseline(x1)` to change the baseline point.

## Lift Test Scaling Functions

These functions are used to scale data from lift tests (experiments) so they can be incorporated into the model, typically alongside observational time-series data.

### `scale_channel_lift_measurements`

```python
def scale_channel_lift_measurements(
    df_lift_test: pd.DataFrame,
    channel_col: str,
    channel_columns: list[str],
    transform: Callable[[np.ndarray], np.ndarray],
) -> pd.DataFrame:
```

Scales the spend (`x`) and spend change (`delta_x`) columns for lift tests related to specific channels, using a provided scaling function (likely the same scaler used for the main channel data).

**Parameters:**

- `df_lift_test`: DataFrame containing lift test results. Must include columns `x`, `delta_x`, and the specified `channel_col`.
- `channel_col`: The name of the column identifying the channel for each lift test row.
- `channel_columns`: A list of all channel names used in the model.
- `transform`: The scaling function (e.g., from a `sklearn.preprocessing.StandardScaler`) to apply.

**Returns:**

- A DataFrame with the same structure as the input, but with `x` and `delta_x` values scaled.

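
The reason `channel_columns` is needed alongside `channel_col` is that a channel scaler typically expects one value per channel, in model order, whereas each lift test row refers to a single channel. The dependency-free sketch below illustrates that idea with plain dictionaries instead of a DataFrame. Everything in it is an assumption made for illustration: the helper `scale_lift_rows`, the max-abs style `max_abs_transform`, the channel names, and the numbers are hypothetical and do not come from the module.

```python
def scale_lift_rows(rows, channel_col, channel_columns, transform):
    """Scale `x` and `delta_x` of each lift-test row with a per-channel transform.

    `transform` takes one value per channel (in `channel_columns` order)
    and returns the scaled values in the same order.
    """
    scaled = []
    for row in rows:
        slot = channel_columns.index(row[channel_col])
        new_row = dict(row)  # leave the input rows untouched
        for key in ("x", "delta_x"):
            wide = [0.0] * len(channel_columns)  # one slot per model channel
            wide[slot] = row[key]
            new_row[key] = transform(wide)[slot]
        scaled.append(new_row)
    return scaled


# Hypothetical scaler: divide each channel by its maximum in the training data.
channels = ["tv", "search"]
channel_max = {"tv": 200.0, "search": 50.0}


def max_abs_transform(values):
    return [v / channel_max[name] for v, name in zip(values, channels)]


lift_tests = [
    {"channel": "tv", "x": 100.0, "delta_x": 20.0, "delta_y": 5.0, "sigma": 1.0},
    {"channel": "search", "x": 25.0, "delta_x": 5.0, "delta_y": 2.0, "sigma": 0.5},
]
scaled_tests = scale_lift_rows(lift_tests, "channel", channels, max_abs_transform)
print(scaled_tests[0]["x"], scaled_tests[0]["delta_x"])  # 0.5 0.1
print(scaled_tests[1]["x"], scaled_tests[1]["delta_x"])  # 0.5 0.1
```

A purely multiplicative scaler is used in the sketch because it treats a level (`x`) and a difference (`delta_x`) consistently; a scaler that also subtracts an offset would shift the difference.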

---

### `scale_target_for_lift_measurements`

```python
def scale_target_for_lift_measurements(
    target: pd.Series,
    transform: Callable[[np.ndarray], np.ndarray],
) -> pd.Series:
```

Scales a target-related Series (like `delta_y` or `sigma` from lift tests) using a provided scaling function (likely the same scaler used for the main target variable).

**Parameters:**

- `target`: A pandas Series containing the values to scale (e.g., `df_lift_test['delta_y']`).
- `transform`: The scaling function to apply.

**Returns:**

- A pandas Series with the scaled values.

---

### `scale_lift_measurements`

```python
def scale_lift_measurements(
    df_lift_test: pd.DataFrame,
    channel_col: str,
    channel_columns: list[str | int],
    channel_transform: Callable[[np.ndarray], np.ndarray],
    target_transform: Callable[[np.ndarray], np.ndarray],
) -> pd.DataFrame:
```

Applies scaling to all relevant columns (`x`, `delta_x`, `delta_y`, `sigma`) in a lift test DataFrame using appropriate channel and target scalers.

**Parameters:**

- `df_lift_test`: DataFrame containing lift test results.
- `channel_col`: Name of the channel identifier column.
- `channel_columns`: List of all channel names in the model.
- `channel_transform`: Scaling function for channel spend (`x`, `delta_x`).
- `target_transform`: Scaling function for target-related values (`delta_y`, `sigma`).

**Returns:**

- A DataFrame with scaled lift test data, ready for model input.

## Utility Functions

### `create_new_spend_data`

```python
def create_new_spend_data(
    spend: np.ndarray,
    adstock_max_lag: int,
    one_time: bool,
    spend_leading_up: np.ndarray | None = None,
) -> np.ndarray:
```

Prepares spend data for calculating out-of-sample response curves or ROI, potentially padding it based on adstock lag.

**Parameters:**

- `spend`: Array of spend values for each channel.
- `adstock_max_lag`: The maximum adstock lag used in the model.
- `one_time`: Boolean indicating if the spend is a one-time impulse or continuous.
- `spend_leading_up`: Optional array representing spend in periods before the main `spend` array (used for initial adstock state).

**Returns:**

- A potentially padded array ready for adstock transformation.

---

*(Private helper function `_swap_columns_and_last_index_level` is used internally by scaling functions for DataFrame manipulation.)*
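
As an illustration of the padding described for `create_new_spend_data`, the sketch below shows one plausible layout using nested lists. The layout is an assumption rather than something stated in this document: `adstock_max_lag` leading rows (zeros unless `spend_leading_up` is given), followed by either a single spend row and `adstock_max_lag` zero rows (one-time) or `adstock_max_lag + 1` repeated spend rows (continuous). The name `sketch_new_spend_data` is hypothetical.

```python
def sketch_new_spend_data(spend, adstock_max_lag, one_time, spend_leading_up=None):
    """Build a (2 * adstock_max_lag + 1) x n_channels grid of spend rows.

    Assumed layout, for illustration only; the module's function may differ.
    """
    n_channels = len(spend)
    if spend_leading_up is None:
        spend_leading_up = [0.0] * n_channels
    if len(spend_leading_up) != n_channels:
        raise ValueError("spend_leading_up must have one value per channel")

    # Rows before the evaluated spend, used to initialise the adstock state.
    lead_in = [list(spend_leading_up) for _ in range(adstock_max_lag)]

    if one_time:
        # A single impulse, then zeros so the carryover can play out.
        body = [list(spend)] + [[0.0] * n_channels for _ in range(adstock_max_lag)]
    else:
        # The same spend repeated in every remaining period.
        body = [list(spend) for _ in range(adstock_max_lag + 1)]
    return lead_in + body


impulse_grid = sketch_new_spend_data([10.0, 20.0], adstock_max_lag=2, one_time=True)
print(impulse_grid)
# [[0.0, 0.0], [0.0, 0.0], [10.0, 20.0], [0.0, 0.0], [0.0, 0.0]]

steady_grid = sketch_new_spend_data([10.0, 20.0], adstock_max_lag=2, one_time=False)
print(steady_grid)
# [[0.0, 0.0], [0.0, 0.0], [10.0, 20.0], [10.0, 20.0], [10.0, 20.0]]
```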