Core Utilities (utils.py)

This module collects utility functions used across the ABACUS core library: Fourier mode generation, parameter estimation for response curves, sigmoid saturation calculations, and data manipulation helpers.

Functions

generate_fourier_modes

def generate_fourier_modes(
    periods: npt.NDArray[np.float_],
    n_order: int
) -> pd.DataFrame:

Generates sine and cosine features (Fourier modes) up to a specified order, typically used for modelling seasonality.

Parameters:

  • periods (npt.NDArray[np.float_]): An array representing the time periods, usually scaled between 0 and 1 (e.g., day of year / 365.25).

  • n_order (int): The maximum order of Fourier modes to generate. For yearly seasonality, order k corresponds to a frequency of k cycles per year. Must be >= 1.

Returns:

  • pd.DataFrame: A DataFrame where columns represent the generated Fourier modes (e.g., sin_order_1, cos_order_1, sin_order_2, etc.).

Raises:

  • ValueError: If n_order is less than 1.
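
As a rough illustration of the behaviour described above, the sketch below builds the same kind of sine/cosine columns with NumPy and pandas. The name fourier_modes_sketch is hypothetical, and the 2 * pi * order * periods argument is an assumption based on "order k corresponds to k cycles per year"; it is not the library's code.

```python
import numpy as np
import pandas as pd


def fourier_modes_sketch(periods: np.ndarray, n_order: int) -> pd.DataFrame:
    """Hypothetical stand-in for generate_fourier_modes (not the library code)."""
    if n_order < 1:
        raise ValueError("n_order must be greater than or equal to 1")
    columns = {}
    for order in range(1, n_order + 1):
        # Assumed convention: `order` full cycles per unit of `periods`.
        angle = 2 * np.pi * order * periods
        columns[f"sin_order_{order}"] = np.sin(angle)
        columns[f"cos_order_{order}"] = np.cos(angle)
    return pd.DataFrame(columns)


# Yearly seasonality: scale day of year so that one year is one unit.
day_of_year = np.arange(1, 366)
modes = fourier_modes_sketch(day_of_year / 365.25, n_order=2)
print(list(modes.columns))
```

Each additional order adds one sine and one cosine column, so n_order = 2 yields four features per row.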

estimate_menten_parameters

def estimate_menten_parameters(
    channel: str,
    original_dataframe: pd.DataFrame,
    contributions: xr.DataArray,
    **kwargs: Any,
) -> List[float]:

Estimates the parameters (L, k) for the Michaelis-Menten function (abacus.core._legacy_imports.michaelis_menten) for a specific channel using non-linear least squares curve fitting (scipy.optimize.curve_fit).

Parameters:

  • channel (str): The name of the channel to estimate parameters for.

  • original_dataframe (pd.DataFrame): The original DataFrame containing the spend data for the channel.

  • contributions (xr.DataArray): An xarray DataArray containing the estimated contributions (e.g., from the MMM fit_result), indexed by channel.

  • **kwargs: Optional keyword arguments passed to scipy.optimize.curve_fit or used for initial parameter guesses:

    • maxfev (int, optional): Maximum number of function evaluations for curve_fit. Defaults to 5000.

    • lam_initial_estimate (float, optional): Initial guess for the k parameter. Defaults to 0.001.

    • alpha_initial_estimate (float, optional): Initial guess for the L parameter. Defaults to the maximum observed contribution (max(y)).

    • x (array-like, optional): Overrides the spend data extracted from original_dataframe.

    • y (array-like, optional): Overrides the contribution data extracted from contributions.

Returns:

  • List[float]: A list containing the estimated parameters [L, k].
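
The fitting step can be sketched on toy data as follows. This is a minimal illustration, not the library's implementation: the helper names are hypothetical, the Michaelis-Menten form L * x / (k + x) is assumed, and the sketch takes spend and contribution arrays directly instead of extracting them from original_dataframe and contributions.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten_sketch(x, L, k):
    """Assumed functional form: L * x / (k + x)."""
    return L * x / (k + x)


def estimate_menten_sketch(x, y, maxfev=5000, lam_initial_estimate=0.001,
                           alpha_initial_estimate=None):
    """Hypothetical stand-in taking spend (x) and contributions (y) directly."""
    if alpha_initial_estimate is None:
        alpha_initial_estimate = float(np.max(y))  # documented default: max(y)
    p0 = [alpha_initial_estimate, lam_initial_estimate]
    popt, _ = curve_fit(michaelis_menten_sketch, x, y, p0=p0, maxfev=maxfev)
    return [float(value) for value in popt]


# Noise-free toy data generated with L = 100 and k = 20.
spend = np.linspace(1.0, 200.0, 100)
contribution = michaelis_menten_sketch(spend, L=100.0, k=20.0)

# An explicit starting guess for k on the scale of the toy spend data.
L_hat, k_hat = estimate_menten_sketch(spend, contribution, lam_initial_estimate=10.0)
print(L_hat, k_hat)
```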

estimate_sigmoid_parameters

def estimate_sigmoid_parameters(
    channel: str,
    original_dataframe: pd.DataFrame,
    contributions: xr.DataArray,
    **kwargs: Any,
) -> List[float]:

Estimates the parameters (alpha, lam) for the sigmoid saturation function (sigmoid_saturation) for a specific channel using non-linear least squares curve fitting (scipy.optimize.curve_fit).

Parameters:

  • channel (str): The name of the channel to estimate parameters for.

  • original_dataframe (pd.DataFrame): The original DataFrame containing the spend data for the channel.

  • contributions (xr.DataArray): An xarray DataArray containing the estimated contributions, indexed by channel.

  • **kwargs: Optional keyword arguments passed to scipy.optimize.curve_fit or used for initial parameter guesses:

    • maxfev (int, optional): Maximum number of function evaluations. Defaults to 5000.

    • lam_initial_estimate (float, optional): Initial guess for the lam parameter. Defaults to 0.00001.

    • alpha_initial_estimate (float, optional): Initial guess for the alpha parameter. Defaults to 3 times the maximum observed contribution (3 * max(y)).

    • x (array-like, optional): Overrides the spend data extracted from original_dataframe.

    • y (array-like, optional): Overrides the contribution data extracted from contributions.

Returns:

  • List[float]: A list containing the estimated parameters [alpha, lam].

sigmoid_saturation

def sigmoid_saturation(
    x: Union[float, np.ndarray, npt.NDArray[np.float64]],
    alpha: Union[float, np.ndarray, npt.NDArray[np.float64]],
    lam: Union[float, np.ndarray, npt.NDArray[np.float64]],
) -> Union[float, Any]:

Computes the value of the sigmoid saturation function: (alpha - alpha * exp(-lam * x)) / (1 + exp(-lam * x)).

Parameters:

  • x (Union[float, np.ndarray]): Input value(s) (e.g., spend).

  • alpha (Union[float, np.ndarray]): Asymptotic maximum value (saturation level). Must be > 0.

  • lam (Union[float, np.ndarray]): Rate parameter controlling how steeply the curve rises toward alpha; larger values reach saturation at lower x. Must be > 0.

Returns:

  • Union[float, np.ndarray]: The calculated sigmoid saturation value(s).

Raises:

  • ValueError: If alpha or lam is less than or equal to 0.
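
The formula and the validation rule above can be reproduced in a few lines of NumPy. The function name sigmoid_saturation_sketch is hypothetical and the error message is an assumption; only the formula and the "must be > 0" checks come from this page.

```python
import numpy as np


def sigmoid_saturation_sketch(x, alpha, lam):
    """Hypothetical re-implementation of the documented formula."""
    if np.any(np.asarray(alpha) <= 0) or np.any(np.asarray(lam) <= 0):
        raise ValueError("alpha and lam must be greater than 0")
    decay = np.exp(-np.asarray(lam) * np.asarray(x, dtype=float))
    return (alpha - alpha * decay) / (1 + decay)


spend = np.array([0.0, 1.0, 5.0, 50.0])
response = sigmoid_saturation_sketch(spend, alpha=10.0, lam=0.5)
print(response)

# The same curve written as a hyperbolic tangent: alpha * tanh(lam * x / 2).
print(10.0 * np.tanh(0.5 * spend / 2))
```

The response starts at 0 for zero spend and approaches alpha as spend grows, which is why alpha is described as the saturation level.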

compute_sigmoid_second_derivative

def compute_sigmoid_second_derivative(
    x: Union[float, np.ndarray, npt.NDArray[np.float64]],
    alpha: Union[float, np.ndarray, npt.NDArray[np.float64]],
    lam: Union[float, np.ndarray, npt.NDArray[np.float64]],
) -> Union[float, Any]:

Computes the second derivative of the sigmoid_saturation function with respect to x. Used by find_sigmoid_inflection_point to locate the inflection point.

Parameters:

  • x, alpha, lam: Same as sigmoid_saturation.

Returns:

  • Union[float, np.ndarray]: The second derivative value(s).
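
Differentiating the sigmoid_saturation formula twice with respect to x gives 2 * alpha * lam^2 * u * (u - 1) / (1 + u)^3, where u = exp(-lam * x). The sketch below checks that closed form against a central finite difference. The function names are hypothetical, and the library may arrange the terms differently.

```python
import numpy as np


def sigmoid_saturation_sketch(x, alpha, lam):
    decay = np.exp(-lam * x)
    return (alpha - alpha * decay) / (1 + decay)


def sigmoid_second_derivative_sketch(x, alpha, lam):
    """Closed form obtained by differentiating the documented formula twice."""
    decay = np.exp(-lam * x)
    return 2 * alpha * lam**2 * decay * (decay - 1) / (1 + decay) ** 3


x = np.linspace(0.5, 10.0, 20)
alpha, lam, h = 10.0, 0.5, 1e-3

# Central finite-difference approximation of the second derivative.
finite_difference = (
    sigmoid_saturation_sketch(x + h, alpha, lam)
    - 2 * sigmoid_saturation_sketch(x, alpha, lam)
    + sigmoid_saturation_sketch(x - h, alpha, lam)
) / h**2
closed_form = sigmoid_second_derivative_sketch(x, alpha, lam)
print(np.max(np.abs(closed_form - finite_difference)))
```

Under this closed form the second derivative is zero at x = 0 and negative for every x > 0.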

find_sigmoid_inflection_point

def find_sigmoid_inflection_point(
    alpha: Union[float, np.ndarray, npt.NDArray[np.float64]],
    lam: Union[float, np.ndarray, npt.NDArray[np.float64]],
) -> Tuple[float, Union[float, np.ndarray]]:

Finds the inflection point (point of maximum growth rate) of the sigmoid_saturation function by numerically minimizing the absolute value of the second derivative using scipy.optimize.minimize_scalar.

Parameters:

  • alpha, lam: Same as sigmoid_saturation.

Returns:

  • Tuple[float, Union[float, np.ndarray]]: A tuple containing:

    • The x-coordinate of the inflection point.

    • The y-coordinate (saturation value) at the inflection point.

standardize_scenarios_dict_keys

def standardize_scenarios_dict_keys(
    d: Dict,
    keywords: List[str]
) -> None:

Standardizes the keys of a dictionary d: any key that contains one of the given keywords (matched case-insensitively) is replaced by the keyword itself. The dictionary is modified in place and nothing is returned. Useful for cleaning up scenario dictionaries.

Parameters:

  • d (Dict): The dictionary to modify.

  • keywords (List[str]): The list of standard keywords to use.
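
A minimal sketch of this renaming, assuming a plain substring match and that the first matching key wins when several keys contain the same keyword. Neither detail is stated on this page, and standardize_keys_sketch is a hypothetical name.

```python
from typing import Dict, List


def standardize_keys_sketch(d: Dict, keywords: List[str]) -> None:
    """Hypothetical stand-in: rename the first key containing each keyword."""
    for keyword in keywords:
        for key in list(d.keys()):  # iterate over a copy, since d is mutated
            if keyword.lower() in str(key).lower():
                d[keyword] = d.pop(key)
                break


scenario = {"Budget_Total": 1000, "channel NOISE levels": 0.05}
standardize_keys_sketch(scenario, ["budget", "noise"])
print(scenario)
```

Because the function mutates its argument and returns None, callers that need the original dictionary should pass a copy.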

apply_sklearn_transformer_across_dim

def apply_sklearn_transformer_across_dim(
    data: xr.DataArray,
    func: Callable[[np.ndarray], np.ndarray],
    dim_name: str,
    combined: bool = False,
) -> xr.DataArray:

Applies a scikit-learn-style transformation function (func) across a specified dimension (dim_name) of an xarray DataArray. The attributes of the input DataArray are preserved on the result.

Parameters:

  • data (xr.DataArray): The input DataArray.

  • func (Callable): The transformation function (e.g., scaler.transform, scaler.inverse_transform).

  • dim_name (str): The name of the dimension to apply the function along.

  • combined (bool, optional): Whether the input data has its chain and draw dimensions combined into a single dimension. Defaults to False.

Returns:

  • xr.DataArray: The transformed DataArray.

create_new_spend_data

def create_new_spend_data(
    spend: np.ndarray,
    adstock_max_lag: int,
    one_time: bool,
    spend_leading_up: Optional[np.ndarray] = None,
) -> np.ndarray:

Constructs an array representing spend over time, suitable for passing to adstock functions when calculating response curves or contributions for specific spend scenarios. It includes padding for the adstock lag.

Parameters:

  • spend (np.ndarray): Array representing the spend amount(s) for each channel. Shape (n_channels,).

  • adstock_max_lag (int): The maximum adstock lag used in the model.

  • one_time (bool): If True, the spend is treated as a one-time pulse followed by adstock_max_lag periods of zero spend. If False, the spend is held constant for the adstock_max_lag + 1 periods that follow the leading padding.

  • spend_leading_up (Optional[np.ndarray], optional): Per-channel spend applied in each of the adstock_max_lag periods before the main spend. Shape (n_channels,). Defaults to zeros if None.

Returns:

  • np.ndarray: An array representing spend over time, including leading padding and the main spend pulse/period, with shape (2 * adstock_max_lag + 1, n_channels).

(See the docstring in the source code for a plot illustrating the output structure.)
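
The layout of the returned array can be sketched with NumPy as follows. This is a reconstruction from the parameter descriptions and the documented output shape, not the library's code, and new_spend_data_sketch is a hypothetical name.

```python
import numpy as np


def new_spend_data_sketch(spend, adstock_max_lag, one_time, spend_leading_up=None):
    """Hypothetical stand-in built from the documented output shape."""
    spend = np.asarray(spend, dtype=float)
    if spend_leading_up is None:
        spend_leading_up = np.zeros_like(spend)
    # adstock_max_lag identical rows of leading spend.
    leading = np.tile(np.asarray(spend_leading_up, dtype=float), (adstock_max_lag, 1))
    if one_time:
        # One pulse row followed by adstock_max_lag rows of zeros.
        main = np.vstack([spend, np.zeros((adstock_max_lag, spend.size))])
    else:
        # The same spend repeated for adstock_max_lag + 1 rows.
        main = np.tile(spend, (adstock_max_lag + 1, 1))
    return np.vstack([leading, main])


pulse = new_spend_data_sketch([1.0, 2.0], adstock_max_lag=2, one_time=True)
constant = new_spend_data_sketch(
    [1.0, 2.0], adstock_max_lag=2, one_time=False, spend_leading_up=[5.0, 6.0]
)
print(pulse.shape)
print(constant)
```

With two channels and adstock_max_lag = 2, both calls return five rows: two of leading spend, then either the pulse and two zero rows or three rows of constant spend.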