Core Utilities (utils.py)
This module provides various utility functions used across the ABACUS core library, including Fourier mode generation, parameter estimation for response curves, sigmoid function calculations, and data manipulation helpers.
Functions
generate_fourier_modes
```python
def generate_fourier_modes(
    periods: npt.NDArray[np.float_],
    n_order: int
) -> pd.DataFrame:
```
Generates sine and cosine features (Fourier modes) up to a specified order, typically used for modelling seasonality.
Parameters:
- periods (npt.NDArray[np.float_]): An array representing the time periods, usually scaled between 0 and 1 (e.g., day of year / 365.25).
- n_order (int): The maximum order of Fourier modes to generate. For yearly seasonality, order k corresponds to a frequency of k cycles per year. Must be >= 1.
Returns:
pd.DataFrame: A DataFrame whose columns are the generated Fourier modes (e.g., sin_order_1, cos_order_1, sin_order_2, etc.).
Raises:
ValueError: If n_order is less than 1.
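The following is a minimal sketch of how such features can be built, assuming each mode is sin(2 * pi * k * periods) and cos(2 * pi * k * periods) for k = 1..n_order and the column naming shown above. `fourier_modes_sketch` is a hypothetical name; the library's own implementation may differ in details.

```python
import numpy as np
import pandas as pd


def fourier_modes_sketch(periods: np.ndarray, n_order: int) -> pd.DataFrame:
    """Hypothetical re-implementation: sine/cosine features up to n_order."""
    if n_order < 1:
        raise ValueError("n_order must be greater than or equal to 1")
    columns = {}
    for order in range(1, n_order + 1):
        # Order k completes k full cycles as `periods` goes from 0 to 1.
        angle = 2 * np.pi * order * periods
        columns[f"sin_order_{order}"] = np.sin(angle)
        columns[f"cos_order_{order}"] = np.cos(angle)
    return pd.DataFrame(columns)


# Usage: yearly seasonality from day of year, order 2 gives 4 columns.
day_of_year = np.arange(1, 366)
modes = fourier_modes_sketch(day_of_year / 365.25, n_order=2)
print(modes.columns.tolist())
# ['sin_order_1', 'cos_order_1', 'sin_order_2', 'cos_order_2']
```

With periods scaled to the unit interval, order 1 traces one full cycle per year and order 2 traces two, so higher orders capture sharper seasonal patterns.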
estimate_menten_parameters
```python
def estimate_menten_parameters(
    channel: str,
    original_dataframe: pd.DataFrame,
    contributions: xr.DataArray,
    **kwargs: Any,
) -> List[float]:
```
Estimates the parameters (L, k) for the Michaelis-Menten function (abacus.core._legacy_imports.michaelis_menten) for a specific channel using non-linear least squares curve fitting (scipy.optimize.curve_fit).
Parameters:
- channel (str): The name of the channel to estimate parameters for.
- original_dataframe (pd.DataFrame): The original DataFrame containing the spend data for the channel.
- contributions (xr.DataArray): An xarray DataArray containing the estimated contributions (e.g., from the MMM fit_result), indexed by channel.
- **kwargs: Optional keyword arguments passed to scipy.optimize.curve_fit or used for initial parameter guesses:
  - maxfev (int, optional): Maximum number of function evaluations for curve_fit. Defaults to 5000.
  - lam_initial_estimate (float, optional): Initial guess for the k parameter. Defaults to 0.001.
  - alpha_initial_estimate (float, optional): Initial guess for the L parameter. Defaults to the maximum observed contribution (max(y)).
  - x (array-like, optional): Overrides the spend data extracted from original_dataframe.
  - y (array-like, optional): Overrides the contribution data extracted from contributions.
Returns:
List[float]: A list containing the estimated parameters [L, k].
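A minimal sketch of the fitting step, assuming the Michaelis-Menten curve has the common form L * x / (k + x) (saturation level L, half-saturation point k). The exact definition of abacus.core._legacy_imports.michaelis_menten is not shown here, so treat that form as an assumption. The hypothetical helper takes spend and contribution arrays directly, like the x and y overrides, and skips extracting them for a channel; its defaults mirror the documented ones.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten_sketch(x, L, k):
    """Assumed form: saturates at L and reaches L / 2 at x = k."""
    return L * x / (k + x)


def estimate_menten_sketch(x, y, maxfev=5000, L_initial=None, k_initial=0.001):
    """Hypothetical helper: fit [L, k] by non-linear least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if L_initial is None:
        L_initial = y.max()  # documented default for the L guess: max(y)
    params, _ = curve_fit(
        michaelis_menten_sketch, x, y, p0=[L_initial, k_initial], maxfev=maxfev
    )
    return params.tolist()  # [L, k]


# Usage on synthetic, noise-free data generated with L = 50, k = 20.
spend = np.linspace(0, 100, 50)
contribution = michaelis_menten_sketch(spend, 50.0, 20.0)
L_hat, k_hat = estimate_menten_sketch(spend, contribution, k_initial=10.0)
```

The usage passes an explicit initial guess for k on the scale of the synthetic spend; real data would also carry noise, so the recovered parameters would only approximate the generating ones.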
estimate_sigmoid_parameters
```python
def estimate_sigmoid_parameters(
    channel: str,
    original_dataframe: pd.DataFrame,
    contributions: xr.DataArray,
    **kwargs: Any,
) -> List[float]:
```
Estimates the parameters (alpha, lam) for the sigmoid saturation function (sigmoid_saturation) for a specific channel using non-linear least squares curve fitting (scipy.optimize.curve_fit).
Parameters:
- channel (str): The name of the channel to estimate parameters for.
- original_dataframe (pd.DataFrame): The original DataFrame containing the spend data for the channel.
- contributions (xr.DataArray): An xarray DataArray containing the estimated contributions, indexed by channel.
- **kwargs: Optional keyword arguments passed to scipy.optimize.curve_fit or used for initial parameter guesses:
  - maxfev (int, optional): Maximum number of function evaluations. Defaults to 5000.
  - lam_initial_estimate (float, optional): Initial guess for the lam parameter. Defaults to 0.00001.
  - alpha_initial_estimate (float, optional): Initial guess for the alpha parameter. Defaults to 3 times the maximum observed contribution (3 * max(y)).
  - x (array-like, optional): Overrides the spend data extracted from original_dataframe.
  - y (array-like, optional): Overrides the contribution data extracted from contributions.
Returns:
List[float]: A list containing the estimated parameters [alpha, lam].
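A minimal sketch of the same fitting pattern for the sigmoid curve, using the formula documented for sigmoid_saturation below. The helper names are hypothetical, the arrays are passed in directly (as with the x and y overrides), and the defaults mirror the documented ones; the library's implementation may differ.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid_curve(x, alpha, lam):
    """Documented sigmoid_saturation formula, without input validation so the
    optimizer can evaluate it freely."""
    return (alpha - alpha * np.exp(-lam * x)) / (1 + np.exp(-lam * x))


def estimate_sigmoid_sketch(x, y, maxfev=5000, alpha_initial=None, lam_initial=0.00001):
    """Hypothetical helper: fit [alpha, lam] by non-linear least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if alpha_initial is None:
        alpha_initial = 3 * y.max()  # documented default: 3 * max(y)
    params, _ = curve_fit(
        sigmoid_curve, x, y, p0=[alpha_initial, lam_initial], maxfev=maxfev
    )
    return params.tolist()  # [alpha, lam]


# Usage on synthetic, noise-free data generated with alpha = 40, lam = 0.05.
spend = np.linspace(0, 100, 50)
contribution = sigmoid_curve(spend, 40.0, 0.05)
alpha_hat, lam_hat = estimate_sigmoid_sketch(spend, contribution, lam_initial=0.02)
```

Validation (alpha > 0, lam > 0) is deliberately left out of the fitted function: raising inside the objective would abort curve_fit if an intermediate step strays outside that range.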
sigmoid_saturation
```python
def sigmoid_saturation(
    x: Union[float, np.ndarray, npt.NDArray[np.float64]],
    alpha: Union[float, np.ndarray, npt.NDArray[np.float64]],
    lam: Union[float, np.ndarray, npt.NDArray[np.float64]],
) -> Union[float, Any]:
```
Computes the value of the sigmoid saturation function: (alpha - alpha * exp(-lam * x)) / (1 + exp(-lam * x)).
Parameters:
- x (Union[float, np.ndarray]): Input value(s) (e.g., spend).
- alpha (Union[float, np.ndarray]): Asymptotic maximum value (saturation level). Must be > 0.
- lam (Union[float, np.ndarray]): Parameter controlling the steepness/transition point. Must be > 0.
Returns:
Union[float, np.ndarray]: The calculated sigmoid saturation value(s).
Raises:
ValueError: If alpha or lam is less than or equal to 0.
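A NumPy sketch of the documented formula and validation. Multiplying numerator and denominator by exp(lam * x / 2) shows that the formula equals alpha * tanh(lam * x / 2), so it is 0 at x = 0 and approaches alpha as x grows; the final assertion checks this identity numerically. `sigmoid_saturation_sketch` is a hypothetical name, not the library function.

```python
import numpy as np


def sigmoid_saturation_sketch(x, alpha, lam):
    """Hypothetical re-implementation of the documented formula."""
    alpha = np.asarray(alpha, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if np.any(alpha <= 0) or np.any(lam <= 0):
        raise ValueError("alpha and lam must be greater than 0")
    decay = np.exp(-lam * np.asarray(x, dtype=float))
    return (alpha - alpha * decay) / (1 + decay)


x = np.array([0.0, 10.0, 50.0, 1000.0])
y = sigmoid_saturation_sketch(x, alpha=2.0, lam=0.1)
# Equivalent closed form: alpha * tanh(lam * x / 2).
assert np.allclose(y, 2.0 * np.tanh(0.1 * x / 2))
```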
compute_sigmoid_second_derivative
```python
def compute_sigmoid_second_derivative(
    x: Union[float, np.ndarray, npt.NDArray[np.float64]],
    alpha: Union[float, np.ndarray, npt.NDArray[np.float64]],
    lam: Union[float, np.ndarray, npt.NDArray[np.float64]],
) -> Union[float, Any]:
```
Computes the second derivative of the sigmoid_saturation function with respect to x. Used to find the inflection point.
Parameters:
x, alpha, lam: Same as for sigmoid_saturation.
Returns:
Union[float, np.ndarray]: The second derivative value(s).
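Writing the documented formula as f(x) = alpha * tanh(u) with u = lam * x / 2 gives the closed form f''(x) = -(alpha * lam**2 / 2) * sech(u)**2 * tanh(u). The sketch below (hypothetical names; the library may compute the same quantity in a different but equivalent form) checks that closed form against a central finite difference of the original formula.

```python
import numpy as np


def sigmoid_curve(x, alpha, lam):
    """Documented sigmoid_saturation formula."""
    return (alpha - alpha * np.exp(-lam * x)) / (1 + np.exp(-lam * x))


def second_derivative_sketch(x, alpha, lam):
    """Closed form derived from f(x) = alpha * tanh(lam * x / 2)."""
    u = lam * np.asarray(x, dtype=float) / 2
    sech_sq = 1.0 / np.cosh(u) ** 2
    return -alpha * lam**2 / 2 * sech_sq * np.tanh(u)


alpha, lam, h = 3.0, 0.2, 1e-3
x = np.linspace(0.0, 40.0, 9)
# Central finite difference: an independent numerical check of the closed form.
numeric = (
    sigmoid_curve(x + h, alpha, lam)
    - 2 * sigmoid_curve(x, alpha, lam)
    + sigmoid_curve(x - h, alpha, lam)
) / h**2
assert np.allclose(second_derivative_sketch(x, alpha, lam), numeric, atol=1e-5)
```

Under this closed form the second derivative is 0 at x = 0 and negative for every x > 0.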
find_sigmoid_inflection_point
```python
def find_sigmoid_inflection_point(
    alpha: Union[float, np.ndarray, npt.NDArray[np.float64]],
    lam: Union[float, np.ndarray, npt.NDArray[np.float64]],
) -> Tuple[float, Union[float, np.ndarray]]:
```
Finds the inflection point (point of maximum growth rate) of the sigmoid_saturation function by numerically minimizing the absolute value of the second derivative using scipy.optimize.minimize_scalar.
Parameters:
alpha, lam: Same as for sigmoid_saturation.
Returns:
Tuple[float, Union[float, np.ndarray]]: A tuple containing:
- The x-coordinate of the inflection point.
- The y-coordinate (saturation value) at the inflection point.
standardize_scenarios_dict_keys
```python
def standardize_scenarios_dict_keys(
    d: Dict,
    keywords: List[str]
) -> None:
```
Standardizes keys in a dictionary d by replacing keys that contain a keyword (case-insensitive) with the keyword itself. Modifies the dictionary in-place. Useful for cleaning up scenario dictionaries.
Parameters:
- d (Dict): The dictionary to modify.
- keywords (List[str]): The list of standard keywords to use.
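A minimal sketch of the described in-place renaming. The function name, the example keys, and the keywords "initial" and "optimized" are hypothetical; the sketch also assumes that the first matching keyword wins when a key contains several keywords.

```python
from typing import Dict, List


def standardize_keys_sketch(d: Dict, keywords: List[str]) -> None:
    """Hypothetical re-implementation: rename matching keys in place."""
    for key in list(d.keys()):  # snapshot the keys: the dict changes in the loop
        for keyword in keywords:
            if keyword.lower() in str(key).lower():  # case-insensitive match
                d[keyword] = d.pop(key)
                break  # assumption: the first matching keyword wins


scenarios = {"Initial Budget (EUR)": 1000, "OPTIMIZED_budget": 1200, "notes": "draft"}
standardize_keys_sketch(scenarios, ["initial", "optimized"])
print(scenarios)
# {'notes': 'draft', 'initial': 1000, 'optimized': 1200}
```

Because renamed keys are re-inserted, they move to the end of the dictionary, and if two keys match the same keyword, the later one overwrites the earlier in this sketch.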
apply_sklearn_transformer_across_dim
```python
def apply_sklearn_transformer_across_dim(
    data: xr.DataArray,
    func: Callable[[np.ndarray], np.ndarray],
    dim_name: str,
    combined: bool = False,
) -> xr.DataArray:
```
Applies a scikit-learn-like transformation function (func) across a specified dimension (dim_name) of an xarray DataArray. Preserves attributes.
Parameters:
- data (xr.DataArray): The input DataArray.
- func (Callable): The transformation function (e.g., scaler.transform, scaler.inverse_transform).
- dim_name (str): The name of the dimension to apply the function along.
- combined (bool, optional): Flag indicating whether the input data has combined chain/draw dimensions. Defaults to False.
Returns:
xr.DataArray: The transformed DataArray.
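One way to implement this pattern is xarray.apply_ufunc with vectorize=True, which loops the function over every dimension except dim_name. The sketch below is hypothetical: it assumes func was fitted on a single feature column (so each 1-D slice is reshaped to (n_samples, 1)), it does not model the combined flag, and it restores attributes explicitly. A plain multiplication stands in for a fitted scaler so the example needs no scikit-learn.

```python
from typing import Callable

import numpy as np
import xarray as xr


def apply_across_dim_sketch(
    data: xr.DataArray,
    func: Callable[[np.ndarray], np.ndarray],
    dim_name: str,
) -> xr.DataArray:
    """Hypothetical sketch: apply a 2-D-in/2-D-out transformer along one dim."""
    attrs = dict(data.attrs)  # copied explicitly; apply_ufunc may drop attributes

    def _transform_1d(values: np.ndarray) -> np.ndarray:
        # scikit-learn transformers expect shape (n_samples, n_features).
        return np.asarray(func(values.reshape(-1, 1))).reshape(-1)

    result = xr.apply_ufunc(
        _transform_1d,
        data,
        input_core_dims=[[dim_name]],
        output_core_dims=[[dim_name]],
        vectorize=True,  # loop over all other dimensions (e.g. chain, draw)
    )
    result = result.transpose(*data.dims)  # apply_ufunc moves dim_name last
    result.attrs = attrs
    return result


# Usage with a stand-in for scaler.inverse_transform: multiply by a fixed scale.
samples = xr.DataArray(
    np.arange(12.0).reshape(2, 3, 2),
    dims=("chain", "draw", "date"),
    attrs={"units": "EUR"},
)
rescaled = apply_across_dim_sketch(samples, lambda a: a * 100.0, dim_name="date")
```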
create_new_spend_data
```python
def create_new_spend_data(
    spend: np.ndarray,
    adstock_max_lag: int,
    one_time: bool,
    spend_leading_up: Optional[np.ndarray] = None,
) -> np.ndarray:
```
Constructs an array representing spend over time, suitable for passing to adstock functions when calculating response curves or contributions for specific spend scenarios. It includes padding for the adstock lag.
Parameters:
- spend (np.ndarray): Spend amount for each channel. Shape (n_channels,).
- adstock_max_lag (int): The maximum adstock lag used in the model.
- one_time (bool): If True, spend is treated as a one-time pulse followed by zeros. If False, spend is held constant over the adstock period.
- spend_leading_up (Optional[np.ndarray], optional): Per-channel spend applied to each of the adstock_max_lag periods before the main spend. Shape (n_channels,). Defaults to zeros if None.
Returns:
np.ndarray: An array representing spend over time, including the leading padding and the main spend pulse/period, with shape (2 * adstock_max_lag + 1, n_channels).
(See the docstring in the source code for a plot illustrating the output structure.)
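A NumPy sketch that reproduces the layout described above: adstock_max_lag rows of leading spend, then either a one-period pulse followed by adstock_max_lag zero rows (one_time=True) or adstock_max_lag + 1 rows of constant spend (one_time=False). The function name is hypothetical and the library's implementation may differ.

```python
from typing import Optional

import numpy as np


def create_new_spend_data_sketch(
    spend: np.ndarray,
    adstock_max_lag: int,
    one_time: bool,
    spend_leading_up: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Hypothetical sketch following the documented output layout."""
    spend = np.asarray(spend, dtype=float)
    n_channels = spend.shape[0]
    if spend_leading_up is None:
        spend_leading_up = np.zeros(n_channels)
    # adstock_max_lag rows of pre-period spend, one row per time step.
    leading = np.tile(np.asarray(spend_leading_up, dtype=float), (adstock_max_lag, 1))
    if one_time:
        # A single pulse followed by adstock_max_lag periods of zero spend.
        main = np.vstack([spend, np.zeros((adstock_max_lag, n_channels))])
    else:
        # Constant spend for the pulse period plus the full adstock window.
        main = np.tile(spend, (adstock_max_lag + 1, 1))
    return np.vstack([leading, main])  # (2 * adstock_max_lag + 1, n_channels)


pulse = create_new_spend_data_sketch(np.array([10.0, 20.0]), adstock_max_lag=2, one_time=True)
# rows: [0, 0], [0, 0], [10, 20], [0, 0], [0, 0]
```

The trailing adstock_max_lag rows let an adstock transform carry the effect of the pulse forward over the full lag window.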