Abacus Preprocessing: Configuration (config.py)
This module provides functions for loading user-defined configuration files (YAML) and preparing the detailed model_config dictionary required to initialise and configure an Abacus Marketing Mix Model (MMM). It handles merging default model settings with custom specifications and potentially calculating data-driven priors.
Functions
load_config
def load_config(config_file: str) -> tuple[bytes, dict]:
Loads the configuration settings from a specified YAML file.
This function reads the entire file as raw bytes, which preserves comments (the raw content is not currently used elsewhere), and parses the YAML structure into a Python dictionary. It handles FileNotFoundError when the file is missing. It also calls abacus.prepro.seas.update_config_with_prophet_components, which may add Prophet-related settings depending on the config content.
Parameters:
config_file (str): The full path to the YAML configuration file.
Returns:
tuple[bytes, dict]: A tuple containing:
The raw byte content of the configuration file.
A dictionary representing the parsed YAML configuration.
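As a rough illustration of the read-then-parse pattern described above, the following sketch returns both the raw bytes and the parsed dictionary. It is not the Abacus implementation: load_config_sketch is a hypothetical name, the re-raise in the FileNotFoundError handler is an assumption about how the error is handled, and the Prophet update step is omitted. It relies on PyYAML's yaml.safe_load.

```python
from pathlib import Path
import tempfile

import yaml  # PyYAML


def load_config_sketch(config_file: str) -> tuple[bytes, dict]:
    """Hypothetical stand-in: return the raw file bytes and the parsed YAML."""
    try:
        raw = Path(config_file).read_bytes()
    except FileNotFoundError as exc:
        # Assumption: re-raise with a clearer message; the real handler may differ.
        raise FileNotFoundError(f"Config file not found: {config_file}") from exc
    # safe_load accepts bytes; an empty file parses to None, so fall back to {}.
    parsed = yaml.safe_load(raw) or {}
    return raw, parsed


# Usage with a throwaway file: comments survive in the raw bytes only.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.yaml"
    path.write_text("# tuning notes\ncustom_sigma: true\n", encoding="utf-8")
    raw_bytes, config = load_config_sketch(str(path))
```

Keeping the raw bytes alongside the parsed dictionary is what allows comments to be retained, since YAML parsing discards them.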
setup_and_configure_model
def setup_and_configure_model(
data: pd.DataFrame,
config: dict,
directory: Optional[str] = None,
filename: str = "model_config.json"
) -> dict:
Prepares the final model_config dictionary used to instantiate an MMM, merging defaults with user specifications and potentially calculating data-driven priors.
This function performs several key steps:
Identifies channel columns from the main config.
Calculates the historical spend share for each channel from the provided data.
Calculates a data-driven prior sigma for the beta_channel parameter based on spend shares (using prior_sigma = HALFNORMAL_SCALE * n_channels * spend_share).
Instantiates a dummy model (DelayedSaturatedMMM) to access its default_model_config.
Creates a deep copy of the default_model_config.
If config['custom_priors'] exists and is truthy, it updates the copied model_config with these custom settings.
If config['custom_priors'] exists and config['custom_sigma'] is True, it specifically overrides the sigma value within the beta_channel prior settings in the copied model_config with the calculated prior_sigma.
Recursively converts any lists within the final model_config dictionary into NumPy arrays using convert_list_to_array.
Optionally saves the resulting model_config dictionary as a JSON file to the specified directory and filename using a custom encoder (CustomJSONEncoder) to handle NumPy arrays.
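The spend-share prior and the merge steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Abacus code. The value of HALFNORMAL_SCALE is not given on this page, so the sketch assumes 1 / sqrt(1 - 2 / pi). The function names, the channel list passed in directly, and the flat layout of the beta_channel entry are all hypothetical. A plain dictionary stands in for the default_model_config that the real function reads from a dummy DelayedSaturatedMMM.

```python
import copy

import numpy as np
import pandas as pd

# Assumption: this page does not define the constant; this value is a guess.
HALFNORMAL_SCALE = 1 / np.sqrt(1 - 2 / np.pi)


def channel_prior_sigma(data: pd.DataFrame, channel_columns: list[str]) -> np.ndarray:
    """prior_sigma = HALFNORMAL_SCALE * n_channels * spend_share, per channel."""
    total_spend_per_channel = data[channel_columns].sum(axis=0)
    spend_share = total_spend_per_channel / total_spend_per_channel.sum()
    n_channels = len(channel_columns)
    return (HALFNORMAL_SCALE * n_channels * spend_share).to_numpy()


def build_model_config(default_model_config: dict, config: dict, prior_sigma) -> dict:
    """Deep-copy the defaults, apply custom priors, then optionally override sigma."""
    model_config = copy.deepcopy(default_model_config)
    if config.get("custom_priors"):
        # Top-level update: each custom entry replaces the matching default entry.
        model_config.update(config["custom_priors"])
    if "custom_priors" in config and config.get("custom_sigma"):
        # Hypothetical layout: sigma stored directly under "beta_channel".
        model_config.setdefault("beta_channel", {})["sigma"] = prior_sigma
    return model_config


# Usage with made-up spend data (two channels plus a target column).
data = pd.DataFrame(
    {"tv": [100.0, 300.0], "search": [50.0, 150.0], "sales": [1000.0, 1200.0]}
)
prior_sigma = channel_prior_sigma(data, ["tv", "search"])

defaults = {"intercept": {"mu": 0.0, "sigma": 2.0}, "beta_channel": {"sigma": 2.0}}
config = {"custom_priors": {"intercept": {"mu": 1.0, "sigma": 1.0}}, "custom_sigma": True}
model_config = build_model_config(defaults, config, prior_sigma)
```

Because the shares sum to one, multiplying by n_channels keeps the average prior sigma across channels at HALFNORMAL_SCALE, while channels with larger historical spend receive a wider prior. The deep copy ensures the stand-in defaults are left untouched by the override.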
Parameters:
data (pd.DataFrame): The DataFrame containing historical marketing spend data (used to calculate spend shares for prior sigma).
config (dict): The main configuration dictionary loaded from the YAML file (e.g., via load_config). It’s checked for keys like 'media', 'custom_priors', and 'custom_sigma'.
directory (Optional[str], optional): If provided, the directory path where the final model_config JSON file should be saved. Defaults to None (no saving).
filename (str, optional): The name for the saved JSON configuration file. Defaults to "model_config.json".
Returns:
dict: The fully prepared model_config dictionary, ready to be passed to the MMM class constructor (e.g., DelayedSaturatedMMM(model_config=...)).
CustomJSONEncoder
class CustomJSONEncoder(json.JSONEncoder):
# ... implementation ...
A helper class inheriting from json.JSONEncoder. It overrides the default method to automatically convert NumPy arrays (np.ndarray) into standard Python lists during JSON serialisation. This allows the model_config dictionary, which may contain NumPy arrays after convert_list_to_array, to be saved correctly as JSON.
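Since the implementation is elided above, here is a minimal sketch of an encoder with the described behaviour. The class name NumpyJSONEncoderSketch is hypothetical, and the actual CustomJSONEncoder may handle more types than np.ndarray.

```python
import json

import numpy as np


class NumpyJSONEncoderSketch(json.JSONEncoder):
    """Hypothetical stand-in: serialise NumPy arrays as plain lists."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


payload = {"beta_channel": {"sigma": np.array([1.5, 0.5])}}
encoded = json.dumps(payload, cls=NumpyJSONEncoderSketch)
```

The json module calls default only for objects it cannot serialise natively, so ordinary dictionaries, strings, and numbers pass through unchanged.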
convert_list_to_array
def convert_list_to_array(config: dict) -> None:
Recursively traverses a dictionary and converts any list values into NumPy arrays (np.array). This is applied to the model_config dictionary before it’s returned by setup_and_configure_model or saved to JSON.
Parameters:
config (dict): The dictionary to process (modified in-place).
Returns:
None.
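One way such an in-place recursive conversion could look is sketched below. The function name is hypothetical, and the real function may treat edge cases (for example, lists nested inside lists) differently.

```python
import numpy as np


def convert_list_to_array_sketch(config: dict) -> None:
    """Hypothetical stand-in: replace list values with NumPy arrays, in place."""
    # Iterate over a snapshot so that reassigning values is safe.
    for key, value in list(config.items()):
        if isinstance(value, dict):
            convert_list_to_array_sketch(value)  # recurse into nested settings
        elif isinstance(value, list):
            config[key] = np.array(value)


model_config = {"beta_channel": {"sigma": [1.5, 0.5]}, "likelihood": {"dist": "Normal"}}
result = convert_list_to_array_sketch(model_config)
```

Because the dictionary is modified in place, the call returns None and the caller keeps using the original object.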