Abacus Preprocessing: Configuration (config.py)

This module provides functions for loading user-defined YAML configuration files and for preparing the detailed model_config dictionary required to initialise and configure an Abacus Marketing Mix Model (MMM). It merges default model settings with custom specifications and can apply data-driven priors derived from historical spend.

Functions

load_config

def load_config(config_file: str) -> tuple[bytes, dict]:

Loads the configuration settings from a specified YAML file.

This function reads the file's raw content, which preserves comments (these are not currently used elsewhere), and parses the YAML structure into a Python dictionary. It handles FileNotFoundError when the path does not exist. It also calls abacus.prepro.seas.update_config_with_prophet_components, which may add Prophet-related settings depending on the config content.

Parameters:

  • config_file (str): The full path to the YAML configuration file.

Returns:

  • tuple[bytes, dict]: A tuple containing:

    • The raw byte content of the configuration file.

    • A dictionary representing the parsed YAML configuration.
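The read-then-parse behaviour described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the name load_config_sketch is hypothetical, PyYAML's yaml.safe_load is assumed as the parser, the re-raised FileNotFoundError is only one possible way to handle a missing file, and the Prophet update step is omitted.

```python
from pathlib import Path

import yaml  # PyYAML, assumed to be the YAML parser in use


def load_config_sketch(config_file: str) -> tuple[bytes, dict]:
    """Return the raw bytes of a YAML file and its parsed dictionary."""
    try:
        # Reading bytes keeps comments and formatting exactly as written.
        raw = Path(config_file).read_bytes()
    except FileNotFoundError as exc:
        # Assumption: re-raise with a clearer message; the real handling may differ.
        raise FileNotFoundError(
            f"Configuration file not found: {config_file}"
        ) from exc
    # safe_load accepts bytes; an empty file parses to None, so fall back to {}.
    parsed = yaml.safe_load(raw) or {}
    return raw, parsed
```

Both return values come from a single read of the file, so the raw bytes and the parsed dictionary always describe the same content.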


setup_and_configure_model

def setup_and_configure_model(
    data: pd.DataFrame,
    config: dict,
    directory: Optional[str] = None,
    filename: str = "model_config.json"
) -> dict:

Prepares the final model_config dictionary used to instantiate an MMM by merging defaults with user specifications and, when requested, applying a data-driven prior sigma to beta_channel.

This function performs several key steps:

  1. Identifies channel columns from the main config.

  2. Calculates the historical spend share for each channel from the provided data.

  3. Calculates a data-driven prior sigma for the beta_channel parameter based on spend shares (using prior_sigma = HALFNORMAL_SCALE * n_channels * spend_share).

  4. Instantiates a dummy model (DelayedSaturatedMMM) to access its default_model_config.

  5. Creates a deep copy of the default_model_config.

  6. If config['custom_priors'] exists and is truthy, it updates the copied model_config with these custom settings.

  7. If config['custom_priors'] exists and config['custom_sigma'] is True, it specifically overrides the sigma value within the beta_channel prior settings in the copied model_config with the calculated prior_sigma.

  8. Recursively converts any lists within the final model_config dictionary into NumPy arrays using convert_list_to_array.

  9. Optionally saves the resulting model_config dictionary as a JSON file to the specified directory and filename using a custom encoder (CustomJSONEncoder) to handle NumPy arrays.
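Steps 2, 3 and 5 to 7 above can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: the function names are hypothetical, HALFNORMAL_SCALE is assumed to be 1 / sqrt(1 - 2 / pi) (the module's actual constant may differ), custom priors are assumed to be merged with a shallow dict.update, and the nested "kwargs" layout of the beta_channel entry is assumed.

```python
import copy

import numpy as np
import pandas as pd

# Assumed value of the scale constant; the module may define it differently.
HALFNORMAL_SCALE = 1 / np.sqrt(1 - 2 / np.pi)


def spend_share_prior_sigma(data: pd.DataFrame, channels: list[str]) -> np.ndarray:
    """Return one prior sigma per channel, proportional to its spend share."""
    total_spend_per_channel = data[channels].sum(axis=0)
    spend_share = total_spend_per_channel / total_spend_per_channel.sum()
    # prior_sigma = HALFNORMAL_SCALE * n_channels * spend_share
    return HALFNORMAL_SCALE * len(channels) * spend_share.to_numpy()


def merge_model_config(default_config: dict, config: dict, prior_sigma: np.ndarray) -> dict:
    """Copy the defaults, apply custom priors, then optionally override sigma."""
    model_config = copy.deepcopy(default_config)  # leave the defaults untouched
    custom_priors = config.get("custom_priors")
    if custom_priors:
        # Assumption: a shallow update, so each custom entry replaces a default entry.
        model_config.update(custom_priors)
        if config.get("custom_sigma") is True:
            # Hypothetical layout: {"beta_channel": {"kwargs": {"sigma": ...}}}
            beta_channel = model_config.setdefault("beta_channel", {})
            beta_channel.setdefault("kwargs", {})["sigma"] = prior_sigma
    return model_config
```

With two channels whose total spend is 40 and 60, the spend shares are 0.4 and 0.6, so the resulting sigmas are HALFNORMAL_SCALE multiplied by 0.8 and 1.2. Multiplying by the number of channels means a channel with an exactly even share receives a sigma equal to HALFNORMAL_SCALE.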

Parameters:

  • data (pd.DataFrame): The DataFrame containing historical marketing spend data (used to calculate spend shares for prior sigma).

  • config (dict): The main configuration dictionary loaded from the YAML file (e.g., via load_config). The function reads keys such as 'media', 'custom_priors', and 'custom_sigma' from it.

  • directory (Optional[str], optional): If provided, the directory path where the final model_config JSON file should be saved. Defaults to None (no saving).

  • filename (str, optional): The name for the saved JSON configuration file. Defaults to "model_config.json".

Returns:

  • dict: The fully prepared model_config dictionary, ready to be passed to the MMM class constructor (e.g., DelayedSaturatedMMM(model_config=...)).


CustomJSONEncoder

class CustomJSONEncoder(json.JSONEncoder):
    # ... implementation ...

A helper class inheriting from json.JSONEncoder. It overrides the encoder's default() method to convert NumPy arrays (np.ndarray) into standard Python lists during JSON serialisation. This allows the model_config dictionary, which may contain NumPy arrays after convert_list_to_array, to be saved correctly as JSON.
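An encoder of this kind can be sketched in a few lines. The class name NumpyArrayEncoder is hypothetical and the body is an assumption about how such an encoder is typically written, not a copy of CustomJSONEncoder.

```python
import json

import numpy as np


class NumpyArrayEncoder(json.JSONEncoder):
    """JSON encoder that serialises NumPy arrays as plain lists."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialise itself.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


encoded = json.dumps({"sigma": np.array([0.8, 1.2])}, cls=NumpyArrayEncoder)
```

Passing the class through the cls argument of json.dump or json.dumps is enough; nested arrays inside dictionaries are handled because the encoder is consulted for every value that cannot be serialised natively.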


convert_list_to_array

def convert_list_to_array(config: dict) -> None:

Recursively traverses a dictionary and converts any list values into NumPy arrays (np.array). This is applied to the model_config dictionary before it’s returned by setup_and_configure_model or saved to JSON.

Parameters:

  • config (dict): The dictionary to process (modified in-place).

Returns:

  • None.
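The in-place recursive conversion can be sketched as below. The function name is hypothetical, and the sketch assumes that only list values (directly or inside nested dictionaries) are converted while every other value is left unchanged.

```python
import numpy as np


def convert_lists_in_place(config: dict) -> None:
    """Replace every list value in a nested dictionary with a NumPy array."""
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries.
            convert_lists_in_place(value)
        elif isinstance(value, list):
            # Reassigning an existing key is safe while iterating over items().
            config[key] = np.array(value)


example = {"sigma": [0.8, 1.2], "nested": {"mu": [0.0]}, "dist": "HalfNormal"}
result = convert_lists_in_place(example)
```

Because the dictionary is mutated in place, the call returns None and the caller keeps working with the same object.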