Configuration Reference

This document is a reference for every parameter available in the Abacus MMM config.yaml file.

File Structure Overview

# Data handling
raw_data_granularity: weekly # or daily
train_test_ratio: 0.9 # optional, default 0.9
data_rows: # optional
  total: 156 # optional
  start_date: '2021-01-01' # optional
  end_date: '2023-12-31' # optional

# Column mapping
ignore_cols: # optional
  - unused_column_1
date_col: date # optional, default 'date'
target_col: sales # required
control_columns: # optional
  - competitor_spend
  - is_promo_week
media: # required, list of media channels
  - display_name: TV
    impressions_col: tv_impressions
    spend_col: tv_spend
  - display_name: Search
    impressions_col: search_clicks # Can be clicks, impressions, etc.
    spend_col: search_cost
  # ... more media channels

# Model/Sampler Parameters
tune: 2000 # optional, default 2000
draws: 2000 # optional, default 2000
chains: 4 # optional, default 4
adstock_max_lag: 4 # optional, default 4
yearly_seasonality: 3 # optional, number of Fourier modes (e.g., 3) or null
target_accept: 0.95 # optional, default depends on PyMC sampler (e.g., 0.8 or 0.95)
seed: 42 # optional

# Custom Priors (Optional Section)
custom_priors:
  intercept:
    dist: Normal # e.g., Normal, LogNormal
    kwargs: { mu: 0, sigma: 2 }
  beta_channel:
    dist: HalfNormal # e.g., HalfNormal, Normal
    kwargs: { sigma: 2 }
  alpha: # Adstock decay rate
    dist: Beta
    kwargs: { alpha: 1, beta: 3 }
  lam: # Saturation parameter
    dist: Gamma
    kwargs: { alpha: 3, beta: 1 }
  likelihood:
    dist: Normal # e.g., Normal, StudentT
    kwargs:
      sigma: # Error term distribution
        dist: HalfNormal
        kwargs: { sigma: 2 }
  gamma_control: # Control variable coefficients
    dist: Normal
    kwargs: { mu: 0, sigma: 2 }
  gamma_fourier: # Seasonality coefficients
    dist: Laplace
    kwargs: { mu: 0, b: 1 }
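
To make the required/optional split in the overview concrete, here is a minimal sketch of how documented defaults could be applied to an already-parsed config. It assumes the YAML has been loaded into a plain dict (for example with a YAML parser). The names DEFAULTS, REQUIRED and with_defaults are hypothetical and are not part of Abacus; only the default values themselves come from this reference.

```python
# Hypothetical helper, not part of Abacus: fills the defaults documented
# in this reference and rejects configs that lack a required key.
DEFAULTS = {
    "train_test_ratio": 0.9,
    "date_col": "date",
    "tune": 2000,
    "draws": 2000,
    "chains": 4,
    "adstock_max_lag": 4,
}
REQUIRED = ("raw_data_granularity", "target_col", "media")


def with_defaults(config):
    """Return a copy of config with documented defaults filled in."""
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    if config["raw_data_granularity"] not in ("daily", "weekly"):
        raise ValueError("raw_data_granularity must be 'daily' or 'weekly'")
    # Values given by the user win over the defaults.
    return {**DEFAULTS, **config}


example = {
    "raw_data_granularity": "weekly",
    "target_col": "sales",
    "media": [
        {"display_name": "TV", "impressions_col": "tv_impressions", "spend_col": "tv_spend"},
    ],
    "chains": 2,
}
resolved = with_defaults(example)
print(resolved["chains"], resolved["draws"], resolved["date_col"])  # 2 2000 date
```

Keys without a documented default (such as seed or yearly_seasonality) are simply left absent in this sketch.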

Parameter Details

Data Handling Options

  • raw_data_granularity

    • Description: The time frequency of the input data rows.

    • Type: string

    • Allowed Values: "daily", "weekly"

    • Required: Yes

  • train_test_ratio

    • Description: Proportion of the data (ordered by date) to use for training. The rest is used for out-of-sample testing. Ignored if data_rows with start_date/end_date is used.

    • Type: float

    • Range: 0.5 to 1.0

    • Default: 0.9

    • Required: No

  • data_rows

    • Description: Defines the specific subset of data to use based on row count or dates. Overrides train_test_ratio if present.

    • Type: object

    • Required: No

    • Properties:

      • total (Optional int): Total number of rows to use from the start of the dataset.

      • start_date (Optional string): Start date (inclusive, format YYYY-MM-DD).

      • end_date (Optional string): End date (inclusive, format YYYY-MM-DD).
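
The options above can be illustrated with a small sketch of an inclusive date window and a chronological train/test split. This is an illustration only: the function names are hypothetical, the rows are made-up data, and rounding the split point down is an assumption, not documented Abacus behavior.

```python
from datetime import date, timedelta


def filter_by_date(rows, date_col="date", start_date=None, end_date=None):
    """Keep rows whose date lies in the inclusive [start_date, end_date] window."""
    start = date.fromisoformat(start_date) if start_date else date.min
    end = date.fromisoformat(end_date) if end_date else date.max
    return [r for r in rows if start <= date.fromisoformat(r[date_col]) <= end]


def split_train_test(rows, date_col="date", train_test_ratio=0.9):
    """Order rows by date, then cut them into a training and a test part."""
    ordered = sorted(rows, key=lambda r: date.fromisoformat(r[date_col]))
    cut = int(len(ordered) * train_test_ratio)  # rounding down is an assumption
    return ordered[:cut], ordered[cut:]


# 156 weekly rows starting 2021-01-04 (made-up data for illustration).
rows = [
    {"date": (date(2021, 1, 4) + timedelta(weeks=i)).isoformat(), "sales": 100 + i}
    for i in range(156)
]

train, test = split_train_test(rows, train_test_ratio=0.9)
print(len(train), len(test))  # 140 16

window = filter_by_date(rows, start_date="2021-01-04", end_date="2021-01-25")
print(len(window))  # 4
```

Because the split follows date order, the test rows are always the most recent ones, which matches the out-of-sample intent described for train_test_ratio.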

Column Definitions

  • ignore_cols

    • Description: A list of column names from the input CSV to exclude from any processing or modeling.

    • Type: list of string

    • Required: No

  • date_col

    • Description: The name of the column containing dates. Dates must be parsable, ideally in YYYY-MM-DD format.

    • Type: string

    • Default: "date"

    • Required: No (uses default if omitted)

  • target_col

    • Description: The name of the column representing the target variable (e.g., sales, conversions).

    • Type: string

    • Required: Yes

  • control_columns

    • Description: A list of column names representing control variables or external factors (e.g., competitor activity, promotions, economic indicators). These are included as linear regressors in the model. Ensure these columns are numeric (encode categorical variables beforehand).

    • Type: list of string

    • Required: No

  • media

    • Description: A list defining each media channel to be included in the model.

    • Type: list of object

    • Required: Yes

    • Object Properties (for each list item):

      • display_name (string, required): Name used for the channel in reports and plots.

      • impressions_col (string, required): Column name containing the volume metric for the channel (e.g., impressions, clicks, GRPs). Used in the adstock/saturation transformation.

      • spend_col (string, required): Column name containing the cost/spend data for the channel. Used for ROI calculations.
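
Since every column named in this section must exist in the input CSV, a quick pre-flight check can catch typos before sampling starts. The sketch below is a hypothetical helper, not an Abacus function; it assumes the config is a parsed dict and the CSV header is a list of column names.

```python
def required_columns(config):
    """Collect every column name the config refers to."""
    cols = [config.get("date_col", "date"), config["target_col"]]
    cols += config.get("control_columns", [])
    for channel in config["media"]:
        for key in ("display_name", "impressions_col", "spend_col"):
            if key not in channel:
                raise ValueError(f"media entry {channel!r} is missing {key!r}")
        cols += [channel["impressions_col"], channel["spend_col"]]
    return cols


def missing_columns(config, header):
    """Return referenced columns that are absent from the header or ignored."""
    available = set(header) - set(config.get("ignore_cols", []))
    return [col for col in required_columns(config) if col not in available]


config = {
    "target_col": "sales",
    "ignore_cols": ["unused_column_1"],
    "control_columns": ["competitor_spend", "is_promo_week"],
    "media": [
        {"display_name": "TV", "impressions_col": "tv_impressions", "spend_col": "tv_spend"},
        {"display_name": "Search", "impressions_col": "search_clicks", "spend_col": "search_cost"},
    ],
}
header = [
    "date", "sales", "competitor_spend", "is_promo_week",
    "tv_impressions", "tv_spend", "search_clicks", "unused_column_1",
]
print(missing_columns(config, header))  # ['search_cost']
```

Treating a column listed in ignore_cols as unavailable is a design choice of this sketch: it flags configs that both ignore and reference the same column.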

Model/Sampler Parameters

  • tune

    • Description: Number of tuning (burn-in) steps per chain. The sampler adapts its step size during these steps, and the resulting samples are discarded.

    • Type: int

    • Default: 2000

    • Required: No

  • draws

    • Description: Number of posterior samples to draw per chain after tuning. The total number of retained samples is draws × chains.

    • Type: int

    • Default: 2000

    • Required: No

  • chains

    • Description: Number of independent MCMC chains to run. Running multiple chains (>=2) is essential for assessing convergence (e.g., using R-hat).

    • Type: int

    • Default: 4

    • Required: No

  • adstock_max_lag

    • Description: The maximum number of time periods (in units of raw_data_granularity) over which a channel's advertising effect can carry over (adstock).

    • Type: int

    • Range: 1 to 52 (a warning is issued for values above 26)

    • Default: 4

    • Required: No

  • yearly_seasonality

    • Description: The number of Fourier modes to include for modeling yearly seasonality. Set to null or omit to disable this component.

    • Type: int or null

    • Range: Typically 1 to 10 if used.

    • Required: No

  • target_accept

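    • Description: The target acceptance rate for adaptive MCMC samplers like NUTS. It controls step size adaptation: higher values lead to smaller steps, which usually reduces divergences at the cost of slower sampling.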

    • Type: float

    • Range: 0.6 to 0.99

    • Default: Depends on the PyMC sampler. PyMC's NUTS uses 0.8 when no value is passed; 0.95 is a common stricter setting.

    • Required: No

  • seed

    • Description: An integer seed for the random number generator used by the sampler, ensuring reproducibility.

    • Type: int

    • Required: No
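
To give a feel for what adstock_max_lag and yearly_seasonality control, here is a minimal pure-Python sketch of a geometric adstock and of yearly Fourier features. It is an illustration under assumptions, not the Abacus implementation: the geometric weighting, the lack of normalization, counting lag 0 toward the limit, and the 365.25-day period are all choices made for this example.

```python
import math


def geometric_adstock(series, alpha, max_lag=4):
    """Carry each period's value forward with weights alpha**k for k < max_lag."""
    out = []
    for t in range(len(series)):
        total = 0.0
        # Only look back as far as the data and the lag limit allow.
        for k in range(min(max_lag, t + 1)):
            total += (alpha ** k) * series[t - k]
        out.append(total)
    return out


def fourier_features(day_of_year, n_modes, period=365.25):
    """Return [sin_1, cos_1, ..., sin_n, cos_n] for each time point."""
    rows = []
    for t in day_of_year:
        row = []
        for k in range(1, n_modes + 1):
            angle = 2.0 * math.pi * k * t / period
            row += [math.sin(angle), math.cos(angle)]
        rows.append(row)
    return rows


# A single unit of activity in period 0 decays over the next periods
# and is cut off once the lag limit is reached.
pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
carried = geometric_adstock(pulse, alpha=0.5, max_lag=4)
print(carried)  # [1.0, 0.5, 0.25, 0.125, 0.0, 0.0]

# Three Fourier modes produce six seasonality columns per time point.
features = fourier_features([0.0, 91.3125], n_modes=3)
print(len(features[0]))  # 6
```

In this sketch a larger adstock_max_lag only lengthens the window, while the decay rate (alpha in custom_priors) determines how quickly the effect fades inside it. Each additional Fourier mode adds one sine and one cosine column.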

Custom Priors (custom_priors)

This entire section is optional. If omitted, the default priors defined in the model class are used.

  • Description: Allows overriding the default prior distributions for model parameters. Each key corresponds to a model parameter group.

  • Type: object

  • Required: No

  • Properties (Parameter Groups):

    • intercept: Prior for the base intercept term.

    • beta_channel: Prior for the effectiveness coefficients of media channels.

    • alpha: Prior for the decay rate parameter in the adstock transformation (typically Beta distribution).

    • lam: Prior for the saturation parameter in the logistic saturation function (typically Gamma or HalfNormal).

    • likelihood: Defines the observation distribution and its parameters (e.g., sigma for Normal likelihood).

    • gamma_control: Prior for the coefficients of control variables (control_columns).

    • gamma_fourier: Prior for the coefficients of Fourier seasonality terms (only used if yearly_seasonality is set in the main parameters).

  • Structure (for each parameter group):

    • dist (string, required): Name of the PyMC distribution (e.g., "Normal", "HalfNormal", "Beta", "Gamma", "Laplace", "LogNormal").

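    • kwargs (object, required): Dictionary of keyword arguments for the specified PyMC distribution (e.g., mu, sigma, alpha, beta, b). A value may itself be a nested dist/kwargs object, as with likelihood.sigma in the File Structure Overview. Refer to the PyMC documentation for the valid parameters of each distribution.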
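
The dist/kwargs structure, including nesting, can be resolved recursively. The sketch below shows one way this might work; build_prior is a hypothetical helper, not the Abacus implementation. To stay runnable without PyMC installed, it uses a stand-in namespace that only records calls. With PyMC, the namespace would be the pymc module and the call would have to happen inside a model context, where distributions are created as Name("variable_name", **kwargs).

```python
from types import SimpleNamespace


def build_prior(name, spec, namespace):
    """Instantiate spec['dist'] from namespace, resolving nested specs first."""
    dist_cls = getattr(namespace, spec["dist"], None)
    if dist_cls is None:
        raise ValueError(f"unknown distribution {spec['dist']!r} for {name!r}")
    kwargs = {}
    for key, value in spec.get("kwargs", {}).items():
        if isinstance(value, dict) and "dist" in value:
            # Nested prior, e.g. the sigma of the likelihood.
            kwargs[key] = build_prior(f"{name}_{key}", value, namespace)
        else:
            kwargs[key] = value
    return dist_cls(name, **kwargs)


def _record(dist):
    """Stand-in for a distribution class: returns what it was called with."""
    return lambda name, **kwargs: (dist, name, kwargs)


# Stand-in namespace (an assumption for this example, replacing the pymc module).
fake_pm = SimpleNamespace(Normal=_record("Normal"), HalfNormal=_record("HalfNormal"))

likelihood_spec = {
    "dist": "Normal",
    "kwargs": {"sigma": {"dist": "HalfNormal", "kwargs": {"sigma": 2}}},
}
result = build_prior("likelihood", likelihood_spec, fake_pm)
print(result)
```

Looking the name up with getattr means a misspelled dist value fails early with a clear message instead of surfacing later during model construction.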