Guide: Configuring the Model Run¶
This guide explains how to set up the YAML configuration file required to run the Abacus Media Mix Model.
Overview¶
The configuration file tells Abacus how to handle your data, which model parameters to use, and what components (like seasonality or control variables) to include. It uses the YAML format for readability.
Quick Start¶
Copy an Example: Start with one of the Example Configurations below.
Prepare Data: Ensure your input CSV file is ready, following the Data Preparation Guide. The path to this CSV is typically provided when running the model driver script, not within this configuration file.
Set Core Options:
  Define raw_data_granularity (daily or weekly).
  Specify the target_col (your main outcome metric, e.g., sales, conversions).
  List your media channels, providing display_name, impressions_col (for model input), and spend_col (for ROI calculation) for each.
Specify Data Range: Use train_test_ratio or start_date/end_date under data_rows to select the data period.
Add Controls/Seasonality (Optional): If needed, list control_columns or set yearly_seasonality.
Run the Model: Execute the model using the driver script (e.g., demo/runme.py), pointing it to your configuration file.
Iterate: Review the results and adjust configuration parameters (like priors, adstock lag, sampler settings) as needed based on model diagnostics.
Configuration File Structure¶
The YAML file typically includes these main sections:
Data Handling Options (data_rows): Controls how the input data is read and split.
Column Definitions: Maps column names in your CSV to their roles in the model (target_col, media, control_columns, ignore_cols).
Model Parameters: Sets parameters influencing model fitting and performance (sampler settings, adstock, seasonality).
Custom Priors (custom_priors): (Optional) Allows specifying custom Bayesian priors for model parameters.
(For a detailed explanation of every parameter, see the Configuration Reference.)
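Before launching a long sampling run, it can help to sanity-check the loaded configuration. The sketch below is a hypothetical helper, not part of Abacus: it assumes the YAML file has already been parsed into a plain Python dict, and it checks only the options this guide describes as core. The exact rules Abacus enforces may differ.

```python
def check_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if cfg.get("raw_data_granularity") not in ("daily", "weekly"):
        problems.append("raw_data_granularity must be 'daily' or 'weekly'")
    if not cfg.get("target_col"):
        problems.append("target_col is missing")
    media = cfg.get("media") or []
    if not media:
        problems.append("media must list at least one channel")
    for i, channel in enumerate(media):
        for key in ("display_name", "impressions_col", "spend_col"):
            if not channel.get(key):
                problems.append(f"media[{i}] is missing {key}")
    # Assumption: a ratio, when given, must lie in (0, 1]; 1.0 means no hold-out set.
    ratio = (cfg.get("data_rows") or {}).get("train_test_ratio")
    if ratio is not None and not 0.0 < ratio <= 1.0:
        problems.append("data_rows.train_test_ratio must be in (0, 1]")
    return problems


good_cfg = {
    "raw_data_granularity": "weekly",
    "data_rows": {"train_test_ratio": 0.9},
    "target_col": "sales",
    "media": [
        {"display_name": "TV", "impressions_col": "tv_impressions", "spend_col": "tv_spend"},
    ],
}
bad_cfg = {
    "raw_data_granularity": "monthly",
    "target_col": "sales",
    "media": [{"display_name": "TV", "impressions_col": "tv_impressions"}],
}

good_problems = check_config(good_cfg)
bad_problems = check_config(bad_cfg)
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a single pass.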
Key Sections Explained¶
Data Handling Options (data_rows)¶
Use train_test_ratio (e.g., 0.8 for 80% train, 20% test) for simple time-based splits. A value of 1.0 uses all data for training (no hold-out set).
Alternatively, use start_date and end_date (YYYY-MM-DD format) to select a specific time window.
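To make the two options concrete, here is a minimal sketch of a chronological split and a date-window filter. The function names are hypothetical, and two details are assumptions rather than documented Abacus behaviour: the split index is rounded down, and both ends of the date window are inclusive.

```python
from datetime import date, timedelta


def split_by_ratio(rows, train_test_ratio):
    """Chronological split: the first share of rows is train, the rest is test."""
    if not 0.0 < train_test_ratio <= 1.0:
        raise ValueError("train_test_ratio must be in (0, 1]")
    cut = int(len(rows) * train_test_ratio)  # assumption: round down
    return rows[:cut], rows[cut:]


def select_window(rows, start_date, end_date):
    """Keep rows dated within [start_date, end_date] (inclusive: an assumption)."""
    return [r for r in rows if start_date <= r["date"] <= end_date]


# Ten weekly rows, already sorted by date.
rows = [
    {"date": date(2023, 1, 2) + timedelta(weeks=i), "sales": 100 + i}
    for i in range(10)
]

train, test = split_by_ratio(rows, 0.8)
window = select_window(rows, date(2023, 1, 9), date(2023, 1, 30))
```

Because the split is by position in time rather than random, the test set always contains the most recent rows, which is the situation a forecasting check needs to imitate.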
Column Definitions¶
date_col: Essential. Specifies the name of the date column in your input CSV (defaults to “date” if omitted).
target_col: Crucial. The main metric you want the model to predict/explain.
media: Define each marketing channel here. impressions_col (volume/exposure) is used in the core model transformation, while spend_col is needed later to calculate ROI. display_name is for reporting.
control_columns: Include external factors (e.g., competitor spend, promotions, economic indicators) that might influence the target variable but aren’t marketing channels being modelled directly. These must be numeric (encode categorical variables beforehand). Prophet-generated components are also added here automatically if enabled (see Prophet Integration (prophet)).
extra_features_impact: (Optional) A dictionary mapping control column names to 'negative'. If specified, the values in these columns will be multiplied by -1 during data loading. Useful if a feature inherently has an inverse relationship with the target (e.g., competitor spend). Example: extra_features_impact: {competitor_spend: 'negative'}.
ignore_cols: List any columns from your CSV that should be completely ignored by the model.
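The effect of extra_features_impact can be pictured with a small sketch. The helper name and the list-of-dicts data layout are illustrative assumptions; only the multiply-by-minus-one behaviour comes from the description above.

```python
def apply_feature_impact(rows, extra_features_impact):
    """Return copies of rows with every 'negative' column multiplied by -1."""
    negative_cols = [
        col for col, direction in extra_features_impact.items()
        if direction == "negative"
    ]
    flipped = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for col in negative_cols:
            new_row[col] = -new_row[col]
        flipped.append(new_row)
    return flipped


rows = [
    {"sales": 120.0, "competitor_spend": 30.0},
    {"sales": 95.0, "competitor_spend": 55.0},
]
flipped = apply_feature_impact(rows, {"competitor_spend": "negative"})
```

After the flip, a larger competitor spend appears as a more negative value, so a positive coefficient on that column corresponds to an inverse relationship with the target.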
Model Parameters¶
Sampler Settings (tune, draws, chains, target_accept): These control the MCMC sampling process. See Performance Guidelines for recommendations based on data size. Tuning these affects runtime and the reliability of the results. Check convergence diagnostics after running.
adstock_max_lag: Defines the maximum carry-over period for marketing effects (in units of your raw_data_granularity). Choose based on expected duration of advertising impact (e.g., 4-12 weeks for weekly data, 7-28 days for daily data).
yearly_seasonality: (Optional) Integer specifying the number of Fourier modes to include directly in the PyMC model for modelling yearly seasonality (e.g., 3 to 6). If omitted or null, no explicit Fourier seasonality component is added (though seasonality might be captured by control variables or Prophet components). See also Prophet Integration (prophet).
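To see what adstock_max_lag controls, consider a minimal sketch of a truncated geometric adstock. This is the textbook form, not necessarily the exact transform Abacus applies: the function name, the carry-over fraction alpha, and the choice of lags 0 to max_lag - 1 are all assumptions made for illustration.

```python
def geometric_adstock(series, alpha, max_lag):
    """Carry each period's exposure forward into later periods.

    Uses weights alpha**0 .. alpha**(max_lag - 1); anything older than
    that is dropped entirely, which is what the maximum lag truncates.
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for lag in range(max_lag):
            if t - lag < 0:
                break  # no data before the start of the series
            total += (alpha ** lag) * series[t - lag]
        out.append(total)
    return out


# A single burst of impressions in the first period, then nothing.
impressions = [100.0, 0.0, 0.0, 0.0, 0.0, 0.0]

shorter = geometric_adstock(impressions, alpha=0.5, max_lag=2)
longer = geometric_adstock(impressions, alpha=0.5, max_lag=4)
```

With the same alpha, the larger max_lag lets the burst keep contributing for more periods. Setting the lag too low cuts off genuine carry-over, while setting it very high adds computation for weights that are already close to zero.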
Prophet Integration (prophet)¶
(Optional) This section configures the use of Meta’s Prophet library for automatic seasonality and holiday decomposition. If this section is included, Prophet runs first, and its generated components are added as features for the main PyMC model.
holiday_country: Specify the country code (e.g., ‘US’, ‘GB’) to automatically include built-in holidays for that country.
include_holidays: Set to true if you provide a separate holiday file (path specified when running the model driver) and want Prophet to use it. The country setting still applies for formatting.
daily_seasonality, weekly_seasonality, yearly_seasonality: Set these to true within the prophet block to enable Prophet’s decomposition for these specific seasonalities.
Automatic Feature Addition: If any Prophet components are enabled (trend is implicitly always generated if Prophet runs), corresponding columns (‘trend’, ‘daily’, ‘weekly’, ‘yearly’, ‘holidays’) are automatically added to the control_columns list passed to the PyMC model.
Note: The top-level yearly_seasonality parameter controls Fourier terms added directly to the PyMC model, independent of Prophet’s yearly seasonality decomposition. You can use either, both, or neither.
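For intuition about what the top-level yearly_seasonality integer means, the sketch below builds standard yearly Fourier features: each mode contributes one sine and one cosine term. The function name, the feature names, and the 365.25-day period are assumptions for illustration; the features Abacus builds internally may be named or scaled differently.

```python
import math


def yearly_fourier_features(day_of_year, n_modes, period=365.25):
    """Return 2 * n_modes values: a sin and a cos term for each mode k = 1..n_modes."""
    features = {}
    for k in range(1, n_modes + 1):
        angle = 2.0 * math.pi * k * day_of_year / period
        features[f"sin_order_{k}"] = math.sin(angle)
        features[f"cos_order_{k}"] = math.cos(angle)
    return features


# yearly_seasonality: 3 would correspond to six seasonal features per row.
feats = yearly_fourier_features(day_of_year=91, n_modes=3)
```

More modes let the seasonal curve bend more sharply, at the cost of two extra coefficients per mode, which is why a small number such as 3 to 6 is usually enough.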
Custom Priors (custom_priors)¶
This advanced section allows you to override the default Bayesian priors. Use this if you have strong prior beliefs about parameter values based on domain knowledge or previous studies. See the Prior Selection Guidelines below and the Configuration Reference for distribution options and formatting. Always check the impact of custom priors on model diagnostics.
Performance Guidelines¶
Adjust sampler settings based on your dataset size:
Small (< 100 rows): tune: 1000, draws: 1000, chains: 2, target_accept: 0.8
Medium (100-500 rows): tune: 2000, draws: 2000, chains: 4, target_accept: 0.95
Large (> 500 rows): tune: 4000, draws: 4000, chains: 4 (or more depending on cores), target_accept: 0.95
Be mindful of memory usage, especially when increasing draws and chains.
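The size thresholds above can be captured in a small lookup. The helper below is hypothetical and simply restates the guideline values; treat the result as a starting point to revise after checking diagnostics.

```python
def suggest_sampler_settings(n_rows):
    """Map a row count to the guideline sampler settings for that size band."""
    if n_rows < 100:
        return {"tune": 1000, "draws": 1000, "chains": 2, "target_accept": 0.8}
    if n_rows <= 500:
        return {"tune": 2000, "draws": 2000, "chains": 4, "target_accept": 0.95}
    return {"tune": 4000, "draws": 4000, "chains": 4, "target_accept": 0.95}


settings = suggest_sampler_settings(104)  # e.g. two years of weekly data

# Posterior samples kept per parameter grow with draws * chains,
# which is the main driver of memory use when both are increased.
retained_samples = settings["draws"] * settings["chains"]
```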
Prior Selection Guidelines¶
intercept: Often LogNormal if target > 0, Normal otherwise. Scale based on target magnitude.
beta_channel (Effectiveness): HalfNormal is common (assumes positive effect). Adjust sigma based on expected effect range.
alpha (Adstock Decay): Beta distribution (constrains between 0-1). Under the standard geometric adstock form, where alpha is the fraction of the effect carried into the next period, a higher alpha means slower decay (longer carry-over).
lam (Saturation): Gamma or HalfNormal. Controls how quickly saturation occurs.
likelihood['kwargs']['sigma'] (Error Term): HalfNormal is common. Represents unexplained variance.
gamma_control / gamma_fourier (Fourier modes from yearly_seasonality): Often Normal or Laplace (allows positive/negative effects). Adjust scale (sigma or b) based on expected impact size.
(Refer to PyMC documentation for details on specific distributions and their parameters.)
Troubleshooting¶
Convergence Issues: Increase tune/draws, adjust target_accept, check data quality, review priors.
Memory Errors: Reduce draws/chains.
Poor Fit: Check data quality, add relevant control_columns, adjust priors, verify yearly_seasonality setting.
Example Configurations¶
Basic Weekly Model¶
# --- Data Handling ---
raw_data_granularity: weekly
data_rows:
  train_test_ratio: 0.9 # 90% train, 10% test
# --- Column Definitions ---
target_col: sales
media:
  - display_name: TV
    impressions_col: tv_impressions
    spend_col: tv_spend
  - display_name: Radio
    impressions_col: radio_impressions
    spend_col: radio_spend
# control_columns: null # Explicitly none
# ignore_cols: null
# --- Model Parameters ---
# adstock_max_lag: 4 # Example: 4 weeks lag
# yearly_seasonality: null # Explicitly none
# tune: 1000 # Example sampler settings
# draws: 1000
# chains: 2
# target_accept: 0.8
# --- Custom Priors ---
# custom_priors: null # Using default priors
Daily Model with Controls and Seasonality¶
# --- Data Handling ---
raw_data_granularity: daily
data_rows:
  start_date: 2022-01-01
  end_date: 2023-12-31
# --- Column Definitions ---
target_col: conversions
media:
  - display_name: Facebook Ads
    impressions_col: fb_impressions
    spend_col: fb_cost
  - display_name: Google Search
    impressions_col: search_clicks # Using clicks as proxy
    spend_col: search_cost
control_columns:
  - competitor_promo_active
  - is_holiday # Manually created holiday flag
# ignore_cols: null
# --- Model Parameters ---
adstock_max_lag: 14 # 2 weeks for daily data
yearly_seasonality: 3 # Include 3 Fourier modes for yearly seasonality
tune: 2000
draws: 2000
chains: 4
target_accept: 0.9
# --- Custom Priors ---
# custom_priors: null # Using default priors