# Guide: Configuring the Model Run

This guide explains how to set up the YAML configuration file required to run the Abacus Media Mix Model.

## Overview

The configuration file tells Abacus how to handle your data, which model parameters to use, and what components (like seasonality or control variables) to include. It uses the YAML format for readability.

## Quick Start

1. **Copy an Example:** Start with one of the {ref}`example-configurations` below.
2. **Prepare Data:** Ensure your input CSV file is ready, following the [Data Preparation Guide](../getting-started/data_preparation.md). The path to this CSV is typically provided when running the model driver script, not within this configuration file.
3. **Set Core Options:**
    * Define `raw_data_granularity` (`daily` or `weekly`).
    * Specify the `target_col` (your main outcome metric, e.g., sales, conversions).
    * List your `media` channels, providing `display_name`, `impressions_col` (for model input), and `spend_col` (for ROI calculation) for each.
4. **Specify Data Range:** Use `train_test_ratio` or `start_date`/`end_date` under `data_rows` to select the data period.
5. **Add Controls/Seasonality (Optional):** If needed, list `control_columns` or set `yearly_seasonality`.
6. **Run the Model:** Execute the model using the driver script (e.g., `demo/runme.py`), pointing it to your configuration file.
7. **Iterate:** Review the [results](./interpreting_results.md) and adjust configuration parameters (like priors, adstock lag, sampler settings) as needed based on model diagnostics.

## Configuration File Structure

The YAML file typically includes these main sections:

* **Data Handling Options (`data_rows`):** Controls how the input data is read and split.
* **Column Definitions:** Maps column names in your CSV to their roles in the model (`target_col`, `media`, `control_columns`, `ignore_cols`).
* **Model Parameters:** Sets parameters influencing model fitting and performance (sampler settings, adstock, seasonality).
* **Custom Priors (`custom_priors`):** (Optional) Allows specifying custom Bayesian priors for model parameters.

*(For a detailed explanation of every parameter, see the [Configuration Reference](../reference/config_reference.md).)*

## Key Sections Explained

### Data Handling Options (`data_rows`)

* Use `train_test_ratio` (e.g., `0.8` for 80% train, 20% test) for simple time-based splits. A value of `1.0` uses all data for training (no hold-out set).
* Alternatively, use `start_date` and `end_date` (YYYY-MM-DD format) to select a specific time window.

### Column Definitions

* **`date_col`**: Essential. Specifies the name of the date column in your input CSV (defaults to "date" if omitted).
* **`target_col`**: Crucial. The main metric you want the model to predict/explain.
* **`media`**: Define each marketing channel here. `impressions_col` (volume/exposure) is used in the core model transformation, while `spend_col` is needed later to calculate ROI. `display_name` is for reporting.
* **`control_columns`**: Include external factors (e.g., competitor spend, promotions, economic indicators) that might influence the target variable but aren't marketing channels being modelled directly. These must be numeric (encode categorical variables beforehand). Prophet-generated components are also added here automatically if enabled (see {ref}`prophet-integration`).
* **`extra_features_impact`**: (Optional) A dictionary mapping control column names to `'negative'`. If specified, the values in these columns will be multiplied by -1 during data loading. Useful if a feature inherently has an inverse relationship with the target (e.g., competitor spend). Example: `extra_features_impact: {competitor_spend: 'negative'}`.
* **`ignore_cols`**: List any columns from your CSV that should be completely ignored by the model.

### Model Parameters

* **Sampler Settings (`tune`, `draws`, `chains`, `target_accept`):** These control the MCMC sampling process.
  See {ref}`performance-guidelines` for recommendations based on data size. Tuning these affects runtime and the reliability of the results. Check convergence diagnostics after running.
* **`adstock_max_lag`**: Defines the maximum carry-over period for marketing effects (in units of your `raw_data_granularity`). Choose based on expected duration of advertising impact (e.g., 4-12 weeks for weekly data, 7-28 days for daily data).
* **`yearly_seasonality`**: (Optional) Integer specifying the number of Fourier modes to include *directly in the PyMC model* for modelling yearly seasonality (e.g., 3 to 6). If omitted or `null`, no explicit Fourier seasonality component is added (though seasonality might be captured by control variables or Prophet components). See also {ref}`prophet-integration`.

(prophet-integration)=
### Prophet Integration (`prophet`) (Optional)

This section configures the use of Meta's Prophet library for automatic seasonality and holiday decomposition. If this section is included, Prophet runs first, and its generated components are added as features for the main PyMC model.

* **`holiday_country`**: Specify the country code (e.g., 'US', 'GB') to automatically include built-in holidays for that country.
* **`include_holidays`**: Set to `true` if you provide a separate holiday file (path specified when running the model driver) and want Prophet to use it. The country setting still applies for formatting.
* **`daily_seasonality`**, **`weekly_seasonality`**, **`yearly_seasonality`**: Set these to `true` within the `prophet` block to enable Prophet's decomposition for these specific seasonalities.
* **Automatic Feature Addition:** If any Prophet components are enabled (trend is implicitly always generated if Prophet runs), corresponding columns ('trend', 'daily', 'weekly', 'yearly', 'holidays') are automatically added to the `control_columns` list passed to the PyMC model.
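
As a rough illustration of the options listed above, a `prophet` block might look like the sketch below. The key names are taken from the list above; the specific values (country code, which seasonalities are switched on) are illustrative assumptions rather than recommended defaults, and the exact nesting should be checked against the Configuration Reference.

```yaml
# Sketch only: values are illustrative assumptions, not defaults.
prophet:
  holiday_country: GB        # use built-in holidays for this country code
  include_holidays: false    # set to true only if a separate holiday file is supplied
  daily_seasonality: false
  weekly_seasonality: true   # enables Prophet's weekly decomposition
  yearly_seasonality: true   # enables Prophet's yearly decomposition
```

Under these assumed settings, Prophet would run before the PyMC model, and the generated components (including the implicit trend) would be appended to `control_columns` automatically.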

*Note:* The top-level `yearly_seasonality` parameter controls Fourier terms added *directly* to the PyMC model, independent of Prophet's yearly seasonality decomposition. You can use either, both, or neither.

(custom-priors)=
### Custom Priors (`custom_priors`)

This advanced section allows you to override the default Bayesian priors. Use this if you have strong prior beliefs about parameter values based on domain knowledge or previous studies. See the {ref}`prior-selection-guidelines` below and the [Configuration Reference](../reference/config_reference.md) for distribution options and formatting. Always check the impact of custom priors on model diagnostics.

(performance-guidelines)=
## Performance Guidelines

Adjust sampler settings based on your dataset size:

* **Small (< 100 rows):** `tune: 1000`, `draws: 1000`, `chains: 2`, `target_accept: 0.8`
* **Medium (100-500 rows):** `tune: 2000`, `draws: 2000`, `chains: 4`, `target_accept: 0.95`
* **Large (> 500 rows):** `tune: 4000`, `draws: 4000`, `chains: 4` (or more depending on cores), `target_accept: 0.95`

Be mindful of memory usage, especially when increasing `draws` and `chains`.

(prior-selection-guidelines)=
## Prior Selection Guidelines

* **`intercept`**: Often `LogNormal` if target > 0, `Normal` otherwise. Scale based on target magnitude.
* **`beta_channel`** (Effectiveness): `HalfNormal` is common (assumes positive effect). Adjust `sigma` based on expected effect range.
* **`alpha`** (Adstock Decay): `Beta` distribution (constrains between 0-1). In the usual geometric adstock parameterization `alpha` is the retention rate, so higher `alpha` -> slower decay (longer carry-over).
* **`lam`** (Saturation): `Gamma` or `HalfNormal`. Controls how quickly saturation occurs.
* **`likelihood['kwargs']['sigma']`** (Error Term): `HalfNormal` is common. Represents unexplained variance.
* **`gamma_control` / `gamma_fourier`** (Control-variable coefficients / Fourier modes from `yearly_seasonality`): Often `Normal` or `Laplace` (allows positive/negative effects). Adjust scale (`sigma` or `b`) based on expected impact size.
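
The guidelines above could translate into a `custom_priors` block along the following lines. This is a hypothetical sketch: the `kwargs` nesting is suggested by the `likelihood['kwargs']['sigma']` path above, but the `dist` key and every numeric value here are assumptions made for illustration. Confirm the exact format in the Configuration Reference before using it.

```yaml
# Hypothetical sketch: the dist/kwargs layout and all values are assumptions.
custom_priors:
  intercept:
    dist: Normal
    kwargs:
      mu: 0
      sigma: 2
  beta_channel:            # channel effectiveness, constrained positive
    dist: HalfNormal
    kwargs:
      sigma: 2
  alpha:                   # adstock parameter, constrained to 0-1
    dist: Beta
    kwargs:
      alpha: 1
      beta: 3
  lam:                     # saturation
    dist: Gamma
    kwargs:
      alpha: 3
      beta: 1
  likelihood:
    dist: Normal
    kwargs:
      sigma:               # error term
        dist: HalfNormal
        kwargs:
          sigma: 2
```

After any change like this, rerun the model and compare diagnostics against a run with default priors.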

*(Refer to PyMC documentation for details on specific distributions and their parameters.)*

## Troubleshooting

* **Convergence Issues:** Increase `tune`/`draws`, adjust `target_accept`, check data quality, review priors.
* **Memory Errors:** Reduce `draws`/`chains`.
* **Poor Fit:** Check data quality, add relevant `control_columns`, adjust priors, verify `yearly_seasonality` setting.

(example-configurations)=
## Example Configurations

### Basic Weekly Model

```yaml
# --- Data Handling ---
raw_data_granularity: weekly
data_rows:
  train_test_ratio: 0.9 # 90% train, 10% test

# --- Column Definitions ---
target_col: sales
media:
  - display_name: TV
    impressions_col: tv_impressions
    spend_col: tv_spend
  - display_name: Radio
    impressions_col: radio_impressions
    spend_col: radio_spend
# control_columns: null # Explicitly none
# ignore_cols: null

# --- Model Parameters ---
# adstock_max_lag: 4 # Example: 4 weeks lag
# yearly_seasonality: null # Explicitly none
# tune: 1000 # Example sampler settings
# draws: 1000
# chains: 2
# target_accept: 0.8

# --- Custom Priors ---
# custom_priors: null # Using default priors
```

### Daily Model with Controls and Seasonality

```yaml
# --- Data Handling ---
raw_data_granularity: daily
data_rows:
  start_date: 2022-01-01
  end_date: 2023-12-31

# --- Column Definitions ---
target_col: conversions
media:
  - display_name: Facebook Ads
    impressions_col: fb_impressions
    spend_col: fb_cost
  - display_name: Google Search
    impressions_col: search_clicks # Using clicks as proxy
    spend_col: search_cost
control_columns:
  - competitor_promo_active
  - is_holiday # Manually created holiday flag
# ignore_cols: null

# --- Model Parameters ---
adstock_max_lag: 14 # 2 weeks for daily data
yearly_seasonality: 3 # Include 3 Fourier modes for yearly seasonality
tune: 2000
draws: 2000
chains: 4
target_accept: 0.9

# --- Custom Priors ---
# custom_priors: null # Using default priors