Troubleshooting Guide
This guide provides solutions to common issues, performance tips, and debugging strategies when working with the Abacus library.
Common Errors
This section documents frequently encountered errors during data preprocessing, model building, fitting, or analysis, along with their likely causes and solutions.
Data Preprocessing Errors
ValueError: Input contains NaN: This typically occurs during the validation step (via abacus.diagnostics.input_validator.check_nans) if your input data contains missing values in required columns.
Solution: Clean your input data (.csv file) to remove or impute NaN values before passing it to Abacus.
ValueError: Input contains duplicate columns: Triggered by abacus.diagnostics.input_validator.check_duplicate_columns if your input data has columns with identical names.
Solution: Ensure all column names in your input data are unique.
ValueError: Date column ... (various issues): Caused by issues detected by abacus.diagnostics.input_validator.check_date_column, such as the column being missing, not being parsable or sortable as datetime values, having gaps, or having an inconsistent frequency.
Solution: Verify that the date_col name in your configuration matches the input data, ensure the column format is parsable (e.g., YYYY-MM-DD), check that the rows are sorted chronologically, and ensure the data is consistent with the specified raw_data_granularity.
ValueError: Column ... has zero variance: Raised by abacus.diagnostics.input_validator.check_column_variance if a column contains only a single unique value. Constant columns generally provide no information for modelling.
Solution: Review the specified column. Remove it if it is truly constant and uninformative, or investigate why it lacks variance if you expected it to vary.
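The four checks above can be approximated before running Abacus at all. The following is a minimal pandas sketch, not the Abacus validator itself: the function name preflight_report and the report keys are hypothetical, and the frequency check assumes evenly spaced data (daily or weekly), so it would also flag calendar-monthly data.

```python
import pandas as pd


def preflight_report(df: pd.DataFrame, date_col: str) -> dict:
    """Summarise issues that commonly trip input validation (illustrative only)."""
    report = {"nan_columns": [], "constant_columns": []}

    # Index.duplicated() flags the second and later occurrences of a name.
    report["duplicate_columns"] = sorted(set(df.columns[df.columns.duplicated()]))

    # Walk columns by position so duplicated names cannot confuse the lookup.
    for position, name in enumerate(df.columns):
        column = df.iloc[:, position]
        if column.isna().any():
            report["nan_columns"].append(name)
        # At most one distinct non-missing value means zero variance.
        if name != date_col and column.nunique(dropna=True) <= 1:
            report["constant_columns"].append(name)

    # Date column checks: presence, parsability, ordering, regular spacing.
    if date_col not in df.columns:
        report["date_issue"] = "missing"
        return report
    dates = pd.to_datetime(df[date_col], errors="coerce")
    if dates.isna().any():
        report["date_issue"] = "unparsable values"
    elif not dates.is_monotonic_increasing:
        report["date_issue"] = "not sorted"
    elif dates.diff().dropna().nunique() > 1:
        report["date_issue"] = "gaps or irregular frequency"
    else:
        report["date_issue"] = None
    return report
```

Calling preflight_report(pd.read_csv(path), "date") on the raw file, where path and the column name "date" stand in for your own values, gives a quick list of columns to fix before the driver script raises.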
Model Fitting/Analysis Errors
TypeError during plotting (e.g., in plot_roi): Can occur if functions expect scalar values but receive NumPy arrays or pandas Series. This often happens when extracting data from model results (e.g., InferenceData).
Solution: Ensure you extract scalar values correctly, typically by calling .item() on single-element NumPy arrays or by selecting specific values from pandas structures before passing them to plotting functions.
NameError: name '...' is not defined: Usually caused by a simple typo in a function or variable name.
Solution: Correct the spelling of the name based on the library’s actual API (e.g., plot_roi_distribution instead of plot_roi_disribution).
ValueError during plotting (e.g., in decomposition or contribution plots): Can arise from unexpected data shapes or formats being passed to plotting helpers, sometimes after data manipulation such as melt.
Solution: Check the data being passed to the plotting function. Ensure its shape and structure match the function’s expectations. Debugging might involve inspecting the DataFrame or array just before the error occurs.
Metric Calculation Errors (e.g., ValueError: y_true and y_pred have different dimensions in R-squared): Occurs when comparing arrays (like true vs. predicted values) that don’t have compatible shapes for the metric being calculated.
Solution: Ensure that both the true values and the predicted values are NumPy arrays with compatible shapes before passing them to scoring functions (e.g., from sklearn.metrics). Reshaping or aggregation (like taking the mean across prediction chains) might be necessary.
Warning: “Unexpected HDI shape…” (e.g., in posterior predictive plots): ArviZ can return Highest Density Interval (HDI) arrays in different shapes (e.g., (2, n_obs) or (n_obs, 2)). While Abacus tries to handle common shapes, this warning indicates that a potentially unhandled format was encountered. The plot might still render using means instead of intervals.
Solution: The warning is usually safe to ignore if the plot looks reasonable, but it means the HDI might not be displayed. If the interval is crucial, investigate the specific shape returned by ArviZ for your model, then adjust the plotting code or report it as an issue.
Missing Optional Dependencies (e.g., ImportError: Missing optional dependency 'Jinja2'): Some features, particularly those related to styling pandas DataFrames for output, might require extra libraries that are not installed by default.
Solution: Install the missing dependency. Often, installing abacus[dev] covers these, or you can install it directly (pip install Jinja2).
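Several of the errors above come down to array shapes. The helpers below are a hedged NumPy sketch of the three fixes mentioned: extracting a scalar with .item(), averaging predictions across chains and draws, and normalising an HDI array to one orientation. The function names are hypothetical and not part of Abacus, and the sketch assumes the observation axis is the last axis of the prediction samples.

```python
import numpy as np


def as_scalar(value) -> float:
    """Collapse a single-element array or Series to a plain Python float."""
    array = np.asarray(value)
    if array.size != 1:
        raise ValueError(f"expected exactly one element, got shape {array.shape}")
    return float(array.item())


def mean_prediction(samples, n_obs: int) -> np.ndarray:
    """Average samples shaped (..., n_obs) down to one value per observation."""
    samples = np.asarray(samples, dtype=float)
    if samples.shape[-1] != n_obs:
        raise ValueError(f"last axis is {samples.shape[-1]}, expected {n_obs}")
    # Merge all leading axes (e.g. chain and draw) and average over them.
    return samples.reshape(-1, n_obs).mean(axis=0)


def hdi_as_columns(hdi, n_obs: int) -> np.ndarray:
    """Return HDI bounds as (n_obs, 2), accepting (n_obs, 2) or (2, n_obs)."""
    hdi = np.asarray(hdi)
    # Note: when n_obs == 2 the orientation is ambiguous and is left unchanged.
    if hdi.shape == (n_obs, 2):
        return hdi
    if hdi.shape == (2, n_obs):
        return hdi.T
    raise ValueError(f"unexpected HDI shape {hdi.shape}")
```

After mean_prediction, the result has the same shape as a one-dimensional array of observed values, which is what scoring functions such as those in sklearn.metrics generally expect.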
(More errors and solutions to be added based on common user issues and known error conditions)
Performance Issues
Guidance on identifying and addressing performance bottlenecks, particularly during model fitting (sampling).
Model Complexity: More complex models (many channels, intricate transformations, complex priors) naturally take longer to sample.
Mitigation: Start with simpler models and incrementally add complexity. Consider if all included components are necessary. Check prior specifications to ensure they are reasonable and not causing sampling difficulties.
Data Size: Larger datasets (more observations or features) increase computation time.
Mitigation: Ensure data is clean and relevant. While reducing data isn’t always feasible, be mindful of the computational cost.
Sampling Settings: The number of draws, tuning steps, and chains directly impacts runtime (chains, draws, tune parameters in fit_model). PyMC’s default sampler (NUTS) can sometimes struggle with certain model geometries.
Mitigation: Experiment with sampling settings. Reducing draws/tuning steps can speed up exploration but may yield less stable estimates. Increasing target_accept (e.g., to 0.9 or 0.95) can sometimes help with difficult posteriors but slows down sampling. If NUTS struggles (many divergences), exploring alternative samplers might be necessary, though this is an advanced topic.
Data Scaling: While Abacus includes preprocessing steps, poorly scaled input data can sometimes hinder sampler efficiency.
Mitigation: Review the scaling applied by abacus.prepro.scaler.Scaler. Ensure target and media/control variables are reasonably scaled. Standardisation (zero mean, unit variance) is common.
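To judge whether columns are "reasonably scaled", it can help to look at their magnitudes side by side. The sketch below uses plain pandas and does not reproduce what abacus.prepro.scaler.Scaler does; scale_summary and standardise are hypothetical helper names, and the population standard deviation (ddof=0) is an arbitrary choice for the illustration.

```python
import pandas as pd


def scale_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Report mean, standard deviation and largest magnitude per numeric column."""
    numeric = df.select_dtypes("number")
    return pd.DataFrame(
        {
            "mean": numeric.mean(),
            "std": numeric.std(ddof=0),
            "max_abs": numeric.abs().max(),
        }
    )


def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale numeric columns to zero mean and unit variance.

    Constant columns have zero standard deviation and would produce NaN here,
    so remove them first.
    """
    numeric = df.select_dtypes("number")
    return (numeric - numeric.mean()) / numeric.std(ddof=0)
```

Columns whose max_abs differs by several orders of magnitude from the others are the first candidates to review.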
(More specific tips can be added as performance characteristics are better understood)
Debugging Tips
Strategies and techniques for debugging Abacus models and workflows.
Inspect Intermediate Data: Before fitting a model, examine the data structures created during the driver script’s execution. Key objects often include:
processed_data (DataFrame after initial parsing/Prophet).
input_data (instance of abacus.prepro.input_data.InputData).
data_to_fit (instance of abacus.prepro.data_to_fit.DataToFit, containing scaled data).
Use methods like .head(), .info(), .describe(), or the .dump() method on InputData objects to check values and shapes.
Visualise Priors: Before running the full sampler, use PyMC’s model.debug() or sample from the priors (pm.sample_prior_predictive(samples=50)) within your script after the model is built (e.g., inside the driver or model class) to check whether the specified priors generate reasonable values and behaviours. Unreasonable priors can significantly hinder sampling.
Examine Posteriors: After sampling, use ArviZ (az) functions (az.summary, az.plot_trace, az.plot_posterior) on the InferenceData object (often stored in model.fit_result or loaded from model.nc) to diagnose issues. Look for:
Poor mixing in trace plots (model_priors_and_posteriors.png).
Multi-modal posteriors (model_priors_and_posteriors.png).
Posteriors hitting the bounds of the prior distributions.
High r_hat values (they should ideally be close to 1.0) in the summary (model_summary.csv), indicating convergence issues.
Low effective sample sizes (ess_bulk, ess_tail) in the summary.
Check Model Structure: Use model.graphviz() (requires python-graphviz) after the model is built to visualise the PyMC model graph and confirm it matches your expectations regarding variable dependencies.
Simplify the Model: If you encounter persistent issues, try simplifying the model. Remove channels, controls, or complex transformations (like Prophet or specific seasonality terms) one by one to isolate the problematic component.
Use Logging: Increase logging verbosity (if available within Abacus or PyMC) to get more detailed information during execution. Python’s built-in logging module can also be used in your driver scripts.
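The r_hat and effective sample size checks from the "Examine Posteriors" tip can be automated once a summary table is available. The sketch below assumes a DataFrame with the r_hat, ess_bulk and ess_tail columns that az.summary produces. The function name and the thresholds (1.01 and 400) are illustrative rules of thumb chosen for this example, not Abacus defaults.

```python
import pandas as pd


def flag_convergence_issues(
    summary: pd.DataFrame,
    r_hat_max: float = 1.01,  # assumed rule-of-thumb threshold
    ess_min: float = 400.0,   # assumed rule-of-thumb threshold
) -> pd.DataFrame:
    """Return the parameters whose r_hat is too high or whose ESS is too low."""
    bad_r_hat = summary["r_hat"] > r_hat_max
    low_ess = (summary["ess_bulk"] < ess_min) | (summary["ess_tail"] < ess_min)
    return summary.loc[bad_r_hat | low_ess, ["r_hat", "ess_bulk", "ess_tail"]]
```

The input can come directly from az.summary on the InferenceData object. Loading model_summary.csv with pandas should work as well, provided the file keeps the parameter names in its first column, which is an assumption about the file layout.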
(More specific tips can be added)
Frequently Asked Questions (FAQ)
Answers to common questions about using Abacus.
Q: How do I specify custom priors for model parameters?
A: Use the custom_priors section in your YAML configuration file. Refer to the Custom Priors (custom_priors) section in the Configuration Guide and the Configuration Reference for syntax and available distributions.
Q: Can I use different adstock/saturation transformations for different channels?
A: The core model structure currently applies the same adstock (geometric) and saturation (logistic or Michaelis-Menten, chosen implicitly by the model class) transformations across all channels, although the parameters (alpha, lam, beta_channel) are estimated per channel. Channel-specific transformation types are not configurable via YAML.
Q: My model sampling is very slow or has many divergences. What should I check first?
A: Review the “Performance Issues” and “Debugging Tips” sections above. Key areas to check include model complexity (number of channels/controls, seasonality settings), data scaling, prior specifications (are they too narrow or unreasonable?), and sampling settings (draws, tune, target_accept). Simplifying the model or visualising priors/posteriors can help diagnose the issue.
Q: How are holidays handled?
A: Holidays can be handled in two main ways:
Prophet Integration: If using the prophet block in your config, set include_holidays: true and provide a holiday file path to the driver script. Prophet will generate a ‘holidays’ component that is added as a control variable. You can also use built-in country holidays via holiday_country. See the Prophet Integration (prophet) section in the Configuration Guide.
Manual Control Variable: Create a binary column in your input CSV indicating holiday periods (e.g., 1 for holiday, 0 otherwise) and include this column name in the control_columns list in your config.
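For the manual option, the binary column can be built with pandas before the CSV is written. This is a minimal sketch: add_holiday_flag and the default column name is_holiday are hypothetical, and it matches exact dates only, so weekly data would need the holiday dates mapped onto the week-start dates first.

```python
import pandas as pd


def add_holiday_flag(
    df: pd.DataFrame,
    date_col: str,
    holiday_dates,
    flag_col: str = "is_holiday",  # hypothetical column name
) -> pd.DataFrame:
    """Return a copy of df with a 0/1 column marking rows that fall on a holiday."""
    dates = pd.to_datetime(df[date_col])
    holidays = pd.to_datetime(list(holiday_dates))
    out = df.copy()
    # isin() gives booleans; cast to int for the 1/0 encoding described above.
    out[flag_col] = dates.isin(holidays).astype(int)
    return out
```

The resulting column name is what you would then list under control_columns in the configuration.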
Q: Where can I find the results of the budget optimisation?
A: The optimisation results are typically saved to files in the specified output directory (usually results/). Look for files like optimization_results.csv and plots like budget_optimisation.png.
(More FAQs to be added based on user feedback and further development)