Abacus Preprocessing: Seasonality (seas.py)

This module provides functions for handling seasonality and holidays in the input data, primarily by using the Prophet library (from Meta). It decomposes the time series into trend, seasonality (yearly, weekly, daily), and holiday components, which can then be used as features in the main marketing mix model (MMM).

Functions

update_config_with_prophet_components

def update_config_with_prophet_components(config: dict) -> dict:

Updates the main configuration dictionary (config) to include Prophet component names in the extra_features_cols list based on the settings within config['prophet'].

If the config dictionary contains a 'prophet' key, this function checks which Prophet components (e.g., daily_seasonality, weekly_seasonality, yearly_seasonality, include_holidays, trend) are enabled (set to True). For each enabled component, it adds the corresponding feature name (‘daily’, ‘weekly’, ‘yearly’, ‘holidays’, ‘trend’) to the config['extra_features_cols'] list, ensuring uniqueness. This automatically incorporates the decomposed Prophet components as regressors in the subsequent MMM.

Parameters:

  • config (dict): The configuration dictionary loaded from the YAML file. Modified in-place.

Returns:

  • dict: The modified configuration dictionary.
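
The behaviour described above can be illustrated with a minimal sketch. It is not the module's actual code: the helper name add_prophet_features and the PROPHET_FEATURES mapping are hypothetical, and the sample config values are invented for illustration. Only the setting names and feature names come from the description above.

```python
# Hypothetical mapping from Prophet setting names to the feature names
# they enable (names taken from the description above).
PROPHET_FEATURES = {
    "daily_seasonality": "daily",
    "weekly_seasonality": "weekly",
    "yearly_seasonality": "yearly",
    "include_holidays": "holidays",
    "trend": "trend",
}


def add_prophet_features(config: dict) -> dict:
    """Append enabled Prophet component names to extra_features_cols."""
    prophet_cfg = config.get("prophet")
    if not prophet_cfg:
        # No 'prophet' section: leave the config untouched.
        return config

    # Treat a missing or null list as empty, then store it back in place.
    features = config.get("extra_features_cols") or []
    config["extra_features_cols"] = features

    for setting, feature in PROPHET_FEATURES.items():
        # Only settings explicitly set to True enable a component.
        if prophet_cfg.get(setting) is True and feature not in features:
            features.append(feature)
    return config


# Sample configuration (illustrative values only).
config = {
    "extra_features_cols": ["price"],
    "prophet": {
        "yearly_seasonality": True,
        "weekly_seasonality": False,
        "include_holidays": True,
        "trend": True,
    },
}
add_prophet_features(config)
print(config["extra_features_cols"])
```

With this sample config, the list gains 'yearly', 'holidays', and 'trend' after the existing 'price' entry, while 'weekly' is skipped because it is disabled.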


set_holidays

def set_holidays(
    dt_transform: pd.DataFrame, dt_holidays: pd.DataFrame, config: dict
) -> pd.DataFrame:

Processes a raw holidays DataFrame (dt_holidays) to make it suitable for Prophet, based on the time granularity specified in the config.

It filters the holidays by the country specified in config['prophet']['holiday_country']. For weekly or monthly granularity (config['raw_data_granularity']), it joins the names of all holidays that fall within the same week or month into a single string (separated by ‘#’) and sets the date column (‘ds’) to the start of that week or month. For daily data, the holidays are used as is.

Parameters:

  • dt_transform (pd.DataFrame): The main time series data DataFrame. The current implementation does not use it; it is kept in the signature for potential future use, such as inferring the date range.

  • dt_holidays (pd.DataFrame): DataFrame containing holiday information. Expected columns: ‘ds’ (date), ‘country’, ‘holiday’ (name).

  • config (dict): The configuration dictionary. Used to get raw_data_granularity and prophet['holiday_country'].

Returns:

  • pd.DataFrame: A DataFrame formatted for Prophet’s holidays parameter, with columns ‘ds’ and ‘holiday’.
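
The aggregation described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the module's implementation: the function name aggregate_holidays is hypothetical, the granularity values 'daily', 'weekly', and 'monthly' are assumed spellings, weeks are assumed to start on Monday, and the sample holidays and country codes are invented.

```python
import pandas as pd


def aggregate_holidays(
    holidays: pd.DataFrame, country: str, granularity: str
) -> pd.DataFrame:
    """Filter holidays by country and collapse them to the data granularity."""
    out = holidays.loc[holidays["country"] == country, ["ds", "holiday"]].copy()
    out["ds"] = pd.to_datetime(out["ds"])

    if granularity == "daily":
        # Daily data: use the holidays as they are.
        return out.reset_index(drop=True)

    if granularity == "weekly":
        # Assumption: a week starts on Monday (weekday 0).
        out["ds"] = out["ds"] - pd.to_timedelta(out["ds"].dt.weekday, unit="D")
    elif granularity == "monthly":
        # Snap every date to the first day of its month.
        out["ds"] = out["ds"].dt.to_period("M").dt.to_timestamp()
    else:
        raise ValueError(f"Unsupported granularity: {granularity!r}")

    # Join the names of holidays that share a period start with '#'.
    return out.groupby("ds", as_index=False)["holiday"].agg("#".join)


# Invented sample data for illustration.
raw = pd.DataFrame(
    {
        "ds": ["2024-12-25", "2024-12-26", "2025-01-01", "2024-12-25"],
        "country": ["GB", "GB", "GB", "US"],
        "holiday": ["Christmas Day", "Boxing Day", "New Year's Day", "Christmas Day"],
    }
)
print(aggregate_holidays(raw, "GB", "weekly"))
```

In this sample, Christmas Day and Boxing Day fall in the week starting 2024-12-23 and are merged into 'Christmas Day#Boxing Day', while New Year's Day lands in the week starting 2024-12-30.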


prophet_decomp

def prophet_decomp(
    dt_transform: pd.DataFrame,
    dt_holidays: pd.DataFrame,
    config: dict,
    save_model: bool = True
) -> Optional[pd.DataFrame]:

Performs time series decomposition using Prophet based on the provided configuration.

Steps:

  1. Prepares the input data (dt_transform) by selecting the date (config['date_col']) and target (config['target_col']) columns and renaming them to ‘ds’ and ‘y’ respectively, as required by Prophet.

  2. Processes the holiday data (dt_holidays) using set_holidays.

  3. Initialises a Prophet model instance with seasonality settings (yearly_seasonality, weekly_seasonality, daily_seasonality) and the processed holidays DataFrame, based on values in config['prophet'].

  4. Fits the Prophet model to the prepared ‘ds’ and ‘y’ data.

  5. Generates a forecast (prediction) over the same time period as the input data.

  6. Extracts the decomposed components (‘trend’, ‘yearly’, ‘holidays’, ‘weekly’) from the forecast if they exist and adds them as new columns to the original dt_transform DataFrame.

  7. Sets the date column back as the index of dt_transform.

  8. Optionally saves the fitted Prophet model object to results/prophet_model.pkl using joblib.dump.

Parameters:

  • dt_transform (pd.DataFrame): The input DataFrame containing the time series data. Must include the date and target columns specified in config.

  • dt_holidays (pd.DataFrame): The raw DataFrame containing holiday dates and names.

  • config (dict): The configuration dictionary. Used for date/target column names and Prophet settings under the 'prophet' key.

  • save_model (bool, optional): If True (default), saves the fitted Prophet model to disk.

Returns:

  • Optional[pd.DataFrame]: The modified dt_transform DataFrame with added columns for the decomposed Prophet components (trend, seasonalities, holidays), indexed by date. Returns None if required date or target columns are missing in dt_transform.
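
The steps listed for prophet_decomp can be sketched roughly as below. This is a simplified illustration, not the module's code: the names prepare_prophet_frame and decompose_sketch are hypothetical, the holidays argument is assumed to already be in Prophet's format (columns ‘ds’ and ‘holiday’), the fallback to Prophet's 'auto' seasonality default is an assumption, and sorting the input by date before fitting is a choice made here so that forecast rows line up with the input rows. Prophet and joblib are imported inside the function so that the preparation helper can be used without them installed.

```python
import os
from typing import Optional

import pandas as pd


def prepare_prophet_frame(
    df: pd.DataFrame, date_col: str, target_col: str
) -> Optional[pd.DataFrame]:
    """Select the date and target columns and rename them to 'ds' and 'y'."""
    if date_col not in df.columns or target_col not in df.columns:
        return None
    out = df[[date_col, target_col]].rename(
        columns={date_col: "ds", target_col: "y"}
    )
    out["ds"] = pd.to_datetime(out["ds"])
    return out


def decompose_sketch(
    df: pd.DataFrame, holidays: pd.DataFrame, config: dict, save_model: bool = True
) -> Optional[pd.DataFrame]:
    """Fit Prophet and attach its components to a copy of df."""
    # Lazy imports: both are third-party packages.
    import joblib
    from prophet import Prophet

    date_col, target_col = config["date_col"], config["target_col"]
    if date_col not in df.columns or target_col not in df.columns:
        return None

    # Sort by date so forecast rows align positionally with the input rows.
    result = df.sort_values(date_col).reset_index(drop=True)
    prepared = prepare_prophet_frame(result, date_col, target_col)

    settings = config.get("prophet", {})
    model = Prophet(
        yearly_seasonality=settings.get("yearly_seasonality", "auto"),
        weekly_seasonality=settings.get("weekly_seasonality", "auto"),
        daily_seasonality=settings.get("daily_seasonality", "auto"),
        holidays=holidays,
    )
    model.fit(prepared)

    # Predict over the same dates that were used for fitting.
    forecast = model.predict(prepared[["ds"]])

    # Copy across only the components Prophet actually produced.
    for name in ("trend", "yearly", "holidays", "weekly"):
        if name in forecast.columns:
            result[name] = forecast[name].to_numpy()

    if save_model:
        os.makedirs("results", exist_ok=True)
        joblib.dump(model, "results/prophet_model.pkl")

    return result.set_index(date_col)
```

A call such as decompose_sketch(dt_transform, processed_holidays, config) would return the input data indexed by date with the extra component columns, or None when the configured date or target column is missing.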