Abacus Preprocessing: Seasonality (seas.py)
This module provides functions for handling seasonality and holidays in the input data, primarily by leveraging the capabilities of the Prophet library (from Meta). It allows for decomposing the time series into trend, seasonality (yearly, weekly, daily), and holiday components, which can then be used as features in the main MMM.
Functions
update_config_with_prophet_components
def update_config_with_prophet_components(config: dict) -> dict:
Updates the main configuration dictionary (config) to include Prophet component names in the extra_features_cols list based on the settings within config['prophet'].
If the config dictionary contains a 'prophet' key, this function checks which Prophet components (e.g., daily_seasonality, weekly_seasonality, yearly_seasonality, include_holidays, trend) are enabled (set to True). For each enabled component, it adds the corresponding feature name (‘daily’, ‘weekly’, ‘yearly’, ‘holidays’, ‘trend’) to the config['extra_features_cols'] list, ensuring uniqueness. This automatically incorporates the decomposed Prophet components as regressors in the subsequent MMM.
Parameters:
config (dict): The configuration dictionary loaded from the YAML file. Modified in place.
Returns:
dict: The modified configuration dictionary.
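The flag-to-feature logic described above can be illustrated with a minimal sketch. This is not the module's actual source: the function name update_config_sketch and the mapping constant are hypothetical, and the behaviour when extra_features_cols is missing or empty is an assumption.

```python
# Hypothetical mapping from config['prophet'] flags to feature names,
# taken from the description above (not from the module source).
PROPHET_FLAG_TO_FEATURE = {
    "daily_seasonality": "daily",
    "weekly_seasonality": "weekly",
    "yearly_seasonality": "yearly",
    "include_holidays": "holidays",
    "trend": "trend",
}


def update_config_sketch(config: dict) -> dict:
    """Add enabled Prophet component names to config['extra_features_cols']."""
    prophet_cfg = config.get("prophet")
    if not prophet_cfg:
        # No 'prophet' section: leave the config untouched.
        return config

    # Assumption: a missing or null list is treated as empty.
    features = config.get("extra_features_cols") or []
    for flag, feature in PROPHET_FLAG_TO_FEATURE.items():
        # Only flags explicitly set to True count as enabled.
        if prophet_cfg.get(flag) is True and feature not in features:
            features.append(feature)

    config["extra_features_cols"] = features  # in-place update
    return config
```

Because the dictionary is modified in place, the returned object is the same one that was passed in; callers that need the original configuration should copy it first.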
set_holidays
def set_holidays(
dt_transform: pd.DataFrame, dt_holidays: pd.DataFrame, config: dict
) -> pd.DataFrame:
Processes a raw holidays DataFrame (dt_holidays) to make it suitable for Prophet, based on the time granularity specified in the config.
It filters the holidays by the country specified in config['prophet']['holiday_country']. For weekly or monthly granularity (config['raw_data_granularity']), it aggregates holiday names occurring within the same week or month start date, respectively, into a single string (separated by ‘#’). For daily data, it uses the holidays as is. The date column (‘ds’) is adjusted to represent the start of the week or month for aggregated granularities.
Parameters:
dt_transform (pd.DataFrame): The main time series DataFrame. The current implementation does not use it; it is accepted for potential future use, such as inferring the date range.
dt_holidays (pd.DataFrame): DataFrame containing holiday information. Expected columns: ‘ds’ (date), ‘country’, ‘holiday’ (name).
config (dict): The configuration dictionary. Used to get raw_data_granularity and prophet['holiday_country'].
Returns:
pd.DataFrame: A DataFrame formatted for Prophet’s holidays parameter, with columns ‘ds’ and ‘holiday’.
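The filtering and aggregation described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the module's implementation: the function name aggregate_holidays_sketch is hypothetical, the granularity strings 'daily', 'weekly' and 'monthly' are assumed values of config['raw_data_granularity'], and weeks are assumed to start on Monday.

```python
import pandas as pd


def aggregate_holidays_sketch(
    dt_holidays: pd.DataFrame, country: str, granularity: str
) -> pd.DataFrame:
    """Filter holidays by country and roll them up to the data granularity."""
    mask = dt_holidays["country"] == country
    holidays = dt_holidays.loc[mask, ["ds", "holiday"]].copy()
    holidays["ds"] = pd.to_datetime(holidays["ds"])

    if granularity == "daily":
        # Daily data: use the holidays as they are.
        return holidays.reset_index(drop=True)

    if granularity == "weekly":
        # Assumption: the week starts on Monday (weekday 0).
        offset = pd.to_timedelta(holidays["ds"].dt.weekday, unit="D")
        holidays["ds"] = holidays["ds"] - offset
    elif granularity == "monthly":
        # Snap each date to the first day of its month.
        holidays["ds"] = holidays["ds"].dt.to_period("M").dt.to_timestamp()
    else:
        raise ValueError(f"Unsupported granularity: {granularity!r}")

    # Join the names of holidays that share a period start with '#'.
    return holidays.groupby("ds")["holiday"].agg("#".join).reset_index()
```

With weekly granularity, for example, two holidays falling on a Wednesday and a Thursday of the same week end up in a single row dated to that week's Monday, with both names joined by ‘#’.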
prophet_decomp
def prophet_decomp(
dt_transform: pd.DataFrame,
dt_holidays: pd.DataFrame,
config: dict,
save_model: bool = True
) -> Optional[pd.DataFrame]:
Performs time series decomposition using Prophet based on the provided configuration.
Steps:
1. Prepares the input data (dt_transform) by selecting the date (config['date_col']) and target (config['target_col']) columns and renaming them to ‘ds’ and ‘y’ respectively, as required by Prophet.
2. Processes the holiday data (dt_holidays) using set_holidays.
3. Initialises a Prophet model instance with seasonality settings (yearly_seasonality, weekly_seasonality, daily_seasonality) and the processed holidays DataFrame, based on values in config['prophet'].
4. Fits the Prophet model to the prepared ‘ds’ and ‘y’ data.
5. Generates a forecast (prediction) over the same time period as the input data.
6. Extracts the decomposed components (‘trend’, ‘yearly’, ‘holidays’, ‘weekly’) from the forecast if they exist and adds them as new columns to the original dt_transform DataFrame.
7. Sets the date column back as the index of dt_transform.
8. Optionally saves the fitted Prophet model object to results/prophet_model.pkl using joblib.dump.
Parameters:
dt_transform (pd.DataFrame): The input DataFrame containing the time series data. Must include the date and target columns specified in config.
dt_holidays (pd.DataFrame): The raw DataFrame containing holiday dates and names.
config (dict): The configuration dictionary. Used for date/target column names and Prophet settings under the 'prophet' key.
save_model (bool, optional): If True (default), saves the fitted Prophet model to disk.
Returns:
Optional[pd.DataFrame]: The modified dt_transform DataFrame with added columns for the decomposed Prophet components (trend, seasonalities, holidays), indexed by date. Returns None if required date or target columns are missing in dt_transform.
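The decomposition flow can be sketched roughly as below. This is a hedged illustration rather than the module's code: prepare_prophet_frame and decompose_sketch are hypothetical names, the holidays argument is assumed to be already processed (columns ‘ds’ and ‘holiday’), seasonality flags missing from config['prophet'] are assumed to default to False, and rows are sorted by date so the forecast lines up with the input. Prophet and joblib are imported lazily so the preparation helper can be used without them installed.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def prepare_prophet_frame(df: pd.DataFrame, config: dict) -> Optional[pd.DataFrame]:
    """Select and rename the date/target columns to Prophet's 'ds' and 'y'."""
    date_col, target_col = config["date_col"], config["target_col"]
    if date_col not in df.columns or target_col not in df.columns:
        return None  # required columns are missing
    out = df[[date_col, target_col]].rename(columns={date_col: "ds", target_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    return out.sort_values("ds").reset_index(drop=True)


def decompose_sketch(
    dt_transform: pd.DataFrame,
    holidays: pd.DataFrame,
    config: dict,
    save_model: bool = True,
) -> Optional[pd.DataFrame]:
    """Fit Prophet and attach its components to a copy of dt_transform."""
    from prophet import Prophet  # lazy import: only needed when fitting

    prepared = prepare_prophet_frame(dt_transform, config)
    if prepared is None:
        return None

    p = config.get("prophet", {})
    model = Prophet(
        yearly_seasonality=p.get("yearly_seasonality", False),  # assumed default
        weekly_seasonality=p.get("weekly_seasonality", False),  # assumed default
        daily_seasonality=p.get("daily_seasonality", False),  # assumed default
        holidays=holidays,
    )
    model.fit(prepared)
    forecast = model.predict(prepared[["ds"]])  # same period as the input

    date_col = config["date_col"]
    result = dt_transform.copy()
    result[date_col] = pd.to_datetime(result[date_col])
    result = result.sort_values(date_col).reset_index(drop=True)
    for name in ("trend", "yearly", "holidays", "weekly"):
        if name in forecast.columns:
            result[name] = forecast[name].to_numpy()

    if save_model:
        import joblib

        Path("results").mkdir(exist_ok=True)
        joblib.dump(model, "results/prophet_model.pkl")
    return result.set_index(date_col)
```

Unlike the documented function, this sketch works on a copy of dt_transform, which keeps the caller's DataFrame unchanged.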