Abacus Preprocessing: Validation (valid.py)
This module provides decorators and classes designed for validating the structure and content of input data (features X and target y) before model fitting. These validation steps help ensure data integrity and prevent common errors during the modelling process. The classes are likely intended to be used as mixins or components within a larger preprocessing or model class structure.
Validation Decorators
These decorators tag methods to indicate whether they perform validation on the target variable (y) or the feature matrix (X). This tagging mechanism likely facilitates the automatic discovery and execution of validation steps within the Abacus framework.
validation_method_y
def validation_method_y(method):
Decorator to mark a method as a validation step specifically for the target variable (y). Adds _tags['validation_y'] = True to the decorated method.
validation_method_X
def validation_method_X(method):
Decorator to mark a method as a validation step specifically for the feature data (X). Adds _tags['validation_X'] = True to the decorated method.
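The tagging described above could be implemented along the following lines. This is a minimal sketch, not the Abacus source: the helper `_tag`, the discovery function `collect_tagged`, and the `Demo` class are hypothetical names introduced for illustration. Only the `_tags` keys `validation_y` and `validation_X` come from the descriptions above.

```python
def _tag(method, key):
    # Hypothetical helper: attach (or extend) a _tags dict on the function object.
    tags = getattr(method, "_tags", None)
    if tags is None:
        tags = {}
        method._tags = tags
    tags[key] = True
    return method


def validation_method_y(method):
    """Mark `method` as a validation step for the target (y)."""
    return _tag(method, "validation_y")


def validation_method_X(method):
    """Mark `method` as a validation step for the features (X)."""
    return _tag(method, "validation_X")


def collect_tagged(obj, key):
    # Hypothetical discovery step: return the bound methods of `obj` tagged with `key`.
    found = []
    for name in dir(obj):
        attr = getattr(obj, name)
        if callable(attr) and getattr(attr, "_tags", {}).get(key):
            found.append(attr)
    return found


class Demo:
    @validation_method_y
    def check_y(self, data):
        pass

    @validation_method_X
    def check_X(self, data):
        pass


# Names of the methods on Demo that are tagged as y validators.
names = [m.__name__ for m in collect_tagged(Demo(), "validation_y")]
```

Because the tag lives on the function object itself, a framework can find validators by inspecting attributes, without a separate registry. Whether Abacus discovers them this way is an assumption.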
Validation Classes
These classes encapsulate specific validation logic. They typically expect certain attributes (like channel_columns, date_column, control_columns) to be defined in the class they are mixed into or used by.
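The attribute contract described above can be illustrated with a small sketch. The classes `RequiresDateColumn` and `HostModel` are hypothetical and exist only to show the pattern: the mixin declares the attribute it expects through an annotation, and the class it is mixed into is responsible for setting it.

```python
class RequiresDateColumn:
    # Hypothetical mixin: declares the attribute it expects but does not set it.
    date_column: str

    def describe_date_column(self) -> str:
        return f"validating column {self.date_column!r}"


class HostModel(RequiresDateColumn):
    # Hypothetical host class that supplies the attribute the mixin relies on.
    def __init__(self, date_column: str):
        self.date_column = date_column


message = HostModel(date_column="week").describe_date_column()
```

A bare annotation such as `date_column: str` does not create the attribute, so using the mixin without a host that sets it would raise `AttributeError` at validation time.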
ValidateTargetColumn
class ValidateTargetColumn:
    @validation_method_y
    def validate_target(self, data: pd.Series) -> None:
        # ... implementation ...
Contains a method validate_target (tagged for y) that checks if the target data Series is empty.
Raises:
ValueError: If the input data Series has zero length.
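A check of this kind might look like the sketch below. The class name `ValidateTargetColumnSketch` and the error message are assumptions made for illustration, and the tagging decorator is left out to keep the example self-contained. The actual implementation is elided in this document.

```python
import pandas as pd


class ValidateTargetColumnSketch:
    # Hypothetical stand-in for ValidateTargetColumn (decorator omitted).
    def validate_target(self, data: pd.Series) -> None:
        # Reject a target Series with no rows.
        if len(data) == 0:
            raise ValueError("y must have at least one element")


validator = ValidateTargetColumnSketch()
validator.validate_target(pd.Series([1.0, 2.5]))  # non-empty: passes silently

try:
    validator.validate_target(pd.Series([], dtype=float))
    empty_rejected = False
except ValueError:
    empty_rejected = True
```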
ValidateDateColumn
class ValidateDateColumn:
    date_column: str

    @validation_method_X
    def validate_date_col(self, data: pd.DataFrame) -> None:
        # ... implementation ...
Contains a method validate_date_col (tagged for X) that validates the date column specified by the self.date_column attribute.
Checks:
Ensures self.date_column exists in the input data DataFrame.
Ensures the values in self.date_column are unique.
Raises:
ValueError: If the date column is missing or contains duplicate values.
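The two checks listed above could be written roughly as follows. This is a sketch under assumptions: the class name `ValidateDateColumnSketch`, its constructor, and the error messages are hypothetical. It uses the pandas `Series.is_unique` property for the uniqueness test.

```python
import pandas as pd


class ValidateDateColumnSketch:
    # Hypothetical stand-in for ValidateDateColumn (decorator omitted).
    def __init__(self, date_column: str):
        self.date_column = date_column

    def validate_date_col(self, data: pd.DataFrame) -> None:
        # Check 1: the configured column must be present.
        if self.date_column not in data.columns:
            raise ValueError(f"date_column '{self.date_column}' not found in data")
        # Check 2: each date may appear only once.
        if not data[self.date_column].is_unique:
            raise ValueError(f"date_column '{self.date_column}' contains duplicate values")


validator = ValidateDateColumnSketch("date")

ok = pd.DataFrame({"date": ["2024-01-01", "2024-01-08"], "tv": [10.0, 12.0]})
validator.validate_date_col(ok)  # passes silently

dupes = pd.DataFrame({"date": ["2024-01-01", "2024-01-01"], "tv": [10.0, 12.0]})
try:
    validator.validate_date_col(dupes)
    duplicate_error = ""
except ValueError as err:
    duplicate_error = str(err)
```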
ValidateChannelColumns
class ValidateChannelColumns:
    channel_columns: Union[List[str], Tuple[str]]

    @validation_method_X
    def validate_channel_columns(self, data: pd.DataFrame) -> None:
        # ... implementation ...
Contains a method validate_channel_columns (tagged for X) that validates the channel columns specified by the self.channel_columns attribute.
Checks:
Ensures self.channel_columns is a non-empty list or tuple.
Ensures all specified channel_columns exist in the input data DataFrame.
Ensures there are no duplicate names within self.channel_columns.
Ensures that the data within the specified channel_columns does not contain any negative values.
Raises:
ValueError: If any of the validation checks fail.
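The four checks above might be sketched as below, in the order they are listed. The class name `ValidateChannelColumnsSketch`, its constructor, and the error messages are assumptions. The order in which the real method runs its checks is not stated in this document.

```python
from typing import List, Tuple, Union

import pandas as pd


class ValidateChannelColumnsSketch:
    # Hypothetical stand-in for ValidateChannelColumns (decorator omitted).
    def __init__(self, channel_columns: Union[List[str], Tuple[str, ...]]):
        self.channel_columns = channel_columns

    def validate_channel_columns(self, data: pd.DataFrame) -> None:
        cols = self.channel_columns
        # Check 1: must be a non-empty list or tuple.
        if not isinstance(cols, (list, tuple)) or len(cols) == 0:
            raise ValueError("channel_columns must be a non-empty list or tuple")
        # Check 2: every named column must exist in the DataFrame.
        missing = [c for c in cols if c not in data.columns]
        if missing:
            raise ValueError(f"channel_columns not found in data: {missing}")
        # Check 3: no column name may be listed twice.
        if len(set(cols)) != len(cols):
            raise ValueError("channel_columns contains duplicate names")
        # Check 4: channel values must be non-negative.
        if (data[list(cols)] < 0).any().any():
            raise ValueError("channel_columns contains negative values")


df = pd.DataFrame({"tv": [10.0, 0.0], "radio": [3.0, -1.0]})
ValidateChannelColumnsSketch(["tv"]).validate_channel_columns(df)  # passes silently

try:
    ValidateChannelColumnsSketch(["tv", "radio"]).validate_channel_columns(df)
    negative_error = ""
except ValueError as err:
    negative_error = str(err)
```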
ValidateControlColumns
class ValidateControlColumns:
    control_columns: Optional[List[str]]

    @validation_method_X
    def validate_control_columns(self, data: pd.DataFrame) -> None:
        # ... implementation ...
Contains a method validate_control_columns (tagged for X) that validates the control columns specified by the self.control_columns attribute, if it is not None.
Checks (only if self.control_columns is not None):
Ensures self.control_columns is a non-empty list or tuple.
Ensures all specified control_columns exist in the input data DataFrame.
Ensures there are no duplicate names within self.control_columns.
Raises:
ValueError: If self.control_columns is provided and any of the validation checks fail.
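The optional behaviour described above, where validation is skipped entirely when no control columns are configured, could be sketched like this. The class name `ValidateControlColumnsSketch`, its constructor, and the error messages are hypothetical.

```python
from typing import List, Optional

import pandas as pd


class ValidateControlColumnsSketch:
    # Hypothetical stand-in for ValidateControlColumns (decorator omitted).
    def __init__(self, control_columns: Optional[List[str]] = None):
        self.control_columns = control_columns

    def validate_control_columns(self, data: pd.DataFrame) -> None:
        cols = self.control_columns
        # Control columns are optional: nothing to validate when unset.
        if cols is None:
            return
        if not isinstance(cols, (list, tuple)) or len(cols) == 0:
            raise ValueError("control_columns must be a non-empty list or tuple")
        missing = [c for c in cols if c not in data.columns]
        if missing:
            raise ValueError(f"control_columns not found in data: {missing}")
        if len(set(cols)) != len(cols):
            raise ValueError("control_columns contains duplicate names")


df = pd.DataFrame({"price": [1.0, 1.2], "holiday": [0, 1]})
ValidateControlColumnsSketch(None).validate_control_columns(df)  # skipped
ValidateControlColumnsSketch(["price", "holiday"]).validate_control_columns(df)  # passes

try:
    ValidateControlColumnsSketch(["price", "promo"]).validate_control_columns(df)
    missing_error = ""
except ValueError as err:
    missing_error = str(err)
```

Unlike the channel check, this sketch places no constraint on the sign of the values, which matches the checks listed for control columns.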