Abacus Preprocessing: Validation (valid.py)

This module provides decorators and classes for validating the structure and content of the input data (features X and target y) before model fitting. The checks catch common input problems, such as an empty target, duplicated dates, or missing or negative channel columns, before they surface as errors during modelling. The classes appear to be intended as mixins or components of a larger preprocessing or model class.

Validation Decorators

These decorators tag methods to indicate whether they perform validation on the target variable (y) or the feature matrix (X). This tagging mechanism likely facilitates the automatic discovery and execution of validation steps within the Abacus framework.

validation_method_y

def validation_method_y(method):

Decorator to mark a method as a validation step specifically for the target variable (y). Adds _tags['validation_y'] = True to the decorated method.
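The following is a minimal sketch of how such a tagging decorator could work. It assumes `_tags` is a plain dict stored as an attribute on the function object. The source states only that the key `validation_y` is set to `True`. The class `Example` is hypothetical.

```python
from typing import Callable


def validation_method_y(method: Callable) -> Callable:
    """Mark `method` as a validation step for the target y (sketch, not the Abacus source)."""
    # Assumption: tags live in a dict attribute named `_tags` on the function itself.
    # Reuse an existing dict if another decorator has already attached one.
    tags = getattr(method, "_tags", {})
    tags["validation_y"] = True
    method._tags = tags
    return method


class Example:
    @validation_method_y
    def check(self, data) -> None:
        pass


print(Example.check._tags)  # {'validation_y': True}
```

Returning the original function instead of a wrapper keeps its name, signature and docstring unchanged. Bound methods forward attribute lookups to the underlying function, so `Example().check._tags` also works.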


validation_method_X

def validation_method_X(method):

Decorator to mark a method as a validation step specifically for the feature data (X). Adds _tags['validation_X'] = True to the decorated method.
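The tags only matter if something reads them. The sketch below shows one way a host class could discover and run every X-tagged method. The helper `run_x_validations` and the class `Checks` are hypothetical. The discovery logic is an assumption about how the tags might be consumed and is not taken from the Abacus source.

```python
import inspect
from typing import Callable

import pandas as pd


def validation_method_X(method: Callable) -> Callable:
    # Same assumed tagging scheme: set _tags['validation_X'] = True on the function.
    tags = getattr(method, "_tags", {})
    tags["validation_X"] = True
    method._tags = tags
    return method


def run_x_validations(obj: object, data: pd.DataFrame) -> list:
    """Hypothetical helper: call every bound method on `obj` tagged validation_X."""
    called = []
    for name, method in inspect.getmembers(obj, predicate=inspect.ismethod):
        if getattr(method, "_tags", {}).get("validation_X", False):
            method(data)
            called.append(name)
    return called


class Checks:
    @validation_method_X
    def check_not_empty(self, data: pd.DataFrame) -> None:
        if data.empty:
            raise ValueError("X must not be empty")

    def helper(self, data: pd.DataFrame) -> None:
        # Untagged, so run_x_validations never calls it.
        raise RuntimeError("should not be called")


print(run_x_validations(Checks(), pd.DataFrame({"a": [1, 2]})))  # ['check_not_empty']
```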

Validation Classes

These classes encapsulate specific validation logic. They typically expect certain attributes (like channel_columns, date_column, control_columns) to be defined in the class they are mixed into or used by.
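An attribute declared with a bare annotation, such as `date_column: str`, exists only as a type hint and has no value. The host class must assign it before any validation runs. The hypothetical classes below show this Python behaviour. They are not Abacus code.

```python
import pandas as pd


class NeedsDateColumn:
    # A bare annotation declares the attribute but does not assign it.
    date_column: str

    def report(self, data: pd.DataFrame) -> bool:
        return self.date_column in data.columns


class Configured(NeedsDateColumn):
    date_column = "date"  # supplied as a class attribute


class Unconfigured(NeedsDateColumn):
    pass  # forgot to supply date_column


df = pd.DataFrame({"date": pd.to_datetime(["2024-01-01"]), "sales": [10]})
print(Configured().report(df))  # True
try:
    Unconfigured().report(df)
except AttributeError as err:
    print(type(err).__name__)  # AttributeError
```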

ValidateTargetColumn

class ValidateTargetColumn:
    @validation_method_y
    def validate_target(self, data: pd.Series) -> None:
        ...  # implementation not shown

Contains a method validate_target (tagged for y) that checks if the target data Series is empty.

Raises:

  • ValueError: If the input data Series has zero length.
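Here is a sketch of what `validate_target` might look like. It reproduces only the documented behaviour, which is to raise `ValueError` for a zero-length Series. The error message and the stand-in decorator are assumptions, not the Abacus source.

```python
from typing import Callable

import pandas as pd


def validation_method_y(method: Callable) -> Callable:
    # Stand-in for the Abacus decorator (assumed to do nothing but tag the method).
    method._tags = {**getattr(method, "_tags", {}), "validation_y": True}
    return method


class ValidateTargetColumn:
    @validation_method_y
    def validate_target(self, data: pd.Series) -> None:
        # The documented check: a target Series of length zero is rejected.
        if len(data) == 0:
            raise ValueError("The target data y is empty.")


validator = ValidateTargetColumn()
validator.validate_target(pd.Series([120.0, 135.5]))  # passes, returns None
try:
    validator.validate_target(pd.Series([], dtype=float))
except ValueError as err:
    print(err)  # The target data y is empty.
```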


ValidateDateColumn

class ValidateDateColumn:
    date_column: str
    @validation_method_X
    def validate_date_col(self, data: pd.DataFrame) -> None:
        ...  # implementation not shown

Contains a method validate_date_col (tagged for X) that validates the date column specified by the self.date_column attribute.

Checks:

  • Ensures the column named by self.date_column exists in the input data DataFrame.

  • Ensures the values in that column are unique.

Raises:

  • ValueError: If the date column is missing or contains duplicate values.
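The two checks above can be sketched as follows. The error messages, the stand-in decorator and the host class `Model` are assumptions. Only the presence and uniqueness checks come from the description.

```python
from typing import Callable

import pandas as pd


def validation_method_X(method: Callable) -> Callable:
    # Stand-in for the Abacus decorator (assumed to do nothing but tag the method).
    method._tags = {**getattr(method, "_tags", {}), "validation_X": True}
    return method


class ValidateDateColumn:
    date_column: str

    @validation_method_X
    def validate_date_col(self, data: pd.DataFrame) -> None:
        # Check 1: the configured date column must be present.
        if self.date_column not in data.columns:
            raise ValueError(f"date_col {self.date_column!r} not in data")
        # Check 2: every date must appear only once.
        if not data[self.date_column].is_unique:
            raise ValueError(f"date_col {self.date_column!r} has repeated values")


class Model(ValidateDateColumn):
    # Hypothetical host class supplying the required attribute.
    date_column = "date"


dates = pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-08"])
df = pd.DataFrame({"date": dates, "sales": [10, 12, 11]})
try:
    Model().validate_date_col(df)
except ValueError as err:
    print(err)  # date_col 'date' has repeated values
```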


ValidateChannelColumns

class ValidateChannelColumns:
    channel_columns: Union[List[str], Tuple[str, ...]]
    @validation_method_X
    def validate_channel_columns(self, data: pd.DataFrame) -> None:
        ...  # implementation not shown

Contains a method validate_channel_columns (tagged for X) that validates the channel columns specified by the self.channel_columns attribute.

Checks:

  • Ensures self.channel_columns is a non-empty list or tuple.

  • Ensures all specified channel_columns exist in the input data DataFrame.

  • Ensures there are no duplicate names within self.channel_columns.

  • Ensures that the data within the specified channel_columns does not contain any negative values.

Raises:

  • ValueError: If any of the validation checks fail.
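Below is a sketch of `validate_channel_columns` that runs the four checks in the order listed above. The messages, the stand-in decorator and the host class `Model` are assumptions. Note that `Tuple[str, ...]` is the typing form for a tuple of any length.

```python
from typing import Callable, List, Tuple, Union

import pandas as pd


def validation_method_X(method: Callable) -> Callable:
    # Stand-in for the Abacus decorator (assumed to do nothing but tag the method).
    method._tags = {**getattr(method, "_tags", {}), "validation_X": True}
    return method


class ValidateChannelColumns:
    channel_columns: Union[List[str], Tuple[str, ...]]

    @validation_method_X
    def validate_channel_columns(self, data: pd.DataFrame) -> None:
        cols = self.channel_columns
        # 1. Must be a non-empty list or tuple.
        if not isinstance(cols, (list, tuple)) or len(cols) == 0:
            raise ValueError("channel_columns must be a non-empty list or tuple")
        # 2. Every channel must be a column of the data.
        missing = set(cols) - set(data.columns)
        if missing:
            raise ValueError(f"channel_columns {sorted(missing)} not in data")
        # 3. No channel may be listed twice.
        if len(set(cols)) != len(cols):
            raise ValueError("channel_columns contains duplicates")
        # 4. No negative values in any channel column.
        if (data[list(cols)] < 0).any().any():
            raise ValueError("channel_columns contain negative values")


class Model(ValidateChannelColumns):
    # Hypothetical host class supplying the required attribute.
    channel_columns = ["tv", "radio"]


df = pd.DataFrame({"tv": [100.0, 80.0], "radio": [20.0, -5.0]})
try:
    Model().validate_channel_columns(df)
except ValueError as err:
    print(err)  # channel_columns contain negative values
```

The duplicate check runs before the negativity check. This means `data[list(cols)]` never selects the same column twice.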


ValidateControlColumns

class ValidateControlColumns:
    control_columns: Optional[List[str]]
    @validation_method_X
    def validate_control_columns(self, data: pd.DataFrame) -> None:
        ...  # implementation not shown

Contains a method validate_control_columns (tagged for X) that validates the control columns specified by the self.control_columns attribute, if it is not None.

Checks (only if self.control_columns is not None):

  • Ensures self.control_columns is a non-empty list or tuple.

  • Ensures all specified control_columns exist in the input data DataFrame.

  • Ensures there are no duplicate names within self.control_columns.

Raises:

  • ValueError: If self.control_columns is provided and any of the validation checks fail.
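Here is a sketch of the optional control-column validation. It returns early when `control_columns` is `None`, and otherwise applies the three documented checks. The messages, the stand-in decorator and the host classes are assumptions. Tuples are accepted because the checks mention "list or tuple", even though the annotation only says `List[str]`.

```python
from typing import Callable, List, Optional

import pandas as pd


def validation_method_X(method: Callable) -> Callable:
    # Stand-in for the Abacus decorator (assumed to do nothing but tag the method).
    method._tags = {**getattr(method, "_tags", {}), "validation_X": True}
    return method


class ValidateControlColumns:
    control_columns: Optional[List[str]]

    @validation_method_X
    def validate_control_columns(self, data: pd.DataFrame) -> None:
        cols = self.control_columns
        # Control columns are optional: None means there is nothing to validate.
        if cols is None:
            return
        if not isinstance(cols, (list, tuple)) or len(cols) == 0:
            raise ValueError("control_columns must be None or a non-empty list or tuple")
        missing = set(cols) - set(data.columns)
        if missing:
            raise ValueError(f"control_columns {sorted(missing)} not in data")
        if len(set(cols)) != len(cols):
            raise ValueError("control_columns contains duplicates")
        # Unlike channel columns, no sign check is documented for controls.


class NoControls(ValidateControlColumns):
    control_columns = None


class WithControls(ValidateControlColumns):
    control_columns = ["price", "holiday"]


df = pd.DataFrame({"price": [9.99, 10.49], "tv": [100.0, 80.0]})
NoControls().validate_control_columns(df)  # returns None; nothing is checked
try:
    WithControls().validate_control_columns(df)
except ValueError as err:
    print(err)  # control_columns ['holiday'] not in data
```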