Abacus Preprocessing: Scaler (scaler.py)

This module provides SerializableScaler, a custom scaler for the Abacus preprocessing pipeline. It behaves like scikit-learn’s MaxAbsScaler but adds methods for serializing to and deserializing from Python dictionaries, so that fitted scaler parameters can be saved and reloaded. The module also defines a custom exception, NotFittedScalerError, which is raised when scaling is attempted before fitting.

SerializableScaler Class

A scaler class that scales data by dividing by the maximum absolute value found during fitting. It supports serialization to and deserialization from dictionaries.

import numpy as np


class SerializableScaler:
    multiply_by: float | np.ndarray | None = None
    divide_by: float | np.ndarray | None = None
    _is_fitted: bool = False

    def __init__(self, multiply_by=None, divide_by=None, **kwargs):
        ...  # implementation omitted

    def fit(self, data):
        ...  # implementation omitted

    def transform(self, data):
        ...  # implementation omitted

    def inverse_transform(self, data):
        ...  # implementation omitted

    def to_dict(self) -> dict:
        ...  # implementation omitted

    @classmethod
    def from_dict(cls, input_dict: dict) -> 'SerializableScaler':
        ...  # implementation omitted

    # Internal methods: _check_is_fitted, _robust_scaling_divide_operation (static, unused)

Attributes:

  • multiply_by (float | np.ndarray | None): The factor to multiply data by during transformation. fit sets it to 1.0, because max-abs scaling only divides.

  • divide_by (float | np.ndarray | None): The factor to divide data by during transformation. Calculated as the maximum absolute value(s) during fit. Can be a scalar or an array if fitting multi-dimensional data.

  • _is_fitted (bool): Internal flag indicating whether the scaler has been fitted or initialised with parameters.
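As a rough illustration of why divide_by can be either a scalar or an array, the sketch below takes the maximum absolute value along axis 0 with NumPy. The helper name max_abs_factor is hypothetical, and replacing a zero maximum with 1.0 is an assumption borrowed from scikit-learn’s MaxAbsScaler; the module’s own zero handling is not shown in this document.

```python
import numpy as np


def max_abs_factor(data):
    """Hypothetical helper: maximum absolute value along axis 0."""
    arr = np.asarray(data, dtype=float)
    factor = np.max(np.abs(arr), axis=0)
    # Assumption: a zero maximum is replaced by 1.0 so that dividing is safe.
    return np.where(factor == 0.0, 1.0, factor)


# 1-D input reduces to a single value (a zero-dimensional array).
scalar_factor = max_abs_factor([-4.0, 2.0, 1.0])

# 2-D input yields one factor per column; the all-zero column falls back to 1.0.
table = np.array([[1.0, -10.0, 0.0],
                  [-2.0, 5.0, 0.0]])
column_factors = max_abs_factor(table)

# Dividing broadcasts the per-column factors across every row.
scaled = table / column_factors
```

With per-column factors, every column of scaled lies in the range [-1, 1], which is the property max-abs scaling is meant to provide.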

Methods:

  • __init__(self, multiply_by=None, divide_by=None, **kwargs): Initialises the scaler. If pre-calculated multiply_by and divide_by factors are passed, the scaler is considered fitted immediately. If no factors are passed (the arguments are left as None), both factors default to 1.0 and the scaler is not yet considered fitted.

  • fit(self, data): Calculates the divide_by factor(s) as the maximum absolute value along axis 0 of the input data. Sets multiply_by to 1.0. Handles cases where the maximum absolute value is zero to prevent division by zero in transform. Sets _is_fitted to True. Returns self.

  • transform(self, data): Applies the scaling transformation: (data * self.multiply_by) / self.divide_by. Raises NotFittedScalerError if the scaler hasn’t been fitted.

  • inverse_transform(self, data): Applies the inverse scaling transformation: (data * self.divide_by) / self.multiply_by. Raises NotFittedScalerError if the scaler hasn’t been fitted. Handles potential division by zero if multiply_by is zero.

  • to_dict(self) -> dict: Serializes the scaler’s state (multiply_by, divide_by) into a dictionary. Converts NumPy arrays to lists for JSON compatibility. Raises NotFittedScalerError if called before fitting.

  • from_dict(cls, input_dict: dict) -> 'SerializableScaler': A class method to create a new SerializableScaler instance from a dictionary previously generated by to_dict.
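The methods above can be sketched end to end as a fit, transform, serialize, restore, and inverse-transform round trip. SketchScaler is a hypothetical stand-in written only from the descriptions in this document, not the module’s actual code; the dictionary keys "multiply_by" and "divide_by" and the zero-to-1.0 replacement in fit are assumptions.

```python
import json

import numpy as np


class SketchScaler:
    """Hypothetical stand-in that mirrors the documented behaviour."""

    def __init__(self, multiply_by=1.0, divide_by=1.0):
        self.multiply_by = multiply_by
        self.divide_by = divide_by

    def fit(self, data):
        max_abs = np.max(np.abs(np.asarray(data, dtype=float)), axis=0)
        # Assumption: zero maxima become 1.0 to keep transform well defined.
        self.divide_by = np.where(max_abs == 0.0, 1.0, max_abs)
        self.multiply_by = 1.0
        return self

    def transform(self, data):
        return (np.asarray(data, dtype=float) * self.multiply_by) / self.divide_by

    def inverse_transform(self, data):
        return (np.asarray(data, dtype=float) * self.divide_by) / self.multiply_by

    def to_dict(self):
        # np.asarray(...).tolist() turns arrays into lists and scalars into floats,
        # so the result can be passed straight to json.dumps.
        return {
            "multiply_by": np.asarray(self.multiply_by).tolist(),
            "divide_by": np.asarray(self.divide_by).tolist(),
        }

    @classmethod
    def from_dict(cls, input_dict):
        return cls(
            multiply_by=np.asarray(input_dict["multiply_by"], dtype=float),
            divide_by=np.asarray(input_dict["divide_by"], dtype=float),
        )


data = np.array([[1.0, -10.0], [-2.0, 5.0]])
scaler = SketchScaler().fit(data)
scaled = scaler.transform(data)

# Round-trip the fitted parameters through a JSON string.
payload = json.dumps(scaler.to_dict())
restored = SketchScaler.from_dict(json.loads(payload))
recovered = restored.inverse_transform(scaled)
```

Converting arrays to lists in to_dict is what keeps the payload JSON-compatible; from_dict converts them back so that broadcasting in transform and inverse_transform behaves as before.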


NotFittedScalerError Exception

class NotFittedScalerError(Exception):
    """Custom exception for scaler not being fitted."""
    pass

NotFittedScalerError is raised by the SerializableScaler methods transform, inverse_transform, and to_dict when they are called on an unfitted instance, that is, one on which fit has not been called and for which no parameters were provided during initialisation.
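A minimal sketch of the fitted check may help show when the exception fires. GuardedScaler is a hypothetical, scalar-only class written for illustration; treating "both factors supplied" as the condition for being fitted at construction is an assumption, as is the error message text.

```python
class NotFittedScalerError(Exception):
    """Custom exception for scaler not being fitted."""


class GuardedScaler:
    """Hypothetical scalar-only scaler used to show the fitted check."""

    def __init__(self, multiply_by=None, divide_by=None):
        # Assumption: supplying both factors counts as being fitted.
        self._is_fitted = multiply_by is not None and divide_by is not None
        self.multiply_by = 1.0 if multiply_by is None else multiply_by
        self.divide_by = 1.0 if divide_by is None else divide_by

    def _check_is_fitted(self):
        if not self._is_fitted:
            raise NotFittedScalerError("Call fit or pass factors before scaling.")

    def transform(self, value):
        self._check_is_fitted()
        return (value * self.multiply_by) / self.divide_by


# An instance created without factors refuses to transform.
try:
    GuardedScaler().transform(3.0)
    outcome = "no error"
except NotFittedScalerError as exc:
    outcome = f"raised: {exc}"

# An instance created with pre-calculated factors can be used straight away.
prefitted_result = GuardedScaler(multiply_by=1.0, divide_by=4.0).transform(2.0)
```

Callers that load scalers from storage can catch NotFittedScalerError to distinguish a missing fit from other failures.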