Abacus Preprocessing: Scaler (scaler.py)
This module provides a custom scaler implementation, SerializableScaler, designed for use within the Abacus preprocessing pipeline. It offers functionality similar to scikit-learn’s MaxAbsScaler but includes methods for serialization to and deserialization from Python dictionaries, making it easier to save and load fitted scaler parameters. It also defines a custom error for cases where scaling is attempted before fitting.
SerializableScaler Class
A scaler class that scales data by dividing by the maximum absolute value found during fitting. It supports serialization to and deserialization from dictionaries.
class SerializableScaler:
    multiply_by: float | np.ndarray | None = None
    divide_by: float | np.ndarray | None = None
    _is_fitted: bool = False

    def __init__(self, multiply_by=None, divide_by=None, **kwargs):
        # ... implementation ...

    def fit(self, data):
        # ... implementation ...

    def transform(self, data):
        # ... implementation ...

    def inverse_transform(self, data):
        # ... implementation ...

    def to_dict(self) -> dict:
        # ... implementation ...

    @classmethod
    def from_dict(cls, input_dict: dict) -> 'SerializableScaler':
        # ... implementation ...

    # Internal methods: _check_is_fitted, _robust_scaling_divide_operation (static, unused)
Attributes:
multiply_by (float | np.ndarray | None): The factor to multiply data by during transformation. Typically 1.0 for MaxAbs scaling logic.
divide_by (float | np.ndarray | None): The factor to divide data by during transformation. Calculated as the maximum absolute value(s) during fit. Can be a scalar, or an array when fitting multi-dimensional data.
_is_fitted (bool): Internal flag indicating whether the scaler has been fitted or initialised with parameters.
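As a rough illustration of how the two factors interact, the sketch below computes a per-column divide_by with NumPy and applies the documented formulas directly. It is not the module's code: the sample data is made up, and replacing a zero maximum with 1.0 is only an assumption about how the zero case might be handled.

```python
import numpy as np

# Two features (columns); the second one is all zeros.
data = np.array([[1.0, 0.0],
                 [-4.0, 0.0],
                 [2.0, 0.0]])

multiply_by = 1.0
# Maximum absolute value per column (axis 0).
divide_by = np.max(np.abs(data), axis=0)
# Assumption: substitute 1.0 where the maximum is zero to avoid dividing by zero.
divide_by = np.where(divide_by == 0, 1.0, divide_by)

# Forward and inverse transformations as described for transform / inverse_transform.
scaled = (data * multiply_by) / divide_by
restored = (scaled * divide_by) / multiply_by

print(divide_by)  # one factor per column
print(scaled)     # the first column now lies in [-1, 1]
```

Because divide_by is an array with one entry per column, NumPy broadcasting applies each factor to its own column.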
Methods:
__init__(self, multiply_by=None, divide_by=None, **kwargs): Initialises the scaler. Can optionally accept pre-calculated multiply_by and divide_by factors, in which case the scaler is considered fitted immediately. Defaults to multiply_by=1.0, divide_by=1.0 if no factors are provided.
fit(self, data): Calculates the divide_by factor(s) as the maximum absolute value along axis 0 of the input data. Sets multiply_by to 1.0. Handles cases where the maximum absolute value is zero to prevent division by zero in transform. Sets _is_fitted to True. Returns self.
transform(self, data): Applies the scaling transformation: (data * self.multiply_by) / self.divide_by. Raises NotFittedScalerError if the scaler hasn’t been fitted.
inverse_transform(self, data): Applies the inverse scaling transformation: (data * self.divide_by) / self.multiply_by. Raises NotFittedScalerError if the scaler hasn’t been fitted. Handles potential division by zero if multiply_by is zero.
to_dict(self) -> dict: Serializes the scaler’s state (multiply_by, divide_by) into a dictionary. Converts NumPy arrays to lists for JSON compatibility. Raises NotFittedScalerError if called before fitting.
from_dict(cls, input_dict: dict) -> 'SerializableScaler': A class method that creates a new SerializableScaler instance from a dictionary previously generated by to_dict.
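The to_dict / from_dict round trip can be pictured with a small stand-in class. TinyScaler is hypothetical and only mirrors the documented behaviour. The dictionary keys "multiply_by" and "divide_by" are assumed from the attribute names, and converting lists back into arrays on load is likewise an assumption.

```python
import json

import numpy as np


class TinyScaler:
    """Hypothetical stand-in; not the Abacus implementation."""

    def __init__(self, multiply_by=1.0, divide_by=1.0):
        self.multiply_by = multiply_by
        self.divide_by = divide_by

    def to_dict(self) -> dict:
        # NumPy arrays are not JSON serializable, so convert them to lists.
        def plain(value):
            return value.tolist() if isinstance(value, np.ndarray) else value

        return {"multiply_by": plain(self.multiply_by),
                "divide_by": plain(self.divide_by)}

    @classmethod
    def from_dict(cls, input_dict: dict) -> "TinyScaler":
        # Assumption: list-valued factors are turned back into arrays.
        def restore(value):
            return np.asarray(value) if isinstance(value, list) else value

        return cls(multiply_by=restore(input_dict["multiply_by"]),
                   divide_by=restore(input_dict["divide_by"]))


fitted = TinyScaler(multiply_by=1.0, divide_by=np.array([4.0, 2.5]))
payload = json.dumps(fitted.to_dict())  # plain JSON text, safe to store
loaded = TinyScaler.from_dict(json.loads(payload))
```

Storing only the two factors keeps the saved state small and independent of pickle, which is the usual motivation for a dictionary-based format.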
NotFittedScalerError Exception
class NotFittedScalerError(Exception):
    """Custom exception for scaler not being fitted."""
    pass
A custom exception class raised by SerializableScaler methods (transform, inverse_transform, to_dict) when they are called before the scaler instance has been fitted (i.e., the fit method hasn’t been called or parameters weren’t provided during initialisation).
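The guard described above might look roughly like the following. GuardedScaler is a hypothetical, list-based stand-in that shows only the fitted check. The error message and the fallback to 1.0 for all-zero input are assumptions, not taken from the module.

```python
class NotFittedScalerError(Exception):
    """Custom exception for scaler not being fitted."""


class GuardedScaler:
    """Hypothetical stand-in showing the fitted check only."""

    def __init__(self):
        self.divide_by = None
        self._is_fitted = False

    def fit(self, values):
        # Assumption: fall back to 1.0 when every value is zero.
        self.divide_by = max(abs(v) for v in values) or 1.0
        self._is_fitted = True
        return self

    def transform(self, values):
        if not self._is_fitted:
            raise NotFittedScalerError("Call fit() before transform().")
        return [v / self.divide_by for v in values]


scaler = GuardedScaler()
try:
    scaler.transform([1.0, -2.0])  # not fitted yet, so this raises
except NotFittedScalerError as exc:
    message = str(exc)

# After fitting, the same call succeeds.
result = scaler.fit([1.0, -2.0]).transform([1.0, -2.0])
```

Catching the dedicated exception type lets calling code distinguish an unfitted scaler from other failures such as malformed input.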