pysatl_tsp.implementations.processor.time_series_cross_validator
Module Contents
Classes
TimeSeriesCrossValidator | A handler that implements expanding window cross-validation for time series data.
API
- class pysatl_tsp.implementations.processor.time_series_cross_validator.TimeSeriesCrossValidator(min_train_size: int, val_size: int, source: Optional[pysatl_tsp.core.Handler[Any, pysatl_tsp.core.T]] = None)
Bases: pysatl_tsp.core.Handler[pysatl_tsp.core.T, tuple[pysatl_tsp.core.scrubber.ScrubberWindow[pysatl_tsp.core.T], pysatl_tsp.core.scrubber.ScrubberWindow[pysatl_tsp.core.T]]]
A handler that implements expanding window cross-validation for time series data.
This handler produces a sequence of train-validation splits suitable for time series validation, where each split preserves the temporal order of data. It implements an expanding window approach, where the training set grows over time while the validation set has a fixed size and slides forward.
The handler ensures that:
1. The training set always has at least min_train_size points
2. The validation set always has exactly val_size points
3. The validation set always follows the training set temporally
4. Each new split adds val_size points to the training set
This approach respects the temporal nature of time series data and prevents data leakage from future to past.
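The four guarantees above can be sketched with plain index arithmetic. This is a minimal illustration, not the library's implementation: `expanding_window_splits` is a hypothetical helper, and it assumes that trailing points which cannot fill a complete validation set are simply left out.

```python
from collections.abc import Iterator


def expanding_window_splits(
    n_points: int, min_train_size: int, val_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train_indices, val_indices) pairs for an expanding window.

    Hypothetical helper for illustration; not part of pysatl_tsp.
    """
    if min_train_size <= 0 or val_size <= 0:
        raise ValueError("min_train_size and val_size must be positive")
    train_end = min_train_size
    # Assumption: stop as soon as a full validation block no longer fits.
    while train_end + val_size <= n_points:
        # Training always starts at index 0; validation follows it directly.
        yield range(0, train_end), range(train_end, train_end + val_size)
        # The points just used for validation join the next training set.
        train_end += val_size


splits = list(expanding_window_splits(n_points=100, min_train_size=50, val_size=10))
for number, (train, val) in enumerate(splits, start=1):
    print(f"Split {number}: train 0..{train[-1]}, validation {val[0]}..{val[-1]}")
```

Under these assumptions, a series of 100 points with min_train_size=50 and val_size=10 gives five splits, and no validation index ever precedes a training index of the same split.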
- Parameters:
min_train_size – Minimum number of points in the initial training set
val_size – Number of points in each validation set
source – The handler providing input data, defaults to None
- Example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from sklearn.linear_model import LinearRegression

from pysatl_tsp.implementations.processor.time_series_cross_validator import TimeSeriesCrossValidator
# SimpleDataProvider must also be imported from pysatl_tsp; its module path is not shown on this page.

# Generate a synthetic time series
np.random.seed(42)
ts = np.cumsum(np.random.normal(0, 1, 100))  # Random walk
data_source = SimpleDataProvider(ts)

# Create a cross-validator with min_train_size=50 and val_size=10
cv = TimeSeriesCrossValidator(min_train_size=50, val_size=10, source=data_source)

# Visualize the different train-validation splits
plt.figure(figsize=(14, 8))
x = np.arange(len(ts))
plt.plot(x, ts, "k-", alpha=0.3, label="Full time series")

for i, (train, val) in enumerate(cv):
    train_indices = list(train.indices)
    val_indices = list(val.indices)

    # Plot each split; later splits are drawn more transparently
    plt.plot(train_indices, [ts[j] for j in train_indices], "b-", linewidth=2, alpha=0.7 - i * 0.1)
    plt.plot(val_indices, [ts[j] for j in val_indices], "r-", linewidth=2, alpha=0.7 - i * 0.1)

    # Mark the last training index of this split
    split_idx = train_indices[-1]
    plt.axvline(x=split_idx, color="g", linestyle="--", alpha=0.5)

    # Print information about this split
    print(f"Split {i + 1}:")
    print(f"  Train: {len(train)} points (indices {train_indices[0]}..{train_indices[-1]})")
    print(f"  Validation: {len(val)} points (indices {val_indices[0]}..{val_indices[-1]})")

plt.title("Time Series Cross-Validation: Expanding Window Approach")
plt.xlabel("Time")
plt.ylabel("Value")

# Add custom legend
custom_lines = [
    Line2D([0], [0], color="k", alpha=0.3),
    Line2D([0], [0], color="b", linewidth=2),
    Line2D([0], [0], color="r", linewidth=2),
    Line2D([0], [0], color="g", linestyle="--"),
]
plt.legend(
    custom_lines,
    ["Full time series", "Training sets", "Validation sets", "Split points"],
    loc="upper left",
)
plt.grid(True, alpha=0.3)
plt.show()

# Example model evaluation with each split
for i, (train, val) in enumerate(cv):
    # Prepare data: the time index is the only feature
    train_indices = list(train.indices)
    train_X = np.array(train_indices).reshape(-1, 1)
    train_y = np.array(list(train.values))

    val_indices = list(val.indices)
    val_X = np.array(val_indices).reshape(-1, 1)
    val_y = np.array(list(val.values))

    # Train a simple model
    model = LinearRegression()
    model.fit(train_X, train_y)

    # Evaluate on validation set
    val_pred = model.predict(val_X)
    mse = np.mean((val_pred - val_y) ** 2)
    print(f"Split {i + 1} - Validation MSE: {mse:.4f}")
Initialization
Initialize a time series cross-validator.
- __iter__() → collections.abc.Iterator[tuple[pysatl_tsp.core.scrubber.ScrubberWindow[pysatl_tsp.core.T], pysatl_tsp.core.scrubber.ScrubberWindow[pysatl_tsp.core.T]]]
Create an iterator that yields train-validation splits for time series cross-validation.
This method creates splits where:
1. The first split has exactly min_train_size points for training
2. Each subsequent split adds val_size points to the training set
3. Each validation set has exactly val_size points and follows the training set
- Returns:
Iterator yielding tuples of (training_window, validation_window)
- Raises:
ValueError – If no source has been set
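The iteration contract described above can be imitated over any iterable, which may help when reasoning about when each split becomes available. The sketch below is an assumption-laden stand-in, not the handler's code: `stream_splits` is a hypothetical function, it yields plain lists rather than ScrubberWindow objects, and the error message is invented.

```python
from collections.abc import Iterable, Iterator
from typing import Optional, TypeVar

T = TypeVar("T")


def stream_splits(
    source: Optional[Iterable[T]], min_train_size: int, val_size: int
) -> Iterator[tuple[list[T], list[T]]]:
    """Yield (train, val) lists as soon as enough items have arrived.

    Hypothetical stand-in for illustration; not part of pysatl_tsp.
    """
    if source is None:
        # Mirrors the documented behavior: iterating without a source fails.
        raise ValueError("Source is not set")
    buffer: list[T] = []
    train_end = min_train_size
    for item in source:
        buffer.append(item)
        # A split is ready once the buffer holds train_end + val_size items.
        if len(buffer) == train_end + val_size:
            yield buffer[:train_end], buffer[train_end:]
            # The validation items become part of the next training set.
            train_end += val_size


sizes = [(len(train), len(val)) for train, val in stream_splits(range(25), 10, 5)]
print(sizes)
```

Because this is a generator, the ValueError surfaces on the first `next()` call rather than when `stream_splits` is called; whether the real handler raises at the same moment is not stated on this page.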