PySATL-TSP

PySATL Time Series Processing (abbreviated pysatl-tsp) is a PySATL subproject for adaptive processing of time series data, built around a streaming architecture. It implements the Chain of Responsibility pattern, which lets you build complex data processing pipelines with minimal boilerplate code and makes it suitable for real-time applications and the analysis of large datasets.
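
To illustrate the idea of chaining handlers with the `|` operator, here is a minimal, library-independent sketch. It is not the pysatl-tsp implementation: the names `Handler`, `ListProvider`, and `Mapper` are hypothetical and exist only for this example. Each stage remembers its upstream source and pulls items from it lazily when iterated.

```python
class Handler:
    """Hypothetical pipeline stage that pulls items from an upstream source."""

    def __init__(self):
        self.source = None  # set when this stage is attached with `|`

    def __or__(self, other):
        # `a | b` attaches `a` as the source of `b` and returns `b`,
        # so `a | b | c` builds the chain a -> b -> c.
        other.source = self
        return other

    def __iter__(self):
        raise NotImplementedError


class ListProvider(Handler):
    """Hypothetical data source backed by an in-memory list."""

    def __init__(self, data):
        super().__init__()
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)


class Mapper(Handler):
    """Hypothetical stage that applies a function to each upstream item."""

    def __init__(self, func):
        super().__init__()
        self.func = func

    def __iter__(self):
        # Items are produced one at a time, only when the consumer asks.
        for item in self.source:
            yield self.func(item)


pipeline = ListProvider([1, 2, 3]) | Mapper(lambda x: x * x) | Mapper(lambda x: x + 1)
squares_plus_one = list(pipeline)
print(squares_plus_one)  # [2, 5, 10]
```

Because every stage is a generator over its source, nothing is computed until the final stage is iterated, which is the property that makes this style a natural fit for streaming data.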

Requirements

Python 3.10+
Poetry 1.8.0+

Installation

Clone the repository:

git clone https://github.com/PySATL/pysatl-tsp

Install dependencies (run from the cloned repository's root directory):

poetry install

Basic Pipeline Example:

from pysatl_tsp.core.data_providers import SimpleDataProvider
from pysatl_tsp.core.processor import MappingHandler
from pysatl_tsp.core.scrubber import LinearScrubber

# Create a data source
data = list(range(100))
provider = SimpleDataProvider(data)

# Define a simple processing pipeline:
# 1. Create windows of 10 elements with 50% overlap
# 2. Calculate the average of each window
pipeline = (
    provider
    | LinearScrubber(window_length=10, shift_factor=0.5)
    | MappingHandler(map_func=lambda window: sum(window.values) / len(window))
)

# Process the data: iterating the pipeline pulls each window average in turn
results = list(pipeline)

print(f"Number of windows: {len(results)}")
print(f"First 3 window averages: {results[:3]}")

# Visualize results
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.plot(results)
plt.title('Window Averages')
plt.xlabel('Window Index')
plt.ylabel('Average Value')
plt.grid(True)
plt.show()
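
As a rough cross-check of what the windowing step computes, the following plain-Python sketch reproduces fixed-length windows with 50% overlap. It rests on assumptions not confirmed by the library: the step between windows is `window_length * shift_factor`, and only full windows are emitted. `sliding_windows` is a hypothetical helper, and the actual `LinearScrubber` may behave differently, for example at the end of the data.

```python
def sliding_windows(data, window_length, shift_factor):
    """Yield full-length windows; the step is window_length * shift_factor (assumed)."""
    shift = max(1, int(window_length * shift_factor))
    for start in range(0, len(data) - window_length + 1, shift):
        yield data[start:start + window_length]


data = list(range(100))
averages = [sum(w) / len(w) for w in sliding_windows(data, 10, 0.5)]

print(f"Number of windows: {len(averages)}")       # 19
print(f"First 3 window averages: {averages[:3]}")  # [4.5, 9.5, 14.5]
```

Under these assumptions, 100 samples with a window of 10 and a step of 5 give window starts at 0, 5, ..., 90, which is 19 windows.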

Development

Install requirements

poetry install --with dev

Pre-commit

Install pre-commit hooks:

poetry run pre-commit install

Run the hooks manually:

poetry run pre-commit run --all-files --color always --verbose --show-diff-on-failure