Time-Aware Tabular Prototype¶

A practical prototype for leakage-safe auto feature engineering on time-aware tabular data.

Why this example matters¶

Most real feature engineering failures are not caused by weak transformations. They come from:

random train/test splits on temporal data
future information leaking into features
offline features that cannot be reproduced later

This example shows a safer baseline:

sort by time
split by time
fit features on the training slice only
transform the holdout slice separately
compare against a plain model baseline

Script¶

See:

examples/time_aware_tabular_prototype.py

Core pattern¶

engineer = AutoFeatureEngineer(
    engines=["tabular"],
    max_features=30,
    selection_methods=["mutual_info", "importance"],
    correlation_threshold=0.9,
    leakage_guard="warn",
)

X_train_fe = engineer.fit_transform(
    X_train,
    y_train,
    target_name="churned",
    apply_selection=True,
)
X_test_fe = engineer.transform(X_test)

Leakage guard¶

AutoFeatureEngineer now supports a lightweight leakage_guard option:

"warn" — default, warns if suspicious columns are present
"raise" — fail fast when likely leakage columns are detected
"off" — disable the check

This is intentionally conservative. It does not prove your pipeline is safe. It just catches obvious foot-guns such as columns named like:

target
label
outcome
future_*

Recommendation¶

For a real project, start with this workflow before trying more advanced LLM or agent-based feature generation. If the time-aware baseline is not trustworthy, more automation only makes the mistake faster.