FeatCopilot
🚀 Next-Generation LLM-Powered Auto Feature Engineering Framework
Automatically generate, select, and explain predictive features using semantic understanding of your data
Benchmark Highlights
- +197% Max Improvement: Simple models benchmark on the delays_zurich dataset
- +420% with LLM Engine: LLM-enhanced feature generation boosts results
- 48% Datasets Improved: Tabular engine improves 20 of the 42 datasets tested
- +8.55% AutoML Best: FLAML benchmark improvement on the abalone dataset
Two Modes of Operation
Tabular mode: sub-second feature engineering using rule-based transformations:
from featcopilot import AutoFeatureEngineer

# Fast, deterministic feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50,
)
X_transformed = engineer.fit_transform(X, y)  # <1 second
Best for: Production pipelines, real-time inference, reproducible results
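To make "rule-based transformations" concrete, here is a minimal sketch of the general idea in plain Python: a fixed set of arithmetic rules is applied to every pair of numeric columns. The helper `pairwise_features` and its two rules (product and guarded ratio) are illustrative assumptions, not FeatCopilot's actual rule set or API.

```python
from itertools import combinations


def pairwise_features(rows, columns):
    """Derive product and ratio features for every pair of numeric columns.

    Hypothetical helper for illustration only; not part of FeatCopilot.
    `rows` is a list of dicts mapping column name -> number.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # keep the original columns
        for a, b in combinations(columns, 2):
            new_row[f"{a}_x_{b}"] = row[a] * row[b]
            # Guard against division by zero with a None placeholder.
            new_row[f"{a}_div_{b}"] = row[a] / row[b] if row[b] != 0 else None
        out.append(new_row)
    return out


rows = [{"age": 50, "bmi": 25.0}, {"age": 40, "bmi": 0.0}]
features = pairwise_features(rows, ["age", "bmi"])
```

Because the rules are fixed and involve no sampling or model calls, the same input always yields the same output, which is what makes this style of feature generation suitable for reproducible pipelines.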
LLM mode: domain-aware semantic feature generation with any LLM provider:
from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50,
)
X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={'age': 'Patient age in years'},
    task_description='Predict heart disease risk',
)  # 30-60 seconds
Best for: Exploratory analysis, domain-specific features, maximum accuracy
What is FeatCopilot?
FeatCopilot is a Python library for automated feature engineering powered by large language models. It analyzes column meanings and descriptions to generate domain-aware features, applies intelligent selection to keep only the most predictive ones, and provides human-readable explanations for every feature it creates.
- Multi-Engine Architecture: Tabular, time series, relational, and text feature engines in one unified API
- LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and automatic code synthesis
- Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
- Sklearn Compatible: Drop-in replacement for scikit-learn transformers in your ML pipelines
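As an illustration of what redundancy elimination can look like, the sketch below greedily keeps a feature only if its absolute Pearson correlation with every already-kept feature stays under a threshold. This is a generic technique shown under stated assumptions: the function names and the 0.95 threshold are hypothetical, and nothing here is taken from FeatCopilot's own selection code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has zero variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(features, threshold=0.95):
    """Greedy filter (hypothetical): keep a feature only if |r| with
    every previously kept feature is below `threshold`."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "age": [30, 40, 50, 60],
    "age_months": [360, 480, 600, 720],  # linear copy of "age"
    "bmi": [22.0, 31.0, 24.0, 27.0],
}
selected = drop_redundant(features)
```

The greedy pass depends on column order: the first of two highly correlated features is the one that survives. A production selector would typically rank features by importance first so that the more predictive member of each correlated pair is kept.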
Why FeatCopilot?
| Feature | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|---|---|---|---|---|---|---|
| Tabular Features | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Time Series | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Relational | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM-Powered | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Semantic Understanding | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Code Generation | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Sklearn Compatible | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Interpretable | ✅ | ⚠️ | ⚠️ | ⚠️ | ❌ | ✅ |
Installation
# Basic installation
pip install featcopilot
# With LLM capabilities (quotes keep shells such as zsh from expanding the brackets)
pip install "featcopilot[llm]"
# Full installation with all extras
pip install "featcopilot[full]"
Getting Started
- Installation: Install FeatCopilot and set up your environment
- Quick Start: Get up and running in 5 minutes
- Authentication: Set up LLM providers for AI features
- Benchmarks: See performance improvements across datasets
License
FeatCopilot is released under the MIT License.