Feature Engineering Engines

FeatCopilot provides multiple specialized engines for different types of data and feature generation strategies.

TabularEngine

Generates features from numeric tabular data through mathematical transformations.

Features Generated

Polynomial features: x², x³, etc.
Interaction features: x₁ × x₂
Mathematical transforms: log, sqrt, exp, sin, cos
Ratio features: x₁ / x₂
Difference features: x₁ - x₂
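The feature families listed above can be sketched with plain pandas. This is a minimal illustration, not FeatCopilot's implementation: the helper `pairwise_features`, its column naming scheme, and the choice to turn division by zero into NaN are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_features(df: pd.DataFrame, degree: int = 2) -> pd.DataFrame:
    """Illustrative only: build power, interaction, difference and ratio columns."""
    out = {}
    # Polynomial features: x^2 ... x^degree for every column
    for col in df.columns:
        for d in range(2, degree + 1):
            out[f"{col}_pow{d}"] = df[col] ** d
    # Pairwise features for every unordered pair of columns
    for a, b in combinations(df.columns, 2):
        out[f"{a}_x_{b}"] = df[a] * df[b]
        out[f"{a}_minus_{b}"] = df[a] - df[b]
        # Assumption: zero denominators become NaN so the ratio is never inf
        out[f"{a}_div_{b}"] = df[a] / df[b].replace(0, np.nan)
    return pd.DataFrame(out, index=df.index)


X = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [2.0, 0.0, 4.0]})
features = pairwise_features(X)
```

With n input columns this produces on the order of n² new columns, which is one reason a cap such as `max_features` is useful.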

Usage

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,      # Max polynomial degree
    interaction_only=False,   # Include powers, not just interactions
    include_transforms=['log', 'sqrt', 'square'],
    max_features=50,
    verbose=True
)

X_transformed = engine.fit_transform(X, y)

Configuration Options

Parameter	Type	Default	Description
`polynomial_degree`	int	2	Maximum polynomial degree (1-4)
`interaction_only`	bool	False	Only interactions, no powers
`include_transforms`	list	['log', 'sqrt', 'square']	Math transforms to apply
`max_features`	int	None	Maximum features to generate
`min_unique_values`	int	5	Minimum number of unique values for a column to be treated as continuous

Available Transforms

TRANSFORMS = [
    'log',      # log(1 + |x|)
    'log10',    # log10(|x| + 1)
    'sqrt',     # sqrt(|x|)
    'square',   # x²
    'cube',     # x³
    'reciprocal', # 1/x
    'exp',      # e^x (clipped)
    'tanh',     # tanh(x)
    'sin',      # sin(x)
    'cos',      # cos(x)
]
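The comments in the listing describe "safe" variants that stay defined for negative inputs. A rough NumPy sketch of those formulas follows. The name `SAFE_TRANSFORMS` and the clipping bound for `exp` are assumptions for illustration, and `reciprocal` is left out because the listing does not say how zeros are handled.

```python
import numpy as np

# Hypothetical lookup table mirroring the comments in the TRANSFORMS listing;
# FeatCopilot's internal implementation may differ.
SAFE_TRANSFORMS = {
    "log": lambda x: np.log1p(np.abs(x)),          # log(1 + |x|)
    "log10": lambda x: np.log10(np.abs(x) + 1),    # log10(|x| + 1)
    "sqrt": lambda x: np.sqrt(np.abs(x)),          # sqrt(|x|)
    "square": lambda x: x ** 2,                    # x^2
    "cube": lambda x: x ** 3,                      # x^3
    "exp": lambda x: np.exp(np.clip(x, -50, 50)),  # clip bound is an assumption
    "tanh": np.tanh,
    "sin": np.sin,
    "cos": np.cos,
}

x = np.array([-4.0, 0.0, 9.0])
transformed = {name: fn(x) for name, fn in SAFE_TRANSFORMS.items()}
```

Taking the absolute value before `log` and `sqrt` keeps the output finite for negative inputs, at the cost of discarding the sign.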

TimeSeriesEngine

Extracts statistical and frequency-domain features from time series data.

Features Generated

Basic statistics: mean, std, min, max, median
Distribution: skewness, kurtosis, quantiles
Autocorrelation: lag correlations
Trends: slope, change rate
Frequency: FFT coefficients
Peaks: count, locations

Usage

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['basic_stats', 'distribution', 'autocorrelation', 'trends'],
    window_sizes=[5, 10, 20],
    n_fft_coefficients=10,
    verbose=True
)

X_features = engine.fit_transform(time_series_df)

Configuration Options

Parameter	Type	Default	Description
`features`	list	['basic_stats', 'distribution', 'autocorrelation']	Feature groups
`window_sizes`	list	[5, 10, 20]	Rolling window sizes
`n_fft_coefficients`	int	10	FFT coefficients to extract
`n_autocorr_lags`	int	10	Autocorrelation lags

Feature Groups

Group	Features
`basic_stats`	mean, std, min, max, range, median, sum, var, cv
`distribution`	skewness, kurtosis, q10, q25, q75, q90, iqr
`autocorrelation`	autocorr_lag1, autocorr_lag2, ...
`peaks`	n_peaks, n_troughs, peak_mean, trough_mean
`trends`	slope, intercept, change, mean_abs_change
`rolling`	rolling window statistics
`fft`	fft_coeff_1, fft_coeff_2, ..., spectral_energy
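To make the groups concrete, here is a small NumPy sketch that computes a few features from several of them for a single series. The function `series_features` is hypothetical, and the exact estimators (sample standard deviation, Pearson-style autocorrelation, FFT magnitudes without the DC term) are assumptions; the engine may define them differently.

```python
import numpy as np


def series_features(values, n_autocorr_lags=2, n_fft_coefficients=3):
    """Illustrative only: compute a handful of per-series features with NumPy."""
    x = np.asarray(values, dtype=float)
    feats = {
        "mean": x.mean(),
        "std": x.std(ddof=1),
        "range": x.max() - x.min(),
        "q25": np.quantile(x, 0.25),
        "q75": np.quantile(x, 0.75),
    }
    feats["iqr"] = feats["q75"] - feats["q25"]

    # Autocorrelation at lag k: Pearson correlation between x[:-k] and x[k:]
    for lag in range(1, n_autocorr_lags + 1):
        feats[f"autocorr_lag{lag}"] = np.corrcoef(x[:-lag], x[lag:])[0, 1]

    # Trend: least-squares line fitted against the sample index
    slope, intercept = np.polyfit(np.arange(len(x)), x, deg=1)
    feats["slope"], feats["intercept"] = slope, intercept
    feats["mean_abs_change"] = np.abs(np.diff(x)).mean()

    # Frequency: magnitudes of the first non-DC FFT coefficients
    spectrum = np.abs(np.fft.rfft(x))
    for i, coeff in enumerate(spectrum[1:n_fft_coefficients + 1], start=1):
        feats[f"fft_coeff_{i}"] = coeff
    return feats


feats = series_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

For this perfectly linear series the slope and the lag-1 autocorrelation both come out at 1, which makes the sketch easy to sanity-check.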

RelationalEngine

Generates aggregation features from related tables, inspired by Featuretools.

Features Generated

Aggregations: mean, sum, count, min, max per group
Self-aggregations: statistics by categorical columns

Usage

from featcopilot.engines import RelationalEngine

engine = RelationalEngine(
    aggregation_functions=['mean', 'sum', 'count', 'max', 'min'],
    max_depth=2
)

# Define relationships
engine.add_relationship(
    child_table='orders',
    parent_table='customers',
    key_column='customer_id'
)

# Transform with related tables
X_features = engine.fit_transform(
    orders_df,
    related_tables={'customers': customers_df}
)

Configuration Options

Parameter	Type	Default	Description
`aggregation_functions`	list	['mean', 'sum', 'count', 'max', 'min']	Aggregations
`max_depth`	int	2	Depth for feature synthesis

Aggregation Functions

AGGREGATIONS = [
    'mean', 'sum', 'min', 'max', 'count',
    'std', 'median', 'first', 'last', 'nunique'
]
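As a rough picture of what a parent-child aggregation looks like, the pandas sketch below summarizes a child table per key and joins the result onto the parent table. The toy tables, the `orders_amount_` column prefix, and the direction of the join are assumptions for illustration; the output RelationalEngine actually produces may be shaped differently.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 30.0, 5.0],
})

# Aggregate the child table (orders) once per parent key
agg = (
    orders.groupby("customer_id")["amount"]
    .agg(["mean", "sum", "count", "max", "min"])
    .add_prefix("orders_amount_")  # naming scheme is an assumption
    .reset_index()
)

# A left join keeps parents with no child rows; their aggregates are NaN
customer_features = customers.merge(agg, on="customer_id", how="left")
```

Customer 3 has no orders, so every aggregate for that row is missing and needs an explicit fill strategy before modeling.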

TextEngine

Extracts features from text columns.

Features Generated

Length features: character count, word count
Character statistics: uppercase ratio, digit ratio
Word statistics: average word length, unique word ratio
TF-IDF: TF-IDF vectors reduced to `n_components` dimensions with SVD
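The length, character, and word statistics can be sketched with the standard library alone. The helper `text_stats`, its key names, and the whitespace tokenization are assumptions for illustration rather than the engine's actual definitions.

```python
def text_stats(text: str) -> dict:
    """Illustrative only: surface-level statistics for one string."""
    words = text.split()  # assumption: whitespace tokenization
    n_chars = len(text)
    n_words = len(words)
    return {
        "char_count": n_chars,
        "word_count": n_words,
        "uppercase_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Case-insensitive uniqueness: "Big" and "big" count as one word
        "unique_word_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


stats = text_stats("Big big SALE 50")
```

The guards on `n_chars` and `n_words` keep the ratios defined for empty strings.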

Usage

from featcopilot.engines import TextEngine

engine = TextEngine(
    features=['length', 'word_count', 'char_stats', 'tfidf'],
    max_vocab_size=5000,
    n_components=50
)

X_features = engine.fit_transform(
    text_df,
    text_columns=['description', 'title']
)

Configuration Options

Parameter	Type	Default	Description
`features`	list	['length', 'word_count', 'char_stats']	Feature types
`max_vocab_size`	int	5000	TF-IDF vocabulary size
`n_components`	int	50	SVD components for TF-IDF

Combining Engines

Use multiple engines together:

from featcopilot import AutoFeatureEngineer

engineer = AutoFeatureEngineer(
    engines=['tabular', 'timeseries', 'text'],
    max_features=100
)

# All engines run and features are combined
X_transformed = engineer.fit_transform(X, y)

Features from all engines are:

Generated independently
Combined into a single DataFrame
Selected based on importance
Deduplicated by correlation
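The last step, deduplication by correlation, is commonly done by dropping any column that is highly correlated with an earlier one. The sketch below shows that general technique in pandas. The helper `drop_correlated` and the 0.95 threshold are assumptions; FeatCopilot's own rule and threshold are not stated here.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Illustrative only: drop columns highly correlated with an earlier column."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


combined = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "a_scaled": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
    "b": [1.0, -1.0, 1.0, -1.0],
})
deduplicated = drop_correlated(combined)
```

Because the check runs in column order, the earlier of two redundant columns survives, so ordering features by importance first decides which duplicate is kept.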