Domain-Specific Examples¶

FeatCopilot's LLM engine can generate domain-aware features. Here are examples for common domains.

Healthcare¶

Diabetes Risk Prediction¶

from featcopilot import AutoFeatureEngineer

engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    llm_config={
        'model': 'gpt-5.2',
        'domain': 'healthcare',
        'max_suggestions': 15
    }
)

X_fe = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Patient age in years',
        'bmi': 'Body Mass Index',
        'glucose_fasting': 'Fasting blood glucose mg/dL',
        'hba1c': 'Hemoglobin A1c percentage',
        'blood_pressure': 'Systolic blood pressure mmHg',
        'family_history': 'Family history of diabetes (0/1)',
    },
    task_description="Predict Type 2 diabetes risk within 5 years"
)

Expected LLM-Generated Features¶

bmi_glucose_interaction: BMI × glucose interaction
metabolic_age_score: Combined metabolic risk indicators
prediabetes_indicator: Based on glucose/HbA1c thresholds
cardiovascular_risk: Blood pressure and metabolic markers

Finance¶

Credit Default Prediction¶

engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    llm_config={
        'domain': 'finance',
        'max_suggestions': 15
    }
)

X_fe = engineer.fit_transform(
    X, y,
    column_descriptions={
        'income': 'Annual income in USD',
        'debt': 'Total outstanding debt',
        'credit_score': 'FICO credit score (300-850)',
        'employment_years': 'Years at current employer',
        'loan_amount': 'Requested loan amount',
        'num_accounts': 'Number of credit accounts',
        'late_payments': 'Number of late payments in last 2 years',
    },
    task_description="Predict loan default probability"
)

Expected LLM-Generated Features¶

debt_to_income: Debt-to-income ratio
loan_to_income: Loan amount relative to income
credit_utilization: Estimated credit utilization
payment_reliability: Based on late payment history
employment_stability: Employment tenure score

Retail / E-commerce¶

Customer Churn Prediction¶

engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    llm_config={
        'domain': 'retail',
        'max_suggestions': 15
    }
)

X_fe = engineer.fit_transform(
    X, y,
    column_descriptions={
        'days_since_purchase': 'Days since last purchase',
        'total_orders': 'Total number of orders',
        'total_spend': 'Total amount spent',
        'avg_order_value': 'Average order value',
        'returns_count': 'Number of returns',
        'customer_tenure_days': 'Days since first purchase',
        'email_opens': 'Number of marketing emails opened',
    },
    task_description="Predict customer churn in next 90 days"
)

Expected LLM-Generated Features¶

recency_score: Based on days since purchase
frequency_score: Orders per time period
monetary_score: Spending patterns
rfm_combined: Combined RFM score
engagement_rate: Email open rate
return_rate: Returns as percentage of orders
customer_lifetime_value: Estimated CLV

Telecom¶

Service Churn Prediction¶

engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    llm_config={
        'domain': 'telecom',
        'max_suggestions': 15
    }
)

X_fe = engineer.fit_transform(
    X, y,
    column_descriptions={
        'tenure_months': 'Months as customer',
        'monthly_charges': 'Monthly bill amount',
        'total_charges': 'Total charges to date',
        'contract_type': 'Month-to-month, 1-year, or 2-year',
        'num_services': 'Number of subscribed services',
        'support_tickets': 'Support tickets in last 6 months',
        'data_usage_gb': 'Average monthly data usage',
    },
    task_description="Predict telecom customer churn"
)

Expected LLM-Generated Features¶

charges_per_service: Monthly charges per service
contract_risk: Risk based on contract type
support_intensity: Support tickets relative to tenure
usage_trend: Data usage patterns
customer_value: Revenue per customer metrics

Manufacturing¶

Equipment Failure Prediction¶

engineer = AutoFeatureEngineer(
    engines=['tabular', 'timeseries', 'llm'],
    llm_config={
        'domain': 'manufacturing',
        'max_suggestions': 15
    }
)

X_fe = engineer.fit_transform(
    X, y,
    column_descriptions={
        'temperature': 'Operating temperature (°C)',
        'vibration': 'Vibration level (mm/s)',
        'pressure': 'Operating pressure (PSI)',
        'runtime_hours': 'Total runtime hours',
        'maintenance_days_ago': 'Days since last maintenance',
        'power_consumption': 'Power consumption (kW)',
        'error_count': 'Error events in last 24 hours',
    },
    task_description="Predict equipment failure within 7 days"
)

Expected LLM-Generated Features¶

operating_stress: Combined temp/pressure/vibration
maintenance_overdue: Days past maintenance schedule
efficiency_degradation: Power vs expected consumption
error_rate: Errors per runtime hour
wear_indicator: Based on runtime and maintenance

Custom Domain¶

For domains not in the preset list:

engineer = AutoFeatureEngineer(
    engines=['llm'],
    llm_config={
        'model': 'gpt-5.2',
        'max_suggestions': 20
    }
)

X_fe = engineer.fit_transform(
    X, y,
    column_descriptions={...},
    task_description="""
    [Detailed task description]

    Domain: [Your specific domain]

    Business Context:
    - [Key business objectives]
    - [Important domain knowledge]
    - [Relevant industry standards]

    Prediction Goal:
    - [What exactly to predict]
    - [Time horizon]
    - [Success metrics]
    """
)

Best Practices for Domain Features¶

1. Be Specific in Descriptions¶

# ❌ Vague
'revenue': 'Revenue'

# ✅ Specific
'revenue': 'Monthly recurring revenue in USD, includes all subscription tiers'

2. Include Units¶

column_descriptions = {
    'temperature': 'Operating temperature in Celsius (normal range: 20-80)',
    'pressure': 'System pressure in PSI (max rated: 150)',
}

3. Note Constraints¶

task_description = """
Predict customer churn.

Constraints:
- Features must be explainable to business stakeholders
- Avoid using sensitive demographic data
- Focus on behavioral indicators
"""

4. Iterate and Refine¶

# Generate initial features
X_fe = engineer.fit_transform(X, y, ...)

# Request more specific features
additional = engineer.generate_custom_features(
    prompt="Generate features focusing on customer engagement patterns"
)