LLM-Powered Features¶
FeatCopilot's LLM integration enables semantic understanding and intelligent feature generation, with support for 100+ LLM providers via LiteLLM, including GitHub Models and GitHub Copilot.
Overview¶
The LLM engine provides:
- Semantic Feature Discovery: Understands column meanings from names/descriptions
- Domain-Aware Generation: Tailored features for specific domains
- Feature Explanations: Human-readable descriptions for stakeholders
- Code Generation: Automatic Python code for custom features
- Iterative Refinement: Improve features based on feedback
LLM Backend Options¶
FeatCopilot supports two LLM backends:
- GitHub Copilot SDK (default): Native integration with GitHub Copilot
- LiteLLM: Universal interface supporting 100+ providers, including:
    - OpenAI (GPT-4o, GPT-4, GPT-3.5)
    - Anthropic (Claude 3 Opus, Sonnet, Haiku)
    - Azure OpenAI
    - Google (Gemini Pro, Gemini Ultra)
    - GitHub Models (via github/ prefix): Llama, Phi, Mixtral, etc.
    - GitHub Copilot (via github_copilot/ prefix): GPT-4, GPT-5.1-codex
    - AWS Bedrock
    - Ollama (local models)
    - And many more...
SemanticEngine¶
The core LLM-powered engine:
from featcopilot.llm import SemanticEngine
# Using GitHub Copilot SDK (default)
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True,
    domain='healthcare',
    verbose=True
)

# Using LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20,
    domain='healthcare',
    verbose=True
)
X_features = engine.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Patient age in years',
        'bmi': 'Body Mass Index',
        'glucose': 'Fasting blood glucose mg/dL',
        'hba1c': 'Hemoglobin A1c percentage'
    },
    task_description='Predict Type 2 diabetes diagnosis'
)
GitHub Models via LiteLLM¶
Access GitHub Marketplace models using the github/ prefix:
import os
os.environ['GITHUB_API_KEY'] = 'your-github-token'
from featcopilot import AutoFeatureEngineer
# GitHub Marketplace Models via LiteLLM
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    llm_config={
        'model': 'github/gpt-4o',  # GitHub Marketplace GPT-4o
        'backend': 'litellm',
        'max_suggestions': 15,
        'domain': 'healthcare',
    }
)
Available GitHub Marketplace Models¶
| Model | Usage |
|---|---|
| GPT-4o | github/gpt-4o |
| GPT-4o Mini | github/gpt-4o-mini |
| Llama 3.2 11B Vision | github/Llama-3.2-11B-Vision-Instruct |
| Llama 3.1 70B | github/Llama-3.1-70b-Versatile |
| Phi-4 | github/Phi-4 |
| Mixtral 8x7B | github/Mixtral-8x7b-32768 |
Note: All GitHub Marketplace models are supported; just use the github/<model-name> prefix.
Setting Up GitHub API Key¶
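Generate a GitHub personal access token and expose it as the GITHUB_API_KEY environment variable before constructing the engineer, as in the example above (the token value below is a placeholder):

import os

# LiteLLM reads GITHUB_API_KEY for github/ models.
os.environ['GITHUB_API_KEY'] = 'ghp_your_token_here'  # placeholder

Alternatively, pass the token through the api_key field of llm_config (litellm backend).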
GitHub Copilot via LiteLLM¶
Access the GitHub Copilot Chat API using the github_copilot/ prefix; authentication uses the OAuth device flow:
from featcopilot import AutoFeatureEngineer
# GitHub Copilot Chat API via LiteLLM
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    llm_config={
        'model': 'github_copilot/gpt-4',  # GitHub Copilot's GPT-4
        'backend': 'litellm',
        'max_suggestions': 15,
        'domain': 'healthcare',
    }
)
Available GitHub Copilot Models¶
| Model | Usage |
|---|---|
| GPT-4 | github_copilot/gpt-4 |
| GPT-5.1 Codex | github_copilot/gpt-5.1-codex |
| Text Embedding | github_copilot/text-embedding-3-small |
Authentication¶
GitHub Copilot uses OAuth device flow:

1. On first use, LiteLLM displays a device code and verification URL
2. Visit the URL and enter the code to authenticate
3. Credentials are stored locally for future use
Note: Requires a paid GitHub Copilot subscription.
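A sketch of the first run (the toy data is purely for illustration):

import pandas as pd
from featcopilot import AutoFeatureEngineer

# Toy data purely for illustration.
X = pd.DataFrame({'age': [34, 51, 47], 'bmi': [22.1, 30.4, 27.8]})
y = pd.Series([0, 1, 1])

engineer = AutoFeatureEngineer(
    engines=['llm'],
    llm_config={'model': 'github_copilot/gpt-4', 'backend': 'litellm'},
)

# The first LLM call triggers the device flow: LiteLLM prints a device
# code and verification URL. Authenticate once in the browser; cached
# credentials are reused on subsequent runs.
X_features = engineer.fit_transform(X, y)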
Configuration¶
LLM Config Options¶
llm_config = {
    'model': 'gpt-5.2',         # Model: gpt-5.2, github/gpt-4o, github_copilot/gpt-4, etc.
    'backend': 'copilot',       # Backend: 'copilot' or 'litellm'
    'max_suggestions': 20,      # Max features to suggest
    'domain': 'healthcare',     # Domain context
    'validate_features': True,  # Validate generated code
    'temperature': 0.3,         # Generation temperature
    'api_key': None,            # API key (for litellm backend)
    'api_base': None,           # Custom API base URL (for litellm)
}

engineer = AutoFeatureEngineer(
    engines=['llm'],
    llm_config=llm_config
)
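The api_key and api_base fields apply to the litellm backend; for example, they can point requests at a self-hosted OpenAI-compatible gateway (the URL below is a hypothetical placeholder):

import os

llm_config = {
    'model': 'gpt-4o',
    'backend': 'litellm',
    'api_key': os.environ.get('OPENAI_API_KEY'),     # key from the environment
    'api_base': 'https://llm-proxy.example.com/v1',  # hypothetical gateway URL
}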
Listing Available Models¶
Use FeatCopilot's utility helpers to list the models available through LiteLLM:
from featcopilot.utils import list_models, get_model_info, get_model_names

# List all available models
models = list_models()
for m in models:
    model_id = m.get('id') or m.get('name')
    print(f"{model_id}: {m.get('description', '')}")

# List with verbose output
list_models(verbose=True)

# Get just the model names/identifiers
model_names = get_model_names()
print(model_names)  # ['gpt-4', 'claude-3-sonnet', ...]

# Filter by provider
openai_models = list_models(provider='OpenAI')
anthropic_models = list_models(provider='Anthropic')

# Get info about a specific model
info = get_model_info('gpt-4')
if info:
    print(info)

# Check if a model is valid
from featcopilot.utils import is_valid_model

if is_valid_model('gpt-4'):
    print("Model is available")
Supported Models¶
Common models supported through LiteLLM:
| Model | Provider | Description |
|---|---|---|
| gpt-4o | OpenAI | GPT-4o (recommended) |
| gpt-4-turbo | OpenAI | Faster GPT-4 variant |
| gpt-3.5-turbo | OpenAI | Fast and efficient |
| azure/gpt-4 | Azure OpenAI | Azure-hosted GPT-4 |
| github/gpt-4o | GitHub Models | GPT-4o via GitHub Marketplace |
| github/Llama-3.2-11B-Vision-Instruct | GitHub Models | Llama 3.2 via GitHub |
| github/Phi-4 | GitHub Models | Microsoft Phi-4 via GitHub |
| github_copilot/gpt-4 | GitHub Copilot | GPT-4 via Copilot Chat API |
| github_copilot/gpt-5.1-codex | GitHub Copilot | GPT-5.1 Codex via Copilot |
| claude-3-opus | Anthropic | Premium quality |
| claude-3-sonnet | Anthropic | Balanced performance |
| claude-3-haiku | Anthropic | Fast and efficient |
| gemini-pro | Google | Gemini Pro |
| ollama/llama2 | Ollama (local) | Local LLaMA 2 |
Note: See LiteLLM docs for the full list of supported models and providers.
Choosing a Model¶
from featcopilot.utils import get_default_model, get_model_names
# Use the default model (gpt-5.2)
llm_config = {'model': get_default_model()}
# Or choose from available models
available = get_model_names()
print(f"Available models: {available}")
# Pick based on your needs:
# - gpt-5.2: Best all-around choice (default)
# - gpt-5.1-codex: Best for code generation tasks
# - claude-sonnet-4: Alternative with strong reasoning
# - gpt-4.1 / claude-haiku-4.5: Faster, lower cost
# - claude-opus-4.5: Premium quality (slower)
Providing Context¶
Column Descriptions¶
Help the LLM understand your data:
column_descriptions = {
    'age': 'Customer age in years (18-100)',
    'income': 'Annual household income in USD',
    'tenure_months': 'Number of months as customer',
    'monthly_charges': 'Monthly subscription fee',
    'total_charges': 'Cumulative charges to date',
    'contract_type': 'Month-to-month, One year, or Two year',
    'support_tickets': 'Number of support tickets filed'
}
Task Description¶
Describe what you're trying to predict:
task_description = """
Predict customer churn for a telecommunications company.
Churn is defined as canceling service within the next 30 days.
Key business goals: reduce churn rate, identify at-risk customers early.
"""
Domain Context¶
Specify the domain for relevant features:
# Supported domains
domains = [
    'healthcare',     # Medical features, risk scores
    'finance',        # Financial ratios, risk metrics
    'retail',         # RFM, customer lifetime value
    'telecom',        # Usage patterns, churn indicators
    'manufacturing',  # Quality metrics, efficiency
]

engineer = AutoFeatureEngineer(
    engines=['llm'],
    llm_config={'domain': 'finance'}
)
Feature Explanations¶
Get human-readable explanations:
# After fit_transform
explanations = engineer.explain_features()

for feature, explanation in explanations.items():
    print(f"📊 {feature}")
    print(f"   {explanation}\n")
Example Output:
📊 age_bmi_ratio
   Ratio of age to BMI, may indicate metabolic age vs chronological age

📊 glucose_hba1c_interaction
   Interaction between fasting glucose and HbA1c captures glucose control

📊 cholesterol_risk_score
   Composite score from total, HDL, and LDL cholesterol levels
Generated Code¶
Access the Python code for features:
feature_code = engineer.get_feature_code()

for name, code in feature_code.items():
    print(f"# {name}")
    print(code)
    print()
Example Output:
# age_bmi_ratio
result = df['age'] / (df['bmi'] + 1e-8)

# glucose_hba1c_interaction
result = df['glucose'] * df['hba1c']

# cholesterol_risk_score
result = (df['cholesterol_total'] - df['cholesterol_hdl']) / (df['cholesterol_ldl'] + 1)
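Because get_feature_code returns plain Python snippets, one convenient workflow (a convention, not a built-in FeatCopilot feature) is to write them to a module so they can be code-reviewed and versioned:

# Persist the generated snippets for review and version control.
with open('generated_features.py', 'w') as f:
    for name, code in engineer.get_feature_code().items():
        f.write(f"# {name}\n{code}\n\n")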
Custom Feature Generation¶
Request specific features:
# Generate features for a specific focus area
custom_features = engineer.generate_custom_features(
    prompt="Create risk stratification features for cardiac patients",
    n_features=5
)

for feature in custom_features:
    print(f"Name: {feature['name']}")
    print(f"Code: {feature['code']}")
    print(f"Explanation: {feature['explanation']}\n")
FeatureCodeGenerator¶
Generate features from natural language:
from featcopilot.llm import FeatureCodeGenerator
generator = FeatureCodeGenerator(model='gpt-5.2')

# Generate single feature
feature = generator.generate(
    description="Calculate BMI from height in meters and weight in kg",
    columns={'height_m': 'float', 'weight_kg': 'float'}
)

print(feature.code)
# result = df['weight_kg'] / (df['height_m'] ** 2)

# Generate domain-specific features
healthcare_features = generator.generate_domain_features(
    domain='healthcare',
    columns={'age': 'int', 'bmi': 'float', 'glucose': 'float'},
    n_features=5
)
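Generated snippets follow the convention shown above: they assign to a variable named result with the input DataFrame in scope as df. A minimal sketch of applying one snippet, assuming that convention (treat exec of LLM output with the same care as any untrusted code):

import pandas as pd

# Hypothetical application of a generated snippet to a DataFrame.
df = pd.DataFrame({'height_m': [1.75, 1.62], 'weight_kg': [70.0, 55.5]})
namespace = {'df': df}
exec(feature.code, namespace)   # snippet is expected to assign `result`
df['bmi'] = namespace['result']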
FeatureExplainer¶
Generate reports:
from featcopilot.llm import FeatureExplainer
explainer = FeatureExplainer(model='gpt-5.2')
# Generate comprehensive report
report = explainer.generate_feature_report(
    features=engineer._feature_set,
    X=X_transformed,
    column_descriptions=column_descriptions,
    task_description=task_description
)

# Save as markdown
with open('feature_report.md', 'w') as f:
    f.write(report)
Best Practices¶
1. Provide Rich Context¶
# ❌ Minimal context
engineer.fit_transform(X, y)
# ✅ Rich context
engineer.fit_transform(
    X, y,
    column_descriptions={...},
    task_description="...",
)
2. Validate Generated Features¶
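Keep validate_features enabled (as in the configuration examples above) so generated code is checked before features are accepted:

llm_config = {'validate_features': True}  # validate generated code before use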
3. Review Generated Code¶
# Always review LLM-generated code before production
for name, code in engineer.get_feature_code().items():
    print(f"Review: {name}")
    print(code)
4. Iterate and Refine¶
# Generate more features in specific areas
more_features = engineer.generate_custom_features(
    prompt="Focus on interaction effects between age and metabolic markers"
)
Fallback Behavior¶
When the LLM backend is unavailable (for example, a missing API key or no network access), the LLM engine is skipped and any other configured engines, such as tabular, continue to run.