Configuration
| Option | Description | Default |
|---|---|---|
| Method | Selection algorithm (see table below) | — |
| Target Column | The column being predicted (required for supervised methods) | — |
| K | Number of top features to keep (supervised methods) | 10 |
| Threshold | Variance or correlation cutoff (unsupervised methods) | 0.0 (variance) / 0.9 (correlation) |
| Estimator (RFE) | Model that RFE uses internally to rank features; one of Random Forest, Logistic Regression, Linear Regression | Random Forest |
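
The options above can be pictured as a small configuration object. The sketch below is only illustrative: the class name, field names, and method identifiers are hypothetical, and pairing the `0.0 / 0.9` defaults with variance and correlation respectively is an assumption based on the order in the Threshold row.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers; the real tool may name its methods differently.
SUPERVISED = {"chi2", "f_score", "mutual_info", "rfe", "lasso"}
# Assumed mapping of the documented "0.0 / 0.9" threshold defaults.
DEFAULT_THRESHOLD = {"variance": 0.0, "correlation": 0.9}


@dataclass
class FeatureSelectionConfig:
    method: str                           # no default: must be chosen
    target_column: Optional[str] = None   # required for supervised methods
    k: int = 10                           # top features to keep
    threshold: Optional[float] = None     # filled in per method if omitted
    estimator: str = "random_forest"      # only meaningful for RFE

    def __post_init__(self) -> None:
        if self.method not in SUPERVISED | set(DEFAULT_THRESHOLD):
            raise ValueError(f"unknown method: {self.method!r}")
        if self.method in SUPERVISED and self.target_column is None:
            raise ValueError(f"method {self.method!r} requires a target column")
        if self.threshold is None and self.method in DEFAULT_THRESHOLD:
            self.threshold = DEFAULT_THRESHOLD[self.method]


cfg = FeatureSelectionConfig(method="correlation")
```

Validating at construction time means a missing target column surfaces before any data is loaded.
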

Methods

| Method | Type | How it selects features |
|---|---|---|
| Variance Threshold | Unsupervised | Drops features whose variance is below the threshold |
| Correlation Threshold | Unsupervised | Drops one of each pair of features correlated above the threshold |
| Select K Best (Chi2) | Supervised | Ranks features by chi-squared statistic (requires non-negative feature values) |
| Select K Best (F-score) | Supervised | Ranks features by ANOVA F-score |
| Select K Best (Mutual Info) | Supervised | Ranks features by mutual information with the target |
| RFE | Supervised | Recursively removes the least important features using an estimator |
| Lasso (L1) | Supervised | Drops features whose Lasso coefficient is zero |
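
The two unsupervised rows can be sketched with pandas and NumPy alone. This is a minimal illustration, not the tool's implementation: the function names are hypothetical, the variance filter assumes columns exactly at the threshold are dropped too (so the default of 0.0 removes constant columns), and the correlation filter assumes absolute Pearson correlation and keeps the first column of each correlated pair.

```python
import numpy as np
import pandas as pd


def variance_threshold(df: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Keep columns whose population variance is strictly above the threshold."""
    variances = df.var(ddof=0)
    return df.loc[:, variances > threshold]


def correlation_threshold(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later column of every pair whose |correlation| exceeds the threshold."""
    corr = df.corr().abs()
    # Mask everything except the strict upper triangle so each pair is seen once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],      # perfectly correlated with "a"
    "c": [1.0, 0.0, 1.0, 0.0],      # weakly correlated with "a"
    "const": [5.0, 5.0, 5.0, 5.0],  # zero variance
})

kept = variance_threshold(df)
decorrelated = correlation_threshold(kept)
```

Which member of a correlated pair survives depends on column order here; a real implementation might choose differently.
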

Input / Output

| Direction | Type |
|---|---|
| Input | DataFrame |
| Output | DataFrame (selected features only) |
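
For the supervised methods, the DataFrame-in, DataFrame-out contract might look like the following scikit-learn sketch. Whether the tool is actually built on scikit-learn is not stated, so treat this as an assumption. The helper `select_features`, the synthetic data, and the `alpha` value are all hypothetical, and the sketch assumes the target column is excluded from the output because the table says "selected features only".

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import Lasso


def select_features(df: pd.DataFrame, target: str, selector) -> pd.DataFrame:
    """Fit a scikit-learn selector and return only the selected feature columns."""
    X = df.drop(columns=[target])
    y = df[target]
    selector.fit(X, y)
    # get_support() is a boolean mask over the columns of X.
    return X.loc[:, selector.get_support()]


# Synthetic data: one informative column and two pure-noise columns.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
df = pd.DataFrame({
    "signal": signal,
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
    "label": (signal > 0).astype(int),
})

# Select K Best (F-score) with K = 1.
k_best = select_features(df, "label", SelectKBest(f_classif, k=1))

# RFE with a Random Forest estimator, keeping one feature.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
rfe_out = select_features(df, "label", RFE(forest, n_features_to_select=1))

# Lasso (L1): the 0/1 label is treated as a regression target here,
# and features whose coefficient shrinks to zero are dropped.
lasso_out = select_features(df, "label", SelectFromModel(Lasso(alpha=0.1)))
```

Swapping `f_classif` for `chi2` or `mutual_info_classif` follows the same pattern, keeping in mind that `chi2` rejects negative feature values.
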