# Exploratory data analysis (EDA) component

> Use the EDA component to profile, visualize, and summarize your dataset with charts and statistics before building a machine learning model.

The **EDA** component runs statistical analyses on your DataFrame and returns charts and summaries you can explore in the canvas. Use it to understand your data's structure, spot problems such as missing values, outliers, and highly correlated features, and decide what preprocessing to apply.

## Configuration

| Option            | Description                                                                      |
| ----------------- | -------------------------------------------------------------------------------- |
| **Analysis Type** | The analysis to run (see below).                                                 |
| **Features**      | Columns to include in the analysis (used by Distribution and Outlier Detection). |

### Analysis types

| Type                   | What you get                                                                                                                                                                   |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Correlation Matrix** | Heatmap of Pearson correlations between all numerical features. Shows the top 10 most correlated pairs.                                                                        |
| **Missing Values**     | Bar chart of missing value counts and percentages per column. Flags completely missing columns.                                                                                |
| **Statistics**         | Descriptive statistics table: count, mean, std, min, quartiles, max, skewness, kurtosis for numerical columns; top value, frequency, and unique count for categorical columns. |
| **Distribution**       | Histogram with summary stats (mean, median, std, skewness) for each selected feature.                                                                                          |
| **Outlier Detection**  | Box-plot style outlier summary using the IQR rule: values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as outliers. Shows outlier count and percentage per feature. |
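The whole-table analyses correspond closely to standard pandas operations. The sketch below approximates the Correlation Matrix, Missing Values, and Statistics outputs so you can cross-check results in a notebook. It is not the component's implementation. The function names are hypothetical, and ranking pairs by *absolute* correlation is an assumption.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, k: int = 10) -> pd.Series:
    """Signed Pearson correlations for the k most correlated feature pairs.

    Assumption: pairs are ranked by absolute correlation.
    """
    corr = df.select_dtypes(include="number").corr(method="pearson")
    cols = corr.columns
    # Upper triangle without the diagonal, so each pair appears exactly once.
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.Series(
        corr.to_numpy()[i, j],
        index=pd.MultiIndex.from_arrays([cols[i], cols[j]], names=["feature_a", "feature_b"]),
        dtype=float,
    )
    # Constant columns produce NaN correlations; drop them before ranking.
    pairs = pairs.dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, flagging fully empty columns."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing_count": counts, "missing_pct": 100 * counts / len(df)})
    summary["completely_missing"] = summary["missing_count"] == len(df)
    return summary.sort_values("missing_count", ascending=False)


def descriptive_statistics(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Numeric and categorical summaries similar to the Statistics analysis."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(include=["object", "category"])
    num_stats = pd.DataFrame()
    if not numeric.columns.empty:
        # count, mean, std, min, 25%, 50%, 75%, max for each numeric column
        num_stats = numeric.describe().T
        num_stats["skewness"] = numeric.skew()
        # pandas reports excess kurtosis (a normal distribution scores about 0).
        num_stats["kurtosis"] = numeric.kurt()
    cat_stats = pd.DataFrame()
    if not categorical.columns.empty:
        # top value, its frequency, and the number of distinct values
        cat_stats = categorical.describe().T[["top", "freq", "unique"]]
    return num_stats, cat_stats
```

If the canvas ranks pairs differently (for example, by signed correlation), change the `key` passed to `sort_values`.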
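The two per-feature analyses, which use the **Features** option, can be approximated the same way. In this sketch (hypothetical function names), outlier percentages are computed over non-missing values, and `std` is the sample standard deviation that pandas uses by default. Both are assumptions about how the component reports its numbers. Drawing the histogram itself is left to your plotting library.

```python
import pandas as pd


def distribution_summary(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Mean, median, sample std, and skewness for each selected feature."""
    return df[features].agg(["mean", "median", "std", "skew"]).T


def iqr_outlier_summary(df: pd.DataFrame, features: list[str], factor: float = 1.5) -> pd.DataFrame:
    """Count values outside [Q1 - factor*IQR, Q3 + factor*IQR] for each feature."""
    rows = {}
    for col in features:
        values = df[col].dropna()
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - factor * iqr, q3 + factor * iqr
        is_outlier = (values < lower) | (values > upper)
        count = int(is_outlier.sum())
        rows[col] = {
            "lower_bound": lower,
            "upper_bound": upper,
            "outlier_count": count,
            # Assumption: percentage of non-missing values, not of all rows.
            "outlier_pct": 100 * count / len(values) if len(values) else 0.0,
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

The `factor` parameter is kept explicit so you can compare the component's 1.5 × IQR fences with the stricter 3 × IQR fences some analysts use for extreme outliers.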

## Input / Output

|        | Type                                                             |
| ------ | ---------------------------------------------------------------- |
| Input  | DataFrame                                                        |
| Output | Analytics data (displayed in the canvas — not passed downstream) |

<Tip>
  Run EDA right after loading your data to get a quick overview before deciding which preprocessing components to add.
</Tip>
