Skewness And Kurtosis
In Exploratory Data Analysis (EDA), two measures describe the shape of a data distribution: skewness, which quantifies how asymmetric the data is around its mean, and kurtosis, which quantifies how heavy the tails are compared with a normal distribution.
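For reference, one common way to write the two measures for a sample $x_1, \dots, x_n$ with mean $\bar{x}$ uses central moments. The symbols $m_k$, $g_1$, and $g_2$ are notation chosen here for illustration; libraries often apply small-sample corrections on top of these population forms.

$$
m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
g_2 = \frac{m_4}{m_2^{2}} - 3
$$

Here $g_1$ is the skewness (0 for a symmetric distribution) and $g_2$ is the excess kurtosis (0 for a normal distribution).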
Example:
Suppose we have a dataset of house prices in a city. The distribution of prices might be:
| Price | Frequency |
| --- | --- |
| 200K | 10 |
| 300K | 20 |
| 400K | 30 |
| 500K | 40 |
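In this example, the prices are skewed to the left (negative skewness): most houses sit at the higher prices (e.g., $500K), and the tail extends toward the lower prices. A right-skewed distribution (positive skewness) would show the reverse, with most houses at the lower prices and a few extreme values on the higher side.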
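The direction of the skew in the table can be checked directly from the frequencies. The following is a minimal sketch using only the standard library; the helper name `weighted_skewness` is made up for this illustration, and it computes the population skewness (third central moment divided by the variance raised to the power 1.5) without any small-sample correction.

```python
# Frequency table from the example (prices in thousands of dollars).
prices = [200, 300, 400, 500]
freqs = [10, 20, 30, 40]

def weighted_skewness(values, weights):
    """Population skewness m3 / m2**1.5 for a frequency table (hypothetical helper)."""
    n = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / n
    # Second and third central moments, weighted by frequency.
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / n
    m3 = sum(w * (v - mean) ** 3 for v, w in zip(values, weights)) / n
    return m3 / m2 ** 1.5

table_skew = weighted_skewness(prices, freqs)
print(f"Skewness: {table_skew:.2f}")  # the sign gives the direction of the tail
```

A negative result means the tail points toward the lower prices; a positive result means it points toward the higher prices.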
Example:
Suppose we have a dataset of incomes. The distribution might be leptokurtic (positive excess kurtosis): most incomes cluster tightly around a typical value, while a small number of extreme earners make the tails heavier than those of a normal distribution.
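To make the idea concrete, here is a small sketch that compares two made-up income samples. Both lists are hypothetical values chosen for illustration, and `excess_kurtosis` is a helper defined here that uses the population formula (fourth central moment over squared variance, minus 3), not a library function.

```python
def excess_kurtosis(values):
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical incomes (in thousands): tightly clustered, with two extreme values.
clustered_with_outliers = [30, 48, 49, 50, 50, 50, 50, 51, 52, 70]
# Hypothetical incomes spread evenly, with no pronounced peak or heavy tails.
evenly_spread = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

heavy = excess_kurtosis(clustered_with_outliers)
flat = excess_kurtosis(evenly_spread)
print(f"Clustered with outliers: {heavy:.2f}")  # positive: leptokurtic
print(f"Evenly spread: {flat:.2f}")             # negative: platykurtic
```

The clustered sample with outliers yields a positive value and the evenly spread sample a negative one, which matches the usual leptokurtic and platykurtic labels.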
To calculate skewness and kurtosis in Python, you can use pandas:
```python
import pandas as pd

# Load your dataset into a pandas DataFrame
df = pd.read_csv('your_data.csv')

# Calculate skewness (pandas returns a bias-corrected sample estimate)
skewness = df['column_name'].skew()
print(f'Skewness: {skewness:.2f}')

# Calculate kurtosis (pandas returns excess kurtosis: 0 for a normal distribution)
kurtosis = df['column_name'].kurtosis()
print(f'Kurtosis: {kurtosis:.2f}')
```
By understanding and visualizing Skewness and Kurtosis, you can better interpret the characteristics of your data distribution. This can help you: