Data Binning
Data binning, also known as data discretization or quantization, is a preprocessing technique that transforms continuous or numerical variables into categorical or discrete ones by dividing their range into distinct intervals, or bins. The goal of binning is to reduce the number of distinct values a variable can take, which smooths out minor noise and makes the data easier to interpret and analyze.

For example, consider a dataset of customer ages:
| Customer ID | Age |
| --- | --- |
| 1 | 25 |
| 2 | 42 |
| 3 | 28 |
| 4 | 50 |
| ... | ... |
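We decide to group the ages into four bins with edges at 0, 30, 40, 50, and 60. These bins are not equal-width: the first spans 30 years and the others span 10 years each. Each bin includes its lower edge, so a 50-year-old falls into Bin 4: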
| Customer ID | Age (binned) |
| --- | --- |
| 1 | Bin 1 (young adults) |
| 2 | Bin 3 (middle-aged) |
| 3 | Bin 1 (young adults) |
| 4 | Bin 4 (seniors) |
| ... | ... |
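To make the mapping from ages to bins concrete, the lookup can be sketched in plain Python. This is a minimal illustration rather than a definitive implementation: the edges 0, 30, 40, 50, and 60 and the labels are taken from the example's code, the helper name `assign_bin` is hypothetical, and the sketch assumes each bin includes its lower edge and excludes its upper edge.

```python
from bisect import bisect_right

# Bin edges and labels from the example; assign_bin is a hypothetical helper.
EDGES = [0, 30, 40, 50, 60]
LABELS = ['young adults', 'adults', 'middle-aged', 'seniors']


def assign_bin(age):
    """Return the label of the bin [lower, upper) that contains age."""
    if not EDGES[0] <= age < EDGES[-1]:
        raise ValueError(f"age {age} is outside the binned range")
    # bisect_right counts the edges that are <= age; subtracting 1 gives
    # the zero-based index of the bin whose lower edge is <= age.
    return LABELS[bisect_right(EDGES, age) - 1]


ages = [25, 42, 28, 50]
binned = [assign_bin(a) for a in ages]
print(binned)
```

Because the lower edge is inclusive, an age that sits exactly on a boundary (such as 50) is placed in the higher of the two neighboring bins.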
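The following Python code uses the pandas library to bin the age values:

```python
import pandas as pd

# Sample dataset
data = {
    'Customer ID': [1, 2, 3, 4],
    'Age': [25, 42, 28, 50]
}
df = pd.DataFrame(data)

# Bin the ages into four intervals using custom (not equal-width) edges
bins = [0, 30, 40, 50, 60]
labels = ['young adults', 'adults', 'middle-aged', 'seniors']

# right=False makes each bin include its lower edge, e.g. [50, 60),
# so an age of exactly 50 is labeled 'seniors' rather than 'middle-aged'
df['Age (binned)'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
print(df)
```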
Output:

```
   Customer ID  Age  Age (binned)
0            1   25  young adults
1            2   42   middle-aged
2            3   28  young adults
3            4   50       seniors
```
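The bin edges in this example are hand-picked. For comparison, here is a sketch of equal-width binning: when `pd.cut` receives an integer instead of a list of edges, it divides the observed range of the data into that many bins of equal width. The choice of 5 bins and the variable names are assumptions made for illustration.

```python
import pandas as pd

ages = pd.Series([25, 42, 28, 50], name='Age')

# Passing an integer asks pd.cut for that many equal-width bins spanning
# the observed range; retbins=True also returns the computed edges.
binned, edges = pd.cut(ages, bins=5, retbins=True)

# Nominal width of each bin: (max - min) / number of bins
width = (ages.max() - ages.min()) / 5  # (50 - 25) / 5 = 5.0

print(width)
print(edges)   # six edges delimit five bins
print(binned)
```

pandas widens the range slightly at the lower end so that the minimum value is included in the first bin, so the first printed edge sits just below 25. Equal-width bins are simple to compute, but when the data is unevenly distributed some bins may end up empty.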