Data Smoothing
================
Data smoothing is a preprocessing technique that reduces noise and short-term variability in a dataset. It applies a formula or algorithm, such as a moving average, to even out irregular fluctuations so that the underlying trend is easier to see.
Suppose we have a dataset of daily sales figures for an e-commerce company:
| Date | Sales |
| --- | --- |
| 2022-01-01 | 100 |
| 2022-01-02 | 120 |
| 2022-01-03 | 110 |
| 2022-01-04 | 130 |
| 2022-01-05 | 140 |
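To smooth out the fluctuations, we can apply a simple moving average with a window size of 3 days. Each smoothed value is the mean of the current day and the two days before it, so the first two days have no smoothed value yet:

- 2022-01-03: (100 + 120 + 110) / 3 = 110
- 2022-01-04: (120 + 110 + 130) / 3 = 120
- 2022-01-05: (110 + 130 + 140) / 3 ≈ 126.67

| Date | Sales | Smoothed |
| --- | --- | --- |
| 2022-01-01 | 100 | n/a |
| 2022-01-02 | 120 | n/a |
| 2022-01-03 | 110 | 110 |
| 2022-01-04 | 130 | 120 |
| 2022-01-05 | 140 | 126.67 |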
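The moving-average arithmetic can be checked by hand with a few lines of plain Python. This is a minimal sketch that assumes a trailing window, where each value averages the current day and the two days before it. The helper name `trailing_moving_average` is illustrative and does not come from any library.

```python
def trailing_moving_average(values, window):
    """Return the trailing simple moving average of `values`.

    Positions that do not yet have a full window are reported as None.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough observations yet to fill the window.
            smoothed.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(round(sum(chunk) / window, 2))
    return smoothed


sales = [100, 120, 110, 130, 140]
result = trailing_moving_average(sales, 3)
print(result)  # [None, None, 110.0, 120.0, 126.67]
```

A trailing window only uses past and present values, which makes it suitable when smoothing data as it arrives. The cost is that the first `window - 1` positions have no result.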
Here's a Python example using the pandas library:

```python
import pandas as pd

# Create the sample dataset
data = {
    'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'Sales': [100, 120, 110, 130, 140]
}
df = pd.DataFrame(data)

# Apply a simple moving average with a window size of 3.
# The first two rows are NaN because a full 3-day window is not yet available.
window_size = 3
df['Smoothed'] = df['Sales'].rolling(window_size).mean()
print(df)
```
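By default, `rolling(window_size).mean()` leaves the first `window_size - 1` rows as NaN. The sketch below compares that default with two alternatives offered by pandas' `rolling`: `min_periods=1`, which averages whatever observations are available so far, and `center=True`, which aligns each average with the middle of its window. The variable names are illustrative, and which option is appropriate depends on how the smoothed series will be used.

```python
import pandas as pd

sales = pd.Series([100, 120, 110, 130, 140])

# Default trailing window: the first two values are NaN.
trailing = sales.rolling(3).mean()

# min_periods=1: average the observations available so far,
# so early rows use a shorter window instead of NaN.
partial = sales.rolling(3, min_periods=1).mean()

# center=True: each average is labelled at the middle of its window,
# so the first and last rows are NaN instead of the first two.
centered = sales.rolling(3, center=True).mean()

comparison = pd.DataFrame(
    {"trailing": trailing, "partial": partial, "centered": centered}
).round(2)
print(comparison)
```

A centered window follows the data without lag but needs future values, so it only suits analysis after the fact. With `min_periods=1` there are no gaps, but the earliest values are smoothed less than the rest.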