Lag Features
In Exploratory Data Analysis (EDA), lag features are a feature engineering technique used to extract information from time series data. A lag feature is a shifted copy of the original variable: the value at time t is replaced by the value observed n time steps earlier, at time t-n.
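To make the idea of a shifted variable concrete, here is a minimal sketch using pandas' `shift`. The series values and the column names `lag_1` and `lag_2` are hypothetical, chosen only so the shift is easy to see.

```python
import pandas as pd

# Hypothetical values, chosen only to make the shift easy to see.
s = pd.Series([10, 20, 30, 40, 50], name="value")

lagged = pd.DataFrame({
    "value": s,
    "lag_1": s.shift(1),  # value one step earlier
    "lag_2": s.shift(2),  # value two steps earlier
})
print(lagged)
```

Each lagged column starts with missing values, because the earliest rows have no previous observation to copy from.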
### Why Lag Features?
Lag features can help reveal patterns and relationships in time series data that may not be immediately apparent when looking at the raw data. They can also be used to:
There are several types of lag features, including the basic n-step lag, which uses the value of the variable at time t-n.

Suppose we have a dataset with daily sales data for an e-commerce company. We want to analyze the effect of last week's sales on this week's sales.
```python
import pandas as pd

# Sample data ("..." marks elided rows; both lists must have the same length)
data = {'Date': ['2022-01-01', '2022-01-02', ..., '2022-12-31'],
        'Sales': [100, 120, 110, ..., 500]}
df = pd.DataFrame(data)

# Create lag feature (sales from 7 days earlier).
# Assumes one row per day, already in date order. Shift the whole column:
# grouping by 'Date' would put each row in its own group and yield only NaN.
df['Lag_Sales'] = df['Sales'].shift(7)

# Create lead feature (sales from 7 days later)
df['Lead_Sales'] = df['Sales'].shift(-7)
```
`Lag_Sales` holds the sales from the same day of the previous week (7 days earlier), while `Lead_Sales` holds the sales from the same day of the following week (7 days later).

### Advice
When working with lag features:
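One point worth checking is the missing values that shifting introduces: a lag of n steps leaves the first n rows without a value. The sketch below uses made-up sales figures and a hypothetical 14-day range to show how many rows are affected and one possible way to handle them (dropping the incomplete rows). Whether dropping or imputing is appropriate depends on the analysis.

```python
import pandas as pd

# Hypothetical daily series: 14 days of made-up sales figures.
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-01", periods=14, freq="D"),
    "Sales": list(range(100, 114)),
})

# Sort by time before shifting so "7 rows back" means "7 days back".
df = df.sort_values("Date").reset_index(drop=True)
df["Lag_Sales"] = df["Sales"].shift(7)

# The first 7 rows have no earlier observation, so the lag is NaN there.
n_missing = int(df["Lag_Sales"].isna().sum())

# One option: drop the incomplete rows before further analysis.
clean = df.dropna(subset=["Lag_Sales"])
print(n_missing, len(clean))
```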