Groupby Aggregation
GroupBy aggregation splits a dataset into groups that share a key and reduces each group to a summary value, such as a mean or a count. It is a common step in Exploratory Data Analysis (EDA).
Common aggregation functions include:

- mean(): calculate the mean value
- sum(): sum up values
- count(): count the non-null values in each group
- std(): calculate the standard deviation

Consider an orders dataset with the columns order_id, customer_id, product_name, and order_total.

```python
import pandas as pd

# Sample data
data = {
    'order_id': [1, 2, 3, 4, 5],
    'customer_id': [101, 102, 103, 104, 105],
    'product_name': ['Product A', 'Product B', 'Product C', 'Product A', 'Product B'],
    'order_total': [100.0, 200.0, 300.0, 150.0, 250.0]
}
df = pd.DataFrame(data)

# GroupBy aggregation: mean order_total for each product
grouped_df = df.groupby('product_name')['order_total'].mean()
print(grouped_df)
```
Output:
| product_name | order_total |
| --- | --- |
| Product A | 125.0 |
| Product B | 225.0 |
| Product C | 300.0 |
In this example, we grouped the data by product_name and calculated the mean of order_total for each group.
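The other aggregation functions listed earlier can be applied the same way, or all at once. The following is a minimal sketch that reuses the same sample data and passes a list of function names to `agg`. The variable name `summary` is chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'order_id': [1, 2, 3, 4, 5],
    'customer_id': [101, 102, 103, 104, 105],
    'product_name': ['Product A', 'Product B', 'Product C', 'Product A', 'Product B'],
    'order_total': [100.0, 200.0, 300.0, 150.0, 250.0],
})

# Compute mean, sum, count, and standard deviation of order_total
# for each product in a single call; the result has one column per function
summary = df.groupby('product_name')['order_total'].agg(['mean', 'sum', 'count', 'std'])
print(summary)
```

Product C has only one order, so its std comes out as NaN: pandas computes the sample standard deviation by default, which is undefined for a single value.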