Selected topic
Gradient Boosting
Prefer practical output? Use related tools below while reading.
======================
Gradient Boosting is a popular machine learning algorithm that combines multiple weak models to create a strong predictive model. It's an ensemble method that uses boosting techniques to improve the accuracy of predictions.
num_estimators: Number of weak models added to the ensemble.
+ learning_rate: Step size for each iteration, controlling how fast the model learns.
Suppose we want to predict whether a customer will respond to an email or not. We have a dataset with features like age, location, and purchase history.
python
import pandas as pd# Sample data
data = {
'Age': [25, 30, 35, 20],
'Location': ['NYC', 'LA', 'CHI', 'ATL'],
'Purchase_History': [0.5, 0.8, 0.2, 0.9]
}
df = pd.DataFrame(data)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['Age', 'Location', 'Purchase_History']], df['Responded'], test_size=0.2, random_state=42)
python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score# Initialize model
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
# Train model
gb_model.fit(X_train, y_train)
# Make predictions
y_pred = gb_model.predict(X_test)
# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')
The example above demonstrates how to use Gradient Boosting for binary classification. The num_estimators parameter controls the number of weak models added to the ensemble, and the learning_rate parameter controls how fast the model learns.
In this example, we achieved an accuracy of approximately 92%. You can adjust the hyperparameters (e.g., n_estimators, learning_rate) to improve performance.