Feature Engineering
Feature engineering is the process of selecting, transforming, and extracting relevant features from raw data to improve the accuracy of machine learning models. It is an essential step in the machine learning pipeline and can significantly affect model performance.
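Consider a dataset of house prices with the following features:

price: The original price of the house
bedrooms: Number of bedrooms in the house
sqft: Square footage of the house
location: Zip code or neighborhood
type: Type of property (single-family home, condo, etc.)

Feature engineering on this dataset might involve the following steps:

Select price and sqft as relevant features.
Apply a logarithmic transformation to price to reduce the effect of extreme values.
Create a new feature, bedrooms_per_sqft, by dividing bedrooms by sqft.
Generate a new feature, location_score, based on the zip code or neighborhood.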
```python
import numpy as np
import pandas as pd

# Load data
df = pd.read_csv('house_prices.csv')

# Select relevant features
selected_features = ['price', 'sqft']

# Apply a logarithmic transformation to price (assumes all prices are positive)
df['log_price'] = np.log(df['price'])

# Create new feature: bedrooms_per_sqft
df['bedrooms_per_sqft'] = df['bedrooms'] / df['sqft']

# Encode location as one-hot indicator columns (a simple stand-in for a location score)
location_dummies = pd.get_dummies(df['location'], prefix='location')
df = pd.concat([df, location_dummies], axis=1)

# Drop original location feature
df = df.drop(columns='location')
```
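The steps above mention a single location_score column. One possible way to build such a score, sketched here as an assumption and not as the method the example intends, is to use the mean log price of the houses in each location. The toy data below is invented purely for illustration so that the snippet runs without house_prices.csv.

```python
import numpy as np
import pandas as pd

# Toy data, invented purely for illustration (not from a real dataset).
df = pd.DataFrame({
    "price": [200_000, 350_000, 500_000, 275_000],
    "bedrooms": [2, 3, 4, 3],
    "sqft": [900, 1500, 2200, 1200],
    "location": ["A", "B", "B", "A"],
})

# Same transformations as in the steps described above.
df["log_price"] = np.log(df["price"])
df["bedrooms_per_sqft"] = df["bedrooms"] / df["sqft"]

# Hypothetical location_score: mean log price of the houses in each location.
# transform("mean") returns one value per row, aligned with the original index.
df["location_score"] = df.groupby("location")["log_price"].transform("mean")

print(df[["location", "location_score", "bedrooms_per_sqft"]])
```

If price is also the value the model is meant to predict, a score like this should be computed from training rows only; otherwise information about the target leaks into the features.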