Data Transformation
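Data transformation is the process of converting raw data into a format suitable for analysis. It involves modifying or cleaning data to make it more meaningful and easier to interpret, for example by filling in missing values or encoding text categories as numbers. The goal is to make the data consistent, accurate, and relevant for further processing. The student dataset below shows two typical problems: a missing exam score (NaN) and a categorical City column.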
| Student ID | Name | Exam Score | City |
| --- | --- | --- | --- |
| 1 | John | 85.0 | New York |
| 2 | Emma | NaN | Los Angeles |
| 3 | Max | 90.5 | Chicago |
| ... | ... | ... | ... |
```python
import pandas as pd

# Load the dataset
df = pd.read_csv('student_scores.csv')

# Print the first few rows of the dataset
print(df.head())

# Impute missing values with the mean score (data transformation)
df['Exam Score'] = df['Exam Score'].fillna(df['Exam Score'].mean())

# One-hot encode the categorical variable 'City' (data transformation);
# dtype=int yields 0/1 columns instead of the boolean default in pandas 2.x
df = pd.get_dummies(df, columns=['City'], dtype=int)

# Print the first few rows of the transformed dataset
print(df.head())
```
In this example, we use pandas to load and manipulate a dataset. Missing values in the 'Exam Score' column are replaced with the mean score (pandas ignores NaN when computing the mean), and the categorical 'City' column is one-hot encoded, meaning it is replaced by one indicator column per distinct city. The transformed dataset is then printed.
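Because the snippet above depends on a `student_scores.csv` file, here is a self-contained sketch of the same two steps. It assumes the data is just the three rows shown in the table, built in memory rather than loaded from disk, so the effect of each transformation can be checked directly.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the three table rows (stands in for student_scores.csv)
df = pd.DataFrame({
    "Student ID": [1, 2, 3],
    "Name": ["John", "Emma", "Max"],
    "Exam Score": [85.0, np.nan, 90.5],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Mean imputation: Series.mean() skips NaN by default,
# so the fill value is (85.0 + 90.5) / 2 = 87.75
mean_score = df["Exam Score"].mean()
df["Exam Score"] = df["Exam Score"].fillna(mean_score)

# One-hot encoding: 'City' is replaced by one 0/1 column per distinct city,
# appended after the remaining columns in sorted category order
df = pd.get_dummies(df, columns=["City"], dtype=int)

print(df)
```

Emma's missing score becomes 87.75, and `City` turns into `City_Chicago`, `City_Los Angeles`, and `City_New York`, each containing a single 1 per row.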
Data transformation is an essential step in data preprocessing because it helps ensure that data is clean, consistent, and suitable for analysis.
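Making data consistent often also means normalizing how values are written before any imputation or encoding happens. The sketch below assumes a hypothetical raw export (not part of the original dataset) in which scores arrive as text and city names vary in spacing and capitalization. It uses `pd.to_numeric` with `errors="coerce"` and pandas string methods, which are one common way to do this, not the only one.

```python
import pandas as pd

# Hypothetical raw data with inconsistent formatting (illustrative only)
raw = pd.DataFrame({
    "Exam Score": ["85", "n/a", " 90.5 "],
    "City": ["new york", "Los Angeles ", "CHICAGO"],
})

clean = raw.copy()

# Convert scores to floats; anything that cannot be parsed (like "n/a") becomes NaN,
# which a later step such as mean imputation can then handle
clean["Exam Score"] = pd.to_numeric(clean["Exam Score"].str.strip(), errors="coerce")

# Normalize city labels so the same city is always spelled the same way,
# which keeps one-hot encoding from producing duplicate columns like "new york" vs "New York"
clean["City"] = clean["City"].str.strip().str.title()

print(clean)
```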