Data Wrangling
Data wrangling, also known as data munging, is the process of transforming raw data from various sources into a clean, consistent, and usable format. It typically involves steps such as cleaning, type conversion, and integration, so that the data is accurate, complete, and in a suitable form for analysis.
For example, consider a small table of customer records:
| Customer ID | Name | Email | Phone |
| --- | --- | --- | --- |
| 1 | John Smith | john.smith@example.com | 123-4567 |
| 2 | Jane Doe | jane.doe@example.com | 234-5678 |
This data was collected from various sources and contains inconsistencies; for example, the phone numbers include dashes. Removing the dashes gives:
| Customer ID | Name | Email | Phone |
| --- | --- | --- | --- |
| 1 | John Smith | john.smith@example.com | 1234567 |
| 2 | Jane Doe | jane.doe@example.com | 2345678 |
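The cleaning step can be sketched in Python using only the standard library. The `customers` list and the `clean_phone` helper are illustrative names, not part of the original example, and the sketch assumes that every non-digit character in a phone number is formatting noise.

```python
import re

# Raw customer rows, mirroring the first table (field names are illustrative).
customers = [
    {"customer_id": 1, "name": "John Smith", "email": "john.smith@example.com", "phone": "123-4567"},
    {"customer_id": 2, "name": "Jane Doe", "email": "jane.doe@example.com", "phone": "234-5678"},
]


def clean_phone(phone: str) -> str:
    """Drop every non-digit character (dashes, spaces, parentheses)."""
    return re.sub(r"\D", "", phone)


# Build new rows rather than mutating the raw data in place.
cleaned = [{**row, "phone": clean_phone(row["phone"])} for row in customers]

for row in cleaned:
    print(row["customer_id"], row["phone"])
```

Keeping the raw rows untouched makes it easy to compare the data before and after cleaning.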
Next, the phone column is converted to a numeric type:
| Customer ID | Name | Email | Phone (numeric) |
| --- | --- | --- | --- |
| 1 | John Smith | john.smith@example.com | 1234567 |
| 2 | Jane Doe | jane.doe@example.com | 2345678 |
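A minimal sketch of the type conversion, assuming the phone values have already been reduced to digit strings. `to_numeric_phone` is a hypothetical helper; it rejects anything that is not purely digits rather than guessing. An integer cannot keep leading zeros, so this approach only suits data like the example above.

```python
def to_numeric_phone(phone: str) -> int:
    """Convert a digits-only phone string to an int, failing loudly otherwise."""
    if not (phone.isascii() and phone.isdigit()):
        raise ValueError(f"not a digits-only phone number: {phone!r}")
    return int(phone)


phones = ["1234567", "2345678"]  # digit strings, as in the cleaned table
numeric = [to_numeric_phone(p) for p in phones]
print(numeric)
```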
Now suppose we also have a table of customer orders:
| Customer ID | Order Date | Product Name |
| --- | --- | --- |
| 1 | 2022-01-01 | iPhone |
| 1 | 2022-02-01 | Laptop |
| 2 | 2022-03-01 | Tablet |
After joining the customer and order tables on Customer ID, we have a clean and integrated dataset:
| Customer ID | Name | Email | Phone (numeric) | Order Date | Product Name |
| --- | --- | --- | --- | --- | --- |
| 1 | John Smith | john.smith@example.com | 1234567 | 2022-01-01 | iPhone |
| 1 | John Smith | john.smith@example.com | 1234567 | 2022-02-01 | Laptop |
| 2 | Jane Doe | jane.doe@example.com | 2345678 | 2022-03-01 | Tablet |
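The integration step amounts to joining the two tables on Customer ID. Below is a rough standard-library sketch of an inner join; the variable names and the `join_on_customer_id` function are assumptions made for illustration, and a dataframe library would normally handle this with a built-in merge.

```python
customers = [
    {"customer_id": 1, "name": "John Smith", "email": "john.smith@example.com", "phone": 1234567},
    {"customer_id": 2, "name": "Jane Doe", "email": "jane.doe@example.com", "phone": 2345678},
]
orders = [
    {"customer_id": 1, "order_date": "2022-01-01", "product_name": "iPhone"},
    {"customer_id": 1, "order_date": "2022-02-01", "product_name": "Laptop"},
    {"customer_id": 2, "order_date": "2022-03-01", "product_name": "Tablet"},
]


def join_on_customer_id(customers, orders):
    """Inner join: one output row per order that has a matching customer."""
    by_id = {c["customer_id"]: c for c in customers}  # index for fast lookup
    return [
        {**by_id[o["customer_id"]], **o}
        for o in orders
        if o["customer_id"] in by_id
    ]


merged = join_on_customer_id(customers, orders)
for row in merged:
    print(row["customer_id"], row["name"], row["order_date"], row["product_name"])
```

Orders whose customer ID has no match are dropped here; whether to keep them instead (an outer join) depends on the analysis.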
The data is now ready for analysis, and we can apply statistical and machine learning methods to gain insights from it.