Data Distribution

What is Data Distribution?

Data distribution refers to the way the values in a dataset are spread out. It describes the shape, center, and dispersion of the data.

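Common measures of center and spread:

Mean: The arithmetic average, which is the sum of all values divided by the number of values.
Median: The middle value when the data is sorted in ascending or descending order. With an even number of values, it is the average of the two middle values.
Mode: The most frequently occurring value in the data. A dataset can have more than one mode.
Standard Deviation (SD): A measure of how far values typically lie from the mean. It is the square root of the variance, which averages the squared deviations from the mean.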
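The four measures can be computed with Python's standard `statistics` module. This is a minimal sketch on a small made-up dataset (the values are hypothetical, chosen only because they make the arithmetic easy to follow):

```python
import statistics

# Hypothetical sample values, used only to illustrate the four measures.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # even count: average of the two middle values
mode = statistics.mode(data)      # most frequent value
sd = statistics.pstdev(data)      # population SD: sqrt of the mean squared deviation

print(f"mean={mean}, median={median}, mode={mode}, sd={sd}")
```

Here the mean is 5, the median is 4.5 (the average of 4 and 5), the mode is 4, and the population SD is 2.0. `statistics.pstdev` divides by n; `statistics.stdev` divides by n - 1 (the sample SD) and would give about 2.14 for the same data.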

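Two broad shapes:

Symmetric: The values mirror each other on both sides of the center, so the mean and median are equal or nearly equal. The bell-shaped normal distribution is the best-known example, but not every symmetric distribution is bell-shaped.
Skewed: The data is not symmetric. Most values cluster on one side, and a longer tail stretches out on the other. In a right (positively) skewed distribution the tail extends toward higher values and typically pulls the mean above the median; a left (negatively) skewed distribution is the mirror image.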
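One way to tell the two shapes apart numerically is to compare the mean with the median and to compute a skewness coefficient. The sketch below uses the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments; this is one common measure, not the only one. Both datasets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    mu = statistics.fmean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical data: evenly spaced values vs. one large value forming a right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

for name, values in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name,
          "mean:", round(statistics.fmean(values), 2),
          "median:", statistics.median(values),
          "g1:", round(skewness(values), 3))
```

For the symmetric list the mean equals the median and g1 is 0. For the right-skewed list the single value of 10 pulls the mean above the median, and g1 comes out positive.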

Suppose we have a dataset of exam scores for 100 students:

| Student | Score |
| --- | --- |
| 1 | 80 |
| 2 | 70 |
| 3 | 90 |
| ... | ... |
| 100 | 60 |

Assuming the scores are symmetrically distributed, most of them cluster around the center, and the mean and median are close to each other.
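Whether exam scores like these are really symmetric can be checked by binning them into a text histogram and comparing the mean with the median. This is a hypothetical sketch: it uses only the four scores visible in the table (the other 96 rows are elided there, so with real data you would load the full column), and the 10-point bin width is an assumption.

```python
import statistics
from collections import Counter

BIN_WIDTH = 10  # assumption: 10-point score bands, e.g. 70-79

# Only the four scores shown in the table; the remaining rows are elided.
scores = [80, 70, 90, 60]

def histogram(values, width=BIN_WIDTH):
    """Map each bin's lower edge to the number of values falling in that bin."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Print one row of '#' marks per bin.
for lower, count in histogram(scores).items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {'#' * count}")

# For a roughly symmetric distribution, these two should be close.
print("mean:", statistics.mean(scores), "median:", statistics.median(scores))
```

A histogram whose bars rise and fall evenly around the middle bin, together with a mean close to the median, is consistent with a symmetric distribution; a long run of bars on one side points to skew.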

Data distribution can help us:

In summary, understanding data distribution is essential in exploratory data analysis (EDA) for gaining insight into the nature and behavior of the data.