Univariate Analysis

Univariate analysis is a fundamental step in exploratory data analysis (EDA) that examines the distribution and properties of a single variable. It helps identify that variable's central tendency, spread, shape, and outliers. Relationships between variables are outside its scope and belong to bivariate and multivariate analysis.

Example:

Suppose we have a dataset containing information about customer orders, including the order amount (in dollars). We want to perform univariate analysis on this variable to understand its characteristics.

Descriptive Statistics:

Mean: Calculate the average order amount.
Median: Find the middle value of the order amounts (i.e., the 50th percentile).
Mode: Determine the most frequently occurring order amount.

python
import pandas as pd

# Sample dataset
data = {
    'Order Amount': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
}
df = pd.DataFrame(data)

# Calculate descriptive statistics
mean_order_amount = df['Order Amount'].mean()
median_order_amount = df['Order Amount'].median()
# mode() returns every tied value in ascending order; take the first one
mode_order_amount = df['Order Amount'].mode().values[0]

print("Mean Order Amount:", mean_order_amount)
print("Median Order Amount:", median_order_amount)
print("Mode Order Amount:", mode_order_amount)

Output:

Mean Order Amount: 550.0
Median Order Amount: 550.0
Mode Order Amount: 100
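
Note that every order amount in the sample occurs exactly once, so the reported mode of 100 is only the first of ten tied values. The following is a minimal sketch of how one might check whether a mode is meaningful before reporting it; the variable names (`order_amounts`, `has_unique_mode`) are illustrative and not part of the original example.

```python
import pandas as pd

# Same ten sample values as in the example above
order_amounts = pd.Series(
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    name="Order Amount",
)

# mode() returns all values tied for the highest frequency
modes = order_amounts.mode()

# value_counts() shows how often each value occurs
counts = order_amounts.value_counts()

# A single mode exists only if exactly one value has the top frequency
has_unique_mode = len(modes) == 1

print("Number of tied modes:", len(modes))       # 10
print("Highest frequency:", counts.max())        # 1
print("Unique mode?", has_unique_mode)           # False
```

When the highest frequency is 1, as here, the mode carries no information, and the mean and median are the more useful summaries.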

Visualization:

To further understand the distribution of the order amounts, we can create a histogram:

python
import matplotlib.pyplot as plt

# Create a histogram
plt.hist(df['Order Amount'], bins=10, edgecolor='black')
plt.xlabel('Order Amount (in dollars)')
plt.ylabel('Frequency')
plt.title('Distribution of Order Amounts')
plt.show()

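For this sample data the histogram is flat: the ten values are evenly spaced from $100 to $1,000, so each of the ten bins contains exactly one order amount and there are no outliers. With real order data, the same plot would show where amounts cluster and whether any unusually small or large orders stand apart. The univariate analysis helps us identify patterns in the data and provides a foundation for further analysis.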
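
To check what the histogram shows without reading it off the plot, the bin counts can be computed directly. This is a minimal sketch, assuming NumPy is installed and reusing the same ten sample values; `np.histogram` applies the same equal-width binning that `plt.hist` uses by default.

```python
import numpy as np

# Same ten sample values as in the example above
order_amounts = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

# counts[i] is the number of values falling in the bin
# between edges[i] and edges[i + 1]
counts, edges = np.histogram(order_amounts, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.1f} to {right:7.1f}: {count}")
```

Each of the ten bins holds exactly one value, which confirms that the sample is spread evenly across its range.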

Additional Measures:

Depending on the research question or problem statement, additional measures may be used in univariate analysis, such as:

Standard Deviation: A measure of the spread of the data.
Interquartile Range (IQR): The difference between the 75th and 25th percentiles.
Skewness: A measure of the asymmetry of the distribution.
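
The three measures above can be computed with pandas as follows. This is a minimal sketch on the same sample data; the variable names are illustrative. Note that `Series.std()` returns the sample standard deviation (`ddof=1`) by default, and `Series.quantile()` uses linear interpolation between data points.

```python
import pandas as pd

# Same ten sample values as in the example above
order_amounts = pd.Series(
    [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    name="Order Amount",
)

# Standard deviation: sample version (divides by n - 1)
std_dev = order_amounts.std()

# Interquartile range: 75th percentile minus 25th percentile
q1 = order_amounts.quantile(0.25)
q3 = order_amounts.quantile(0.75)
iqr = q3 - q1

# Skewness: approximately 0 for this symmetric sample
skewness = order_amounts.skew()

print("Standard Deviation:", round(std_dev, 2))  # 302.77
print("IQR:", iqr)                               # 450.0
print("Skewness:", skewness)
```

A skewness near zero matches the symmetric, evenly spaced sample. Real order data is often right-skewed, in which case the mean would sit above the median.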

By applying these measures, you can gain a deeper understanding of your data and make informed decisions about further analysis or visualization.