Correlation Analysis

Correlation analysis is a statistical technique used to measure the strength and direction of the relationship between two continuous variables. It's a crucial step in Exploratory Data Analysis (EDA) to understand the relationships within your data.

### What does correlation mean?

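Correlation measures how closely two variables move together. A strong correlation means that as one variable increases, the other tends to increase (positive) or decrease (negative) in a consistent way. Pearson's correlation coefficient, r, summarizes this for linear relationships on a scale from -1 to 1, where values near 0 indicate little or no linear relationship.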
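To make the definition concrete, here is a minimal sketch that computes Pearson's r by hand with NumPy: the sum of the paired deviations from the mean, scaled by the spread of each variable. The helper name `pearson_r` and the sample values are illustrative assumptions, not part of any library.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed from deviations around the mean (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # how far each x is from its mean
    dy = y - y.mean()  # how far each y is from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Illustrative values, not real measurements
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]

r_manual = pearson_r(hours_studied, exam_score)
r_numpy = np.corrcoef(hours_studied, exam_score)[0, 1]  # NumPy's built-in result
print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
```

Both values should agree, which is a quick sanity check that the hand-written formula matches NumPy's implementation.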

### Types of Correlation

There are three types of correlations:

  • Positive Correlation: When both variables tend to move in the same direction (e.g., as one variable increases, so does the other).
  • Negative Correlation: When one variable tends to increase while the other decreases.
  • No Correlation: When there is no apparent relationship between the two variables.
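The three cases can be sketched with small synthetic arrays. The data here is made up purely for illustration: two straight lines for the positive and negative cases, and a symmetric curve for the case with no linear correlation.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2], dtype=float)

# Synthetic y-values chosen to illustrate each case
cases = {
    "positive": 2 * x + 1,   # y rises as x rises
    "negative": -3 * x + 4,  # y falls as x rises
    "none": x ** 2,          # symmetric around 0, so no linear trend
}

results = {name: float(np.corrcoef(x, y)[0, 1]) for name, y in cases.items()}
for name, r in results.items():
    print(f"{name:>8}: r = {r:+.2f}")
```

Note that in the last case y is fully determined by x, yet r is zero: Pearson's r only captures linear association, so "no correlation" does not necessarily mean "no relationship".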
### Example: Correlation Analysis using Python
```python
import pandas as pd
from scipy.stats import pearsonr

# Create a sample dataset (in this case, height and weight)
data = {
    'Height': [175, 180, 165, 190, 182, 168],
    'Weight': [70, 80, 65, 95, 85, 60]
}
df = pd.DataFrame(data)

# Perform Pearson Correlation Coefficient analysis
corr_coef, _ = pearsonr(df['Height'], df['Weight'])

print(f'Correlation Coefficient: {corr_coef:.2f}')

if corr_coef > 0:
    print('Positive correlation')
elif corr_coef < 0:
    print('Negative correlation')
else:
    print('No correlation')
```
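The example above discards the second value returned by `pearsonr`, which is the two-sided p-value. As a small sketch on the same height and weight data, the variant below keeps it and cross-checks the coefficient against pandas' `Series.corr`. The variable names `p_value` and `r_pandas` are illustrative choices.

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    'Height': [175, 180, 165, 190, 182, 168],
    'Weight': [70, 80, 65, 95, 85, 60],
})

# pearsonr returns the coefficient and a two-sided p-value
r, p_value = pearsonr(df['Height'], df['Weight'])

# pandas computes the same coefficient (method='pearson' is the default)
r_pandas = df['Height'].corr(df['Weight'])

print(f'r = {r:.2f}, p-value = {p_value:.4f}')
print(f'pandas agrees: {abs(r - r_pandas) < 1e-9}')
```

A small p-value suggests the observed correlation is unlikely under the assumption of no linear relationship, but with only six data points the estimate should be treated with caution.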


### Interpretation of the Results


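  • The correlation coefficient (Pearson's r) is approximately 0.97, indicating a strong positive linear correlation between height and weight in this sample.
  • This means that taller individuals in the sample tend to weigh more. Correlation alone does not show that one variable causes the other.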

Correlation analysis provides insights into relationships within your data, helping you identify trends, patterns, or potential issues to explore further in your EDA.
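In EDA you often have more than two columns, and pandas' `DataFrame.corr` computes every pairwise coefficient at once. The sketch below reuses the height and weight sample and adds a hypothetical `Age` column, with made-up values, purely to show what the matrix looks like.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'Height': [175, 180, 165, 190, 182, 168],
    'Weight': [70, 80, 65, 95, 85, 60],
    'Age': [34, 25, 41, 29, 52, 38],  # hypothetical column, invented for illustration
})

# Pairwise Pearson correlations between all numeric columns
corr_matrix = df.corr(method='pearson')
print(corr_matrix.round(2))

# Pick the pair of distinct columns with the largest absolute correlation
strongest = max(
    combinations(df.columns, 2),
    key=lambda pair: abs(corr_matrix.loc[pair[0], pair[1]]),
)
print(f'Strongest pair: {strongest}')
```

The matrix is symmetric with ones on the diagonal, so only the entries above (or below) the diagonal need to be inspected.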