Introduction to Statistical Tests

A Practical Guide with R and Python

1 Introduction

This document provides a practical guide to several fundamental statistical tests, demonstrating their implementation in both R and Python. We will cover the independent two-sample t-test, the paired t-test, ANOVA, and correlation tests (Pearson and Spearman). For each test, we will explain the underlying theory, assumptions, and how to interpret the results.

2 Independent Two-Sample t-Test

The independent two-sample t-test is used to determine whether there is a statistically significant difference between the means of two independent groups.

Null Hypothesis (H0): The means of the two groups are equal. Alternative Hypothesis (H1): The means of the two groups are not equal.
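
Before turning to the built-in functions, it can help to see what the test statistic actually is. The following minimal Python sketch (an illustration, not part of the original analysis) computes the Welch t-statistic and its approximate degrees of freedom by hand from the same Class A and Class B scores used in the R example below.

Code
import numpy as np

# Same scores as the R example below
class_a = np.array([85, 88, 90, 85, 87, 91, 89, 100])
class_b = np.array([80, 82, 84, 79, 81, 83, 78])

# Welch t-statistic: difference in means divided by a standard error
# built from each group's own variance
va = class_a.var(ddof=1) / len(class_a)
vb = class_b.var(ddof=1) / len(class_b)
t_stat = (class_a.mean() - class_b.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (len(class_a) - 1) + vb ** 2 / (len(class_b) - 1))

print(f"t = {t_stat:.4f}, df = {df:.4f}")  # should match the t.test() output below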

Code
# Sample data for two independent groups
class_A <- c(85, 88, 90, 85, 87, 91, 89, 100)
class_B <- c(80, 82, 84, 79, 81, 83, 78)

# Calculate means and variances
mean_A <- mean(class_A)
mean_B <- mean(class_B)
var_A <- var(class_A)
var_B <- var(class_B)

cat(paste("Mean of Class A:", round(mean_A, 2), "\n"))
Mean of Class A: 89.38 
Code
cat(paste("Mean of Class B:", round(mean_B, 2), "\n"))
Mean of Class B: 81 
Code
cat(paste("Variance of Class A:", round(var_A, 2), "\n"))
Variance of Class A: 23.12 
Code
cat(paste("Variance of Class B:", round(var_B, 2), "\n"))
Variance of Class B: 4.67 
Code
# Perform the t-test
t_test_result <- t.test(class_A, class_B)
print(t_test_result)

    Welch Two Sample t-test

data:  class_A and class_B
t = 4.4404, df = 9.9817, p-value = 0.001259
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  4.171513 12.578487
sample estimates:
mean of x mean of y 
   89.375    81.000 

Interpretation:

  • p-value: The p-value (about 0.0013) is less than the common alpha level of 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant difference between the mean scores of Class A and Class B.

2.1 Assumptions of the t-Test

  1. Normality: The data in each group should be approximately normally distributed.
  2. Independence: The two groups must be independent of each other.
  3. Equal Variances (Homogeneity of Variances): The classic (Student's) t-test assumes the variances of the two groups are equal. Welch's t-test (the default for t.test() in R) does not require this assumption.

2.1.1 Normality Test (Shapiro-Wilk)

Code
shapiro.test(class_A)

    Shapiro-Wilk normality test

data:  class_A
W = 0.8249, p-value = 0.05252
Code
shapiro.test(class_B)

    Shapiro-Wilk normality test

data:  class_B
W = 0.978, p-value = 0.9493

Result: Both p-values are greater than 0.05 (although the value for Class A, 0.053, is borderline), so we fail to reject the null hypothesis of normality for either group. The data appear to be approximately normally distributed.

2.1.2 Equal Variance Test (Levene’s Test)

Code
# Combine data for Levene's test
score <- c(class_A, class_B)
group <- c(rep("A", length(class_A)), rep("B", length(class_B)))
data <- data.frame(score, group)

car::leveneTest(score ~ group, data = data)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.9926 0.3373
      13               

Result: The p-value (0.337) is greater than 0.05, so we fail to reject the null hypothesis of equal variances. The equal-variance assumption appears reasonable.

Code
import numpy as np
from scipy.stats import ttest_ind

# Sample data
group_a_scores = np.array([88, 92, 85, 91, 87])
group_b_scores = np.array([78, 75, 80, 73, 77])
Code
# Perform Independent Two-Sample t-Test
# (by default ttest_ind assumes equal variances; pass equal_var=False for Welch's test)
t_stat, p_value = ttest_ind(group_a_scores, group_b_scores)

print(f"T-statistic: {t_stat:.4f}")
T-statistic: 6.7937
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0001

Interpretation:

  • p-value: The p-value is very small (0.0001), which is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant difference between the means of the two groups.
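
The assumption checks performed above in R have direct SciPy counterparts. The sketch below is a minimal illustration (reusing the same group_a_scores and group_b_scores arrays) of how the Shapiro-Wilk and Levene tests can be run before choosing between the Student and Welch forms of the test.

Code
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind

group_a_scores = np.array([88, 92, 85, 91, 87])
group_b_scores = np.array([78, 75, 80, 73, 77])

# Shapiro-Wilk normality test for each group
print(shapiro(group_a_scores))
print(shapiro(group_b_scores))

# Levene's test for equality of variances
print(levene(group_a_scores, group_b_scores))

# If equal variances are doubtful, request Welch's test explicitly
print(ttest_ind(group_a_scores, group_b_scores, equal_var=False))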

3 Paired t-Test

The paired t-test is used to compare the means of two related groups to determine if there is a statistically significant difference between them.

Null Hypothesis (H0): The true mean difference between the paired samples is zero. Alternative Hypothesis (H1): The true mean difference is not zero.
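
Conceptually, the paired t-test is just a one-sample t-test applied to the pairwise differences. The minimal Python sketch below (an illustration using the same before/after values as the R example that follows) makes that equivalence explicit.

Code
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Same before/after values as the R example below
before = np.array([100, 102, 98, 95, 101])
after = np.array([102, 104, 99, 97, 103])

# One-sample t-test on the differences against a mean of zero ...
diff = before - after
print(ttest_1samp(diff, 0))
# ... gives the same t-statistic and p-value as the paired test
print(ttest_rel(before, after))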

Code
# Sample data
before <- c(100, 102, 98, 95, 101)
after <- c(102, 104, 99, 97, 103)

# Calculate means
mean_before <- mean(before)
mean_after <- mean(after)

cat(paste("Mean before:", round(mean_before, 2), "\n"))
Mean before: 99.2 
Code
cat(paste("Mean after:", round(mean_after, 2), "\n"))
Mean after: 101 
Code
# Perform the paired t-test
t_test_paired <- t.test(before, after, paired = TRUE)
print(t_test_paired)

    Paired t-test

data:  before and after
t = -9, df = 4, p-value = 0.0008438
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.355289 -1.244711
sample estimates:
mean difference 
           -1.8 

Interpretation:

  • p-value: The p-value is very small (0.0008), which is less than 0.05.
  • Conclusion: We reject the null hypothesis. Scores after the intervention are significantly higher than before.

3.1 Assumption: Normality of Differences

The paired t-test assumes that the differences between the pairs are normally distributed.

Code
diff <- after - before
shapiro.test(diff)

    Shapiro-Wilk normality test

data:  diff
W = 0.55218, p-value = 0.000131

Result: The p-value (0.00013) is less than 0.05, so we reject the null hypothesis of normality: the differences do not appear to be normally distributed (most of them are identical). In this situation a non-parametric alternative such as the Wilcoxon signed-rank test may be more appropriate than the paired t-test.

Code
import numpy as np
from scipy.stats import ttest_rel

# Sample data
before = np.array([72, 75, 78, 70, 74])
after = np.array([78, 80, 82, 76, 79])
Code
# Perform the paired t-test
t_stat, p_val = ttest_rel(before, after)

print("Paired t-Test Results:")
Paired t-Test Results:
Code
print(f"T-statistic: {t_stat:.4f}")
T-statistic: -13.8976
Code
print(f"P-value: {p_val:.4f}")
P-value: 0.0002

Interpretation:

  • p-value: The p-value (0.0002) is less than 0.05.
  • Conclusion: We reject the null hypothesis. Scores after are significantly higher than before.
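
If the normality-of-differences assumption is doubtful (as it was in the R example above), a non-parametric alternative is the Wilcoxon signed-rank test. A minimal SciPy sketch reusing the same before/after arrays is shown below; note that the signed-rank test has less power than the t-test for very small samples like this one.

Code
import numpy as np
from scipy.stats import wilcoxon

before = np.array([72, 75, 78, 70, 74])
after = np.array([78, 80, 82, 76, 79])

# Wilcoxon signed-rank test on the paired differences
# (non-parametric counterpart of the paired t-test)
stat, p_val = wilcoxon(before, after)
print(f"Wilcoxon statistic: {stat}, p-value: {p_val:.4f}")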

4 ANOVA (Analysis of Variance)

ANOVA is used to compare the means of three or more groups to see if at least one group is different from the others.

Null Hypothesis (H0): The means of all groups are equal. Alternative Hypothesis (H1): At least one group mean is different.

Code
# Create 3 groups
group1 <- c(85, 88, 90, 85, 87, 91, 89, 100)
group2 <- c(80, 88, 84, 89, 81, 83, 88, 100)
group3 <- c(120, 200, 200, 200, 100, 200, 100, 100)

# Combine into a data frame
value <- c(group1, group2, group3)
group <- factor(rep(c("Group1", "Group2", "Group3"), each = 8))
data <- data.frame(group, value)
Code
# Perform ANOVA
anova_result <- aov(value ~ group, data = data)
summary(anova_result)
            Df Sum Sq Mean Sq F value   Pr(>F)    
group        2  22218   11109   12.41 0.000277 ***
Residuals   21  18796     895                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: The p-value is very small, so we reject the null hypothesis. At least one group mean is different.
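
To make the ANOVA table concrete, the following minimal Python sketch (an illustration, not part of the original analysis) rebuilds the F statistic from the same three groups: the between-group and within-group sums of squares are divided by their degrees of freedom, and the ratio of the resulting mean squares is F.

Code
import numpy as np

# Same groups as the R example above
group1 = np.array([85, 88, 90, 85, 87, 91, 89, 100])
group2 = np.array([80, 88, 84, 89, 81, 83, 88, 100])
group3 = np.array([120, 200, 200, 200, 100, 200, 100, 100])
groups = [group1, group2, group3]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1                 # 2
df_within = len(all_values) - len(groups)    # 21

f_value = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {f_value:.2f}")  # should match the F value of 12.41 in the table above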

4.1 Post-Hoc Test (Tukey HSD)

If the ANOVA is significant, we use a post-hoc test like Tukey’s Honestly Significant Difference (HSD) to find out which specific groups are different from each other.

Code
TukeyHSD(anova_result)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = value ~ group, data = data)

$group
                diff       lwr       upr     p adj
Group2-Group1 -2.750 -40.45414  34.95414 0.9815566
Group3-Group1 63.125  25.42086 100.82914 0.0010731
Group3-Group2 65.875  28.17086 103.57914 0.0006950

Interpretation: Group 3 is significantly different from both Group 1 and Group 2 (adjusted p-values below 0.01), while Group 1 and Group 2 do not differ significantly (adjusted p = 0.98).

Code
import scipy.stats as stats
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Sample data
group1 = [85, 90, 88, 75, 95, 90]
group2 = [70, 65, 80, 72, 68, 90]
group3 = [120, 200, 200, 200, 100, 120]

# Combine data
scores = group1 + group2 + group3
methods = ['Method1'] * len(group1) + ['Method2'] * len(group2) + ['Method3'] * len(group3)
df = pd.DataFrame({'score': scores, 'method': methods})
Code
# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_statistic:.4f}")
F-statistic: 14.5233
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0003

Interpretation: The p-value (0.0003) is very small, so we reject the null hypothesis: at least one method's mean score differs from the others.

4.2 Post-Hoc Test (Tukey HSD)

Code
tukey = pairwise_tukeyhsd(endog=df['score'], groups=df['method'], alpha=0.05)
print(tukey)
  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
========================================================
 group1  group2 meandiff p-adj   lower    upper   reject
--------------------------------------------------------
Method1 Method2    -13.0 0.7148 -55.7563  29.7563  False
Method1 Method3     69.5  0.002  26.7437 112.2563   True
Method2 Method3     82.5 0.0004  39.7437 125.2563   True
--------------------------------------------------------

Interpretation: Method 3 is significantly different from both Method 1 and Method 2 (reject = True), while Method 1 and Method 2 do not differ significantly.

5 Correlation

Correlation tests measure the strength and direction of the relationship between two continuous variables.

5.1 Pearson Correlation

The Pearson correlation measures the linear relationship between two continuous variables. The associated significance test assumes that the variables are approximately normally distributed.
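
As a reminder of what the coefficient measures, the minimal Python sketch below (an illustration, not part of the original analysis) computes Pearson's r as the covariance of the two variables divided by the product of their standard deviations, using the same x and y values as the R example that follows.

Code
import numpy as np

# Same x and y values as the R example below
x = np.array([10, 20, 30, 40, 50, 10])
y = np.array([15, 25, 35, 45, 55, 5])

# Pearson's r = covariance / (sd_x * sd_y)
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(f"r = {r:.4f}")  # should match the 0.98 reported by cor.test() below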

Code
# Sample data
data <- data.frame(
  x = c(10, 20, 30, 40, 50, 10),
  y = c(15, 25, 35, 45, 55, 5)
)
Code
# Compute Pearson correlation
correlation <- cor.test(data$x, data$y, method = "pearson")
print(correlation)

    Pearson's product-moment correlation

data:  data$x and data$y
t = 10.392, df = 4, p-value = 0.0004841
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8392446 0.9981104
sample estimates:
      cor 
0.9819805 

Interpretation: The correlation coefficient is 0.98, indicating a very strong positive linear relationship.

Code
import scipy.stats as stats

# Example data
x = [10, 20, 30, 40, 50, 77, 89]
y = [15, 25, 35, 45, 55, 70, 80]
Code
# Calculate Pearson correlation
corr_coef, p_value = stats.pearsonr(x, y)

print(f"Pearson correlation coefficient: {corr_coef:.4f}")
Pearson correlation coefficient: 0.9927
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0000

Interpretation: The correlation coefficient is 0.99, indicating a very strong positive linear relationship.

5.2 Spearman Correlation

The Spearman correlation measures the monotonic relationship between two variables. It is computed from the ranks of the data and does not assume normality.
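
Because Spearman's rho is simply Pearson's r applied to rank-transformed data, the minimal Python sketch below (an illustration with hypothetical data) shows the equivalence, and also how a monotonic but non-linear relationship yields a Spearman coefficient of 1 while Pearson's r stays below 1.

Code
import numpy as np
from scipy.stats import rankdata, pearsonr, spearmanr

# Hypothetical data: y increases monotonically with x, but not linearly
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([1, 4, 9, 16, 25, 36])

# Spearman's rho is Pearson's r computed on the ranks
rho_from_ranks, _ = pearsonr(rankdata(x), rankdata(y))
rho_direct, _ = spearmanr(x, y)
r_raw, _ = pearsonr(x, y)

print(f"Pearson on ranks: {rho_from_ranks:.4f}")  # 1.0
print(f"spearmanr:        {rho_direct:.4f}")      # 1.0
print(f"Pearson on raw:   {r_raw:.4f}")           # below 1: monotonic but not linear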

Code
# Sample data
data <- data.frame(
  x = c(10, 20, 30, 40, 50, 10),
  y = c(15, 25, 35, 45, 55, 5)
)
Code
# Compute Spearman correlation
correlation <- cor.test(data$x, data$y, method = "spearman")
print(correlation)

    Spearman's rank correlation rho

data:  data$x and data$y
S = 0.50362, p-value = 0.0003091
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.9856108 

Interpretation: The Spearman correlation coefficient is 0.99, indicating a very strong positive monotonic relationship.

Code
import scipy.stats as stats

# Example data
x = [10, 20, 30, 40, 50]
y = [1, 2, 3, 4, 5]
Code
# Calculate Spearman correlation
corr_coef, p_value = stats.spearmanr(x, y)

print(f"Spearman correlation coefficient: {corr_coef:.4f}")
Spearman correlation coefficient: 1.0000
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0000

Interpretation: The Spearman correlation coefficient is 1.0, indicating a perfect positive monotonic relationship.

6 Conclusion

This document has provided a practical overview of several key statistical tests. By understanding the principles behind these tests and how to implement them in R and Python, you can gain valuable insights from your data and make informed, data-driven decisions.