Lab 5: Hypothesis Testing

Author

Your Name Here

Published

March 25, 2026

Overview

In this lab you will run and interpret the four hypothesis tests covered in lecture — the same tests you’ll use for hw8. By the end, you should be able to pick the right test for your research question and write up the result.

Setup

library(tidyverse)
library(here)

options(scipen = 999)  # print small p-values in fixed notation rather than scientific notation

attain <- read_csv(here("data", "attain.csv"))

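# Recode union membership and degree into 0/1 indicators.
# Note: %in% returns FALSE for NA, so a missing union or degree value is coded 0, not NA.
attain <- attain |>
  mutate(
    union_member = if_else(union %in% c("r belong", "r and sp"), 1L, 0L),
    college      = if_else(degree %in% c("bachelor", "graduate"), 1L, 0L),
    college_lab  = if_else(college == 1, "College+", "No college")
  )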

Test 1: Proportion z-Test

Research question: Is the proportion of U.S. adults with a college degree equal to 25%?

\(H_0: \pi = 0.25\)
\(H_1: \pi \neq 0.25\) (two-tailed), \(\alpha = 0.05\)

x <- sum(attain$college, na.rm = TRUE)   # number with a college degree
n <- sum(!is.na(attain$degree))          # sample size: respondents with a non-missing degree

prop.test(x, n, p = 0.25, correct = FALSE)


    1-sample proportions test without continuity correction

data:  x out of n, null probability 0.25
X-squared = 1.8891, df = 1, p-value = 0.1693
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
 0.2241340 0.2547398
sample estimates:
        p 
0.2391013

Note

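p = 0.25 sets the null hypothesis value. For the test statistic, R uses the null standard error \(SE_0 = \sqrt{\pi_0(1-\pi_0)/n}\), which is built from the null value, not the sample proportion. prop.test() reports X-squared rather than z: with correct = FALSE, X-squared is the square of the z-statistic, and the p-value is the same as for the two-tailed z-test.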
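To see where the output comes from, here is a minimal sketch of the same test computed by hand. It assumes the counts shown elsewhere in this lab (713 respondents with a college degree out of 2,982 with a non-missing degree); the names pi0, p_hat, se0, and z are introduced here for illustration only.

```r
# Counts taken from the lab output (713 of 2982); pi0 is the null value.
x   <- 713
n   <- 2982
pi0 <- 0.25

p_hat <- x / n                          # sample proportion
se0   <- sqrt(pi0 * (1 - pi0) / n)      # standard error under the null
z     <- (p_hat - pi0) / se0            # z-statistic
p_val <- 2 * pnorm(-abs(z))             # two-tailed p-value

# Compare with prop.test(): X-squared should equal z^2
pt <- prop.test(x, n, p = pi0, correct = FALSE)
c(z = z, z_squared = z^2, X_squared = unname(pt$statistic), p_value = p_val)
```

If the two approaches agree, z_squared and X_squared match, and p_value matches the p-value that prop.test() prints.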

Question 1. Based on the p-value, do you reject or fail to reject \(H_0\)? Write one sentence interpreting the result (include the sample proportion and p-value).

Your answer:

Test 2: One-Sample t-Test

Research question: Is the mean years of education among U.S. adults equal to 12 (a high school diploma)?

\(H_0: \mu = 12\)
\(H_1: \mu \neq 12\) (two-tailed), \(\alpha = 0.05\)

educ_clean <- attain |> filter(!is.na(educ))

t.test(educ_clean$educ, mu = 12)


    One Sample t-test

data:  educ_clean$educ
t = 21.291, df = 2984, p-value < 0.00000000000000022
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
 13.05116 13.26442
sample estimates:
mean of x 
 13.15779

Note

mu = 12 sets the null hypothesis value. The output reports the sample mean, t-statistic, degrees of freedom, p-value, and 95% CI.
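When you write up a result, it can help to pull the numbers out of the test object instead of copying them from the console. The sketch below uses simulated data, not the attain data: educ_sim is a hypothetical vector made up for illustration, so its numbers will not match the output above.

```r
# Hypothetical data for illustration only (not from attain.csv)
set.seed(42)
educ_sim <- rnorm(200, mean = 13, sd = 3)

res <- t.test(educ_sim, mu = 12)

# The t-statistic by hand: (sample mean - null value) / (sd / sqrt(n))
t_by_hand <- (mean(educ_sim) - 12) / (sd(educ_sim) / sqrt(length(educ_sim)))

res$estimate    # sample mean
res$statistic   # t-statistic (should equal t_by_hand)
res$parameter   # degrees of freedom, n - 1
res$p.value     # two-tailed p-value
res$conf.int    # 95% confidence interval for the mean
```

The same components are available after running t.test(educ_clean$educ, mu = 12) if you store the result in an object first.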

Question 2. Do you reject or fail to reject \(H_0\)? Write one sentence interpreting the result (include the sample mean, t-statistic, and p-value).

Your answer:

Test 3: Two-Sample t-Test

Research question: Do married people work a different number of hours per week than never-married people?

\(H_0: \mu_{\text{married}} = \mu_{\text{never married}}\)
\(H_1: \mu_{\text{married}} \neq \mu_{\text{never married}}\) (two-tailed), \(\alpha = 0.05\)

married_hrs <- attain |> filter(marital == "married",  !is.na(hrs1)) |> pull(hrs1)
nevmar_hrs  <- attain |> filter(marital == "never ma", !is.na(hrs1)) |> pull(hrs1)

t.test(married_hrs, nevmar_hrs)


    Welch Two Sample t-test

data:  married_hrs and nevmar_hrs
t = 4.6299, df = 819.44, p-value = 0.000004253
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.214214 5.473413
sample estimates:
mean of x mean of y 
 42.97255  39.12874

Note

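The output shows the mean for each group, the t-statistic, and the 95% CI for the difference in means. By default, t.test() runs Welch's test, which does not assume equal variances in the two groups; that is why the degrees of freedom are not a whole number. If the 95% CI contains 0, you fail to reject \(H_0\) at \(\alpha = 0.05\); if it excludes 0, you reject.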
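The link between the confidence interval and the test decision can be checked directly. This is a sketch on simulated data: the data frame sim and its columns are hypothetical, chosen only to resemble the hours comparison, and the formula interface shown here is an alternative to passing two vectors.

```r
# Hypothetical data for illustration only (not from attain.csv)
set.seed(1)
sim <- data.frame(
  marital = rep(c("married", "never married"), times = c(150, 100)),
  hrs     = c(rnorm(150, mean = 43, sd = 12), rnorm(100, mean = 39, sd = 14))
)

# Formula interface: outcome ~ two-level grouping variable (Welch by default)
res <- t.test(hrs ~ marital, data = sim)

ci               <- res$conf.int
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0
rejects_h0       <- res$p.value < 0.05

# The two logical values should always agree for a 95% CI and alpha = 0.05
c(ci_excludes_zero = ci_excludes_zero, rejects_h0 = rejects_h0)

# For comparison: the pooled-variance test has df = n1 + n2 - 2
pooled <- t.test(hrs ~ marital, data = sim, var.equal = TRUE)
pooled$parameter
```

Welch's test is the safer default when the group sizes or spreads differ, which is why the lab leaves var.equal at its default of FALSE.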

Question 3. Do you reject or fail to reject \(H_0\)? Write 2 sentences reporting the result: include both group means, the t-statistic, the p-value, and your decision.

Your answer:

Test 4: Chi-Squared Test

Research question: Is race associated with having a college degree?

\(H_0\): Race and college attainment are independent
\(H_1\): Race and college attainment are not independent (they are associated), \(\alpha = 0.05\)

attain_deg <- attain |> filter(!is.na(race), !is.na(degree))

# Contingency table of counts (this is what we pass to chisq.test() below)
tab <- table(attain_deg$race, attain_deg$college_lab)
addmargins(tab)

       
        College+ No college  Sum
  black       43        341  384
  other       31         90  121
  white      639       1838 2477
  Sum        713       2269 2982

# College rate within each racial group
attain_deg |>
  group_by(race) |>
  summarize(college_rate = mean(college, na.rm = TRUE))

# A tibble: 3 × 2
  race  college_rate
  <chr>        <dbl>
1 black        0.112
2 other        0.256
3 white        0.258

# Chi-squared test (correct = FALSE has no effect here; the continuity correction applies only to 2 x 2 tables)
chisq.test(tab, correct = FALSE)


    Pearson's Chi-squared test

data:  tab
X-squared = 39.152, df = 2, p-value = 0.000000003149

Note

The chi-squared test tells you whether an association exists — the group rates above tell you where the differences are.
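To look at where the differences are in more detail, you can compare observed counts with the counts expected under independence. The sketch below rebuilds the table from the counts printed above, so it runs without the data file; the object names tab_counts and res are introduced here for illustration.

```r
# Counts copied from the contingency table above
tab_counts <- matrix(
  c( 43,  341,
     31,   90,
    639, 1838),
  nrow = 3, byrow = TRUE,
  dimnames = list(race    = c("black", "other", "white"),
                  college = c("College+", "No college"))
)

res <- chisq.test(tab_counts, correct = FALSE)

res$expected                               # counts expected if H0 were true
round(res$stdres, 2)                       # standardized residuals (observed vs. expected)
round(prop.table(tab_counts, margin = 1), 3)  # row proportions: college rate within each group
```

Cells with large standardized residuals in absolute value are the ones that depart most from independence. The expected counts are also worth a glance, since the chi-squared approximation is usually considered reliable when they are all at least 5.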

Question 4. Do you reject or fail to reject \(H_0\)? Write 2 sentences: state your decision (with the \(\chi^2\) statistic and p-value), then describe which racial group has the highest/lowest college rate.

Your answer:

Before You Leave

You can run all four tests and read the output
You know which test to use based on your variable types
You understand that “fail to reject \(H_0\)” ≠ “accept \(H_0\)”