Lab 5: Hypothesis Testing
Overview
In this lab you will run and interpret the four hypothesis tests covered in lecture, the same tests you’ll use for hw8. By the end, you should be able to pick the right test for your research question and write up the result.
Setup
library(tidyverse)
library(here)

options(scipen = 999) # print p-values in fixed rather than scientific notation

attain <- read_csv(here("data", "attain.csv"))

# Recode union membership and degree into 0/1 indicators.
# Note: %in% treats NA as "no match", so missing values are coded 0.
attain <- attain |>
  mutate(
    union_member = if_else(union %in% c("r belong", "r and sp"), 1L, 0L),
    college = if_else(degree %in% c("bachelor", "graduate"), 1L, 0L),
    college_lab = if_else(college == 1, "College+", "No college")
  )
Test 1: Proportion z-Test
Research question: Is the proportion of U.S. adults with a college degree equal to 25%?
- \(H_0: \pi = 0.25\)
- \(H_1: \pi \neq 0.25\) (two-tailed), \(\alpha = 0.05\)
x <- sum(attain$college, na.rm = TRUE) # number with college degree
n <- sum(!is.na(attain$degree)) # sample size
prop.test(x, n, p = 0.25, correct = FALSE)
1-sample proportions test without continuity correction
data: x out of n, null probability 0.25
X-squared = 1.8891, df = 1, p-value = 0.1693
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
0.2241340 0.2547398
sample estimates:
p
0.2391013
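The argument p = 0.25 sets the null hypothesis value. For the test statistic, R uses the null standard error \(SE_0 = \sqrt{\pi_0(1-\pi_0)/n}\), which is built from the null value \(\pi_0\), not the sample proportion. With correct = FALSE, the reported X-squared is the square of the z-statistic, so the p-value is the same as for a two-tailed z-test.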
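As a rough check on the output above, the z-statistic can be computed by hand. This is a minimal sketch in base R that hard-codes the counts (713 college graduates out of 2,982 respondents, the totals shown in the Test 4 table) so it runs without the data file; the variable names are illustrative.

```r
x <- 713     # respondents with a college degree (total from the Test 4 table)
n <- 2982    # respondents with a non-missing degree
pi0 <- 0.25  # null hypothesis value

p_hat <- x / n                        # sample proportion
se0 <- sqrt(pi0 * (1 - pi0) / n)      # standard error under the null
z <- (p_hat - pi0) / se0              # z-statistic
p_value <- 2 * pnorm(-abs(z))         # two-tailed p-value

# Compare with prop.test(): X-squared should equal z^2
res <- prop.test(x, n, p = pi0, correct = FALSE)
c(z = z, z_squared = z^2, x_squared = unname(res$statistic), p_value = p_value)
```

The sign of z (negative here, because the sample proportion is below 0.25) is lost when squaring, which is why prop.test() alone does not show the direction of the difference.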
Question 1. Based on the p-value, do you reject or fail to reject \(H_0\)? Write one sentence interpreting the result (include the sample proportion and p-value).
Your answer:
Test 2: One-Sample t-Test
Research question: Is the mean years of education among U.S. adults equal to 12 (a high school diploma)?
- \(H_0: \mu = 12\)
- \(H_1: \mu \neq 12\) (two-tailed), \(\alpha = 0.05\)
educ_clean <- attain |> filter(!is.na(educ))
t.test(educ_clean$educ, mu = 12)
One Sample t-test
data: educ_clean$educ
t = 21.291, df = 2984, p-value < 0.00000000000000022
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
13.05116 13.26442
sample estimates:
mean of x
13.15779
mu = 12 sets the null hypothesis value. The output reports the sample mean, t-statistic, degrees of freedom, p-value, and 95% CI.
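To see where the t-statistic comes from, the sketch below computes it by hand and compares it with t.test(). The vector educ_toy is made-up data for illustration only, not values from attain.csv.

```r
# Hypothetical years-of-education values, for illustration only
educ_toy <- c(12, 16, 14, 12, 10, 18, 13, 12, 16, 11)
mu0 <- 12  # null hypothesis value

n <- length(educ_toy)
se <- sd(educ_toy) / sqrt(n)                  # standard error of the mean
t_stat <- (mean(educ_toy) - mu0) / se         # t-statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-tailed p-value, df = n - 1

# t.test() should report the same statistic and p-value
res <- t.test(educ_toy, mu = mu0)
c(by_hand = t_stat, t_test = unname(res$statistic))
```

The same arithmetic applies to the lab data: the t-statistic measures how many standard errors the sample mean lies from 12.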
Question 2. Do you reject or fail to reject \(H_0\)? Write one sentence interpreting the result (include the sample mean, t-statistic, and p-value).
Your answer:
Test 3: Two-Sample t-Test
Research question: Do married people work a different number of hours per week than never-married people?
- \(H_0: \mu_{\text{married}} = \mu_{\text{never married}}\)
- \(H_1: \mu_{\text{married}} \neq \mu_{\text{never married}}\) (two-tailed), \(\alpha = 0.05\)
married_hrs <- attain |> filter(marital == "married", !is.na(hrs1)) |> pull(hrs1)
nevmar_hrs <- attain |> filter(marital == "never ma", !is.na(hrs1)) |> pull(hrs1)
t.test(married_hrs, nevmar_hrs)
Welch Two Sample t-test
data: married_hrs and nevmar_hrs
t = 4.6299, df = 819.44, p-value = 0.000004253
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.214214 5.473413
sample estimates:
mean of x mean of y
42.97255 39.12874
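The output shows the mean for each group, the t-statistic, and the 95% CI for the difference in means. By default, t.test() runs the Welch version of the test, which does not assume equal variances in the two groups. A 95% CI that excludes 0 corresponds to rejecting \(H_0\) at \(\alpha = 0.05\); a CI that contains 0 corresponds to failing to reject \(H_0\).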
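The Welch statistic and its degrees of freedom can also be reproduced by hand. The sketch below uses two small made-up vectors (grp_a and grp_b are hypothetical, not the lab data) and checks that the CI and the p-value lead to the same decision.

```r
# Hypothetical weekly hours for two groups, for illustration only
grp_a <- c(40, 45, 50, 38, 42, 60, 40, 44)
grp_b <- c(35, 40, 30, 45, 38, 40, 25)

# Squared standard error contributed by each group
v_a <- var(grp_a) / length(grp_a)
v_b <- var(grp_b) / length(grp_b)

# Welch t-statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(grp_a) - mean(grp_b)) / sqrt(v_a + v_b)
df_welch <- (v_a + v_b)^2 /
  (v_a^2 / (length(grp_a) - 1) + v_b^2 / (length(grp_b) - 1))

res <- t.test(grp_a, grp_b)  # Welch is the default (var.equal = FALSE)

# The 95% CI excludes 0 exactly when the two-tailed p-value is below 0.05
ci_excludes_zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0
reject_h0 <- res$p.value < 0.05
c(t_by_hand = t_stat, df_by_hand = df_welch)
```

Because the degrees of freedom are estimated from the group variances, they are usually not a whole number, as in the df = 819.44 reported above.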
Question 3. Do you reject or fail to reject \(H_0\)? Write 2 sentences reporting the result: include both group means, the t-statistic, the p-value, and your decision.
Your answer:
Test 4: Chi-Squared Test
Research question: Is race associated with having a college degree?
- \(H_0\): Race and college attainment are independent
- \(H_1\): They are not independent, \(\alpha = 0.05\)
attain_deg <- attain |> filter(!is.na(race), !is.na(degree))
# Contingency table (table() is needed for chisq.test)
tab <- table(attain_deg$race, attain_deg$college_lab)
addmargins(tab)
        College+ No college  Sum
  black       43        341  384
  other       31         90  121
  white      639       1838 2477
  Sum        713       2269 2982
# College rate within each racial group
attain_deg |>
  group_by(race) |>
  summarize(college_rate = mean(college, na.rm = TRUE))
# A tibble: 3 × 2
  race  college_rate
  <chr>        <dbl>
1 black        0.112
2 other        0.256
3 white        0.258
# Chi-squared test
chisq.test(tab, correct = FALSE)
Pearson's Chi-squared test
data: tab
X-squared = 39.152, df = 2, p-value = 0.000000003149
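The chi-squared test tells you only whether an association exists; the group rates above tell you where the differences are. The argument correct = FALSE has no effect here, because R applies the continuity correction only to 2 × 2 tables.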
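The statistic can be rebuilt from the contingency table printed above by comparing observed counts with the counts expected under independence. This is a sketch in base R that re-enters the table by hand so it runs without the data file; the object names are illustrative.

```r
# Observed counts, copied from the contingency table above
tab <- matrix(
  c( 43,  341,
     31,   90,
    639, 1838),
  nrow = 3, byrow = TRUE,
  dimnames = list(race = c("black", "other", "white"),
                  college = c("College+", "No college"))
)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

chi_sq <- sum((tab - expected)^2 / expected)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# Row proportions reproduce the college rate within each group
round(prop.table(tab, margin = 1), 3)
c(chi_sq = chi_sq, dof = dof, p_value = p_value)
```

Most of the statistic comes from the cells where observed and expected counts differ most, which is another way to see where the association lies.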
Question 4. Do you reject or fail to reject \(H_0\)? Write 2 sentences: state your decision (with the \(\chi^2\) statistic and p-value), then describe which racial group has the highest/lowest college rate.
Your answer: