
Sociology 106: Quantitative Sociological Methods
February 24, 2026
HW 3: Great job overall. Two main comments:
filter(!is.na(degree), !is.na(marital)) |>
Main idea: Today is the bridge from probability language to inferential statistics.
dbinom(), pbinom(), pnorm(), qnorm()
Imagine we’re conducting a survey about voter turnout. Each person either voted (1) or didn’t vote (0).
From prior research — like census records or previous surveys — we know that about 60% of eligible voters in this population actually vote. So we’ll set the probability of voting at \(\pi = 0.6\).
Today we’ll ask three questions about this scenario, and each one leads to a different probability distribution:
| Question | Distribution | Sample size |
|---|---|---|
| Did this one person vote? | Bernoulli | 1 person |
| Out of 10 people surveyed, how many voted? | Binomial | 10 people |
| Out of 100 people surveyed, roughly how many voted? | Normal (approximation) | 100 people |
These three distributions are connected — each is the same voting question at a different scale:
Same event, different scale: one person → a small sample → a large sample. We’ll build up to this step by step.
Connect sample spaces and probability to data:
A random variable assigns a number to each outcome of a random process
A probability distribution tells us how likely each value is
R uses a consistent naming pattern for working with probability distributions. Once you learn the prefixes, you can apply them to any distribution:
| Prefix | What it does | Question it answers | Example |
|---|---|---|---|
| d | density / probability at a point | “Probability of exactly k successes?” | dbinom(6, size = 10, prob = 0.6) |
| p (Binomial) | cumulative count probability | “Probability of at most k successes in n trials?” | pbinom(6, size = 10, prob = 0.6) |
| p (Normal) | cumulative area probability | “Proportion of values at or below x?” | pnorm(65, mean = 71, sd = 8) |
| q | quantile (inverse of p) | “What value has this percentile?” | qnorm(0.95, mean = 60, sd = 4.9) |
The suffix tells you the distribution: binom for Binomial, norm for Normal.
We’ll use each of these as we go. Keep this pattern in mind!
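A small sketch of how the prefixes relate, using only base R and the same numbers as the table above (the variable names are illustrative, not from the course materials): the p function accumulates the d values, and q reverses p.

```r
# p accumulates d: P(Y <= 6) is the sum of P(Y = 0), ..., P(Y = 6)
sum_of_d <- sum(dbinom(0:6, size = 10, prob = 0.6))
from_p   <- pbinom(6, size = 10, prob = 0.6)
all.equal(sum_of_d, from_p)

# q reverses p: find the 95th-percentile value, then feed it back into pnorm()
cutoff <- qnorm(0.95, mean = 60, sd = 4.9)
pnorm(cutoff, mean = 60, sd = 4.9)
```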
The simplest probability distribution: one trial, two outcomes
In our survey, each person’s response is either 0 (didn’t vote) or 1 (voted). Since we know from prior research that 60% of this population votes:
\[P(X=1)=0.6, \quad P(X=0)=0.4\]
In general, for any binary outcome with probability \(\pi\):
\[P(X=1)=\pi, \quad P(X=0)=1-\pi\]
This is the Bernoulli model. The parameter \(\pi\) is just the probability that the event occurs.
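One way to see the Bernoulli model in R is to treat it as a Binomial with a single trial (size = 1). The sketch below assumes that framing; the seed and the name `responses` are arbitrary choices for illustration.

```r
set.seed(106)  # arbitrary seed so the draws are reproducible

# Model probabilities for X = 0 and X = 1 when pi = 0.6
dbinom(c(0, 1), size = 1, prob = 0.6)

# Ten simulated survey responses: 1 = voted, 0 = didn't vote
responses <- rbinom(10, size = 1, prob = 0.6)
responses
```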

Just as we computed summary statistics from data in Weeks 3–4, probability models have their own summary measures. These tell us what to expect before we collect data:
The long-run average of this random variable is just the probability itself:
\[E[X] = \pi = 0.6\]
If we survey many people one at a time, the average of all their 0/1 responses will be close to 0.6 — the proportion who voted.
This is why estimating a proportion is estimating a mean.
How much do individual outcomes vary around that average?
\[\text{Var}(X) = \pi(1-\pi) = 0.6 \times 0.4 = 0.24\]
\[\text{SD}(X) = \sqrt{\pi(1-\pi)} = \sqrt{0.24} \approx 0.49\]
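These formulas can be checked by simulation. The following is a rough sketch, assuming 100,000 simulated respondents and an arbitrary seed; the simulated values will be close to, but not exactly equal to, the model values.

```r
set.seed(106)  # arbitrary seed for reproducibility

pi_vote <- 0.6
x <- rbinom(100000, size = 1, prob = pi_vote)  # 100,000 simulated 0/1 responses

mean(x)  # should land near pi = 0.6
var(x)   # should land near pi * (1 - pi) = 0.24
sd(x)    # should land near sqrt(0.24), about 0.49
```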
| | From data (what we compute) | From the model (what we assume) | Voting example |
|---|---|---|---|
| Center | Sample proportion: \(\hat{p}\) | \(E[X]=\pi\) | \(\hat{p}\) vs. \(\pi = 0.6\) |
| Spread | Sample variance: \(s^2\) | \(\pi(1-\pi)\) | \(s^2\) vs. \(0.6 \times 0.4 = 0.24\) |
We compute \(\hat{p}\) from our data (the fraction of respondents who said they voted). The model says the true probability is \(\pi = 0.6\).
The goal of statistical inference: use the sample statistic (\(\hat{p}\)) to learn about the model parameter (\(\pi\)).
Moving from one person to many: what happens when we add up Bernoulli outcomes?
Now suppose we survey 10 people. Each person independently voted with probability 0.6.
What’s the probability that exactly 6 of the 10 voted? Or 4? Or 8?
We’re adding up 10 independent Bernoulli outcomes. The total, \(Y\), follows a Binomial distribution:
\[Y \sim \text{Binomial}(n=10, \; \pi=0.6)\]
The chart shows the probability of every possible outcome from 0 to 10 voters. The tallest bar (k = 6) is the single most likely result.
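The bar heights in a chart like this can be computed directly. Here is a minimal sketch using dbinom() on every possible count; the column names in the data frame are illustrative.

```r
k <- 0:10
probs <- dbinom(k, size = 10, prob = 0.6)

# One row per possible number of voters
data.frame(voters = k, probability = round(probs, 4))

k[which.max(probs)]  # the most likely count
sum(probs)           # the probabilities add up to 1
```

Passing `probs` to base R's barplot() would draw a bar chart of these probabilities.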

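The Binomial model requires four things to be true. Here’s what each means for our voting example:
Two possible outcomes: each person either voted (1) or didn’t vote (0).
Fixed number of trials: we survey exactly 10 people.
Independent trials: one person’s response doesn’t affect another’s.
Same probability on every trial: each person votes with probability \(\pi = 0.6\).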
In real social data, independence is the assumption most likely to fail — people in the same household, neighborhood, or social network influence each other.
For our sample of 10 people with \(\pi = 0.6\):
Expected number of voters:
\[E[Y]=n\pi = 10 \times 0.6 = 6 \text{ voters}\]
Standard deviation (how much the count varies from sample to sample):
\[\text{SD}(Y)=\sqrt{n\pi(1-\pi)} = \sqrt{10 \times 0.6 \times 0.4} \approx 1.55\]
We’d expect about 6 voters, give or take about 1.5. In most samples, we’d see somewhere between 4 and 8 voters.
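The same arithmetic in R, as a short sketch (variable names are illustrative). The last lines compute the exact probability behind the "between 4 and 8" statement.

```r
n <- 10
pi_vote <- 0.6

expected_voters <- n * pi_vote                  # 6
sd_voters <- sqrt(n * pi_vote * (1 - pi_vote))  # about 1.55

# Exact probability that the count lands between 4 and 8, inclusive
p_4_to_8 <- pbinom(8, size = n, prob = pi_vote) - pbinom(3, size = n, prob = pi_vote)
p_4_to_8  # roughly 0.90
```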
Question: What’s the probability that exactly 6 of our 10 people voted?
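We need two ingredients:
The number of ways to choose which 6 of the 10 people voted: \(\binom{10}{6} = 210\).
The probability of any one such arrangement: \(0.6^6 \times 0.4^4\).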
Putting it together:
\[P(Y=6) = \underbrace{210}_{\text{arrangements}} \times \underbrace{0.6^6 \times 0.4^4}_{\text{probability of each}} = 0.2508\]
In R, dbinom() does this calculation for us:
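A sketch of that call, alongside the by-hand version for comparison (choose() is base R's "n choose k"; the variable names are illustrative):

```r
# By hand: number of arrangements times the probability of each arrangement
by_hand <- choose(10, 6) * 0.6^6 * 0.4^4

# The same calculation with dbinom()
with_dbinom <- dbinom(6, size = 10, prob = 0.6)

round(c(by_hand = by_hand, dbinom = with_dbinom), 4)  # both 0.2508
```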
Often we want to know the probability of getting at most some number of successes. This is a cumulative probability: \(P(Y \le k)\).
Example: What’s the probability that at most 4 of our 10 people voted?
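One way to answer this in R is with pbinom(), sketched below; the second line confirms the result by adding up the exact probabilities.

```r
# P(Y <= 4) with n = 10 and pi = 0.6
p_at_most_4 <- pbinom(4, size = 10, prob = 0.6)
p_at_most_4  # about 0.166

# Same answer by adding up the exact probabilities for 0, 1, 2, 3, 4
sum(dbinom(0:4, size = 10, prob = 0.6))
```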
What about “at least” 7 voters? Use the complement: \(P(Y \ge k) = 1 - P(Y \le k-1)\)
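A sketch of the complement rule for "at least 7 of 10", with an equivalent call that uses pbinom()'s lower.tail argument:

```r
# P(Y >= 7) = 1 - P(Y <= 6)
p_at_least_7 <- 1 - pbinom(6, size = 10, prob = 0.6)
p_at_least_7  # about 0.382

# Equivalent: lower.tail = FALSE returns P(Y > 6), which is the same event
pbinom(6, size = 10, prob = 0.6, lower.tail = FALSE)
```

The most common slip here is subtracting pbinom(7, ...) instead of pbinom(6, ...), which would leave out the case of exactly 7.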
Let’s work through some scenarios that are similar to HW questions.
Lisa makes free throws 75% of the time and shoots 8 free throws at practice. What is the probability she makes at least 7?
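A possible solution sketch, assuming each free throw is independent with the same 75% chance (the variable name is illustrative):

```r
# P(Y >= 7) = 1 - P(Y <= 6), with n = 8 and pi = 0.75
p_lisa_7_plus <- 1 - pbinom(6, size = 8, prob = 0.75)
p_lisa_7_plus  # about 0.367

# Check: add the exact probabilities of making 7 and of making 8
sum(dbinom(7:8, size = 8, prob = 0.75))
```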
Same scenario: Lisa, n = 8, π = 0.75. What is the probability she makes between 4 and 6 free throws?
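A possible solution sketch, assuming "between 4 and 6" includes both endpoints (4, 5, or 6 makes). If the question meant to exclude the endpoints, the cutoffs would change.

```r
# P(4 <= Y <= 6) = P(Y <= 6) - P(Y <= 3)
p_lisa_4_to_6 <- pbinom(6, size = 8, prob = 0.75) - pbinom(3, size = 8, prob = 0.75)
p_lisa_4_to_6  # about 0.61
```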
| | From data (what we compute) | From the model (what we assume) | Voting example (\(n = 10\)) |
|---|---|---|---|
| Center | Sample count of voters | \(E[Y] = n\pi\) | observed count vs. \(10 \times 0.6 = 6\) |
| Spread | Sample SD of counts | \(\text{SD}(Y) = \sqrt{n\pi(1-\pi)}\) | sample SD vs. \(\approx 1.55\) |
If we repeated our survey of 10 people many times, the average number of voters across all surveys would be close to 6, and the counts would typically range from about 4 to 8.
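The "repeat the survey many times" idea can be sketched with a simulation. The number of repetitions (10,000) and the seed are arbitrary choices for illustration.

```r
set.seed(106)  # arbitrary seed for reproducibility

# 10,000 repeated surveys, each recording how many of 10 people voted
counts <- rbinom(10000, size = 10, prob = 0.6)

mean(counts)                     # should land near 6
sd(counts)                       # should land near 1.55
mean(counts >= 4 & counts <= 8)  # share of surveys with 4 to 8 voters
```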
As sample size grows, the Binomial starts to look like a smooth bell curve
What if we surveyed 100 people instead of 10? As \(n\) grows, the Binomial distribution becomes smoother and starts to resemble a bell-shaped curve:
This is an early step toward central limit thinking: sums of many random components often look approximately Normal.
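For \(n = 100\) voters with \(\pi = 0.6\), the Binomial has:
Mean: \(n\pi = 100 \times 0.6 = 60\)
SD: \(\sqrt{n\pi(1-\pi)} = \sqrt{100 \times 0.6 \times 0.4} \approx 4.9\)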
What does SD = 4.9 mean? In most surveys of 100 people, the number of voters will be within about 5 of 60 — typically between 55 and 65.
Instead of computing exact Binomial probabilities across 101 values, we approximate with a Normal distribution with mean 60 and SD 4.9: \(N(\mu = 60, \; \sigma = 4.9)\).
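To get a feel for how good the approximation is, the sketch below compares one exact Binomial probability with its Normal counterpart. The cutoff of 65 is an arbitrary choice for illustration.

```r
n <- 100
pi_vote <- 0.6
mu <- n * pi_vote                           # 60
sigma <- sqrt(n * pi_vote * (1 - pi_vote))  # about 4.9

exact  <- pbinom(65, size = n, prob = pi_vote)  # exact Binomial P(Y <= 65)
approx <- pnorm(65, mean = mu, sd = sigma)      # Normal approximation

c(exact = exact, approx = approx)
```

The two values are close but not identical, which is what "approximation" means here.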

For continuous models, probabilities come from areas under the curve.
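The 68-95-99.7 rule: in a Normal distribution, about 68% of values fall within 1 SD of the mean, about 95% within 2 SDs, and about 99.7% within 3 SDs. This rule gives us a quick way to judge whether an observed result is “typical” or “unusual.”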
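The 68-95-99.7 rule can be checked with pnorm(). This sketch uses the standard Normal (mean 0, SD 1), which are pnorm()'s defaults; the variable names are illustrative.

```r
# Area within 1, 2, and 3 SDs of the mean for a standard Normal
within_1 <- pnorm(1) - pnorm(-1)
within_2 <- pnorm(2) - pnorm(-2)
within_3 <- pnorm(3) - pnorm(-3)

round(c(within_1, within_2, within_3), 3)  # 0.683 0.954 0.997
```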

| | From data (what we compute) | From the model (what we assume) | Voting example (\(n = 100\)) |
|---|---|---|---|
| Center | Sample count of voters | \(\mu = n\pi\) | observed count vs. \(100 \times 0.6 = 60\) |
| Spread | Sample SD | \(\sigma = \sqrt{n\pi(1-\pi)}\) | sample SD vs. \(\approx 4.9\) |
If we repeated our survey of 100 people many times, the counts would center around 60 and mostly fall between about 50 and 70 (within 2 SDs).
A z-score tells you how many standard deviations a value is from the mean:
\[z = \frac{\text{observed} - \text{mean}}{\text{SD}}\]
Example: We observed 70 voters out of 100. How unusual is that?
\[z = \frac{70 - 60}{4.9} \approx 2.04\]
That’s about 2 SDs above the mean. By the 68-95-99.7 rule, only about 2.5% of samples would show 70+ voters.
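A sketch of the same calculation in R. It converts the z-score to an upper-tail area with pnorm(), which gives a slightly more precise figure than the rule of thumb.

```r
z <- (70 - 60) / 4.9
z  # about 2.04

# Upper-tail area beyond z on the standard Normal curve
p_upper <- 1 - pnorm(z)
p_upper  # about 0.02, close to the rule-of-thumb 2.5%
```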

What’s the probability of observing between 55 and 65 voters?
What’s the probability of observing at most 50 voters?
What’s the probability of observing more than 70 voters?
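Possible answers to these three questions, sketched with pnorm() under the Normal approximation with mean 60 and SD 4.9 (variable names are illustrative):

```r
mu <- 60
sigma <- 4.9

# Between 55 and 65 voters
p_55_to_65 <- pnorm(65, mean = mu, sd = sigma) - pnorm(55, mean = mu, sd = sigma)

# At most 50 voters
p_at_most_50 <- pnorm(50, mean = mu, sd = sigma)

# More than 70 voters
p_over_70 <- 1 - pnorm(70, mean = mu, sd = sigma)

round(c(p_55_to_65, p_at_most_50, p_over_70), 3)
```

The last two answers match because 50 and 70 are the same distance from the mean and the Normal curve is symmetric.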
These examples are similar to what you’ll see on HW #5. Let’s work through them together.
Andre’s height is 75 inches. Heights in his population are Normal with mean 69 inches and SD 3 inches. Calculate his z-score.
\[z = \frac{75 - 69}{3} = 2.0\]
Andre is 2 standard deviations above the mean. By the 68-95-99.7 rule, he is taller than about 97.5% of the population — only 2.5% of people are taller.
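The same steps in R, as a brief sketch; pnorm() gives the exact area that the rule of thumb rounds to 97.5%.

```r
z_andre <- (75 - 69) / 3
z_andre  # 2

# Share of the population shorter than Andre
p_shorter <- pnorm(75, mean = 69, sd = 3)
p_shorter  # about 0.977
```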

Vehicle speeds on I-5 are approximately Normal with mean 71 mph and SD 8 mph. The speed limit is 65 mph. What proportion of vehicles are at or below the speed limit?
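A possible solution sketch: "at or below" is a cumulative probability, so pnorm() answers it directly.

```r
# P(speed <= 65) when speeds are Normal with mean 71 and SD 8
p_at_or_below_limit <- pnorm(65, mean = 71, sd = 8)
p_at_or_below_limit  # about 0.227
```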
SAT scores are approximately Normal with mean 1026 and SD 209. The NCAA requires a score above 820 to compete. What proportion of students qualify?
Using the same SAT distribution, what proportion score between 720 and 820 (partial qualifiers)?
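Possible solution sketches for both SAT questions (variable names are illustrative). "Above 820" is an upper-tail area, so it uses the complement; "between 720 and 820" subtracts one cumulative area from another.

```r
sat_mean <- 1026
sat_sd <- 209

# Qualifiers: P(score > 820)
p_qualify <- 1 - pnorm(820, mean = sat_mean, sd = sat_sd)

# Partial qualifiers: P(720 <= score <= 820)
p_partial <- pnorm(820, mean = sat_mean, sd = sat_sd) -
  pnorm(720, mean = sat_mean, sd = sat_sd)

round(c(qualify = p_qualify, partial = p_partial), 3)
```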
| Statistical question | R function |
|---|---|
| Exact Binomial probability \(P(Y=k)\) | dbinom(k, size = n, prob = pi) |
| Cumulative Binomial probability \(P(Y\le k)\) | pbinom(k, size = n, prob = pi) |
| Normal cumulative probability \(P(Y\le x)\) | pnorm(x, mean = mu, sd = sigma) |
| Normal percentile / quantile | qnorm(q, mean = mu, sd = sigma) |
A Bernoulli model describes one yes/no outcome with parameter \(\pi\); summing many Bernoulli outcomes gives a Binomial model, and as trials grow large, that Binomial model is often well approximated by a Normal distribution.
This connects:
One trial → many trials → smooth approximation, all with \(\pi = 0.6\):
Same event, three levels of analysis. This is the logic that powers the rest of the course.
One trial (Bernoulli) → many trials (Binomial) → smooth large-sample pattern (Normal)
Today is not isolated content. It is core infrastructure for the rest of SOC 106.
Due: Thursday, March 5 by 11:59 PM
Format: Similar to last week’s weekly assignment: mostly a problem set
Important: You will use a little bit of R, so please submit your assignment (and your code!) on bCourses
A two-page double-spaced proposal for your final paper is due on bCourses by Thursday, February 26 at 11:59 PM. Here’s an example.
Your proposal should include:
Note: You do not need to discuss statistical techniques at this point.
Let’s practice some of what we learned today:
Download lab3.qmd from bCourses under “assignments” > “Lab #3”.
Save lab3.qmd in your labs folder.
Use the Explorer button on the left to find and open lab3.qmd.