Sociology 106: Quantitative Sociological Methods
February 3, 2026
Weekly assignments
HW #1: due Tuesday, February 10, 11:59 PM (extended deadline; any questions?)
HW #2: due Thursday, February 12, 11:59 PM
Let’s jump back into Lab #1. Use either the Source or the Visual editor to interact with the script.

We want data that will help us say something about sociological theories and social processes.
This encompasses a wide variety of data!
Not sure if your data source/topic is appropriate?
Feel free to ask!
Online repositories contain multiple datasets, and you can generally search for datasets related to your topic of interest
Some widely used repositories:
| Repository | Focus | URL |
|---|---|---|
| SDA Berkeley | Opinion data (GSS, ANES, racial attitudes) | sda.berkeley.edu |
| ICPSR | Comprehensive social science | icpsr.umich.edu |
| ARDA | Religion, health, public opinion | thearda.com |
| IPUMS | Census, CPS, health, time use | ipums.org |
| Roper Center | Public opinion polls | ropercenter.cornell.edu |
| CEIC | Economic data | ceicdata.com |
| Google Dataset Search | General search tool | datasetsearch.research.google.com |

Open government data portals:

| Level | Focus | URL |
|---|---|---|
| Federal | Health, education, climate, agriculture | data.gov |
| California | State demographics, economy, environment | data.ca.gov |
| San Francisco | City services, transportation, housing | datasf.org |
| Berkeley | Local permits, public works, demographics | data.cityofberkeley.info |
The most important first step in analyzing data is to take a close look at the data itself.
An easy way to do this is to visualize it with a figure.
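Before plotting, a quick numeric look at the data is also useful. The sketch below is a minimal example that assumes the ggplot2 package is installed and uses its built-in mpg dataset (car fuel economy) as a stand-in for your own data; the lab's dataset is not shown here.

```r
# Assumes ggplot2 is installed; its built-in `mpg` dataset stands in
# for your own data.
library(ggplot2)

str(mpg)            # column names, types, and the first few values
head(mpg, n = 5)    # the first five rows
summary(mpg$hwy)    # min, quartiles, mean, and max of highway mpg
table(mpg$drv)      # number of cars in each drive-type category
```

Reading the output of str() is a quick way to spot which columns R treats as numbers and which as text, which matters for choosing a plot.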
There are four common levels of measurement we use in data analysis:
| Level | Key Property | Examples | Type |
|---|---|---|---|
| Ratio | True zero point | Age, income, years of education | Continuous |
| Interval | Equal distances, no true zero | IQ, SAT scores, SES | Continuous |
| Ordinal | Ordered categories | Letter grades, liberal-conservative | Categorical |
| Nominal | Unordered categories | Gender, religion, political party | Categorical |
Why this matters: How we visualize a variable depends on its level of measurement
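R does not store "level of measurement" directly. A common convention (an assumption here, not something stated in the lab) is to store ratio and interval variables as numeric, ordinal variables as ordered factors, and nominal variables as plain factors. The toy data frame below is hypothetical, with made-up values for five respondents, purely to illustrate the types.

```r
# Hypothetical toy data: five made-up respondents, for illustration only.
survey <- data.frame(
  age   = c(23, 35, 41, 29, 52),                 # ratio: true zero
  sat   = c(1180, 1350, 1020, 1490, 1260),       # interval-style score
  grade = factor(c("B", "A", "C", "A", "B"),
                 levels = c("C", "B", "A"),
                 ordered = TRUE),                 # ordinal: ordered categories
  party = factor(c("Dem", "Rep", "Ind", "Dem", "Rep"))  # nominal: no order
)

str(survey)

# Ordered factors support comparisons such as ">"; unordered factors do not.
survey$grade > "B"
```

Getting these types right up front pays off later: ggplot2 treats numeric columns as continuous and factors as categorical when it builds axes and chooses defaults.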
ggplot2 follows the grammar of graphics: every plot is built up from three components (data, mapping, and layers). Let’s look at an example. Start by passing your data to ggplot(), which on its own draws an empty canvas.
Add aes() to connect columns to visual properties (x, y, color). Now we have axes.
Add geom_point() to display the data as points. Use + to add each layer.
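The three build-up steps can be sketched in code as follows. This assumes ggplot2 is installed and uses its built-in mpg dataset (engine displacement vs. highway mpg) as a stand-in; the lab's own example data may differ.

```r
library(ggplot2)

# 1. Data only: ggplot() sets up an empty canvas.
p_canvas <- ggplot(data = mpg)

# 2. Data + mapping: aes() ties columns to the x and y axes.
p_axes <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# 3. Data + mapping + a layer: geom_point() draws one point per row.
p_points <- p_axes + geom_point()

# Typing an object's name (e.g. p_points) at the console draws the plot.
```

Storing each step in its own object makes it easy to see what each component contributes: p_canvas is blank, p_axes has labeled axes but no marks, and p_points shows the data.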
Which data visualization should you use with which combinations of variables?
| # of Variables | Variable Type | Visualization |
|---|---|---|
| 1 | Categorical | Bar chart |
| 1 | Continuous | Histogram, Density plot, Boxplot |
| 2 | Continuous by Continuous | Line chart (over time) |
| 2 | Continuous by Continuous | Scatterplot |
| 2 | Continuous by Categorical | Boxplot |
For categorical (nominal or ordinal) variables, we can visualize their distribution in a dataset using a bar chart.
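A minimal bar-chart sketch with geom_bar(), assuming ggplot2 is installed and using the nominal variable class (vehicle type) from ggplot2's built-in mpg dataset as a stand-in.

```r
library(ggplot2)

# `class` (vehicle type) in mpg is a nominal variable.
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()   # counts the rows in each category, so no y aesthetic is needed
```

For an ordinal variable, storing it as a factor with its levels in order (for example factor(x, levels = c("low", "medium", "high"))) makes the bars appear in that order rather than alphabetically.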
Use the geom_bar() function.

For continuous (interval or ratio) variables, we can visualize their distribution in a dataset using a histogram.
Use the geom_histogram() function. R automatically ‘bins’ a continuous variable into groups and then plots the frequency of each bin.
By default, geom_histogram() uses 30 bins.
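A minimal sketch of the default behavior, assuming ggplot2 is installed and using waiting (minutes between Old Faithful eruptions) from base R's faithful dataset as a stand-in continuous variable.

```r
library(ggplot2)

# `faithful` (base R datasets package): `waiting` is a continuous variable.
p_hist <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram()   # no bins or binwidth given, so ggplot2 uses 30 bins
```

When the plot is built, ggplot2 prints a message noting that it is using bins = 30, which is a reminder to pick a value that suits your data.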
You can override geom_histogram()’s default by setting the bins argument. Here we set bins = 100.
Alternatively, you can set the width of each bin directly with the binwidth argument; wider bins each hold more observations. Here we set binwidth = 10, so each bin spans 10 units of the variable.
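The two overrides can be compared side by side. This sketch assumes ggplot2 is installed and again uses waiting from base R's faithful dataset as a stand-in; the lab's variable and its units may differ.

```r
library(ggplot2)

p_base <- ggplot(faithful, aes(x = waiting))

# Fix the number of bins: 100 narrow bins.
p_100_bins <- p_base + geom_histogram(bins = 100)

# Fix the width of each bin instead: every bin spans 10 minutes.
p_width_10 <- p_base + geom_histogram(binwidth = 10)
```

Setting bins controls how many bars appear regardless of the variable's range, while binwidth ties each bar to a fixed number of units, which is often easier to interpret (here, 10-minute intervals).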
A line chart is similar to a bar chart, except that the tops of the bars are replaced by points joined by lines.
It plots your Y variable over your X variable. Use the geom_line() function.
Here’s a look at unemployment rates over time. Line charts also work when we have average values of Y for ordered values of X.
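A minimal geom_line() sketch, assuming ggplot2 is installed. It uses ggplot2's built-in economics dataset as a stand-in; note that its unemploy column is the number of unemployed people in thousands, a count rather than the rate shown in the lecture. The second plot illustrates the "average Y for ordered X" case by averaging within each calendar year.

```r
library(ggplot2)

# Monthly US data shipped with ggplot2; `unemploy` is a count in thousands.
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Average Y for ordered values of X: mean of the monthly values per year.
econ <- economics
econ$year <- as.integer(format(econ$date, "%Y"))
yearly <- aggregate(unemploy ~ year, data = econ, FUN = mean)

p_yearly <- ggplot(yearly, aes(x = year, y = unemploy)) +
  geom_line()
```

geom_line() connects points in order of the x variable, which is why it suits time and other ordered values of X.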
A scatterplot is a two-dimensional rectangular plot for visualizing the relationship between two continuous variables.
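A minimal scatterplot sketch with geom_point(), assuming ggplot2 is installed and using city vs. highway fuel economy from its built-in mpg dataset as a stand-in. Mapping drv (drive type) to color is an optional extra that shows how aes() can bring in a third, categorical variable.

```r
library(ggplot2)

# Two continuous variables on x and y; `drv` (categorical) mapped to color.
p_scatter <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point()
```

Dropping color = drv gives a plain two-variable scatterplot.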
Use the geom_point() function.

The boxplot compactly displays the distribution of a continuous variable, usually broken down by a categorical one. It visualizes five summary statistics (the median, the 25th and 75th percentiles that form the box, and the two whiskers) and plots all “outlying” points individually.
It is most useful for seeing the entire distribution of a continuous variable across the categories of a categorical one.
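A minimal boxplot sketch with geom_boxplot(), assuming ggplot2 is installed and using highway mpg (continuous) by drive type (categorical) from its built-in mpg dataset as a stand-in. By ggplot2's default, the whiskers reach the most extreme values within 1.5 times the interquartile range of the box, and points beyond that are drawn individually.

```r
library(ggplot2)

# One box per drive type: median line, 25th/75th percentile box, whiskers.
p_box <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()
```

Putting the categorical variable on x and the continuous one on y lines the boxes up side by side for comparison.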
Use the geom_boxplot() function.

Each week you’ll be given a hw#.qmd file to use as a template for your homework assignment. It will be available on bCourses in the “Assignments” folder for that week.
Some weeks you’ll just answer specific questions, but in weeks that require you to use your own data, you’ll approach the assignment like a research paper, which you can build on for your final paper.
Let’s look at an example to better understand what I’m asking.
Assignment:
Using a dataset of your choice, choose a research question and construct a plot that helps answer it.
A two-page double-spaced proposal for your final paper is due on bCourses by Thursday, February 26 at 11:59 PM. Here’s an example.
Your proposal should include:
Note: You do not need to discuss statistical techniques at this point.