Week 2

Sociology 106: Quantitative Sociological Methods

January 27, 2026

Housekeeping

Office hours this week

Tuesdays, 11:30 AM-1:00 PM, 444 Social Science Building
Tuesdays, 4:00 PM-4:30 PM, 444 Social Science Building
- Sign up at https://www.wejoinin.com/kaseyzapatka@berkeley.edu
If you can’t make either, e-mail me and we’ll find a time

HW #1: due Thursday, February 5 by 11:59 PM

Learning Goals

Gain some familiarity and basic proficiency with R and Positron
Understand core data concepts: variable types, logical operators, and common data structures
Understand what the Tidyverse is and how it differs from base R
Import, manipulate, and export data using fundamental R commands
Prepare and submit homework assignments using Quarto

Agenda

Introduction to Positron IDE
Working with Quarto (HW #0): setting up your folder structure and practicing with .qmd files
R basics: variables, data types, and logical operators
Data structures and file paths: vectors and dataframes
The Tidyverse and key functions
In-class Lab: Practicing with the Tidyverse

What is Positron?

Positron is an Integrated Development Environment (IDE): essentially a wrapper that makes working with a programming language more user-friendly.

An IDE provides:

A simpler interface for running code
A file browser
Debugging tools
Version control (Git)

About Positron:

Open-source (free to use)
Next-generation descendant of RStudio
Developed by Posit (creators of RStudio)
Designed for R and Python

Why Positron?

Bilingual: Work with R and Python in the same environment
Modern: Streamlined workflow for data science
Easy extensions: Add extensions for formatting, Git, Quarto, and more

Learn more: Andrew Heiss’ blog

Positron

Script pane: Left Top
Console pane: Left Bottom
Variables pane: Right top
Plot pane: Right bottom

Positron

Let’s explore Positron together on your laptop:

File explorer
Console
Interpreter and Folder
New
Open

Why Quarto?

Quarto (.qmd) files let you combine code, output, and written analysis in one document. Check out Quarto’s website for more!

Why I want you to use them for homework:

Show your work: Code and results appear together—I can see exactly what you did
Explain your thinking: Add text to interpret results and answer questions
Reproducible: I can re-run your code to verify it works
Professional skill: This is how data scientists share analysis in the real world

Anatomy of a `.qmd` File

A .qmd file has three main parts and knits code + text together (unlike .R which runs code only):

1. YAML Header (metadata)

---
title: "My Analysis"
author: "Your Name"
format: html
---

2. Markdown Text (narrative)

Use **bold**, *italics*, # Headings to organize your script and write analyses in between code.

More markdown basics

3. Code Chunks (analysis)

Insert R (or Python) code directly in your analyses.

```{r}
library(tidyverse)               # provides read_csv()
data <- read_csv("data.csv")     # import the data
summary(data)                    # print summary statistics
```

Rendering Quarto Documents

What is Rendering?

Rendering (or “knitting”) converts your .qmd file into a finished document (HTML, PDF, or Word) by:

Running all your R code chunks
Combining code output with your text
Formatting everything into your chosen output format

How to Render

Click the “Preview” button at the top of your .qmd file
Turn on “Render on Save” so the preview updates every time you save
I usually work with the preview open

Rendering Quarto Documents

Source vs Visual Mode

Positron provides two ways to edit .qmd files. Toggle between them using the buttons at the top-left of your editor.

Source
Visual

Source, which shows the actual source code:

Shows raw markdown syntax
See the actual code: **bold**, ## Heading
More control over formatting
Better for experienced users

Visual, which looks a lot like Jupyter notebooks and lets you edit code and text more interactively:

WYSIWYG editor (What You See Is What You Get)
Formatting toolbar (like Microsoft Word)
Live preview of how text will look
Easier for beginners

Rendering Quarto Documents

Which Should You Use?

Visual Mode: Great when starting out or writing lots of text
Source Mode: Better for precise control and learning markdown

Switch between them! Use Visual for writing, Source for troubleshooting. Just be sure to save before switching between them.

Rendering Quarto Documents

Tips for making it work

Save your .qmd file before rendering
All code must run without errors to render successfully
Check the “Render” tab for error messages if rendering fails

Most common “knitting” errors:

Quarto can’t execute code
Code chunks have the same name

You should be fine as long as you keep things simple and leave the Quarto settings alone

Practicing Quarto with HW #0

Set up your folder:

Download hw0.qmd, attain.csv, and _quarto.yml from bCourses under “Assignments” > “HW #0”
Create soc106 folder on your desktop
Create data and assignments folders inside
Place files as shown:

soc106/
├── _quarto.yml
├── data/
│   └── attain.csv
└── assignments/
    └── hw0.qmd

Open in Positron:

Click Open Folder → navigate to soc106
Trust the authors → click “Yes”
Use Explorer button on left to find hw0.qmd

Practicing Quarto with HW #0

Edit and render:

Uncomment lines 50 and 51 and click Run Cell
Ensure packages downloaded correctly
Replace “Karl Marx” with your name
Check Render on Save
Save and hit “Preview”
SUCCESS if you see your name!
Upload .qmd and .html to bCourses

Writing with `script.R`

Best practice to write all code in a script file

Reproducible code
Documents exactly what you have done
Allows you to easily correct mistakes

Two types of files we’ll work with

R files (.R): plain-text files that contain only R code.
Quarto files (.qmd): allow you to intersperse R or Python code (other languages too!) with text. They are great for putting text next to analysis.

Writing with `script.R`

Use a hashtag (#) at the start of lines in your script to tell R not to run that line

Useful for making comments about what your code is doing:
Helps you remember what you did
Helps others (like me, reading your homework) see what you were doing

Check out this example:

# load the tidyverse, which provides read_csv()
library(tidyverse)

# load data
census_data <- read_csv("data/acs2021.csv")

Variable assignment in `script.R`

R is an object-oriented programming language, meaning you can create new objects from any type of value, store them in R’s memory, and call them when necessary

Define variables using the assignment operator: <-
Give variables meaningful names when assigning
Variable names must be unique—reusing a name will overwrite it

# load data
acs <- read_csv("data/acs2021.csv")

# reusing the name acs overwrites the 2021 data with the 2022 data
acs <- read_csv("data/acs2022.csv")
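
To see overwriting in action without loading any files, here is a minimal sketch using plain numbers. The variable names (`class_size` and friends) are made up for illustration.

```r
# assign a value to a new variable
class_size <- 30
class_size            # [1] 30

# reusing the same name replaces the old value
class_size <- 45
class_size            # [1] 45 (the value 30 is gone)

# distinct names keep both values available
class_size_fall <- 30
class_size_spring <- 45
total_size <- class_size_fall + class_size_spring
total_size            # [1] 75
```

If you need both versions of something (for example, two years of data), give each its own name.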

Variable types

Type	Description	Examples
Numeric	Real numbers; supports math operations (`+`, `-`, `*`, `/`)	`3`, `4.5`, `8.98443`
Character	Text; always in quotation marks	`"Welcome!"`, `"foo"`
Boolean	Logical values; R treats them as 1 (`TRUE`) or 0 (`FALSE`) in arithmetic	`TRUE`, `FALSE`

Numeric

a <- 10
b <- 3
a + b

[1] 13

a * b

[1] 30

Character

first_name <- "John"
last_name <- "Doe"
paste(first_name, last_name)

[1] "John Doe"

Boolean

# R treats TRUE as 1, FALSE as 0
is_enrolled <- TRUE
has_laptop <- FALSE
sum(is_enrolled, has_laptop)

[1] 1
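
If you are ever unsure which type a value has, base R's `class()` function will tell you. The sketch below is illustrative only; note that R reports Boolean values as "logical".

```r
# class() reports the type of any value or variable
type_number  <- class(4.5)          # "numeric"
type_text    <- class("Welcome!")   # "character"
type_boolean <- class(TRUE)         # "logical"

# quotation marks turn a number into text
type_quoted <- class("4.5")         # "character"

# as.numeric() converts text that looks like a number back into a number
converted <- as.numeric("4.5") + 1  # 5.5

# TRUE counts as 1 in arithmetic
n_true <- TRUE + TRUE               # 2
```

Checking the type first is a quick way to diagnose errors such as trying to do math on a value that is actually stored as text.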

Logical operators

Use logical operators to compare two variables or objects
Useful for manipulating data

Operators

Operator	Meaning
`==`	Equal to
`!=`	Not equal to
`<`	Less than
`>`	Greater than
`<=`	Less than or equal to
`>=`	Greater than or equal to

Example

x <- 10
y <- 5

x == y

[1] FALSE

x != y

[1] TRUE

x > y

[1] TRUE

x < y

[1] FALSE

x >= 10

[1] TRUE
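
Comparisons can also be combined. The sketch below uses three further base R operators that are not in the table above (`&` for "and", `|` for "or", `!` for "not"); the variables `age` and `enrolled` are hypothetical.

```r
age <- 20
enrolled <- TRUE

# & means "and": TRUE only when both sides are TRUE
adult_and_enrolled <- age >= 18 & enrolled    # TRUE

# | means "or": TRUE when at least one side is TRUE
senior_or_enrolled <- age > 65 | enrolled     # TRUE

# ! means "not": flips TRUE to FALSE and FALSE to TRUE
not_enrolled <- !enrolled                     # FALSE

# comparisons also work on text, and they are case-sensitive
same_name <- "Alice" == "alice"               # FALSE
```

These combinations become useful later when choosing rows that meet more than one condition.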

Questions?

Data structures

Data structures are simply collections of the data objects we have been working with:

Numeric
Character
Boolean variables

While there are many types of data structures in R, we will focus on two for our purposes: the vector and the dataframe

Vectors

Vectors are ordered collections of the same type of objects

Ordered: position matters
Same type: e.g., a vector can’t hold both numeric and character values (R will convert them all to one type)

We create vectors using the c() function:

# Numeric vector
ages <- c(22, 25, 19, 30)

# Character vector
names <- c("Alice", "Bob", "Charlie", "Diana")

The c() function also combines already-existing vectors and variables (if they are of the same type)
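
A short sketch of combining existing vectors with `c()`, and of what happens when types are mixed. The vector names are invented for illustration.

```r
# combine two existing vectors into one
morning_ages <- c(22, 25)
evening_ages <- c(19, 30)
all_ages <- c(morning_ages, evening_ages)
all_ages            # [1] 22 25 19 30

# add a single value to an existing vector
all_ages <- c(all_ages, 41)
length(all_ages)    # [1] 5

# mixing types does not raise an error: R converts everything to text
mixed <- c(22, "Bob")
mixed               # [1] "22"  "Bob"
class(mixed)        # [1] "character"
```

Because the conversion happens silently, a single stray text value can turn a whole numeric vector into text, so it is worth checking with `class()` when math on a vector fails.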

Vectors

Use bracket notation to access individual elements: vector[i]

# create a vector of values
ages <- c(22, 25, 19, 30)

# Get the first element (indexing starts at 1)
ages[1]

[1] 22

# Get the third element
ages[3]

[1] 19

# Get multiple elements
ages[c(1, 3)]

[1] 22 19
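
Brackets also accept a comparison, which connects vectors back to the logical operators from earlier. This is a minimal sketch using the same `ages` vector; the result names are hypothetical.

```r
ages <- c(22, 25, 19, 30)

# a comparison on a vector is applied to every element
over_21 <- ages > 21
over_21                   # [1]  TRUE  TRUE FALSE  TRUE

# put the TRUE/FALSE result inside brackets to keep the matching elements
ages_over_21 <- ages[ages > 21]
ages_over_21              # [1] 22 25 30

# a negative index drops an element instead of selecting it
ages_without_first <- ages[-1]
ages_without_first        # [1] 25 19 30

# length() counts elements; sum() on a comparison counts the TRUEs
n_ages <- length(ages)        # 4
n_over_21 <- sum(ages > 21)   # 3
```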

Dataframes

A dataframe is a list of equal-length vectors. We’ll focus on tidy dataframes, which organize data so that:

Each column is a variable
Each row is an observation

# A tibble: 3 × 4
  Observation name      age enrolled
  <chr>       <chr>   <dbl> <lgl>   
1 Obs1        Alice      22 TRUE    
2 Obs2        Bob        25 TRUE    
3 Obs3        Charlie    19 FALSE
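
The slides do not show how this table was built. One way to reproduce it, assuming the `tibble` package (installed as part of the Tidyverse) and the object name `students` used on the next slide, is sketched below.

```r
library(tibble)

# each argument becomes one column; all columns must have the same length
students <- tibble(
  Observation = c("Obs1", "Obs2", "Obs3"),
  name        = c("Alice", "Bob", "Charlie"),
  age         = c(22, 25, 19),
  enrolled    = c(TRUE, TRUE, FALSE)
)

# printing the object displays the table
students

# pull out a single column as a vector with $
student_ages <- students$age   # 22 25 19

# count rows (observations) and columns (variables)
n_rows <- nrow(students)       # 3
n_cols <- ncol(students)       # 4
```

Each column here is just a vector like the ones from the previous slides, which is why every value within a column shares one type.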

Inspecting dataframes

glimpse(): shows every column with its type and first few values
head(): returns the first six rows by default
str(): returns the structure of the dataframe
colnames(): returns the column names

glimpse(students)

Rows: 3
Columns: 4
$ Observation <chr> "Obs1", "Obs2", "Obs3"
$ name        <chr> "Alice", "Bob", "Charlie"
$ age         <dbl> 22, 25, 19
$ enrolled    <lgl> TRUE, TRUE, FALSE

head(students)

# A tibble: 3 × 4
  Observation name      age enrolled
  <chr>       <chr>   <dbl> <lgl>   
1 Obs1        Alice      22 TRUE    
2 Obs2        Bob        25 TRUE    
3 Obs3        Charlie    19 FALSE

str(students)

tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
 $ Observation: chr [1:3] "Obs1" "Obs2" "Obs3"
 $ name       : chr [1:3] "Alice" "Bob" "Charlie"
 $ age        : num [1:3] 22 25 19
 $ enrolled   : logi [1:3] TRUE TRUE FALSE

colnames(students)

[1] "Observation" "name"        "age"         "enrolled"

File paths and the `here` package

To load data, R needs to know where to look—but file paths break across different computers/operating systems.

The here package solves this by building paths relative to your project root (home folder of the project).

The Problem

# These break on different computers:
read_csv("C:/Users/YourName/Documents/project/data/mydata.csv") # windows
read_csv("~/Desktop/project/data/mydata.csv")                   # mac

The Solution

library(here)

# here() finds your project root and builds paths from there
here()  # check where here thinks your project root is

# Loading data
# instead of: read_csv("~/Desktop/project/data/raw/mydata.csv")
data <- read_csv(here("data", "raw", "mydata.csv"))

# Saving data
write_csv(results, here("output", "results.csv"))
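
The example above only runs inside a project that actually contains those folders. The sketch below shows the same idea of building a path from pieces, then exporting and re-importing a file, in a form that runs anywhere. It is an illustration under stated assumptions: it swaps `here()` for base R's `file.path()`, uses a temporary folder as a stand-in for the project root, and the `results` data are made up.

```r
library(readr)

# stand-in for a project root: a temporary folder that exists on any computer
project_root <- tempdir()

# build the path piece by piece; file.path() joins the pieces with "/",
# which R accepts on every operating system
out_path <- file.path(project_root, "results.csv")

# a tiny dataframe to save (hypothetical data)
results <- data.frame(name = c("Alice", "Bob"), score = c(90, 85))

# export, then import the same file back
write_csv(results, out_path)
results_check <- read_csv(out_path, show_col_types = FALSE)

file.exists(out_path)   # [1] TRUE
nrow(results_check)     # [1] 2
```

In your own project you would keep using `here()`, since it finds the project root for you rather than requiring you to store it in a variable.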

The `Tidyverse`

What is it?

A collection of R packages designed for data science that share a common philosophy and grammar
One of the most widely used approaches to working with R
Designed to make data science code intuitive to read and write

Core Packages we’ll be using

readr: for importing data (read_csv())
dplyr: for manipulating data (filter(), select(), mutate(), summarize())
ggplot2: for data visualization

The `Tidyverse`

Key Features

Consistent syntax: functions work similarly across packages
Human-readable: code reads like sentences
Pipe operator |>: chains functions together for readable code

# Base R
mean(subset(dataframe, age > 18)$score)

# Tidyverse - "take dataframe, filter where age > 18,
# then pull score, then calculate mean"
dataframe |> filter(age > 18) |> pull(score) |> mean()
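
To run the comparison above yourself, you need a dataframe with `age` and `score` columns. The sketch below supplies a small made-up one (the values are hypothetical) and confirms that both styles give the same answer.

```r
library(dplyr)

# hypothetical data so the example runs on its own
dataframe <- data.frame(
  age   = c(17, 22, 30, 16),
  score = c(80, 90, 70, 60)
)

# base R: read from the inside out
base_result <- mean(subset(dataframe, age > 18)$score)

# pipe: read from left to right, one step per line
pipe_result <- dataframe |>
  filter(age > 18) |>
  pull(score) |>
  mean()

base_result                  # [1] 80
pipe_result                  # [1] 80
base_result == pipe_result   # [1] TRUE
```

The pipe passes the result on its left into the function on its right as the first argument, which is why each step can be written on its own line.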

The `Tidyverse`

How the packages work together

# Load data
data <- read_csv("data.csv") # read in csv

# Process
data |>
  filter(year == 2020) |>  # filter rows based on conditions
  select(name, score)  |>  # select specific columns
  arrange(desc(score)) |>  # sort rows by score, highest first
  glimpse()                # preview the result

Key functions

Function	Purpose	Example
`read_csv()`	Import CSV data	`read_csv("data.csv")`
`glimpse()`	Preview dataframe structure	`df |> glimpse()`
`colnames()`	Get column names (base R)	`colnames(df)`
`select()`	Choose columns	`df |> select(name, age)`
`filter()`	Choose rows by condition	`df |> filter(age > 20)`
`slice()`	Choose rows by position	`df |> slice(1:10)`
`arrange()`	Sort rows	`df |> arrange(desc(age))`
`mutate()`	Create/modify columns	`df |> mutate(age_2x = age * 2)`
`rename()`	Rename columns	`df |> rename(new = old)`
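
Several of the functions in the table can be chained in one pipeline. The sketch below uses a small made-up dataframe (the names and ages are hypothetical) so it runs without any file.

```r
library(dplyr)

# hypothetical data, built inline so no file is needed
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana"),
  age  = c(22, 25, 19, 30)
)

result <- df |>
  filter(age > 20) |>              # keep rows where age is above 20
  mutate(age_2x = age * 2) |>      # add a new column
  rename(first_name = name) |>     # new name on the left, old name on the right
  arrange(desc(age)) |>            # sort rows, oldest first
  slice(1:2) |>                    # keep the first two rows
  select(first_name, age_2x)       # keep only these columns

# two rows remain: Diana (age_2x = 60) and Bob (age_2x = 50)
result
```

Note that `df` itself is unchanged; the pipeline's output only persists because it was assigned to `result`.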

Questions?

Tidyverse in Practice (in-class Lab)

Let’s practice using the folder structure you already set up:

Download lab1.qmd from bCourses under “Assignments” > “Lab #1”
Create a labs folder inside your soc106 folder
Place lab1.qmd in the labs folder. Your folder structure should now look like this:

soc106/
├── _quarto.yml
├── data/
│   └── attain.csv
├── assignments/
│   └── hw0.qmd
└── labs/
    └── lab1.qmd

Use the Explorer button on the left to find and open lab1.qmd
Let’s work through it together!

Homework #1

For this week only, I will provide a dataset for you to use on your homework assignment

attain.csv: data from the General Social Survey
Find it in bCourses under Assignments → HW #1

Turn in your assignment on bCourses

HW #1: due Tuesday, February 10, 11:59 PM
Turn in BOTH hw1.qmd and hw1.html files
Intersperse code and written responses throughout the file

Appendix

Other data types

You can also load Excel files into R

First, load the readxl package:

library(readxl)

Then, use the read_excel function:

# load
data <- read_excel("file_name.xlsx")

In fact, R can import all types of data files

Googling “import SAS/STATA/SPSS files into R” will point you in the right direction

Some helpful R commands

?function_name brings up the help page for that function (e.g., ?mean)
??search_term searches all help pages for a term and will (sometimes) lead to helpful examples of how to use a function
ls() lists all objects in the global environment
rm(object_name) removes an object
rm(list=ls()) removes all objects
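
A quick sketch of these commands in use. The object names are hypothetical, and the help command is left as a comment because it is meant to be typed interactively in the Console.

```r
# create two objects (hypothetical names)
temp_value <- 42
keep_value <- "hello"

# ls() returns the names of the objects in your environment
temp_is_listed <- "temp_value" %in% ls()   # TRUE

# rm() deletes one object by name
rm(temp_value)
temp_still_exists <- exists("temp_value")  # FALSE
keep_still_exists <- exists("keep_value")  # TRUE

# to open the help page for mean(), type this in the Console:
# ?mean
```

Use `rm(list=ls())` with care: it clears everything at once, so any object not recreated by your script is lost.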

Want to know more R?

Lecture and lab are based on the first two RFundamentals classes, which are run at the D-Lab

D-Lab trainings: https://dlab.berkeley.edu/training
- If you want to see the RFundamentals class notes: https://github.com/dlab-berkeley/R-Fundamentals
- R Data Visualization workshop on Feb 11, 2026
Otherwise: Google is your friend!

`.R` vs `.qmd` Files

So to summarize .R and .qmd scripts:

Feature	`.R` Script	`.qmd` Document
Purpose	Run code	Create reports
Content	Code only	Code + text + output
Output	Console results	Formatted document (e.g., HTML/PDF/Word)
Comments	`# comment`	Full markdown formatting
Best for	Data cleaning, functions	Analysis reports, presentations, homework

Rendering Quarto Documents

Output Formats

For this course, make sure html is specified in your YAML header:

---
title: "My Analysis"
format: html        # or pdf, docx
---

Quarto can render to many different formats (e.g., pdf, docx, odt, epub, pptx)
