Week 2

Sociology 106: Quantitative Sociological Methods

January 27, 2026

Housekeeping

Office hours this week

HW #1: due Thursday, February 5 by 11:59 PM

Learning Goals

  1. Gain some familiarity and basic proficiency with R and Positron
  2. Understand core data concepts: variable types, logical operators, and common data structures
  3. Understand what the Tidyverse is and how it differs from base R
  4. Import, manipulate, and export data using fundamental R commands
  5. Prepare and submit homework assignments using Quarto

Agenda

  1. Introduction to Positron IDE
  2. Working with Quarto (HW #0): setting up your folder structure and practicing with .qmd files
  3. R basics: variables, data types, and logical operators
  4. Data structures and file paths: vectors and dataframes
  5. The Tidyverse and key functions
  6. In-class Lab: Practicing with the Tidyverse

What is Positron?

Positron is an Integrated Development Environment (IDE): essentially a wrapper that makes working with a programming language more user-friendly.

An IDE provides:

  • A simpler interface for running code
  • A file browser
  • Debugging tools
  • Version control (Git)

About Positron:

  • Free to use, with publicly available source code
  • Next-generation descendant of RStudio
  • Developed by Posit (creators of RStudio)
  • Designed for R and Python

Why Positron?

  • Bilingual: Work with R and Python in the same environment
  • Modern: Streamlined workflow for data science
  • Easy extensions: Add extensions for formatting, Git, Quarto, and more

Learn more: Andrew Heiss’ blog

Positron

  • Script pane: top left
  • Console pane: bottom left
  • Variables pane: top right
  • Plot pane: bottom right

Positron

Let’s explore Positron together on your laptop:

  • File explorer
  • Console
  • Interpreter and Folder
  • New
  • Open

Why Quarto?

Quarto (.qmd) files let you combine code, output, and written analysis in one document. Check out Quarto’s website for more!

Why I want you to use them for homework:

  • Show your work: Code and results appear together—I can see exactly what you did
  • Explain your thinking: Add text to interpret results and answer questions
  • Reproducible: I can re-run your code to verify it works
  • Professional skill: This is how data scientists share analysis in the real world

Anatomy of a .qmd File

A .qmd file has three main parts and knits code + text together (unlike .R which runs code only):

1. YAML Header (metadata)

---
title: "My Analysis"
author: "Your Name"
format: html
---

2. Markdown Text (narrative)

Use **bold**, *italics*, # Headings to organize your script and write analyses in between code.

More markdown basics

3. Code Chunks (analysis)

Insert R (or Python) code directly in your analyses.

```{r}
library(readr)  # provides read_csv()
data <- read_csv("data.csv")
summary(data)
```

Rendering Quarto Documents

What is Rendering?

Rendering (or “knitting”) converts your .qmd file into a finished document (HTML, PDF, or Word) by:

  1. Running all your R code chunks
  2. Combining code output with your text
  3. Formatting everything into your chosen output format

How to Render

  • Click the “Preview” button at the top of your .qmd file
  • Turn on Render on Save so the preview updates every time you save
  • I usually keep the preview open while I work

Rendering Quarto Documents

Source vs Visual Mode

Positron provides two ways to edit .qmd files. Toggle between them using the buttons at the top-left of your editor.


Source, which shows the raw source code:

  • Shows raw markdown syntax
  • See the actual code: **bold**, ## Heading
  • More control over formatting
  • Better for experienced users

Visual, which looks a lot like Jupyter notebooks and lets you edit code and text more interactively:

  • WYSIWYG editor (What You See Is What You Get)
  • Formatting toolbar (like Microsoft Word)
  • Live preview of how text will look
  • Easier for beginners

Rendering Quarto Documents

Which Should You Use?

  • Visual Mode: Great when starting out or writing lots of text
  • Source Mode: Better for precise control and learning markdown

Switch between them! Use Visual for writing and Source for troubleshooting. Just be sure to save before switching.

Rendering Quarto Documents

Tips for making it work

  • Save your .qmd file before rendering
  • All code must run without errors to render successfully
  • Check the “Render” tab for error messages if rendering fails

Most common “knitting” errors:

  • Quarto can’t execute the code (a chunk contains an error)
  • Two code chunks have the same name

You should be fine if you keep things simple and leave the Quarto settings alone.

Practicing Quarto with HW #0

Set up your folder:

  1. Download hw0.qmd, attain.csv, and _quarto.yml from bCourses under “Assignments” > “HW #0”
  2. Create a soc106 folder on your desktop
  3. Create data and assignments folders inside
  4. Place files as shown:
soc106/
├── _quarto.yml
├── data/
│   └── attain.csv
└── assignments/
    └── hw0.qmd

Open in Positron:

  1. Click Open Folder → navigate to soc106
  2. Trust the authors → click “Yes”
  3. Use the Explorer button on the left to find hw0.qmd

Practicing Quarto with HW #0

Edit and render:

  1. Uncomment lines 50 and 51 and click Run Cell
  2. Check that the packages installed correctly
  3. Replace “Karl Marx” with your name
  4. Check Render on Save
  5. Save and hit “Preview”
  6. SUCCESS if you see your name!
  7. Upload .qmd and .html to bCourses

Writing with script.R

It is best practice to write all your code in a script file:

  • Reproducible code
  • Documents exactly what you have done
  • Allows you to easily correct mistakes

Two types of files we’ll work with

  • R files (.R): plain-text files that contain only R code.
  • Quarto files (.qmd): let you intersperse R or Python code (other languages too!) with text. They’re great for putting text next to analysis.

Writing with script.R

Use a hashtag (#) at the start of lines in your script to tell R not to run that line

  • Useful for making comments about what your code is doing
  • Helps you remember what you did
  • Helps others (like me, reading your homework) see what you were doing

Check out this example:

# load data
census_data <- read_csv("data/acs2021.csv")

Variable assignment in script.R

R is an object-oriented programming language, meaning you can create new objects from any type of value, store them in R’s memory, and call them when necessary

  • Define variables using the assignment operator: <-
  • Give variables meaningful names when assigning
  • Variable names must be unique—reusing a name will overwrite it
# load data
acs <- read_csv("data/acs2021.csv")
# reusing the name acs replaces the 2021 data with the 2022 data
acs <- read_csv("data/acs2022.csv")
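
Here is a minimal sketch of assignment and overwriting that runs without any data files. The variable name class_size and its values are made up for illustration.

```r
# Assign a value to a name with <-
class_size <- 40
class_size        # typing the name prints the stored value

# Reusing the same name replaces the old value
class_size <- 45
class_size        # the earlier value (40) is gone
```

If you need to keep both values, give the second one a different name instead of reusing the first.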

Variable types

Type       Description                                                     Examples
Numeric    Real numbers; supports math operations (+, -, *, /)             3, 4.5, 8.98443
Character  Text; always in quotation marks                                 "Welcome!", "foo"
Boolean    Logical values; R treats them as 1 (TRUE) or 0 (FALSE) in math  TRUE, FALSE
a <- 10
b <- 3
a + b
[1] 13
a * b
[1] 30
first_name <- "John"
last_name <- "Doe"
paste(first_name, last_name)
[1] "John Doe"
# R treats TRUE as 1, FALSE as 0
is_enrolled <- TRUE
has_laptop <- FALSE
sum(is_enrolled, has_laptop)
[1] 1
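
If you are ever unsure what type a value is, class() will tell you. The short sketch below uses made-up variable names. Note that R reports Boolean values as "logical".

```r
# class() reports the type of a value or variable
class(4.5)          # "numeric"
class("Welcome!")   # "character"
class(TRUE)         # "logical"

# Quotation marks turn a number into text, so math no longer works on it
age_text <- "22"
class(age_text)     # "character"

# as.numeric() converts the text back into a number
age_number <- as.numeric(age_text)
age_number + 1      # 23
```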

Logical operators

  • Use logical operators to compare two variables or objects
  • Useful for manipulating data
Operator  Meaning
==        Equal to
!=        Not equal to
<         Less than
>         Greater than
<=        Less than or equal to
>=        Greater than or equal to
x <- 10
y <- 5

x == y
[1] FALSE
x != y
[1] TRUE
x > y
[1] TRUE
x < y
[1] FALSE
x >= 10
[1] TRUE

Questions?

Data structures

Data structures are simply collections of the data objects we have been working with:

  • Numeric
  • Character
  • Boolean variables

While there are many types of data structures in R, we will focus on two for our purposes: the vector and the dataframe

Vectors

Vectors are ordered collections of the same type of objects

  • Ordered: position matters
  • Same type: e.g., can’t mix numeric and character

We create vectors using the c() function (short for combine):

# Numeric vector
ages <- c(22, 25, 19, 30)

# Character vector
names <- c("Alice", "Bob", "Charlie", "Diana")

The c() function also combines existing vectors and variables. If the types differ, R converts everything to one common type.
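
A quick sketch of both behaviors, using the ages vector from above plus a few invented values. Watch what happens to the number when it is combined with text.

```r
ages <- c(22, 25, 19, 30)

# Combine an existing vector with new values
more_ages <- c(ages, 41, 18)
length(more_ages)   # 6

# Mixing types does not fail: R converts everything to character
mixed <- c(22, "Alice")
class(mixed)        # "character"
mixed               # "22" "Alice"
```

This is worth knowing because one stray text value in a column of numbers turns the whole column into text.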

Vectors

Use bracket notation to access individual elements: vector[i]

# create a vector of values
ages <- c(22, 25, 19, 30)

# Get the first element (indexing starts at 1)
ages[1]
[1] 22
# Get the third element
ages[3]
[1] 19
# Get multiple elements
ages[c(1, 3)]
[1] 22 19
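
Brackets also accept TRUE/FALSE values, which ties vectors back to the comparison operators from earlier. A small sketch, where the cutoff of 21 is arbitrary:

```r
ages <- c(22, 25, 19, 30)

# A comparison is applied to every element and returns a logical vector
over_21 <- ages > 21
over_21            # TRUE TRUE FALSE TRUE

# A logical vector inside brackets keeps only the TRUE positions
ages[over_21]      # 22 25 30

# Because TRUE counts as 1, sum() counts how many elements pass the test
sum(over_21)       # 3
```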

Dataframes

A dataframe is a list of equal-length vectors. We’ll focus on tidy dataframes, which organize data so that:

  • Each column is a variable
  • Each row is an observation
# A tibble: 3 × 4
  Observation name      age enrolled
  <chr>       <chr>   <dbl> <lgl>   
1 Obs1        Alice      22 TRUE    
2 Obs2        Bob        25 TRUE    
3 Obs3        Charlie    19 FALSE   
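
To see the "list of equal-length vectors" idea in action, here is a sketch that builds a similar table by hand with base R's data.frame(). It leaves out the Observation column and prints as a plain data frame rather than a tibble, so the output will look slightly different from the table above.

```r
# Three equal-length vectors, one per variable
name     <- c("Alice", "Bob", "Charlie")
age      <- c(22, 25, 19)
enrolled <- c(TRUE, TRUE, FALSE)

# data.frame() binds them together as columns
students <- data.frame(name, age, enrolled)

nrow(students)       # 3 observations (rows)
ncol(students)       # 3 variables (columns)

# $ pulls one column back out as a vector
students$age         # 22 25 19
mean(students$age)   # 22
```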

Inspecting dataframes

  • glimpse(): shows each column’s name, type, and first few values
  • head(): shows the first six rows by default
  • str(): shows the structure of the dataframe
  • colnames(): returns the column names
glimpse(students)
Rows: 3
Columns: 4
$ Observation <chr> "Obs1", "Obs2", "Obs3"
$ name        <chr> "Alice", "Bob", "Charlie"
$ age         <dbl> 22, 25, 19
$ enrolled    <lgl> TRUE, TRUE, FALSE
head(students)
# A tibble: 3 × 4
  Observation name      age enrolled
  <chr>       <chr>   <dbl> <lgl>   
1 Obs1        Alice      22 TRUE    
2 Obs2        Bob        25 TRUE    
3 Obs3        Charlie    19 FALSE   
str(students)
tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
 $ Observation: chr [1:3] "Obs1" "Obs2" "Obs3"
 $ name       : chr [1:3] "Alice" "Bob" "Charlie"
 $ age        : num [1:3] 22 25 19
 $ enrolled   : logi [1:3] TRUE TRUE FALSE
colnames(students)
[1] "Observation" "name"        "age"         "enrolled"   

File paths and the here package

To load data, R needs to know where to look—but file paths break across different computers/operating systems.

The here package solves this by building paths relative to your project root (home folder of the project).

# These break on different computers:
read_csv("C:\\Users\\YourName\\Documents\\project\\data\\mydata.csv") # windows (backslashes must be doubled inside R strings)
read_csv("~/Desktop/project/data/mydata.csv")                       # mac

library(here)

# here() finds your project root and builds paths from there
here()  # check where here thinks your project root is

# Loading data
#data <- read_csv("~/Desktop/project/data/raw/mydata.csv")     
data <- read_csv(here("data", "raw", "mydata.csv"))

# Saving data
write_csv(results, here("output", "results.csv"))
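
If you want to try exporting and importing before your project folder is set up, here is a sketch of a full round trip. It assumes the readr package is installed. The results table is invented, and tempdir() stands in for the project root only so that the example runs on any machine; in your own project you would build the path with here() instead.

```r
library(readr)

# Toy results table; the names and scores are invented for illustration
results <- data.frame(
  name  = c("Alice", "Bob"),
  score = c(90, 85)
)

# Build the path from pieces instead of typing slashes by hand.
# tempdir() is a stand-in for the project root in this sketch.
out_path <- file.path(tempdir(), "results.csv")

# Export, then import the same file again
write_csv(results, out_path)
results_again <- read_csv(out_path, show_col_types = FALSE)

results_again$score   # 90 85
```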

The Tidyverse

What is it?

  • A collection of R packages designed for data science that share a common philosophy and grammar
  • By far the most common approach to working with R
  • Most intuitive way to do data science

Core Packages we’ll be using

  • readr: for importing data (read_csv())
  • dplyr: for manipulating data (filter(), select(), mutate(), summarize())
  • ggplot2: for data visualization

The Tidyverse

Key Features

  • Consistent syntax: functions work similarly across packages
  • Human-readable: code reads like sentences
  • Pipe operator |>: chains functions together for readable code

# Base R
mean(subset(dataframe, age > 18)$score)

# Tidyverse - "take dataframe, filter where age > 18,
# then pull score, then calculate mean"
dataframe |> filter(age > 18) |> pull(score) |> mean()
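
To check that the two styles really give the same answer, here is a runnable sketch on a tiny made-up dataset. It assumes the dplyr package is installed (filter() and pull() come from dplyr); the ages and scores are invented.

```r
library(dplyr)

# Toy data; the values are invented for illustration
dataframe <- data.frame(
  age   = c(17, 22, 30, 16),
  score = c(70, 80, 90, 60)
)

# Base R: read from the inside out
base_result <- mean(subset(dataframe, age > 18)$score)

# Tidyverse: read from left to right
pipe_result <- dataframe |>
  filter(age > 18) |>
  pull(score) |>
  mean()

base_result   # 85
pipe_result   # 85
```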

The Tidyverse

How the packages work together

# Load data
data <- read_csv("data.csv") # read in csv

# Process
data |>
  filter(year == 2020) |>  # filter rows based on conditions
  select(name, score)  |>  # select specific columns
  arrange(desc(score)) |>  # sort rows by score, highest first
  glimpse()                # preview the result

Key functions

Function    Purpose                      Example
read_csv()  Import CSV data              read_csv("data.csv")
glimpse()   Preview dataframe structure  df |> glimpse()
colnames()  Get column names             colnames(df)
select()    Choose columns               df |> select(name, age)
filter()    Choose rows by condition     df |> filter(age > 20)
slice()     Choose rows by position      slice(df, 1:10)
arrange()   Sort rows                    arrange(df, desc(age))
mutate()    Create/modify columns        df |> mutate(age_2x = age * 2)
rename()    Rename columns               df |> rename(new = old)
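
Several of these functions can be chained in one pipeline. The sketch below assumes dplyr is installed and uses a small hand-built students table. The column name student and the cutoff of 20 are arbitrary choices for illustration.

```r
library(dplyr)

students <- data.frame(
  name     = c("Alice", "Bob", "Charlie"),
  age      = c(22, 25, 19),
  enrolled = c(TRUE, TRUE, FALSE)
)

result <- students |>
  filter(age > 20) |>              # keep rows where age is above 20
  mutate(age_2x = age * 2) |>      # add a new column
  rename(student = name) |>        # new name on the left, old name on the right
  select(student, age, age_2x) |>  # keep only these columns
  arrange(desc(age))               # sort rows, oldest first

result             # Bob (25, 50) comes first, then Alice (22, 44)

slice(result, 1)   # first row only: Bob
```

Each step receives the output of the step before it, so the order matters: select() has to use the name student because rename() already ran.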

Questions?

Tidyverse in Practice (in-class Lab)

Let’s practice using the folder structure you already set up:

  1. Download lab1.qmd from bCourses under “Assignments” > “Lab #1”
  2. Create a labs folder inside your soc106 folder
  3. Place lab1.qmd in the labs folder. Your folder structure should now look like this:
soc106/
├── _quarto.yml
├── data/
│   └── attain.csv
├── assignments/
│   └── hw0.qmd
└── labs/
    └── lab1.qmd
  4. Use the Explorer button on the left to find and open lab1.qmd
  5. Let’s work through it together!

Homework #1

For this week only, I will provide a dataset for you to use on your homework assignment

  • attain.csv: data from the General Social Survey
  • Available in bCourses under Assignments → HW #1

Turn in your assignment on bCourses

Appendix

Other data types

You can also load Excel files into R

First, load the readxl package:

library(readxl)

Then, use the read_excel function:

# load
data <- read_excel("file_name.xlsx")

In fact, R can import many other types of data files

Googling “import SAS/Stata/SPSS files into R” will point you in the right direction

Some helpful R commands

  • ?[function_name] brings up the help page for that function, e.g., ?mean
  • ??[search_term] searches every installed help page for a term and will (sometimes) lead to helpful examples
  • ls() lists all objects in the global environment
  • rm(object_name) removes an object
  • rm(list=ls()) removes all objects
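
A short sketch of the environment commands, using two throwaway objects (x and y are made up). The help commands are left as comments because they open a help page rather than return a value.

```r
# Create two objects
x <- 10
y <- "hello"

ls()              # lists the objects currently defined, including "x" and "y"

rm(x)             # remove just x
"x" %in% ls()     # FALSE: x is gone
"y" %in% ls()     # TRUE: y is still there

# ?mean           # opens the help page for mean()
# ??regression    # searches installed help pages for "regression"
```

Be careful with rm(list=ls()): it clears everything at once, and there is no undo.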

Want to know more R?

Lecture and lab are based on the first two R Fundamentals classes, which are run at the D-Lab

.R vs .qmd Files

So to summarize .R and .qmd scripts:

Feature   .R Script                 .qmd Document
Purpose   Run code                  Create reports
Content   Code only                 Code + text + output
Output    Console results           Formatted document (e.g., HTML/PDF/Word)
Comments  # comment                 Full markdown formatting
Best for  Data cleaning, functions  Analysis reports, presentations, homework

Rendering Quarto Documents

Output Formats

For this course, make sure html is specified in your YAML header:

---
title: "My Analysis"
format: html        # or pdf, docx
---

Quarto can render to many other formats (e.g., pdf, docx, odt, epub, pptx)