Weekly Assignment Example
Instructions
Using a dataset of your choice, choose a research question and construct a plot that helps answer it. Make sure to answer the questions below and that your document renders without errors before submitting.
Research question
What is your research question that you want to answer with this plot? (Remember a research question should end with a “?”)
Is Berkeley Sociology’s graduate program more appealing to men than women?
Prediction and brief justification
State what you expect to find and why. The justification can be based on theory or previous research. Even better (where possible) is to set up competing predictions based on competing theoretical premises: e.g., “If Marx is right, we should find P; but if Weber is right, then we should find Q.”
Although the Berkeley Sociology Department is committed to gender balance and despite its reputation as a center for gender research, there seems to be a recent trend toward disproportionately male graduate student cohorts. Based on informal interviews with faculty, I believe that this is not simply a fluke and that Berkeley now disproportionately attracts male students. I therefore predict that men who are admitted to Berkeley will be more likely to accept an offer than women who are admitted.
Data
Describe the data in one or two sentences. Then state in one or two sentences why you’re using this data instead of different data.
I use the count of women and men in the 2010 Berkeley incoming sociology cohort to test this hypothesis. These data come from a list I received from the department in April 2010. Data from multiple years would yield more precise estimates, but those estimates would describe a longer period. I use only the most recent year of data in order to assess the current state of the program.
Analytic strategy
Describe the exact procedure you are going to use to analyze the data, including the names of distributions and specific R commands. This can be as short as two or three sentences and no longer than 1/3 of a page in any case.
If gender is irrelevant to accepting an offer from Berkeley and equal numbers of both genders are admitted, then a reasonable model of cohort composition is one that follows a binomial distribution with Pr(male) = Pr(female) = .5. I use Stata’s “bitest” function to estimate how probable it is that the observed data were generated by this model. If the model does not fit the data, this would provide evidence that gender does, in fact, play a role in whether or not an admitted student accepts an offer.
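The model described above can be written out explicitly. This is only a restatement of the stated assumptions; the symbols X (number of men in the cohort), n (cohort size), and k (observed number of men) are introduced here for illustration and do not appear in the original text.

```latex
X \sim \mathrm{Binomial}(n,\ 0.5),
\qquad
\Pr(X \ge k) \;=\; \sum_{i=k}^{n} \binom{n}{i}\, 0.5^{\,i}\, 0.5^{\,n-i}
\;=\; \sum_{i=k}^{n} \binom{n}{i}\, 0.5^{\,n}
```

A small value of Pr(X ≥ k) means that a cohort with at least k men would be unusual if the model were true.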
Code
Build your plot here. If you need to add more code chunks, feel free to add more, but you can probably do it all in one code chunk.
# setup
# ----------
library(tidyverse)  # provides ggplot2

# data
# ----------
students_summary <- data.frame(
  gender = c("male", "female"),
  count = c(8, 3)
)

# share of men, computed from the counts so the caption always matches the data
pct_male <- with(students_summary, 100 * count[gender == "male"] / sum(count))

# plot
# ----------
students_summary |>
  ggplot(aes(x = gender, y = count)) +
  geom_col(fill = "#0d467cff", width = 0.6) +
  scale_y_continuous(breaks = 0:9, limits = c(0, 9)) +
  labs(
    x = NULL,
    y = "Number of students",
    caption = sprintf("Percentage of male students = %.1f%%", pct_male)
  ) +
  theme_bw() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    plot.caption = element_text(size = 14, hjust = 0.5, margin = margin(t = 10))
  )

Results
Present the results of the analysis here. This is not the place for extended commentary, just the basic results of your analysis. Should only be a few sentences.
In 2010, 15 of the 21 incoming students (71%) are men, a rather large gender imbalance. Under the model above, the probability of observing 15 or more male students in 21 independent binomial trials with Pr(male) = .5 is .039. Using the conventional benchmark for statistical significance (p < .05), we can reject the model that assumes that gender is irrelevant to the process of accepting an offer from Berkeley.
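For readers working in R rather than Stata, the tail probability reported above can be checked with a short sketch. It assumes that base R's `binom.test()` with a one-sided alternative is an acceptable stand-in for the Stata command named in the analytic strategy; the variable names are illustrative, and the counts are the ones stated in the results.

```r
# Counts as stated in the results text
n_men <- 15
n_total <- 21

# One-sided exact binomial test: P(X >= 15) when X ~ Binomial(21, 0.5)
fit <- binom.test(n_men, n_total, p = 0.5, alternative = "greater")
p_value <- fit$p.value

# The same tail probability taken directly from the binomial distribution:
# P(X > 14) is identical to P(X >= 15)
p_direct <- pbinom(n_men - 1, size = n_total, prob = 0.5, lower.tail = FALSE)

# Both p-values round to 0.039; the share of men rounds to 0.714
round(c(share_men = n_men / n_total, binom_test = p_value, direct = p_direct), 3)
```

Computing the probability in two ways is a cheap consistency check: if the two numbers disagree, the direction of the test or the off-by-one in the cutoff is probably wrong.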
Discussion of results
Here’s where you talk about how your predictions went and what the findings mean about the answer to your research question. One to three sentences will do.
My initial predictions were supported—there are more men than women in the 2010 cohort and this difference does not appear to be the result of chance alone. This suggests that Berkeley sociology may indeed appeal more to men than to women and that any model of the processes that lead to accepting an offer must include gender as a factor.
Limitations and alternative explanations
This is where you talk about what you might have done differently or better or what other people should do next time. All research is flawed in dozens of ways, so for this assignment please talk about one limitation of your data, one limitation of your analysis, and one possible alternative explanation for your findings. Three to five sentences is fine.
The conclusions we can draw from the data are limited because they come from a single year. Perhaps gender matters this year for some reason but this is not a general trend. The analysis may also be flawed if the assumption that men and women are equally likely to be admitted is inaccurate. Other factors associated with gender might also be the real culprits rather than gender differences in the appeal of the program. Perhaps Berkeley is equally appealing to both genders but women are more likely than men to receive multiple offers from top schools. This would make it less likely that women would accept an offer from any program, not just Berkeley.
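One way to probe the equal-admission assumption mentioned above is to recompute the tail probability under other assumed shares of men among admitted students. The sketch below is illustrative only: the shares in `share_admitted_men` are hypothetical values chosen for demonstration, not figures from the department, and the counts are the ones stated in the results.

```r
# Counts as stated in the results text
n_men <- 15
n_total <- 21

# Hypothetical shares of men among admitted students (illustrative values only)
share_admitted_men <- c(0.50, 0.55, 0.60)

# One-sided exact binomial p-value under each assumed share
p_values <- sapply(share_admitted_men, function(p0) {
  binom.test(n_men, n_total, p = p0, alternative = "greater")$p.value
})

# Side-by-side view of each assumption and the resulting p-value
data.frame(share_admitted_men, p_value = round(p_values, 3))
```

The p-value rises as the assumed share of admitted men rises, so the conclusion is only as strong as the assumption about who was admitted.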