library(tidyverse)
library(scales)
library(ggthemes)
Take a sad plot, and make it better
Suggested answers
These are suggested answers. Use this document as a reference only; it is not designed to be an exhaustive key.
The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.
Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are the percentage of hires of that faculty type in each year.
staff <- read_csv("data/instructional-staff.csv")

staff
# A tibble: 5 × 12
faculty_type `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005` `2007`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Full-Time Tenu… 29 27.6 25 24.8 21.8 20.3 19.3 17.8 17.2
2 Full-Time Tenu… 16.1 11.4 10.2 9.6 8.9 9.2 8.8 8.2 8
3 Full-Time Non-… 10.3 14.1 13.6 13.6 15.2 15.5 15 14.8 14.9
4 Part-Time Facu… 24 30.4 33.1 33.2 35.5 36 37 39.3 40.5
5 Graduate Stude… 20.5 16.5 18.1 18.8 18.7 19 20 19.9 19.5
# ℹ 2 more variables: `2009` <dbl>, `2011` <dbl>
Recreate the visualization
In order to recreate this visualization we need to first reshape the data to have one variable for faculty type and one variable for year. In other words, we will convert the data from the wide format to long format.
Your turn: Reshape the data so we have one row per faculty type and year, and the percentage of hires as a single column.
staff_long <- staff |>
  pivot_longer(
    cols = -faculty_type,
    names_to = "year",
    values_to = "percentage"
  )

staff_long
# A tibble: 55 × 3
faculty_type year percentage
<chr> <chr> <dbl>
1 Full-Time Tenured Faculty 1975 29
2 Full-Time Tenured Faculty 1989 27.6
3 Full-Time Tenured Faculty 1993 25
4 Full-Time Tenured Faculty 1995 24.8
5 Full-Time Tenured Faculty 1999 21.8
6 Full-Time Tenured Faculty 2001 20.3
7 Full-Time Tenured Faculty 2003 19.3
8 Full-Time Tenured Faculty 2005 17.8
9 Full-Time Tenured Faculty 2007 17.2
10 Full-Time Tenured Faculty 2009 16.8
# ℹ 45 more rows
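As a quick sanity check on the reshaped output, the long table should have one row for each combination of faculty type and year column. For the real data that is 5 faculty types × 11 years = 55 rows. The sketch below shows the same arithmetic on a small made-up tibble; `toy_wide` and its values are hypothetical, not the AAUP data.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: 2 groups, 3 year columns (values are made up)
toy_wide <- tibble(
  group = c("A", "B"),
  `2001` = c(10, 90),
  `2002` = c(20, 80),
  `2003` = c(30, 70)
)

# Pivot every column except the identifier into name/value pairs
toy_long <- pivot_longer(
  toy_wide,
  cols = -group,
  names_to = "year",
  values_to = "percentage"
)

# Rows in long format = rows in wide format x number of pivoted columns
expected_rows <- nrow(toy_wide) * (ncol(toy_wide) - 1)
nrow(toy_long) == expected_rows
```

Note that the new `year` column is a character vector at this point, because it was built from column names.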
Your turn: Attempt to recreate the original bar chart as best as you can. Don’t worry about theming or color palettes right now. The most important aspects to incorporate:
- Faculty type on the \(y\)-axis with bar segments color-coded based on the year of the survey
- Percentage of instructional staff employees on the \(x\)-axis
- Begin the \(x\)-axis at 5%
- Label the \(x\)-axis at 5% increments
- Match the order of the legend
forcats contains many functions for defining and adjusting the order of levels for factor variables. Factors are often used to enforce specific ordering of categorical variables in charts.
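To see what level ordering means in isolation, here is a minimal sketch of `fct_relevel()` on a toy factor. The `sizes` vector is hypothetical and only illustrates the idea: by default the levels are sorted alphabetically, and `fct_relevel()` moves the named levels to the front in the order given.

```r
library(forcats)

# Hypothetical categorical variable; default factor levels are alphabetical
sizes <- factor(c("small", "large", "medium", "small"))
levels(sizes)

# Move the named levels to the front, in the order they are listed
sizes_ordered <- fct_relevel(sizes, "small", "medium", "large")
levels(sizes_ordered)
```

ggplot2 draws discrete axes and legends in level order, which is why relabelling the order of `faculty_type` changes the order of the bars.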
staff_long |>
  # convert faculty_type to factor to ensure correct order
  mutate(faculty_type = fct_relevel(
    .f = faculty_type,
    "Full-Time Tenured Faculty",
    "Full-Time Tenure-Track Faculty",
    "Full-Time Non-Tenure-Track Faculty",
    "Part-Time Faculty",
    "Graduate Student Employees"
  )) |>
  ggplot(mapping = aes(x = percentage, y = faculty_type, fill = year)) +
  # position dodge to separate the bars
  geom_col(position = "dodge", color = "white") +
  # generate a sequence of breaks from 5 to 45
  scale_x_continuous(
    breaks = seq(from = 5, to = 45, by = 5),
    labels = label_percent(scale = 1)
  ) +
  # reverse the legend values
  guides(fill = guide_legend(reverse = TRUE)) +
  # no labels on the chart
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  # crop the chart to begin at an origin of 5
  coord_cartesian(xlim = c(5, 45), expand = FALSE) +
  # attempt to match the visual design
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank()
  )
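The `scale = 1` argument in the chart above matters because the data stores percentages as whole numbers (29 means 29%). By default `label_percent()` multiplies by 100, which suits proportions between 0 and 1. The following sketch compares the two cases on made-up values; `accuracy = 1` is added here only to make the rounding explicit.

```r
library(scales)

# Values already stored as percentages (29 means 29%): do not rescale
pct_labels <- label_percent(scale = 1, accuracy = 1)(c(5, 10, 45))
pct_labels

# Values stored as proportions (0.29 means 29%): the default scale of 100 applies
prop_labels <- label_percent(accuracy = 1)(c(0.05, 0.10, 0.45))
prop_labels
```

Both calls produce the same labels, so the choice depends only on how the underlying values are stored.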
Let’s make it better
The original plot is not very informative. It’s hard to compare the trends across faculty types.
Your turn: Improve the chart by using a relative frequency bar chart with year on the \(y\)-axis and faculty type encoded using color.
staff_long |>
  mutate(faculty_type = fct_relevel(
    .f = faculty_type,
    "Full-Time Tenured Faculty",
    "Full-Time Tenure-Track Faculty",
    "Full-Time Non-Tenure-Track Faculty",
    "Part-Time Faculty",
    "Graduate Student Employees"
  )) |>
  ggplot(mapping = aes(x = percentage, y = year, fill = faculty_type)) +
  geom_col(position = "fill") +
  scale_x_continuous(labels = label_percent()) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  theme_minimal()
What are this chart’s advantages and disadvantages?
This chart distorts the intervals of the year variable. It makes it appear as if the survey was conducted at regular intervals, which is not the case.
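The uneven spacing is easy to confirm from the column names themselves. This small sketch uses only base R: the years are copied from the dataset's column headers, and the gaps between consecutive surveys are computed once the strings are converted to numbers.

```r
# Survey years taken from the column names of the dataset
years <- c("1975", "1989", "1993", "1995", "1999", "2001",
           "2003", "2005", "2007", "2009", "2011")

# As character strings the years are just 11 categories,
# so a discrete axis spaces them evenly
is.character(years)

# As numbers, the gaps between consecutive surveys become visible
year_gaps <- diff(as.numeric(years))
year_gaps
```

The first gap is 14 years while most later gaps are 2 years, which a discrete axis hides entirely.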
Now we want a line chart
Your turn: Let’s instead use a line chart. Graph the data with year on the \(x\)-axis and percentage of employees on the \(y\)-axis. Distinguish each faculty type using an appropriate aesthetic mapping.
staff_long |>
  ggplot(mapping = aes(
    x = year, y = percentage,
    group = faculty_type,
    color = faculty_type
  )) +
  geom_line() +
  theme_minimal()
Oops, the intervals are still equal because we never converted year to a numeric variable after pivoting. Let’s fix that.
staff_long <- staff |>
  pivot_longer(
    cols = -faculty_type,
    names_to = "year",
    values_to = "percentage",
    names_transform = parse_number
  )
staff_long |>
  ggplot(mapping = aes(
    x = year, y = percentage,
    color = faculty_type
  )) +
  geom_line() +
  theme_minimal()
Your turn: Now we want to clean it up.
- Add a proper title and labelling to the chart
- Use an optimized color palette (see footnote 1)
- Order the legend values by the final value of the percentage variable
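Ordering a legend by the final value of each series can be done with `fct_reorder()` and a summary function that picks the last observation. The sketch below demonstrates the idea on a hypothetical `toy` data frame with made-up values; it assumes the rows are already sorted by time within each series, as they are after pivoting the staff data.

```r
library(forcats)
library(dplyr)

# Hypothetical long data: three series observed at two time points
toy <- data.frame(
  series = c("a", "a", "b", "b", "c", "c"),
  time   = c(1, 2, 1, 2, 1, 2),
  value  = c(5, 1, 2, 9, 4, 4)
)

# Reorder levels by each series' last value, largest first
toy <- toy |>
  mutate(series = fct_reorder(series, value, .fun = last, .desc = TRUE))

levels(toy$series)
```

Series `b` ends highest (9), then `c` (4), then `a` (1), so the legend would list them in that order, matching the right-hand end of the lines.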
staff_long |>
  mutate(
    faculty_type = fct_reorder(
      .f = faculty_type,
      .x = percentage,
      .fun = last,
      .desc = TRUE
    )
  ) |>
  ggplot(
    mapping = aes(
      x = year, y = percentage,
      color = faculty_type
    )
  ) +
  geom_line() +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  scale_color_viridis_d(end = 0.9) +
  labs(
    title = "Instructional staff employment trends",
    x = NULL, y = NULL, color = NULL,
    caption = "Source: AAUP"
  ) +
  theme_minimal()
Goal: even more improvement!
Colleges and universities have come to rely more heavily on non-tenure track faculty members over time, in particular part-time faculty (e.g. contingent faculty, adjuncts). We want to show how academia is increasingly relying on part-time faculty.
Your turn: With your peers, sketch/design a chart that highlights the trend for part-time faculty. What type of geom would you use? What elements would you include? What would you remove?
Here’s my attempt.
Your turn: Create the chart you designed above using ggplot2. Post your completed chart to this discussion thread.
When you render the document, your plot images are automatically saved as PNG files in the ae-05-sad-plot_files/figure-html directory. You can use these images to post your chart to the discussion thread, or use the ggsave() function to directly save your plot as an image file. For example,
ggsave(
  filename = "images/part-time-faculty.png",
  plot = last_plot(),
  width = 8, height = 6, bg = "white"
)
saves the last generated plot to a file named part-time-faculty.png in the images directory. The image is 8 inches wide and 6 inches tall, with a white background.
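One practical detail: `ggsave()` may fail if the target directory does not exist yet, so it can help to create it first. The following is a minimal sketch under a few assumptions: it writes to a temporary directory rather than `images/`, uses the built-in `mtcars` data as a placeholder plot, and states the units and resolution explicitly (`units = "in"` and `dpi = 300` are also the defaults).

```r
library(ggplot2)

# Hypothetical output location; a temporary directory keeps the sketch self-contained
out_dir <- file.path(tempdir(), "images")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Placeholder plot using a dataset that ships with R
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Save the plot, passing the plot object explicitly instead of relying on last_plot()
out_file <- file.path(out_dir, "example-plot.png")
ggsave(
  filename = out_file,
  plot = p,
  width = 8, height = 6, units = "in",
  dpi = 300, bg = "white"
)

file.exists(out_file)
```

Passing the plot object by name avoids surprises when another plot has been drawn in between.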
staff_long |>
  mutate(
    part_time = if_else(faculty_type == "Part-Time Faculty",
      "Part-Time Faculty", "Other Faculty"
    )
  ) |>
  ggplot(
    mapping = aes(
      x = year,
      y = percentage,
      group = faculty_type,
      color = part_time
    )
  ) +
  geom_line() +
  scale_color_manual(
    values = c("gray", "red"),
    guide = guide_legend(reverse = TRUE)
  ) +
  scale_y_continuous(labels = label_percent(scale = 1, accuracy = 1)) +
  theme_minimal() +
  labs(
    title = "Academia is increasingly relying on part-time faculty",
    subtitle = "As a percentage of all instructional staff employees",
    x = NULL, y = NULL, color = NULL,
    caption = "Source: AAUP"
  ) +
  theme(legend.position = "bottom")

ggsave(
  filename = "images/part-time-faculty.png",
  plot = last_plot(),
  width = 8, height = 6, bg = "white"
)
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS Ventura 13.5.2
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-02-17
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.1)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
ggthemes * 5.0.0 2023-11-21 [1] CRAN (R 4.3.1)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.1)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.1)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
ragg 1.2.7 2023-12-11 [1] CRAN (R 4.3.1)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.1)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
scales * 1.2.1 2024-01-18 [1] Github (r-lib/scales@c8eb772)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
systemfonts 1.0.5 2023-10-09 [1] CRAN (R 4.3.1)
textshaping 0.3.7 2023-10-09 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.0)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.1)
withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1)
xfun 0.41 2023-11-01 [1] CRAN (R 4.3.1)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.1)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────
Footnotes
1. viridis is often a good choice, but you can find others.