Take a sad plot, and make it better

Suggested answers

Application exercise
Answers
Modified: February 13, 2024

Important

These are suggested answers. This document should be used as a reference only; it is not designed to be an exhaustive key.

library(tidyverse)
library(scales)
library(ggthemes)

The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.

Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are the percentage of hires of each faculty type in each year.

staff <- read_csv("data/instructional-staff.csv")
staff
# A tibble: 5 × 12
  faculty_type    `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005` `2007`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Full-Time Tenu…   29     27.6   25     24.8   21.8   20.3   19.3   17.8   17.2
2 Full-Time Tenu…   16.1   11.4   10.2    9.6    8.9    9.2    8.8    8.2    8  
3 Full-Time Non-…   10.3   14.1   13.6   13.6   15.2   15.5   15     14.8   14.9
4 Part-Time Facu…   24     30.4   33.1   33.2   35.5   36     37     39.3   40.5
5 Graduate Stude…   20.5   16.5   18.1   18.8   18.7   19     20     19.9   19.5
# ℹ 2 more variables: `2009` <dbl>, `2011` <dbl>

Recreate the visualization

In order to recreate this visualization we need to first reshape the data to have one variable for faculty type and one variable for year. In other words, we will convert the data from the wide format to long format.

Your turn: Reshape the data so we have one row per faculty type and year, and the percentage of hires as a single column.

staff_long <- staff |>
  pivot_longer(
    cols = -faculty_type,
    names_to = "year",
    values_to = "percentage"
  )
staff_long
# A tibble: 55 × 3
   faculty_type              year  percentage
   <chr>                     <chr>      <dbl>
 1 Full-Time Tenured Faculty 1975        29  
 2 Full-Time Tenured Faculty 1989        27.6
 3 Full-Time Tenured Faculty 1993        25  
 4 Full-Time Tenured Faculty 1995        24.8
 5 Full-Time Tenured Faculty 1999        21.8
 6 Full-Time Tenured Faculty 2001        20.3
 7 Full-Time Tenured Faculty 2003        19.3
 8 Full-Time Tenured Faculty 2005        17.8
 9 Full-Time Tenured Faculty 2007        17.2
10 Full-Time Tenured Faculty 2009        16.8
# ℹ 45 more rows
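
To see the reshape at a smaller scale, here is a minimal sketch on a hand-typed subset of the table printed above (two faculty types, three survey years). The object names toy_wide and toy_long are illustrative and are not part of the exercise.

```r
library(tibble)
library(tidyr)

# Hand-typed subset of the printed table: two faculty types, three survey years
toy_wide <- tibble(
  faculty_type = c("Full-Time Tenured Faculty", "Part-Time Faculty"),
  `1975` = c(29, 24),
  `1989` = c(27.6, 30.4),
  `1993` = c(25, 33.1)
)

# One row per faculty type and year; the year column is still character here
toy_long <- toy_wide |>
  pivot_longer(
    cols = -faculty_type,
    names_to = "year",
    values_to = "percentage"
  )

toy_long
```

Two faculty types times three years give six rows, in the same way that the full data give 5 × 11 = 55 rows.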

Your turn: Attempt to recreate the original bar chart as best as you can. Don’t worry about theming or color palettes right now. The most important aspects to incorporate:

  • Faculty type on the \(y\)-axis with bar segments color-coded based on the year of the survey
  • Percentage of instructional staff employees on the \(x\)-axis
  • Begin the \(x\)-axis at 5%
  • Label the \(x\)-axis at 5% increments
  • Match the order of the legend

Tip

forcats contains many functions for defining and adjusting the order of levels for factor variables. Factors are often used to enforce specific ordering of categorical variables in charts.
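
As a small illustration of the tip, the sketch below shows how fct_relevel() changes the level order of a factor. The vector faculty is a made-up example that reuses three of the faculty type labels; it is not the exercise data.

```r
library(forcats)

# By default, factor() sorts the levels alphabetically
faculty <- factor(c(
  "Part-Time Faculty",
  "Graduate Student Employees",
  "Full-Time Tenured Faculty"
))
levels(faculty)

# fct_relevel() moves the named levels to the front, in the order given;
# any levels not named keep their existing relative order afterwards
faculty_ordered <- fct_relevel(
  faculty,
  "Full-Time Tenured Faculty",
  "Part-Time Faculty"
)
levels(faculty_ordered)
```

Because ggplot2 draws discrete axes and legends in level order, setting the levels this way is what controls the order in the chart.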

staff_long |>
  # convert faculty_type to factor to ensure correct order
  mutate(faculty_type = fct_relevel(
    .f = faculty_type,
    "Full-Time Tenured Faculty",
    "Full-Time Tenure-Track Faculty",
    "Full-Time Non-Tenure-Track Faculty",
    "Part-Time Faculty",
    "Graduate Student Employees"
  )) |>
  ggplot(mapping = aes(x = percentage, y = faculty_type, fill = year)) +
  # position dodge to separate the bars
  geom_col(position = "dodge", color = "white") +
  # generate a sequence of breaks from 5 to 45
  scale_x_continuous(
    breaks = seq(from = 5, to = 45, by = 5),
    labels = label_percent(scale = 1)
  ) +
  # reverse the legend values
  guides(fill = guide_legend(reverse = TRUE)) +
  # no labels on the chart
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  # crop the chart to begin at an origin of 5
  coord_cartesian(xlim = c(5, 45), expand = FALSE) +
  # attempt to match the visual design
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank()
  )

Let’s make it better

The original plot is not very informative. It’s hard to compare trends across faculty types.

Your turn: Improve the chart by using a relative frequency bar chart with year on the \(y\)-axis and faculty type encoded using color.

staff_long |>
  mutate(faculty_type = fct_relevel(
    .f = faculty_type, "Full-Time Tenured Faculty",
    "Full-Time Tenure-Track Faculty",
    "Full-Time Non-Tenure-Track Faculty",
    "Part-Time Faculty",
    "Graduate Student Employees"
  )) |>
  ggplot(mapping = aes(x = percentage, y = year, fill = faculty_type)) +
  geom_col(position = "fill") +
  scale_x_continuous(labels = label_percent()) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  theme_minimal()

What are this chart’s advantages and disadvantages?

The main disadvantage is that this chart distorts the intervals for the year variable. Because year is stored as text after pivoting, the bars are evenly spaced, which makes it appear as if the survey was conducted at regular intervals. That is not the case.
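
A quick way to confirm the uneven spacing is to look at the gaps between consecutive survey years. The sketch below types the years in by hand from the column names of the data; survey_years is an illustrative name.

```r
# Survey years, copied from the column names of the data
survey_years <- c(1975, 1989, 1993, 1995, 1999, 2001, 2003, 2005, 2007, 2009, 2011)

# Number of years between consecutive surveys
year_gaps <- diff(survey_years)
year_gaps
```

The first gap is 14 years, while the later gaps are 2 to 4 years, so a chart that spaces the years evenly compresses the earliest period.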

Now we want a line chart

Your turn: Let’s instead use a line chart. Graph the data with year on the \(x\)-axis and percentage of employees on the \(y\)-axis. Distinguish each faculty type using an appropriate aesthetic mapping.

staff_long |>
  ggplot(mapping = aes(
    x = year, y = percentage,
    group = faculty_type,
    color = faculty_type
  )) +
  geom_line() +
  theme_minimal()

Oops, the years are still spaced at equal intervals because we never converted year to a numeric variable after pivoting. Let’s fix that.

staff_long <- staff |>
  pivot_longer(
    cols = -faculty_type,
    names_to = "year",
    values_to = "percentage",
    names_transform = parse_number
  )

staff_long |>
  ggplot(mapping = aes(
    x = year, y = percentage,
    color = faculty_type
  )) +
  geom_line() +
  theme_minimal()
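
To see what the names_transform step is doing, here is a minimal sketch that applies parse_number() to a few of the year labels on their own. The vectors year_labels and year_numbers are illustrative names.

```r
library(readr)

# Column names come out of pivot_longer() as character strings
year_labels <- c("1975", "1989", "2011")
class(year_labels)

# parse_number() converts them to numeric values
year_numbers <- parse_number(year_labels)
class(year_numbers)

# Once numeric, the distance between years is meaningful
diff(year_numbers)
```

With a numeric year, ggplot2 uses a continuous scale on the axis, so the gap between 1975 and 1989 is drawn wider than the gap between later surveys.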

Your turn: Now we want to clean it up.

  • Add a proper title and labelling to the chart
  • Use an optimized color palette (see footnote 1)
  • Order the legend values by the final value of the percentage variable
staff_long |>
  mutate(
    faculty_type = fct_reorder(
      .f = faculty_type,
      .x = percentage,
      .fun = last,
      .desc = TRUE
    )
  ) |>
  ggplot(
    mapping = aes(
      x = year, y = percentage,
      color = faculty_type
    )
  ) +
  geom_line() +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  scale_color_viridis_d(end = 0.9) +
  labs(
    title = "Instructional staff employment trends",
    x = NULL, y = NULL, color = NULL,
    caption = "Source: AAUP"
  ) +
  theme_minimal()
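
The legend ordering relies on fct_reorder() with .fun = last, which may be easier to follow on a small example. The sketch below uses a hand-typed subset of the printed table (the 1975 and 2007 columns for three faculty types); toy and reordered are illustrative names.

```r
library(dplyr)
library(forcats)

# Hand-typed subset: 1975 and 2007 values for three faculty types,
# with rows ordered by year within each faculty type
toy <- tibble(
  faculty_type = rep(
    c("Full-Time Tenured Faculty", "Part-Time Faculty", "Graduate Student Employees"),
    each = 2
  ),
  year = rep(c(1975, 2007), times = 3),
  percentage = c(29, 17.2, 24, 40.5, 20.5, 19.5)
)

# Order the levels by the last percentage within each faculty type, largest first
reordered <- fct_reorder(
  .f = toy$faculty_type,
  .x = toy$percentage,
  .fun = last,
  .desc = TRUE
)
levels(reordered)
```

Note that last() picks the final row for each faculty type in data order. That matches the most recent year only because the rows are sorted by year within each type, as they are after pivoting.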

Goal: even more improvement!

Colleges and universities have come to rely more heavily on non-tenure track faculty members over time, in particular part-time faculty (e.g. contingent faculty, adjuncts). We want to show how academia is increasingly relying on part-time faculty.

Your turn: With your peers, sketch/design a chart that highlights the trend for part-time faculty. What type of geom would you use? What elements would you include? What would you remove?

Here’s my attempt.

Your turn: Create the chart you designed above using ggplot2. Post your completed chart to this discussion thread.

Tip

When you render the document, your plot images are automatically saved as PNG files in the ae-05-sad-plot_files/figure-html directory. You can use these images to post your chart to the discussion thread, or use the ggsave() function to directly save your plot as an image file. For example,

ggsave(
  filename = "images/part-time-faculty.png",
  plot = last_plot(),
  width = 8, height = 6, bg = "white"
)

saves the last generated plot to a file named part-time-faculty.png in the images directory. It has a defined height and width (in “inches”) with a white background.

staff_long |>
  mutate(
    part_time = if_else(faculty_type == "Part-Time Faculty",
      "Part-Time Faculty", "Other Faculty"
    )
  ) |>
  ggplot(
    mapping = aes(
      x = year,
      y = percentage,
      group = faculty_type,
      color = part_time
    )
  ) +
  geom_line() +
  scale_color_manual(
    values = c("gray", "red"),
    guide = guide_legend(reverse = TRUE)
  ) +
  scale_y_continuous(labels = label_percent(scale = 1, accuracy = 1)) +
  theme_minimal() +
  labs(
    title = "Academia is increasingly relying on part-time faculty",
    subtitle = "As a percentage of all instructional staff employees",
    x = NULL, y = NULL, color = NULL,
    caption = "Source: AAUP"
  ) +
  theme(legend.position = "bottom")

ggsave(
  filename = "images/part-time-faculty.png",
  plot = last_plot(),
  width = 8, height = 6, bg = "white"
)
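
One detail worth noting in the chart above: with an unnamed values vector, scale_color_manual() assigns colors in level order, which for a character column is alphabetical ("Other Faculty" then "Part-Time Faculty"). A possible alternative, sketched here on a hand-typed subset of the data, is to name the colors so the pairing is explicit. The names toy, highlight_colors, and p are illustrative.

```r
library(ggplot2)

# Hand-typed subset of the printed table (1975 and 2007 columns)
toy <- data.frame(
  faculty_type = rep(c("Full-Time Tenured Faculty", "Part-Time Faculty"), each = 2),
  year = rep(c(1975, 2007), times = 2),
  percentage = c(29, 17.2, 24, 40.5)
)
toy$part_time <- ifelse(
  toy$faculty_type == "Part-Time Faculty", "Part-Time Faculty", "Other Faculty"
)

# Naming each color ties it to a category regardless of level order
highlight_colors <- c("Part-Time Faculty" = "red", "Other Faculty" = "gray")

p <- ggplot(
  toy,
  aes(x = year, y = percentage, group = faculty_type, color = part_time)
) +
  geom_line() +
  scale_color_manual(values = highlight_colors)

# Colors assigned to the drawn lines, taken from the computed layer data
line_colors <- unique(layer_data(p, 1)$colour)
line_colors
```
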

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Ventura 13.5.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-02-17
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 bit           4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
 bit64         4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
 cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
 colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
 digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
 fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
 farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 ggthemes    * 5.0.0   2023-11-21 [1] CRAN (R 4.3.1)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
 gtable        0.3.4   2023-08-21 [1] CRAN (R 4.3.0)
 here          1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
 hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
 knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
 labeling      0.4.3   2023-08-29 [1] CRAN (R 4.3.0)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
 lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.1)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 ragg          1.2.7   2023-12-11 [1] CRAN (R 4.3.1)
 readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.3.1)
 rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
 rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
 rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.3.1)
 rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
 scales      * 1.2.1   2024-01-18 [1] Github (r-lib/scales@c8eb772)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 stringi       1.8.3   2023-12-11 [1] CRAN (R 4.3.1)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 systemfonts   1.0.5   2023-10-09 [1] CRAN (R 4.3.1)
 textshaping   0.3.7   2023-10-09 [1] CRAN (R 4.3.1)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
 timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
 tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
 viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.3.0)
 vroom         1.6.5   2023-12-05 [1] CRAN (R 4.3.1)
 withr         2.5.2   2023-10-30 [1] CRAN (R 4.3.1)
 xfun          0.41    2023-11-01 [1] CRAN (R 4.3.1)
 yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

  1. viridis is often a good choice, but you can find others.