Take a sad plot, and make it better

Application exercise
Modified

February 13, 2024

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

Take a sad plot, and make it better

The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.

Each row in this dataset represents a faculty type, and the columns are the years for which we have data. Each value is the percentage of hires of that faculty type in that year.

staff <- read_csv("data/instructional-staff.csv")
Rows: 5 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): faculty_type
dbl (11): 1975, 1989, 1993, 1995, 1999, 2001, 2003, 2005, 2007, 2009, 2011

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
staff
# A tibble: 5 × 12
  faculty_type    `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005` `2007`
  <chr>            <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Full-Time Tenu…   29     27.6   25     24.8   21.8   20.3   19.3   17.8   17.2
2 Full-Time Tenu…   16.1   11.4   10.2    9.6    8.9    9.2    8.8    8.2    8  
3 Full-Time Non-…   10.3   14.1   13.6   13.6   15.2   15.5   15     14.8   14.9
4 Part-Time Facu…   24     30.4   33.1   33.2   35.5   36     37     39.3   40.5
5 Graduate Stude…   20.5   16.5   18.1   18.8   18.7   19     20     19.9   19.5
# ℹ 2 more variables: `2009` <dbl>, `2011` <dbl>

Recreate the visualization

To recreate this visualization, we first need to reshape the data so that there is one variable for faculty type and one variable for year. In other words, we will convert the data from wide format to long format.
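As a rough illustration of this kind of reshaping, here is a minimal sketch on a toy table. The table `toy_wide`, its `group` column, and its values are hypothetical and are not the AAUP data; the column names `year` and `percentage` are also just one possible choice.

```r
library(tidyverse)

# Toy wide table (hypothetical values, not the AAUP data):
# one row per group, one column per year
toy_wide <- tribble(
  ~group, ~`2000`, ~`2010`,
  "A",         60,      45,
  "B",         40,      55
)

# Gather the year columns into two columns: `year` and `percentage`
toy_long <- toy_wide |>
  pivot_longer(
    cols = -group,             # every column except the identifier
    names_to = "year",         # old column names become values of `year`
    values_to = "percentage"   # old cell values are collected here
  ) |>
  mutate(year = as.integer(year))  # column names arrive as character strings

toy_long
```

The result has one row per group and year. Converting `year` to a number is optional, but it tends to help later when year is placed on a continuous axis.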

Your turn: Reshape the data so we have one row per faculty type and year, and the percentage of hires as a single column.

# add code here

Your turn: Attempt to recreate the original bar chart as best as you can. Don’t worry about theming or color palettes right now. The most important aspects to incorporate:

  • Faculty type on the \(y\)-axis with bar segments color-coded based on the year of the survey
  • Percentage of instructional staff employees on the \(x\)-axis
  • Begin the \(x\)-axis at 5%
  • Label the \(x\)-axis at 5% increments
  • Match the order of the legend

Tip

forcats contains many functions for defining and adjusting the order of levels for factor variables. Factors are often used to enforce specific ordering of categorical variables in charts.
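The sketch below shows one way the tip and the axis requirements could fit together, again on hypothetical toy data. `fct_relevel()` and `fct_rev()` are only two of several forcats functions that could be used here, and the axis limits of 5 to 30 are chosen for the toy values, not for the AAUP data.

```r
library(tidyverse)
library(scales)

# Toy long-format data (hypothetical values, not the AAUP data)
toy_long <- tribble(
  ~group, ~year, ~percentage,
  "A",     2000,          12,
  "A",     2010,          18,
  "B",     2000,          30,
  "B",     2010,          22
)

toy_long <- toy_long |>
  mutate(
    # fct_relevel() sets the level order by hand
    group = fct_relevel(group, "B", "A"),
    # treat year as categorical, then reverse its level order with fct_rev()
    year = fct_rev(factor(year))
  )

p <- ggplot(toy_long, aes(x = percentage, y = group, fill = year)) +
  geom_col(position = "dodge") +
  scale_x_continuous(
    breaks = seq(5, 30, by = 5),        # tick marks every 5 percentage points
    labels = label_percent(scale = 1)   # values are already on a 0-100 scale
  ) +
  coord_cartesian(xlim = c(5, 30))      # zoom so the axis begins at 5%

p
```

`coord_cartesian()` zooms without dropping data, whereas setting `limits` inside the scale would remove bars that start at zero.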

# add code here

Let’s make it better

The original plot is not very informative. It’s hard to compare the trends across faculty types.

Your turn: Improve the chart by using a relative frequency bar chart with year on the \(y\)-axis and faculty type encoded using color.
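For reference, a relative frequency bar chart can be sketched with `position = "fill"`, which rescales each bar so its segments sum to 1. The toy data and axis labels below are hypothetical.

```r
library(tidyverse)
library(scales)

# Toy long-format data (hypothetical values, not the AAUP data)
toy_long <- tribble(
  ~group, ~year, ~percentage,
  "A",     2000,          60,
  "B",     2000,          40,
  "A",     2010,          45,
  "B",     2010,          55
)

p_fill <- ggplot(
  toy_long,
  aes(x = percentage, y = factor(year), fill = group)
) +
  geom_col(position = "fill") +                   # each bar is rescaled to sum to 1
  scale_x_continuous(labels = label_percent()) +  # show 0-1 proportions as percentages
  labs(x = "Share", y = "Year", fill = "Group")

p_fill
```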

# add code here

What are this chart’s advantages and disadvantages? Add response here

Now we want a line chart

Your turn: Let’s instead use a line chart. Graph the data with year on the \(x\)-axis and percentage of employees on the \(y\)-axis. Distinguish each faculty type using an appropriate aesthetic mapping.
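A minimal sketch of such a line chart on hypothetical toy data follows. Color is used here to distinguish groups, but `linetype` or another aesthetic would also be a reasonable choice.

```r
library(tidyverse)

# Toy long-format data (hypothetical values, not the AAUP data)
toy_long <- tribble(
  ~group, ~year, ~percentage,
  "A",     2000,          60,
  "A",     2005,          52,
  "A",     2010,          45,
  "B",     2000,          40,
  "B",     2005,          48,
  "B",     2010,          55
)

# year must be numeric so that it is drawn on a continuous axis
p_line <- ggplot(toy_long, aes(x = year, y = percentage, color = group)) +
  geom_line() +
  geom_point()   # mark the years that were actually observed

p_line
```

Keeping `year` numeric means unevenly spaced survey years are drawn at their true positions on the axis.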

# add code here

Your turn: Now we want to clean it up.

  • Add a proper title and labelling to the chart
  • Use an optimized color palette (see footnote 1)
  • Order the legend values by the final value of the percentage variable
# add code here
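One possible way to combine these three steps is sketched below on hypothetical toy data. It assumes the viridis palette mentioned in the footnote and uses `fct_reorder2()`, which by default orders the levels by the value of the second variable at the largest value of the first, in descending order. The title and labels are placeholders.

```r
library(tidyverse)

# Toy long-format data (hypothetical values, not the AAUP data)
toy_long <- tribble(
  ~group, ~year, ~percentage,
  "A",     2000,          60,
  "A",     2010,          45,
  "B",     2000,          40,
  "B",     2010,          55
)

# Order groups by their percentage in the final year (largest first)
toy_ordered <- toy_long |>
  mutate(group = fct_reorder2(group, year, percentage))

p_clean <- ggplot(toy_ordered, aes(x = year, y = percentage, color = group)) +
  geom_line() +
  scale_color_viridis_d(end = 0.8) +   # discrete viridis, skipping the palest yellow
  labs(
    title = "Placeholder title",
    x = "Year",
    y = "Percentage",
    color = "Group"
  )

p_clean
```

With this ordering the legend entries appear in the same top-to-bottom order as the line endpoints on the right side of the chart.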

Goal: even more improvement!

Colleges and universities have come to rely more heavily on non-tenure track faculty members over time, in particular part-time faculty (e.g. contingent faculty, adjuncts). We want to show how academia is increasingly relying on part-time faculty.

Your turn: With your peers, sketch/design a chart that highlights the trend for part-time faculty. What type of geom would you use? What elements would you include? What would you remove?

Add response here.

Your turn: Create the chart you designed above using ggplot2. Post your completed chart to this discussion thread.

Tip

When you render the document, your plot images are automatically saved as PNG files in the ae-05-sad-plot_files/figure-html directory. You can use these images to post your chart to the discussion thread, or use the ggsave() function to directly save your plot as an image file. For example,

ggsave(
  filename = "images/part-time-faculty.png",
  plot = last_plot(),
  width = 8, height = 6, bg = "white"
)

saves the last generated plot to a file named part-time-faculty.png in the images directory. The width and height are given in inches, and the background is white.

# add code here

Footnotes

  1. viridis is often a good choice, but you can find others.