HW 04 - Design + details

Homework
Modified

March 11, 2025

Important

This homework is due March 12 at 11:59pm ET.

Learning objectives

  • Design aesthetically pleasing and hideous data visualizations
  • Implement custom themes for charts using {ggplot2}
  • Evaluate design choices to improve the readability of charts
  • Wrangle data to prepare it for visualization

Getting started

  • Go to the info3312-sp25 organization on GitHub. Click on the repo with the prefix hw-04. It contains the starter documents you need to complete the homework.

  • Clone the repo and start a new project in RStudio.

General guidance

Guidelines + tips

As we’ve discussed in lecture, your plots should include an informative title and labeled axes, and you should give careful consideration to aesthetic choices.

Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. There will be periodic reminders in this assignment to render, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.

Workflow + formatting

Make sure to

  • Update author name on your document.
  • Label all code chunks informatively and concisely.
  • Follow the Tidyverse code style guidelines.
  • Make at least 3 commits.
  • Resize figures where needed; avoid tiny or huge plots.
  • Turn in an organized, well-formatted document.

Packages

library(tidyverse)
library(scales)
library(palmerpenguins)
library(rvest)
library(colorspace)
library(ggtext)

Exercises

Exercise 1

Design an ugly theme. Create a custom theme for {ggplot2} that is intended to look as ugly as possible. Break all the design rules we’ve learned in the class. Leverage your understanding of theme options, colors, fonts, etc. to make it truly hideous.

Apply your theme to this plot:

ggplot(
  data = penguins,
  mapping = aes(
    x = flipper_length_mm,
    y = body_mass_g,
    color = species
  )
) +
  geom_point(
    mapping = aes(
      shape = species
    ),
    size = 3,
    alpha = 0.8
  ) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(labels = label_number(scale_cut = cut_si(unit = "mm"))) +
  scale_y_continuous(labels = label_number(scale_cut = cut_si(unit = "g"))) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length",
    y = "Body mass",
    color = "Penguin species",
    shape = "Penguin species"
  )

Tip

You can annotate the graph and modify the visual appearance of scales and themes in the plot, but do not change the data-components of the plot (e.g. it still needs to be a color-coded scatterplot showing the relationship between flipper length and body mass by species).
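As a starting point, a custom theme is usually written as a function that begins with a complete built-in theme and overrides selected elements. The sketch below shows only that structure. The function name theme_custom() and every setting in it are illustrative assumptions, not a suggested answer, and the example plot uses the built-in mtcars data rather than the penguins plot.

```r
library(ggplot2)

# Hypothetical helper: the name and all settings are illustrative only
theme_custom <- function(base_size = 12, base_family = "") {
  # Start from a complete built-in theme, then override selected elements
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      plot.title = element_text(face = "bold", size = rel(1.4)),
      plot.title.position = "plot",
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

# Apply it like any built-in theme
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Example plot with a custom theme") +
  theme_custom()
```

Wrapping the overrides in a function keeps the theme reusable, so the same call can be added to any plot in the document.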

Exercise 2

Design a beautiful theme. Use the same graph from exercise 1, but this time create a beautiful, interpretable, aesthetically pleasing theme. Again consider how you can make use of theme options, colors, fonts, annotations, etc. to make an effective chart.

Along with the plot, create accessible alternative text for the chart using the techniques we discussed in class.

Including the alternative text

Quarto’s fig-alt option works for HTML-based rendering formats. PDF documents are generally considered a poor choice for accessible content (though, notably, Adobe has made strides in this area), and Gradescope does not provide accessibility tools when we grade submissions. So that graders can see your alternative text, we have configured the code chunk for exercise 2 to use echo: fenced. This allows the graders to see not only the code used for your chart, but also the Quarto code chunk options.
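For illustration, the chunk options might look roughly like the sketch below. In a .qmd file these lines sit at the top of an {r} code chunk; here they are shown as plain R, where the #| lines are simply comments. The label, the alternative text, and the mtcars plot are all made-up placeholders, not the exercise 2 chart.

```r
#| label: fig-mtcars-example
#| fig-alt: |
#|   Scatterplot of car weight against fuel efficiency for the 32 cars in
#|   the mtcars data. Heavier cars tend to have lower miles per gallon.
#| echo: fenced

library(ggplot2)

# Placeholder plot; the alt text above describes this chart, not the penguins one
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Heavier cars are less fuel efficient",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon"
  )

p
```

With echo: fenced, the rendered output includes the chunk delimiters and the remaining #| options alongside the code, which is what makes the alternative text visible in a PDF.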

Exercise 3

Improve the axis tick mark labels. Suppose you are a Tolkien enthusiast who wants to better understand how often the members of the Fellowship of the Ring speak in the original book trilogy. You are also a nerd who wants to ensure people know the race of each member.1 You have the data to visualize the number of words spoken by each member along with their race, but the plot is not as readable as you would like.

# load LOTR data
# source: https://github.com/MokoSan/FSharpAdvent/blob/master/Data/WordsByCharacter.csv
lotr_words <- read_csv(file = "data/LOTRWordsByCharacter.csv")

lotr_words |>
  summarize(
    n_words = sum(Words),
    .by = c(Character, Race)
  ) |>
  # keep only members of the Fellowship of the Ring
  filter(Character %in% c("Frodo", "Sam", "Merry", "Pippin", "Gandalf", "Aragorn", "Legolas", "Gimli", "Boromir")) |>
  mutate(Character = str_glue("{Character} ({Race})") |>
           fct_reorder(.x = n_words, .desc = TRUE)) |>
  ggplot(mapping = aes(x = Character, y = n_words)) +
  geom_col() +
  labs(
    x = "Character (race)",
    y = "Number of words spoken",
    title = "Number of words spoken by members of the Fellowship of the Ring",
    subtitle = "Data from the Lord of the Rings trilogy of books"
  )

Alas, we encounter a common problem when visualizing categorical data. The labels on the \(x\)-axis are too long and overlap. This makes it difficult to read the chart.

Propose and implement at least 4 different solutions to improve the readability of the chart. For each method, implement the change and describe the advantages and disadvantages of the approach.
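To show the general shape of one such change, the sketch below wraps long axis labels with label_wrap() from {scales}. It uses a made-up toy data frame rather than the LOTR data, the width of 12 characters is an arbitrary assumption, and it is only one of many possible directions.

```r
library(ggplot2)
library(scales)

# Toy data with deliberately long category labels (made up for illustration)
toy <- data.frame(
  category = c(
    "A fairly long category name",
    "Another rather long label",
    "Short"
  ),
  n = c(10, 7, 3)
)

# Wrap each label onto multiple lines of roughly 12 characters
p <- ggplot(data = toy, mapping = aes(x = category, y = n)) +
  geom_col() +
  scale_x_discrete(labels = label_wrap(width = 12))

p
```

Wrapping keeps the bars vertical and the text horizontal, at the cost of taller axis labels; whether that trade-off suits a given chart is part of what the exercise asks you to evaluate.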

Exercise 4

Towards the EGOT. The Emmy, Grammy, Oscar, and Tony Awards are four of the most prestigious awards in the entertainment industry. Winning all four of these awards is considered a significant accomplishment in American show business.

A GIF of Tracy Jordan from 30 Rock saying 'I've got to EGOT' to Whoopi Goldberg.

As of this date, only 21 people have achieved this feat. We want to visualize the winners and the time it took for them to earn an EGOT.

You can find a list of all EGOT winners and the years in which they won each award on Wikipedia. Scrape the data from the first table and clean it to reproduce the visualization below.
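The general scraping pattern with {rvest} can be sketched as follows. To stay self-contained, the example parses a tiny inline HTML table built with minimal_html() instead of the live Wikipedia page; the names and years in it are made up, and a real table will likely need more cleaning (footnote markers, merged cells, and so on).

```r
library(rvest)
library(tidyr)

# Toy HTML standing in for a scraped page; all names and years are made up
page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Emmy</th><th>Grammy</th></tr>
    <tr><td>Person A</td><td>1990</td><td>1995</td></tr>
    <tr><td>Person B</td><td>2001</td><td>1999</td></tr>
  </table>
')

# For a live page, minimal_html(...) would be replaced by read_html() on the URL
first_table <- page |>
  html_element("table") |> # first <table> element on the page
  html_table()             # convert to a tibble, guessing column types

# Reshape to one row per person and award, a convenient form for plotting
awards_long <- first_table |>
  pivot_longer(cols = -Name, names_to = "award", values_to = "year")

awards_long
```

html_element() returns only the first match, which is why it suits a task that targets the first table on a page.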

Some useful hints
  • The color palette used in the plot is “Dark 2” from the {colorspace} package.
  • The font is Roboto Condensed.
  • Some individuals won multiple awards in the same or consecutive years. To ensure each point is still visible, we offset each point based approximately on when during the year the awards ceremony is held. For the purposes of calculating these offsets, we assume the Emmy Awards are held in September, the Grammy Awards in January, the Academy Awards in February, and the Tony Awards in June.2
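One way the month-based offset could be computed is sketched below. The ceremony months come from the hint above, but the exact formula, year + (month - 1) / 12, is an assumption, and the wins shown are made up for a hypothetical person. The last line also shows how four colors can be drawn from the “Dark 2” palette.

```r
library(dplyr)
library(tibble)
library(colorspace)

# Assumed ceremony months, taken from the hint above
award_months <- tribble(
  ~award,   ~month,
  "Emmy",   9,
  "Grammy", 1,
  "Oscar",  2,
  "Tony",   6
)

# Made-up wins for a hypothetical person, all in the same year
wins <- tribble(
  ~name,      ~award,   ~year,
  "Person A", "Grammy", 2000,
  "Person A", "Oscar",  2000,
  "Person A", "Emmy",   2000
)

wins_offset <- wins |>
  left_join(award_months, by = join_by(award)) |>
  # shift each point by the fraction of the year before the ceremony month
  mutate(year_offset = year + (month - 1) / 12)

# Four qualitative colors from the "Dark 2" palette, one per award
award_colors <- qualitative_hcl(n = 4, palette = "Dark 2")
```

Plotting year_offset instead of year separates awards won in the same year while keeping every point inside the correct calendar year.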

Generative AI (GAI) self-reflection

As stated in the syllabus, include a written reflection for this assignment of how you used GAI tools (e.g. what tools you used, how you used them to assist you with writing code), what skills you believe you acquired, and how you believe you demonstrated mastery of the learning objectives.

Wrap up

Submission

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials \(\rightarrow\) Cornell University NetID and log in using your NetID credentials.
  • Click on your INFO 3312 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”).

Grading

  • Exercise 1: 7 points
  • Exercise 2: 13 points
  • Exercise 3: 8 points
  • Exercise 4: 22 points
  • Total: 50 points

Footnotes

  1. Gandalf is not a “Wizard”, he’s Ainur.↩︎

  2. Historically these are the usual times of year for the ceremonies, though sometimes there are exceptions. Notably Elton John won his Emmy at the 75th Primetime Emmy Awards. Ordinarily that ceremony would have been held in September 2023 but due to ongoing labor disputes the ceremony was delayed until January 2024.↩︎