Presentation-ready plots

Lecture 11

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2024

February 29, 2024

Announcements

  • Project 01 due tomorrow
    • Presentation
      • Teaching team (me)
      • Peers
    • Write-up
    • Reproducibility, style, and organization
  • Homework 04 deferred to next week

Visualization critique

U.S. Median House Prices vs. Income

  • What is the story?
  • How does the visualization use annotations? How effective are they?

Telling a story

Multiple ways of telling a story

  • Sequential plots: Motivation, then resolution

  • A single plot: Resolution, with the motivation hidden in it

Project note: You’re asked to create two plots per question. One possible approach: start with a plot showing the raw data, then show derived quantities (e.g., percent increases, averages, coefficients of fitted models) in the subsequent plot.
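
A minimal sketch of the raw-then-derived approach, assuming the gapminder data used later in this lecture rather than your project data. The object names (p_raw, life_exp_median, p_derived) and the choice of the median as the derived quantity are illustrative assumptions, not project requirements.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Plot 1 (motivation): the raw data, one line per country
p_raw <- ggplot(
  data = gapminder,
  mapping = aes(x = year, y = lifeExp, group = country)
) +
  geom_line(alpha = 0.2) +
  labs(x = NULL, y = "Life expectancy (years)")

# Derived quantity: median life expectancy per continent and year
life_exp_median <- gapminder |>
  group_by(continent, year) |>
  summarize(median_life_exp = median(lifeExp), .groups = "drop")

# Plot 2 (resolution): the summarized trend
p_derived <- ggplot(
  data = life_exp_median,
  mapping = aes(x = year, y = median_life_exp, color = continent)
) +
  geom_line() +
  labs(x = NULL, y = "Median life expectancy (years)", color = NULL)
```

Showing p_raw on one slide and p_derived on the next follows the sequential "motivation, then resolution" pattern.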

Simplicity vs. complexity

When you try to show too much data at once, you may end up not showing anything.

  • Never assume your audience can rapidly process complex visual displays

  • Don’t add variables to your plot that are tangential to your story

  • Don’t jump straight to a highly complex figure; first show an easily digestible subset (e.g., show one facet first)

  • Aim for memorable, but clear
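
One way to show an easily digestible subset first might look like the sketch below. It assumes the gapminder data and picks Asia as the first facet purely for illustration; p_one and p_all are hypothetical names.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Step 1: a single continent, shown on its own slide
gapminder_asia <- filter(.data = gapminder, continent == "Asia")

p_one <- ggplot(
  data = gapminder_asia,
  mapping = aes(x = year, y = lifeExp, group = country)
) +
  geom_line(alpha = 0.4) +
  labs(title = "Asia", x = NULL, y = "Life expectancy (years)")

# Step 2: the full faceted figure, shown once the audience knows how to read one panel
p_all <- ggplot(
  data = gapminder,
  mapping = aes(x = year, y = lifeExp, group = country)
) +
  geom_line(alpha = 0.4) +
  facet_wrap(facets = vars(continent)) +
  labs(x = NULL, y = "Life expectancy (years)")
```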

Project note: Make sure to leave time to iterate on your plots after you practice your presentation. If certain plots are getting too wordy to explain, take time to simplify them!

Consistency vs. repetitiveness

Be consistent but don’t be repetitive.

  • Use consistent features throughout plots (e.g., same color represents same level on all plots)

  • Aim to use a different type of visualization for each distinct analysis
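
A sketch of keeping colors consistent across two different plot types, assuming the gapminder data. The palette values and the names continent_colors, p_box_color, and p_scatter_color are illustrative choices, not part of the lecture materials.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gapminder_07 <- filter(.data = gapminder, year == 2007)

# Define the palette once as a named vector so every plot reuses it
continent_colors <- c(
  Africa   = "#1b9e77",
  Americas = "#d95f02",
  Asia     = "#7570b3",
  Europe   = "#e7298a",
  Oceania  = "#66a61e"
)

# Boxplot: continent mapped to fill
p_box_color <- ggplot(
  data = gapminder_07,
  mapping = aes(x = continent, y = lifeExp, fill = continent)
) +
  geom_boxplot(show.legend = FALSE) +
  scale_fill_manual(values = continent_colors)

# Scatterplot: continent mapped to color, same palette
p_scatter_color <- ggplot(
  data = gapminder_07,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent)
) +
  geom_point() +
  scale_color_manual(values = continent_colors)
```

Because the vector is named, each continent keeps its color even if a plot shows only a subset of continents.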

Project note: If possible, ask a friend who is not in the class to listen to your presentation and then ask them what they remember. Then, ask yourself: is that what you wanted them to remember?

Designing effective visualizations

Keep it simple

Judging relative area

Use color to draw attention

Clarify the story

Leave out non-story details

Order matters

Clearly indicate missing data

Reduce cognitive load

Use descriptive titles

Annotate figures

Untangle a messy line chart

Online restaurant reservations

# A tibble: 3,420 × 5
   type    name          abbrev date       pct_change
   <chr>   <chr>         <chr>  <date>          <dbl>
 1 country United States US     2020-04-01      -1.00
 2 country United States US     2020-04-02      -1.00
 3 country United States US     2020-04-03      -1.00
 4 country United States US     2020-04-04      -1.00
 5 country United States US     2020-04-05      -1.00
 6 country United States US     2020-04-06      -1   
 7 country United States US     2020-04-07      -1.00
 8 country United States US     2020-04-08      -1.00
 9 country United States US     2020-04-09      -1   
10 country United States US     2020-04-10      -1.00
# ℹ 3,410 more rows

Highlight specific areas

Small multiples

Incorporate geography

Tell a different story

Project workflow overview

Demo

proj-01

Plot layout

Sample plots

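library(tidyverse)
library(ggrepel)   # provides geom_text_repel()
library(patchwork) # combines plots in the layouts that follow
library(gapminder)

# keep only the most recent year of data
gapminder_07 <- filter(.data = gapminder, year == 2007)

# histogram of life expectancy
p_hist <- ggplot(data = gapminder_07, mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)
# boxplots of life expectancy by continent
p_box <- ggplot(data = gapminder_07, mapping = aes(x = continent, y = lifeExp)) +
  geom_boxplot()
# scatterplot of life expectancy against GDP per capita
p_scatter <- ggplot(data = gapminder_07, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
# country labels for the Americas; clip = "off" lets labels extend past the panel
p_text <- gapminder_07 |>
  filter(continent == "Americas") |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_text_repel(mapping = aes(label = country)) +
  coord_cartesian(clip = "off")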

Slide with single plot, little text

The plot will fill the empty space in the slide.

p_hist

Slide with single plot, lots of text

  • If there is more text on the slide

  • The plot will shrink

  • To make room for the text

p_hist

Small fig-width

For a zoomed-in look

```{r}
#| fig-width: 3
#| fig-asp: 0.618

p_hist
```

Large fig-width

For a zoomed-out look

```{r}
#| fig-width: 10
#| fig-asp: 0.618

p_hist
```

fig-width affects text size

Multiple plots on a slide

First, ask yourself, must you include multiple plots on a slide? For example, is your narrative about comparing results from two plots?

  • If no, then don’t! Move the second plot to the next slide!

  • If yes:

    • Insert columns using the Insert anything tool

    • Use layout-ncol chunk option

    • Use the patchwork package

    • Possibly, use pivoting to reshape your data and then use facets
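
The pivot-then-facet option could be sketched as follows, assuming the gapminder data. The column names measure and value, and the choice of lifeExp and gdpPercap as the two variables to compare, are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(gapminder)

gapminder_07 <- filter(.data = gapminder, year == 2007)

# Reshape so each row is one (country, measure) pair
gapminder_long <- gapminder_07 |>
  pivot_longer(
    cols = c(lifeExp, gdpPercap),
    names_to = "measure",
    values_to = "value"
  )

# One panel per measure; free y scales because the units differ
p_facet <- ggplot(
  data = gapminder_long,
  mapping = aes(x = continent, y = value)
) +
  geom_boxplot() +
  facet_wrap(facets = vars(measure), scales = "free_y")
```

Compared with combining separate plots, facets give shared styling and a shared x-axis for free, but they only fit when the panels use the same geom and mappings.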

Columns

Insert > Slide Columns

Quarto will automatically resize your plots to fit side-by-side.

layout-ncol

```{r}
#| fig-width: 5
#| fig-asp: 0.618
#| layout-ncol: 2
#| out-width: 100%

p_hist
p_scatter
```

patchwork

```{r}
#| fig-width: 7
#| fig-asp: 0.4

p_hist + p_scatter
```

patchwork layout I

(p_hist + p_box) /
  (p_scatter + p_text)

patchwork layout II

p_text / (p_hist + p_box + p_scatter)

patchwork layout III

p_text + p_hist + p_box + p_scatter +
  plot_annotation(title = "Gapminder", tag_levels = c("A"))

patchwork layout IV

p_text +
  {
    p_hist + {
      p_box + p_scatter + plot_layout(ncol = 1) + plot_layout(tag_level = "new")
    }
  } +
  plot_layout(ncol = 1) +
  plot_annotation(tag_levels = c("1", "a"), tag_prefix = "Fig ")

More patchwork


Learn more at https://patchwork.data-imaginist.com.

Wrap up

  • Use data to effectively tell a story
  • Use the right plot(s) for your story
  • Ensure plots are clearly legible and interpretable to the audience

library(tidyverse)
library(rvest)
library(tvthemes)

# get episode ratings for season 1
ratings_page <- read_html(x = "https://www.imdb.com/title/tt9018736/episodes/?ref_=tt_eps_sm")

# extract elements
ratings_raw <- tibble(
  episode = html_elements(x = ratings_page, css = ".bblZrR .ipc-title__text") |>
    html_text2(),
  rating = html_elements(x = ratings_page, css = ".ratingGroup--imdb-rating") |>
    html_text2()
)

# clean data
ratings <- ratings_raw |>
  # separate episode number and title
  separate_wider_delim(
    cols = episode,
    delim = " ∙ ",
    names = c("episode_number", "episode_title")
  ) |>
  separate_wider_delim(
    cols = episode_number,
    delim = ".",
    names = c("season", "episode_number")
  ) |>
  # separate rating and number of votes
  separate_wider_delim(
    cols = rating,
    delim = " ",
    names = c("rating", "votes")
  ) |>
  # convert numeric variables
  mutate(
    across(
      .cols = -episode_title,
      .fns = parse_number
    ),
    votes = votes * 1e03
  )

# draw the plot
ratings |>
  # generate x-axis tick mark labels with title and episode number
  mutate(
    episode_title = str_glue("{episode_title}\n(S{season}E{episode_number})"),
    episode_title = fct_reorder(.f = episode_title, .x = episode_number)
  ) |>
  # draw a lollipop chart
  ggplot(mapping = aes(x = episode_title, y = rating)) +
  geom_point(mapping = aes(size = votes)) +
  geom_segment(
    mapping = aes(
      x = episode_title, xend = episode_title,
      y = 0, yend = rating
    )
  ) +
  # adjust the size scale
  scale_size(range = c(3, 8)) +
  # label the chart
  labs(
    title = "Live-action Avatar The Last Airbender is decent",
    x = NULL,
    y = "IMDB rating",
    caption = "Source: IMDB"
  ) +
  # use an Avatar theme
  theme_avatar(
    # custom font
    title.font = "Slayer",
    text.font = "Slayer",
    legend.font = "Slayer",
    # shrink legend text size
    legend.title.size = 8,
    legend.text.size = 6
  ) +
  theme(
    # remove undesired grid lines
    panel.grid.major.x = element_blank(),
    panel.grid.minor.y = element_blank(),
    # move legend to the top
    legend.position = "top",
    # align title flush with the edge
    plot.title.position = "plot",
    # shrink x-axis text labels to fit
    axis.text.x = element_text(size = rel(x = 0.7))
  )

It was decent