Annotating charts

Lecture 10

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2025

February 25, 2025

Announcements

Homework 03
Project proposal grades/feedback
Project 01 presentations
- Clarifications
- Presentation times

Visualization critique

What is the story?
How does the visual design impact interpretability?

Axes

Axis breaks

How can the following figure be improved with custom breaks in axes, if at all?

Context matters

pac_plot +
  scale_x_continuous(breaks = seq(from = 2000, to = 2020, by = 2))

Conciseness matters

pac_plot +
  scale_x_continuous(breaks = seq(2000, 2020, 4))

Precision matters

pac_plot +
  scale_x_continuous(breaks = seq(2000, 2020, 4)) +
  labs(x = "Election year")

Fretting the little things

Little details matter

Obsession with tiny details

Human-focused design

“This is what customers pay us for – to sweat all these details so it’s easy and pleasant for them to use our computers.”

Graph details: Redundant coding
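
One reading of redundant coding is mapping the same variable to more than one aesthetic, so groups stay distinguishable without relying on color alone. The sketch below is a minimal illustration, not the figure from the original slide; it assumes the `gapminder` data used later in this lecture, and the choice of `continent` as the grouping variable is only an example.

```r
library(ggplot2)
library(gapminder)

# Keep a single year so each country appears once
gapminder_2007 <- subset(gapminder, year == 2007)

# Map continent to BOTH color and shape (redundant coding)
redundant_plot <- ggplot(
  gapminder_2007,
  aes(
    x = gdpPercap, y = lifeExp,
    color = continent, shape = continent
  )
) +
  geom_point() +
  # Identical legend titles let ggplot2 merge the two legends into one
  labs(color = "Continent", shape = "Continent")

redundant_plot
```

Because color and shape carry the same information, the chart remains readable in grayscale or for readers with color vision deficiencies.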

Graph details: Consistent ordering
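
A common case of consistent ordering is making the legend order match the order in which lines end on the right side of the plot. The sketch below is one possible approach, assuming the `gapminder` data and the {forcats} package; it is an illustration rather than the figure shown on the original slide.

```r
library(ggplot2)
library(forcats)
library(gapminder)

# Average life expectancy per continent and year
continent_means <- aggregate(
  lifeExp ~ continent + year,
  data = gapminder,
  FUN = mean
)

# Reorder continent levels by the value at the most recent year,
# so legend order matches the top-to-bottom order of the line endpoints
continent_means$continent <- fct_reorder2(
  continent_means$continent,
  continent_means$year,
  continent_means$lifeExp
)

ordered_plot <- ggplot(
  continent_means,
  aes(x = year, y = lifeExp, color = continent)
) +
  geom_line()

ordered_plot
```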

Annotating plots

How can plots be annotated to enhance their clarity and interpretability?

Text
Arrows/lines
Rectangles
Colors/fills

Text in plots

Including text on a plot

Label actual data points

geom_text(), geom_label(), geom_text_repel(), etc.

Label actual data points

library(tidyverse)
library(gapminder)

gapminder_europe <- gapminder |>
  filter(
    year == 2007,
    continent == "Europe"
  )

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_text(aes(label = country))

Label actual data points

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_label(aes(label = country))

Solution 1: Repel labels

library(ggrepel)

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_text_repel(aes(label = country))

Solution 1: Repel labels

library(ggrepel)

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_label_repel(aes(label = country))

Solution 2a: Don’t use so many labels

gapminder_europe <- gapminder_europe |>
  mutate(
    should_be_labeled = country %in% c(
      "Albania",
      "Norway",
      "Hungary"
    )
  )

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_label_repel(
    data = filter(
      gapminder_europe,
      should_be_labeled == TRUE
    ),
    aes(label = country)
  )

Solution 2b: Use other aesthetics too

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(aes(color = should_be_labeled)) +
  geom_label_repel(
    data = filter(
      gapminder_europe,
      should_be_labeled == TRUE
    ),
    aes(
      label = country,
      fill = should_be_labeled
    ),
    color = "white"
  ) +
  scale_color_manual(values = c(
    "grey50",
    "red"
  )) +
  scale_fill_manual(values = c("red")) +
  guides(color = "none", fill = "none")

(Highlight non-text things too!)

# Color just Oceania
gapminder_highlighted <- gapminder |>
  mutate(
    is_oceania = continent == "Oceania"
  )

ggplot(
  gapminder_highlighted,
  aes(
    x = year, y = lifeExp,
    group = country,
    color = is_oceania,
    linewidth = is_oceania
  )
) +
  geom_line() +
  scale_color_manual(values = c(
    "grey70",
    "red"
  )) +
  # Lines use `linewidth` rather than `size` as of ggplot2 3.4.0
  scale_linewidth_manual(values = c(0.1, 0.5)) +
  guides(color = "none", linewidth = "none") +
  theme_minimal()

Including text on a plot

Label actual data points

geom_text(), geom_label(), geom_text_repel(), etc.

Add arbitrary annotations

annotate()

Adding arbitrary annotations

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "text",
    x = 40000, y = 76,
    label = "Some text!"
  )

Adding arbitrary annotations

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "label",
    x = 40000, y = 76,
    label = "Some text!"
  )

Any geom works

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  # This is evil though!!!
  # We just invented a point
  annotate(
    geom = "point",
    x = 40000, y = 76
  )

Any geom works

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "rect",
    xmin = 30000, xmax = 50000,
    ymin = 78, ymax = 82,
    fill = "red", alpha = 0.2
  )

Use multiple annotations

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "rect",
    xmin = 30000, xmax = 50000,
    ymin = 78, ymax = 82,
    fill = "red", alpha = 0.2
  ) +
  annotate(
    geom = "label",
    x = 40000, y = 76.5,
    label = "Rich and long-living"
  ) +
  annotate(
    geom = "segment",
    x = 40000, xend = 40000,
    y = 76.8, yend = 77.8,
    arrow = arrow(
      length = unit(0.1, "in")
    )
  )

World development indicators

Rows: 6,020
Columns: 9
$ iso2c         <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF"…
$ iso3c         <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
$ country       <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "…
$ year          <dbl> 2001, 1998, 2009, 2000, 2012, 1996, 1999, 2002, 2003, 2004, 2005, 2006, 2007…
$ population    <dbl> 19688632, 18493132, 27385307, 19542982, 30466479, 17106595, 19262847, 210002…
$ co2_emissions <dbl> 0.05529272, 0.07126970, 0.23950690, 0.05516661, 0.33506104, 0.08226652, 0.05…
$ gdp_per_cap   <dbl> NA, NA, 490.2728, NA, 570.6761, NA, NA, 344.2242, 347.4152, 338.7394, 363.54…
$ region        <chr> "South Asia", "South Asia", "South Asia", "South Asia", "South Asia", "South…
$ income        <chr> "Low income", "Low income", "Low income", "Low income", "Low income", "Low i…

Clean and reshape data

# A tibble: 170 × 9
   iso3c country            region income rank_1995 rank_2020 rank_diff big_change better_big_change
   <chr> <chr>              <chr>  <chr>      <dbl>     <dbl>     <dbl> <lgl>      <chr>            
 1 ZWE   Zimbabwe           Sub-S… Lower…        75        39       -36 TRUE       Rank improved    
 2 DNK   Denmark            Europ… High …       160       127       -33 TRUE       Rank improved    
 3 SWE   Sweden             Europ… High …       132       100       -32 TRUE       Rank improved    
 4 SYR   Syrian Arab Repub… Middl… Low i…        96        64       -32 TRUE       Rank improved    
 5 MLT   Malta              Middl… High …       128        99       -29 FALSE      Rank changed a l…
 6 EST   Estonia            Europ… High …       161       133       -28 FALSE      Rank changed a l…
 7 UKR   Ukraine            Europ… Lower…       139       111       -28 FALSE      Rank changed a l…
 8 YEM   Yemen, Rep.        Middl… Low i…        53        25       -28 FALSE      Rank changed a l…
 9 VEN   Venezuela, RB      Latin… Not c…       119        92       -27 FALSE      Rank changed a l…
10 SWZ   Eswatini           Sub-S… Lower…        79        53       -26 FALSE      Rank changed a l…
# ℹ 160 more rows
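
The code that produced this table is not shown in the slides. As a rough sketch of how `rank_1995`, `rank_2020`, and `rank_diff` could be derived, the example below uses a made-up three-country data set (`toy_wdi` and its values are hypothetical). It assumes rank 1 means the lowest emissions within a year and that `rank_diff` is the 2020 rank minus the 1995 rank, which matches the rows above. The rule behind `big_change` is not stated in the source, so it is left out.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the indicators data (values are made up)
toy_wdi <- tibble(
  iso3c = rep(c("AAA", "BBB", "CCC"), each = 2),
  year = rep(c(1995, 2020), times = 3),
  co2_emissions = c(1.0, 3.0, 2.0, 2.5, 3.0, 1.5)
)

toy_ranks <- toy_wdi |>
  group_by(year) |>
  # Rank 1 = lowest emissions within each year
  mutate(rank = min_rank(co2_emissions)) |>
  ungroup() |>
  select(iso3c, year, rank) |>
  # One row per country, one rank column per year
  pivot_wider(
    names_from = year,
    values_from = rank,
    names_prefix = "rank_"
  ) |>
  # Negative values mean the country moved toward rank 1
  mutate(rank_diff = rank_2020 - rank_1995)

toy_ranks
```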

Basic plot

Application exercise

`ae-09`: Improving the chart through annotation

Instructions

Brainstorm methods to improve the readability and interpretability of the chart through annotations

Points to emphasize

What is a “good” rank? What is a “bad” rank?
Note
- 1 is the lowest carbon emissions per capita
- 170 is the highest carbon emissions per capita
Which countries have significantly improved or worsened their rank?
What other aspects do you feel should be emphasized?

Methods for annotation

Text labels
Arrows/lines
Rectangles
Colors/fills

Wrap up

Recap

Visual storytelling requires a combination of data, visualization, and annotation
Attention to detail is key to ensure that the message is clear
Annotation is a powerful method for enhancing clarity and interpretability of plots
{ggplot2} has powerful annotation tools
Alternatively, export to a vector format and use a vector graphics editor (e.g., Illustrator, Inkscape)
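
For the export route, a minimal sketch using `ggsave()` is shown here. The file name, dimensions, and the choice of PDF are illustrative assumptions. SVG output works the same way with a `.svg` extension, but it relies on the {svglite} package being installed.

```r
library(ggplot2)
library(gapminder)

europe_2007 <- subset(gapminder, year == 2007 & continent == "Europe")

europe_plot <- ggplot(europe_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Write to a temporary PDF (a vector format); path and size are illustrative
out_file <- file.path(tempdir(), "europe-plot.pdf")
ggsave(out_file, plot = europe_plot, width = 6, height = 4, units = "in")
```

The resulting file can be opened in a vector graphics editor, where text, arrows, and shapes remain individually editable.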

Acknowledgements

Slides derived in part from Data Visualization with R and licensed under Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0)
