Annotating charts

Lecture 10

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2024

February 22, 2024

Announcements

  • Homework 03
  • Project proposal feedback
  • Preparations for presentations

Agenda for today

  • Axes
  • Annotations

Axes

Axis breaks

How can the following figure be improved with custom breaks in axes, if at all?

Context matters

pac_plot +
  scale_x_continuous(breaks = seq(from = 2000, to = 2020, by = 2))

Conciseness matters

pac_plot +
  scale_x_continuous(breaks = seq(2000, 2020, 4))

Precision matters

pac_plot +
  scale_x_continuous(breaks = seq(2000, 2020, 4)) +
  labs(x = "Election year")
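
`pac_plot` is built elsewhere in the course materials and is not shown in these slides. As a minimal sketch, the stand-in below uses made-up values (the `pac_demo` data frame and its `contributions` column are hypothetical) to show how `seq()` produces the break positions passed to `scale_x_continuous()`.

```r
library(ggplot2)

# Hypothetical stand-in data; the values are invented for illustration only
pac_demo <- data.frame(
  year = seq(2000, 2020, by = 2),
  contributions = c(3, 4, 4, 5, 6, 6, 7, 8, 8, 9, 10)
)

# seq() generates evenly spaced break positions, one every four years
election_breaks <- seq(from = 2000, to = 2020, by = 4)
election_breaks

demo_plot <- ggplot(pac_demo, aes(x = year, y = contributions)) +
  geom_line() +
  scale_x_continuous(breaks = election_breaks) +
  labs(x = "Election year", y = "Contributions (made-up units)")

demo_plot
```

Changing `by` is the only adjustment needed to trade context (more breaks) against conciseness (fewer breaks).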

Fretting the little things

Little details matter

Obsession with tiny details

Human-focused design

“This is what customers pay us for – to sweat all these details so it’s easy and pleasant for them to use our computers.”

Graph details: Redundant coding

Graph details: Consistent ordering
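
The two slides above show figures only. As a rough sketch of both ideas, assuming "redundant coding" means mapping one variable to more than one aesthetic and "consistent ordering" means arranging the legend to match the data, the example below uses the built-in `iris` data; the dataset and the names `iris_ordered` and `redundant_plot` are illustrative choices, not taken from the lecture.

```r
library(ggplot2)

# Order the species by mean sepal width, largest first, so the legend
# runs top to bottom in roughly the same order as the point clusters
iris_ordered <- iris
iris_ordered$Species <- reorder(iris$Species, -iris$Sepal.Width)
levels(iris_ordered$Species)

# Redundant coding: species is encoded by both color and shape
redundant_plot <- ggplot(
  iris_ordered,
  aes(
    x = Sepal.Length, y = Sepal.Width,
    color = Species, shape = Species
  )
) +
  geom_point()

redundant_plot
```

Because color and shape map to the same variable, ggplot2 merges them into a single legend, and the plot stays readable in grayscale or for readers with color vision deficiencies.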

Details matter

Worrying about tiny details in graphs

  • Makes them easier for your audience to understand
  • Improves their beauty
  • Enhances the truth

Text in plots

Including text on a plot

Label actual data points

geom_text(), geom_label(), geom_text_repel(), etc.

Add arbitrary annotations

annotate()

Label actual data points

library(tidyverse)
library(gapminder)

gapminder_europe <- gapminder |>
  filter(
    year == 2007,
    continent == "Europe"
  )

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_text(aes(label = country))

Label actual data points

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_label(aes(label = country))

Solution 1: Repel labels

library(ggrepel)

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_text_repel(aes(label = country))

Solution 1: Repel labels

library(ggrepel)

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_label_repel(aes(label = country))

Solution 2a: Don’t use so many labels

gapminder_europe <- gapminder_europe |>
  mutate(
    should_be_labeled = country %in% c(
      "Albania",
      "Norway",
      "Hungary"
    )
  )

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_label_repel(
    data = filter(
      gapminder_europe,
      should_be_labeled == TRUE
    ),
    aes(label = country)
  )
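
Filtering the data passed to the label layer means the unlabeled points are invisible to ggrepel, so a label can land on top of them. One alternative, sketched here, keeps every row and sets the unwanted labels to an empty string, which ggrepel does not draw but still repels from. The column name `label_text` is made up for this example.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)
library(gapminder)

gapminder_europe <- gapminder |>
  filter(year == 2007, continent == "Europe") |>
  mutate(
    # Blank labels are not drawn, but their points still repel other labels
    label_text = if_else(
      country %in% c("Albania", "Norway", "Hungary"),
      as.character(country),
      ""
    )
  )

blank_label_plot <- ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  geom_text_repel(aes(label = label_text))

blank_label_plot
```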

Solution 2b: Use other aesthetics too

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(aes(color = should_be_labeled)) +
  geom_label_repel(
    data = filter(
      gapminder_europe,
      should_be_labeled == TRUE
    ),
    aes(
      label = country,
      fill = should_be_labeled
    ),
    color = "white"
  ) +
  scale_color_manual(values = c(
    "grey50",
    "red"
  )) +
  scale_fill_manual(values = c("red")) +
  guides(color = "none", fill = "none")
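
The manual scales above rely on the order of the values: the fill scale gets a single color because the filtered label data only contains `TRUE`. A slightly more defensive variant, sketched below, names each value so that points and labels share one palette regardless of which levels appear in a layer. The name `highlight_colors` is an illustrative choice.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)
library(gapminder)

gapminder_europe <- gapminder |>
  filter(year == 2007, continent == "Europe") |>
  mutate(should_be_labeled = country %in% c("Albania", "Norway", "Hungary"))

# One named palette shared by the point colors and the label fills
highlight_colors <- c("FALSE" = "grey50", "TRUE" = "red")

highlight_plot <- ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(aes(color = should_be_labeled)) +
  geom_label_repel(
    data = filter(gapminder_europe, should_be_labeled),
    aes(label = country, fill = should_be_labeled),
    color = "white"
  ) +
  scale_color_manual(values = highlight_colors) +
  scale_fill_manual(values = highlight_colors) +
  guides(color = "none", fill = "none")

highlight_plot
```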

(Highlight non-text things too!)

# Color just Oceania
gapminder_highlighted <- gapminder |>
  mutate(
    is_oceania = continent == "Oceania"
  )

ggplot(
  gapminder_highlighted,
  aes(
    x = year, y = lifeExp,
    group = country,
    color = is_oceania,
    linewidth = is_oceania
  )
) +
  geom_line() +
  scale_color_manual(values = c(
    "grey70",
    "red"
  )) +
  # Line thickness uses the linewidth aesthetic (ggplot2 >= 3.4.0), not size
  scale_linewidth_manual(values = c(0.1, 0.5)) +
  guides(color = "none", linewidth = "none") +
  theme_minimal()

Including text on a plot

Label actual data points

geom_text(), geom_label(), geom_text_repel(), etc.

Add arbitrary annotations

annotate()

Adding arbitrary annotations

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "text",
    x = 40000, y = 76,
    label = "Some text!"
  )

Adding arbitrary annotations

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "label",
    x = 40000, y = 76,
    label = "Some text!"
  )

Any geom works

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  # This is evil though!!!
  # We just invented a point
  annotate(
    geom = "point",
    x = 40000, y = 76
  )

Any geom works

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "rect",
    xmin = 30000, xmax = 50000,
    ymin = 78, ymax = 82,
    fill = "red", alpha = 0.2
  )

Use multiple annotations

ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  annotate(
    geom = "rect",
    xmin = 30000, xmax = 50000,
    ymin = 78, ymax = 82,
    fill = "red", alpha = 0.2
  ) +
  annotate(
    geom = "label",
    x = 40000, y = 76.5,
    label = "Rich and long-living"
  ) +
  annotate(
    geom = "segment",
    x = 40000, xend = 40000,
    y = 76.8, yend = 77.8,
    arrow = arrow(
      length = unit(0.1, "in")
    )
  )
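
Straight segments are not the only option for connecting a label to a region. As a small sketch, `annotate()` also accepts `geom = "curve"`, which draws a curved arrow; the coordinates and the `curvature` value below were picked by eye for illustration and are not from the lecture.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gapminder_europe <- gapminder |>
  filter(year == 2007, continent == "Europe")

curved_plot <- ggplot(
  gapminder_europe,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  # Curved arrow from just above the label toward the cluster of rich countries
  annotate(
    geom = "curve",
    x = 30000, xend = 38000,
    y = 74.5, yend = 78,
    curvature = -0.3,
    arrow = arrow(length = unit(0.1, "in"))
  ) +
  annotate(
    geom = "label",
    x = 30000, y = 74,
    label = "Rich and long-living"
  )

curved_plot
```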

Application exercise

ae-08

  • Go to the course GitHub org and find your ae-08 (repo name will be suffixed with your NetID).
  • Clone the repo in RStudio Workbench, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of tomorrow.

World development indicators

wdi_co2_raw <- read_csv("data/wdi_co2.csv")
wdi_clean <- wdi_co2_raw |>
  filter(region != "Aggregates") |>
  select(iso2c, iso3c, country, year,
    population = SP.POP.TOTL,
    co2_emissions = EN.ATM.CO2E.PC,
    gdp_per_cap = NY.GDP.PCAP.KD,
    region, income
  )

glimpse(wdi_clean)
Rows: 6,020
Columns: 9
$ iso2c         <chr> "AF", "AF", "AF", "AF…
$ iso3c         <chr> "AFG", "AFG", "AFG", …
$ country       <chr> "Afghanistan", "Afgha…
$ year          <dbl> 2001, 1998, 2009, 200…
$ population    <dbl> 19688632, 18493132, 2…
$ co2_emissions <dbl> 0.05529272, 0.0712697…
$ gdp_per_cap   <dbl> NA, NA, 490.2728, NA,…
$ region        <chr> "South Asia", "South …
$ income        <chr> "Low income", "Low in…

Clean and reshape data

co2_rankings <- wdi_clean |>
  # Get rid of smaller countries
  filter(population > 200000) |>
  # Only look at two years
  filter(year %in% c(1995, 2020)) |>
  # Get rid of all the rows that have missing values in co2_emissions
  drop_na(co2_emissions) |>
  # Look at each year individually and rank countries based on their emissions that year
  mutate(
    ranking = rank(co2_emissions),
    .by = year
  ) |>
  # Only select required columns
  select(iso3c, country, year, region, income, ranking) |>
  # Pivot wider so each year's ranking gets its own column
  pivot_wider(names_from = year, names_prefix = "rank_", values_from = ranking) |>
  # Find the difference in ranking between 2020 and 1995
  mutate(rank_diff = rank_2020 - rank_1995) |>
  # Remove all rows where there's a missing value in the rank_diff column
  drop_na(rank_diff) |>
  # Make an indicator variable that is TRUE if the absolute value of the
  # difference in rankings is at least 30
  mutate(big_change = abs(rank_diff) >= 30) |>
  # Make another indicator variable that indicates if the rank improved by a
  # lot, worsened by a lot, or didn't change much.
  mutate(better_big_change = case_when(
    rank_diff <= -30 ~ "Rank improved",
    rank_diff >= 30 ~ "Rank worsened",
    .default = "Rank changed a little"
  )) |>
  # arrange rows by rank_diff for printing
  arrange(rank_diff)
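
To make the ranking and reshaping steps concrete, here is a toy version of the same pipeline on three invented countries. The `toy` tibble and its values are made up; only the verbs mirror the code above.

```r
library(dplyr)
library(tidyr)

# Invented emissions for three hypothetical countries in two years
toy <- tibble(
  country = c("A", "B", "C", "A", "B", "C"),
  year = c(1995, 1995, 1995, 2020, 2020, 2020),
  co2_emissions = c(1.0, 5.0, 3.0, 4.0, 2.0, 6.0)
)

toy_rankings <- toy |>
  # rank() runs separately within each year; rank 1 is the lowest emitter
  mutate(ranking = rank(co2_emissions), .by = year) |>
  select(country, year, ranking) |>
  # One row per country, one ranking column per year
  pivot_wider(
    names_from = year,
    names_prefix = "rank_",
    values_from = ranking
  ) |>
  # Negative values mean the country moved toward the low-emission end
  mutate(rank_diff = rank_2020 - rank_1995)

toy_rankings
```

Country B drops from rank 3 to rank 1, so its `rank_diff` is -2, while A and C each move up by one position.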

Clean and reshape data

slice_head(wdi_clean, n = 5)
# A tibble: 5 × 9
  iso2c iso3c country    year population co2_emissions gdp_per_cap region income
  <chr> <chr> <chr>     <dbl>      <dbl>         <dbl>       <dbl> <chr>  <chr> 
1 AF    AFG   Afghanis…  2001   19688632        0.0553         NA  South… Low i…
2 AF    AFG   Afghanis…  1998   18493132        0.0713         NA  South… Low i…
3 AF    AFG   Afghanis…  2009   27385307        0.240         490. South… Low i…
4 AF    AFG   Afghanis…  2000   19542982        0.0552         NA  South… Low i…
5 AF    AFG   Afghanis…  2012   30466479        0.335         571. South… Low i…
slice_head(co2_rankings, n = 5)
# A tibble: 5 × 9
  iso3c country           region income rank_1995 rank_2020 rank_diff big_change
  <chr> <chr>             <chr>  <chr>      <dbl>     <dbl>     <dbl> <lgl>     
1 ZWE   Zimbabwe          Sub-S… Lower…        75        39       -36 TRUE      
2 DNK   Denmark           Europ… High …       160       127       -33 TRUE      
3 SWE   Sweden            Europ… High …       132       100       -32 TRUE      
4 SYR   Syrian Arab Repu… Middl… Low i…        96        64       -32 TRUE      
5 MLT   Malta             Middl… High …       128        99       -29 FALSE     
# ℹ 1 more variable: better_big_change <chr>

Basic plot

Improve the graph through annotation

Brainstorm methods to improve the readability and interpretability of the chart through annotations

Points to emphasize

  • What is a “good” rank? What is a “bad” rank?
  • What are the countries that have significantly improved or worsened their rank?
  • What other aspects do you feel should be emphasized?

Methods for annotation

  • Text labels
  • Arrows/lines
  • Rectangles
  • Colors/fills

A complex annotation

Plot the data and annotate

Set the random seed for reproducibility.

set.seed(123)
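
The seed matters because ggrepel places labels with some randomness, so the same code can produce slightly different layouts on each render. As an alternative sketch, the repel geoms also take a `seed` argument that fixes the layout for that layer only. The `cars_demo` data below is a stand-in built from `mtcars`, not the lecture data.

```r
library(ggplot2)
library(ggrepel)

# Stand-in data: the first ten cars in mtcars, with row names as labels
cars_demo <- head(mtcars, 10)
cars_demo$model <- rownames(cars_demo)

seeded_plot <- ggplot(cars_demo, aes(x = wt, y = mpg)) +
  geom_point() +
  # seed makes the label placement reproducible without a global set.seed()
  geom_text_repel(aes(label = model), seed = 123)

seeded_plot
```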

Plot the data and annotate

Initialize the ggplot() object.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
)

Plot the data and annotate

Create a basic scatterplot, color-coded based on rank changes.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  geom_point(aes(color = better_big_change))

Plot the data and annotate

Label points for countries with a “big” change.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  )

Plot the data and annotate

Add a reference line in the background to show where countries would fall if their rank order did not change. Note that xend and yend differ because new countries formed during the 25-year period.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  )

Plot the data and annotate

Annotate the plot to clarify outliers that are improving.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  )

Plot the data and annotate

Annotate the plot to clarify the outliers that are worsening.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  )

Plot the data and annotate

Identify the highest/lowest emitters using rectangular highlighting.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25)

Plot the data and annotate

Add text to define what the rectangles mean.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B")

Plot the data and annotate

Add arrows between the rectangles and labels.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines")))

Plot the data and annotate

Choose a custom color palette to highlight the outliers.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136"))

Plot the data and annotate

Use the same colors for the labels. Only the last two values of the original palette are needed, because only countries with a big change are labeled.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136"))

Plot the data and annotate

Adjust the axis labeling and remove the padding on both axes.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136")) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 175, 25))
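
`expand = c(0, 0)` removes both the multiplicative and the additive padding around the data. For finer control, ggplot2 provides the `expansion()` helper, which can pad each side of an axis separately. The sketch below uses `mtcars` as stand-in data; the 5% value is an arbitrary choice for illustration.

```r
library(ggplot2)

# expansion() returns the padding specification as a numeric vector
no_padding <- expansion(mult = 0, add = 0)
no_padding

padded_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # No padding on either side of the x axis (same effect as expand = c(0, 0))
  scale_x_continuous(expand = no_padding) +
  # No padding at the bottom of the y axis, 5% of the range at the top
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))

padded_plot
```

With zero padding, points that sit exactly on the axis limits are drawn half outside the panel, which is one reason to keep a small expansion on at least one side.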

Plot the data and annotate

Add human-readable titles, labels, etc.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136")) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  labs(
    x = "Rank in 1995", y = "Rank in 2020",
    title = "Changes in CO2 emission rankings between 1995 and 2020",
    subtitle = "Countries that improved or worsened more than 30 positions in the rankings highlighted",
    caption = "Source: The World Bank.\nCountries with populations of less than 200,000 excluded."
  )

Plot the data and annotate

Remove the legends, which are unnecessary here.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136")) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  labs(
    x = "Rank in 1995", y = "Rank in 2020",
    title = "Changes in CO2 emission rankings between 1995 and 2020",
    subtitle = "Countries that improved or worsened more than 30 positions in the rankings highlighted",
    caption = "Source: The World Bank.\nCountries with populations of less than 200,000 excluded."
  ) +
  guides(color = "none", fill = "none")

Plot the data and annotate

Change the base theme and font to match the text labels.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136")) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  labs(
    x = "Rank in 1995", y = "Rank in 2020",
    title = "Changes in CO2 emission rankings between 1995 and 2020",
    subtitle = "Countries that improved or worsened more than 30 positions in the rankings highlighted",
    caption = "Source: The World Bank.\nCountries with populations of less than 200,000 excluded."
  ) +
  guides(color = "none", fill = "none") +
  theme_bw(base_family = "Roboto Condensed")

Plot the data and annotate

Use HTML and Markdown syntax to customize the visual appearance of the title and subtitle.

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136")) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  labs(
    x = "Rank in 1995", y = "Rank in 2020",
    title = "Changes in CO<sub>2</sub> emission rankings between 1995 and 2020",
    subtitle = "Countries that <span style='color: #0074D9'>**improved**</span> or <span style='color: #FF4136'>**worsened**</span> more than 30 positions in the rankings highlighted",
    caption = "Source: The World Bank.\nCountries with populations of less than 200,000 excluded."
  ) +
  guides(color = "none", fill = "none") +
  theme_bw(base_family = "Roboto Condensed")

Plot the data and annotate

Ensure rendering of HTML/Markdown syntax with ggtext::element_markdown().

set.seed(123)

ggplot(
  data = co2_rankings,
  mapping = aes(x = rank_1995, y = rank_2020)
) +
  annotate(geom = "segment", x = 0, xend = max(co2_rankings$rank_1995), y = 0, yend = max(co2_rankings$rank_2020)) +
  geom_point(aes(color = better_big_change)) +
  geom_label_repel(
    data = filter(co2_rankings, big_change == TRUE),
    aes(label = country, fill = better_big_change),
    color = "white", family = "Roboto Condensed"
  ) +
  annotate(
    geom = "text", x = 167, y = 6, label = "Outliers improving",
    family = "Roboto Condensed", fontface = "italic", hjust = 1, color = "grey50"
  ) +
  annotate(
    geom = "text", x = 2, y = 170, label = "Outliers worsening",
    family = "Roboto Condensed", fontface = "italic", hjust = 0, color = "grey50"
  ) +
  annotate(geom = "rect", xmin = 0, xmax = 25, ymin = 0, ymax = 25, 
           fill = "#2ECC40", alpha = 0.25) +
  annotate(geom = "rect", xmin = 150, xmax = 175, ymin = 150, ymax = 175, 
           fill = "#FF851B", alpha = 0.25) +
  annotate(geom = "text", x = 40, y = 6, label = "Lowest emitters", 
           hjust = 0, color = "#2ECC40") +
  annotate(geom = "text", x = 167.5, y = 125, label = "Highest\nemitters", 
           hjust = 0.5, vjust = 1, lineheight = 1, color = "#FF851B") +
  annotate(geom = "segment", x = 38, xend = 20, y = 6, yend = 6, color = "#2ECC40", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  annotate(geom = "segment", x = 167.5, xend = 167.5, y = 130, yend = 155, color = "#FF851B", 
           arrow = arrow(angle = 15, length = unit(0.5, "lines"))) +
  scale_color_manual(values = c("grey50", "#0074D9", "#FF4136")) +
  scale_fill_manual(values = c("#0074D9", "#FF4136")) +
  scale_x_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 175, 25)) +
  labs(
    x = "Rank in 1995", y = "Rank in 2020",
    title = "Changes in CO<sub>2</sub> emission rankings between 1995 and 2020",
    subtitle = "Countries that <span style='color: #0074D9'>**improved**</span> or <span style='color: #FF4136'>**worsened**</span> more than 30 positions in the rankings highlighted",
    caption = "Source: The World Bank.\nCountries with populations of less than 200,000 excluded."
  ) +
  guides(color = "none", fill = "none") +
  theme_bw(base_family = "Roboto Condensed") +
  theme(
    plot.title = ggtext::element_markdown(face = "bold", size = rel(1.6)),
    plot.subtitle = ggtext::element_markdown(size = rel(1.3)),
    plot.margin = unit(c(0.5, 1, 0.5, 0.5), units = "lines")
  )
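
The HTML and Markdown in the title only render once the theme element is switched to `element_markdown()`; with the default `element_text()`, the tags print literally. A stripped-down sketch of the same mechanism, using `mtcars` and an invented title rather than the lecture data:

```r
library(ggplot2)
library(ggtext)

markdown_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#0074D9") +
  labs(
    # The span colors one word; ** makes it bold
    title = "Fuel economy falls as <span style='color: #0074D9'>**weight**</span> rises",
    x = "Weight (1,000 lbs)", y = "Miles per gallon"
  ) +
  # Without this line the HTML tags appear verbatim in the title
  theme(plot.title = element_markdown())

markdown_plot
```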

Final plot

Wrap up

  • Visual storytelling requires a combination of data, visualization, and annotation
  • Attention to detail is key to ensure that the message is clear
  • Annotation is a powerful method for enhancing clarity and interpretability of plots
  • ggplot2 has powerful annotation tools
  • Alternatively, export to a vector format and use a vector graphics editor (e.g. Illustrator, Inkscape)

Please don’t be bad