Lecture 14
Cornell University
INFO 3312/5312 - Spring 2025
March 13, 2025
Image credit: Edward Tufte
Above all else show the data.
Source: The Visual Display of Quantitative Information, ch 1
Goal is to maximize the data-ink ratio
\[\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}\]
Proportion of a graphic’s ink devoted to the non-redundant display of data-information
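As a worked illustration with made-up numbers (not taken from the lecture or from Tufte): suppose a chart uses 120 units of ink in total, and only 30 of them draw non-redundant data, with the rest going to borders, grid lines, and background shading. Then

\[\text{Data-ink ratio} = \frac{30}{120} = 0.25\]

Erasing 60 units of non-data-ink while keeping the same 30 units of data-ink raises the ratio to \(30 / 60 = 0.5\), without removing any information from the graphic.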
What parts of a graph are data-ink?
#| include: false
library(tidyverse)
library(palmerpenguins)
library(ggthemes)

penguins <- drop_na(penguins) |>
  mutate(species = fct_infreq(species))

p <- ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)"
  )
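One way to explore which parts of the scatterplot are data-ink is to erase candidate elements and see whether any information is lost. The sketch below is an illustration, not the lecture's own answer: it rebuilds the plot so it runs on its own, and it assumes that the gray panel background, the grid lines, and the axis ticks are non-data-ink, which is a judgment call. The names `penguins_complete`, `p_base`, and `p_min` are introduced here for the example.

```r
library(ggplot2)
library(palmerpenguins)

# Rebuild the base scatterplot so this sketch is self-contained
penguins_complete <- na.omit(penguins)

p_base <- ggplot(penguins_complete, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

# Assumption: background, grid lines, and ticks are treated as non-data-ink
p_min <- p_base +
  theme(
    panel.background = element_blank(),
    panel.grid = element_blank(),
    axis.ticks = element_blank()
  )

p_min
```

The points themselves are unchanged between `p_base` and `p_min`; only the surrounding ink is removed, so the data-ink ratio goes up.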
ae-13
Instructions
Redesign a boxplot to maximize the share of data-ink and reduce unnecessary duplication
#| warning: false
#| min-lines: 6
#| max-lines: 8
#| fig-width: 9
wdi |>
  mutate(region = fct_reorder(region, life_exp)) |>
  ggplot(mapping = aes(y = life_exp, x = region)) +
  geom_boxplot() +
  scale_x_discrete(labels = label_wrap_gen(width = 15)) +
  labs(
    x = NULL,
    y = "Life expectancy",
    title = "Distribution of life expectancy, by region"
  )
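A possible direction for the redesign, sketched under assumptions: `wdi` is not defined in this section, so the example substitutes the `penguins` data, and it uses `geom_tufteboxplot()` and `theme_tufte()` from ggthemes as one option among several. It is not presented as the intended solution to the exercise.

```r
library(ggplot2)
library(ggthemes)
library(palmerpenguins)

# Stand-in data: `wdi` is not available here, so use complete penguin records
penguins_complete <- na.omit(penguins)

p_tufte <- ggplot(penguins_complete, aes(x = species, y = body_mass_g)) +
  # Defaults: median as a point, whiskers as lines, the box left as white space
  geom_tufteboxplot() +
  # Drops the gray background, grid lines, and panel border
  theme_tufte() +
  labs(
    x = NULL,
    y = "Body mass (g)",
    title = "Distribution of body mass, by species"
  )

p_tufte
```

Compared with `geom_boxplot()`, this version encodes the same five-number summary with far less ink: the box outline is replaced by the gap between the whiskers and the median point.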
Chartjunk: graphical decorations that do not improve the viewer’s understanding of the data