Annotating charts
- Explain why small details matter in data visualization
- Apply axis break choices to improve readability
- Annotate charts with text labels using geom_text(), geom_label(), and {ggrepel}
- Add arbitrary annotation layers with annotate()
- Apply redundant coding and consistent ordering principles
Axis breaks
The choice of where to place axis tick marks is a small detail that has a large effect on readability. Poorly chosen breaks can hide patterns, force readers to interpolate between ticks, or obscure the time scale of a series.
Here is a time series of PAC contributions with default breaks:
{ggplot2} uses a heuristic to place breaks at aesthetically pleasing locations. The default breaks chosen here appear at 2005, 2010, 2015, and 2020. These aren’t wrong, but they’re not particularly meaningful for political data.
Placing a break every two years aligns the axis with election cycles, matching how this data is naturally collected:
pac_plot +
  scale_x_continuous(breaks = seq(2000, 2024, 2))
1. seq(2000, 2024, 2) generates a sequence from 2000 to 2024 in steps of 2. This produces a break at every election year.
This version is more precise but perhaps visually cluttered. Every four years (presidential election cycles) captures the major structure without the clutter:
pac_plot +
  scale_x_continuous(breaks = seq(2000, 2024, 4)) +
  labs(x = "Election year")
1. Four-year intervals align with presidential election cycles.
2. Updating the axis title to “Election year” clarifies what the breaks represent, so the label matches the break logic.
The lesson: choose breaks that align with how the reader will interpret the data, and adjust axis labels to communicate that logic.
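To make the trade-off between the two break sets concrete, the sketch below prints both sequences in base R. It uses only seq() with the same arguments as the examples above; the variable names are illustrative.

```r
# Candidate break sets for an x-axis spanning 2000 to 2024.
every_two  <- seq(2000, 2024, by = 2)  # a break at every election year
every_four <- seq(2000, 2024, by = 4)  # presidential election years only

length(every_two)   # 13 tick marks
length(every_four)  # 7 tick marks

# Every four-year break is also a two-year break, so the coarser axis
# drops ticks without moving any of the remaining ones.
all(every_four %in% every_two)
```

Roughly halving the number of ticks is what removes the clutter, while the remaining ticks still land on election years.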
Details matter
“This is what customers pay us for — to sweat all these details so it’s easy and pleasant for them to use our computers.”
— Steve Jobs
When Steve Jobs insisted that the interior of the Apple IIe circuit board be beautiful even though no customer would ever see it, he was articulating a principle that applies equally to data visualization: the small details signal whether you have taken care. A reader who cannot see the care in a chart can still feel its absence.
Redundant coding
Using multiple aesthetics simultaneously to encode the same variable is called redundant coding. A color-blind reader who cannot distinguish red from green can still read a chart if shape also encodes the groups.
In the second example, each species is distinguished by both color and shape. We’ve also manually adjusted the mapping of each species to the three colors to minimize overlap between the green and blue points. A reader who cannot perceive color differences can still identify species by shape alone. Adding a second encoding costs nothing but makes the chart more accessible.
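A minimal sketch of redundant coding, assuming the built-in iris data as a stand-in for the species data in the figures (the original dataset and colors are not shown in this section). Species is mapped to both color and shape, so either channel alone identifies the group. The three hex codes come from the Okabe-Ito palette and are an illustrative choice.

```r
library(ggplot2)

# iris is a stand-in dataset; the figures in this section may use different data.
p_redundant <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Sepal.Width, color = Species, shape = Species)
) +
  geom_point(size = 2) +
  # Three Okabe-Ito colors (an assumption, not the colors used in the figures).
  scale_color_manual(values = c("#D55E00", "#0072B2", "#009E73")) +
  labs(x = "Sepal length (cm)", y = "Sepal width (cm)")

p_redundant
```

Because both aesthetics map to the same variable, ggplot2 combines them into a single legend.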
Consistent ordering
When the same variable appears in multiple places — in the chart and in a legend, for example — the order should be consistent:
In the second example, the legend lists companies in the same order as their lines appear at the right edge of the chart. A reader’s eye moves from a line to the legend and finds it immediately. Inconsistent ordering forces the reader to scan — a small friction that accumulates across many lookups.
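One way to get a consistent legend order is to set the factor levels from each group's final value. The sketch below uses hypothetical prices for three made-up companies; the data, names, and object names are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical share prices for three made-up companies.
prices <- data.frame(
  year    = rep(2020:2023, times = 3),
  company = rep(c("Acme", "Birch", "Corvid"), each = 4),
  price   = c(10, 12, 11, 13,   # Acme ends at 13
              20, 24, 27, 30,   # Birch ends at 30
              15, 16, 18, 21),  # Corvid ends at 21
  stringsAsFactors = FALSE
)

# Order the factor levels by each company's value in the final year,
# highest first, so the legend reads top to bottom like the line ends.
last_year   <- prices[prices$year == max(prices$year), ]
level_order <- last_year$company[order(last_year$price, decreasing = TRUE)]
prices$company <- factor(prices$company, levels = level_order)

levels(prices$company)  # "Birch" "Corvid" "Acme"

p_ordered <- ggplot(prices, aes(x = year, y = price, color = company)) +
  geom_line()

p_ordered
```

The legend follows the factor level order, so it now lists the companies in the same top-to-bottom order as the lines at the right edge. forcats::fct_reorder2() is a shortcut for the same idea.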
Text on plots
There are two distinct strategies for placing text on a chart:
- Label data points — place text at the (x, y) position of an observation
- Add arbitrary annotations — place text (or other marks) at any location you choose
Labeling data points
geom_text() and geom_label() both place text at each observation’s coordinates:
ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text(aes(label = country))
1. geom_text() places the country name at each point’s (x, y) coordinate. The result is unreadable: every label overlaps.
geom_label() draws a background rectangle behind each label, which helps readability slightly but does not solve the overlap:
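The geom_label() version can be sketched as follows. Because gapminder_europe is not defined in this section, the sketch assumes it is the 2007 European subset of the {gapminder} data; that definition is an assumption, not taken from the original.

```r
library(ggplot2)
library(gapminder)

# Assumed definition: European countries in 2007.
gapminder_europe <- subset(gapminder, continent == "Europe" & year == 2007)

p_label <- ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # Same mapping as geom_text(), but each label sits on a filled rectangle.
  geom_label(aes(label = country))

p_label
```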
Avoid overlap with {ggrepel}
The {ggrepel} package provides geom_text_repel() and geom_label_repel(), which automatically nudge labels to avoid overlap while drawing a connecting line back to the original point:
ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text_repel(aes(label = country))
1. geom_text_repel() uses a force-directed algorithm to push labels apart while keeping them as close as possible to their data points. The thin lines connect displaced labels back to their points.
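Because the label positions come from an iterative layout, they can shift between renders. The sketch below, which assumes gapminder_europe is the 2007 European subset of {gapminder}, uses two geom_text_repel() arguments, seed and max.overlaps, to make the layout repeatable and to keep every label. The specific values are illustrative.

```r
library(ggplot2)
library(ggrepel)
library(gapminder)

# Assumed definition: European countries in 2007.
gapminder_europe <- subset(gapminder, continent == "Europe" & year == 2007)

p_repel <- ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_text_repel(
    aes(label = country),
    seed = 1234,        # fix the random seed so the layout is repeatable
    max.overlaps = Inf  # never drop a label for overlapping too much
  )

p_repel
```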
Even with repelling, 30+ labels is too many for most charts. The better solution is to label selectively — only the points that matter for the story:
gapminder_europe <- gapminder_europe |>
  mutate(
    highlight = country %in% c("Albania", "Norway", "Hungary")
  )
ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = highlight)) +
  geom_label_repel(
    data = filter(gapminder_europe, highlight),
    aes(label = country, fill = highlight),
    color = "white"
  ) +
  scale_color_manual(values = c("grey70", "#B31B1B")) +
  scale_fill_manual(values = "#B31B1B") +
  guides(color = "none", fill = "none") +
  labs(x = "GDP per capita", y = "Life expectancy")
1. Create a logical indicator for which points should be labeled.
2. Pass only the highlighted rows to geom_label_repel(). The other points are still plotted but unlabeled.
This combination — gray non-highlighted points, colored highlighted points with labels — is a versatile design pattern for drawing attention to specific observations without removing context.
The same highlighting pattern applies to line charts:
gapminder |>
  mutate(is_oceania = continent == "Oceania") |>
  ggplot(aes(
    x = year,
    y = lifeExp,
    group = country,
    color = is_oceania,
    linewidth = is_oceania
  )) +
  geom_line() +
  scale_color_manual(values = c("grey80", "#B31B1B")) +
  scale_linewidth_manual(values = c(0.3, 1)) +
  guides(color = "none", linewidth = "none") +
  labs(
    title = "Life expectancy trends, 1952–2007",
    x = NULL,
    y = "Life expectancy"
  )
1. scale_linewidth_manual() makes the highlighted lines thicker and the background lines thin, reinforcing the contrast.
Arbitrary annotations with annotate()
annotate() places a single geom at a location you specify, independent of any data. Use it to add callout text, highlight a region, or draw a reference line at a meaningful value.
ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  annotate(
    geom = "text",
    x = 40000,
    y = 76,
    label = "High-income\ncountries"
  )
1. geom = "text" places a text string at (40000, 76). Any geom name works here.
Use "rect" to shade a region:
ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  annotate(
    geom = "rect",
    xmin = 30000,
    xmax = 55000,
    ymin = 78,
    ymax = 82,
    fill = "#B31B1B",
    alpha = 0.15
  ) +
  annotate(
    geom = "label",
    x = 42500,
    y = 76.5,
    label = "Rich and long-living"
  ) +
  annotate(
    geom = "segment",
    x = 42500,
    xend = 42500,
    y = 77.0,
    yend = 77.8,
    arrow = arrow(length = unit(0.1, "in"))
  )
1. Low alpha keeps the shading subtle enough that points underneath it remain visible.
2. arrow() adds an arrowhead at (xend, yend).
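The reference-line use mentioned at the start of this section can be sketched with a "segment" annotation that spans the panel by using -Inf and Inf as the x endpoints. The threshold of 80 years and the label position are illustrative values, and gapminder_europe is assumed here to be the 2007 European subset of {gapminder}.

```r
library(ggplot2)
library(gapminder)

# Assumed definition: European countries in 2007.
gapminder_europe <- subset(gapminder, continent == "Europe" & year == 2007)

p_ref <- ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  annotate(
    geom = "segment",
    x = -Inf, xend = Inf,  # -Inf and Inf extend to the panel edges
    y = 80, yend = 80,     # illustrative threshold
    linetype = "dashed",
    color = "grey40"
  ) +
  annotate(
    geom = "text",
    x = 6000, y = 80.4,    # just above the line, near the left edge
    label = "80 years",
    hjust = 0
  )

p_ref
```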
Markdown in annotations with {ggtext}
{ggtext} extends annotations with Markdown and HTML rendering. Use element_markdown() in theme() for title/subtitle text, and geom = "richtext" in annotate() for inline formatted annotations:
ggplot(gapminder_europe, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  annotate(
    geom = "richtext",
    x = 42000,
    y = 76,
    label = "Countries with GDP<br>>$30K are **mostly<br>above 80** years",
    fill = NA,
    label.color = NA,
    hjust = 0.5,
    size = 3
  ) +
  labs(
    title = "GDP and life expectancy in **Europe**, 2007",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  theme(plot.title = element_markdown())
1. geom = "richtext" renders Markdown/HTML inside the annotation box. fill = NA and label.color = NA remove the box background and border.
2. **bold** in the title requires element_markdown() to render.
Summary
- Axis break placement should align with how readers naturally interpret the data; label the axis to communicate that logic
- Small details — font weight, ordering, redundant coding — signal care and improve accessibility
- Use redundant coding (mapping the same variable to multiple aesthetics) to serve readers who cannot perceive one of the encodings
- Keep legend order consistent with data order to reduce visual scanning
- geom_text()/geom_label() place text at data coordinates; geom_text_repel() (from {ggrepel}) avoids overlap automatically
- Label selectively: show only the points that serve the story, and mute the others to provide context
- annotate() places a single geom at an arbitrary location; use it for callouts, shaded regions, and reference lines
- {ggtext} adds Markdown/HTML rendering to titles, subtitles, and annotations
Acknowledgements
Material derived in part from Data Visualization with R and Fundamentals of Data Visualization.














