AE 02: Considering the data-ink ratio: The lollipop chart

Suggested answers

Application exercise, Answers

Modified: January 27, 2026

Important

These are suggested answers. This document should be used as a reference only; it is not designed to be an exhaustive key.

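# load the tidyverse (ggplot2, dplyr, readr, and friends)
library(tidyverse)

# set default theme to minimal - reduce extraneous background ink
theme_set(theme_minimal())

# print large numbers (e.g. sale prices) in fixed rather than scientific notation
options(scipen = 999)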

For the following exercises we will work with data on houses that were sold in Tompkins County, NY, in 2022-24.¹

The variables include:

The dataset, tompkins-home-sales.csv, is in the data folder of your repo. We will import the data and then create a new variable, decade_built_cat, which identifies the decade in which each home was built.

tompkins <- read_csv("data/tompkins-home-sales.csv")

Average sale price by decade

Let's examine how the average sale price of homes recently sold in Tompkins County varies with the age of the home. To simplify this task, we will group the homes by the decade in which they were built, with catch-all categories for homes built in the 1940s or earlier and in the 1990s or later. Then we will calculate the average sale price of the homes in each group.

# create decade variable
tompkins <- tompkins |>
  mutate(
    decade_built = (year_built %/% 10) * 10,
    decade_built_cat = case_when(
      decade_built <= 1940 ~ "1940 or before",
      decade_built >= 1990 ~ "1990 or after",
      .default = as.character(decade_built)
    )
  )
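To see how the category boundaries behave, here is a small sketch that applies the same %/% rounding and case_when() rules to a handful of hypothetical construction years (chosen for illustration, not taken from the dataset). Any home built in the 1940s rounds down to 1940 and so lands in "1940 or before", while any home built in 1990 or later lands in "1990 or after".

```r
library(dplyr)

# hypothetical construction years on and around the category boundaries
year_built <- c(1899, 1940, 1949, 1950, 1989, 1990, 2015)

# integer division drops the final digit of the year, e.g. 1949 -> 194 -> 1940
decade_built <- (year_built %/% 10) * 10

# same rules as the mutate() call above
decade_built_cat <- case_when(
  decade_built <= 1940 ~ "1940 or before",
  decade_built >= 1990 ~ "1990 or after",
  .default = as.character(decade_built)
)

data.frame(year_built, decade_built, decade_built_cat)
```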

# calculate mean sales price by decade
mean_price_decade <- tompkins |>
  group_by(decade_built_cat) |>
  summarize(mean_price = mean(price))
mean_price_decade
# A tibble: 6 × 2
  decade_built_cat mean_price
  <chr>                 <dbl>
1 1940 or before      351273.
2 1950                330779.
3 1960                355146.
4 1970                354562.
5 1980                338600.
6 1990 or after       445540.

Visualizing the data as a bar chart

A conventional approach to visualizing this data is a bar chart. Since we already calculated the average sales price, we can use geom_col() to create the bar chart. We also graph it horizontally to avoid overlapping labels for the decades.

ggplot(
  data = mean_price_decade,
  mapping = aes(x = mean_price, y = decade_built_cat)
) +
  geom_col() +
  labs(
    x = "Mean sales price",
    y = "Decade built",
    title = "Mean sales price of houses in Tompkins County, by decade built"
  )
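geom_col() fits here because the bar lengths are values already stored in the data. As a sketch, the two plots below should draw identical bars: geom_col() is shorthand for geom_bar(stat = "identity"), whereas plain geom_bar() defaults to stat = "count" and would tally rows instead. The data frame is a stand-in built from the rounded means printed above, so the example runs without the CSV.

```r
library(ggplot2)

# stand-in for mean_price_decade, using the rounded means printed above
mean_price_decade <- data.frame(
  decade_built_cat = c("1940 or before", "1950", "1960", "1970", "1980", "1990 or after"),
  mean_price = c(351273, 330779, 355146, 354562, 338600, 445540)
)

# geom_col() uses the values in the data as the bar lengths
p_col <- ggplot(mean_price_decade, aes(x = mean_price, y = decade_built_cat)) +
  geom_col()

# the equivalent geom_bar() call needs the identity stat spelled out
p_bar <- ggplot(mean_price_decade, aes(x = mean_price, y = decade_built_cat)) +
  geom_bar(stat = "identity")

# layer_data() returns the computed geometry; each bar spans 0 to mean_price
col_bars <- layer_data(p_col)
bar_bars <- layer_data(p_bar)
all.equal(col_bars[, c("xmin", "xmax")], bar_bars[, c("xmin", "xmax")])
```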

Visualizing the data as a dot plot

The bar chart does not follow the data-ink ratio principle well: only the end of each bar encodes a mean sales price, so the rest of the bar is not necessary to convey the information. We can use a dot plot instead. A dot plot is a variation of the bar chart in which each bar is replaced by a dot at its end. The dot plot is a (potentially) better choice because it uses less ink to convey the same information.
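For reference, Tufte defines the data-ink ratio as the share of a graphic's ink that is devoted to the non-redundant display of data:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

A higher ratio is better, so replacing each bar with a dot raises the ratio by erasing ink that does not encode any value.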

ggplot(
  data = mean_price_decade,
  mapping = aes(x = mean_price, y = decade_built_cat)
) +
  geom_point(size = 4) +
  labs(
    x = "Mean sales price",
    y = "Decade built",
    title = "Mean sales price of houses in Tompkins County, by decade built"
  )

The dot plot maximizes the data-ink ratio, but it is not perfect. Unlike with a bar chart, there is no expectation that the x-axis begins at 0, and by default ggplot2 fits the axis to the range of the data. Readers therefore judge the differences in mean sales price by the relative distances between the dots, and compared to the bar chart, those differences look exaggerated.
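A rough worked example makes the exaggeration concrete. The sketch below uses the rounded means printed above and compares where each value would sit, as a fraction of the panel width, on a 0-based axis (bar chart) versus an axis spanning the data range (dot plot). It ignores the small padding ggplot2 adds to each end of a continuous axis, so the positions are approximate.

```r
# rounded means copied from the summary printed above
mean_price <- c(
  "1940 or before" = 351273, "1950" = 330779, "1960" = 355146,
  "1970" = 354562, "1980" = 338600, "1990 or after" = 445540
)

# bar chart: the axis starts at 0, so a bar's length is proportional to its value
zero_based <- mean_price / max(mean_price)

# dot plot: the axis spans roughly min to max (ignoring ggplot2's default padding),
# so a dot's position reflects its distance from the smallest mean
range_based <- (mean_price - min(mean_price)) / (max(mean_price) - min(mean_price))

round(rbind(zero_based, range_based), 2)
```

On the 0-based axis, the 1950 bar is about 74% as long as the "1990 or after" bar, so the two look fairly similar. On the data-range axis, the 1950 dot sits at the far left and the "1990 or after" dot at the far right, so the same gap spans the whole panel.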

Visualizing the data as a lollipop chart

The lollipop chart is a happy compromise: a thin line (the "stick") drawn from 0 keeps the bar chart's zero baseline, while a dot (the "candy") marks each value, using less ink than a full bar.

Your turn: Define the conceptual grammar of graphics for a lollipop chart to visualize the average sales price by decade built. Focus specifically on the layer(s) needed to create the “lollipop” effect, the geometric object(s), and the mapping aesthetics required.

Tip: Choosing appropriate geom(s)

Try to construct the chart without using geom_col(). With geom_col() you would have to spend more time tweaking the function's parameters to make it look right.

There is another geom_*() that works pretty well here.

The grammar of graphics for a lollipop chart includes:

  • Layer 1 (the “stick”)
    • Data: mean_price_decade
    • Geometric object: geom_segment()
    • Mapping aesthetics:
      • x: 0
      • xend: mean_price
      • y: decade_built_cat
      • yend: decade_built_cat
  • Layer 2 (the “candy”)
    • Data: mean_price_decade
    • Geometric object: geom_point()
    • Mapping aesthetics:
      • x: mean_price
      • y: decade_built_cat

Your turn: Implement your lollipop chart using ggplot().

ggplot(
  data = mean_price_decade,
  mapping = aes(x = mean_price, y = decade_built_cat)
) +
  # draw the stick first so the dot is layered on top of it
  geom_segment(
    mapping = aes(
      x = 0,
      xend = mean_price,
      y = decade_built_cat,
      yend = decade_built_cat
    )
  ) +
  geom_point(size = 4) +
  labs(
    x = "Mean sales price",
    y = "Decade built",
    title = "Mean sales price of houses in Tompkins County, by decade built"
  )

# alternative: geom_segment() inherits x and y from the global mapping,
# so only xend and yend need to be set
ggplot(
  data = mean_price_decade,
  mapping = aes(x = mean_price, y = decade_built_cat)
) +
  # draw the stick first so the dot is layered on top of it
  geom_segment(
    mapping = aes(
      xend = 0,
      yend = decade_built_cat
    )
  ) +
  geom_point(size = 4) +
  labs(
    x = "Mean sales price",
    y = "Decade built",
    title = "Mean sales price of houses in Tompkins County, by decade built"
  )
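geom_segment() is not the only geom that can draw the stick. As an alternative (swapped in here, not part of the original answer), geom_linerange() draws a line from xmin to xmax, and ggplot2 infers the horizontal orientation from the xmin/xmax mapping. The sketch below uses a stand-in data frame built from the rounded means printed above, and adds the stick before the dot so the dot is drawn on top.

```r
library(ggplot2)

# stand-in for mean_price_decade, using the rounded means printed above
mean_price_decade <- data.frame(
  decade_built_cat = c("1940 or before", "1950", "1960", "1970", "1980", "1990 or after"),
  mean_price = c(351273, 330779, 355146, 354562, 338600, 445540)
)

lollipop <- ggplot(mean_price_decade, aes(y = decade_built_cat)) +
  # the "stick": a horizontal line range from 0 to the mean price
  geom_linerange(aes(xmin = 0, xmax = mean_price)) +
  # the "candy": a dot at the end of each stick, layered on top
  geom_point(aes(x = mean_price), size = 4) +
  labs(
    x = "Mean sales price",
    y = "Decade built",
    title = "Mean sales price of houses in Tompkins County, by decade built"
  )

# computed geometry of the stick layer: every line should start at 0
sticks <- layer_data(lollipop, 1)
sticks[, c("y", "xmin", "xmax")]
```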

Note

You can try making it work with geom_col() instead of geom_segment(), but it's not as easy as it sounds. geom_col() bars already start at 0, so there is no x/xend mapping to set (geom_col() does not use an xend aesthetic). Instead, you need to set the width argument to a very small value so each bar becomes a thin stick, and set the fill argument (for example, to "black") so the stick matches the dot rather than the default gray bar fill.
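A sketch of the geom_col() route, using a stand-in data frame built from the rounded means printed above. The width of 0.05 is an arbitrary small value chosen for illustration; since the bars start at 0 on their own, only the width and fill need tweaking.

```r
library(ggplot2)

# stand-in for mean_price_decade, using the rounded means printed above
mean_price_decade <- data.frame(
  decade_built_cat = c("1940 or before", "1950", "1960", "1970", "1980", "1990 or after"),
  mean_price = c(351273, 330779, 355146, 354562, 338600, 445540)
)

lollipop_col <- ggplot(mean_price_decade, aes(x = mean_price, y = decade_built_cat)) +
  # a very thin black bar stands in for the stick; geom_col() bars start at 0
  geom_col(width = 0.05, fill = "black") +
  # the dot at the end of each stick
  geom_point(size = 4)

# computed geometry: each bar runs from 0 to mean_price and is 0.05 units thick
sticks <- layer_data(lollipop_col, 1)
sticks[, c("xmin", "xmax", "ymin", "ymax")]
```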

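Compared to the bar chart, the lollipop chart increases the data-ink ratio: it uses less non-data ink while still communicating the same information, and its sticks keep the zero baseline that the dot plot lacks.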

Acknowledgments

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Tahoe 26.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2026-01-28
 pandoc   3.4 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.8.26 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package      * version date (UTC) lib source
 P bit            4.6.0   2025-03-06 [?] RSPM (R 4.5.0)
 P bit64          4.6.0-1 2025-01-16 [?] RSPM (R 4.5.0)
 P cli            3.6.5   2025-04-23 [?] RSPM (R 4.5.0)
 P crayon         1.5.3   2024-06-20 [?] RSPM (R 4.5.0)
 P digest         0.6.39  2025-11-19 [?] RSPM (R 4.5.0)
 P dplyr        * 1.1.4   2023-11-17 [?] RSPM (R 4.5.0)
 P evaluate       1.0.5   2025-08-27 [?] RSPM (R 4.5.0)
 P farver         2.1.2   2024-05-13 [?] RSPM (R 4.5.0)
 P fastmap        1.2.0   2024-05-15 [?] RSPM (R 4.5.0)
 P forcats      * 1.0.1   2025-09-25 [?] RSPM (R 4.5.0)
 P generics       0.1.4   2025-05-09 [?] RSPM (R 4.5.0)
 P ggplot2      * 4.0.1   2025-11-14 [?] RSPM (R 4.5.0)
 P glue           1.8.0   2024-09-30 [?] RSPM (R 4.5.0)
 P gtable         0.3.6   2024-10-25 [?] RSPM (R 4.5.0)
 P here           1.0.2   2025-09-15 [?] CRAN (R 4.5.0)
 P hms            1.1.4   2025-10-17 [?] RSPM (R 4.5.0)
 P htmltools      0.5.9   2025-12-04 [?] RSPM (R 4.5.0)
 P htmlwidgets    1.6.4   2023-12-06 [?] RSPM (R 4.5.0)
 P jsonlite       2.0.0   2025-03-27 [?] RSPM (R 4.5.0)
 P knitr          1.51    2025-12-20 [?] RSPM (R 4.5.0)
 P labeling       0.4.3   2023-08-29 [?] RSPM (R 4.5.0)
 P lifecycle      1.0.4   2023-11-07 [?] RSPM (R 4.5.0)
 P lubridate    * 1.9.4   2024-12-08 [?] RSPM (R 4.5.0)
 P magrittr       2.0.4   2025-09-12 [?] RSPM (R 4.5.0)
 P otel           0.2.0   2025-08-29 [?] RSPM (R 4.5.0)
 P pillar         1.11.1  2025-09-17 [?] RSPM (R 4.5.0)
 P pkgconfig      2.0.3   2019-09-22 [?] RSPM (R 4.5.0)
 P purrr        * 1.2.0   2025-11-04 [?] CRAN (R 4.5.0)
 P R6             2.6.1   2025-02-15 [?] RSPM (R 4.5.0)
 P RColorBrewer   1.1-3   2022-04-03 [?] RSPM (R 4.5.0)
 P readr        * 2.1.6   2025-11-14 [?] RSPM (R 4.5.0)
   renv           1.0.11  2024-10-12 [1] CRAN (R 4.5.2)
 P rlang          1.1.6   2025-04-11 [?] RSPM (R 4.5.0)
 P rmarkdown      2.30    2025-09-28 [?] RSPM (R 4.5.0)
 P rprojroot      2.1.1   2025-08-26 [?] RSPM (R 4.5.0)
 P S7             0.2.1   2025-11-14 [?] RSPM (R 4.5.0)
 P scales         1.4.0   2025-04-24 [?] RSPM (R 4.5.0)
 P sessioninfo    1.2.3   2025-02-05 [?] RSPM (R 4.5.0)
 P stringi        1.8.7   2025-03-27 [?] RSPM (R 4.5.0)
 P stringr      * 1.6.0   2025-11-04 [?] RSPM (R 4.5.0)
 P tibble       * 3.3.0   2025-06-08 [?] RSPM (R 4.5.0)
 P tidyr        * 1.3.2   2025-12-19 [?] RSPM (R 4.5.0)
 P tidyselect     1.2.1   2024-03-11 [?] RSPM (R 4.5.0)
 P tidyverse    * 2.0.0   2023-02-22 [?] RSPM (R 4.5.0)
 P timechange     0.3.0   2024-01-18 [?] RSPM (R 4.5.0)
 P tzdb           0.5.0   2025-03-15 [?] RSPM (R 4.5.0)
 P utf8           1.2.6   2025-06-08 [?] RSPM (R 4.5.0)
 P vctrs          0.6.5   2023-12-01 [?] RSPM (R 4.5.0)
 P vroom          1.6.7   2025-11-28 [?] RSPM (R 4.5.0)
 P withr          3.0.2   2024-10-28 [?] RSPM (R 4.5.0)
 P xfun           0.55    2025-12-16 [?] CRAN (R 4.5.2)
 P yaml           2.3.12  2025-12-10 [?] RSPM (R 4.5.0)

 [1] /Users/bcs88/Projects/info-3312/course-site/renv/library/macos/R-4.5/aarch64-apple-darwin20
 [2] /Users/bcs88/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.5/aarch64-apple-darwin20/4cd76b74

 * ── Packages attached to the search path.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────

Footnotes

  1. Data source: Redfin.