library(tidyverse)
library(colorspace) # for improved color palettes
library(scales) # for improved labels
library(ggthemes) # for scale_color_colorblind()
theme_set(theme_classic(base_size = 12))
AE 15: Optimizing color spaces
Suggested answers
Import birth data
The Social Security Administration keeps detailed records on births and deaths in the United States. For our analysis, we will use a dataset of the number of births daily in the United States from 1994-2014.1
<- read_rds("data/births.Rds")
births births
# A tibble: 7,670 × 7
year month date_of_month day_of_week births date_of_month_catego…¹ weekend
<dbl> <ord> <dbl> <ord> <dbl> <fct> <lgl>
1 1994 January 1 Saturday 8096 1 TRUE
2 1994 January 2 Sunday 7772 2 TRUE
3 1994 January 3 Monday 10142 3 FALSE
4 1994 January 4 Tuesday 11248 4 FALSE
5 1994 January 5 Wednesday 11053 5 FALSE
6 1994 January 6 Thursday 11406 6 FALSE
7 1994 January 7 Friday 11251 7 FALSE
8 1994 January 8 Saturday 8653 8 TRUE
9 1994 January 9 Sunday 7910 9 TRUE
10 1994 January 10 Monday 10498 10 FALSE
# ℹ 7,660 more rows
# ℹ abbreviated name: ¹date_of_month_categorical
The Friday the 13th effect
Friday the 13th is considered an unlucky day in Western superstition. Let’s see if fewer babies are born on the 13th of each month if it falls on a Friday compared to another week day. Specifically, we will compare the average number of births on the 13th of the month to the average number of births on the 6th and 20th of the month.
Your turn: Visualize the results using a bar chart. Emphasize the difference on Fridays compared to other weekdays.2
<- births |>
friday_13_births # only look at births on the 6, 13, and 20th
filter(date_of_month %in% c(6, 13, 20)) |>
# distinguish 6/20 from 13
mutate(not_13 = date_of_month == 13) |>
# calculate average number of births for each week day and whether or not it was the 13th
summarize(
avg_births = mean(births),
.by = c(day_of_week, not_13)
|>
) # calculate the difference in percentage
pivot_wider(
names_from = not_13,
values_from = avg_births
|>
) mutate(pct_diff = (`TRUE` - `FALSE`) / `FALSE`) |>
arrange(day_of_week)
# highlight one bar in orange
ggplot(
data = friday_13_births,
mapping = aes(
x = day_of_week,
y = pct_diff,
fill = day_of_week == "Friday"
)+
) geom_col() +
scale_y_continuous(labels = label_percent()) +
scale_fill_manual(values = c("grey50", "orange"), guide = "none") +
labs(
x = NULL,
y = NULL,
title = "The Friday the 13th effect",
subtitle = "Difference in the share of U.S. births on the 13th of each month\nfrom the average of births on the 6th and the 20th, 1994-2014"
)
# use transparency instead
ggplot(
data = friday_13_births,
mapping = aes(
x = day_of_week,
y = pct_diff,
alpha = day_of_week == "Friday"
)+
) geom_col(fill = "orange") +
scale_y_continuous(labels = label_percent()) +
scale_alpha_manual(values = c(0.4, 1), guide = "none") +
labs(
x = NULL,
y = NULL,
title = "The Friday the 13th effect",
subtitle = "Difference in the share of U.S. births on the 13th of each month\nfrom the average of births on the 6th and the 20th, 1994-2014"
)
Create a heatmap showing average number of births by day of year
Let’s explore the relative popularity of each calendar day for births. We will create a heatmap showing the relative ratio of births for each day of the year compared to the annual average.
<- births |>
avg_births_month_day group_by(month, date_of_month_categorical) |>
summarize(avg_births = mean(births), .groups = "drop") |>
mutate(avg_births_ratio = avg_births / mean(births$births))
avg_births_month_day
# A tibble: 366 × 4
month date_of_month_categorical avg_births avg_births_ratio
<ord> <fct> <dbl> <dbl>
1 January 1 7827. 0.700
2 January 2 9356. 0.837
3 January 3 10869. 0.973
4 January 4 11064. 0.990
5 January 5 10992. 0.984
6 January 6 10942. 0.979
7 January 7 10963. 0.981
8 January 8 10656. 0.954
9 January 9 10672. 0.955
10 January 10 11072. 0.991
# ℹ 356 more rows
<- ggplot(
birth_days_plot data = avg_births_month_day,
# By default, the y-axis will have December at the top, so use fct_rev() to reverse it
mapping = aes(x = date_of_month_categorical, y = fct_rev(month), fill = avg_births_ratio)
+
) geom_tile() +
# Add nice labels
labs(
x = "Day of the month", y = NULL,
title = "Average births per day",
subtitle = "1994-2014",
fill = "Ratio of births\nrelative to average"
+
) # Force all the tiles to have equal widths and heights
coord_equal()
birth_days_plot
Your turn: Modify the plot to use an appropriate color palette. What days have an unusually high or low number of births?
+ scale_fill_continuous_diverging(mid = 1) birth_days_plot
+ scale_fill_continuous_diverging(palette = "Blue-Red 3", mid = 1) birth_days_plot
+ scale_fill_continuous_diverging(palette = "Tropic", mid = 1) birth_days_plot
+ scale_fill_continuous_diverging(palette = "Purple-Brown", mid = 1) birth_days_plot
Add response here.
::session_info() sessioninfo
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2025-03-21
pandoc 3.4 @ /usr/local/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
colorspace * 2.1-1 2024-07-26 [1] CRAN (R 4.4.0)
dichromat 2.0-0.1 2022-05-02 [1] CRAN (R 4.3.0)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.1)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.3.1)
ggthemes * 5.1.0 2024-02-10 [1] CRAN (R 4.4.0)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
knitr 1.49 2024-11-08 [1] CRAN (R 4.4.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
pillar 1.10.1 2025-01-07 [1] CRAN (R 4.4.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.1)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
rstudioapi 0.17.0 2024-10-16 [1] CRAN (R 4.4.1)
scales * 1.3.0.9000 2025-03-19 [1] Github (bensoltoff/scales@71d8f13)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
xfun 0.50.5 2025-01-15 [1] https://yihui.r-universe.dev (R 4.4.2)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────
Footnotes
Collected by FiveThirtyEight.↩︎
Essentially a replication of Carl Bialik’s original chart.↩︎