Lecture 11
Cornell University
INFO 3312/5312 - Spring 2024
February 29, 2024
Sequential plots: Motivation, then resolution
A single plot: Resolution, and hidden in it motivation
Project note: you’re asked to create two plots per question. One possible approach: Start with a plot showing the raw data, and show derived quantities (e.g. percent increases, averages, coefficients of fitted models) in the subsequent plot.
When you’re trying to show too much data at once you may end up not showing anything.
Never assume your audience can rapidly process complex visual displays
Don’t add variables to your plot that are tangential to your story
Don’t jump straight to a highly complex figure; first show an easily digestible subset (e.g., show one facet first)
Aim for memorable, but clear
Project note: Make sure to leave time to iterate on your plots after you practice your presentation. If certain plots are getting too wordy to explain, take time to simplify them!
Be consistent but don’t be repetitive.
Use consistent features throughout plots (e.g., same color represents same level on all plots)
Aim to use a different type of visualization for each distinct analysis
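One way to keep colors consistent is to define a single named palette and reuse it in every plot. The sketch below assumes the gapminder data used later in these slides; the `continent_colors` vector and its hex values are hypothetical choices for illustration, not part of the original deck.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

gapminder_07 <- filter(.data = gapminder, year == 2007)

# hypothetical palette: one fixed color per continent (hex values are illustrative)
continent_colors <- c(
  "Africa" = "#1b9e77",
  "Americas" = "#d95f02",
  "Asia" = "#7570b3",
  "Europe" = "#e7298a",
  "Oceania" = "#66a61e"
)

# box plot: continent mapped to fill, colored with the shared palette
p_box_color <- ggplot(
  data = gapminder_07,
  mapping = aes(x = continent, y = lifeExp, fill = continent)
) +
  geom_boxplot(show.legend = FALSE) +
  scale_fill_manual(values = continent_colors)

# scatterplot: continent mapped to color, colored with the same palette
p_scatter_color <- ggplot(
  data = gapminder_07,
  mapping = aes(x = gdpPercap, y = lifeExp, color = continent)
) +
  geom_point() +
  scale_color_manual(values = continent_colors)
```

Because both scales draw from the same named vector, each continent keeps its color even if one plot shows only a subset of continents.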
Project note: If possible, ask a friend who is not in the class to listen to your presentation and then ask them what they remember. Then, ask yourself: is that what you wanted them to remember?
# A tibble: 3,420 × 5
type name abbrev date pct_change
<chr> <chr> <chr> <date> <dbl>
1 country United States US 2020-04-01 -1.00
2 country United States US 2020-04-02 -1.00
3 country United States US 2020-04-03 -1.00
4 country United States US 2020-04-04 -1.00
5 country United States US 2020-04-05 -1.00
6 country United States US 2020-04-06 -1
7 country United States US 2020-04-07 -1.00
8 country United States US 2020-04-08 -1.00
9 country United States US 2020-04-09 -1
10 country United States US 2020-04-10 -1.00
# ℹ 3,410 more rows
proj-01
https://pages.github.coecis.cornell.edu/info3312-sp24/proj-01-YOUR_TEAM_NAME/
library(tidyverse)
library(gapminder)
library(ggrepel)
# keep only the 2007 observations
gapminder_07 <- filter(.data = gapminder, year == 2007)
# histogram of life expectancy
p_hist <- ggplot(data = gapminder_07, mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)
# box plots of life expectancy by continent
p_box <- ggplot(data = gapminder_07, mapping = aes(x = continent, y = lifeExp)) +
  geom_boxplot()
# scatterplot of life expectancy vs. GDP per capita
p_scatter <- ggplot(data = gapminder_07, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
# labeled scatterplot for the Americas (geom_text_repel() comes from ggrepel)
p_text <- gapminder_07 |>
  filter(continent == "Americas") |>
  ggplot(mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_text_repel(mapping = aes(label = country)) +
  coord_cartesian(clip = "off")
The plot will fill the empty space in the slide.
If there is more text on the slide, the plot will shrink to make room for the text.
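Plot size in a Quarto slide can also be set per code chunk with the `fig-width` and `fig-asp` options. The following is a minimal sketch, assuming a Quarto document with an R chunk; the specific values (3, 0.618, 100%) are illustrative assumptions rather than settings taken from the original slides.

```r
#| fig-width: 3
#| fig-asp: 0.618
#| out-width: 100%

library(ggplot2)
library(dplyr)
library(gapminder)

gapminder_07 <- filter(.data = gapminder, year == 2007)

# the figure is drawn 3 inches wide, then scaled to the full output width,
# so text and points appear larger than with a bigger fig-width
p_hist <- ggplot(data = gapminder_07, mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)

p_hist
```

In a .qmd file the `#|` lines must sit at the top of the chunk; run as plain R they are treated as ordinary comments.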
A smaller fig-width gives a zoomed-in look.
A larger fig-width gives a zoomed-out look.
fig-width also affects the apparent text size.
First, ask yourself: must you include multiple plots on a slide? For example, is your narrative about comparing results from two plots?
If no, then don’t! Move the second plot to the next slide!
If yes:
Insert columns using the Insert anything tool
Use the layout-ncol chunk option
Use the patchwork package
Possibly, use pivoting to reshape your data and then use facets
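The pivoting approach in the last item can be sketched as follows. This assumes the gapminder data from earlier in the slides; the column names `measure` and `value` are hypothetical labels chosen for this example.

```r
library(tidyverse)
library(gapminder)

gapminder_07 <- filter(.data = gapminder, year == 2007)

# reshape: one row per country per measure instead of one column per measure
gapminder_long <- gapminder_07 |>
  pivot_longer(
    cols = c(lifeExp, gdpPercap),
    names_to = "measure",
    values_to = "value"
  )

# one panel per measure; free y scales because the units differ
p_facets <- ggplot(data = gapminder_long, mapping = aes(x = continent, y = value)) +
  geom_boxplot() +
  facet_wrap(facets = vars(measure), scales = "free_y")

p_facets
```

Faceting produces a single plot object, so the panels share a theme and align automatically without any slide-level layout.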
Insert > Slide Columns
Quarto will automatically resize your plots to fit side-by-side.
layout-ncol
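A minimal sketch of a chunk that uses `layout-ncol`, assuming a Quarto document with an R chunk that prints two plots. The value 2 is an illustrative choice, and the plots are rebuilt here so the example stands alone.

```r
#| layout-ncol: 2

library(ggplot2)
library(dplyr)
library(gapminder)

gapminder_07 <- filter(.data = gapminder, year == 2007)

p_hist <- ggplot(data = gapminder_07, mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)

p_box <- ggplot(data = gapminder_07, mapping = aes(x = continent, y = lifeExp)) +
  geom_boxplot()

# each printed plot becomes one cell of the two-column layout
p_hist
p_box
```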
Learn more at https://patchwork.data-imaginist.com.
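As a rough sketch of the patchwork approach, the plots below are combined with the `+` (side by side) and `/` (stacked) operators. The particular arrangement and the name `combined` are assumptions made for this example.

```r
library(ggplot2)
library(dplyr)
library(gapminder)
library(patchwork)

gapminder_07 <- filter(.data = gapminder, year == 2007)

p_hist <- ggplot(data = gapminder_07, mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 2)

p_box <- ggplot(data = gapminder_07, mapping = aes(x = continent, y = lifeExp)) +
  geom_boxplot()

p_scatter <- ggplot(data = gapminder_07, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# histogram and box plot on the top row, scatterplot across the bottom
combined <- (p_hist + p_box) / p_scatter

combined
```

Unlike `layout-ncol`, the result is a single figure, so one set of `fig-width` and `fig-asp` options applies to the whole arrangement.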
library(tidyverse)
library(rvest)
library(tvthemes)
# get episode ratings for season 1
ratings_page <- read_html(x = "https://www.imdb.com/title/tt9018736/episodes/?ref_=tt_eps_sm")
# extract elements
ratings_raw <- tibble(
  episode = html_elements(x = ratings_page, css = ".bblZrR .ipc-title__text") |>
    html_text2(),
  rating = html_elements(x = ratings_page, css = ".ratingGroup--imdb-rating") |>
    html_text2()
)
# clean data
ratings <- ratings_raw |>
  # separate episode number and title
  separate_wider_delim(
    cols = episode,
    delim = " ∙ ",
    names = c("episode_number", "episode_title")
  ) |>
  separate_wider_delim(
    cols = episode_number,
    delim = ".",
    names = c("season", "episode_number")
  ) |>
  # separate rating and number of votes
  separate_wider_delim(
    cols = rating,
    delim = " ",
    names = c("rating", "votes")
  ) |>
  # convert numeric variables
  mutate(
    across(
      .cols = -episode_title,
      .fns = parse_number
    ),
    votes = votes * 1e03
  )
# draw the plot
ratings |>
  # generate x-axis tick mark labels with title and episode number
  mutate(
    episode_title = str_glue("{episode_title}\n(S{season}E{episode_number})"),
    episode_title = fct_reorder(.f = episode_title, .x = episode_number)
  ) |>
  # draw a lollipop chart
  ggplot(mapping = aes(x = episode_title, y = rating)) +
  geom_point(mapping = aes(size = votes)) +
  geom_segment(
    mapping = aes(
      x = episode_title, xend = episode_title,
      y = 0, yend = rating
    )
  ) +
  # adjust the size scale
  scale_size(range = c(3, 8)) +
  # label the chart
  labs(
    title = "Live-action Avatar The Last Airbender is decent",
    x = NULL,
    y = "IMDB rating",
    caption = "Source: IMDB"
  ) +
  # use an Avatar theme
  theme_avatar(
    # custom font
    title.font = "Slayer",
    text.font = "Slayer",
    legend.font = "Slayer",
    # shrink legend text size
    legend.title.size = 8,
    legend.text.size = 6
  ) +
  theme(
    # remove undesired grid lines
    panel.grid.major.x = element_blank(),
    panel.grid.minor.y = element_blank(),
    # move legend to the top
    legend.position = "top",
    # align title flush with the edge
    plot.title.position = "plot",
    # shrink x-axis text labels to fit
    axis.text.x = element_text(size = rel(x = 0.7))
  )