Lecture 16
Cornell University
INFO 3312/5312 - Spring 2026
March 19, 2026
A poll is a type of survey or inquiry into public opinion conducted by interviewing a random sample of people.
More information: 🔗 Roper Center + 🔗 Pew Research Center
Most public opinion data is proprietary
Some public datasets include:
# A tibble: 4,138 × 234
response_id start_date right_track economy_better interest registration
<chr> <dttm> <dbl+lbl> <dbl+lbl> <dbl+lb> <dbl+lbl>
1 07700007 2021-01-12 10:52:22 2 [Off on the wrong track] 1 [Better] 1 [Mos… 1 [Regist…
2 07700008 2021-01-12 10:55:11 1 [Generally headed in th… 1 [Better] NA 1 [Regist…
3 07700009 2021-01-12 10:50:00 2 [Off on the wrong track] 2 [About the … 2 [Som… 1 [Regist…
4 07700010 2021-01-12 10:47:32 2 [Off on the wrong track] 2 [About the … 1 [Mos… 1 [Regist…
5 07700011 2021-01-12 10:52:57 2 [Off on the wrong track] 3 [Worse] 3 [Onl… 999 [Don't …
6 07700013 2021-01-12 10:50:17 999 [Not sure] 1 [Better] 2 [Som… 1 [Regist…
7 07700014 2021-01-12 10:49:44 999 [Not sure] 3 [Worse] 2 [Som… 1 [Regist…
8 07700015 2021-01-12 10:51:21 999 [Not sure] 3 [Worse] 2 [Som… 1 [Regist…
9 07700016 2021-01-12 11:01:55 1 [Generally headed in th… 2 [About the … 2 [Som… 1 [Regist…
10 07700020 2021-01-12 10:58:21 2 [Off on the wrong track] 2 [About the … 4 [Har… 1 [Regist…
# ℹ 4,128 more rows
# ℹ 228 more variables: news_sources_facebook <dbl+lbl>, news_sources_cnn <dbl+lbl>,
# news_sources_msnbc <dbl+lbl>, news_sources_fox <dbl+lbl>, news_sources_network <dbl+lbl>,
# news_sources_localtv <dbl+lbl>, news_sources_telemundo <dbl+lbl>, news_sources_npr <dbl+lbl>,
# news_sources_amtalk <dbl+lbl>, news_sources_new_york_times <dbl+lbl>,
# news_sources_local_newspaper <dbl+lbl>, news_sources_other <dbl+lbl>,
# news_sources_other_TEXT <chr>, pres_approval <dbl+lbl>, vote_2016 <dbl+lbl>, …
# A tibble: 4,138 × 234
response_id start_date right_track economy_better interest registration
<chr> <dttm> <fct> <fct> <fct> <fct>
1 07700007 2021-01-12 10:52:22 Off on the wrong track Better Most of… Registered
2 07700008 2021-01-12 10:55:11 Generally headed in the rig… Better <NA> Registered
3 07700009 2021-01-12 10:50:00 Off on the wrong track About the same Some of… Registered
4 07700010 2021-01-12 10:47:32 Off on the wrong track About the same Most of… Registered
5 07700011 2021-01-12 10:52:57 Off on the wrong track Worse Only no… Don't know
6 07700013 2021-01-12 10:50:17 Not sure Better Some of… Registered
7 07700014 2021-01-12 10:49:44 Not sure Worse Some of… Registered
8 07700015 2021-01-12 10:51:21 Not sure Worse Some of… Registered
9 07700016 2021-01-12 11:01:55 Generally headed in the rig… About the same Some of… Registered
10 07700020 2021-01-12 10:58:21 Off on the wrong track About the same Hardly … Registered
# ℹ 4,128 more rows
# ℹ 228 more variables: news_sources_facebook <fct>, news_sources_cnn <fct>,
# news_sources_msnbc <fct>, news_sources_fox <fct>, news_sources_network <fct>,
# news_sources_localtv <fct>, news_sources_telemundo <fct>, news_sources_npr <fct>,
# news_sources_amtalk <fct>, news_sources_new_york_times <fct>,
# news_sources_local_newspaper <fct>, news_sources_other <fct>, news_sources_other_TEXT <chr>,
# pres_approval <fct>, vote_2016 <fct>, vote_2016_other_text <chr>, vote_intention_retro <fct>, …
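Comparing the two printouts above, the numeric codes (`<dbl+lbl>`) have been replaced by their labels as factor levels (`<fct>`). The code that did this is not shown; `haven::as_factor()` is a common choice for labelled survey data. A minimal base R sketch of the same idea follows, using only the `economy_better` codes and labels visible in the first ten rows. The object names are illustrative, and the full variable may carry additional codes that do not appear in these rows.

```r
# codes for economy_better in the first ten rows printed above
economy_codes <- c(1, 1, 2, 2, 3, 1, 3, 3, 2, 2)

# code-to-label mapping as it appears across the two printouts
economy_labels <- c("Better" = 1, "About the same" = 2, "Worse" = 3)

# recode the numeric codes as a factor whose levels are the labels
economy_better <- factor(
  economy_codes,
  levels = unname(economy_labels),
  labels = names(economy_labels)
)

# frequency of each labelled response
table(economy_better)
```

Keeping the labels as factor levels preserves the response order from the codebook, which matters later when the responses are plotted.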
nationscape |>
# generate frequency counts
count(college, pid3) |>
# drop respondents who were not asked these questions
drop_na() |>
# estimate percentages within pid3 groups
mutate(pct = n / sum(n), .by = pid3) |>
# pivot to cross-tab format
select(-n) |>
pivot_wider(names_from = pid3, values_from = pct)
# A tibble: 3 × 5
college Democrat Republican Independent `Something else`
<fct> <dbl> <dbl> <dbl> <dbl>
1 Agree 0.704 0.349 0.487 0.445
2 Disagree 0.148 0.487 0.308 0.264
3 Not Sure 0.149 0.164 0.205 0.291
Source: Animal Farm
Independent Sampling design (with replacement)
Called via srvyr
Sampling variables:
- ids: `1`
- weights: weight
Data variables:
- response_id (chr), start_date (dttm), right_track (fct), economy_better (fct), interest (fct),
registration (fct), news_sources_facebook (fct), news_sources_cnn (fct), news_sources_msnbc
(fct), news_sources_fox (fct), news_sources_network (fct), news_sources_localtv (fct),
news_sources_telemundo (fct), news_sources_npr (fct), news_sources_amtalk (fct),
news_sources_new_york_times (fct), news_sources_local_newspaper (fct), news_sources_other
(fct), news_sources_other_TEXT (chr), pres_approval (fct), vote_2016 (fct),
vote_2016_other_text (chr), vote_intention_retro (fct), vote_2020_retro (fct),
vote_2020_retro_other_text (chr), who_won (fct), who_won_other_text (chr), primary_party_retro
(fct), group_favorability_whites (fct), group_favorability_blacks (fct),
group_favorability_latinos (fct), group_favorability_asians (fct),
group_favorability_evangelicals (fct), group_favorability_socialists (fct),
group_favorability_muslims (fct), group_favorability_labor_unions (fct),
group_favorability_the_police (fct), group_favorability_undocumented (fct),
group_favorability_lgbt (fct), group_favorability_republicans (fct),
group_favorability_democrats (fct), group_favorability_white_men (fct), group_favorability_jews
(fct), group_favorability_blm (fct), group_favorability_trump_s (fct),
group_favorability_biden_s (fct), cand_favorability_trump (fct), cand_favorability_obama (fct),
cand_favorability_biden (fct), cand_favorability_harris (fct), cand_favorability_pence (fct),
rep_prim_vote (fct), rep_prim_vote_TEXT (chr), dem_prim_vote (fct), dem_prim_vote_TEXT (chr),
house_intent_retro (fct), senate_intent_retro (fct), governor_intent_retro (fct),
primary_sen_barrasso (fct), primary_sen_blackburn (fct), primary_sen_blunt (fct),
primary_sen_boozman (fct), primary_sen_crapo (fct), primary_sen_cruz (fct), primary_sen_fischer
(fct), primary_sen_grassley (fct), primary_sen_hoeven (fct), primary_sen_lankford (fct),
primary_sen_lee (fct), primary_sen_moran (fct), primary_sen_murkowski (fct),
primary_sen_neelykennedy (fct), primary_sen_paul (fct), primary_sen_portman (fct),
primary_sen_rubio (fct), primary_sen_scott_tim (fct), primary_sen_shelby (fct),
primary_sen_thune (fct), primary_sen_toomey (fct), primary_sen_wicker (fct), primary_sen_young
(fct), primary_sen_braun (fct), primary_sen_cramer (fct), primary_sen_hawley (fct),
primary_sen_romney (fct), primary_sen_scott_rick (fct), cand_truth_donald_trump (fct),
cand_truth_joe_biden (fct), cand_facts_donald_trump (fct), cand_facts_joe_biden (fct),
pence_president (fct), racial_attitudes_tryhard (fct), racial_attitudes_generations (fct),
racial_attitudes_marry (fct), racial_attitudes_date (fct), gender_attitudes_maleboss (fct),
gender_attitudes_logical (fct), gender_attitudes_opportunity (fct), gender_attitudes_complain
(fct), discrimination_blacks (fct), discrimination_whites (fct), discrimination_muslims (fct),
discrimination_christians (fct), discrimination_jews (fct), discrimination_women (fct),
discrimination_men (fct), discrimination_asians (fct), discrimination_latinos (fct),
sen_knowledge (fct), sc_knowledge (fct), pid3 (fct), pid7 (fct), pid7_legacy (fct),
strength_democrat (fct), strength_republican (fct), lean_independent (fct), ideo5 (fct),
employment (fct), employment_other_text (chr), work_location (fct), foreign_born (fct),
language (fct), religion (fct), religion_other_text (chr), is_evangelical (fct),
orientation_group (fct), in_union (fct), married (fct), extra_n_children (dbl),
household_gun_owner (fct), wall (fct), cap_carbon (fct), guns_bg (fct), mctaxes (fct),
estate_tax (fct), raise_upper_tax (fct), college (fct), abortion_any_time (fct), abortion_never
(fct), abortion_conditions (fct), late_term_abortion (fct), abolish_priv_insurance (fct),
abortion_insurance (fct), abortion_waiting (fct), china_tariffs (fct), criminal_immigration
(fct), environment (fct), guaranteed_jobs (fct), green_new_deal (fct), gun_registry (fct),
immigration_insurance (fct), immigration_separation (fct), immigration_system (fct),
immigration_wire (fct), israel (fct), marijuana (fct), maternityleave (fct), medicare_for_all
(fct), military_size (fct), minwage (fct), muslimban (fct), oil_and_gas (fct), reparations
(fct), right_to_work (fct), saudi_arabia (fct), ten_commandments (fct), trade (fct),
trans_military (fct), uctaxes2 (fct), vouchers (fct), gov_insurance (fct), public_option (fct),
health_subsidies (fct), path_to_citizenship (fct), dreamers (fct), deportation (fct), ban_guns
(fct), ban_assault_rifles (fct), limit_magazines (fct), impeach_trump (fct), egypt (fct),
fc_smallgov (fct), fc_trad_val (fct), statements_protect_traditions (fct),
statements_defense_burden (fct), statements_trade_effects (fct),
statements_christianity_assault (fct), statements_gender_identity (fct),
statements_american_loss (fct), statements_imm_assimilate (fct), statements_gun_rights (fct),
statements_confront_china (fct), statements_foreign_interests (fct), elect_conf_conduct_retro
(fct), elect_conf_vote_retro (fct), extra_vote_mail_retr (fct), extra_vacc_flu (dbl),
extra_vacc_covid (dbl), extra_dem_violence (fct), extra_ind_violence (fct), extra_rep_violence
(fct), extra_corona_concern (fct), extra_sick_you (fct), extra_sick_family (fct),
extra_sick_work (fct), extra_sick_other (fct), extra_covid_worn_mask (fct),
extra_covid_socialize_distance (fct), extra_covid_socialize_no_dist (fct), extra_trump_corona
(fct), extra_gub_corona (fct), extra_covid_cancel_meet (fct), extra_covid_close_business (fct),
extra_covid_close_schools (fct), extra_covid_work_home (fct), extra_covid_restrict_home (fct),
extra_covid_testing (fct), extra_covid_require_mask (fct), capitol_approval (fct),
capitol_trump_approv (fct), capitol_trump_more (fct), twitter_ban (fct), age (dbl), gender
(fct), census_region (fct), hispanic (fct), race_ethnicity (fct), household_income (fct),
education (fct), state (chr), congress_district (chr), weight (dbl), weight_2020 (dbl),
weight_both (dbl)
# A tibble: 19 × 6
# Groups: college [4]
college pid3 pct pct_se total total_se
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 Agree Democrat 0.499 0.0186 1053. 51.2
2 Agree Republican 0.173 0.0135 364. 30.4
3 Agree Independent 0.286 0.0176 602. 44.2
4 Agree Something else 0.0422 0.00795 89.0 17.1
5 Agree <NA> 0.000471 0.000471 0.993 0.993
6 Disagree Democrat 0.198 0.0186 246. 25.5
7 Disagree Republican 0.459 0.0232 569. 37.1
8 Disagree Independent 0.285 0.0212 353. 30.8
9 Disagree Something else 0.0581 0.0124 72.1 16.0
10 Disagree <NA> 0.0000119 0.0000119 0.0147 0.0147
11 Not Sure Democrat 0.329 0.0300 247. 27.4
12 Not Sure Republican 0.255 0.0270 191. 23.0
13 Not Sure Independent 0.302 0.0289 226. 25.6
14 Not Sure Something else 0.114 0.0207 85.3 16.5
15 <NA> Democrat 0.228 0.0926 9.08 3.55
16 <NA> Republican 0.334 0.133 13.3 6.35
17 <NA> Independent 0.277 0.143 11.0 7.09
18 <NA> Something else 0.143 0.115 5.70 5.06
19 <NA> <NA> 0.0172 0.0143 0.684 0.543
# A tibble: 3 × 5
# Groups: college [3]
college Democrat Republican Independent `Something else`
<fct> <dbl> <dbl> <dbl> <dbl>
1 Agree 0.499 0.173 0.286 0.0422
2 Disagree 0.198 0.459 0.285 0.0581
3 Not Sure 0.329 0.255 0.302 0.114
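To make the weighted percentages above less of a black box, here is a minimal sketch of the point estimate only: sum the weights in each cell, then divide by the total weight of the group. The `toy` data frame and its values are hypothetical, and this sketch does not produce the standard errors that the survey design object provides.

```r
library(dplyr)

# hypothetical respondents; `weight` plays the role of the survey weight column
toy <- tibble(
  college = c("Agree", "Agree", "Agree", "Disagree", "Disagree", "Disagree"),
  pid3 = c("Democrat", "Democrat", "Republican", "Democrat", "Republican", "Republican"),
  weight = c(0.5, 1.5, 2.0, 1.0, 1.0, 2.0)
)

weighted_xtab <- toy |>
  # sum the weights in each cell instead of counting rows
  count(college, pid3, wt = weight, name = "wt_total") |>
  # share of each college group's total weight
  mutate(pct = wt_total / sum(wt_total), .by = college)

weighted_xtab
```

In this toy data, two of the three "Agree" respondents are Democrats, but their combined weight equals that of the single Republican, so the weighted shares are equal even though the unweighted shares are not.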
# A tibble: 4 × 3
pid3 pct pct_se
<fct> <dbl> <dbl>
1 Democrat 68.4 1.57
2 Republican 55.9 2.00
3 Independent 55.5 2.08
4 Something else 39.6 4.21
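The standard errors in the table above can be turned into approximate 95% confidence intervals. The sketch below copies the estimates from the table and assumes a normal approximation (estimate ± about 1.96 standard errors); the column names `lower` and `upper` are illustrative.

```r
# estimates and standard errors copied from the table above (percentage points)
est <- data.frame(
  pid3 = c("Democrat", "Republican", "Independent", "Something else"),
  pct = c(68.4, 55.9, 55.5, 39.6),
  pct_se = c(1.57, 2.00, 2.08, 4.21)
)

# critical value for a two-sided 95% interval under a normal approximation
z <- qnorm(0.975)

est$lower <- est$pct - z * est$pct_se
est$upper <- est$pct + z * est$pct_se

est
```

Columns like these could then be mapped to `xmin` and `xmax` in `geom_errorbar()` or `geom_pointrange()` to show the uncertainty alongside the point estimates.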
Source: YouGov
Before we can visualize top-line survey results, we need to import and wrangle the data.
This is often complicated because the data is reported in a PDF document that was not intended for programmatic use.
You can use LLMs to extract tabular data, but should you?
Doing so requires careful prompting and expensive API calls (or a monthly plan), and the results may be unreliable.
Instead, just write code to do it in R!
# A tibble: 22 × 11
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <lgl> <chr>
1 <NA> <NA> <NA> Sex <NA> Race <NA> NA Age NA Educ…
2 <NA> Total Male Female White <NA> Black Hispanic 18-29 NA 30-44 45-… NA No d…
3 Support 33% 40% 26% 39% <NA> 8% 25% 22% NA 26% 39% 4… NA 35% …
4 Oppose 56% 52% 59% 51% <NA> 75% 62% 64% NA 58% 52% 5… NA 51% …
5 Strongly support 17% 24% 11% 21% 7% 14% 10% NA 12% 19% 2… NA 18% …
6 Somewhat support 16% 17% 15% 18% 1% 12% 12% NA 14% 19% 1… NA 17% …
7 Somewhat oppose 13% 11% 16% 13% 16% 14% 15% NA 13% 14% 1… NA 14% …
8 Strongly oppose 42% 41% 43% 38% 59% 48% 49% NA 45% 38% 3… NA 37% …
9 Not sure 11% 7% 15% 11% 17% 13% 14% NA 16% 10% 6% NA 14% …
10 Totals 99% 100% 100% 101% <NA> 100% 101% 100% NA 100% 100%… NA 100%…
# ℹ 12 more rows
# A tibble: 5 × 4
response Democrats Independents Republicans
<chr> <chr> <chr> <chr>
1 Strongly support 3% 10% 40%
2 Somewhat support 3% 13% 33%
3 Somewhat oppose 14% 15% 11%
4 Strongly oppose 75% 47% 5%
5 Not sure 6% 15% 12%
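The percentages in the cleaned table are still character strings, while the plotting code that follows expects a long data frame named `iran_war_long` with `pid3`, `response`, and `pct` columns. The step in between is not shown. One way it might look is sketched below, with the values copied from the table above; the name `iran_war_wide` and the order of the factor levels are assumptions.

```r
library(dplyr)
library(tidyr)
library(tibble)

# values copied from the cleaned table above
iran_war_wide <- tribble(
  ~response,          ~Democrats, ~Independents, ~Republicans,
  "Strongly support", "3%",       "10%",         "40%",
  "Somewhat support", "3%",       "13%",         "33%",
  "Somewhat oppose",  "14%",      "15%",         "11%",
  "Strongly oppose",  "75%",      "47%",         "5%",
  "Not sure",         "6%",       "15%",         "12%"
)

# assumed ordering of the scale, with the neutral response in the middle
response_levels <- c(
  "Strongly support",
  "Somewhat support",
  "Not sure",
  "Somewhat oppose",
  "Strongly oppose"
)

iran_war_long <- iran_war_wide |>
  # one row per response and party
  pivot_longer(cols = -response, names_to = "pid3", values_to = "pct") |>
  mutate(
    # strip the percent sign and convert to a proportion
    pct = as.numeric(sub("%", "", pct, fixed = TRUE)) / 100,
    # store response as a factor so the scale order is explicit
    response = factor(response, levels = response_levels)
  )

iran_war_long
```

Storing `pct` as a proportion rather than a whole-number percentage matches the use of `label_percent()` on the axis in the plots that follow.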
A Likert scale is a psychometric scale used to scale responses in survey research, often to measure attitudes or opinions.
The scale is symmetric/bilateral with a neutral midpoint, and typically has 5 or 7 response options:
stack_bar_p <- ggplot(
data = iran_war_long,
mapping = aes(x = pct, y = pid3, fill = response)
) +
geom_col() +
scale_x_continuous(labels = label_percent(), position = "top") +
scale_fill_discrete_diverging(
palette = "Blue-Red",
labels = label_wrap(width = 8),
guide = guide_legend(
reverse = TRUE,
theme = theme(legend.key.width = unit(1, "cm"))
)
) +
labs(
x = NULL,
y = NULL,
fill = NULL,
title = "Do you support or oppose the war with Iran?",
caption = "Source: YouGov (March 13-16, 2026)"
) +
theme(
legend.position = "bottom",
panel.grid = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.length.x = unit(0.25, "cm")
)
stack_bar_p
# main plot
div_bar_no_neutral_p <- iran_war_long |>
# remove Not sure responses
filter(response != "Not sure") |>
# shift negative responses to the left of the origin
mutate(
pct = if_else(
response %in% c("Strongly oppose", "Somewhat oppose"),
-pct,
pct
)
) |>
ggplot(mapping = aes(x = pct, y = pid3, fill = response)) +
geom_col() +
scale_x_continuous(
breaks = seq(from = -.8, to = .6, by = .2),
labels = label_percent(),
position = "top"
) +
scale_fill_discrete_diverging(
palette = "Blue-Red",
labels = label_wrap(width = 8),
guide = guide_legend(
reverse = TRUE,
theme = theme(legend.key.width = unit(1, "cm"))
)
) +
labs(
x = NULL,
y = NULL,
fill = NULL
) +
theme(
legend.position = "bottom",
panel.grid = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.length.x = unit(0.25, "cm")
)
# separate plot for neutrals
div_bar_neutral_p <- iran_war_long |>
filter(response == "Not sure") |>
ggplot(mapping = aes(x = pct, y = pid3, fill = response)) +
geom_col() +
scale_x_continuous(
breaks = c(0, .2),
limits = c(NA, .2),
labels = label_percent(),
position = "top"
) +
scale_fill_discrete_diverging(
palette = "Blue-Red",
labels = label_wrap(width = 8),
guide = guide_legend(
reverse = TRUE,
theme = theme(legend.key.width = unit(1, "cm"),
legend.justification = "left")
)
) +
labs(
x = NULL,
y = NULL,
fill = NULL
) +
theme(
legend.position = "bottom",
panel.grid = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.length.x = unit(0.25, "cm")
)
# combine together with patchwork
div_bar_no_neutral_p +
div_bar_neutral_p +
# add shared title and caption
plot_annotation(
title = "Do you support or oppose the war with Iran?",
caption = "Source: YouGov (March 13-16, 2026)"
) +
# combine shared axes
plot_layout(
widths = c(4, 1),
axes = "collect"
)
# split the neutral responses in half and plot them on either side of the origin
div_int_neutral_p <- iran_war_long |>
# neutral will be split into two equal halves
mutate(pct_plot = if_else(response == "Not sure", pct / 2, pct)) |>
# duplicate neutral rows so one half can go left and one right
uncount(
weights = if_else(response == "Not sure", 2L, 1L),
.id = "neutral_half"
) |>
# invert oppose responses and one half of neutrals to negative values
mutate(
pct_plot = case_when(
response %in% c("Strongly oppose", "Somewhat oppose") ~ -pct_plot,
response == "Not sure" & neutral_half == 1L ~ -pct_plot,
.default = pct_plot
),
# convert response to character vector
# factors with duplicated levels can cause issues with plotting
response = as.character(response)
) |>
ggplot(mapping = aes(x = pct_plot, y = pid3, fill = response)) +
# reverse the stacking order since response is no longer a factor
geom_col(position = position_stack(reverse = TRUE)) +
geom_vline(xintercept = 0, linewidth = 0.4, color = "gray40") +
scale_x_continuous(
breaks = seq(from = -.8, to = .6, by = .2),
labels = label_percent(),
position = "top"
) +
# manually generate the diverging color scale since response is no longer a factor
scale_fill_manual(
labels = label_wrap(width = 8),
# fix the order to match the other plots
breaks = c(
"Strongly support",
"Somewhat support",
"Not sure",
"Somewhat oppose",
"Strongly oppose"
),
# generate palette manually as a character vector
values = diverging_hcl(palette = "Blue-Red", n = 5),
guide = guide_legend(
reverse = TRUE,
theme = theme(legend.key.width = unit(1, "cm"))
)
) +
labs(
x = NULL,
y = NULL,
fill = NULL,
title = "Do you support or oppose the war with Iran?",
caption = "Source: YouGov (March 13-16, 2026)"
) +
theme(
legend.position = "bottom",
panel.grid = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.length.x = unit(0.25, "cm")
)
div_int_neutral_p
split_bars_p <- ggplot(
data = iran_war_long,
mapping = aes(x = pct, y = pid3, fill = response)
) +
geom_col() +
scale_x_continuous(
breaks = seq(from = 0, to = 1, by = 0.2),
labels = label_percent(),
position = "top"
) +
scale_fill_discrete_diverging(
palette = "Blue-Red",
guide = "none"
) +
facet_wrap(
facets = vars(response |> fct_rev()),
nrow = 1,
space = "free_x",
scales = "free_x",
labeller = label_wrap_gen(width = 15)
) +
labs(
x = NULL,
y = NULL,
fill = NULL,
title = "Do you support or oppose the war with Iran?",
caption = "Source: YouGov (March 13-16, 2026)"
) +
theme(
panel.grid = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.length.x = unit(0.25, "cm"),
# shrink the text size for the facet labels to fit
strip.text = element_text(size = rel(0.7),
margin = margin(t = 1, r = 0, b = 1, l = 0, unit = "mm"))
)
split_bars_p
ae-15
Compare the advantages of different bar chart designs for reporting Likert scale data



