Visualizing survey data

Lecture 16

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2026

March 19, 2026

Announcements

Mini-project
Project 02

Learning objectives

Identify methods for importing survey datasets
Estimate summary statistics using survey weights
Import and wrangle messy data from top-line survey reports
Design bar charts for reporting Likert scale data
Evaluate the interpretability of various bar charts

Reporting on public opinion

A brief history of public opinion polling

A poll is a type of survey or inquiry into public opinion conducted by interviewing a random sample of people.

George Gallup and the 1936 presidential election
Random-digit dialing
Shift to probability-based online surveys

Sources of raw public opinion data

Most public opinion data is proprietary

Some public datasets include:

Common challenges working with survey datasets

File formats (Stata or SPSS)
Encoding missing values
Survey weights

Visualizing the Nationscape survey

Import using {haven}

library(haven)

nationscape <- read_dta(file = "data/ns20210112.dta")
nationscape

# A tibble: 4,138 × 234
   response_id start_date          right_track                  economy_better interest registration
   <chr>       <dttm>              <dbl+lbl>                    <dbl+lbl>      <dbl+lb> <dbl+lbl>   
 1 07700007    2021-01-12 10:52:22   2 [Off on the wrong track] 1 [Better]      1 [Mos…   1 [Regist…
 2 07700008    2021-01-12 10:55:11   1 [Generally headed in th… 1 [Better]     NA         1 [Regist…
 3 07700009    2021-01-12 10:50:00   2 [Off on the wrong track] 2 [About the …  2 [Som…   1 [Regist…
 4 07700010    2021-01-12 10:47:32   2 [Off on the wrong track] 2 [About the …  1 [Mos…   1 [Regist…
 5 07700011    2021-01-12 10:52:57   2 [Off on the wrong track] 3 [Worse]       3 [Onl… 999 [Don't …
 6 07700013    2021-01-12 10:50:17 999 [Not sure]               1 [Better]      2 [Som…   1 [Regist…
 7 07700014    2021-01-12 10:49:44 999 [Not sure]               3 [Worse]       2 [Som…   1 [Regist…
 8 07700015    2021-01-12 10:51:21 999 [Not sure]               3 [Worse]       2 [Som…   1 [Regist…
 9 07700016    2021-01-12 11:01:55   1 [Generally headed in th… 2 [About the …  2 [Som…   1 [Regist…
10 07700020    2021-01-12 10:58:21   2 [Off on the wrong track] 2 [About the …  4 [Har…   1 [Regist…
# ℹ 4,128 more rows
# ℹ 228 more variables: news_sources_facebook <dbl+lbl>, news_sources_cnn <dbl+lbl>,
#   news_sources_msnbc <dbl+lbl>, news_sources_fox <dbl+lbl>, news_sources_network <dbl+lbl>,
#   news_sources_localtv <dbl+lbl>, news_sources_telemundo <dbl+lbl>, news_sources_npr <dbl+lbl>,
#   news_sources_amtalk <dbl+lbl>, news_sources_new_york_times <dbl+lbl>,
#   news_sources_local_newspaper <dbl+lbl>, news_sources_other <dbl+lbl>,
#   news_sources_other_TEXT <chr>, pres_approval <dbl+lbl>, vote_2016 <dbl+lbl>, …

Convert labels to factors

nationscape <- as_factor(nationscape)
nationscape

# A tibble: 4,138 × 234
   response_id start_date          right_track                  economy_better interest registration
   <chr>       <dttm>              <fct>                        <fct>          <fct>    <fct>       
 1 07700007    2021-01-12 10:52:22 Off on the wrong track       Better         Most of… Registered  
 2 07700008    2021-01-12 10:55:11 Generally headed in the rig… Better         <NA>     Registered  
 3 07700009    2021-01-12 10:50:00 Off on the wrong track       About the same Some of… Registered  
 4 07700010    2021-01-12 10:47:32 Off on the wrong track       About the same Most of… Registered  
 5 07700011    2021-01-12 10:52:57 Off on the wrong track       Worse          Only no… Don't know  
 6 07700013    2021-01-12 10:50:17 Not sure                     Better         Some of… Registered  
 7 07700014    2021-01-12 10:49:44 Not sure                     Worse          Some of… Registered  
 8 07700015    2021-01-12 10:51:21 Not sure                     Worse          Some of… Registered  
 9 07700016    2021-01-12 11:01:55 Generally headed in the rig… About the same Some of… Registered  
10 07700020    2021-01-12 10:58:21 Off on the wrong track       About the same Hardly … Registered  
# ℹ 4,128 more rows
# ℹ 228 more variables: news_sources_facebook <fct>, news_sources_cnn <fct>,
#   news_sources_msnbc <fct>, news_sources_fox <fct>, news_sources_network <fct>,
#   news_sources_localtv <fct>, news_sources_telemundo <fct>, news_sources_npr <fct>,
#   news_sources_amtalk <fct>, news_sources_new_york_times <fct>,
#   news_sources_local_newspaper <fct>, news_sources_other <fct>, news_sources_other_TEXT <chr>,
#   pres_approval <fct>, vote_2016 <fct>, vote_2016_other_text <chr>, vote_intention_retro <fct>, …
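
To see what `as_factor()` does to a single labelled column, and one way to deal with the missing-value encoding challenge listed earlier, here is a minimal sketch on a toy vector. The toy values and the decision to treat "Not sure" (coded 999) as missing are assumptions for illustration; whether such a response counts as missing is an analytic choice that depends on the question being asked.

```r
library(haven)
library(forcats)

# toy labelled vector mimicking how right_track is stored in the .dta file
right_track <- labelled(
  c(2, 1, 999, 2),
  labels = c(
    "Generally headed in the right direction" = 1,
    "Off on the wrong track" = 2,
    "Not sure" = 999
  )
)

# replace the numeric codes with their value labels
right_track_fct <- as_factor(right_track)
levels(right_track_fct)

# assumption: treat "Not sure" as missing by removing that level
right_track_na <- fct_recode(right_track_fct, NULL = "Not sure")
table(right_track_na, useNA = "ifany")
```

Keeping "Not sure" as its own level is just as defensible, and is what the cross-tabs in these slides do.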

Estimating summary statistics

library(tidyverse)

nationscape |>
  # generate frequency counts
  count(college, pid3) |>
  # drop respondents who were not asked these questions
  drop_na() |>
  # estimate percentages within pid3 groups
  mutate(pct = n / sum(n), .by = pid3) |>
  # pivot to cross-tab format
  select(-n) |>
  pivot_wider(names_from = pid3, values_from = pct)

# A tibble: 3 × 5
  college  Democrat Republican Independent `Something else`
  <fct>       <dbl>      <dbl>       <dbl>            <dbl>
1 Agree       0.704      0.349       0.487            0.445
2 Disagree    0.148      0.487       0.308            0.264
3 Not Sure    0.149      0.164       0.205            0.291

Accounting for survey weights

nationscape |> select(weight)

# A tibble: 4,138 × 1
   weight
    <dbl>
 1 2.17  
 2 3.61  
 3 5.01  
 4 0.969 
 5 0.173 
 6 0.153 
 7 0.924 
 8 0.144 
 9 0.0964
10 2.64  
# ℹ 4,128 more rows
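
A survey weight scales each respondent's contribution up or down, so a weighted percentage sums weights rather than counting rows. A minimal sketch contrasting the two calculations, using a made-up `toy` tibble (hypothetical values, not drawn from Nationscape):

```r
library(dplyr)

# hypothetical respondents with made-up weights
toy <- tibble(
  pid3 = c("Democrat", "Democrat", "Democrat", "Republican", "Republican", "Republican"),
  college = c("Agree", "Agree", "Disagree", "Agree", "Disagree", "Disagree"),
  weight = c(2, 1, 1, 0.5, 1, 1.5)
)

# unweighted: every respondent counts once
unweighted <- toy |>
  count(pid3, college) |>
  mutate(pct = n / sum(n), .by = pid3)

# weighted: count() sums the weights instead of counting rows
weighted <- toy |>
  count(pid3, college, wt = weight) |>
  mutate(pct = n / sum(n), .by = pid3)

unweighted
weighted
```

In the toy data, Democrats who agree are 2 of 3 rows (about 67%) but 3 of 4 weight units (75%). This by-hand approach recovers point estimates only; it does not produce standard errors.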

Accounting for survey weights with {srvyr}

library(srvyr)

nationscape |>
  as_survey_design(ids = 1, weights = weight)

Independent Sampling design (with replacement)
Called via srvyr
Sampling variables:
  - ids: `1` 
  - weights: weight 
Data variables: 
  - response_id (chr), start_date (dttm), right_track (fct), economy_better (fct), interest (fct),
    registration (fct), news_sources_facebook (fct), news_sources_cnn (fct), news_sources_msnbc
    (fct), news_sources_fox (fct), news_sources_network (fct), news_sources_localtv (fct),
    news_sources_telemundo (fct), news_sources_npr (fct), news_sources_amtalk (fct),
    news_sources_new_york_times (fct), news_sources_local_newspaper (fct), news_sources_other
    (fct), news_sources_other_TEXT (chr), pres_approval (fct), vote_2016 (fct),
    vote_2016_other_text (chr), vote_intention_retro (fct), vote_2020_retro (fct),
    vote_2020_retro_other_text (chr), who_won (fct), who_won_other_text (chr), primary_party_retro
    (fct), group_favorability_whites (fct), group_favorability_blacks (fct),
    group_favorability_latinos (fct), group_favorability_asians (fct),
    group_favorability_evangelicals (fct), group_favorability_socialists (fct),
    group_favorability_muslims (fct), group_favorability_labor_unions (fct),
    group_favorability_the_police (fct), group_favorability_undocumented (fct),
    group_favorability_lgbt (fct), group_favorability_republicans (fct),
    group_favorability_democrats (fct), group_favorability_white_men (fct), group_favorability_jews
    (fct), group_favorability_blm (fct), group_favorability_trump_s (fct),
    group_favorability_biden_s (fct), cand_favorability_trump (fct), cand_favorability_obama (fct),
    cand_favorability_biden (fct), cand_favorability_harris (fct), cand_favorability_pence (fct),
    rep_prim_vote (fct), rep_prim_vote_TEXT (chr), dem_prim_vote (fct), dem_prim_vote_TEXT (chr),
    house_intent_retro (fct), senate_intent_retro (fct), governor_intent_retro (fct),
    primary_sen_barrasso (fct), primary_sen_blackburn (fct), primary_sen_blunt (fct),
    primary_sen_boozman (fct), primary_sen_crapo (fct), primary_sen_cruz (fct), primary_sen_fischer
    (fct), primary_sen_grassley (fct), primary_sen_hoeven (fct), primary_sen_lankford (fct),
    primary_sen_lee (fct), primary_sen_moran (fct), primary_sen_murkowski (fct),
    primary_sen_neelykennedy (fct), primary_sen_paul (fct), primary_sen_portman (fct),
    primary_sen_rubio (fct), primary_sen_scott_tim (fct), primary_sen_shelby (fct),
    primary_sen_thune (fct), primary_sen_toomey (fct), primary_sen_wicker (fct), primary_sen_young
    (fct), primary_sen_braun (fct), primary_sen_cramer (fct), primary_sen_hawley (fct),
    primary_sen_romney (fct), primary_sen_scott_rick (fct), cand_truth_donald_trump (fct),
    cand_truth_joe_biden (fct), cand_facts_donald_trump (fct), cand_facts_joe_biden (fct),
    pence_president (fct), racial_attitudes_tryhard (fct), racial_attitudes_generations (fct),
    racial_attitudes_marry (fct), racial_attitudes_date (fct), gender_attitudes_maleboss (fct),
    gender_attitudes_logical (fct), gender_attitudes_opportunity (fct), gender_attitudes_complain
    (fct), discrimination_blacks (fct), discrimination_whites (fct), discrimination_muslims (fct),
    discrimination_christians (fct), discrimination_jews (fct), discrimination_women (fct),
    discrimination_men (fct), discrimination_asians (fct), discrimination_latinos (fct),
    sen_knowledge (fct), sc_knowledge (fct), pid3 (fct), pid7 (fct), pid7_legacy (fct),
    strength_democrat (fct), strength_republican (fct), lean_independent (fct), ideo5 (fct),
    employment (fct), employment_other_text (chr), work_location (fct), foreign_born (fct),
    language (fct), religion (fct), religion_other_text (chr), is_evangelical (fct),
    orientation_group (fct), in_union (fct), married (fct), extra_n_children (dbl),
    household_gun_owner (fct), wall (fct), cap_carbon (fct), guns_bg (fct), mctaxes (fct),
    estate_tax (fct), raise_upper_tax (fct), college (fct), abortion_any_time (fct), abortion_never
    (fct), abortion_conditions (fct), late_term_abortion (fct), abolish_priv_insurance (fct),
    abortion_insurance (fct), abortion_waiting (fct), china_tariffs (fct), criminal_immigration
    (fct), environment (fct), guaranteed_jobs (fct), green_new_deal (fct), gun_registry (fct),
    immigration_insurance (fct), immigration_separation (fct), immigration_system (fct),
    immigration_wire (fct), israel (fct), marijuana (fct), maternityleave (fct), medicare_for_all
    (fct), military_size (fct), minwage (fct), muslimban (fct), oil_and_gas (fct), reparations
    (fct), right_to_work (fct), saudi_arabia (fct), ten_commandments (fct), trade (fct),
    trans_military (fct), uctaxes2 (fct), vouchers (fct), gov_insurance (fct), public_option (fct),
    health_subsidies (fct), path_to_citizenship (fct), dreamers (fct), deportation (fct), ban_guns
    (fct), ban_assault_rifles (fct), limit_magazines (fct), impeach_trump (fct), egypt (fct),
    fc_smallgov (fct), fc_trad_val (fct), statements_protect_traditions (fct),
    statements_defense_burden (fct), statements_trade_effects (fct),
    statements_christianity_assault (fct), statements_gender_identity (fct),
    statements_american_loss (fct), statements_imm_assimilate (fct), statements_gun_rights (fct),
    statements_confront_china (fct), statements_foreign_interests (fct), elect_conf_conduct_retro
    (fct), elect_conf_vote_retro (fct), extra_vote_mail_retr (fct), extra_vacc_flu (dbl),
    extra_vacc_covid (dbl), extra_dem_violence (fct), extra_ind_violence (fct), extra_rep_violence
    (fct), extra_corona_concern (fct), extra_sick_you (fct), extra_sick_family (fct),
    extra_sick_work (fct), extra_sick_other (fct), extra_covid_worn_mask (fct),
    extra_covid_socialize_distance (fct), extra_covid_socialize_no_dist (fct), extra_trump_corona
    (fct), extra_gub_corona (fct), extra_covid_cancel_meet (fct), extra_covid_close_business (fct),
    extra_covid_close_schools (fct), extra_covid_work_home (fct), extra_covid_restrict_home (fct),
    extra_covid_testing (fct), extra_covid_require_mask (fct), capitol_approval (fct),
    capitol_trump_approv (fct), capitol_trump_more (fct), twitter_ban (fct), age (dbl), gender
    (fct), census_region (fct), hispanic (fct), race_ethnicity (fct), household_income (fct),
    education (fct), state (chr), congress_district (chr), weight (dbl), weight_2020 (dbl),
    weight_both (dbl)

Estimating weighted proportions

nationscape |>
  as_survey_design(ids = 1, weights = weight) |>
  summarize(
    pct = survey_mean(),
    total = survey_total(),
    .by = c(pid3, college)
  )

# A tibble: 19 × 6
# Groups:   pid3 [5]
   pid3           college      pct  pct_se     total total_se
   <fct>          <fct>      <dbl>   <dbl>     <dbl>    <dbl>
 1 Democrat       Agree    0.677   0.0200  1053.      51.2   
 2 Democrat       Disagree 0.158   0.0153   246.      25.5   
 3 Democrat       Not Sure 0.159   0.0162   247.      27.4   
 4 Democrat       <NA>     0.00584 0.00229    9.08     3.55  
 5 Republican     Agree    0.320   0.0224   364.      30.4   
 6 Republican     Disagree 0.500   0.0240   569.      37.1   
 7 Republican     Not Sure 0.168   0.0185   191.      23.0   
 8 Republican     <NA>     0.0117  0.00555   13.3      6.35  
 9 Independent    Agree    0.505   0.0256   602.      44.2   
10 Independent    Disagree 0.296   0.0226   353.      30.8   
11 Independent    Not Sure 0.190   0.0197   226.      25.6   
12 Independent    <NA>     0.00925 0.00591   11.0      7.09  
13 Something else Agree    0.353   0.0551    89.0     17.1   
14 Something else Disagree 0.286   0.0531    72.1     16.0   
15 Something else Not Sure 0.338   0.0541    85.3     16.5   
16 Something else <NA>     0.0226  0.0198     5.70     5.06  
17 <NA>           Agree    0.587   0.307      0.993    0.993 
18 <NA>           Disagree 0.00870 0.0104     0.0147   0.0147
19 <NA>           <NA>     0.404   0.305      0.684    0.543
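
The `_se` columns are standard errors. If confidence intervals are easier to report, `survey_mean()` accepts a `vartype` argument. A minimal sketch on hypothetical toy data (the `toy` tibble is made up, and `group_by()` is used in place of `.by` only so the sketch does not depend on the installed srvyr version):

```r
library(dplyr)
library(srvyr)

# hypothetical respondents with made-up weights
toy <- tibble(
  pid3 = c("Democrat", "Democrat", "Democrat", "Republican", "Republican", "Republican"),
  college = c("Agree", "Agree", "Disagree", "Agree", "Disagree", "Disagree"),
  weight = c(2, 1, 1, 0.5, 1, 1.5)
)

toy_ci <- toy |>
  as_survey_design(ids = 1, weights = weight) |>
  group_by(pid3, college) |>
  # proportion of each college response within pid3, with a confidence interval
  summarize(pct = survey_mean(vartype = "ci")) |>
  ungroup()
toy_ci
```

The interval bounds appear as `pct_low` and `pct_upp`. With so few rows the intervals are very wide; the point is the column layout, not the estimates.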

Turn into a cross-tab

nationscape |>
  as_survey_design(ids = 1, weights = weight) |>
  summarize(
    pct = survey_mean(),
    total = survey_total(),
    .by = c(pid3, college)
  ) |>
  drop_na() |>
  select(college, pid3, pct) |>
  pivot_wider(names_from = pid3, values_from = pct)

# A tibble: 3 × 5
  college  Democrat Republican Independent `Something else`
  <fct>       <dbl>      <dbl>       <dbl>            <dbl>
1 Agree       0.677      0.320       0.505            0.353
2 Disagree    0.158      0.500       0.296            0.286
3 Not Sure    0.159      0.168       0.190            0.338

Estimating weighted means

nationscape |>
  as_survey_design(ids = 1, weights = weight) |>
  drop_na(pid3, extra_vacc_covid) |>
  summarize(
    pct = survey_mean(extra_vacc_covid),
    .by = pid3
  )

# A tibble: 4 × 3
  pid3             pct pct_se
  <fct>          <dbl>  <dbl>
1 Democrat        68.4   1.57
2 Republican      55.9   2.00
3 Independent     55.5   2.08
4 Something else  39.6   4.21

Working with top-line survey results

Top-line survey reports

Question-by-question results reported in tabular format
Broken down by key demographic groups
Frequently released by public opinion research firms, whereas respondent-level data is often proprietary

How to visualize top-line survey results

Before we can visualize top-line survey results, we need to import and wrangle the data.

This step is often complicated because the data is reported in a PDF document that was not designed for programmatic use.

Before we reach for LLMs

You can use LLMs to extract tabular data, but should you?

Doing so requires careful prompting and expensive API calls (or a monthly plan), and the results may be unreliable.

Instead, just write code to do it in R!

Extracting table with {tabulapdf}

library(tabulapdf)

iran_war <- extract_tables(
  file = "data/econTabReport_CwWXhS2.pdf",
  pages = 7,
  col_names = FALSE
) |>
  nth(1)
iran_war

# A tibble: 22 × 11
   X1               X2    X3    X4           X5    X6    X7             X8    X9         X10   X11  
   <chr>            <chr> <chr> <chr>        <chr> <chr> <chr>          <lgl> <chr>      <lgl> <chr>
 1 <NA>             <NA>  <NA>  Sex          <NA>  Race  <NA>           NA    Age        NA    Educ…
 2 <NA>             Total Male  Female White <NA>  Black Hispanic 18-29 NA    30-44 45-… NA    No d…
 3 Support          33%   40%   26% 39%      <NA>  8%    25% 22%        NA    26% 39% 4… NA    35% …
 4 Oppose           56%   52%   59% 51%      <NA>  75%   62% 64%        NA    58% 52% 5… NA    51% …
 5 Strongly support 17%   24%   11%          21%   7%    14% 10%        NA    12% 19% 2… NA    18% …
 6 Somewhat support 16%   17%   15%          18%   1%    12% 12%        NA    14% 19% 1… NA    17% …
 7 Somewhat oppose  13%   11%   16%          13%   16%   14% 15%        NA    13% 14% 1… NA    14% …
 8 Strongly oppose  42%   41%   43%          38%   59%   48% 49%        NA    45% 38% 3… NA    37% …
 9 Not sure         11%   7%    15%          11%   17%   13% 14%        NA    16% 10% 6% NA    14% …
10 Totals           99%   100%  100% 101%    <NA>  100%  101% 100%      NA    100% 100%… NA    100%…
# ℹ 12 more rows

Clean and tidy the table

iran_war |>
  # keep relevant rows
  slice(16:20) |>
  # keep response and party ID columns
  select(X1, X11) |>
  # separate party ID column
  separate_wider_delim(
    cols = X11,
    delim = " ",
    names = c("Democrats", "Independents", "Republicans")
  ) |>
  # fix column names
  rename(
    response = X1
  )

# A tibble: 5 × 4
  response         Democrats Independents Republicans
  <chr>            <chr>     <chr>        <chr>      
1 Strongly support 3%        10%          40%        
2 Somewhat support 3%        13%          33%        
3 Somewhat oppose  14%       15%          11%        
4 Strongly oppose  75%       47%          5%         
5 Not sure         6%        15%          12%
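
The plotting code in the remaining slides uses a data frame named `iran_war_long` with columns `response`, `pid3`, and `pct`, but its construction is not shown. One plausible way to get there from the cleaned table is sketched here; it is not the original code. The `iran_war_clean` name, the hand-typed values (copied from the output above), and the factor level order are all assumptions.

```r
library(tidyverse)

# values copied from the cleaned table printed above
iran_war_clean <- tribble(
  ~response,          ~Democrats, ~Independents, ~Republicans,
  "Strongly support", "3%",       "10%",         "40%",
  "Somewhat support", "3%",       "13%",         "33%",
  "Somewhat oppose",  "14%",      "15%",         "11%",
  "Strongly oppose",  "75%",      "47%",         "5%",
  "Not sure",         "6%",       "15%",         "12%"
)

iran_war_long <- iran_war_clean |>
  # one row per response x party ID combination
  pivot_longer(cols = -response, names_to = "pid3", values_to = "pct") |>
  mutate(
    # "75%" -> 0.75 so that label_percent() formats the axis correctly
    pct = parse_number(pct) / 100,
    # assumed ordering: neutral response in the middle of the scale
    response = factor(
      response,
      levels = c(
        "Strongly support",
        "Somewhat support",
        "Not sure",
        "Somewhat oppose",
        "Strongly oppose"
      )
    )
  )
iran_war_long
```

Because the report rounds each percentage, the values within a party may sum to slightly more or less than 100%.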

Visualizing Likert scale data

Likert scales

A psychometric scale for recording responses in survey research, often used to measure attitudes or opinions.

The scale is symmetric around a neutral midpoint and typically has 5 or 7 response options:

Strongly disagree
Disagree
Unsure
Agree
Strongly agree

Stacked bar charts

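# label_percent() and label_wrap() come from scales;
# scale_fill_discrete_diverging() and diverging_hcl() come from colorspace
library(scales)
library(colorspace)

stack_bar_p <- ggplot(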
  data = iran_war_long,
  mapping = aes(x = pct, y = pid3, fill = response)
) +
  geom_col() +
  scale_x_continuous(labels = label_percent(), position = "top") +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    labels = label_wrap(width = 8),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(legend.key.width = unit(1, "cm"))
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )
stack_bar_p

With a double axis

stack_bar_p +
  scale_x_continuous(
    labels = label_percent(),
    position = "top",
    sec.axis = sec_axis(
      transform = \(x) 1 - x,
      labels = label_percent(),
      name = NULL
    )
  )

Diverging, with extra neutrals

# main plot
div_bar_no_neutral_p <- iran_war_long |>
  # remove Not sure responses
  filter(response != "Not sure") |>
  # shift negative responses to the left of the origin
  mutate(
    pct = if_else(
      response %in% c("Strongly oppose", "Somewhat oppose"),
      -pct,
      pct
    ),
    # convert response to character vector
    # factors with duplicated levels can cause issues with plotting
    response = as.character(response)
  ) |>
  ggplot(mapping = aes(x = pct, y = pid3, fill = response)) +
  # reverse the stacking order since response is no longer a factor
  geom_col(position = position_stack(reverse = TRUE)) +
  scale_x_continuous(
    breaks = seq(from = -.8, to = .6, by = .2),
    labels = label_percent(),
    position = "top"
  ) +
  # manually generate the diverging color scale since response is no longer a factor
  scale_fill_manual(
    labels = label_wrap(width = 8),
    # fix the order to match the other plots
    breaks = c(
      "Strongly support",
      "Somewhat support",
      "Not sure",
      "Somewhat oppose",
      "Strongly oppose"
    ),
    # generate palette manually as a character vector
    values = diverging_hcl(palette = "Blue-Red", n = 5),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(legend.key.width = unit(1, "cm"))
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )

# separate plot for neutrals
div_bar_neutral_p <- iran_war_long |>
  filter(response == "Not sure") |>
  ggplot(mapping = aes(x = pct, y = pid3, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  scale_x_continuous(
    breaks = c(0, .2),
    limits = c(NA, .2),
    labels = label_percent(),
    position = "top"
  ) +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    labels = label_wrap(width = 8),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(
        legend.key.width = unit(1, "cm"),
        legend.justification = "left"
      )
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )

# combine together with patchwork
library(patchwork)

div_bar_no_neutral_p +
  div_bar_neutral_p +
  # add shared title and caption
  plot_annotation(
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  # combine shared axes
  plot_layout(
    widths = c(4, 1),
    axes = "collect"
  )

Diverging, integrated neutrals

# split the neutral responses in half and plot them on either side of the origin
div_int_neutral_p <- iran_war_long |>
  # neutral will be split into two equal halves
  mutate(pct_plot = if_else(response == "Not sure", pct / 2, pct)) |>
  # duplicate neutral rows so one half can go left and one right
  uncount(
    weights = if_else(response == "Not sure", 2L, 1L),
    .id = "neutral_half"
  ) |>
  # invert oppose responses and one half of neutrals to negative values
  mutate(
    pct_plot = case_when(
      response %in% c("Strongly oppose", "Somewhat oppose") ~ -pct_plot,
      response == "Not sure" & neutral_half == 1L ~ -pct_plot,
      .default = pct_plot
    ),
    # convert response to character vector
    # factors with duplicated levels can cause issues with plotting
    response = as.character(response)
  ) |>
  ggplot(mapping = aes(x = pct_plot, y = pid3, fill = response)) +
  # reverse the stacking order since response is no longer a factor
  geom_col(position = position_stack(reverse = TRUE)) +
  geom_vline(xintercept = 0, linewidth = 0.4, color = "gray40") +
  scale_x_continuous(
    breaks = seq(from = -.8, to = .6, by = .2),
    labels = label_percent(),
    position = "top"
  ) +
  # manually generate the diverging color scale since response is no longer a factor
  scale_fill_manual(
    labels = label_wrap(width = 8),
    # fix the order to match the other plots
    breaks = c(
      "Strongly support",
      "Somewhat support",
      "Not sure",
      "Somewhat oppose",
      "Strongly oppose"
    ),
    # generate palette manually as a character vector
    values = diverging_hcl(palette = "Blue-Red", n = 5),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(legend.key.width = unit(1, "cm"))
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )
div_int_neutral_p
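
To see what the `uncount()` step does in isolation, here is a minimal sketch on a single hypothetical group (the `toy` values are made up). The neutral row is duplicated, each copy carries half of the percentage, and one copy is flipped to the negative side.

```r
library(dplyr)
library(tidyr)

# hypothetical responses for one group
toy <- tibble(
  response = c("Somewhat support", "Not sure", "Somewhat oppose"),
  pct = c(0.5, 0.2, 0.3)
)

toy_split <- toy |>
  # halve the neutral percentage before duplicating the row
  mutate(pct_plot = if_else(response == "Not sure", pct / 2, pct)) |>
  # two copies of the neutral row, one copy of everything else
  uncount(
    weights = if_else(response == "Not sure", 2L, 1L),
    .id = "neutral_half"
  ) |>
  # oppose responses and the first neutral copy go left of the origin
  mutate(
    pct_plot = case_when(
      response == "Somewhat oppose" ~ -pct_plot,
      response == "Not sure" & neutral_half == 1L ~ -pct_plot,
      .default = pct_plot
    )
  )
toy_split
```

The result has four rows. Bars left of zero total 0.4 (0.3 oppose plus half of the neutrals) and bars right of zero total 0.6, so the full bar still spans 100%.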

Split bars

split_bars_p <- ggplot(
  data = iran_war_long,
  mapping = aes(x = pct, y = pid3, fill = response)
) +
  geom_col() +
  scale_x_continuous(
    breaks = seq(from = 0, to = 1, by = 0.2),
    labels = label_percent(),
    position = "top"
  ) +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    guide = "none"
  ) +
  facet_wrap(
    facets = vars(response |> fct_rev()),
    nrow = 1,
    space = "free_x",
    scales = "free_x",
    labeller = label_wrap_gen(width = 15)
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  theme(
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm"),
    # shrink the text size for the facet labels to fit
    strip.text = element_text(
      size = rel(0.7),
      margin = margin(t = 1, r = 0, b = 1, l = 0, unit = "mm")
    )
  )
split_bars_p

Application exercise

`ae-15`

Compare the advantages of different bar chart designs for reporting Likert scale data

Wrap up

Recap

Survey datasets provide rich information about public opinion, but can be difficult to work with
Survey weights are necessary to account for sampling design and nonresponse bias, and can be used to estimate population-level summary statistics
Top-line survey reports can be imported and wrangled with code, but may require some manual effort
Choose bar chart designs carefully when reporting Likert scale data, and be aware of the tradeoffs of different designs

Acknowledgments

Diverging bar chart examples drawn from "The case against diverging stacked bars" by Lisa Charlotte Muth and Gregor Aisch