Visualizing survey data

Lecture 16

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2026

March 19, 2026

Announcements

  • Mini-project
  • Project 02

Learning objectives

  • Identify methods for importing survey datasets
  • Estimate summary statistics using survey weights
  • Import and wrangle messy data from top-line survey reports
  • Design bar charts for reporting Likert scale data
  • Evaluate the interpretability of various bar charts

Viz critique

TODO

Reporting on public opinion

A brief history of public opinion polling

A poll is a type of survey or inquiry into public opinion conducted by interviewing a random sample of people.

  • George Gallup and the 1936 presidential election
  • Random-digit dialing
  • Shift to probability-based online surveys

Sources of raw public opinion data

Most public opinion data is proprietary

Common challenges working with survey datasets

  • File formats (Stata or SPSS)
  • Encoding missing values
  • Survey weights
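
As one illustration of the missing-value problem, survey files often store responses such as "Not sure" or "Don't know" as numeric sentinel codes (999 in the Nationscape file imported later in this lecture). The sketch below uses a made-up data frame, `toy`, rather than the real data, and it assumes the analyst wants to treat the sentinel as missing. Whether "Not sure" counts as missing or as a substantive answer depends on the question being asked.

```r
library(tidyverse)

# toy data (hypothetical): 999 stands in for a "Not sure" sentinel code
toy <- tibble(
  response_id = c("a", "b", "c", "d"),
  right_track = c(2, 1, 999, 999)
)

# recode the sentinel to NA so it cannot leak into numeric summaries
toy_clean <- toy |>
  mutate(right_track = na_if(right_track, 999))

# the raw mean is dominated by the sentinel; the cleaned mean is not
mean(toy$right_track)
mean(toy_clean$right_track, na.rm = TRUE)
```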

Visualizing the Nationscape survey

Import using {haven}

library(haven)

nationscape <- read_dta(file = "data/ns20210112.dta")
nationscape
# A tibble: 4,138 × 234
   response_id start_date          right_track                  economy_better interest registration
   <chr>       <dttm>              <dbl+lbl>                    <dbl+lbl>      <dbl+lb> <dbl+lbl>   
 1 07700007    2021-01-12 10:52:22   2 [Off on the wrong track] 1 [Better]      1 [Mos…   1 [Regist…
 2 07700008    2021-01-12 10:55:11   1 [Generally headed in th… 1 [Better]     NA         1 [Regist…
 3 07700009    2021-01-12 10:50:00   2 [Off on the wrong track] 2 [About the …  2 [Som…   1 [Regist…
 4 07700010    2021-01-12 10:47:32   2 [Off on the wrong track] 2 [About the …  1 [Mos…   1 [Regist…
 5 07700011    2021-01-12 10:52:57   2 [Off on the wrong track] 3 [Worse]       3 [Onl… 999 [Don't …
 6 07700013    2021-01-12 10:50:17 999 [Not sure]               1 [Better]      2 [Som…   1 [Regist…
 7 07700014    2021-01-12 10:49:44 999 [Not sure]               3 [Worse]       2 [Som…   1 [Regist…
 8 07700015    2021-01-12 10:51:21 999 [Not sure]               3 [Worse]       2 [Som…   1 [Regist…
 9 07700016    2021-01-12 11:01:55   1 [Generally headed in th… 2 [About the …  2 [Som…   1 [Regist…
10 07700020    2021-01-12 10:58:21   2 [Off on the wrong track] 2 [About the …  4 [Har…   1 [Regist…
# ℹ 4,128 more rows
# ℹ 228 more variables: news_sources_facebook <dbl+lbl>, news_sources_cnn <dbl+lbl>,
#   news_sources_msnbc <dbl+lbl>, news_sources_fox <dbl+lbl>, news_sources_network <dbl+lbl>,
#   news_sources_localtv <dbl+lbl>, news_sources_telemundo <dbl+lbl>, news_sources_npr <dbl+lbl>,
#   news_sources_amtalk <dbl+lbl>, news_sources_new_york_times <dbl+lbl>,
#   news_sources_local_newspaper <dbl+lbl>, news_sources_other <dbl+lbl>,
#   news_sources_other_TEXT <chr>, pres_approval <dbl+lbl>, vote_2016 <dbl+lbl>, …

Convert labels to factors

nationscape <- as_factor(nationscape)
nationscape
# A tibble: 4,138 × 234
   response_id start_date          right_track                  economy_better interest registration
   <chr>       <dttm>              <fct>                        <fct>          <fct>    <fct>       
 1 07700007    2021-01-12 10:52:22 Off on the wrong track       Better         Most of… Registered  
 2 07700008    2021-01-12 10:55:11 Generally headed in the rig… Better         <NA>     Registered  
 3 07700009    2021-01-12 10:50:00 Off on the wrong track       About the same Some of… Registered  
 4 07700010    2021-01-12 10:47:32 Off on the wrong track       About the same Most of… Registered  
 5 07700011    2021-01-12 10:52:57 Off on the wrong track       Worse          Only no… Don't know  
 6 07700013    2021-01-12 10:50:17 Not sure                     Better         Some of… Registered  
 7 07700014    2021-01-12 10:49:44 Not sure                     Worse          Some of… Registered  
 8 07700015    2021-01-12 10:51:21 Not sure                     Worse          Some of… Registered  
 9 07700016    2021-01-12 11:01:55 Generally headed in the rig… About the same Some of… Registered  
10 07700020    2021-01-12 10:58:21 Off on the wrong track       About the same Hardly … Registered  
# ℹ 4,128 more rows
# ℹ 228 more variables: news_sources_facebook <fct>, news_sources_cnn <fct>,
#   news_sources_msnbc <fct>, news_sources_fox <fct>, news_sources_network <fct>,
#   news_sources_localtv <fct>, news_sources_telemundo <fct>, news_sources_npr <fct>,
#   news_sources_amtalk <fct>, news_sources_new_york_times <fct>,
#   news_sources_local_newspaper <fct>, news_sources_other <fct>, news_sources_other_TEXT <chr>,
#   pres_approval <fct>, vote_2016 <fct>, vote_2016_other_text <chr>, vote_intention_retro <fct>, …
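
To make the `<dbl+lbl>` to `<fct>` conversion concrete, here is a small sketch on a hand-built labelled vector. The values and labels are made up to mimic the `right_track` column; they are not read from the Nationscape file.

```r
library(haven)

# toy labelled vector (hypothetical values) mimicking a <dbl+lbl> column
right_track <- labelled(
  c(2, 1, 999),
  labels = c(
    "Generally headed in the right direction" = 1,
    "Off on the wrong track" = 2,
    "Not sure" = 999
  )
)

# zap_labels() strips the labels and leaves the stored numeric codes
zap_labels(right_track)

# as_factor() swaps each code for its value label
as_factor(right_track)
```

Working with the factor version means plots and tables show the response text rather than the numeric codes.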

Estimating summary statistics

nationscape |>
  # generate frequency counts
  count(college, pid3) |>
  # drop respondents who were not asked these questions
  drop_na() |>
  # estimate percentages within pid3 groups
  mutate(pct = n / sum(n), .by = pid3) |>
  # pivot to cross-tab format
  select(-n) |>
  pivot_wider(names_from = pid3, values_from = pct)
# A tibble: 3 × 5
  college  Democrat Republican Independent `Something else`
  <fct>       <dbl>      <dbl>       <dbl>            <dbl>
1 Agree       0.704      0.349       0.487            0.445
2 Disagree    0.148      0.487       0.308            0.264
3 Not Sure    0.149      0.164       0.205            0.291

Accounting for survey weights

nationscape |> select(weight)
# A tibble: 4,138 × 1
   weight
    <dbl>
 1 2.17  
 2 3.61  
 3 5.01  
 4 0.969 
 5 0.173 
 6 0.153 
 7 0.924 
 8 0.144 
 9 0.0964
10 2.64  
# ℹ 4,128 more rows
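
To see why the weights matter, consider how a weighted proportion is computed: each respondent contributes their weight rather than a count of one. A minimal sketch with made-up numbers (the `toy` data frame and its values are hypothetical):

```r
library(tidyverse)

# toy data (hypothetical): two heavily weighted respondents agree,
# two lightly weighted respondents do not
toy <- tibble(
  agree = c(1, 1, 0, 0),
  weight = c(2, 2, 0.5, 0.5)
)

toy_props <- toy |>
  summarize(
    # every respondent counts equally
    unweighted = mean(agree),
    # every respondent counts in proportion to their weight
    weighted = sum(weight * agree) / sum(weight)
  )
toy_props
```

This reproduces only the point estimate. Standard errors depend on the survey design, which is why a dedicated survey package is the safer choice for real analyses.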

Accounting for survey weights with {srvyr}

library(srvyr)

nationscape |>
  as_survey_design(ids = 1, weights = weight)
Independent Sampling design (with replacement)
Called via srvyr
Sampling variables:
  - ids: `1` 
  - weights: weight 
Data variables: 
  - response_id (chr), start_date (dttm), right_track (fct), economy_better (fct), interest (fct),
    registration (fct), news_sources_facebook (fct), news_sources_cnn (fct), news_sources_msnbc
    (fct), news_sources_fox (fct), news_sources_network (fct), news_sources_localtv (fct),
    news_sources_telemundo (fct), news_sources_npr (fct), news_sources_amtalk (fct),
    news_sources_new_york_times (fct), news_sources_local_newspaper (fct), news_sources_other
    (fct), news_sources_other_TEXT (chr), pres_approval (fct), vote_2016 (fct),
    vote_2016_other_text (chr), vote_intention_retro (fct), vote_2020_retro (fct),
    vote_2020_retro_other_text (chr), who_won (fct), who_won_other_text (chr), primary_party_retro
    (fct), group_favorability_whites (fct), group_favorability_blacks (fct),
    group_favorability_latinos (fct), group_favorability_asians (fct),
    group_favorability_evangelicals (fct), group_favorability_socialists (fct),
    group_favorability_muslims (fct), group_favorability_labor_unions (fct),
    group_favorability_the_police (fct), group_favorability_undocumented (fct),
    group_favorability_lgbt (fct), group_favorability_republicans (fct),
    group_favorability_democrats (fct), group_favorability_white_men (fct), group_favorability_jews
    (fct), group_favorability_blm (fct), group_favorability_trump_s (fct),
    group_favorability_biden_s (fct), cand_favorability_trump (fct), cand_favorability_obama (fct),
    cand_favorability_biden (fct), cand_favorability_harris (fct), cand_favorability_pence (fct),
    rep_prim_vote (fct), rep_prim_vote_TEXT (chr), dem_prim_vote (fct), dem_prim_vote_TEXT (chr),
    house_intent_retro (fct), senate_intent_retro (fct), governor_intent_retro (fct),
    primary_sen_barrasso (fct), primary_sen_blackburn (fct), primary_sen_blunt (fct),
    primary_sen_boozman (fct), primary_sen_crapo (fct), primary_sen_cruz (fct), primary_sen_fischer
    (fct), primary_sen_grassley (fct), primary_sen_hoeven (fct), primary_sen_lankford (fct),
    primary_sen_lee (fct), primary_sen_moran (fct), primary_sen_murkowski (fct),
    primary_sen_neelykennedy (fct), primary_sen_paul (fct), primary_sen_portman (fct),
    primary_sen_rubio (fct), primary_sen_scott_tim (fct), primary_sen_shelby (fct),
    primary_sen_thune (fct), primary_sen_toomey (fct), primary_sen_wicker (fct), primary_sen_young
    (fct), primary_sen_braun (fct), primary_sen_cramer (fct), primary_sen_hawley (fct),
    primary_sen_romney (fct), primary_sen_scott_rick (fct), cand_truth_donald_trump (fct),
    cand_truth_joe_biden (fct), cand_facts_donald_trump (fct), cand_facts_joe_biden (fct),
    pence_president (fct), racial_attitudes_tryhard (fct), racial_attitudes_generations (fct),
    racial_attitudes_marry (fct), racial_attitudes_date (fct), gender_attitudes_maleboss (fct),
    gender_attitudes_logical (fct), gender_attitudes_opportunity (fct), gender_attitudes_complain
    (fct), discrimination_blacks (fct), discrimination_whites (fct), discrimination_muslims (fct),
    discrimination_christians (fct), discrimination_jews (fct), discrimination_women (fct),
    discrimination_men (fct), discrimination_asians (fct), discrimination_latinos (fct),
    sen_knowledge (fct), sc_knowledge (fct), pid3 (fct), pid7 (fct), pid7_legacy (fct),
    strength_democrat (fct), strength_republican (fct), lean_independent (fct), ideo5 (fct),
    employment (fct), employment_other_text (chr), work_location (fct), foreign_born (fct),
    language (fct), religion (fct), religion_other_text (chr), is_evangelical (fct),
    orientation_group (fct), in_union (fct), married (fct), extra_n_children (dbl),
    household_gun_owner (fct), wall (fct), cap_carbon (fct), guns_bg (fct), mctaxes (fct),
    estate_tax (fct), raise_upper_tax (fct), college (fct), abortion_any_time (fct), abortion_never
    (fct), abortion_conditions (fct), late_term_abortion (fct), abolish_priv_insurance (fct),
    abortion_insurance (fct), abortion_waiting (fct), china_tariffs (fct), criminal_immigration
    (fct), environment (fct), guaranteed_jobs (fct), green_new_deal (fct), gun_registry (fct),
    immigration_insurance (fct), immigration_separation (fct), immigration_system (fct),
    immigration_wire (fct), israel (fct), marijuana (fct), maternityleave (fct), medicare_for_all
    (fct), military_size (fct), minwage (fct), muslimban (fct), oil_and_gas (fct), reparations
    (fct), right_to_work (fct), saudi_arabia (fct), ten_commandments (fct), trade (fct),
    trans_military (fct), uctaxes2 (fct), vouchers (fct), gov_insurance (fct), public_option (fct),
    health_subsidies (fct), path_to_citizenship (fct), dreamers (fct), deportation (fct), ban_guns
    (fct), ban_assault_rifles (fct), limit_magazines (fct), impeach_trump (fct), egypt (fct),
    fc_smallgov (fct), fc_trad_val (fct), statements_protect_traditions (fct),
    statements_defense_burden (fct), statements_trade_effects (fct),
    statements_christianity_assault (fct), statements_gender_identity (fct),
    statements_american_loss (fct), statements_imm_assimilate (fct), statements_gun_rights (fct),
    statements_confront_china (fct), statements_foreign_interests (fct), elect_conf_conduct_retro
    (fct), elect_conf_vote_retro (fct), extra_vote_mail_retr (fct), extra_vacc_flu (dbl),
    extra_vacc_covid (dbl), extra_dem_violence (fct), extra_ind_violence (fct), extra_rep_violence
    (fct), extra_corona_concern (fct), extra_sick_you (fct), extra_sick_family (fct),
    extra_sick_work (fct), extra_sick_other (fct), extra_covid_worn_mask (fct),
    extra_covid_socialize_distance (fct), extra_covid_socialize_no_dist (fct), extra_trump_corona
    (fct), extra_gub_corona (fct), extra_covid_cancel_meet (fct), extra_covid_close_business (fct),
    extra_covid_close_schools (fct), extra_covid_work_home (fct), extra_covid_restrict_home (fct),
    extra_covid_testing (fct), extra_covid_require_mask (fct), capitol_approval (fct),
    capitol_trump_approv (fct), capitol_trump_more (fct), twitter_ban (fct), age (dbl), gender
    (fct), census_region (fct), hispanic (fct), race_ethnicity (fct), household_income (fct),
    education (fct), state (chr), congress_district (chr), weight (dbl), weight_2020 (dbl),
    weight_both (dbl)

Estimating weighted proportions

nationscape |>
  as_survey_design(ids = 1, weights = weight) |>
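  summarize(
    # with no argument, survey_mean() estimates the share of each pid3 group
    # within each level of college (proportions sum to 1 within college)
    pct = survey_mean(),
    # estimated weighted number of respondents in each cell
    total = survey_total(),
    .by = c(college, pid3)
  )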
# A tibble: 19 × 6
# Groups:   college [4]
   college  pid3                 pct    pct_se     total total_se
   <fct>    <fct>              <dbl>     <dbl>     <dbl>    <dbl>
 1 Agree    Democrat       0.499     0.0186    1053.      51.2   
 2 Agree    Republican     0.173     0.0135     364.      30.4   
 3 Agree    Independent    0.286     0.0176     602.      44.2   
 4 Agree    Something else 0.0422    0.00795     89.0     17.1   
 5 Agree    <NA>           0.000471  0.000471     0.993    0.993 
 6 Disagree Democrat       0.198     0.0186     246.      25.5   
 7 Disagree Republican     0.459     0.0232     569.      37.1   
 8 Disagree Independent    0.285     0.0212     353.      30.8   
 9 Disagree Something else 0.0581    0.0124      72.1     16.0   
10 Disagree <NA>           0.0000119 0.0000119    0.0147   0.0147
11 Not Sure Democrat       0.329     0.0300     247.      27.4   
12 Not Sure Republican     0.255     0.0270     191.      23.0   
13 Not Sure Independent    0.302     0.0289     226.      25.6   
14 Not Sure Something else 0.114     0.0207      85.3     16.5   
15 <NA>     Democrat       0.228     0.0926       9.08     3.55  
16 <NA>     Republican     0.334     0.133       13.3      6.35  
17 <NA>     Independent    0.277     0.143       11.0      7.09  
18 <NA>     Something else 0.143     0.115        5.70     5.06  
19 <NA>     <NA>           0.0172    0.0143       0.684    0.543 

Turn into a cross-tab

nationscape |>
  as_survey_design(ids = 1, weights = weight) |>
  summarize(
    pct = survey_mean(),
    total = survey_total(),
    .by = c(college, pid3)
  ) |>
  drop_na() |>
  select(college, pid3, pct) |>
  pivot_wider(names_from = pid3, values_from = pct)
# A tibble: 3 × 5
# Groups:   college [3]
  college  Democrat Republican Independent `Something else`
  <fct>       <dbl>      <dbl>       <dbl>            <dbl>
1 Agree       0.499      0.173       0.286           0.0422
2 Disagree    0.198      0.459       0.285           0.0581
3 Not Sure    0.329      0.255       0.302           0.114 

Estimating weighted means

nationscape |>
  as_survey_design(ids = 1, weights = weight) |>
  drop_na(pid3, extra_vacc_covid) |>
  summarize(
    pct = survey_mean(extra_vacc_covid),
    .by = pid3
  )
# A tibble: 4 × 3
  pid3             pct pct_se
  <fct>          <dbl>  <dbl>
1 Democrat        68.4   1.57
2 Republican      55.9   2.00
3 Independent     55.5   2.08
4 Something else  39.6   4.21
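
One way to report these estimates with their uncertainty is a point-range plot. The sketch below copies the estimates and standard errors from the output above and assumes a normal approximation (estimate plus or minus 1.96 standard errors) for a rough 95% interval. The object names `vacc_by_party` and `vacc_p` are made up for this example.

```r
library(tidyverse)

# estimates copied from the srvyr output above
vacc_by_party <- tribble(
  ~pid3,            ~pct, ~pct_se,
  "Democrat",       68.4, 1.57,
  "Republican",     55.9, 2.00,
  "Independent",    55.5, 2.08,
  "Something else", 39.6, 4.21
) |>
  mutate(
    # keep the party order used in the table
    pid3 = fct_inorder(pid3),
    # approximate 95% interval (normal approximation, an assumption)
    lower = pct - 1.96 * pct_se,
    upper = pct + 1.96 * pct_se
  )

# point = weighted mean, horizontal line = approximate 95% interval
vacc_p <- ggplot(
  data = vacc_by_party,
  mapping = aes(x = pct, y = fct_rev(pid3))
) +
  geom_pointrange(mapping = aes(xmin = lower, xmax = upper)) +
  labs(x = "Weighted mean of extra_vacc_covid", y = NULL)
vacc_p
```

In a real analysis, `survey_mean()` can return interval bounds directly via `vartype = "ci"`, which avoids computing them by hand.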

Working with top-line survey results

Top-line survey reports

  • Question-by-question results reported in tabular format
  • Broken down by key demographic groups
  • Frequently released by public opinion research firms, whereas respondent-level data is often proprietary

How to visualize top-line survey results

Before we can visualize top-line survey results, we need to import and wrangle the data.

This step is often complicated because the results are published as a PDF document that was not designed for programmatic use.

Before we reach for LLMs

You can use LLMs to extract tabular data, but should you?

Doing so requires careful prompting and expensive API calls (or a monthly plan), and the results may be unreliable.

Instead, just write code to do it in R!

Extracting table with {tabulapdf}

library(tabulapdf)

iran_war <- extract_tables(
  file = "data/econTabReport_CwWXhS2.pdf",
  pages = 7,
  col_names = FALSE
) |>
  nth(1)
iran_war
# A tibble: 22 × 11
   X1               X2    X3    X4           X5    X6    X7             X8    X9         X10   X11  
   <chr>            <chr> <chr> <chr>        <chr> <chr> <chr>          <lgl> <chr>      <lgl> <chr>
 1 <NA>             <NA>  <NA>  Sex          <NA>  Race  <NA>           NA    Age        NA    Educ…
 2 <NA>             Total Male  Female White <NA>  Black Hispanic 18-29 NA    30-44 45-… NA    No d…
 3 Support          33%   40%   26% 39%      <NA>  8%    25% 22%        NA    26% 39% 4… NA    35% …
 4 Oppose           56%   52%   59% 51%      <NA>  75%   62% 64%        NA    58% 52% 5… NA    51% …
 5 Strongly support 17%   24%   11%          21%   7%    14% 10%        NA    12% 19% 2… NA    18% …
 6 Somewhat support 16%   17%   15%          18%   1%    12% 12%        NA    14% 19% 1… NA    17% …
 7 Somewhat oppose  13%   11%   16%          13%   16%   14% 15%        NA    13% 14% 1… NA    14% …
 8 Strongly oppose  42%   41%   43%          38%   59%   48% 49%        NA    45% 38% 3… NA    37% …
 9 Not sure         11%   7%    15%          11%   17%   13% 14%        NA    16% 10% 6% NA    14% …
10 Totals           99%   100%  100% 101%    <NA>  100%  101% 100%      NA    100% 100%… NA    100%…
# ℹ 12 more rows

Clean and tidy the table

iran_war |>
  # keep relevant rows
  slice(16:20) |>
  # keep response and party ID columns
  select(X1, X11) |>
  # separate party ID column
  separate_wider_delim(
    cols = X11,
    delim = " ",
    names = c("Democrats", "Independents", "Republicans")
  ) |>
  # fix column names
  rename(
    response = X1
  )
# A tibble: 5 × 4
  response         Democrats Independents Republicans
  <chr>            <chr>     <chr>        <chr>      
1 Strongly support 3%        10%          40%        
2 Somewhat support 3%        13%          33%        
3 Somewhat oppose  14%       15%          11%        
4 Strongly oppose  75%       47%          5%         
5 Not sure         6%        15%          12%        
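
The plots in the next section use a long-format data frame named `iran_war_long` with columns `response`, `pid3`, and `pct`, but the step that creates it is not shown. The sketch below is one plausible way to build it, not necessarily the code used for the slides. It re-enters the cleaned values from the table above so that it runs on its own, and the name `iran_war_wide` and the factor level order are assumptions.

```r
library(tidyverse)

# values copied from the cleaned table above
iran_war_wide <- tribble(
  ~response,          ~Democrats, ~Independents, ~Republicans,
  "Strongly support", "3%",       "10%",         "40%",
  "Somewhat support", "3%",       "13%",         "33%",
  "Somewhat oppose",  "14%",      "15%",         "11%",
  "Strongly oppose",  "75%",      "47%",         "5%",
  "Not sure",         "6%",       "15%",         "12%"
)

iran_war_long <- iran_war_wide |>
  # one row per response x party combination
  pivot_longer(cols = -response, names_to = "pid3", values_to = "pct") |>
  mutate(
    # "40%" -> 0.40 so that label_percent() formats the axis correctly
    pct = parse_number(pct) / 100,
    # assumed level order: support to oppose, with "Not sure" as the midpoint
    response = factor(
      response,
      levels = c(
        "Strongly support",
        "Somewhat support",
        "Not sure",
        "Somewhat oppose",
        "Strongly oppose"
      )
    )
  )
iran_war_long
```

Placing "Not sure" in the middle of the level order lets a diverging palette assign it the neutral color.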

Visualizing Likert scale data

Likert scales

A psychometric scale used to scale responses in survey research, often used to measure attitudes or opinions.

The scale is symmetric (bilateral) around a neutral midpoint and typically has 5 or 7 response options:

  1. Strongly disagree
  2. Disagree
  3. Unsure
  4. Agree
  5. Strongly agree

Stacked bar charts

stack_bar_p <- ggplot(
  data = iran_war_long,
  mapping = aes(x = pct, y = pid3, fill = response)
) +
  geom_col() +
  scale_x_continuous(labels = label_percent(), position = "top") +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    labels = label_wrap(width = 8),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(legend.key.width = unit(1, "cm"))
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )
stack_bar_p

With a double axis

stack_bar_p +
  scale_x_continuous(
    labels = label_percent(),
    position = "top",
    sec.axis = sec_axis(
      transform = \(x) 1 - x,
      labels = label_percent(),
      name = NULL
    )
  )

Diverging, with extra neutrals

# main plot
div_bar_no_neutral_p <- iran_war_long |>
  # remove Not sure responses
  filter(response != "Not sure") |>
  # shift negative responses to the left of the origin
  mutate(
    pct = if_else(
      response %in% c("Strongly oppose", "Somewhat oppose"),
      -pct,
      pct
    )
  ) |>
  ggplot(mapping = aes(x = pct, y = pid3, fill = response)) +
  geom_col() +
  scale_x_continuous(
    breaks = seq(from = -.8, to = .6, by = .2),
    labels = label_percent(),
    position = "top"
  ) +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    labels = label_wrap(width = 8),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(legend.key.width = unit(1, "cm"))
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )

# separate plot for neutrals
div_bar_neutral_p <- iran_war_long |>
  filter(response == "Not sure") |>
  ggplot(mapping = aes(x = pct, y = pid3, fill = response)) +
  geom_col() +
  scale_x_continuous(
    breaks = c(0, .2),
    limits = c(NA, .2),
    labels = label_percent(),
    position = "top"
  ) +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    labels = label_wrap(width = 8),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(
        legend.key.width = unit(1, "cm"),
        legend.justification = "left"
      )
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )

# combine together with patchwork
div_bar_no_neutral_p +
  div_bar_neutral_p +
  # add shared title and caption
  plot_annotation(
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  # combine shared axes
  plot_layout(
    widths = c(4, 1),
    axes = "collect"
  )

Diverging, integrated neutrals

# split the neutral responses in half and plot them on either side of the origin
div_int_neutral_p <- iran_war_long |>
  # neutral will be split into two equal halves
  mutate(pct_plot = if_else(response == "Not sure", pct / 2, pct)) |>
  # duplicate neutral rows so one half can go left and one right
  uncount(
    weights = if_else(response == "Not sure", 2L, 1L),
    .id = "neutral_half"
  ) |>
  # invert oppose responses and one half of neutrals to negative values
  mutate(
    pct_plot = case_when(
      response %in% c("Strongly oppose", "Somewhat oppose") ~ -pct_plot,
      response == "Not sure" & neutral_half == 1L ~ -pct_plot,
      .default = pct_plot
    ),
    # convert response to character vector
    # factors with duplicated levels can cause issues with plotting
    response = as.character(response)
  ) |>
  ggplot(mapping = aes(x = pct_plot, y = pid3, fill = response)) +
  # reverse the stacking order since response is no longer a factor
  geom_col(position = position_stack(reverse = TRUE)) +
  geom_vline(xintercept = 0, linewidth = 0.4, color = "gray40") +
  scale_x_continuous(
    breaks = seq(from = -.8, to = .6, by = .2),
    labels = label_percent(),
    position = "top"
  ) +
  # manually generate the diverging color scale since response is no longer a factor
  scale_fill_manual(
    labels = label_wrap(width = 8),
    # fix the order to match the other plots
    breaks = c(
      "Strongly support",
      "Somewhat support",
      "Not sure",
      "Somewhat oppose",
      "Strongly oppose"
    ),
    # generate palette manually as a character vector
    values = diverging_hcl(palette = "Blue-Red", n = 5),
    guide = guide_legend(
      reverse = TRUE,
      theme = theme(legend.key.width = unit(1, "cm"))
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm")
  )
div_int_neutral_p

Split bars

split_bars_p <- ggplot(
  data = iran_war_long,
  mapping = aes(x = pct, y = pid3, fill = response)
) +
  geom_col() +
  scale_x_continuous(
    breaks = seq(from = 0, to = 1, by = 0.2),
    labels = label_percent(),
    position = "top"
  ) +
  scale_fill_discrete_diverging(
    palette = "Blue-Red",
    guide = "none"
  ) +
  facet_wrap(
    facets = vars(response |> fct_rev()),
    nrow = 1,
    space = "free_x",
    scales = "free_x",
    labeller = label_wrap_gen(width = 15)
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = NULL,
    title = "Do you support or oppose the war with Iran?",
    caption = "Source: YouGov (March 13-16, 2026)"
  ) +
  theme(
    panel.grid = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.length.x = unit(0.25, "cm"),
    # shrink the text size for the facet labels to fit
    strip.text = element_text(
      size = rel(0.7),
      margin = margin(t = 1, r = 0, b = 1, l = 0, unit = "mm")
    )
  )
split_bars_p

Application exercise

ae-15

Compare the advantages of different bar chart designs for reporting Likert scale data

Stacked bar charts

Diverging, with extra neutrals

Diverging, integrated neutrals

Split bars

Wrap up

Recap

  • Survey datasets provide rich information about public opinion, but can be difficult to work with
  • Survey weights are necessary to account for sampling design and nonresponse bias, and can be used to estimate population-level summary statistics
  • Top-line survey reports can be imported and wrangled with code, but may require some manual effort
  • Choose bar chart designs carefully when reporting Likert scale data, and be aware of the tradeoffs of different designs

Acknowledgments