Tables

Lecture 25

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2024

April 30, 2024

Announcements

Announcements

  • Homework 07
  • Project 2
  • Class on Thursday

Goals

  • Identify when tables are appropriate methods for communication
  • Compare and contrast presenting similar data through tables and figures
  • Introduce the gt package for constructing reproducible tables
  • Implement sparkline plots within a table

Data in tables

Tables vs. plots

Tables:

  • To look up or compare individual values
  • To display precise values
  • To include detail and summary values
  • To display quantitative values including more than one unit of measure

Plots:

  • To reveal relationships among whole sets of values
  • To display a message that is contained in the shape of the values (e.g., patterns, trends, exceptions)

PhD degrees in the United States

phd_degrees_raw <- read_csv("data/phd_by_field.csv")
phd_degrees_raw
# A tibble: 3,370 × 5
   broad_field   major_field                                 field   year n_phds
   <chr>         <chr>                                       <chr>  <dbl>  <dbl>
 1 Life sciences Agricultural sciences and natural resources Agric…  2008    111
 2 Life sciences Agricultural sciences and natural resources Agric…  2008     28
 3 Life sciences Agricultural sciences and natural resources Agric…  2008      3
 4 Life sciences Agricultural sciences and natural resources Agron…  2008     68
 5 Life sciences Agricultural sciences and natural resources Anima…  2008     41
 6 Life sciences Agricultural sciences and natural resources Anima…  2008     18
 7 Life sciences Agricultural sciences and natural resources Anima…  2008     77
 8 Life sciences Agricultural sciences and natural resources Envir…  2008    182
 9 Life sciences Agricultural sciences and natural resources Fishi…  2008     52
10 Life sciences Agricultural sciences and natural resources Food …  2008     96
# ℹ 3,360 more rows

Aggregate by field and year

phd_degrees <- phd_degrees_raw |>
  summarize(n_phds = sum(n_phds, na.rm = TRUE), .by = c(broad_field, year)) |>
  mutate(pct_phds = n_phds / sum(n_phds), .by = c(year)) |>
  rename(field = broad_field, count = n_phds, perc = pct_phds)
phd_degrees
# A tibble: 70 × 4
   field                              year count   perc
   <chr>                             <dbl> <dbl>  <dbl>
 1 Life sciences                      2008 18732 0.390 
 2 Mathematics and computer sciences  2008  3187 0.0664
 3 Psychology and social sciences     2008 10684 0.222 
 4 Engineering                        2008  1364 0.0284
 5 Education                          2008  6561 0.137 
 6 Humanities and arts                2008  4736 0.0986
 7 Other                              2008  2762 0.0575
 8 Life sciences                      2009 19332 0.393 
 9 Mathematics and computer sciences  2009  3163 0.0644
10 Psychology and social sciences     2009 11074 0.225 
# ℹ 60 more rows

PhDs awarded in 2017

# A tibble: 7 × 2
  field                               perc
  <chr>                              <dbl>
1 Life sciences                     0.408 
2 Mathematics and computer sciences 0.0710
3 Psychology and social sciences    0.236 
4 Engineering                       0.0412
5 Education                         0.0891
6 Humanities and arts               0.0978
7 Other                             0.0575

PhDs awarded in 2017

# A tibble: 7 × 2
  field                               perc
  <chr>                              <dbl>
1 Life sciences                     0.408 
2 Psychology and social sciences    0.236 
3 Humanities and arts               0.0978
4 Education                         0.0891
5 Mathematics and computer sciences 0.0710
6 Other                             0.0575
7 Engineering                       0.0412

PhDs awarded in 2017

Field Percentage
Life sciences 40.8%
Psychology and social sciences 23.6%
Humanities and arts 9.8%
Education 8.9%
Mathematics and computer sciences 7.1%
Other 5.8%
Engineering 4.1%

In the next few slides:

Degrees awarded in 2017

How should this information be displayed? And why?

In a table?

Popular Bachelor's degrees over the years
Year Education Engineering Humanities and arts Life sciences Mathematics and computer sciences Other Psychology and social sciences
2008 13.7% 2.8% 9.9% 39.0% 6.6% 5.8% 22.2%
2009 13.3% 2.7% 10.0% 39.3% 6.4% 5.7% 22.5%
2010 11.1% 2.9% 10.5% 39.9% 6.8% 5.7% 23.0%
2011 9.6% 3.1% 10.8% 40.5% 6.7% 5.5% 23.7%
2012 9.5% 3.5% 11.0% 40.1% 6.9% 5.4% 23.7%
2013 9.4% 3.9% 10.9% 39.9% 7.0% 5.8% 23.2%
2014 9.0% 4.1% 10.4% 40.5% 7.2% 5.7% 23.1%
2015 9.4% 4.0% 10.3% 40.0% 7.1% 5.6% 23.6%
2016 9.4% 3.9% 10.0% 40.7% 7.2% 5.4% 23.4%
2017 8.9% 4.1% 9.8% 40.8% 7.1% 5.8% 23.6%

Or in a plot?

Tables, the making of

Tables with gt

We will use the gt (Grammar of Tables) package to create tables in R.

The gt philosophy: we can construct a wide variety of useful tables with a cohesive set of table parts.

Source: gt.rstudio.com

Tables with gt

Generating a gt table

Start with the data frame

phd_degrees
# A tibble: 70 × 4
   field               year count   perc
   <chr>              <dbl> <dbl>  <dbl>
 1 Life sciences       2008 18732 0.390 
 2 Mathematics and c…  2008  3187 0.0664
 3 Psychology and so…  2008 10684 0.222 
 4 Engineering         2008  1364 0.0284
 5 Education           2008  6561 0.137 
 6 Humanities and ar…  2008  4736 0.0986
 7 Other               2008  2762 0.0575
 8 Life sciences       2009 19332 0.393 
 9 Mathematics and c…  2009  3163 0.0644
10 Psychology and so…  2009 11074 0.225 
# ℹ 60 more rows

Generating a gt table

Filter to keep desired rows

phd_degrees |>
  filter(year == 2017)
# A tibble: 7 × 4
  field                year count   perc
  <chr>               <dbl> <dbl>  <dbl>
1 Life sciences        2017 22057 0.408 
2 Mathematics and co…  2017  3843 0.0710
3 Psychology and soc…  2017 12757 0.236 
4 Engineering          2017  2227 0.0412
5 Education            2017  4823 0.0891
6 Humanities and arts  2017  5290 0.0978
7 Other                2017  3113 0.0575

Generating a gt table

Select desired columns

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc)
# A tibble: 7 × 2
  field                             perc
  <chr>                            <dbl>
1 Life sciences                   0.408 
2 Mathematics and computer scien… 0.0710
3 Psychology and social sciences  0.236 
4 Engineering                     0.0412
5 Education                       0.0891
6 Humanities and arts             0.0978
7 Other                           0.0575

Generating a gt table

Arrange the row order

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc))
# A tibble: 7 × 2
  field                             perc
  <chr>                            <dbl>
1 Life sciences                   0.408 
2 Psychology and social sciences  0.236 
3 Humanities and arts             0.0978
4 Education                       0.0891
5 Mathematics and computer scien… 0.0710
6 Other                           0.0575
7 Engineering                     0.0412

Generating a gt table

Initialize the gt() object

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc)) |>
  gt()
field perc
Life sciences 0.40763260
Psychology and social sciences 0.23576049
Humanities and arts 0.09776381
Education 0.08913325
Mathematics and computer sciences 0.07102199
Other 0.05753096
Engineering 0.04115690

Generating a gt table

Decrease vertical padding

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc)) |>
  gt() |>
  opt_vertical_padding(scale = 0)
field perc
Life sciences 0.40763260
Psychology and social sciences 0.23576049
Humanities and arts 0.09776381
Education 0.08913325
Mathematics and computer sciences 0.07102199
Other 0.05753096
Engineering 0.04115690

Generating a gt table

Increase font size

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc)) |>
  gt() |>
  opt_vertical_padding(scale = 0) |>
  tab_style(
    style = cell_text(size = "large"),
    locations = cells_body(columns = everything())
  )
field perc
Life sciences 0.40763260
Psychology and social sciences 0.23576049
Humanities and arts 0.09776381
Education 0.08913325
Mathematics and computer sciences 0.07102199
Other 0.05753096
Engineering 0.04115690

Generating a gt table

Format percentages

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc)) |>
  gt() |>
  opt_vertical_padding(scale = 0) |>
  tab_style(
    style = cell_text(size = "large"),
    locations = cells_body(columns = everything())
  ) |>
  fmt_percent(
    columns = perc,
    decimals = 1
  )
field perc
Life sciences 40.8%
Psychology and social sciences 23.6%
Humanities and arts 9.8%
Education 8.9%
Mathematics and computer sciences 7.1%
Other 5.8%
Engineering 4.1%

Generating a gt table

Create human-readable column headers

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc)) |>
  gt() |>
  opt_vertical_padding(scale = 0) |>
  tab_style(
    style = cell_text(size = "large"),
    locations = cells_body(columns = everything())
  ) |>
  fmt_percent(
    columns = perc,
    decimals = 1
  ) |>
  cols_label(
    field = "Field",
    perc = "Percentage"
  )
Field Percentage
Life sciences 40.8%
Psychology and social sciences 23.6%
Humanities and arts 9.8%
Education 8.9%
Mathematics and computer sciences 7.1%
Other 5.8%
Engineering 4.1%

Generating a gt table

Add a title

phd_degrees |>
  filter(year == 2017) |>
  select(field, perc) |>
  arrange(desc(perc)) |>
  gt() |>
  opt_vertical_padding(scale = 0) |>
  tab_style(
    style = cell_text(size = "large"),
    locations = cells_body(columns = everything())
  ) |>
  fmt_percent(
    columns = perc,
    decimals = 1
  ) |>
  cols_label(
    field = "Field",
    perc = "Percentage"
  ) |>
  tab_header(
    title = "PhD degrees awarded in 2017"
  )
PhD degrees awarded in 2017
Field Percentage
Life sciences 40.8%
Psychology and social sciences 23.6%
Humanities and arts 9.8%
Education 8.9%
Mathematics and computer sciences 7.1%
Other 5.8%
Engineering 4.1%

Plots in tables

Should these data be displayed in a table or a plot?

Popular PhD degrees over the years
Field 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Life sciences 39% 39% 40% 40% 40% 40% 41% 40% 41% 41%
Mathematics and computer sciences 7% 6% 7% 7% 7% 7% 7% 7% 7% 7%
Psychology and social sciences 22% 23% 23% 24% 24% 23% 23% 24% 23% 24%
Engineering 3% 3% 3% 3% 3% 4% 4% 4% 4% 4%
Education 14% 13% 11% 10% 9% 9% 9% 9% 9% 9%
Humanities and arts 10% 10% 11% 11% 11% 11% 10% 10% 10% 10%
Other 6% 6% 6% 6% 5% 6% 6% 6% 5% 6%

Add visualizations to your table

Example: Add sparklines to display trend alongside raw data


Popular Bachelor's degrees over the years
Field Trend 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Education 14% 13% 11% 10% 9% 9% 9% 9% 9% 9%
Engineering 3% 3% 3% 3% 3% 4% 4% 4% 4% 4%
Humanities and arts 10% 10% 11% 11% 11% 11% 10% 10% 10% 10%
Life sciences 39% 39% 40% 40% 40% 40% 41% 40% 41% 41%
Mathematics and computer sciences 7% 6% 7% 7% 7% 7% 7% 7% 7% 7%
Other 6% 6% 6% 6% 5% 6% 6% 6% 5% 6%
Psychology and social sciences 22% 23% 23% 24% 24% 23% 23% 24% 23% 24%

Custom function with ggplot()

plot_spark <- function(df) {
  ggplot(df, aes(x = year, y = perc)) +
    geom_line(size = 20) +
    theme_void()
}

phd_degrees_other |>
  filter(field == "Psychology and social sciences") |>
  plot_spark()

Prep the data

phd_degrees_other_plots <- phd_degrees_other |>
  nest(field_df = c(year, perc)) |>
  mutate(plot = map(.x = field_df, .f = plot_spark))
phd_degrees_other_plots
# A tibble: 7 × 3
  field                             field_df          plot  
  <chr>                             <list>            <list>
1 Life sciences                     <tibble [10 × 2]> <gg>  
2 Mathematics and computer sciences <tibble [10 × 2]> <gg>  
3 Psychology and social sciences    <tibble [10 × 2]> <gg>  
4 Engineering                       <tibble [10 × 2]> <gg>  
5 Education                         <tibble [10 × 2]> <gg>  
6 Humanities and arts               <tibble [10 × 2]> <gg>  
7 Other                             <tibble [10 × 2]> <gg>  

Widen the table

phd_degrees_other |>
  pivot_wider(names_from = year, values_from = perc) |>
  mutate(ggplot = NA, .after = field)
# A tibble: 7 × 12
  field    ggplot `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016`
  <chr>    <lgl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Life sc… NA     0.390  0.393  0.399  0.405  0.401  0.399  0.405  0.400  0.407 
2 Mathema… NA     0.0664 0.0644 0.0677 0.0674 0.0689 0.0699 0.0724 0.0705 0.0721
3 Psychol… NA     0.222  0.225  0.230  0.237  0.237  0.232  0.231  0.236  0.234 
4 Enginee… NA     0.0284 0.0271 0.0292 0.0314 0.0346 0.0385 0.0411 0.0402 0.0393
5 Educati… NA     0.137  0.133  0.111  0.0962 0.0946 0.0942 0.0897 0.0941 0.0938
6 Humanit… NA     0.0986 0.0998 0.105  0.108  0.110  0.109  0.104  0.103  0.0999
7 Other    NA     0.0575 0.0571 0.0573 0.0553 0.0538 0.0577 0.0570 0.0558 0.0536
# ℹ 1 more variable: `2017` <dbl>

Basic gt table

phd_degrees_other |>
  pivot_wider(names_from = year, values_from = perc) |>
  mutate(ggplot = NA, .after = field) |>
  gt()
field ggplot 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Life sciences NA 0.39003873 0.39339859 0.39902998 0.40466774 0.40148099 0.39893069 0.40538565 0.40009972 0.40745871 0.40763260
Mathematics and computer sciences NA 0.06635989 0.06436580 0.06767028 0.06742059 0.06885007 0.06988734 0.07237089 0.07050264 0.07210820 0.07102199
Psychology and social sciences NA 0.22246283 0.22535154 0.23049467 0.23738310 0.23711523 0.23160206 0.23090473 0.23597518 0.23384127 0.23576049
Engineering NA 0.02840128 0.02712603 0.02920551 0.03141350 0.03460228 0.03851442 0.04105764 0.04021864 0.03929860 0.04115690
Education NA 0.13661350 0.13284223 0.11100613 0.09619742 0.09459007 0.09421424 0.08974215 0.09413894 0.09379899 0.08913325
Humanities and arts NA 0.09861325 0.09979447 0.10529520 0.10765048 0.10951809 0.10912736 0.10351548 0.10329800 0.09988699 0.09776381
Other NA 0.05751052 0.05712134 0.05729823 0.05526717 0.05384328 0.05772389 0.05702346 0.05576689 0.05360723 0.05753096

Basic gt table with sparklines

phd_degrees_other |>
  pivot_wider(names_from = year, values_from = perc) |>
  mutate(ggplot = NA, .after = field) |>
  gt() |>
  text_transform(
    locations = cells_body(columns = ggplot),
    fn = \(x) {
      map(
        .x = phd_degrees_other_plots$plot, .f = ggplot_image,
        height = px(15), aspect_ratio = 4
      )
    }
  )
field ggplot 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Life sciences 0.39003873 0.39339859 0.39902998 0.40466774 0.40148099 0.39893069 0.40538565 0.40009972 0.40745871 0.40763260
Mathematics and computer sciences 0.06635989 0.06436580 0.06767028 0.06742059 0.06885007 0.06988734 0.07237089 0.07050264 0.07210820 0.07102199
Psychology and social sciences 0.22246283 0.22535154 0.23049467 0.23738310 0.23711523 0.23160206 0.23090473 0.23597518 0.23384127 0.23576049
Engineering 0.02840128 0.02712603 0.02920551 0.03141350 0.03460228 0.03851442 0.04105764 0.04021864 0.03929860 0.04115690
Education 0.13661350 0.13284223 0.11100613 0.09619742 0.09459007 0.09421424 0.08974215 0.09413894 0.09379899 0.08913325
Humanities and arts 0.09861325 0.09979447 0.10529520 0.10765048 0.10951809 0.10912736 0.10351548 0.10329800 0.09988699 0.09776381
Other 0.05751052 0.05712134 0.05729823 0.05526717 0.05384328 0.05772389 0.05702346 0.05576689 0.05360723 0.05753096

Final formatting

phd_degrees_other |>
  pivot_wider(names_from = year, values_from = perc) |>
  mutate(ggplot = NA, .after = field) |>
  gt() |>
  text_transform(
    locations = cells_body(columns = ggplot),
    fn = \(x) {
      map(
        .x = phd_degrees_other_plots$plot, .f = ggplot_image,
        height = px(15), aspect_ratio = 4
      )
    }
  ) |>
  cols_width(ggplot ~ px(100)) |>
  cols_align(align = "left", columns = field) |>
  fmt_percent(columns = where(is.numeric), decimals = 0) |>
  cols_label(field = "Field", ggplot = "Trend") |>
  tab_spanner(
    label = "Popular PhD degrees over the years",
    columns = everything()
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_spanners()
  )

Generate the table

Popular PhD degrees over the years
Field Trend 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Life sciences 39% 39% 40% 40% 40% 40% 41% 40% 41% 41%
Mathematics and computer sciences 7% 6% 7% 7% 7% 7% 7% 7% 7% 7%
Psychology and social sciences 22% 23% 23% 24% 24% 23% 23% 24% 23% 24%
Engineering 3% 3% 3% 3% 3% 4% 4% 4% 4% 4%
Education 14% 13% 11% 10% 9% 9% 9% 9% 9% 9%
Humanities and arts 10% 10% 11% 11% 11% 11% 10% 10% 10% 10%
Other 6% 6% 6% 6% 5% 6% 6% 6% 5% 6%

Adding color text

Popular Bachelor's degrees over the years
Field Trend 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Education 14% 13% 11% 10% 9% 9% 9% 9% 9% 9%
Engineering 3% 3% 3% 3% 3% 4% 4% 4% 4% 4%
Humanities and arts 10% 10% 11% 11% 11% 11% 10% 10% 10% 10%
Life sciences 39% 39% 40% 40% 40% 40% 41% 40% 41% 41%
Mathematics and computer sciences 7% 6% 7% 7% 7% 7% 7% 7% 7% 7%
Other 6% 6% 6% 6% 5% 6% 6% 6% 5% 6%
Psychology and social sciences 22% 23% 23% 24% 24% 23% 23% 24% 23% 24%
plot_spark_color <- function(df) {
  ggplot(df, aes(x = year, y = perc, color = line_color)) +
    geom_line(size = 20) +
    theme_void() +
    scale_color_identity()
}

phd_degrees_other_plots_color <- phd_degrees_other |>
  mutate(line_color = case_when(
    field == "Education" ~ "#9D6C06",
    field == "Engineering" ~ "#077DAA",
    field == "Humanities and arts" ~ "#026D4E",
    field == "Life sciences" ~ "#A39A09",
    field == "Mathematics and computer sciences" ~ "#044F7E",
    field == "Other" ~ "#954000",
    field == "Psychology and social sciences" ~ "#984B77"
  )) |>
  nest(field_df = c(year, perc, line_color)) |>
  mutate(plot = map(field_df, plot_spark_color)) |>
  arrange(field)

phd_degrees_other |>
  pivot_wider(names_from = year, values_from = perc) |>
  arrange(field) |>
  mutate(ggplot = NA, .after = field) |>
  gt() |>
  text_transform(
    locations = cells_body(columns = ggplot),
    fn = \(x) {
      map(phd_degrees_other_plots_color$plot, ggplot_image, height = px(15), aspect_ratio = 4)
    }
  ) |>
  cols_width(ggplot ~ px(100)) |>
  cols_align(align = "left", columns = field) |>
  fmt_percent(columns = where(is.numeric), decimals = 0) |>
  tab_style(style = cell_text(color = "#9D6C06"), locations = cells_body(rows = 1, columns = field)) |>
  tab_style(style = cell_text(color = "#077DAA"), locations = cells_body(rows = 2, columns = field)) |>
  tab_style(style = cell_text(color = "#026D4E"), locations = cells_body(rows = 3, columns = field)) |>
  tab_style(style = cell_text(color = "#A39A09"), locations = cells_body(rows = 4, columns = field)) |>
  tab_style(style = cell_text(color = "#044F7E"), locations = cells_body(rows = 5, columns = field)) |>
  tab_style(style = cell_text(color = "#954000"), locations = cells_body(rows = 6, columns = field)) |>
  tab_style(style = cell_text(color = "#984B77"), locations = cells_body(rows = 7, columns = field)) |>
  cols_label(field = "Field", ggplot = "Trend") |>
  tab_spanner(label = "Popular Bachelor's degrees over the years", columns = everything()) |>
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_spanners())

10 guidelines for better tables

10 guidelines for better tables

  1. Offset the heads from the body
  2. Use subtle dividers rather than heavy gridlines
  3. Right-align numbers and heads
  4. Left-align text and heads
  5. Select the appropriate level of precision
  6. Guide your reader with space between rows and columns
  7. Remove unit repetition
  8. Highlight outliers
  9. Group similar data and increase white space
  10. Add visualizations when appropriate

Table resources

Extensions to gt

Other packages

  • knitr::kable(): “Cheapest” pretty tables in Quarto
  • Interactivity: useful with Shiny applications!

Wrap-up

Wrap-up

  • Tables are used to communicate values precisely and with substantial detail
  • gt is a powerful package for creating tables in R
  • Structure the data frame first, then use gt() and appropriate functions to format the table

Acknowledgements