Visualizing spatial data II

Lecture 18

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2025

March 27, 2025

Announcements

Announcements

  • Homework 05

Visualization critique

Smoking and excessive drinking in the United States

  • What does the map tell you?
  • How effective is the bivariate color palette?

Structuring map data in R

Map data file formats

  • Vector files
    • Raster images
    • Numeric data
  • Popular formats
    • Shapefile
    • GeoJSON

Shapefile

  • Encodes points, lines, and polygons
  • Collection of files
    • .shp - geographic coordinates
    • .dbf - data associated with the geographic features
    • .prj - projection of the coordinates in the shapefile
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.dbf
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.prj
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.shp
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.shx

GeoJSON

  • Uses JavaScript Object Notation (JSON) file format

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [125.6, 10.1]
      },
      "properties": {
        "name": "Dinagat Islands"
      }
    }
  • Plain text files

Simple Features for R

Three cute fuzzy monsters adding spatial geometries to an existing table of attributes using glue and tape, while one cuts out the spatial polygons. Title text reads 'sf: spatial data...simplified.' and a caption at the bottom reads 'sticky geometries: for people who love their maps and sanity.'

What is a feature?

  • Thing or an object in the real world
  • Sets of features
  • Geometry
  • Attributes

Simple features

Simple features in R

  • Uses basic R data structures
  • Data frame with one row per feature
  • Lots of list columns

Importing shapefiles

nyc_shape <- st_read(dsn = "data/borough-boundaries")
Reading layer `geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222' from data source `/Users/soltoffbc/Projects/info-3312/course-site/slides/data/borough-boundaries' 
  using driver `ESRI Shapefile'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS84(DD)
nyc_shape
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS84(DD)
  boro_code     boro_name shape_area shape_leng                       geometry
1         5 Staten Island 1623631283   325924.0 MULTIPOLYGON (((-74.05051 4...
2         2         Bronx 1187189499   463277.2 MULTIPOLYGON (((-73.89681 4...
3         1     Manhattan  636605816   359103.2 MULTIPOLYGON (((-74.01093 4...
4         3      Brooklyn 1934169229   728478.1 MULTIPOLYGON (((-73.86327 4...
5         4        Queens 3041397430   888238.6 MULTIPOLYGON (((-73.82645 4...

Importing GeoJSON files

nyc_json <- st_read(dsn = "data/borough-boundaries.geojson")
Reading layer `borough-boundaries' from data source 
  `/Users/soltoffbc/Projects/info-3312/course-site/slides/data/borough-boundaries.geojson' 
  using driver `GeoJSON'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
nyc_json
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
  boro_code     boro_name    shape_area    shape_leng
1         5 Staten Island 1623631283.36 325924.002076
2         2         Bronx  1187189499.3 463277.240478
3         1     Manhattan 636605816.437 359103.151368
4         3      Brooklyn 1934169228.83 728478.125489
5         4        Queens 3041397430.33 888238.562635
                        geometry
1 MULTIPOLYGON (((-74.05051 4...
2 MULTIPOLYGON (((-73.89681 4...
3 MULTIPOLYGON (((-74.01093 4...
4 MULTIPOLYGON (((-73.86327 4...
5 MULTIPOLYGON (((-73.82645 4...

Visualizing {sf} objects

Drawing maps with {sf} objects

usa <- states(cb = TRUE)
usa
Simple feature collection with 56 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
First 10 features:
   STATEFP  STATENS    AFFGEOID GEOID STUSPS           NAME LSAD        ALAND
1       56 01779807 0400000US56    56     WY        Wyoming   00 2.514587e+11
2       02 01785533 0400000US02    02     AK         Alaska   00 1.478943e+12
3       24 01714934 0400000US24    24     MD       Maryland   00 2.515199e+10
4       60 01802701 0400000US60    60     AS American Samoa   00 1.977591e+08
5       05 00068085 0400000US05    05     AR       Arkansas   00 1.346608e+11
6       38 01779797 0400000US38    38     ND   North Dakota   00 1.786943e+11
7       10 01779781 0400000US10    10     DE       Delaware   00 5.046732e+09
8       66 01802705 0400000US66    66     GU           Guam   00 5.435558e+08
9       35 00897535 0400000US35    35     NM     New Mexico   00 3.141986e+11
10      49 01455989 0400000US49    49     UT           Utah   00 2.133551e+11
         AWATER                       geometry
1    1867503716 MULTIPOLYGON (((-111.0546 4...
2  245378425142 MULTIPOLYGON (((179.4825 51...
3    6979074857 MULTIPOLYGON (((-76.05015 3...
4    1307243751 MULTIPOLYGON (((-168.1458 -...
5    3121950081 MULTIPOLYGON (((-94.61792 3...
6    4414779956 MULTIPOLYGON (((-104.0487 4...
7    1399179670 MULTIPOLYGON (((-75.56555 3...
8     934337453 MULTIPOLYGON (((144.6454 13...
9     726482113 MULTIPOLYGON (((-109.0502 3...
10   6529973239 MULTIPOLYGON (((-114.053 37...

USA boundaries

ggplot(data = usa) +
  geom_sf()

Plot a subset of a map

usa_48 <- usa |>
  filter(NAME %in% state.name) |>
  filter(NAME != "Alaska", NAME != "Hawaii")

ggplot(data = usa_48) +
  geom_sf()

Just another ggplot()

ggplot(data = usa_48) +
  geom_sf(
    fill = "palegreen",
    color = "black"
  )

shift_geometry()

states_sf <- states(cb = TRUE) |>
  filter(NAME %in% state.name) |>
  shift_geometry()

ggplot(data = states_sf) +
  geom_sf()

Identifying points on a map

Points

crimes <- read_csv(file = "data/nyc-crimes.csv")
crimes_homicide <- filter(.data = crimes, ofns_desc == "MURDER & NON-NEGL. MANSLAUGHTER")
crimes_homicide
# A tibble: 269 × 7
   cmplnt_num    boro_nm   cmplnt_fr_dt        law_cat_cd ofns_desc     latitude
   <chr>         <chr>     <dttm>              <chr>      <chr>            <dbl>
 1 240954923H1   BROOKLYN  1977-12-20 05:00:00 FELONY     MURDER & NON…     40.7
 2 245958045H1   BROOKLYN  2001-08-13 04:00:00 FELONY     MURDER & NON…     40.7
 3 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON…     40.8
 4 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON…     40.8
 5 16631466H8909 BROOKLYN  2006-05-24 04:00:00 FELONY     MURDER & NON…     40.7
 6 246056367H1   QUEENS    2015-05-13 04:00:00 FELONY     MURDER & NON…     40.6
 7 243507594H1   MANHATTAN 2020-06-19 04:00:00 FELONY     MURDER & NON…     40.8
 8 243688124H1   BROOKLYN  2021-01-31 05:00:00 FELONY     MURDER & NON…     40.7
 9 240767513H1   BROOKLYN  2021-02-17 05:00:00 FELONY     MURDER & NON…     40.6
10 240767512H1   BROOKLYN  2021-05-24 04:00:00 FELONY     MURDER & NON…     40.6
# ℹ 259 more rows
# ℹ 1 more variable: longitude <dbl>

Points

ggplot(
  data = crimes_homicide,
  mapping = aes(
    x = longitude,
    y = latitude
  )
) +
  geom_point()

Points

ggplot(data = nyc_json) +
  geom_sf() +
  geom_point(
    data = crimes_homicide,
    mapping = aes(
      x = longitude,
      y = latitude
    ),
    shape = 1
  )

Converting to {sf} data frame

crimes_homicide_sf <- st_as_sf(x = crimes_homicide, coords = c("longitude", "latitude"))
st_crs(crimes_homicide_sf) <- 4326 # set the coordinate reference system
crimes_homicide_sf
Simple feature collection with 269 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -74.08578 ymin: 40.59087 xmax: -73.73331 ymax: 40.90316
Geodetic CRS:  WGS 84
# A tibble: 269 × 6
   cmplnt_num    boro_nm   cmplnt_fr_dt        law_cat_cd ofns_desc             
 * <chr>         <chr>     <dttm>              <chr>      <chr>                 
 1 240954923H1   BROOKLYN  1977-12-20 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 2 245958045H1   BROOKLYN  2001-08-13 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 3 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 4 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 5 16631466H8909 BROOKLYN  2006-05-24 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 6 246056367H1   QUEENS    2015-05-13 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 7 243507594H1   MANHATTAN 2020-06-19 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 8 243688124H1   BROOKLYN  2021-01-31 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 9 240767513H1   BROOKLYN  2021-02-17 05:00:00 FELONY     MURDER & NON-NEGL. MA…
10 240767512H1   BROOKLYN  2021-05-24 04:00:00 FELONY     MURDER & NON-NEGL. MA…
# ℹ 259 more rows
# ℹ 1 more variable: geometry <POINT [°]>

Plotting with two sf data frames

ggplot() +
  geom_sf(data = nyc_json) +
  geom_sf(
    data = crimes_homicide_sf,
    shape = 1
  )

Choropleths

Fill (choropleths)

usa_fb <- get_acs(
  geography = "state",
  variables = c(total = "B05002_001E", native = "B05002_002E",
                foreign = "B05002_013E"),
  year = 2023,
  output = "wide",
  geometry = TRUE
) |>
  select(GEOID, NAME, total, native, foreign) |>
  mutate(pct_foreign = foreign / total) |>
  filter(NAME %in% state.name) |>
  shift_geometry()
usa_fb
Simple feature collection with 50 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -3111747 ymin: -1697746 xmax: 2258200 ymax: 1565782
Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
First 10 features:
   GEOID         NAME    total   native  foreign pct_foreign
1     35   New Mexico  2114768  1919284   195484  0.09243756
2     46 South Dakota   899194   863976    35218  0.03916619
3     06   California 39242785 28763259 10479526  0.26704338
4     21     Kentucky  4510725  4311120   199605  0.04425120
5     01      Alabama  5054253  4862214   192039  0.03799553
6     13      Georgia 10822590  9650102  1172488  0.10833710
7     05     Arkansas  3032651  2874432   158219  0.05217185
8     42 Pennsylvania 12986518 12025271   961247  0.07401884
9     29     Missouri  6168181  5898279   269902  0.04375715
10    08     Colorado  5810774  5254183   556591  0.09578603
                         geometry
1  MULTIPOLYGON (((-1231344 -5...
2  MULTIPOLYGON (((-633765.6 8...
3  MULTIPOLYGON (((-2066923 -2...
4  MULTIPOLYGON (((584560 -886...
5  MULTIPOLYGON (((760323.7 -7...
6  MULTIPOLYGON (((1390722 -58...
7  MULTIPOLYGON (((122656.3 -1...
8  MULTIPOLYGON (((1287712 486...
9  MULTIPOLYGON (((19008.51 34...
10 MULTIPOLYGON (((-1123223 20...

Draw the map

ggplot(data = usa_fb) +
  geom_sf(mapping = aes(fill = pct_foreign))

Use better colors

library(colorspace)

ggplot(data = usa_fb) +
  geom_sf(mapping = aes(
    fill = pct_foreign
  )) +
  scale_fill_continuous_sequential(palette = "mako")

Spatial aggregation

Spatial aggregation

nyc_json |>
  st_join(y = crimes_homicide_sf)
Simple feature collection with 269 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
First 10 features:
    boro_code     boro_name    shape_area    shape_leng  cmplnt_num
1           5 Staten Island 1623631283.36 325924.002076 239347883H1
1.1         5 Staten Island 1623631283.36 325924.002076 244818447H1
1.2         5 Staten Island 1623631283.36 325924.002076 246985876H1
2           2         Bronx  1187189499.3 463277.240478 242029394H1
2.1         2         Bronx  1187189499.3 463277.240478 242029394H1
2.2         2         Bronx  1187189499.3 463277.240478 243573864H1
2.3         2         Bronx  1187189499.3 463277.240478 244256173H1
2.4         2         Bronx  1187189499.3 463277.240478 240776535H1
2.5         2         Bronx  1187189499.3 463277.240478 240776535H1
2.6         2         Bronx  1187189499.3 463277.240478 240776535H1
          boro_nm        cmplnt_fr_dt law_cat_cd
1   STATEN ISLAND 2022-01-18 05:00:00     FELONY
1.1 STATEN ISLAND 2022-05-09 04:00:00     FELONY
1.2 STATEN ISLAND 2022-06-21 04:00:00     FELONY
2           BRONX 2021-06-01 04:00:00     FELONY
2.1         BRONX 2021-06-01 04:00:00     FELONY
2.2         BRONX 2021-10-24 04:00:00     FELONY
2.3         BRONX 2021-12-17 05:00:00     FELONY
2.4         BRONX 2022-01-01 05:00:00     FELONY
2.5         BRONX 2022-01-01 05:00:00     FELONY
2.6         BRONX 2022-01-01 05:00:00     FELONY
                          ofns_desc                       geometry
1   MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-74.05051 4...
1.1 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-74.05051 4...
1.2 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-74.05051 4...
2   MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.1 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.2 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.3 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.4 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.5 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.6 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...

Spatial aggregation

nyc_json |>
  st_join(y = crimes_homicide_sf) |>
  count(boro_name)
Simple feature collection with 5 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
      boro_name   n                       geometry
1         Bronx 101 MULTIPOLYGON (((-73.89679 4...
2      Brooklyn  75 MULTIPOLYGON (((-73.86318 4...
3     Manhattan  46 MULTIPOLYGON (((-74.0086 40...
4        Queens  44 MULTIPOLYGON (((-73.82646 4...
5 Staten Island   3 MULTIPOLYGON (((-74.05054 4...

Spatial aggregation

nyc_json |>
  st_join(y = crimes_homicide_sf) |>
  count(boro_name) |>
  ggplot() +
  geom_sf(mapping = aes(fill = n))

Which is better for comparisons?

Application exercise

ae-17

Instructions

  • Go to the course GitHub org and find your ae-17 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of the day

Bin continuous data to discrete intervals

Bin data to discrete intervals

  • Continuous vs. discrete variables for color
  • Collapse to a discrete variable

ggplot2::binned_scale()

usa_fb |>
  ggplot() +
  geom_sf(aes(fill = pct_foreign)) +
  scale_fill_binned_sequential(palette = "inferno")

ggplot2::binned_scale() with quartiles

usa_fb |>
  ggplot() +
  geom_sf(aes(fill = pct_foreign)) +
  scale_fill_binned_sequential(
    palette = "inferno",
    n.breaks = 3
  )

ggplot2::binned_scale() with quartiles

usa_fb |>
  ggplot() +
  geom_sf(aes(fill = pct_foreign)) +
  scale_fill_binned_sequential(
    palette = "inferno",
    n.breaks = 3,
    nice.breaks = FALSE
  )

Map projection

Map projection

Map projection

Map projection

  • Coordinate reference system
  • proj4string
  • Well Known Text 2 (WKT2) string

https://spatialreference.org/ref/epsg/

US National Atlas Equal Area

proj4string

+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs

WKT2

PROJCS["US National Atlas Equal Area",
    GEOGCS["Unspecified datum based upon the Clarke 1866 Authalic Sphere",
        DATUM["Not_specified_based_on_Clarke_1866_Authalic_Sphere",
            SPHEROID["Clarke 1866 Authalic Sphere",6370997,0,
                AUTHORITY["EPSG","7052"]],
            AUTHORITY["EPSG","6052"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.01745329251994328,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4052"]],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",45],
    PARAMETER["longitude_of_center",-100],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","2163"],
    AXIS["X",EAST],
    AXIS["Y",NORTH]]

EPSG

Mercator projection

Projection using standard lines

South on the top

Application exercise

ae-17

Work through part 2

  • Modify the map to use a binned color scale
  • Implement a projection method appropriate for the state of New York

Wrap-up

Recap

  • Structure geographic features using vector data and simple features
  • Use geom_sf() to visualize simple features data in {ggplot2}
  • Consider spatially aggregating your observations for more effective communication
  • Deliberately select color palettes and projection methods

How to make the bivariate color palette

# import health data
# source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-County-Data-GIS-Friendly-Format-2024-releas/i46a-9kgh/about_data
county_health <- read_csv(file = "data/PLACES__County_Data__GIS_Friendly_Format___2024_release_20250326.csv") |>
  select(GEOID = CountyFIPS, drinking = BINGE_AdjPrev, smoking = CSMOKING_AdjPrev)

# get county/state boundaries
county_sf <- counties(cb = TRUE) |>
  # only keep counties in US states
  semi_join(
    y = fips_codes |>
      filter(state_name %in% state.name),
    by = join_by(
      STATEFP == state_code,
      COUNTYFP == county_code
    )
  ) |>
  # shift Alaska and Hawaii
  shift_geometry()

state_sf <- states(cb = TRUE) |>
  filter(NAME %in% state.name) |>
  # shift Alaska and Hawaii
  shift_geometry()

How to make the bivariate color palette

# create map
smoke_drink_map <- left_join(x = county_sf, y = county_health) |>
  # use biscales to create bivariate color palette
  bi_class(
    x = smoking,
    y = drinking,
    style = "quantile"
  ) |>
  # draw map
  ggplot() +
  geom_sf(mapping = aes(fill = bi_class), linewidth = 0.1) +
  # overlay state boundaries for context
  geom_sf(data = state_sf, fill = NA, color = "white") +
  # use biscale color palette
  bi_scale_fill(pal = "PinkGrn", guide = "none") +
  # appropriate labels and theme
  labs(
    title = "Smoking and excessive drinking in the United States",
    caption = "Source: Behavioral Risk Factor Surveillance System (2022)"
  ) +
  theme_map(base_family = "Atkinson Hyperlegible", base_size = 14)

# create legend
smoke_drink_legend <- bi_legend(
  pal = "PinkGrn",
  dim = 3,
  xlab = "Higher % Smokers",
  ylab = "Higher % Binge Drinkers",
  size = 9,
  pad_width = 1.5, pad_color = "white",
  rotate_pal = FALSE,
  base_family = "Atkinson Hyperlegible"
) +
  # fix justification of text labels
  theme(
    axis.title.x = element_text(hjust = 0),
    axis.title.y = element_text(hjust = 0)
  )

# combine map + legend with patchwork
smoke_drink_map +
  inset_element(smoke_drink_legend,
    left = 0, bottom = 0.3, right = 0.20, top = 0.575
  )

The final first lost tooth