Visualizing spatial data II

Lecture 14

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2024

March 14, 2024

Announcements

Announcements

  • Project 02

Visualization critique

Spring is just around the corner – or is it already here?

  • What is the story?
  • How do the aesthetic design choices support the story?

Structuring map data in R

Map data file formats

  • Vector files
    • Raster images
    • Numeric data
  • Popular formats
    • Shapefile
    • GeoJSON

Shapefile

  • Encodes points, lines, and polygons
  • Collection of files
    • .shp - geographic coordinates
    • .dbf - data associated with the geographic features
    • .prj - projection of the coordinates in the shapefile
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.dbf
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.prj
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.shp
-- geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222.shx

GeoJSON

  • Uses JavaScript Object Notation (JSON) file format

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [125.6, 10.1]
      },
      "properties": {
        "name": "Dinagat Islands"
      }
    }
  • Plain text files

Simple Features for R

Three cute fuzzy monsters adding spatial geometries to an existing table of attributes using glue and tape, while one cuts out the spatial polygons. Title text reads 'sf: spatial data...simplified.' and a caption at the bottom reads 'sticky geometries: for people who love their maps and sanity.'

What is a feature?

  • Thing or an object in the real world
  • Sets of features
  • Geometry
  • Attributes

Simple features

Simple features in R

  • Uses basic R data structures
  • Data frame with one row per feature
  • Lots of list columns

Importing shapefiles

nyc_shape <- st_read(dsn = "data/borough-boundaries")
Reading layer `geo_export_6fd95df5-1136-4829-8f2d-9cb5d1cc2222' from data source `/Users/soltoffbc/Projects/info-3312/course-site/slides/data/borough-boundaries' 
  using driver `ESRI Shapefile'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS84(DD)
nyc_shape
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS84(DD)
  boro_code     boro_name shape_area shape_leng                       geometry
1         5 Staten Island 1623631283   325924.0 MULTIPOLYGON (((-74.05051 4...
2         2         Bronx 1187189499   463277.2 MULTIPOLYGON (((-73.89681 4...
3         1     Manhattan  636605816   359103.2 MULTIPOLYGON (((-74.01093 4...
4         3      Brooklyn 1934169229   728478.1 MULTIPOLYGON (((-73.86327 4...
5         4        Queens 3041397430   888238.6 MULTIPOLYGON (((-73.82645 4...

Importing GeoJSON files

nyc_json <- st_read(dsn = "data/borough-boundaries.geojson")
Reading layer `borough-boundaries' from data source 
  `/Users/soltoffbc/Projects/info-3312/course-site/slides/data/borough-boundaries.geojson' 
  using driver `GeoJSON'
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
nyc_json
Simple feature collection with 5 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
  boro_code     boro_name    shape_area    shape_leng
1         5 Staten Island 1623631283.36 325924.002076
2         2         Bronx  1187189499.3 463277.240478
3         1     Manhattan 636605816.437 359103.151368
4         3      Brooklyn 1934169228.83 728478.125489
5         4        Queens 3041397430.33 888238.562635
                        geometry
1 MULTIPOLYGON (((-74.05051 4...
2 MULTIPOLYGON (((-73.89681 4...
3 MULTIPOLYGON (((-74.01093 4...
4 MULTIPOLYGON (((-73.86327 4...
5 MULTIPOLYGON (((-73.82645 4...

Visualizing sf objects

Drawing maps with sf objects

usa <- st_read(dsn = "data/cb_2020_us_state_5m")
Reading layer `cb_2020_us_state_5m' from data source 
  `/Users/soltoffbc/Projects/info-3312/course-site/slides/data/cb_2020_us_state_5m' 
  using driver `ESRI Shapefile'
Simple feature collection with 56 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
Geodetic CRS:  NAD83
usa
Simple feature collection with 56 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
Geodetic CRS:  NAD83
First 10 features:
   STATEFP  STATENS    AFFGEOID GEOID STUSPS          NAME LSAD        ALAND
1       55 01779806 0400000US55    55     WI     Wisconsin   00 140292246684
2       54 01779805 0400000US54    54     WV West Virginia   00  62266296765
3       16 01779783 0400000US16    16     ID         Idaho   00 214049923496
4       27 00662849 0400000US27    27     MN     Minnesota   00 206232157570
5       19 01779785 0400000US19    19     IA          Iowa   00 144659688848
6       10 01779781 0400000US10    10     DE      Delaware   00   5046731558
7       72 01779808 0400000US72    72     PR   Puerto Rico   00   8868948653
8       29 01779791 0400000US29    29     MO      Missouri   00 178052563675
9       50 01779802 0400000US50    50     VT       Vermont   00  23873081385
10      24 01714934 0400000US24    24     MD      Maryland   00  25151895765
        AWATER                       geometry
1  29343721650 MULTIPOLYGON (((-86.9562 45...
2    489206049 MULTIPOLYGON (((-82.643 38....
3   2391577745 MULTIPOLYGON (((-117.2427 4...
4  18949864226 MULTIPOLYGON (((-97.23921 4...
5   1085996889 MULTIPOLYGON (((-96.6397 42...
6   1399179670 MULTIPOLYGON (((-75.5708 39...
7   4922329963 MULTIPOLYGON (((-65.3375 18...
8   2487215790 MULTIPOLYGON (((-95.77355 4...
9   1030243281 MULTIPOLYGON (((-73.43774 4...
10  6979171386 MULTIPOLYGON (((-76.0494 37...

USA boundaries

ggplot(data = usa) +
  geom_sf()

Plot a subset of a map

usa_48 <- usa |>
  filter(NAME %in% state.name) |>
  filter(NAME != "Alaska", NAME != "Hawaii")

ggplot(data = usa_48) +
  geom_sf()

Just another ggplot()

ggplot(data = usa_48) +
  geom_sf(
    fill = "palegreen",
    color = "black"
  )

urbnmapr

library(urbnmapr)

states_sf <- get_urbn_map("states",
                          sf = TRUE)

ggplot(data = states_sf) +
  geom_sf()

Identifying points on a map

Points

crimes <- read_csv(file = "data/nyc-crimes.csv")
crimes_homicide <- filter(.data = crimes, ofns_desc == "MURDER & NON-NEGL. MANSLAUGHTER")
crimes_homicide
# A tibble: 269 × 7
   cmplnt_num    boro_nm   cmplnt_fr_dt        law_cat_cd ofns_desc     latitude
   <chr>         <chr>     <dttm>              <chr>      <chr>            <dbl>
 1 240954923H1   BROOKLYN  1977-12-20 05:00:00 FELONY     MURDER & NON…     40.7
 2 245958045H1   BROOKLYN  2001-08-13 04:00:00 FELONY     MURDER & NON…     40.7
 3 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON…     40.8
 4 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON…     40.8
 5 16631466H8909 BROOKLYN  2006-05-24 04:00:00 FELONY     MURDER & NON…     40.7
 6 246056367H1   QUEENS    2015-05-13 04:00:00 FELONY     MURDER & NON…     40.6
 7 243507594H1   MANHATTAN 2020-06-19 04:00:00 FELONY     MURDER & NON…     40.8
 8 243688124H1   BROOKLYN  2021-01-31 05:00:00 FELONY     MURDER & NON…     40.7
 9 240767513H1   BROOKLYN  2021-02-17 05:00:00 FELONY     MURDER & NON…     40.6
10 240767512H1   BROOKLYN  2021-05-24 04:00:00 FELONY     MURDER & NON…     40.6
# ℹ 259 more rows
# ℹ 1 more variable: longitude <dbl>

Points

ggplot(
  data = crimes_homicide,
  mapping = aes(
    x = longitude,
    y = latitude
  )
) +
  geom_point()

Points

ggplot(data = nyc_json) +
  geom_sf() +
  geom_point(
    data = crimes_homicide,
    mapping = aes(
      x = longitude,
      y = latitude
    ),
    shape = 1
  )

Converting to sf data frame

crimes_homicide_sf <- st_as_sf(x = crimes_homicide, coords = c("longitude", "latitude"))
st_crs(crimes_homicide_sf) <- 4326 # set the coordinate reference system
crimes_homicide_sf
Simple feature collection with 269 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -74.08578 ymin: 40.59087 xmax: -73.73331 ymax: 40.90316
Geodetic CRS:  WGS 84
# A tibble: 269 × 6
   cmplnt_num    boro_nm   cmplnt_fr_dt        law_cat_cd ofns_desc             
 * <chr>         <chr>     <dttm>              <chr>      <chr>                 
 1 240954923H1   BROOKLYN  1977-12-20 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 2 245958045H1   BROOKLYN  2001-08-13 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 3 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 4 8101169H6113  MANHATTAN 2005-03-06 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 5 16631466H8909 BROOKLYN  2006-05-24 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 6 246056367H1   QUEENS    2015-05-13 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 7 243507594H1   MANHATTAN 2020-06-19 04:00:00 FELONY     MURDER & NON-NEGL. MA…
 8 243688124H1   BROOKLYN  2021-01-31 05:00:00 FELONY     MURDER & NON-NEGL. MA…
 9 240767513H1   BROOKLYN  2021-02-17 05:00:00 FELONY     MURDER & NON-NEGL. MA…
10 240767512H1   BROOKLYN  2021-05-24 04:00:00 FELONY     MURDER & NON-NEGL. MA…
# ℹ 259 more rows
# ℹ 1 more variable: geometry <POINT [°]>

Plotting with two sf data frames

ggplot() +
  geom_sf(data = nyc_json) +
  geom_sf(
    data = crimes_homicide_sf,
    shape = 1
  )

Choropleths

Fill (choropleths)

fb_state <- get_acs(
  geography = "state",
  variables = c(total = "B05002_001E", native = "B05002_002E",
                foreign = "B05002_013E"),
  year = 2022,
  output = "wide"
) |>
  select(GEOID, NAME, total, native, foreign) |>
  mutate(pct_foreign = foreign / total)
fb_state
# A tibble: 52 × 6
   GEOID NAME                    total   native  foreign pct_foreign
   <chr> <chr>                   <dbl>    <dbl>    <dbl>       <dbl>
 1 01    Alabama               5028092  4850885   177207      0.0352
 2 02    Alaska                 734821   676747    58074      0.0790
 3 04    Arizona               7172282  6236322   935960      0.130 
 4 05    Arkansas              3018669  2866888   151781      0.0503
 5 06    California           39356104 28913224 10442880      0.265 
 6 08    Colorado              5770790  5223188   547602      0.0949
 7 09    Connecticut           3611317  3068353   542964      0.150 
 8 10    Delaware               993635   896412    97223      0.0978
 9 11    District of Columbia   670587   580491    90096      0.134 
10 12    Florida              21634529 17060097  4574432      0.211 
# ℹ 42 more rows

Join the data

usa_fb <- left_join(x = states_sf, y = fb_state, by = join_by(state_fips == GEOID,
                                                              state_name == NAME))
usa_fb
Simple feature collection with 51 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -2600000 ymin: -2363000 xmax: 2516374 ymax: 732352.2
Projected CRS: NAD27 / US National Atlas Equal Area
First 10 features:
   state_fips state_abbv  state_name    total   native foreign pct_foreign
1          01         AL     Alabama  5028092  4850885  177207  0.03524339
2          04         AZ     Arizona  7172282  6236322  935960  0.13049682
3          08         CO    Colorado  5770790  5223188  547602  0.09489203
4          09         CT Connecticut  3611317  3068353  542964  0.15035069
5          12         FL     Florida 21634529 17060097 4574432  0.21144126
6          13         GA     Georgia 10722325  9602959 1119366  0.10439583
7          16         ID       Idaho  1854109  1747698  106411  0.05739199
8          18         IN     Indiana  6784403  6406469  377934  0.05570630
9          20         KS      Kansas  2935922  2728632  207290  0.07060474
10         22         LA   Louisiana  4640546  4443875  196671  0.04238100
                         geometry
1  MULTIPOLYGON (((1150023 -15...
2  MULTIPOLYGON (((-1386136 -1...
3  MULTIPOLYGON (((-786661.9 -...
4  MULTIPOLYGON (((2156197 -83...
5  MULTIPOLYGON (((1953691 -20...
6  MULTIPOLYGON (((1308636 -10...
7  MULTIPOLYGON (((-1357097 78...
8  MULTIPOLYGON (((1042064 -71...
9  MULTIPOLYGON (((-174904.2 -...
10 MULTIPOLYGON (((1075669 -15...

Draw the map

ggplot(data = usa_fb) +
  geom_sf(mapping = aes(fill = pct_foreign))

Use better colors

colorspace

Scale name: scale_<aesthetic>_<datatype>_<colorscale>()

  • <aesthetic>: name of the aesthetic (fill, color, colour)
  • <datatype>: type of variable plotted (discrete, continuous, binned)
  • <colorscale>: type of the color scale (qualitative, sequential, diverging)
Scale function Aesthetic     Data type Palette type    
scale_color_discrete_qualitative() color discrete qualitative
scale_fill_continuous_sequential() fill continuous sequential
scale_color_continuous_diverging() color continuous diverging

Use better colors

library(colorspace)

ggplot(data = usa_fb) +
  geom_sf(mapping = aes(
    fill = pct_foreign
  )) +
  scale_fill_continuous_sequential(palette = "mako")

Spatial aggregation

Spatial aggregation

nyc_json |>
  st_join(y = crimes_homicide_sf)
Simple feature collection with 269 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
First 10 features:
    boro_code     boro_name    shape_area    shape_leng  cmplnt_num
1           5 Staten Island 1623631283.36 325924.002076 239347883H1
1.1         5 Staten Island 1623631283.36 325924.002076 244818447H1
1.2         5 Staten Island 1623631283.36 325924.002076 246985876H1
2           2         Bronx  1187189499.3 463277.240478 242029394H1
2.1         2         Bronx  1187189499.3 463277.240478 242029394H1
2.2         2         Bronx  1187189499.3 463277.240478 243573864H1
2.3         2         Bronx  1187189499.3 463277.240478 244256173H1
2.4         2         Bronx  1187189499.3 463277.240478 240776535H1
2.5         2         Bronx  1187189499.3 463277.240478 240776535H1
2.6         2         Bronx  1187189499.3 463277.240478 240776535H1
          boro_nm        cmplnt_fr_dt law_cat_cd
1   STATEN ISLAND 2022-01-18 05:00:00     FELONY
1.1 STATEN ISLAND 2022-05-09 04:00:00     FELONY
1.2 STATEN ISLAND 2022-06-21 04:00:00     FELONY
2           BRONX 2021-06-01 04:00:00     FELONY
2.1         BRONX 2021-06-01 04:00:00     FELONY
2.2         BRONX 2021-10-24 04:00:00     FELONY
2.3         BRONX 2021-12-17 05:00:00     FELONY
2.4         BRONX 2022-01-01 05:00:00     FELONY
2.5         BRONX 2022-01-01 05:00:00     FELONY
2.6         BRONX 2022-01-01 05:00:00     FELONY
                          ofns_desc                       geometry
1   MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-74.05051 4...
1.1 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-74.05051 4...
1.2 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-74.05051 4...
2   MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.1 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.2 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.3 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.4 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.5 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...
2.6 MURDER & NON-NEGL. MANSLAUGHTER MULTIPOLYGON (((-73.89681 4...

Spatial aggregation

nyc_json |>
  st_join(y = crimes_homicide_sf) |>
  group_by(boro_name) |>
  count()
Simple feature collection with 5 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70001 ymax: 40.91553
Geodetic CRS:  WGS 84
# A tibble: 5 × 3
  boro_name         n                                                   geometry
* <chr>         <int>                                         <MULTIPOLYGON [°]>
1 Bronx           101 (((-73.89679 40.79633, -73.89713 40.7968, -73.89788 40.79…
2 Brooklyn         75 (((-73.86318 40.58406, -73.86283 40.58442, -73.8625 40.58…
3 Manhattan        46 (((-74.0086 40.68591, -74.00851 40.68596, -74.00843 40.68…
4 Queens           44 (((-73.82646 40.59059, -73.82647 40.59065, -73.82648 40.5…
5 Staten Island     3 (((-74.05054 40.56644, -74.05062 40.56651, -74.05067 40.5…

Spatial aggregation

nyc_json |>
  st_join(y = crimes_homicide_sf) |>
  group_by(boro_name) |>
  count() |>
  ggplot() +
  geom_sf(mapping = aes(fill = n))

Which is better for comparisons?

Application exercise

ae-11

  • Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio Workbench, open the Quarto document in the repo, and follow along and complete the exercises.

Bin continuous data to discrete intervals

Bin data to discrete intervals

  • Continuous vs. discrete variables for color
  • Collapse to a discrete variable

cut_interval()

usa_fb |>
  mutate(rate_cut = cut_interval(pct_foreign, n = 6)) |>
  ggplot() +
  geom_sf(aes(fill = rate_cut)) +
  scale_fill_discrete_sequential(palette = "inferno")

cut_number()

usa_fb |>
  mutate(rate_cut = cut_number(pct_foreign, n = 6)) |>
  ggplot() +
  geom_sf(aes(fill = rate_cut)) +
  scale_fill_discrete_sequential(palette = "inferno")

ggplot2::binned_scale()

usa_fb |>
  ggplot() +
  geom_sf(aes(fill = pct_foreign)) +
  scale_fill_binned_sequential(palette = "inferno")

ggplot2::binned_scale() with quartiles

usa_fb |>
  ggplot() +
  geom_sf(aes(fill = pct_foreign)) +
  scale_fill_binned_sequential(
    palette = "inferno",
    n.breaks = 3
  )

ggplot2::binned_scale() with quartiles

usa_fb |>
  ggplot() +
  geom_sf(aes(fill = pct_foreign)) +
  scale_fill_binned_sequential(
    palette = "inferno",
    n.breaks = 3,
    nice.breaks = FALSE
  )

Map projection

Map projection

Map projection

Map projection

  • Coordinate reference system
  • proj4string
  • Well Known Text 2 (WKT2) string

https://spatialreference.org/ref/epsg/

US National Atlas Equal Area

proj4string

+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs

WKT2

PROJCS["US National Atlas Equal Area",
    GEOGCS["Unspecified datum based upon the Clarke 1866 Authalic Sphere",
        DATUM["Not_specified_based_on_Clarke_1866_Authalic_Sphere",
            SPHEROID["Clarke 1866 Authalic Sphere",6370997,0,
                AUTHORITY["EPSG","7052"]],
            AUTHORITY["EPSG","6052"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.01745329251994328,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4052"]],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",45],
    PARAMETER["longitude_of_center",-100],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","2163"],
    AXIS["X",EAST],
    AXIS["Y",NORTH]]

EPSG

Mercator projection

Projection using standard lines

South on the top

Adjusting color intervals and projections

Wrap-up

Wrap-up

  • Structure geographic features using vector data and simple features
  • Use geom_sf() to visualize simple features data in ggplot2
  • Consider spatially aggregating your observations for more effective communication
  • Deliberately select color palettes and projection methods