HW 05 - Effective visual design + making data maps

Homework

Modified: March 28, 2025

Important

This homework is due April 9 at 11:59pm ET.

Learning objectives

  • Critique data visualizations using a consistent set of principles for graphical excellence
  • Redesign data visualizations to improve their effectiveness
  • Generate geospatial data visualizations
  • Create effective visualizations that accurately represent data

Getting started

  • Go to the info3312-sp25 organization on GitHub. Click on the repo with the prefix hw-05. It contains the starter documents you need to complete the homework.

  • Clone the repo and start a new project in RStudio.

General guidance

Guidelines + tips

As we’ve discussed in lecture, your plots should include an informative title and labeled axes, and you should give careful consideration to aesthetic choices.

Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. This assignment includes periodic reminders to render, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.

Workflow + formatting

Make sure to

  • Update author name on your document.
  • Label all code chunks informatively and concisely.
  • Follow the Tidyverse code style guidelines.
  • Make at least 3 commits.
  • Resize figures where needed; avoid tiny or huge plots.
  • Turn in an organized, well formatted document.

Packages

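```r
library(tidyverse)  # data wrangling and ggplot2
library(tigris)     # Census TIGER/Line boundary files as sf objects
library(sf)         # simple features for vector geospatial data
library(readxl)     # read .xlsx files
library(tidycensus) # Census and ACS data via the Census API (API key required)
library(colorspace) # HCL-based color palettes
library(ggthemes)   # extra ggplot2 themes, e.g. theme_map()
library(scales)     # label and break formatting for axes and legends
library(ggmap)      # raster map tiles as ggplot2 basemaps
```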

Exercises

Exercise 1

Critique and improve a data visualization. Find a data visualization, critique it, and design and implement an improved version.

Make sure you can access the data

You will need access to the original data for this exercise. The original visualization does not need to have been created in R, but you must either have access to the original data files or manually recreate the data using tibble() or tribble().
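If you end up recreating the data by hand, tribble() is often the more convenient constructor because you type the data row by row, the same way you would read values off a chart. A minimal sketch follows; the column names and numbers are hypothetical placeholders rather than data from any real chart. The same table built column by column with tibble() is included for comparison, and the two should produce the same tibble.

```r
library(tibble)

# Hypothetical placeholder values: replace them with values from the original chart
chart_data <- tribble(
  ~year, ~group, ~value,
  2020, "Group A", 12.5,
  2020, "Group B", 8.0,
  2021, "Group A", 14.1,
  2021, "Group B", 9.3
)

# The same table entered column by column with tibble()
chart_data_cols <- tibble(
  year = c(2020, 2020, 2021, 2021),
  group = c("Group A", "Group B", "Group A", "Group B"),
  value = c(12.5, 8.0, 14.1, 9.3)
)

# Check that both constructors built the same tibble
identical(chart_data, chart_data_cols)
```

Whichever constructor you use, double-check each typed value against the original chart before plotting.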

In one paragraph, introduce the chart and present a copy of it with appropriate attribution.[1] Describe the original purpose of the chart and the question it is attempting to answer.

In two to three paragraphs, identify the strengths and weaknesses of the chart. Give thoughtful, constructive, and considerate comments. Connect your critique to Alberto Cairo’s five qualities of great visualizations (truthful, functional, beautiful, insightful, and enlightening) that we have discussed in this class. Effective critiques are challenging to write. You are not attempting to be mean or “tear down” the original visualization. The goal of critiquing something is to improve on it.

Finally, design and implement an improved version of the visualization based on your critique. Your design could take the form of a detailed list of design improvements or a sketch of the new visualization. Your improved version should address the weaknesses you identified in the original visualization and be more effective at answering the original question. You must implement your improved version in R.

Interpret the new visualization. Does it lead to different conclusions from the original visualization? How is your redesigned chart more effective than the original?

Exercise 2

Food security in the United States. The USDA Economic Research Service (ERS) provides data on food security, or the ability to access enough nutritious, affordable food, in the United States. The data is available on the Food Security in the U.S. page and has been saved in data/FoodAccessResearchAtlasData2019.xlsx. The data is available at the Census tract level and includes different measures of food access, such as low-income and low-access areas (commonly called food deserts), poverty rates, and vehicle access.

Use your knowledge of geospatial data visualizations to create a series of maps that visualize food security in the United States. You can choose what story you wish to tell. Feel free to focus on specific measures and/or regions of the United States (i.e., don’t just draw a single map of the entire United States).

In your analysis, you should include at least three maps. Each map should be accompanied by a brief paragraph that explains the map and the story it tells.

Helpful advice
  • The dataset includes a CensusTract column which contains the unique GEOID for each Census tract. Once you obtain boundary data for the Census tracts, use this column to join the datasets together.
  • The U.S. Census Bureau redraws Census tracts every 10 years. The dataset you are using is from 2019, so you need geographic boundaries that correspond to Census tracts as of 2019. If you use boundaries drawn for the 2020 Census or later, many tracts will not join correctly to the food security data.
  • Feel free to use raster map tiles, vector data, or both to generate your maps.
  • Remember to adhere to effective design principles. Make sure your maps are clear, informative, and visually appealing.
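A minimal sketch of the join step, using small hypothetical stand-ins for both tables so it runs without downloading anything. It assumes read_excel() may import CensusTract as a number, which drops the leading zero for states whose FIPS code is below 10, while the boundary GEOID is an 11-character string. The column LILATracts_1And10 and all values are illustrative; check the names in your copy of the data. With real data, the boundaries would come from a call such as tigris::tracts() with year = 2019.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the food access data after read_excel():
# CensusTract arrived as a number, so the Alabama tracts lost their leading zero
food <- tibble(
  CensusTract = c(1001020100, 1001020200, 36061000100),
  LILATracts_1And10 = c(1, 0, 0)
)

# Hypothetical stand-in for tract boundaries. With real data this would be an
# sf object, e.g. tigris::tracts(state = "AL", year = 2019), whose GEOID column
# is an 11-character string
tract_boundaries <- tibble(
  GEOID = c("01001020100", "01001020200", "01001020300")
)

# Normalize the join key: format as an 11-digit, zero-padded string
food <- food |>
  mutate(CensusTract = sprintf("%011.0f", as.numeric(CensusTract)))

# Boundary rows with no matching food access row; many of these usually
# means the boundaries and the data come from different tract vintages
unmatched <- anti_join(tract_boundaries, food, by = c("GEOID" = "CensusTract"))

# Put the boundaries first so an sf object stays an sf object after the join
tracts_food <- left_join(tract_boundaries, food, by = c("GEOID" = "CensusTract"))

unmatched
tracts_food
```

If anti_join() returns many rows for the region you are mapping, that is a strong hint that the boundaries come from a different vintage than the data.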

Exercise 3

Visualizing election results. It is extremely common to visualize election results using maps. In the United States, election results are often visualized at the county level and look something like this:

A problem with this design is that it can be misleading. The map emphasizes the land mass of counties rather than the number of people who live in them. This can lead to a distorted view of the election results. For example, the map above makes it look as though most of the country voted for the Republican candidate, when in fact a majority of voters chose the Democratic candidate.
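The distortion comes from coloring geography rather than counting votes. A toy calculation with invented numbers for five hypothetical counties shows how far apart the two views can be: the candidate who wins most counties need not win most votes.

```r
library(dplyr)
library(tibble)

# Five hypothetical counties with invented vote counts: four small rural
# counties and one large urban county
counties <- tibble(
  county = c("Rural A", "Rural B", "Rural C", "Rural D", "Urban E"),
  dem = c(2000, 1500, 3000, 2500, 90000),
  rep = c(8000, 6500, 9000, 7500, 40000)
)

county_vs_votes <- counties |>
  summarize(
    # Share of counties won by the Republican candidate, which is roughly
    # what a county-level map emphasizes
    rep_county_share = mean(rep > dem),
    # Democratic share of all two-party votes cast across the five counties
    dem_vote_share = sum(dem) / (sum(dem) + sum(rep))
  )

county_vs_votes
```

Here the Republican candidate carries four of the five counties (80%), yet the Democratic candidate receives roughly 58% of the two-party vote, because one dense county outweighs the rest.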

Design and implement an improved geospatial data visualization that corrects for this distortion. There are several ways you might design such a map. Feel free to research some approaches, but your implementation should be original. Along with the map, document your design choices and why you believe your approach is the most effective.

Helpful advice
  • The county election results can be found in data/countypres_2000-2020.csv. To visualize the two-party vote share as in the map above, you need to calculate the total number of votes received by the Democratic and Republican candidates in each county for 2020. You can then calculate each candidate’s share of the two-party vote (for example, Democratic votes divided by the sum of Democratic and Republican votes).
  • Alaska is annoying. It does not report election results at the county level, but instead by district for the lower chamber of its state legislature. To effectively incorporate Alaska into your map, you will need to obtain the Alaska voting district boundaries along with the county boundaries for the other 49 states.
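A minimal sketch of the vote share calculation. It assumes the CSV follows the MIT Election Data and Science Lab layout, with columns year, county_fips, party, mode, and candidatevotes, and that party is coded as "DEMOCRAT" and "REPUBLICAN"; verify these against the file. The rows below are hypothetical stand-ins so the sketch runs on its own; in your analysis you would start from read_csv("data/countypres_2000-2020.csv"). Summing across mode assumes a county's rows for different reporting modes do not overlap (for example, no TOTAL row alongside an election-day and absentee breakdown), which is worth checking in the real file. Alaska rows still need the district boundaries described above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in rows in the assumed layout of
# data/countypres_2000-2020.csv (one row per county x candidate x mode)
countypres <- tibble(
  year = c(2020, 2020, 2020, 2020, 2020, 2020, 2020, 2016),
  county_fips = c(1001, 1001, 1001, 1001, 1001, 6037, 6037, 1001),
  party = c(
    "DEMOCRAT", "REPUBLICAN", "DEMOCRAT", "REPUBLICAN", "LIBERTARIAN",
    "DEMOCRAT", "REPUBLICAN", "DEMOCRAT"
  ),
  mode = c(
    "ELECTION DAY", "ELECTION DAY", "ABSENTEE", "ABSENTEE", "ELECTION DAY",
    "TOTAL", "TOTAL", "TOTAL"
  ),
  candidatevotes = c(5000, 15000, 1000, 2000, 300, 3000000, 1100000, 4000)
)

vote_share_2020 <- countypres |>
  # Keep the 2020 race, the two major parties, and rows with a county code
  filter(
    year == 2020,
    party %in% c("DEMOCRAT", "REPUBLICAN"),
    !is.na(county_fips)
  ) |>
  # Sum across reporting modes to get one vote total per county and party
  group_by(county_fips, party) |>
  summarize(votes = sum(candidatevotes), .groups = "drop") |>
  # One row per county, with DEMOCRAT and REPUBLICAN vote columns
  pivot_wider(names_from = party, values_from = votes) |>
  mutate(
    # Numeric FIPS codes lose leading zeros; rebuild the 5-digit GEOID
    GEOID = sprintf("%05d", as.integer(county_fips)),
    # Democratic share of the two-party vote (0 = all Republican, 1 = all Democratic)
    dem_share = DEMOCRAT / (DEMOCRAT + REPUBLICAN)
  )

vote_share_2020
```

The resulting GEOID column can then be joined to county boundaries, and dem_share mapped to a diverging color scale centered at 0.5.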

Generative AI (GAI) self-reflection

As stated in the syllabus, include a written reflection for this assignment describing how you used GAI tools (e.g. what tools you used, how you used them to assist you with writing code), what skills you believe you acquired, and how you believe you demonstrated mastery of the learning objectives.

Wrap up

Submission

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials → Cornell University NetID and log in using your NetID credentials.
  • Click on your INFO 3312 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”).

Grading

  • Exercise 1: 20 points
  • Exercise 2: 20 points
  • Exercise 3: 10 points
  • Total: 50 points

Footnotes

  1. I strongly encourage you to store a copy of the chart in your repo rather than hotlinking to the original image. If it is an interactive graph, be sure to include a static screenshot of the visualization and a link back to the original source.