HW 05 - A hodgepodge of techniques

Homework
Modified: March 21, 2024

Important

This homework is due March 27 at 11:59pm ET.

Getting started

  • Go to the info3312-sp24 organization on GitHub. Click on the repo with the prefix hw-05. It contains the starter documents you need to complete the homework.

  • Clone the repo and start a new project in RStudio.

Packages

library(tidyverse)
library(lubridate)
library(sf)
library(scales)
library(patchwork)
library(colorspace)
library(gganimate)

Guidelines + tips

As we’ve discussed in lecture, your plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. This assignment includes periodic reminders to render, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.

Workflow + formatting

Make sure to

  • Update author name on your document.
  • Label all code chunks informatively and concisely.
  • Follow the Tidyverse code style guidelines.
  • Make at least 3 commits.
  • Resize figures where needed; avoid tiny or huge plots.
  • Turn in an organized, well-formatted document.

Important

Any time you are asked to recreate a visualization, approximate it as closely as possible. We do not care if there are minor differences due to resolution, aspect ratio, font size, etc., as long as your visualization captures the spirit of the original.

Exercise 1

Median housing prices in the US. The inspiration and the data for this exercise come from https://fred.stlouisfed.org/series/MSPUS. The two datasets you’ll use are median_housing and recessions, both of which are in the data folder of your repository.

  • Load the two datasets using read_csv().

  • Rename the variables as date and price.

  • Create the following visualization.

  • Identify recessions that happened during the time frame of the median_housing dataset. Now recreate the following visualization. The shaded areas are recessions that happened during the time frame of the median_housing dataset. Hint: The shaded areas are “behind” the line.

  • Create a subset of the median_housing dataset for data from 2019-22. Add two columns: year and quarter. year is the year of the date and quarter takes the values Q1, Q2, Q3, or Q4 based on date.

  • Create the following visualization.
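The data preparation steps above can be sketched on made-up data. This is a rough illustration, not the expected solution: the uppercase column names (DATE, MSPUS), the peak and trough columns, and all values are assumptions, so check them against the files in your data folder.

```r
library(tidyverse)
library(lubridate)

# Hypothetical stand-in for median_housing; real column names may differ
median_housing <- tibble(
  DATE = seq(as.Date("2018-01-01"), by = "quarter", length.out = 20),
  MSPUS = seq(330000, by = 5000, length.out = 20)
) |>
  rename(date = DATE, price = MSPUS)

# Hypothetical stand-in for recessions (assumed start and end columns)
recessions <- tibble(
  peak = as.Date(c("2007-12-01", "2020-02-01")),
  trough = as.Date(c("2009-06-01", "2020-04-01"))
)

# Keep only recessions that overlap the time frame of the housing data
recessions_in_range <- recessions |>
  filter(trough >= min(median_housing$date), peak <= max(median_housing$date))

# Layers are drawn in the order they are added: rectangles first, line on top
p_recessions <- ggplot(median_housing, aes(x = date, y = price)) +
  geom_rect(
    data = recessions_in_range,
    aes(xmin = peak, xmax = trough, ymin = -Inf, ymax = Inf),
    inherit.aes = FALSE,
    fill = "grey80"
  ) +
  geom_line() +
  scale_y_continuous(labels = scales::label_dollar())

# Subset to 2019-22 and derive year and quarter from the date
housing_19_22 <- median_housing |>
  filter(year(date) %in% 2019:2022) |>
  mutate(year = year(date), quarter = paste0("Q", quarter(date)))
```

Adding geom_rect() before geom_line() is what places the shaded areas "behind" the line.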

Tip
  • The year values on the bottom of the chart are not generated as axis tick-mark labels. The tick-mark labels are the “Q1”, “Q2”, “Q3”, and “Q4” values from the quarter column of the housing_19_22 dataset; the scale_x_date() function is used to set the breaks and labels for the x-axis.
  • The annotate() function is used to add arbitrary annotations to a chart.
  • Annotations must be defined based on the coordinate system of the chart. However, nothing requires you to set those values within the limits of the x- or y-axis if you correctly modify the plot.
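One possible way to place text outside the plotting panel is sketched below. It assumes made-up data (housing_demo) and arbitrary label positions. Turning off clipping with coord_cartesian(clip = "off") is only one option for "modifying the plot", and not necessarily how the original chart was built.

```r
library(tidyverse)
library(lubridate)

# Hypothetical quarterly data standing in for housing_19_22
housing_demo <- tibble(
  date = seq(as.Date("2019-01-01"), by = "quarter", length.out = 8),
  price = seq(310000, by = 6000, length.out = 8)
) |>
  mutate(quarter = paste0("Q", quarter(date)))

p_annotated <- ggplot(housing_demo, aes(x = date, y = price)) +
  geom_line() +
  # Tick labels come from the quarter column, one label per break
  scale_x_date(breaks = housing_demo$date, labels = housing_demo$quarter) +
  # Year labels are text annotations positioned below the visible y range
  annotate(
    "text",
    x = as.Date(c("2019-05-15", "2020-05-15")),
    y = 295000,
    label = c("2019", "2020")
  ) +
  # Fix the visible y range and disable clipping so the text is still drawn
  coord_cartesian(ylim = c(305000, 355000), clip = "off") +
  labs(x = NULL, y = "Median price (USD)") +
  # Extra bottom margin leaves room for the annotations
  theme(plot.margin = margin(t = 5.5, r = 5.5, b = 30, l = 5.5))
```

The y position and bottom margin here are guesses that would need tuning by eye for the real chart.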

Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Exercise 2

Adopt, don’t shop. The data for this exercise comes from The Pudding via TidyTuesday.

  • Load the dog_travel dataset included in the data folder of your repository with read_csv().

  • Calculate the number of dogs available to adopt per contact_state. Save the result as a new data frame with variables contact_state and n.

  • Make a histogram of the number of dogs available to adopt and describe the distribution of this variable.

  • Use this dataset to make a map of the US states, where each state is filled in with a color based on the number of dogs available to adopt in that state.

    Hints:

    • Use the state_list dataset which you can find in the data folder of your repo as a lookup table to match state names to abbreviations.
    • Use a gradient color scale and log10 transformation.
  • Interpret the visualization.
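The counting, lookup, and color scale steps can be sketched with toy data. Everything here is hypothetical: the lookup column names (name, abbreviation) are assumptions about state_list, and the three unit squares stand in for real state boundaries, which you would need to obtain separately.

```r
library(tidyverse)
library(sf)

# Hypothetical rows standing in for dog_travel: one row per dog
dog_travel_demo <- tibble(contact_state = c("NY", "NY", "NY", "PA", "PA", "OH"))

# count() returns one row per state with the count in column n
dogs_per_state <- dog_travel_demo |>
  count(contact_state)

# Hypothetical lookup table; the columns in state_list may be named differently
state_lookup <- tibble(
  name = c("New York", "Pennsylvania", "Ohio"),
  abbreviation = c("NY", "PA", "OH")
)

# Helper that builds a unit square as a placeholder geometry
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Toy map data keyed by full state name
states_sf <- st_sf(
  name = c("New York", "Pennsylvania", "Ohio"),
  geometry = st_sfc(unit_square(0), unit_square(1), unit_square(2))
)

# Names -> abbreviations -> counts
dogs_map_data <- states_sf |>
  left_join(state_lookup, by = "name") |>
  left_join(dogs_per_state, by = c("abbreviation" = "contact_state"))

p_dogs <- ggplot(dogs_map_data) +
  geom_sf(aes(fill = n)) +
  # Newer ggplot2 versions prefer `transform = "log10"` over `trans`
  scale_fill_gradient(low = "grey95", high = "darkblue", trans = "log10") +
  labs(fill = "Dogs available")
```

A log10 scale is worth considering when a few states have far more dogs than the rest, since a linear gradient would make most states look alike.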

Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Exercise 3

Country populations. For this exercise you will work with data on country populations. The data come from The World Bank. The dataset you will use is in your data/ folder and it’s called country-pop.csv.

  • Load the dataset using read_csv().

  • Find the countries with the top 10 highest population count in 2022. Subset the data for just these 10 countries.

  • Create a racing bar chart using gganimate to show the change in population for these countries over time.
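A minimal racing bar chart might be structured as follows. The data are invented and already in long format with assumed columns country, year, and pop; the real country-pop.csv may need reshaping first. The example keeps the top 3 countries rather than 10 to stay small.

```r
library(tidyverse)
library(gganimate)

# Hypothetical long-format data: one row per country per year
pop_demo <- tibble(
  country = rep(c("A", "B", "C", "D"), each = 3),
  year = rep(c(2020, 2021, 2022), times = 4),
  pop = c(10, 12, 15, 14, 13, 12, 5, 9, 16, 1, 2, 3)
)

# Countries with the highest population in 2022 (top 3 here, not 10)
top_countries <- pop_demo |>
  filter(year == 2022) |>
  slice_max(pop, n = 3) |>
  pull(country)

# Rank countries within each year so bars can change places over time
pop_ranked <- pop_demo |>
  filter(country %in% top_countries) |>
  group_by(year) |>
  mutate(rank = rank(-pop, ties.method = "first")) |>
  ungroup()

race <- ggplot(pop_ranked, aes(x = rank, y = pop, fill = country, group = country)) +
  geom_col(show.legend = FALSE) +
  # Country names sit just left of the bars instead of on the axis
  geom_text(aes(y = 0, label = country), hjust = 1.1) +
  coord_flip(clip = "off") +
  scale_x_reverse() +
  labs(title = "Population in {closest_state}", x = NULL, y = "Population") +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
  # One animation state per year, with smooth movement between states
  transition_states(year, transition_length = 4, state_length = 1) +
  ease_aes("cubic-in-out")
```

Mapping rank (not country) to the position aesthetic is what lets the bars overtake one another. Calling animate(race) renders the frames.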

Important

PDFs are not a great format for rendering animated graphs, but Gradescope requires all submissions be made via PDF. In order to render an animated chart in Quarto to PDF, I have included additional settings in your YAML header as well as code chunk options in the chunk for exercise 3. Please do not delete or modify these settings. See this gist for more information.

Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Exercise 4

Brexit. In September 2019, a YouGov survey asked 1,639 adults in Great Britain the following question:

How well or badly do you think the government are doing at handling Britain’s exit from the European Union?

  • Very well
  • Fairly well
  • Fairly badly
  • Very badly
  • Don’t know

The dataset containing responses to this question can be found in the data/ folder and is called brexit.csv. Load the dataset using read_csv(). Your task is to recreate the following visualization and to add a caption describing what each plot represents (Plots A, B, and C). Some hints to help you along the way:

  • Before you get started, filter out the “Don’t know” responses.
  • Use a diverging color scale from the colorspace package.
  • Use patchwork along with its plot_annotation() functionality to label plots.
  • “Collect” the legends (guides).
  • For ease of copy-paste, the shortlink in the caption is bit.ly/2lCJZVg.
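The hints above can be combined roughly as in this sketch. It uses invented data with assumed columns region and opinion, only two panels, and a placeholder caption, so it shows the mechanics of patchwork and colorspace and is not the target figure.

```r
library(tidyverse)
library(patchwork)
library(colorspace)

opinion_levels <- c("Very well", "Fairly well", "Fairly badly", "Very badly")

# Hypothetical responses; the columns in brexit.csv may be named differently
brexit_demo <- tibble(
  region = rep(c("London", "Scotland"), each = 5),
  opinion = rep(c(opinion_levels, "Don't know"), times = 2)
) |>
  # Drop the "Don't know" responses, then fix the order of the remaining levels
  filter(opinion != "Don't know") |>
  mutate(opinion = factor(opinion, levels = opinion_levels))

# Shared base so both panels use the same diverging fill scale
base_plot <- ggplot(brexit_demo, aes(y = region, fill = opinion)) +
  scale_fill_discrete_diverging(palette = "Blue-Red")

p_count <- base_plot + geom_bar()
p_prop <- base_plot + geom_bar(position = "fill")

combined <- (p_count / p_prop) +
  # Merge identical legends into a single shared legend
  plot_layout(guides = "collect") +
  # Tag panels A, B, ... and add one caption for the whole figure
  plot_annotation(
    tag_levels = "A",
    caption = "Placeholder caption with shortlink bit.ly/2lCJZVg"
  )
```

Legends are only collected when they are identical across panels, which is one reason to define the fill scale once and reuse it.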


Render, commit, and push one last time.

Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Wrap up

Submission

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials → Cornell University NetID and log in using your NetID credentials.
  • Click on your INFO 3312 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”).

Grading

  • Exercise 1: 14 points
  • Exercise 2: 12 points
  • Exercise 3: 12 points
  • Exercise 4: 12 points
  • Total: 50 points