Data wrangling (I)

Lecture 7

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2024

February 13, 2024

Announcements

  • Meet with your project 01 teams
  • Draft proposal rendered + committed + pushed to GitHub by 11:59pm on Thursday

Visualization critique

Tortured Visualizations Department

  • What is the story?
  • How effective do you find the design?

Agenda for today

Agenda

  • Transforming and reshaping data

Transforming and reshaping a single data frame

Scenario 1

We…

  • have a single data frame
  • want to slice it, and dice it, and juice it, and process it, so we can plot it

dplyr 101

Which of the following (if any) are unfamiliar to you?

  • distinct()
  • select(), relocate()
  • arrange(), arrange(desc())
  • slice(), slice_head(), slice_tail(), slice_sample()
  • filter()
  • mutate()
  • summarize(), count()
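
As a quick refresher, several of the verbs above can be chained into one pipeline. The following is only a minimal sketch: the `sales` data frame and its columns are invented for illustration, and it assumes dplyr 1.1.0 or later (for the `.by` argument) and R 4.1 or later (for the `|>` pipe).

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
sales <- tibble(
  region  = c("East", "East", "West", "West", "West"),
  product = c("A", "B", "A", "A", "B"),
  units   = c(10, 4, 7, 7, 12),
  price   = c(2.5, 4.0, 2.5, 2.5, 4.0)
)

sales_summary <- sales |>
  distinct() |>                        # drop the duplicated West/A row
  filter(units > 4) |>                 # keep rows with more than 4 units
  mutate(revenue = units * price) |>   # add a derived column
  select(region, product, revenue) |>  # keep only the columns we need
  summarize(total_revenue = sum(revenue), .by = region) |>  # one row per region
  arrange(desc(total_revenue))         # largest total first

# count() is shorthand for grouping and tallying rows
rows_per_region <- sales |> count(region)
```

Each verb takes a data frame and returns a data frame, which is why the steps compose cleanly with the pipe.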

tidyr 101

Which of the following (if any) are unfamiliar to you?

  • pivot_longer(), pivot_wider()
  • separate_wider_delim(), separate_wider_position(), unite()
  • unnest_longer(), unnest_wider(), unnest_auto()
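
The reshaping functions above are easiest to recall with a small round trip. This is a sketch under stated assumptions: the `scores` data frame, its column names, and the "name_year" format of `student` are hypothetical, and `separate_wider_delim()` assumes tidyr 1.3.0 or later.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data, invented for illustration only
scores <- tibble(
  student = c("ana_2024", "ben_2023"),
  quiz_1  = c(8, 6),
  quiz_2  = c(9, 7)
)

# Wide to long: one row per student-quiz combination
scores_long <- scores |>
  pivot_longer(
    cols      = starts_with("quiz_"),
    names_to  = "quiz",
    values_to = "score"
  )

# Long back to wide: one column per quiz again
scores_wide <- scores_long |>
  pivot_wider(names_from = quiz, values_from = score)

# Split one column into two at a delimiter (both new columns are character)
scores_split <- scores |>
  separate_wider_delim(student, delim = "_", names = c("name", "year"))
```

The long form is usually what ggplot2 wants when the quiz should map to an aesthetic such as color or facet; the wide form is often easier to read as a table.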

Application exercise

Improve a sad plot

Let’s recreate this visualization and make it better!

ae-05

  • Go to the course GitHub org and find your ae-05 (repo name will be suffixed with your NetID).
  • Clone the repo in RStudio Workbench, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of tomorrow.

Wrap up

  • Data is often messy and needs to be transformed and reshaped for effective communication
  • dplyr contains functions for transforming data
  • tidyr contains functions for reshaping data
  • Design choices are crucial to effective storytelling with data
  • There is not inherently one “right” choice, but some choices are more effective than others

Beyoncé goes country