HW 04 - Design + details
This homework is due March 13 at 11:59pm ET.
Getting started
Go to the info3312-sp24 organization on GitHub. Click on the repo with the prefix hw-04. It contains the starter documents you need to complete the homework.
Clone the repo and start a new project in RStudio.
Packages
library(tidyverse)
library(scales)
library(palmerpenguins)
library(rvest)
library(colorspace)
library(ggtext)
Guidelines + tips
As we’ve discussed in lecture, your plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.
Remember that developing a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. Periodic reminders throughout this assignment will prompt you to render, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.
Note: Do not let R output answer the question for you unless the question specifically asks for just a plot. For example, if the question asks for the number of columns in the data set, please type out the number of columns. You may lose points if you do not.
Workflow + formatting
Make sure to
- Update author name on your document.
- Label all code chunks informatively and concisely.
- Follow the Tidyverse code style guidelines.
- Make at least 3 commits.
- Resize figures where needed; avoid tiny or huge plots.
- Turn in an organized, well-formatted document.
Exercises
Exercise 1
Mirror, mirror on the wall, who’s the ugliest of them all? Make a plot of the variables in the penguins dataset from the palmerpenguins package. Your plot should use at least two variables, but more is fine too.
First, print the plot using the default theme and color scales. Then, update the plot to be as ugly as possible. You will probably want to play around with theme options, colors, fonts, etc. The ultimate goal is the ugliest possible plot, and the sky is the limit!
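As a minimal sketch of the mechanics, assuming a scatterplot of bill length against flipper length, you can store the default plot in an object and then layer scale and theme changes on top of it. The choice of variables and every styling value below are illustrative assumptions, not a required answer.

```r
library(ggplot2)
library(palmerpenguins)

# Default plot: two numeric variables plus species mapped to color
p_default <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = bill_length_mm, color = species)
) +
  geom_point() +
  labs(
    title = "Bill length vs. flipper length",
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    color = "Species"
  )

# Restyled copy: same data and layers, different color scale and theme
# (all values here are arbitrary placeholders)
p_restyled <- p_default +
  scale_color_manual(values = c("yellow", "magenta", "cyan")) +
  theme(
    panel.background = element_rect(fill = "green"),
    axis.text = element_text(size = 20, angle = 45),
    legend.position = "bottom"
  )

p_default
p_restyled
```

Because `p_restyled` is built from `p_default`, the data, mappings, and layers stay identical and only the styling differs. The same pattern carries over to exercise 2.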
Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Exercise 2
Mirror, mirror on the wall, who’s the fairest of them all? Take the same core graph that you created for exercise 1 (e.g. same variables, layers, etc.) and update it to be as beautiful and effective as possible. You will probably want to play around with theme options, colors, fonts, annotations, etc. The ultimate goal is the prettiest, most effective possible plot, and the sky is the limit!
Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Exercise 3
Critique a data visualization. Find a data visualization, critique it, and design an improved version.
In one paragraph, introduce the chart and present a copy of it with appropriate attribution.[1] Describe the original purpose of the chart and the question it is attempting to answer.
In two to three paragraphs, identify the strengths and weaknesses of the chart. Give thoughtful, constructive, and considerate comments. Connect your critique to principles of visual design and effective data communication that we have learned in this class. Effective critiques are challenging to write. You are not attempting to be mean or “tear down” the original visualization. The goal of critiquing something is to improve on it.
Finally, design an improved version of the visualization based on your critique. This could include a detailed list of design improvements, a sketch of the new visualization, or a new version of the visualization in R. Your improved version should address the weaknesses you identified in the original visualization and be more effective at communicating the original question.
You are not required or expected to implement your revised visualization in R. You can do so, but it is not expected. We do need sufficient detail to understand how this new chart will be an improvement.
Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
Exercise 4
Improve the axis tick mark labels. In March and April 2020, most regions of the United States imposed shelter-in-place orders to slow the spread of COVID-19. Essential workers and projects were exempt from many of these requirements. In New York City, many construction projects were deemed “essential” and permitted to continue. We want to explore the types of construction projects that were allowed to continue during the pandemic. A very straightforward approach is to plot the number of projects in each category as a bar chart.
# load pandemic construction data
essential_raw <- read_csv(file = "data/EssentialConstruction.csv")

essential_raw |>
  # order the categories by the total number of projects
  mutate(CATEGORY = fct_infreq(f = CATEGORY)) |>
  # draw a bar chart
  ggplot(mapping = aes(x = CATEGORY)) +
  geom_bar() +
  scale_y_continuous(labels = label_comma()) +
  labs(
    x = NULL,
    y = "Total projects"
  )
Alas, we encounter a common problem when visualizing categorical data. The labels on the x-axis are too long and overlap. This makes it difficult to read the chart.
Propose and implement at least 4 different solutions to improve the readability of the categories. For each method, implement the change and describe the advantages and disadvantages of the approach.
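To illustrate what one such solution might look like, here is a hedged sketch that maps the categories to the y-axis so the labels read horizontally. It runs on a small toy data frame (`toy_projects`, with made-up category names and counts) rather than the real construction data, and it is only one of several possible approaches.

```r
library(ggplot2)
library(forcats)
library(scales)

# Toy stand-in for the construction data; category names and counts are made up
toy_projects <- data.frame(
  CATEGORY = rep(
    c("A fairly long category name", "Another rather long category name", "Short"),
    times = c(5, 3, 1)
  )
)

# Order categories by frequency, then reverse so the most common is drawn on top
ordered_category <- fct_rev(fct_infreq(toy_projects$CATEGORY))

# Map the categories to the y-axis so the labels are written horizontally
p_horizontal <- ggplot(
  data = toy_projects,
  mapping = aes(y = fct_rev(fct_infreq(CATEGORY)))
) +
  geom_bar() +
  scale_x_continuous(labels = label_comma()) +
  labs(x = "Total projects", y = NULL)

p_horizontal
```

Horizontal bars leave room for long labels without rotating or shrinking the text, at the cost of a taller figure when there are many categories.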
Exercise 5
Towards the EGOT. The Emmy, Grammy, Oscar, and Tony Awards are four of the most prestigious awards in the entertainment industry. Winning all four of these awards is considered a significant accomplishment in American show business.
As of this date, only 19 people have achieved this feat. We want to visualize the winners and the time it took for them to earn an EGOT.
You can find a list of all EGOT winners and the years in which they won each award on Wikipedia. Scrape the data from the first table and clean it to reproduce the visualization below.
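The scraping step can be sketched with rvest as follows. To keep the example self-contained, it parses a small made-up HTML table with `minimal_html()` instead of the live Wikipedia page; the names and years are placeholders, not real EGOT data. For the real page, `read_html()` with the page's URL would take the place of `minimal_html()`.

```r
library(rvest)

# Made-up HTML standing in for a scraped page; names and years are placeholders
page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Emmy</th><th>Grammy</th></tr>
    <tr><td>Person A</td><td>1990</td><td>1985</td></tr>
    <tr><td>Person B</td><td>2005</td><td>2010</td></tr>
  </table>
')

# Select the first <table> on the page and convert it to a tibble
egot_toy <- page |>
  html_element("table") |>
  html_table()

egot_toy
```

`html_element()` returns only the first match, which fits the instruction to use the first table. The real table will likely need further cleaning before it can be plotted.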
Some important notes:
- The color palette used in the plot is “Dark 2” from the colorspace package.
- The font is Roboto Condensed. It is already installed on RStudio Workbench.
- Some individuals won multiple awards in the same or consecutive years. To ensure each point is still visible, we offset each point based approximately on when during the year the awards ceremony is held. For the purposes of calculating these offsets, we assume the Emmy Awards are held in September, the Grammy Awards in January, the Academy Awards in February, and the Tony Awards in June.[2]
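One way to compute the offsets described in the last note is to add the fraction of the year that has elapsed before the ceremony month. The sketch below is an assumption about how this could be done, not necessarily how the reference plot was built. The helper `offset_year()` and the example rows are hypothetical.

```r
# Ceremony months assumed in the note above
ceremony_month <- c(Emmy = 9, Grammy = 1, Oscar = 2, Tony = 6)

# Hypothetical helper: shift a year by the fraction of the year elapsed
# before the ceremony month (January -> 0, September -> 8/12)
offset_year <- function(year, award) {
  unname(year + (ceremony_month[award] - 1) / 12)
}

# Made-up example rows, not real EGOT data
wins <- data.frame(
  award = c("Grammy", "Oscar", "Tony", "Emmy"),
  year = c(2000, 2000, 2000, 2001)
)

wins$year_offset <- offset_year(wins$year, wins$award)
wins
```

Plotting `year_offset` instead of `year` separates awards won in the same year. For the colors, colorspace provides `scale_color_discrete_qualitative(palette = "Dark 2")`, which can be added to a ggplot2 plot that maps a discrete variable to color.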
Wrap up
Submission
- Go to http://www.gradescope.com and click Log in in the top right corner.
- Click School Credentials → Cornell University NetID and log in using your NetID credentials.
- Click on your INFO 3312 course.
- Click on the assignment, and you’ll be prompted to submit it.
- Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”).
Grading
- Exercise 1: 5 points
- Exercise 2: 5 points
- Exercise 3: 12 points
- Exercise 4: 8 points
- Exercise 5: 20 points
- Total: 50 points
Footnotes
[1] I strongly encourage you to store a copy of it in your repo rather than hotlinking to the original image. If it is an interactive graph, be sure to include a static screenshot of the visualization and a link back to the original source.
[2] Historically these are the usual times of year for the ceremonies, though sometimes there are exceptions. Notably Elton John won his Emmy at the 75th Primetime Emmy Awards. Ordinarily that ceremony would have been held in September 2023 but due to ongoing labor disputes the ceremony was delayed until January 2024.