Project 1

Modified

March 10, 2025

Important dates

  • Proposal for peer review: due Thu, Feb 13th at 11:59pm
  • Peer review feedback: due Fri, Feb 14th 💘 at 11:59pm
  • Revised proposal for instructor review: due Thu, Feb 20th at 11:59pm
  • Report and presentation: due Thu, Mar 6th at 1:25pm

Important

The details will be updated as the project date approaches.

Introduction

TL;DR: Tell a story using data visualizations.

You will use a dataset from the TidyTuesday project and apply your data visualization skills to tell a story with it. You can choose any dataset that TidyTuesday has released since 2024.

Your task for the project is to come up with two questions to answer, answer them with data visualizations, and write up and present your method and findings.

Deliverables

The primary deliverables for the project are:

  1. A project proposal.
  2. A report of your findings.
  3. An oral presentation with slides.

There will be additional submissions throughout the semester to facilitate completion of the final product and presentation.

Organization of files in the repository

The files in your repository are organized as a Quarto Project. This enables easy rendering of all Quarto documents within the project folder with a single command, as well as the ability to share YAML configurations across multiple documents. To render the project go to the Build tab in RStudio, and click on “Render”.

Teams

Projects will be completed in teams of 3-5 students. Every team member should be involved in all aspects of planning and executing the project. Each team member should make an equal contribution to all parts of the project. The scope of your project is based on the number of contributing team members on your project. If you have 4 contributing team members, we will expect a larger project than from a team of 3 contributing team members.

Some lab section meetings will be devoted to work on the project, so all teams will be formed within each lab section (i.e. only students in your lab section can be your team members). The course staff will assign students to teams. To facilitate this process, we will provide a short survey identifying study and communication habits. Once teams are assigned, they cannot be changed.

Team conflicts

Conflict is a healthy part of any team relationship. If your team doesn’t have conflict, then your team members are likely not communicating their issues with each other. Use your team contract (written at the beginning of the project) to help keep your team dynamic healthy.

When you have conflict, you should follow this procedure:

  1. Refer to the team contract and follow it to address the conflict.

  2. If you resolve the conflict without issue, great! Otherwise, update the team contract and try to resolve the conflict yourselves.

  3. If your team is unable to resolve your conflict, please contact soltoffbc@cornell.edu and explain your situation.

    We’ll ask to meet with all the group members and figure out how we can work together to move forward.

  4. Please do not avoid confrontation if you have conflict. If there’s a conflict, the best way to handle it is to bring it into the open and address it.

Project grade adjustments

Remember, do not do the work for a slacking team member. This only rewards their bad behavior. Simply leave their work unfinished. (We will not increase your grade during adjustments for doing more than your fair share.)

Your team will initially receive a final grade assuming that all team members contributed to your project. If you have a 5-person team but only 3 people contributed, your team will likely receive a lower grade initially because only 3 people’s worth of effort exists for a 5-person project. About a week after the initial project grades are released, adjustments will be made to each individual team member’s group project grade.

We use your project’s Git history (to view the contributions of each team member) and the peer evaluations to adjust each team member’s grade. Adjustments that either increase or decrease your grade are possible, based on each individual’s contributions.

For example, if you have a 4-person team, but only 3 contributing members, the 3 contributing members may have their grades increased to reflect the effort of only 3 contributing members. The non-contributing member will likely have their grade decreased significantly.

Warning

I am serious about every member of the team equitably contributing to the project. Students who fail to contribute equitably may receive up to a 100% deduction on their project grade.

Please be patient with the grade adjustments; it takes time to do them fairly. Please know that I handle this entire process myself, and I take it very seriously. If you think your initial group project grade is unfair, please wait for your grade adjustment before you contact us.

The slacking team member

Please do not cover for a slacking/freeloading team member. Please do not do their work for them! This only rewards their bad behavior. Simply leave their work unfinished. (We will not increase your grade during adjustments for doing more than your fair share.)

Remember, we have your Git history. We can see who contributes to the project and who doesn’t. If a team member rarely commits to Git and only makes very small commits, we can see that they did not contribute their fair share.

All students should make their project contributions through their own GitHub account. Do not commit changes to the repository from another team member’s GitHub account. Your Git history should reflect your individual contributions to the project.

Deliverables

Proposal

Your proposal should include:

  • A brief description of your dataset including its provenance, dimensions, etc. (Make sure to load the data and use inline code for some of this information.)
  • The reason why you chose this dataset.
  • The two questions you want to answer.
  • A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).
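
As a rough illustration of the first bullet, the sketch below loads a CSV file and computes the values you might report inline. The tiny data frame, the file name example.csv, and the temporary folder are stand-ins invented so the code runs anywhere; in your repository you would read your own file from the data folder instead.

```r
# Stand-in so the sketch runs anywhere: write a tiny CSV to a temporary
# "data" folder. In your repository you would skip this step and read the
# file that is already stored under data/ (example.csv is hypothetical).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(species = c("a", "b", "c"), mass_g = c(10.5, 12.1, 9.8)),
  file.path(data_dir, "example.csv"),
  row.names = FALSE
)

# Load the data from the data folder
example <- read.csv(file.path(data_dir, "example.csv"))

# Values you might report inline in the proposal text
n_rows <- nrow(example)
n_cols <- ncol(example)
var_types <- vapply(example, function(col) class(col)[1], character(1))

# The same values assembled into a sentence
summary_sentence <- sprintf(
  "The dataset has %d rows and %d columns.", n_rows, n_cols
)
print(summary_sentence)
```

In a Quarto document, values like these can be placed directly in the text with inline code (for example `r n_rows`), so the description stays correct if the data changes.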

Choosing a dataset

The dataset you choose should have some numerical and some categorical variables or you should be able to recode some of the existing variables so that you can ultimately have both numerical and categorical variables to work with.

It is also very important that the dataset you choose allows for two distinct questions to be asked and answered using a not-completely-overlapping set of variables. For example, Question 1 might require variables x, y, and z, while Question 2 requires variables a, b, c, and d, or x, a, and b. Some shared variables are ok, but the sets of variables should not be completely overlapping, i.e., Question 2 can’t also require the use of variables x, y, and z.

Framing your questions

Each of the two questions you come up with should involve more than two variables in order to answer. You should phrase them in a way that is within the scope of inference of your data. For example, if you have an observational dataset, you shouldn’t phrase your question in a causal way.

Peer review

Reviewer tasks

Critically reviewing others’ work is a crucial part of the scientific process, and INFO 3312/5312 is no exception. You will be assigned two teams to review. This feedback is intended to help you create a high quality final project, as well as give you experience reading and constructively critiquing the work of others.

The peer review assignments are as follows:

Your team name | To review 1 | To review 2
INFO-3312
Dank Bop | Dank Vibe (repo, site) | Dank Tea (repo, site)
Dank Camp | Dank Bop (repo, site) | Dank Vibe (repo, site)
Dank Cap | Dank Camp (repo, site) | Dank Bop (repo, site)
Dank Cheugy | Dank Cap (repo, site) | Dank Camp (repo, site)
Dank Clapback | Dank Cheugy (repo, site) | Dank Cap (repo, site)
Dank Extra | Dank Clapback (repo, site) | Dank Cheugy (repo, site)
Dank Facts | Dank Extra (repo, site) | Dank Clapback (repo, site)
Dank Stan | Dank Facts (repo, site) | Dank Extra (repo, site)
Dank Tea | Dank Stan (repo, site) | Dank Facts (repo, site)
Dank Vibe | Dank Tea (repo, site) | Dank Stan (repo, site)
INFO-5312
Giving Cap | Giving Vibe (repo, site) | Giving Stan (repo, site)
Giving Clapback | Giving Cap (repo, site) | Giving Vibe (repo, site)
Giving Stan | Giving Clapback (repo, site) | Giving Cap (repo, site)
Giving Vibe | Giving Stan (repo, site) | Giving Clapback (repo, site)

Teams will develop the review together, with discussion among all team members, but only one team member will submit it as an issue on the project repo. To do so, go to the Issues tab, click on the green New issue button on the top right, and then click on the green Get started button for the issue template titled Peer review.

This will start a new issue with a peer review form that you can fill out. You’re expected to be thorough in your review, but this doesn’t necessarily require lengthy responses.

Remember, your goal is to help the team whose project proposal you’re reviewing. The team will not lose points because of issues you point out, as long as they address them before I review their proposals. You should be critical, but respectful in your review. Peer reviews will be evaluated on the quality of the feedback left for the other teams.

Reviewee tasks

Once you receive feedback from your peers, you should address it. You should do this by directly updating your proposal or making any other updates to your repo as needed. You can make these updates all in one commit or you can spread them across multiple commits.

Regardless, in the last commit that addresses the peer review comments, you should use a keyword in your commit message that will close the peer review issues. These words are close, closes, closed, fix, fixes, fixed, resolve, resolves, and resolved and they need to be followed by the issue number (which you can find next to the issue title). So, your commit message can say something like “Finished updates based on peer review, fixes #1”.
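
If you want to sanity-check a commit message before pushing, the keyword rule above can be sketched as a small check. The helper closes_issue() is hypothetical (it is not part of Git or GitHub), and matching the keywords case-insensitively is an assumption of this sketch.

```r
# Keywords listed above; each must be followed by an issue number such as #1.
closing_keywords <- c(
  "close", "closes", "closed",
  "fix", "fixes", "fixed",
  "resolve", "resolves", "resolved"
)

# Hypothetical helper: TRUE if the message contains "<keyword> #<number>".
# Case-insensitive matching is an assumption made for this sketch.
closes_issue <- function(message) {
  pattern <- paste0(
    "\\b(", paste(closing_keywords, collapse = "|"), ")\\s+#[0-9]+"
  )
  grepl(pattern, message, ignore.case = TRUE, perl = TRUE)
}

# A keyword followed by an issue number
print(closes_issue("Finished updates based on peer review, fixes #1"))

# No keyword and no issue number
print(closes_issue("Finished updates based on peer review"))
```

The first message follows the pattern described above and the second does not, so only the first would be expected to close the issue.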

Report

Your report should consist of three parts:

  1. Introduction (1-2 paragraphs): Brief introduction to the dataset. You may repeat some of the information about the dataset provided in the introduction to the dataset on the TidyTuesday repository, paraphrasing it in your own words. Imagine that your project is a standalone document and the evaluator has no prior knowledge of the dataset.

  2. Question 1: The title should relate to the question you’re answering.

    • Introduction (1-2 paragraphs): Introduction to the question and what parts of the dataset are necessary to answer the question. Also discuss why you’re interested in this question.

    • Approach (1-2 paragraphs): Describe what types of plots you are going to make to address your question. For each plot, provide a clear explanation as to why this plot (e.g. boxplot, barplot, histogram, etc.) is best for providing the information you are asking about. The two plots should be of different types, and at least one of the two plots needs to use either color mapping or facets.

    • Analysis (2-3 code blocks, 2 figures, text/code comments as needed): In this section, provide the code that generates your plots. Use scale functions to provide nice axis labels and guides. You are welcome to use theme functions to customize the appearance of your plot, but you are not required to do so. All plots must be made with {ggplot2}. Do not use base R or lattice plotting functions.

    • Discussion (1-3 paragraphs): In the Discussion section, interpret the results of your analysis. Identify any trends revealed (or not revealed) by the plots. Speculate about why the data looks the way it does.

  3. Question 2: Same structure as outlined for Question 1, but for your second question. As before, the title should relate to the question you’re answering.

We encourage you to be concise. A paragraph should typically not be longer than 5 sentences.

You are not required to perform any statistical tests in this project, but you may do so if you find it helpful to answer your question.
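
To make the Analysis requirements more concrete, here is a minimal sketch of one plot that uses scale functions for the axis and legend labels, plus both a color mapping and facets. It uses mpg, a dataset bundled with {ggplot2}, purely as a stand-in; your own plots would use your TidyTuesday data, and the variable choices and label text here are illustrative assumptions.

```r
library(ggplot2)

# Stand-in data: mpg ships with ggplot2. Replace it with your own dataset.
report_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.7) +
  # Scale functions supply readable axis titles
  scale_x_continuous(name = "Engine displacement (liters)") +
  scale_y_continuous(name = "Highway fuel economy (miles per gallon)") +
  # Scale function supplies the legend title and readable category labels
  scale_color_discrete(
    name = "Drive train",
    labels = c("4" = "Four-wheel", "f" = "Front-wheel", "r" = "Rear-wheel")
  ) +
  # One panel per vehicle class
  facet_wrap(vars(class))

print(report_plot)
```

A single plot only needs color mapping or facets, not both; this sketch combines them to show each in one place.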

Presentation

Your presentation should generally follow the same structure as your report. Each team will have 5 minutes for their presentation, and each team member must speak (roughly equally) during this time. Your presentation will be created using Quarto, which allows you to write slides using the same reproducible document structure you’re used to.

Your presentation should be a concise oral presentation that identifies and answers the questions you ask in your report. Use the presentation to tell your story. Your report is about not only the story, but also your process for writing the story. The presentation is not the same thing: just tell your story, and use data to support your arguments.

Distinguishing features from the report include, but are not limited to:
  • You don’t need to explain your approach to the plots.
  • We should not see any code in your slides.
  • The plots should be designed for a slide presentation. Make appropriate adjustments to any plots you created for the report (e.g. improved font size, adding/removing annotations), or design completely different plots for the presentation.
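
One possible way to adapt a plot for slides is to raise the base font size and simplify the layout, as in the sketch below. The mpg dataset is again only a stand-in bundled with {ggplot2}, and the base size of 20 is an illustrative guess rather than a required value; check legibility on the actual projector or screen.

```r
library(ggplot2)

# Stand-in data: mpg ships with ggplot2. Replace it with your own dataset.
slide_plot <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  scale_x_discrete(name = "Vehicle class") +
  scale_y_continuous(name = "Number of models") +
  scale_fill_discrete(name = "Drive train") +
  # Larger base font size so all text scales up for a projected slide
  theme_minimal(base_size = 20) +
  # Legend below the plot leaves more horizontal room for the bars
  theme(legend.position = "bottom")

print(slide_plot)
```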

As a starting point, I recommend 1 slide for introduction, 2 slides for Question 1, and 2 slides for Question 2. You can imagine spending roughly one minute on each slide. You should feel free to have more (or fewer) slides. Your evaluation will be based on your content, professionalism (including sticking to time), and your performance during the Q&A (question and answer).

Website

Each of your projects will have a website that looks like this. You are not expected to change the styling of the website, but if you want to, you’ll need to edit the _quarto.yml file in your repo. Feel free to Google your way around it or ask on the discussion forum/office hours!

Repo organization

The following folders and files are in your project repository:

  • /data/*: Your dataset

    • /data/*.csv: Your dataset in CSV format

    • /data/README.md: Metadata about your dataset including information on provenance, codebook, etc. (see footnote 1)

  • index.qmd: Your project report

  • proposal.qmd: Your project proposal

  • presentation.qmd: Your project presentation

  • _quarto.yml: Setup file for project website

Overall grading

Total 100 pts
Proposal 10 pts
Presentation 35 pts
    Instructor 30 pts
    Peers 5 pts
Report 35 pts
Reproducibility, style, and organization 10 pts
Between team peer evaluation 10 pts

Evaluation criteria

Proposal

  • Dataset
    • Less developed projects: Dataset is missing from the data folder. Dataset lacks a codebook.
    • Typical projects: Dataset is in the data folder. Codebook for the dataset is included as the README.md file in the data folder.
  • Write-up
    • Less developed projects: The write-up is missing one or more required components.
    • Typical projects: All required components are included in the write-up.
  • Workflow
    • Less developed projects: Peer review issues are left open or do not have associated commits which respond to the feedback.
    • Typical projects: Peer review issues are closed via a commit message.
  • Teamwork
    • Less developed projects: One or more team members do not have commits in the repo.
    • Typical projects: All team members contribute to the repo via commits.

Presentation

Teaching team

  • Time management
    • Less developed projects: Only some members speak during the presentation. Team does not manage time wisely (e.g. runs out of time, finishes early without adequately presenting their project).
    • Typical projects: All members speak during the presentation. Team does not exceed the five minute limit.
    • More developed projects: Team maximally uses their five minutes. Clearly communicates their objectives and outcomes from the project.
  • Professionalism
    • Less developed projects: Presentation is slapped together or haphazard. Seems like independent pieces of work patched together.
    • Typical projects: Presentation appears to be rehearsed. There is cohesion to the presentation.
    • More developed projects: All elements of typical projects + everyone says something meaningful about the project.
  • Slides
    • Less developed projects: Slides contain excessive text and/or content. Team relies too heavily on slides for their presentation.
    • Typical projects: Slides are well-organized. Slides are used as a tool to assist the oral presentation.
    • More developed projects: All elements of typical projects + graphics and tables follow best-practices (e.g. all text is legible, appropriate use of color and legends). Slides are not crammed full of text.
  • Creativity/originality
    • Less developed projects: Project meets the minimum requirements but not much else. Project is incomplete or does not meet the team’s objectives.
    • Typical projects: Project appears carefully thought out. Time and effort seem to have gone into the planning and implementation of the project.
    • More developed projects: All elements of typical projects + project goes above and beyond the minimum requirements.
  • Content
    • Less developed projects: Questions are not clearly stated. Questions are unanswered or not supported by the data visualizations. Data visualizations are poor quality and do not adhere to practices as taught in class.
    • Typical projects: Questions are clearly articulated. Questions are answered using supporting data visualizations. Data visualizations follow good practices. Conclusions made based on the visualizations are justified.
    • More developed projects: All elements of typical projects + data visualizations are of exceptional quality. Conclusions are justified and limitations are carefully considered and articulated.

Peers

  • Content: Are the questions clearly articulated and is the data being used relevant?

  • Content: Did the team use appropriate visualizations and did they interpret them accurately?

  • Creativity and critical thought: Is the project carefully thought out? Are the limitations carefully considered? Does it appear that time and effort went into the planning and implementation of the project?

  • Slides: Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?

  • Professionalism: How well did the team present? Does the presentation appear to be well practiced? Are they reading off of a script? Did everyone get a chance to say something meaningful about the project?

Report

  • Introduction
    • Less developed projects: Explanation of the question and dataset is unclear or missing. Fails to describe relevant variables.
    • Typical projects: Provides a clear explanation of the question and the dataset used to answer the question, including a description of all relevant variables in the dataset.
    • More developed projects: All expectations of typical projects + clearly describes why the question is important and what is at stake in the results of the analysis. Even if the reader doesn’t know much about the subject, they know why they care about the results of your analysis.
  • Q1: Justification of approach
    • Less developed projects: The chosen analysis approach is inappropriate. Visualizations are insufficiently explained and justified.
    • Typical projects: The chosen analysis approach and visualizations are clearly explained and justified.
    • More developed projects: All elements of typical projects + shows careful consideration for the most effective chart designs. Goes beyond single layer simplistic charts where appropriate to effectively leverage the grammar of graphics for designing complex statistical charts.
  • Q1: Code
    • Less developed projects: Code is broken or does not work correctly. Code is hard to read for a human being and lacks stylistic consistency.
    • Typical projects: Code is functional, easy to read, and properly formatted.
    • More developed projects: All elements of typical projects + code is optimized using best practices and properly documented.
  • Q1: Visualization
    • Less developed projects: Visualizations are inappropriate, hard to read, or lack appropriate labeling.
    • Typical projects: The visualizations are appropriate, follow best practices as taught in class, are easy to read, and properly labeled.
    • More developed projects: All elements of typical projects + employ custom visual designs and/or theming. Visualizations are distinctive to the project/group.
  • Q1: Discussion
    • Less developed projects: Discussion of results is underdeveloped. Lacks a substantial connection to the visualizations.
    • Typical projects: Discussion of results is clear and correct, and it has some depth without being excessively long.
    • More developed projects: All elements of typical projects + identifies clear insights derived from the visualizations. Analysis demonstrates teams understand not just how to create charts but also effectively interpret them.
  • Q2: Justification of approach
    • Less developed projects: The chosen analysis approach is inappropriate. Visualizations are insufficiently explained and justified.
    • Typical projects: The chosen analysis approach and visualizations are clearly explained and justified.
    • More developed projects: All elements of typical projects + shows careful consideration for the most effective chart designs. Goes beyond single layer simplistic charts where appropriate to effectively leverage the grammar of graphics for designing complex statistical charts.
  • Q2: Code
    • Less developed projects: Code is broken or does not work correctly. Code is hard to read for a human being and lacks stylistic consistency.
    • Typical projects: Code is functional, easy to read, and properly formatted.
    • More developed projects: All elements of typical projects + code is optimized using best practices and properly documented.
  • Q2: Visualization
    • Less developed projects: Visualizations are inappropriate, hard to read, or lack appropriate labeling.
    • Typical projects: The visualizations are appropriate, follow best practices as taught in class, are easy to read, and properly labeled.
    • More developed projects: All elements of typical projects + employ custom visual designs and/or theming. Visualizations are distinctive to the project/group.
  • Q2: Discussion
    • Less developed projects: Discussion of results is underdeveloped. Lacks a substantial connection to the visualizations.
    • Typical projects: Discussion of results is clear and correct, and it has some depth without being excessively long.
    • More developed projects: All elements of typical projects + identifies clear insights derived from the visualizations. Analysis demonstrates teams understand not just how to create charts but also effectively interpret them.

Reproducibility, style, and organization

  • Reproducibility
    • Less developed projects: Required files are missing. Quarto files do not render successfully (except when a package needs to be installed).
    • Typical projects: All required files are provided. Quarto files render without issues and reproduce the necessary outputs.
  • Data documentation
    • Less developed projects: Codebook is missing. No local copies of data files.
    • Typical projects: Data is in the data folder, with a codebook in the README, and is loaded from this folder in presentation and report.
  • File readability
    • Less developed projects: Documents lack a clear structure. There are extraneous materials in the repo and/or files are not clearly organized.
    • Typical projects: Documents (source code files such as Quarto files or R scripts) are well structured and easy to follow. No extraneous materials.
  • Issues
    • Less developed projects: Issues have been left open, or are closed mostly without specific commits addressing them.
    • Typical projects: All issues are closed, mostly with specific commits addressing them.

Between team peer evaluation

Peer reviews will be graded on the extent to which they comprehensively and constructively address the components of the reviewee team’s proposal.

  • 0 points: No peer reviews

  • 2 points: Only one peer review issue open, feedback provided is not constructive or actionable

  • 4 points: Both peer review issues open, feedback provided is not constructive or actionable

  • 6 points: Both peer review issues open, feedback provided is not sufficiently thorough

  • 8 points: Both peer review issues open, one of the reviews is not sufficiently thorough

  • 10 points: Both peer review issues open, both reviews are constructive, actionable, and sufficiently thorough

Late work policy

There is no late work accepted on this project. Be sure to turn in your work early to avoid any technological mishaps.

Footnotes

  1. It is ok for you to repeat some information from the TidyTuesday repository, but make sure to appropriately attribute it here.