Welcome to INFO 3312/5312

Lecture 1

Dr. Benjamin Soltoff

Cornell University
INFO 3312/5312 - Spring 2025

January 21, 2025

Agenda

  • Course details
  • Introductions
  • Course components
  • Why we visualize

Course details

Timetable

  • Lectures
    • Tuesdays 1:25-2:40pm
    • Thursdays 1:25-2:40pm
  • Course is now 3 credits (no Friday lab section)

Students on the waitlist

  • INFO 3312/5312 enrollment is restricted to IS/ISST majors and IS MPS students
  • If you are not an IS/ISST major (or are still in the process of affiliating), join the waitlist through Student Center
  • PINs distributed on a rolling basis
  • We currently have:
    • INFO 3312: 0 seats available and 27 on the waitlist
    • INFO 5312: 0 seats available and 9 on the waitlist

Themes: what, why, and how

  • What: the communication (e.g., plot, table, report)
    • Specific types of visualizations for a particular purpose (e.g., maps for spatial data, Sankey diagrams for proportions, etc.)
    • Tooling to produce them (e.g., specific R packages)
  • How: the process
    • Start with a design (sketch + pseudocode)
    • Pre-process data (e.g., wrangle, reshape, join, etc.)
    • Map data to aesthetics
    • Make visual encoding decisions (e.g., address accessibility concerns)
    • Post-process for visual appeal and annotation
  • Why: the theory
    • Tie together “how” and “what” through the grammar of graphics
    • Extend to underlying theory of cognition and information processing
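The “how” steps can be sketched in a few lines of R. This is a minimal illustration, not the course’s prescribed workflow: it assumes {dplyr} and {ggplot2} are installed and uses the built-in `mtcars` data. The variable choices, the viridis palette, and the labels are illustrative assumptions. The design step (sketch + pseudocode) happens on paper before any of this code is written.

```r
library(dplyr)
library(ggplot2)

# Pre-process: recode transmission (0 = automatic, 1 = manual) as a labeled factor
car_data <- mtcars |>
  mutate(transmission = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Map data to aesthetics: weight on x, fuel economy on y, transmission as color
p <- ggplot(car_data, aes(x = wt, y = mpg, color = transmission)) +
  geom_point(size = 2) +
  # Visual encoding decision (illustrative): a color-blind-friendly discrete palette
  scale_color_viridis_d(end = 0.8) +
  # Post-process: informative title, axis labels with units, and a clean theme
  labs(
    title = "Heavier cars tend to have lower fuel economy",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon",
    color = "Transmission"
  ) +
  theme_minimal()

p
```

Each layer of the plot corresponds to one step of the process, which makes it easy to revisit a single decision (e.g., swapping the palette) without touching the rest.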

Introductions

Meet the instructor

Dr. Benjamin Soltoff

Lecturer in Information Science

Gates Hall 216

Meet the course team

  • Hyunchul L.
  • Shuqian L.
  • Goretti M.
  • Angela Y.

Meet each other!

Activity

  • Form a small group (3-4 individuals) with people sitting around you

  • First, introduce yourselves to each other:

    • Your name - Prof/Dr. Soltoff
    • Your major - Political science
    • The last movie you saw - Cheaper by the Dozen
    • What you hope to get out of this class - A paycheck
  • Start with bad graphs – Share your examples of “bad” graphs and why you think they’re bad.

  • Then, share good graphs – Same deal, share your examples of “good” graphs and why you think they’re good.

  • Finally, choose the one plot from your group that you find most striking, whether because it’s bad or because it’s good, and have one team member share it on this discussion post.

Course components

Homepage

https://info3312.infosci.cornell.edu/

  • All course materials
  • Links to Canvas, GitHub, RStudio Workbench, etc.
  • Let’s take a tour!

Course toolkit

All linked from the course website:

Important

Make sure you can access RStudio before Friday.

Activities: Prepare, Participate, Practice, Perform

  • Prepare: Introduce new content and prepare for lectures by completing the readings

  • Participate: Attend and actively participate in lectures, office hours, and team meetings

  • Practice: Practice applying visualization techniques with application exercises during lecture, graded for completion

  • Perform: Put together what you’ve learned to analyze real-world data

    • About six homework assignments (individual)
    • Team projects (2)

Teams

  • Team assignments
    • Assigned by course staff
    • Peer evaluation after completion
  • Expectations and roles
    • Everyone is expected to contribute equal effort
    • Everyone is expected to understand all code turned in
    • Individual contribution evaluated by peer evaluation, commits, etc.

Grading

Category                 Percentage
Homework                 40%
Project 1                20%
Project 2                30%
Application exercises    10%

See course syllabus for how the final letter grade will be determined.

INFO 5312

Additional expectations:

  • INFO 5312 homework will at times be graded against a more stringent rubric
  • INFO 5312 students will be grouped together for all projects

15 minute rule

Support

  • Attend office hours
  • Ask and answer questions on the discussion forum
  • Reserve email for questions on personal matters and/or grades
  • Read the course support page

Diversity + inclusion

  • I want you to feel like you belong in this class and are respected
  • We are committed to full inclusion in education for all persons
  • If you feel that we have failed to meet these goals, please either let us know or report it, and we will address the issue

Accessibility

I want this course to be accessible to students with all abilities. Please feel free to let me know if there are circumstances affecting your ability to participate in class.

Course policies

 

As long as you meet the prereqs

Prerequisites

  • INFO 2950 or INFO 5001
  • Prior experience with R and Git is required

Ideally you took INFO 2950 or 5001 with me.

If not, you need a firm understanding of R (including {tidyverse}) and Git workflows.

Late work, waivers, regrades policy

  • We have policies!
  • Read about them on the course syllabus and refer back to them when you need it

Collaboration policy

  • Only work that is clearly assigned as team work should be completed collaboratively.

  • Homeworks must be completed individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.

Sharing / reusing code policy

  • We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted

  • Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source.

  • All code must be written by you, the human being.

Generative AI

  • Use generative AI to facilitate, rather than hinder, learning

  • GAI tools for reference purposes

    How do I make a scatterplot using ggplot2 in R?

  • 🤔 GAI tools for writing my code/analysis

    • You may use GAI tools to assist in writing code in this class

    • You may not make use of the technology as a substitute for critical thinking

    • I reserve the right to orally assess any student on their submissions to verify they meet the learning objectives for the assignment

  • GAI tools for narrative

  • You are ultimately responsible for the work you turn in; it should reflect your understanding of the course content

Academic integrity

  1. A student shall in no way misrepresent his or her work.
  2. A student shall in no way fraudulently or unfairly advance his or her academic position.
  3. A student shall refuse to be a party to another student’s failure to maintain academic integrity.
  4. A student shall not in any other manner violate the principle of academic integrity.

Most importantly!

Ask if you’re not sure whether something violates a policy!

Why do we visualize?

  1. Discover patterns that may not be obvious from numerical summaries

Anscombe’s quartet

   set  x     y
1    I 10  8.04
2    I  8  6.95
3    I 13  7.58
4    I  9  8.81
5    I 11  8.33
6    I 14  9.96
7    I  6  7.24
8    I  4  4.26
9    I 12 10.84
10   I  7  4.82
11   I  5  5.68
12  II 10  9.14
13  II  8  8.14
14  II 13  8.74
15  II  9  8.77

Summary statistics for Anscombe’s quartet

# A tibble: 4 × 6
  set   mean_x mean_y  sd_x  sd_y     r
  <fct>  <dbl>  <dbl> <dbl> <dbl> <dbl>
1 I          9   7.50  3.32  2.03 0.816
2 II         9   7.50  3.32  2.03 0.816
3 III        9   7.5   3.32  2.03 0.816
4 IV         9   7.50  3.32  2.03 0.817
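These statistics are easy to reproduce. Below is a base-R sketch that assumes only the built-in `anscombe` data set (columns `x1` to `x4` and `y1` to `y4`). It is not necessarily the code that produced the tibble above.

```r
# Reshape the built-in `anscombe` data frame (x1..x4, y1..y4)
# into long format: one row per observation, labeled by set I-IV
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    set = as.character(as.roman(i)),
    x = anscombe[[paste0("x", i)]],
    y = anscombe[[paste0("y", i)]]
  )
}))

# Compute the same summary statistics separately for each set
summary_stats <- do.call(rbind, lapply(split(quartet, quartet$set), function(d) {
  data.frame(
    set = d$set[1],
    mean_x = mean(d$x), mean_y = mean(d$y),
    sd_x = sd(d$x), sd_y = sd(d$y),
    r = cor(d$x, d$y)
  )
}))

print(summary_stats, digits = 3)
```

The four sets agree to at least two decimal places on every statistic, which is exactly why these summaries alone cannot tell them apart.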

Scatterplots for Anscombe’s quartet
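One way to draw the four scatterplots is sketched below with {ggplot2}, assuming it is installed. The example uses the built-in `anscombe` data reshaped with base R. The least-squares lines are an added illustrative choice.

```r
library(ggplot2)

# Long-format copy of the built-in `anscombe` data (x1..x4, y1..y4)
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    set = as.character(as.roman(i)),
    x = anscombe[[paste0("x", i)]],
    y = anscombe[[paste0("y", i)]]
  )
}))

# One panel per set: nearly identical fitted lines, very different point patterns
p <- ggplot(quartet, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(vars(set)) +
  labs(title = "Anscombe's quartet")

p
```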

Just show me the data!

ID N Xmean Ymean σX σY R
1 142 54.26610 47.83472 16.76982 26.93974 -0.06412835
2 142 54.26873 47.83082 16.76924 26.93573 -0.06858639
3 142 54.26732 47.83772 16.76001 26.93004 -0.06834336
4 142 54.26327 47.83225 16.76514 26.93540 -0.06447185
5 142 54.26030 47.83983 16.76774 26.93019 -0.06034144
6 142 54.26144 47.83025 16.76590 26.93988 -0.06171484
7 142 54.26881 47.83545 16.76670 26.94000 -0.06850422
8 142 54.26785 47.83590 16.76676 26.93610 -0.06897974
9 142 54.26588 47.83150 16.76885 26.93861 -0.06860921
10 142 54.26734 47.83955 16.76896 26.93027 -0.06296110
11 142 54.26993 47.83699 16.76996 26.93768 -0.06944557
12 142 54.26692 47.83160 16.77000 26.93790 -0.06657523
13 142 54.26015 47.83972 16.76996 26.93000 -0.06558334

Oh no

Raw data is not enough

Why do we visualize?

  1. Discover patterns that may not be obvious from numerical summaries

  2. Convey information in a way that is otherwise difficult/impossible to convey

National risk index

Instructions

With your peers, use the National Risk Index to answer the following questions:

  1. Which areas of the country are at high risk of climate change?
  2. Which areas of the country are at high risk of heat waves?
  3. Which regions of the country are socially vulnerable?
  4. Which specific hazard types pose the greatest risk to Ithaca, NY?

Make sure some students access the map on their phones and others on their computers.

Wrap up

This week’s tasks