Lecture 5
Cornell University
INFO 3312/5312 - Spring 2024
February 6, 2024
# A tibble: 181 × 8
iso2c country year gdp_per_cap female_labor_pct life_exp pop income_level
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 AO Angola 2021 1927. 49.9 61.6 3.45e7 Lower middl…
2 AL Albania 2021 6377. 44.1 76.5 2.81e6 Upper middl…
3 AE United… 2021 44332. 17.7 78.7 9.37e6 High income
4 AR Argent… 2021 10651. 42.5 75.4 4.58e7 Upper middl…
5 AM Armenia 2021 4973. 52.7 72.0 2.79e6 Upper middl…
6 AU Austra… 2021 60697. 47.2 83.3 2.57e7 High income
7 AT Austria 2021 53518. 46.8 81.2 8.96e6 High income
8 AZ Azerba… 2021 5408. 49.9 69.4 1.01e7 Upper middl…
9 BI Burundi 2021 221. 51.9 61.7 1.26e7 Low income
10 BE Belgium 2021 51850. 46.8 81.9 1.16e7 High income
# ℹ 171 more rows
stat | geom |
---|---|
stat_bin() | geom_bar(), geom_freqpoly(), geom_histogram() |
stat_bin2d() | geom_bin2d() |
stat_bindot() | geom_dotplot() |
stat_binhex() | geom_hex() |
stat_boxplot() | geom_boxplot() |
stat_contour() | geom_contour() |
stat_quantile() | geom_quantile() |
stat_smooth() | geom_smooth() |
stat_sum() | geom_count() |
stat_boxplot()
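To see what a stat computes before its geom draws it, you can inspect the built layer. A minimal sketch, using the built-in mtcars data as an illustrative stand-in for world_bank: geom_boxplot() uses stat_boxplot() by default, and layer_data() returns the summary columns that stat computes for each box.

```r
library(ggplot2)

# Build the plot and pull out the first layer's data after the stat has run
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()
box_stats <- layer_data(p)

# One row per box: whisker ends (ymin, ymax), hinges (lower, upper), median (middle)
box_stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```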
What can you say about the distribution of average life expectancy from the following QQ plot?
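A QQ plot compares the sorted sample against quantiles of a theoretical distribution (the normal, by default). A minimal sketch with mtcars$mpg standing in for life expectancy (an illustrative assumption): stat_qq() computes the points and stat_qq_line() adds a reference line through the first and third quartiles.

```r
library(ggplot2)

# Points close to the line suggest roughly normal data;
# systematic curvature at the ends points to skew or heavy/light tails
p <- ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  stat_qq_line()

qq_points <- layer_data(p, 1)
# x = theoretical normal quantiles, y = the sorted sample
head(qq_points[, c("x", "y")])
```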
Each scale is a function from a region in data space (the domain of the scale) to a region in aesthetic space (the range of the scale).
The axis or legend is the inverse function: it allows you to convert visual properties back to data.
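The idea of a scale as a function paired with an inverse can be sketched in base R. make_linear_scale() below is a hypothetical helper written for illustration, not a ggplot2 function; it maps a continuous domain linearly onto a range and returns the inverse that an axis or legend would use to read values back. (The scales package's rescale() performs a similar forward mapping.)

```r
# Hypothetical helper, for illustration only: a linear scale from a data
# domain c(d0, d1) to an aesthetic range c(r0, r1), plus its inverse
make_linear_scale <- function(domain, range) {
  slope <- diff(range) / diff(domain)
  list(
    # data space -> aesthetic space
    map = function(x) range[1] + (x - domain[1]) * slope,
    # aesthetic space -> data space (what an axis or legend does)
    invert = function(y) domain[1] + (y - range[1]) / slope
  )
}

# e.g. life expectancy of 50-90 years mapped onto positions 0-100 across a panel
pos_scale <- make_linear_scale(domain = c(50, 90), range = c(0, 100))
pos_scale$map(70)     # 50
pos_scale$invert(50)  # 70
```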
Every aesthetic in your plot is associated with exactly one scale, whose function name follows the pattern scale_A_B():
- A: name of the primary aesthetic (e.g., color, shape, x)
- B: name of the scale (e.g., continuous, discrete, brewer)

What will the x-axis label of the following plot say?
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
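The message above is ggplot2 reporting that a second scale for the same aesthetic replaced the first, so only the last scale's settings, including its name, survive. A minimal sketch with the built-in mtcars data (illustrative only):

```r
library(ggplot2)

# Two x scales on one plot: adding the second prints the
# "Scale for x is already present" message and replaces the first
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_continuous(name = "Weight") +
  scale_x_continuous(name = "Weight (1000 lbs)")

# The trained x scale carries the name from the last scale added
layer_scales(p)$x$name
```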
What happens if you pair a discrete variable with a continuous scale? What happens if you pair a continuous variable with a discrete scale? Answer in the context of the following plots.
Error: Discrete value supplied to continuous scale
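A minimal sketch of both pairings, using mtcars with cyl converted to a factor as an illustrative stand-in for the lecture's plots. The error surfaces when the plot is built, not when the scale is added, so it is caught here with tryCatch(); the exact wording varies slightly across ggplot2 versions ("value" vs "values").

```r
library(ggplot2)

# Discrete variable + continuous scale: building the plot fails
p_bad <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point() +
  scale_x_continuous()
msg <- tryCatch(
  {
    ggplot_build(p_bad)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Continuous variable + discrete scale: no error, but the axis typically
# ends up with no breaks or labels, since there are no categories to label
p_odd <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_discrete()
odd_data <- layer_data(p_odd)
```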
When working with continuous data, the default scale maps the data space linearly onto the aesthetic space, but this mapping can be transformed (for example, to a log scale).
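In ggplot2, a transformed position scale can be requested with a shortcut such as scale_x_log10(); in ggplot2 3.5.0 and later, scale_x_continuous(transform = "log10") is equivalent (earlier versions call that argument trans). A minimal sketch with mtcars as illustrative data: the transformation is applied before any statistics are computed, so the built layer holds log10 values, while the axis labels are shown in the original units.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  scale_x_log10()

# The built layer stores x on the transformed (log10) scale
head(layer_data(p)$x)
```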
Name | Function \(f(x)\) | Inverse \(f^{-1}(y)\) |
---|---|---|
asn | \(2\sin^{-1}(\sqrt{x})\) | \(\sin^2(y/2)\) |
exp | \(e ^ x\) | \(\log(y)\) |
identity | \(x\) | \(y\) |
log | \(\log(x)\) | \(e ^ y\) |
log10 | \(\log_{10}(x)\) | \(10 ^ y\) |
log2 | \(\log_2(x)\) | \(2 ^ y\) |
logit | \(\log(\frac{x}{1 - x})\) | \(\frac{1}{1 + e^{-y}}\) |
pow10 | \(10^x\) | \(\log_{10}(y)\) |
probit | \(\Phi^{-1}(x)\) | \(\Phi(y)\) |
reciprocal | \(x^{-1}\) | \(y^{-1}\) |
reverse | \(-x\) | \(-y\) |
sqrt | \(x^{1/2}\) | \(y ^ 2\) |
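Each row pairs a transformation with its inverse, so applying one after the other should return the input. A quick base R check of three rows: logit written out from the table's formulas (base R's qlogis() and plogis() compute the same pair), probit via qnorm() and pnorm() (\(\Phi\) is the standard normal CDF), and asn as the arc-sine square root. The transforms list is written for illustration and is not how ggplot2 or scales stores them.

```r
# Inputs strictly between 0 and 1, a domain all three transformations accept
x <- c(0.05, 0.25, 0.5, 0.9)

transforms <- list(
  logit  = list(f = function(x) log(x / (1 - x)),  inv = function(y) 1 / (1 + exp(-y))),
  probit = list(f = qnorm,                          inv = pnorm),
  asn    = list(f = function(x) 2 * asin(sqrt(x)), inv = function(y) sin(y / 2)^2)
)

# f^{-1}(f(x)) should recover x for every transformation
round_trip_ok <- vapply(
  transforms,
  function(t) isTRUE(all.equal(t$inv(t$f(x)), x)),
  logical(1)
)
round_trip_ok  # all three should be TRUE
```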
ae-03 - Part 1
ae-03 (repo name will be suffixed with your NetID).
Guides are legends and axes:
Source: ggplot2: Elegant Graphics for Data Analysis, Chp 14.
Why do 50 and 90 not appear on the \(y\)-axis?
```r
library(ggplot2)

# world_bank: the country-level data frame printed at the start of the lecture
ggplot(world_bank, aes(x = gdp_per_cap, y = life_exp)) +
  geom_point(alpha = 0.5) +
  scale_y_continuous(
    name = "Life expectancy at birth",
    breaks = seq(from = 50, to = 90, by = 10),
    limits = c(50, 90)
  ) +
  scale_x_continuous(
    name = "GDP per capita",
    breaks = c(0, 5e04, 1e05),
    labels = c("$0", "$50,000", "$100,000")
  )
```
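Breaks only appear if they fall inside the range the axis actually covers. By default that range is the data range plus a small expansion, so breaks beyond it are dropped; setting limits widens the range. A minimal sketch with three made-up y values (purely illustrative), comparing the y range ggplot2 computes for the panel with and without limits:

```r
library(ggplot2)

# Illustrative data only: y values span 60 to 80
toy <- data.frame(x = 1:3, y = c(60, 70, 80))
breaks <- seq(from = 40, to = 100, by = 20)
base <- ggplot(toy, aes(x, y)) + geom_point()

p_default <- base + scale_y_continuous(breaks = breaks)
p_limits  <- base + scale_y_continuous(breaks = breaks, limits = c(40, 100))

# Expanded y range of the first panel (default expansion: 5% on each side)
y_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$y.range
y_range(p_default)  # 59 to 81: breaks 40 and 100 fall outside and are dropped
y_range(p_limits)   # 37 to 103: all four breaks are drawn
```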
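```r
library(ggplot2)
library(scales) # provides label_dollar() and cut_short_scale()

ggplot(world_bank, aes(x = gdp_per_cap, y = life_exp)) +
  geom_point(alpha = 0.5) +
  scale_y_continuous(
    name = "Life expectancy at birth",
    breaks = seq(from = 50, to = 90, by = 10),
    limits = c(50, 90)
  ) +
  scale_x_continuous(
    name = "GDP per capita",
    # dollar labels abbreviated with short-scale suffixes (K, M, B, ...)
    labels = label_dollar(scale_cut = cut_short_scale())
  )
```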
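label_dollar() comes from the scales package and returns a labelling function: give it numbers, get back formatted strings. A minimal sketch applying it to the break values from the first version of the plot, with and without scale_cut = cut_short_scale() (available in recent versions of scales):

```r
library(scales)

gdp_breaks <- c(0, 5e04, 1e05)

# Full dollar amounts with a thousands separator
label_dollar()(gdp_breaks)  # "$0" "$50,000" "$100,000"

# Abbreviated with short-scale suffixes (K, M, B, ...)
label_dollar(scale_cut = cut_short_scale())(gdp_breaks)
```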
Scale type | Default guide type | Function |
---|---|---|
Continuous scales for color/fill aesthetics | colorbar | guide_colorbar() |
Binned scales for color/fill aesthetics | colorsteps | guide_colorsteps() |
Position scales (continuous, binned and discrete) | axis | guide_axis() |
Discrete scales (except position scales) | legend | guide_legend() |
Binned scales (except position/color/fill scales) | bins | guide_bins() |
ae-03 - Part 2
Recreate this plot.
Use the guide_*() functions to customize the appearance of guides.
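Each guide can be adjusted through its guide_*() function, passed either to a scale's guide argument or through guides(). A minimal sketch with mtcars as illustrative data (the particular settings are arbitrary): the discrete color legend is laid out in a single row, and the x-axis labels are dodged onto two rows.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(carb), y = mpg, color = factor(cyl))) +
  geom_point() +
  guides(
    # Legend for the discrete color scale (default guide type: legend)
    color = guide_legend(title = "Cylinders", nrow = 1),
    # Axis for the discrete x position scale (default guide type: axis)
    x = guide_axis(n.dodge = 2)
  ) +
  theme(legend.position = "bottom")

# Render to a gtable to confirm the guides build without error
grob <- ggplotGrob(p)
```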