library(tidyverse)
library(viridis)
library(scales)
options(scipen = 999) # avoid printing in scientific notation
Adjusting scales for World Bank indicators
Go to the course GitHub organization and locate the repo titled ae-03-YOUR_GITHUB_USERNAME to get started.
This AE is due February 7 at 11:59pm.
Data: World economic measures
The World Bank publishes a rich and detailed set of socioeconomic indicators spanning several decades and dozens of topics. Here we focus on a few key indicators for the year 2021.
- gdp_per_cap: GDP per capita (current USD)
- pop: Total population
- life_exp: Life expectancy at birth, total (years)
- female_labor_pct: Labor force, female (% of total labor force)
- income_level: Classification of economies based on national income levels
The data is stored in wb-indicators.rds. To import the data, use the read_rds() function.
world_bank <- read_rds("data/wb-indicators.rds")
Part 1: Transforming axes
Is there a relationship between a country’s per capita GDP and life expectancy? Let’s explore this relationship using a scatterplot.
ggplot(data = world_bank, mapping = aes(x = gdp_per_cap, y = life_exp)) +
geom_point() +
labs(
title = "Countries with higher GDP tend to have higher life expectancy",
x = "GDP per capita (current USD)",
y = "Life expectancy at birth (years)"
)
Seems like there is an association, but the relationship is not linear. Let’s try a log transformation on the \(x\)-axis to see if that helps.
Your turn: Log-transform the \(x\)-axis by mutating the original column prior to graphing.
By default, log() computes natural logarithms (base-\(e\)). To compute base-10 logarithms, use log10().
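As a quick illustration of the difference between the two functions, here is a minimal sketch using a made-up vector of values (the numbers are hypothetical and not taken from the World Bank data). It shows that base-10 logarithms of powers of ten are whole numbers, and that the two logarithms differ only by a constant factor.

```r
# Hypothetical GDP per capita values, chosen as powers of ten for readability
gdp <- c(100, 1000, 10000, 100000)

natural_log <- log(gdp)    # natural logarithm (base e)
base10_log  <- log10(gdp)  # base-10 logarithm

# Base-10 logs count the number of orders of magnitude
print(base10_log)

# The two transformations differ only by the constant factor log(10)
print(all.equal(natural_log, base10_log * log(10)))
```

Because the two versions are proportional, the shape of a scatterplot is the same either way; the choice mostly affects how easy the axis values are to read.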
# add code here
Your turn: Now log-transform the \(x\)-axis by using the original per capita GDP measure and an appropriate scale_x_*() function.
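To see what transforming a scale (rather than a column) does, the sketch below uses a small toy data frame with invented values; `toy`, `income`, and `outcome` are hypothetical names, not part of the exercise data. It assumes `scale_x_log10()` is one suitable choice of scale function.

```r
library(ggplot2)

# Toy data with made-up values; this is not the world_bank data frame
toy <- data.frame(
  income  = c(500, 2000, 8000, 32000, 64000),
  outcome = c(55, 63, 70, 78, 81)
)

# The data stay in their original units; only the axis is transformed
p_scale <- ggplot(toy, aes(x = income, y = outcome)) +
  geom_point() +
  scale_x_log10()

# Internally, the points are positioned at log10(income)
built_x <- ggplot_build(p_scale)$data[[1]]$x
print(all.equal(built_x, log10(toy$income)))
```

The points land in the same places as they would after a manual `log10()` transformation, but the axis breaks are labeled in the original units.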
# add code here
Your turn: Which is more interpretable, and why?
Add response here
Part 2: Customize scales
Let’s consider the relationship between female labor participation and per capita GDP. We’ll use the income_level variable to color the points and provide context on the overall wealth of the countries.[1]
Step 1: Base plot
First, let’s generate a color-coded scatterplot with a single smoothing line.
ggplot(data = world_bank, mapping = aes(x = female_labor_pct, y = gdp_per_cap)) +
geom_point(mapping = aes(color = income_level)) +
geom_smooth(se = FALSE)
Step 2: Your turn
Now, let’s modify the scales to make the chart more readable. Log-transform the \(y\)-axis and format the labels so they are explicitly identified as percentages and currency.
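The scales package provides labelling functions that can be passed to the `labels` argument of a scale. The sketch below tries two of them on made-up numbers. It assumes the percentage values are already on a 0-100 scale (as the description of female_labor_pct suggests), which is why `scale = 1` is used; `pct_labeller` and `usd_labeller` are hypothetical names.

```r
library(scales)

# label_percent() multiplies by 100 by default; scale = 1 leaves
# values that are already percentages unchanged
pct_labeller <- label_percent(scale = 1)

# label_dollar() adds a currency prefix and thousands separators
usd_labeller <- label_dollar()

print(pct_labeller(50))
print(usd_labeller(c(1000, 25000)))
```

Each call returns a function, so it can be supplied directly, for example as `labels = label_dollar()` inside a `scale_*()` call.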
# add code here
Step 3: Your turn
Add human-readable labels for the title, axes, and legend.
# add code here
Step 4: Your turn
Use the viridis color palette for income_level.
The bright yellow at the end of the palette is hard on the eyes. You can set the hue at which the color map ends using the end argument to the appropriate scale_color_*() function.
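To get a feel for what the end argument does, the sketch below generates the palette colors directly with `viridis()` from the viridis package. The value 0.8 is an arbitrary illustration, not a recommended setting.

```r
library(viridis)

# Four colors spanning the full palette, ending in bright yellow
full_palette <- viridis(4)

# Four colors spanning only the first 80% of the palette
trimmed_palette <- viridis(4, end = 0.8)

print(full_palette)
print(trimmed_palette)
```

Both palettes start at the same dark purple, but the trimmed version stops before reaching the yellow end of the color map.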
# add code here
Step 5: Your turn
Double-encode the income_level variable by using both color and shape to represent the same variable. Condense the guides so you use a single legend.
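ggplot2 merges the guides for two aesthetics when they map the same variable and share the same title and labels. The sketch below illustrates this on a toy data frame; `toy`, `group`, and the legend title "Group" are invented for the example.

```r
library(ggplot2)

# Toy data with made-up values
toy <- data.frame(
  x = 1:4,
  y = c(2, 4, 3, 5),
  group = c("a", "b", "a", "b")
)

# Map the same variable to color and shape, and give both guides
# the same title so that they collapse into a single legend
p_double <- ggplot(toy, aes(x = x, y = y, color = group, shape = group)) +
  geom_point() +
  labs(color = "Group", shape = "Group")

# Both aesthetics are present in the built layer data
built <- ggplot_build(p_double)$data[[1]]
print(c("colour", "shape") %in% names(built))
```

If the two titles differ (for example, when only one of them is renamed), ggplot2 draws two separate legends.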
# add code here
Step 6: Your turn
It’s annoying that the order of the values in the legend is the opposite of how the income levels are ordered in the chart. Reverse the order of the values in the legend so they correspond to the ordering on the \(y\)-axis.
# add code here
Footnotes
1. Note that the income level is based on the GNI per capita, which is strongly correlated with GDP per capita, but not exactly the same.