Adjusting scales for World Bank indicators

Application exercise
Modified February 7, 2024

Important

Go to the course GitHub organization and locate the repo titled ae-03-YOUR_GITHUB_USERNAME to get started.

This AE is due February 7 at 11:59pm.

library(tidyverse)
library(viridis)
library(scales)

options(scipen = 999)   # avoid printing in scientific notation

Data: World economic measures

The World Bank publishes a rich and detailed set of socioeconomic indicators spanning several decades and dozens of topics. Here we focus on a few key indicators for the year 2021.

The data is stored in data/wb-indicators.rds. To import the data, use the read_rds() function.

world_bank <- read_rds("data/wb-indicators.rds")

Part 1: Transforming axes

Is there a relationship between a country’s per capita GDP and life expectancy? Let’s explore this relationship using a scatterplot.

ggplot(data = world_bank, mapping = aes(x = gdp_per_cap, y = life_exp)) +
  geom_point() +
  labs(
    title = "Countries with higher GDP tend to have higher life expectancy",
    x = "GDP per capita (current USD)",
    y = "Life expectancy at birth (years)"
  )

There seems to be an association, but the relationship is not linear. Let’s try a log transformation on the \(x\)-axis to see if that helps.

Your turn: Log-transform the \(x\)-axis by mutating the original column prior to graphing.

Note

By default, log() computes natural logarithms (base-\(e\)). To compute base-10 logarithms, use log10().

# add code here
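
One possible approach is sketched below on a small made-up data frame. The name toy_wb, its values, and the new column name log_gdp_per_cap are all hypothetical stand-ins; swap in world_bank for the actual exercise.

```r
library(dplyr)
library(ggplot2)

# Hypothetical values for illustration only; not taken from wb-indicators.rds
toy_wb <- tibble(
  gdp_per_cap = c(500, 2000, 8000, 30000, 60000),
  life_exp    = c(58, 66, 72, 79, 83)
)

# Add a base-10 log of per capita GDP as a new column
toy_logged <- toy_wb |>
  mutate(log_gdp_per_cap = log10(gdp_per_cap))

# Plot the transformed column; the axis now shows log10 units, not dollars
p_mutate <- ggplot(toy_logged, aes(x = log_gdp_per_cap, y = life_exp)) +
  geom_point() +
  labs(
    x = "log10 of GDP per capita (current USD)",
    y = "Life expectancy at birth (years)"
  )
p_mutate
```

With this approach the tick labels are in log units (for example, 3 stands for 1,000 USD), so the axis title has to say so.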

Your turn: Now log-transform the \(x\)-axis by using the original per capita GDP measure and an appropriate scale_x_*() function.

# add code here
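
A minimal sketch of the scale-based approach, again on a hypothetical toy_wb data frame with made-up values (replace it with world_bank for the exercise). It uses scale_x_log10(), which transforms the axis while leaving the data column untouched.

```r
library(ggplot2)

# Hypothetical values for illustration only; not taken from wb-indicators.rds
toy_wb <- data.frame(
  gdp_per_cap = c(500, 2000, 8000, 30000, 60000),
  life_exp    = c(58, 66, 72, 79, 83)
)

# Map the original column and let the scale apply the base-10 log
p_scale <- ggplot(toy_wb, aes(x = gdp_per_cap, y = life_exp)) +
  geom_point() +
  scale_x_log10() +
  labs(
    x = "GDP per capita (current USD, log scale)",
    y = "Life expectancy at birth (years)"
  )
p_scale
```

Here the points sit in the same positions as with a log10-mutated column, but the tick labels stay in the original units.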

Your turn: Which is more interpretable, and why?

Add response here

Part 2: Customize scales

Let’s consider the relationship between female labor participation and per capita GDP. We’ll use the income_level variable to color the points and provide context on the overall wealth of the countries (see footnote 1).

Step 1: Base plot

First, let’s generate a color-coded scatterplot with a single smoothing line.

ggplot(data = world_bank, mapping = aes(x = female_labor_pct, y = gdp_per_cap)) +
  geom_point(mapping = aes(color = income_level)) +
  geom_smooth(se = FALSE)

Step 2: Your turn

Now, let’s modify the scales to make the chart more readable. Log-transform the \(y\)-axis and format the labels so the \(x\)-axis is explicitly identified as percentages and the \(y\)-axis as currency.

# add code here
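
The sketch below shows one way to do this with label helpers from the scales package. The data frame toy_wb, its values, and the income level names are made up for illustration. It also assumes female_labor_pct is stored on a 0 to 100 scale, which is why label_percent() gets scale = 1; if the column holds proportions, drop that argument. The smoothing line is left out because the toy data is too small for it.

```r
library(ggplot2)
library(scales)

# Hypothetical values and level names for illustration only
toy_wb <- data.frame(
  female_labor_pct = c(25, 38, 45, 60),
  gdp_per_cap      = c(900, 2500, 9000, 45000),
  income_level     = factor(
    c("Low income", "Lower middle income", "Upper middle income", "High income"),
    levels = c("Low income", "Lower middle income", "Upper middle income", "High income")
  )
)

p_scales <- ggplot(toy_wb, aes(x = female_labor_pct, y = gdp_per_cap)) +
  geom_point(aes(color = income_level)) +
  # scale = 1 assumes the column is already in percentage points (0-100)
  scale_x_continuous(labels = label_percent(scale = 1)) +
  # log-transform the y-axis and print tick labels as dollar amounts
  scale_y_log10(labels = label_dollar())
p_scales
```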

Step 3: Your turn

Add human-readable labels for the title, axes, and legend.

# add code here
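
As a rough illustration, labs() can set the title, both axis titles, and the legend title in one call. The data frame toy_wb and the label wording below are hypothetical; the legend title is set through the color argument because income_level is mapped to color.

```r
library(ggplot2)

# Hypothetical values and level names for illustration only
toy_wb <- data.frame(
  female_labor_pct = c(25, 38, 45, 60),
  gdp_per_cap      = c(900, 2500, 9000, 45000),
  income_level     = factor(
    c("Low income", "Lower middle income", "Upper middle income", "High income"),
    levels = c("Low income", "Lower middle income", "Upper middle income", "High income")
  )
)

p_labels <- ggplot(toy_wb, aes(x = female_labor_pct, y = gdp_per_cap)) +
  geom_point(aes(color = income_level)) +
  labs(
    title = "Female labor force participation and GDP per capita",
    x     = "Female labor force participation",
    y     = "GDP per capita (current USD)",
    color = "Income level"   # legend title for the color aesthetic
  )
p_labels
```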

Step 4: Your turn

Use the viridis color palette for income_level.

Tip

The bright yellow at the end of the palette is hard on the eyes. You can stop the color map before it reaches yellow by setting the end argument of the appropriate scale_color_*() function to a value below 1.

# add code here
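
A minimal sketch using ggplot2's scale_color_viridis_d() on a made-up toy_wb data frame. The cutoff end = 0.8 is an arbitrary choice for illustration, not a recommended value. The viridis package loaded above offers scale_color_viridis(discrete = TRUE, end = ...) as an alternative.

```r
library(ggplot2)

# Hypothetical values and level names for illustration only
toy_wb <- data.frame(
  female_labor_pct = c(25, 38, 45, 60),
  gdp_per_cap      = c(900, 2500, 9000, 45000),
  income_level     = factor(
    c("Low income", "Lower middle income", "Upper middle income", "High income"),
    levels = c("Low income", "Lower middle income", "Upper middle income", "High income")
  )
)

p_viridis <- ggplot(toy_wb, aes(x = female_labor_pct, y = gdp_per_cap)) +
  geom_point(aes(color = income_level)) +
  # end < 1 stops the discrete viridis palette before the bright yellow
  scale_color_viridis_d(end = 0.8)
p_viridis
```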

Step 5: Your turn

Double-encode the income_level variable by using both color and shape to represent the same variable. Combine the guides so the chart uses a single legend.

# add code here
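
One way this might look, on a hypothetical toy_wb data frame: map income_level to both color and shape, then give both aesthetics the same legend title. ggplot2 merges legends that share a title and labels, so the two guides collapse into one.

```r
library(ggplot2)

# Hypothetical values and level names for illustration only
toy_wb <- data.frame(
  female_labor_pct = c(25, 38, 45, 60),
  gdp_per_cap      = c(900, 2500, 9000, 45000),
  income_level     = factor(
    c("Low income", "Lower middle income", "Upper middle income", "High income"),
    levels = c("Low income", "Lower middle income", "Upper middle income", "High income")
  )
)

p_double <- ggplot(toy_wb, aes(x = female_labor_pct, y = gdp_per_cap)) +
  # the same variable drives both color and shape
  geom_point(aes(color = income_level, shape = income_level)) +
  scale_color_viridis_d(end = 0.8) +
  # identical titles let ggplot2 merge the two guides into a single legend
  labs(color = "Income level", shape = "Income level")
p_double
```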

Step 6: Your turn

It’s annoying that the order of the values in the legend is the opposite of how the income levels are ordered in the chart. Reverse the order of the values in the legend so they correspond to the ordering on the \(y\)-axis.

# add code here
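
A sketch of one option, using guide_legend(reverse = TRUE) on a made-up toy_wb data frame. It assumes the levels of income_level run from lowest to highest income, so that reversing the legend puts the highest income group at the top, next to the largest \(y\) values. Both guides are reversed the same way so they still merge into a single legend.

```r
library(ggplot2)

# Hypothetical values and level names for illustration only
toy_wb <- data.frame(
  female_labor_pct = c(25, 38, 45, 60),
  gdp_per_cap      = c(900, 2500, 9000, 45000),
  income_level     = factor(
    c("Low income", "Lower middle income", "Upper middle income", "High income"),
    levels = c("Low income", "Lower middle income", "Upper middle income", "High income")
  )
)

p_reversed <- ggplot(toy_wb, aes(x = female_labor_pct, y = gdp_per_cap)) +
  geom_point(aes(color = income_level, shape = income_level)) +
  scale_y_log10() +
  scale_color_viridis_d(end = 0.8) +
  labs(color = "Income level", shape = "Income level") +
  # reverse both guides identically so the merged legend lists high income first
  guides(
    color = guide_legend(reverse = TRUE),
    shape = guide_legend(reverse = TRUE)
  )
p_reversed
```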

Footnotes

  1. Note that the income level is based on GNI per capita, which is strongly correlated with GDP per capita but not exactly the same.