Telling a story of CO₂ emissions over time

```r
library(tidyverse)
```
Data
The dataset comes from the World Bank's World Development Indicators (WDI) and contains CO₂ emissions per capita, GDP per capita, and population for 215 countries over multiple years, including 1995 and 2020 (the two years compared below). The data is stored in the file `data/wdi_co2.csv`. We will import the data and perform some basic cleaning tasks.
```r
# import raw data obtained using the WDI API
wdi_co2_raw <- read_csv("data/wdi_co2.csv")

# clean the data
wdi_clean <- wdi_co2_raw |>
  # remove observations that are not actually countries
  filter(region != "Aggregates") |>
  # select relevant columns and give the WDI indicators readable names
  select(iso2c, iso3c, country, year,
    population = SP.POP.TOTL,
    co2_emissions = EN.ATM.CO2E.PC,
    gdp_per_cap = NY.GDP.PCAP.KD,
    region, income
  )

wdi_clean
```
```
# A tibble: 6,020 × 9
   iso2c iso3c country   year population co2_emissions gdp_per_cap region income
   <chr> <chr> <chr>    <dbl>      <dbl>         <dbl>       <dbl> <chr>  <chr> 
 1 AF    AFG   Afghani…  2001   19688632        0.0553         NA  South… Low i…
 2 AF    AFG   Afghani…  1998   18493132        0.0713         NA  South… Low i…
 3 AF    AFG   Afghani…  2009   27385307        0.240         490. South… Low i…
 4 AF    AFG   Afghani…  2000   19542982        0.0552         NA  South… Low i…
 5 AF    AFG   Afghani…  2012   30466479        0.335         571. South… Low i…
 6 AF    AFG   Afghani…  1996   17106595        0.0823         NA  South… Low i…
 7 AF    AFG   Afghani…  1999   19262847        0.0582         NA  South… Low i…
 8 AF    AFG   Afghani…  2002   21000256        0.0668        344. South… Low i…
 9 AF    AFG   Afghani…  2003   22645130        0.0730        347. South… Low i…
10 AF    AFG   Afghani…  2004   23553551        0.0549        339. South… Low i…
# ℹ 6,010 more rows
```
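One detail of the cleaning step that is easy to miss: `filter()` keeps only rows where the condition evaluates to `TRUE`, so any row whose `region` is `NA` is dropped along with the `"Aggregates"` rows. The sketch below shows this on a small made-up tibble (the rows, codes such as `"AAA"`, and values are hypothetical, chosen only to mimic the WDI column names) and also shows how `select()` renames columns while selecting them.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for wdi_co2_raw: three made-up rows with WDI-style column names
toy_raw <- tibble(
  iso3c = c("AAA", "BBB", "WLD"),
  country = c("Country A", "Country B", "World"),
  year = 2020,
  SP.POP.TOTL = c(1e6, 5e5, 7.8e9),
  EN.ATM.CO2E.PC = c(2.5, NA, 4.5),
  region = c("Region X", NA, "Aggregates")
)

toy_clean <- toy_raw |>
  # NA != "Aggregates" evaluates to NA, so filter() drops the NA-region row too
  filter(region != "Aggregates") |>
  # select() can rename while selecting: new_name = old_name
  select(iso3c, country, year,
         population = SP.POP.TOTL,
         co2_emissions = EN.ATM.CO2E.PC,
         region)

toy_clean
```

Only the `"AAA"` row survives. If rows with a missing region should be kept, the condition would need to be `is.na(region) | region != "Aggregates"`; whether the WDI file actually contains such rows is not shown here.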
Next, we rank countries by their per-capita CO₂ emissions in 1995 and in 2020, and then calculate the difference in rankings. We also create variables that indicate whether the rank changed by a lot (30 or more positions) and, if so, in which direction. This requires substantial data cleaning and wrangling.
```r
co2_rankings <- wdi_clean |>
  # Get rid of smaller countries
  filter(population > 200000) |>
  # Only look at two years
  filter(year %in% c(1995, 2020)) |>
  # Get rid of all the rows that have missing values in co2_emissions
  drop_na(co2_emissions) |>
  # Look at each year individually and rank countries based on their emissions that year
  mutate(
    ranking = rank(co2_emissions),
    .by = year
  ) |>
  # Only select required columns
  select(iso3c, country, year, region, income, ranking) |>
  # Pivot wider: one rank column per year (rank_1995, rank_2020)
  pivot_wider(names_from = year, names_prefix = "rank_", values_from = ranking) |>
  # Find the difference in ranking between 2020 and 1995
  mutate(rank_diff = rank_2020 - rank_1995) |>
  # Remove all rows where there's a missing value in the rank_diff column
  drop_na(rank_diff) |>
  # Make an indicator variable that is TRUE if the absolute value of the
  # difference in rankings is 30 or more
  mutate(big_change = abs(rank_diff) >= 30) |>
  # Make another indicator variable that indicates if the rank improved by a
  # lot, worsened by a lot, or didn't change much
  mutate(better_big_change = case_when(
    rank_diff <= -30 ~ "Rank improved",
    rank_diff >= 30 ~ "Rank worsened",
    .default = "Rank changed a little"
  )) |>
  # Arrange rows by rank_diff for printing
  arrange(rank_diff)
```
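Because `rank()` sorts in ascending order, the country with the lowest per-capita emissions in a given year gets rank 1, and a higher rank means higher emissions relative to other countries that year. Ties receive the average of the positions they share (the default `ties.method = "average"`), which is why the rank columns are doubles. The sketch below illustrates the ranking, pivoting, and differencing steps on three hypothetical countries; the codes and emission values are made up and are not from the WDI data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical per-capita emissions for three made-up countries in two years
toy <- tibble(
  iso3c = rep(c("AAA", "BBB", "CCC"), times = 2),
  year = rep(c(1995, 2020), each = 3),
  co2_emissions = c(1.0, 5.0, 5.0,   # 1995: BBB and CCC tie
                    6.0, 2.0, 4.0)   # 2020
)

toy_ranks <- toy |>
  # rank within each year separately; lowest emissions -> rank 1
  mutate(ranking = rank(co2_emissions), .by = year) |>
  select(-co2_emissions) |>
  # one row per country, one rank column per year
  pivot_wider(names_from = year, names_prefix = "rank_", values_from = ranking) |>
  # negative difference = relatively lower emissions in 2020 than in 1995
  mutate(rank_diff = rank_2020 - rank_1995)

toy_ranks
```

Here `BBB` and `CCC` share rank 2.5 in 1995, and `AAA` moves from the lowest-emitting position (rank 1) to the highest (rank 3), giving a positive `rank_diff` of 2.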
Here is what the cleaned data looked like before ranking:

```r
slice_head(wdi_clean, n = 5)
```

```
# A tibble: 5 × 9
  iso2c iso3c country    year population co2_emissions gdp_per_cap region income
  <chr> <chr> <chr>     <dbl>      <dbl>         <dbl>       <dbl> <chr>  <chr> 
1 AF    AFG   Afghanis…  2001   19688632        0.0553         NA  South… Low i…
2 AF    AFG   Afghanis…  1998   18493132        0.0713         NA  South… Low i…
3 AF    AFG   Afghanis…  2009   27385307        0.240         490. South… Low i…
4 AF    AFG   Afghanis…  2000   19542982        0.0552         NA  South… Low i…
5 AF    AFG   Afghanis…  2012   30466479        0.335         571. South… Low i…
```
And here is what it looks like after ranking and reshaping:

```r
slice_head(co2_rankings, n = 5)
```

```
# A tibble: 5 × 9
  iso3c country          region income rank_1995 rank_2020 rank_diff big_change
  <chr> <chr>            <chr>  <chr>      <dbl>     <dbl>     <dbl> <lgl>     
1 ZWE   Zimbabwe         Sub-S… Lower…        75        39       -36 TRUE      
2 DNK   Denmark          Europ… High …       160       127       -33 TRUE      
3 SWE   Sweden           Europ… High …       132       100       -32 TRUE      
4 SYR   Syrian Arab Repu… Middl… Low i…        96        64       -32 TRUE      
5 MLT   Malta            Middl… High …       128        99       -29 FALSE     
# ℹ 1 more variable: better_big_change <chr>
```
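To see how the 30-position cutoff behaves at the boundary, the sketch below rebuilds the five rows printed above from their `iso3c`, `rank_1995`, and `rank_2020` values (the other columns are omitted) and reapplies the same classification logic. Malta's change of -29 falls just short of the cutoff, so it is labeled a small change even though it is close to Sweden's and Syria's -32.

```r
library(dplyr)
library(tibble)

# Ranks copied from the printed co2_rankings output above
printed <- tibble(
  iso3c = c("ZWE", "DNK", "SWE", "SYR", "MLT"),
  rank_1995 = c(75, 160, 132, 96, 128),
  rank_2020 = c(39, 127, 100, 64, 99)
)

printed_check <- printed |>
  mutate(
    rank_diff = rank_2020 - rank_1995,
    # same thresholds as the pipeline: a change of 30 or more positions
    big_change = abs(rank_diff) >= 30,
    better_big_change = case_when(
      rank_diff <= -30 ~ "Rank improved",
      rank_diff >= 30 ~ "Rank worsened",
      .default = "Rank changed a little"
    )
  )

printed_check
```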
Basic plot
Let’s create a basic plot that visualizes the changes in CO₂ emission rankings between 1995 and 2020.
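One possible basic version, sketched here as an assumption rather than the exercise's original figure, is a scatterplot of each country's 1995 rank against its 2020 rank with a dashed 45-degree reference line. Countries below the line have a lower rank (relatively lower per-capita emissions) in 2020 than in 1995. To keep the sketch self-contained it uses only the five rows printed above as a hypothetical stand-in named `printed_rows`; with the full data you would pass `co2_rankings` instead.

```r
library(ggplot2)
library(tibble)

# Stand-in for co2_rankings: the five rows printed above (swap in co2_rankings)
printed_rows <- tibble(
  iso3c = c("ZWE", "DNK", "SWE", "SYR", "MLT"),
  rank_1995 = c(75, 160, 132, 96, 128),
  rank_2020 = c(39, 127, 100, 64, 99),
  big_change = c(TRUE, TRUE, TRUE, TRUE, FALSE)
)

basic_plot <- ggplot(printed_rows, aes(x = rank_1995, y = rank_2020)) +
  # points on the dashed line kept the same rank in both years
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", colour = "grey50") +
  # colour points by whether the rank moved 30 or more positions
  geom_point(aes(colour = big_change)) +
  labs(
    x = "Rank in 1995",
    y = "Rank in 2020",
    colour = "Changed by 30+ positions"
  )

basic_plot
```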
Brainstorm improvements
Brainstorm methods to improve the readability and interpretability of the chart through annotations.
Points to emphasize
- What is a “good” rank? What is a “bad” rank?
- What are the countries that have significantly improved or worsened their rank?
- What other aspects do you feel should be emphasized?
Methods for annotation
- Text labels
- Arrows/lines
- Rectangles
- Colors/fills
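As one possible starting point, each method in the list maps onto a ggplot2 layer: `geom_text()` or `annotate("text", ...)` for text labels, `annotate("segment", ..., arrow = arrow())` for arrows, `annotate("rect", ...)` for shaded rectangles, and a manual colour scale for colours. The sketch below combines them on the five printed rows (a hypothetical stand-in for `co2_rankings`); the label wording, coordinates, and colours are illustrative choices, not values from the exercise.

```r
library(ggplot2)
library(tibble)

# The five rows printed above, used as a stand-in for co2_rankings
printed_rows <- tibble(
  iso3c = c("ZWE", "DNK", "SWE", "SYR", "MLT"),
  rank_1995 = c(75, 160, 132, 96, 128),
  rank_2020 = c(39, 127, 100, 64, 99),
  better_big_change = c(rep("Rank improved", 4), "Rank changed a little")
)

annotated_plot <- ggplot(printed_rows, aes(x = rank_1995, y = rank_2020)) +
  # Rectangle: shade the low-rank (lower per-capita emissions) corner
  annotate("rect", xmin = 0, xmax = 60, ymin = 0, ymax = 60, fill = "grey90") +
  # Line: same rank in both years
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  # Colours: map the change category to point colour
  geom_point(aes(colour = better_big_change), size = 3) +
  # Text labels: identify each country by its ISO code
  geom_text(aes(label = iso3c), nudge_y = 6) +
  # Text + arrow: explain what a low rank means
  annotate("text", x = 100, y = 20, label = "Lower CO2 per person", hjust = 0) +
  annotate("segment", x = 98, xend = 62, y = 20, yend = 20,
           arrow = arrow(length = unit(0.2, "cm"))) +
  scale_colour_manual(values = c("Rank improved" = "#1b9e77",
                                 "Rank changed a little" = "grey50")) +
  labs(x = "Rank in 1995", y = "Rank in 2020", colour = NULL)

annotated_plot
```

Layer order matters here: the rectangle is drawn first so the points and labels sit on top of it.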
Add responses here.
Acknowledgments
- Exercise drawn from Data Visualization with R by Andrew Heiss.