Lecture 22
Cornell University
INFO 3312/5312 - Spring 2024
April 18, 2024
Answer to the “why” question
Models that lend themselves naturally to interpretation
Note
These methods apply to all forms of generalized linear models (GLMs), including linear regression, logistic regression, and models for other types of outcomes.
Characteristic | Beta | 95% CI¹ | p-value |
---|---|---|---|
bill_length_mm | 87 | 75, 100 | <0.001 |

¹ CI = Confidence Interval
Characteristic | Beta | 95% CI¹ | p-value |
---|---|---|---|
bill_length_mm | 73 | 52, 94 | <0.001 |
species | | | |
Adelie | — | — | |
Chinstrap | 1,146 | -282, 2,575 | 0.12 |
Gentoo | 55 | -1,165, 1,274 | >0.9 |
flipper_length_mm | 27 | 21, 34 | <0.001 |
bill_length_mm * species | | | |
bill_length_mm * Chinstrap | -41 | -73, -9.4 | 0.011 |
bill_length_mm * Gentoo | -1.2 | -30, 27 | >0.9 |

¹ CI = Confidence Interval
Characteristic | log(OR)¹ | 95% CI¹ | p-value |
---|---|---|---|
publication_year | 0.04 | 0.03, 0.05 | <0.001 |

¹ OR = Odds Ratio, CI = Confidence Interval
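The coefficient in this table is on the log-odds scale. A quick worked conversion to an odds ratio, using the rounded values printed in the table (so the results are approximate):

```latex
\[
\text{OR} = e^{\hat\beta} = e^{0.04} \approx 1.04,
\qquad
\text{95\% CI: } \left(e^{0.03},\, e^{0.05}\right) \approx (1.03,\, 1.05)
\]
```

Each additional publication year multiplies the odds by about 1.04. This is a statement about odds, not about the probability itself.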
marginaleffects is an R (and Python) package that provides a simple way to interpret results from a range of regression models
Implements a standardized interface for nearly 100 model types
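library(tidymodels)
library(marginaleffects)
# fit a linear regression and extract the underlying lm object
lm_mod <- linear_reg() |>
  fit(body_mass_g ~ bill_length_mm + species, data = penguins) |>
  extract_fit_engine()
# unit-level predictions for every row used to fit the model
pred <- predictions(lm_mod)
head(pred)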
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3729 30.6 122 <0.001 Inf 3669 3789
3765 30.9 122 <0.001 Inf 3705 3826
3839 32.3 119 <0.001 Inf 3775 3902
3509 33.8 104 <0.001 Inf 3443 3576
3747 30.7 122 <0.001 Inf 3687 3807
3711 30.6 121 <0.001 Inf 3651 3770
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, body_mass_g, bill_length_mm, species
Type: response
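# fit a logistic regression and extract the underlying glm object
glm_mod <- logistic_reg() |>
  fit(artist_gender ~ publication_year, data = artist_subset) |>
  extract_fit_engine()
# predictions on the response (probability) scale
pred <- predictions(glm_mod, type = "response")
head(pred)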
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.0696 0.00640 10.9 <0.001 89.2 0.0571 0.0821
0.0823 0.00638 12.9 <0.001 124.3 0.0698 0.0948
0.0972 0.00662 14.7 <0.001 159.7 0.0842 0.1101
0.1107 0.00725 15.3 <0.001 172.5 0.0965 0.1249
0.1259 0.00847 14.9 <0.001 163.4 0.1093 0.1425
0.1428 0.01040 13.7 <0.001 140.2 0.1224 0.1632
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, artist_gender, publication_year
Type: response
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-2.59 0.0988 -26.2 <0.001 502.0 -2.79 -2.40
-2.41 0.0844 -28.6 <0.001 594.1 -2.58 -2.25
-2.23 0.0754 -29.5 <0.001 634.9 -2.38 -2.08
-2.08 0.0736 -28.3 <0.001 582.6 -2.23 -1.94
-1.94 0.0770 -25.2 <0.001 461.8 -2.09 -1.79
-1.79 0.0849 -21.1 <0.001 326.0 -1.96 -1.63
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, artist_gender, publication_year
Type: link
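The link-scale and response-scale outputs describe the same predictions on two scales. A minimal base-R check, using only the rounded estimates copied from the two printed outputs (no model object is needed; the variable names are illustrative):

```r
# Link-scale (log-odds) estimates copied from the "Type: link" output
log_odds <- c(-2.59, -2.41, -2.23, -2.08, -1.94, -1.79)

# Response-scale estimates copied from the earlier "Type: response" output
reported_prob <- c(0.0696, 0.0823, 0.0972, 0.1107, 0.1259, 0.1428)

# plogis() is the inverse logit: 1 / (1 + exp(-x))
implied_prob <- plogis(log_odds)

# Side-by-side comparison; differences come from rounding in the printout
round(cbind(log_odds, implied_prob, reported_prob), 4)
max(abs(implied_prob - reported_prob))
```

The back-transformed log-odds match the reported probabilities to about three decimal places, which is what we expect given that the printed log-odds are rounded to two decimals.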
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
3729 30.6 121.8 <0.001 Inf 3669 3789
3765 30.9 121.7 <0.001 Inf 3705 3826
3839 32.3 119.0 <0.001 Inf 3775 3902
3509 33.8 103.9 <0.001 Inf 3443 3576
3747 30.7 121.9 <0.001 Inf 3687 3807
--- 332 rows omitted. See ?avg_predictions and ?print.marginaleffects ---
4370 66.1 66.1 <0.001 Inf 4240 4500
3245 58.5 55.5 <0.001 Inf 3131 3360
3803 45.8 83.0 <0.001 Inf 3713 3893
3913 47.5 82.4 <0.001 Inf 3820 4006
3858 46.5 83.0 <0.001 Inf 3767 3949
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, body_mass_g, bill_length_mm, species
Type: response
predictions(lm_mod, newdata = datagrid(
  species = "Adelie",
  bill_length_mm = c(30, 40, 50, 60),
  model = lm_mod
))
species bill_length_mm Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Adelie 30 2897 67.8 42.7 <0.001 Inf 2764 3030
Adelie 40 3811 31.7 120.4 <0.001 Inf 3749 3873
Adelie 50 4726 83.0 56.9 <0.001 Inf 4563 4888
Adelie 60 5640 149.2 37.8 <0.001 Inf 5347 5932
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, species, bill_length_mm, body_mass_g
Type: response
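Because `lm_mod` has no interaction terms, predictions for a single species fall on a straight line in `bill_length_mm`. A small sketch that recovers the implied slope from the four printed estimates (values copied from the output above; rounding in the printout makes the result approximate):

```r
# Grid values and predicted body mass (g) copied from the printed output
bill_length_mm <- c(30, 40, 50, 60)
estimate <- c(2897, 3811, 4726, 5640)

# Change in predicted body mass per 1 mm of bill length between grid points
slope_per_mm <- diff(estimate) / diff(bill_length_mm)
slope_per_mm
```

The three values are essentially identical: in an additive linear model, the effect of a predictor is the same everywhere on the grid.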
# one row per species, with numeric regressors held at their median
predictions(lm_mod, newdata = datagrid(
  FUN_factor = unique, FUN_numeric = median, model = lm_mod
))
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % bill_length_mm species
4218 49.5 85.2 <0.001 Inf 4121 4315 44.5 Adelie
4797 39.8 120.4 <0.001 Inf 4719 4875 44.5 Gentoo
3332 54.6 61.0 <0.001 Inf 3225 3439 44.5 Chinstrap
Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, bill_length_mm, species, body_mass_g
Type: response
Adjusted prediction at the mean: the predicted outcome when all regressors are held at their mean (or mode, for categorical regressors)
gender_year_national_fit <- logistic_reg() |>
  fit(artist_gender ~ publication_year * artist_nationality, data = artist_subset) |>
  extract_fit_engine()
avg_predictions(gender_year_national_fit)
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.066 <0.001 443.2 0.0541 0.0802
Columns: estimate, p.value, s.value, conf.low, conf.high
Type: invlink(link)
artist_nationality Estimate Pr(>|z|) S 2.5 % 97.5 %
American 0.1661 <0.001 118.9 0.1342 0.2038
French 0.0289 <0.001 152.2 0.0181 0.0459
Other 0.0729 <0.001 192.2 0.0546 0.0968
British 0.0820 <0.001 55.9 0.0488 0.1346
Columns: artist_nationality, estimate, p.value, s.value, conf.low, conf.high
Type: invlink(link)
# label_percent() is from scales; scale_color_OkabeIto() is from colorblindr
plot_predictions(gender_year_national_fit,
  condition = c("publication_year", "artist_nationality")
) +
  scale_y_continuous(labels = label_percent()) +
  scale_color_OkabeIto(aesthetics = c("color", "fill"), guide = "none") +
  facet_wrap(facets = vars(artist_nationality)) +
  labs(
    x = "Publication year",
    y = "Predicted probability artist is female"
  ) +
  theme(legend.position = "top")
Functions of two or more predictions
Comparisons are useful for
# hypothetical artist
artist <- tibble(
  publication_year = 1950,
  artist_nationality = "American"
)
comparisons(gender_year_national_fit, newdata = artist)
Term Contrast Estimate Std. Error z Pr(>|z|) S
artist_nationality British - American -0.01286 0.029251 -0.44 0.660 0.6
artist_nationality French - American -0.03818 0.017803 -2.14 0.032 5.0
artist_nationality Other - American -0.02687 0.019224 -1.40 0.162 2.6
publication_year +1 0.00134 0.000225 5.96 <0.001 28.6
2.5 % 97.5 % publication_year artist_nationality
-0.070190 0.04447 1950 American
-0.073077 -0.00329 1950 American
-0.064543 0.01081 1950 American
0.000899 0.00178 1950 American
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, publication_year, artist_nationality, artist_gender
Type: response
Term Contrast Estimate Std. Error z Pr(>|z|) S
artist_nationality British / American 0.704 0.61052 1.15 0.249 2.0
artist_nationality French / American 0.121 0.09944 1.21 0.225 2.2
artist_nationality Other / American 0.381 0.24246 1.57 0.116 3.1
publication_year +1 1.031 0.00782 131.84 <0.001 Inf
2.5 % 97.5 % publication_year artist_nationality
-0.4927 1.901 1950 American
-0.0742 0.316 1950 American
-0.0939 0.857 1950 American
1.0155 1.046 1950 American
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, publication_year, artist_nationality, artist_gender
Type: response
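The two tables for the 1950 American artist report the same contrasts in two ways: first as differences in predicted probability, then as ratios. The call behind the ratio table is not shown; it may have used the `comparison = "ratio"` argument of `comparisons()`, but that is an assumption. Under that reading, each difference/ratio pair should imply the same baseline probability for the American artist. A base-R sketch using the printed estimates (variable names are illustrative):

```r
# Estimates copied from the two printed tables (rows: British, French, Other)
contrast   <- c("British", "French", "Other")
difference <- c(-0.01286, -0.03818, -0.02687)  # p_hi - p_lo
ratio      <- c(0.704, 0.121, 0.381)           # p_hi / p_lo

# If difference = p_hi - p_lo and ratio = p_hi / p_lo,
# then p_lo = difference / (ratio - 1), where p_lo is the American baseline
p_american <- difference / (ratio - 1)
names(p_american) <- contrast
round(p_american, 4)
```

All three rows point to a baseline probability of roughly 0.043, so the difference and ratio tables are consistent summaries of the same pair of predictions.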
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
publication_year +10 0.0153 0.00213 7.18 <0.001 40.4 0.0111 0.0195
publication_year artist_nationality
1950 American
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, publication_year, artist_nationality, artist_gender
Type: response
In most models, the effect of a predictor depends on the values of other predictors
What comparisons do we want to make? What portion of the predictor space do we cover?
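To see why the effect depends on where we evaluate it, consider a logistic model: the coefficient is constant on the log-odds scale, but the implied change in probability is not. A minimal base-R sketch, assuming the rounded coefficient 0.04 from the logistic regression table and two starting log-odds values borrowed from the link-scale predictions (an illustrative pairing, not a refit of the model):

```r
# Coefficient on the log-odds scale (rounded value from the regression table)
b_year <- 0.04

# Two starting points on the log-odds scale (from the link-scale output)
eta <- c(low = -2.59, high = -1.79)

# Predicted probability before and after a one-unit increase in the predictor
p_before <- plogis(eta)
p_after  <- plogis(eta + b_year)

# Same shift in log-odds, different change in probability
effect <- p_after - p_before
effect
```

The change in probability is larger at the higher starting point, which is why comparisons have to be reported at specific values of the predictors or averaged over a chosen set of them.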
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 %
publication_year +1 0.00394 0.000716 5.51 < 0.001 24.8 0.00254
publication_year +1 0.00440 0.000906 4.86 < 0.001 19.7 0.00263
publication_year +1 0.00488 0.001113 4.38 < 0.001 16.4 0.00269
publication_year +1 0.00526 0.001286 4.09 < 0.001 14.5 0.00274
publication_year +1 0.00565 0.001459 3.87 < 0.001 13.2 0.00279
97.5 %
0.00535
0.00618
0.00706
0.00778
0.00851
--- 2231 rows omitted. See ?avg_comparisons and ?print.marginaleffects ---
publication_year +1 0.00383 0.001285 2.98 0.00292 8.4 0.00131
publication_year +1 0.00412 0.001450 2.84 0.00446 7.8 0.00128
publication_year +1 0.00454 0.001681 2.70 0.00693 7.2 0.00124
publication_year +1 0.00412 0.001450 2.84 0.00446 7.8 0.00128
publication_year +1 0.00454 0.001681 2.70 0.00693 7.2 0.00124
97.5 %
0.00635
0.00697
0.00783
0.00697
0.00783
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, artist_gender, publication_year, artist_nationality
Type: response
Term Contrast Estimate Std. Error z Pr(>|z|) S
artist_nationality British - American -0.06448 0.02837 -2.27 0.02302 5.4
artist_nationality British - American -0.07582 0.02782 -2.73 0.00642 7.3
artist_nationality British - American -0.08852 0.02875 -3.08 0.00207 8.9
artist_nationality British - American -0.09964 0.03124 -3.19 0.00143 9.5
artist_nationality British - American -0.11161 0.03570 -3.13 0.00177 9.1
2.5 % 97.5 %
-0.12007 -0.00888
-0.13035 -0.02130
-0.14486 -0.03218
-0.16088 -0.03840
-0.18158 -0.04164
--- 8954 rows omitted. See ?avg_comparisons and ?print.marginaleffects ---
publication_year +1 0.00383 0.00129 2.98 0.00292 8.4
publication_year +1 0.00412 0.00145 2.84 0.00446 7.8
publication_year +1 0.00454 0.00168 2.70 0.00693 7.2
publication_year +1 0.00412 0.00145 2.84 0.00446 7.8
publication_year +1 0.00454 0.00168 2.70 0.00693 7.2
2.5 % 97.5 %
0.00131 0.00635
0.00128 0.00697
0.00124 0.00783
0.00128 0.00697
0.00124 0.00783
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, artist_gender, publication_year, artist_nationality
Type: response
# French artist in 1950
comparisons(
  gender_year_national_fit,
  newdata = datagrid(
    publication_year = 1950,
    artist_nationality = "French"
  )
)
Term Contrast publication_year artist_nationality
artist_nationality British - American 1950 French
artist_nationality French - American 1950 French
artist_nationality Other - American 1950 French
publication_year +1 1950 French
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
-0.012858 2.93e-02 -0.44 0.6602 0.6 -7.02e-02 0.044474
-0.038184 1.78e-02 -2.14 0.0320 5.0 -7.31e-02 -0.003291
-0.026865 1.92e-02 -1.40 0.1623 2.6 -6.45e-02 0.010813
0.000219 9.11e-05 2.40 0.0163 5.9 4.03e-05 0.000397
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, publication_year, artist_nationality, predicted_lo, predicted_hi, predicted, artist_gender
Type: response
Term Contrast Estimate Std. Error z Pr(>|z|) S
artist_nationality British - American -0.07161 0.027900 -2.57 0.0103 6.6
artist_nationality French - American -0.12476 0.019320 -6.46 <0.001 33.1
artist_nationality Other - American -0.08531 0.020937 -4.07 <0.001 14.4
publication_year +1 0.00128 0.000335 3.80 <0.001 12.8
2.5 % 97.5 % publication_year artist_nationality
-0.126292 -0.01693 1994 French
-0.162627 -0.08689 1994 French
-0.126349 -0.04428 1994 French
0.000618 0.00193 1994 French
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, publication_year, artist_nationality, artist_gender
Type: response
Warning
Computationally efficient, but only makes sense if the “average” individual actually exists in the dataset.
Same as with predictions
Term Contrast Estimate Std. Error z Pr(>|z|) S
artist_nationality British - American -0.08264 0.026459 -3.12 0.00179 9.1
artist_nationality French - American -0.13410 0.017412 -7.70 < 0.001 46.1
artist_nationality Other - American -0.09095 0.018761 -4.85 < 0.001 19.6
publication_year +1 0.00268 0.000409 6.55 < 0.001 34.0
2.5 % 97.5 %
-0.13450 -0.03078
-0.16823 -0.09997
-0.12772 -0.05417
0.00188 0.00348
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
avg_comparisons(gender_year_national_fit,
  variables = list(publication_year = 10),
  by = "artist_nationality"
)
Term Contrast artist_nationality Estimate Std. Error z
publication_year mean(+10) American 0.0505 0.01217 4.15
publication_year mean(+10) British 0.0217 0.01458 1.49
publication_year mean(+10) French 0.0178 0.00703 2.54
publication_year mean(+10) Other 0.0298 0.00893 3.34
Pr(>|z|) S 2.5 % 97.5 %
<0.001 14.9 0.02666 0.0743
0.1375 2.9 -0.00692 0.0502
0.0112 6.5 0.00405 0.0316
<0.001 10.2 0.01232 0.0473
Columns: term, contrast, artist_nationality, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
Type: response
Partial derivatives of the regression equation with respect to a regressor of interest.
Since slopes are conditional quantities, they can be aggregated in a number of ways.
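A slope can be approximated numerically with a finite difference: nudge the regressor slightly and see how much the prediction moves. The sketch below is a base-R illustration on a hypothetical logistic prediction function (the coefficients `b0` and `b1` are made up for the example), not the package's internal implementation:

```r
# Hypothetical logistic prediction function; coefficients chosen for illustration
b0 <- -2
b1 <- 0.5
predict_prob <- function(x) plogis(b0 + b1 * x)

# Central finite difference approximation to dY/dX at each value of x
numeric_slope <- function(f, x, h = 1e-4) {
  (f(x + h / 2) - f(x - h / 2)) / h
}

x <- c(0, 2, 4, 6)
slope_numeric <- numeric_slope(predict_prob, x)

# Analytic slope of a logistic curve for comparison: b1 * p * (1 - p)
p <- predict_prob(x)
slope_analytic <- b1 * p * (1 - p)

cbind(x, slope_numeric, slope_analytic)
```

The slope differs at every `x`, and is steepest where the predicted probability is 0.5. That is what "conditional quantity" means here, and why slopes need to be evaluated at chosen values or averaged.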
Term Contrast Estimate Std. Error z Pr(>|z|) S
bill_length_mm dY/dX 64.11 7.12 9.0034 <0.001 62.0
flipper_length_mm dY/dX 27.26 3.18 8.5865 <0.001 56.6
species Chinstrap - Adelie -656.05 94.90 -6.9128 <0.001 37.6
species Gentoo - Adelie 3.65 95.21 0.0383 0.969 0.0
2.5 % 97.5 %
50.2 78.1
21.0 33.5
-842.1 -470.0
-183.0 190.3
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
Term Contrast species Estimate Std. Error z Pr(>|z|) S
bill_length_mm mean(dY/dX) Adelie 72.7 10.6 6.83 <0.001 36.8
bill_length_mm mean(dY/dX) Chinstrap 31.7 12.7 2.48 0.013 6.3
bill_length_mm mean(dY/dX) Gentoo 71.5 10.8 6.60 <0.001 34.5
2.5 % 97.5 %
51.83 93.5
6.68 56.6
50.28 92.8
Columns: term, contrast, species, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
Type: response
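These species-specific slopes can be read off the interaction model's coefficient table. As a worked check, using the rounded coefficients from that table and treating Adelie as the reference level:

```latex
\[
\frac{\partial\,\widehat{\text{body\_mass\_g}}}{\partial\,\text{bill\_length\_mm}}
  = \hat\beta_{\text{bill}} + \hat\beta_{\text{bill}\times\text{species}}
\]
\[
\begin{aligned}
\text{Adelie (reference):}\quad & 73 + 0 = 73 \\
\text{Chinstrap:}\quad & 73 + (-41) = 32 \\
\text{Gentoo:}\quad & 73 + (-1.2) = 71.8
\end{aligned}
\]
```

These agree with the reported slopes of 72.7, 31.7, and 71.5 up to the rounding in the coefficient table.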
Term Contrast flipper_length_mm species Estimate
bill_length_mm dY/dX 180 Adelie 72.69
bill_length_mm dY/dX 180 Gentoo 71.53
flipper_length_mm dY/dX 180 Adelie 27.26
flipper_length_mm dY/dX 180 Gentoo 27.26
species Chinstrap - Adelie 180 Adelie -656.05
species Chinstrap - Adelie 180 Gentoo -656.05
species Gentoo - Adelie 180 Adelie 3.65
species Gentoo - Adelie 180 Gentoo 3.65
Std. Error z Pr(>|z|) S 2.5 % 97.5 %
10.64 6.8304 <0.001 36.8 51.8 93.6
10.87 6.5830 <0.001 34.3 50.2 92.8
3.17 8.5877 <0.001 56.6 21.0 33.5
3.18 8.5844 <0.001 56.6 21.0 33.5
94.90 -6.9128 <0.001 37.6 -842.1 -470.0
94.90 -6.9128 <0.001 37.6 -842.1 -470.0
95.21 0.0383 0.969 0.0 -183.0 190.3
95.21 0.0383 0.969 0.0 -183.0 190.3
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, flipper_length_mm, species, predicted_lo, predicted_hi, predicted, bill_length_mm, body_mass_g
Type: response
Note
Identical to comparisons for representative values assuming you use default settings
Term Contrast Estimate Std. Error z Pr(>|z|) S
bill_length_mm dY/dX 72.69 10.64 6.8312 <0.001 36.8
flipper_length_mm dY/dX 27.26 3.17 8.5877 <0.001 56.6
species Chinstrap - Adelie -656.05 94.90 -6.9128 <0.001 37.6
species Gentoo - Adelie 3.65 95.21 0.0383 0.969 0.0
2.5 % 97.5 %
51.8 93.5
21.0 33.5
-842.1 -470.0
-183.0 190.3
Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, bill_length_mm, species, flipper_length_mm, body_mass_g
Type: response
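The calls that produced the four slope outputs are not shown. The sketch below gives one set of calls that should produce output of the same shape. It rests on several assumptions: that the model is the interaction model from the earlier coefficient table, that it can be fit directly with `lm()` (the default engine behind `linear_reg()`), and that `penguins` comes from the palmerpenguins package. The object names `int_mod` and `slopes_by_species` are illustrative.

```r
library(marginaleffects)
library(palmerpenguins)  # assumed source of the penguins data

# Assumed model: bill length interacted with species, plus flipper length
int_mod <- lm(
  body_mass_g ~ bill_length_mm * species + flipper_length_mm,
  data = penguins
)

# Average slopes (and contrasts for the factor) over all observed rows
avg_slopes(int_mod)

# Average slope of bill length within each species
slopes_by_species <- avg_slopes(int_mod, variables = "bill_length_mm", by = "species")
slopes_by_species

# Slopes at user-specified representative values
slopes(int_mod, newdata = datagrid(
  flipper_length_mm = 180,
  species = c("Adelie", "Gentoo")
))

# Slopes with regressors held at their mean (numeric) or mode (categorical)
slopes(int_mod, newdata = "mean")
```

The four calls correspond, in order, to average slopes, average slopes by group, slopes at representative values, and slopes at the mean.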
ae-19
ae-19 (repo name will be suffixed with your GitHub name).