extrapolation and interpolation
The most important lesson I learned from this book: regression is reliable for interpolation, but not for extrapolation. Going further, your observations need to cover the whole gamut of the causal variables, including their combinations, before you can justify faith in your regressions.
Imagine you have two causal variables, A and B, that cause X. Maybe your data cover a wide range of observations of A: some high, some low, some in between. And you have the whole gamut of observations of B too: high, low, and medium. It might still be the case that you have never observed A and B high together (you have not seen A > 0 and B > 0 at the same time). Or that you have only observed them together (you have not seen one of them high while the other is not). In either case, your regression is effectively extrapolating into the unobserved causal region, and you should not trust it there.
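A small simulation makes this concrete. The setup below is a hypothetical illustration, not taken from the book: A and B are coded as 0/1 indicators, the "true" process is X = -2A + B + 3AB plus Gaussian noise, and the sample contains A alone, B alone, and neither, but never both. An additive least-squares fit describes the observed cells well and still gets the unobserved corner badly wrong.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical true process (coefficients chosen for illustration only):
# X = -2*A + 1*B + 3*A*B + noise, with A and B as 0/1 indicators.
def true_x(a, b):
    return -2.0 * a + 1.0 * b + 3.0 * a * b

# Training data covers (A, B) = (0, 0), (1, 0), (0, 1), but never (1, 1).
cells = [(0, 0), (1, 0), (0, 1)]
n_per_cell = 50
a = np.repeat([c[0] for c in cells], n_per_cell).astype(float)
b = np.repeat([c[1] for c in cells], n_per_cell).astype(float)
x = true_x(a, b) + rng.normal(scale=0.5, size=a.size)

# Additive model X ~ intercept + A + B, fitted by ordinary least squares.
design = np.column_stack([np.ones_like(a), a, b])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)

# Adding an A*B column does not help: it is identically zero in this sample,
# so the design matrix is rank-deficient and the interaction is unidentifiable.
design_int = np.column_stack([design, a * b])
rank_int = np.linalg.matrix_rank(design_int)

# Extrapolate the additive fit to the unobserved corner A = 1, B = 1.
predicted = coef @ np.array([1.0, 1.0, 1.0])
actual = true_x(1.0, 1.0)

print("fitted [intercept, A, B]:", np.round(coef, 2))
print("rank with interaction column:", rank_int, "of", design_int.shape[1])
print(f"predicted X at (1, 1): {predicted:.2f}, true mean X: {actual:.2f}")
```

Because the A*B column is all zeros here, more data from the same three cells can never estimate the interaction; only observations with A and B both present can.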
Let’s keep the math sexy. Say you meet an attractive member of your favorite sex. This person A) likes to hunt, and B) is otherwise vegetarian. Your prejudices are that you don’t like hunters (a negative effect for A) and you do like vegetarians (a positive effect for B). By comparing the magnitudes of these preferences, you deduce that you should not get along with this attractive person, because the bad A part outweighs the good B part.
However, since you have never observed A and B positive at once, your preconceptions are not to be trusted. Despite your instincts, you go out on a date with Mr or Ms (A>0, B>0) and have a fantastic time. It turns out there was a positive interaction term in that region, and what you had written off as noise was partly this interaction (it wasn’t noise, just unknown knowledge). You’ve found your soul mate.
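The date can be written as a two-variable regression with an interaction term. This is a sketch under stated assumptions: A and B are coded as 0/1 indicators (1 = hunter, 1 = vegetarian), the coefficient names are hypothetical, and the numbers in the last line are made up purely to show the arithmetic.

```latex
\begin{aligned}
\text{model:}\quad & X = \beta_0 + \beta_A A + \beta_B B + \beta_{AB}\,AB + \varepsilon,
  \qquad \beta_A < 0 < \beta_B,\quad |\beta_A| > \beta_B \\
\text{additive guess at } (A,B) = (1,1):\quad & \beta_0 + \beta_A + \beta_B < \beta_0 \\
\text{with the interaction:}\quad & \beta_0 + \beta_A + \beta_B + \beta_{AB} > \beta_0
  \iff \beta_{AB} > -(\beta_A + \beta_B) \\
\text{e.g. } \beta_A = -2,\ \beta_B = 1,\ \beta_{AB} = 3:\quad & -2 + 1 = -1 \text{ (additive)},
  \qquad -2 + 1 + 3 = 2 \text{ (actual)}
\end{aligned}
```

If your data never contain A = 1 and B = 1 together, the product AB is always zero, so the data say nothing about the size of the interaction coefficient, and the additive guess is pure extrapolation.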