Dummy variables
A dummy variable is a qualitative variable that can take only two values: 0
and 1. It is called a ``dummy" variable because its numerical values carry no
quantitative meaning; they only stand in for the categories of a categorical
variable. A dummy variable is also referred to as an indicator variable.
It can be interpreted as a ``switch" variable that is either on (d=1) or off
(d=0), indicating whether a condition holds or not.
Some examples where a dummy variable is useful include:
1. educational status (college, no college). A company is interested in
estimating the effect of college education on salaries paid within the
company.
2. repair type (electrical, mechanical). A company provides maintenance
services for water-filtration systems. The managers believe that the repair
time is a function of the number of months since last service and the type
of repair problem.
3. sales regions (A, B, C, D). A manufacturer of copy machines would like to
predict the number of copiers sold per week while allowing for differences
among the regions.
4. type of population (rural, urban)
5. institution type (public, private)
6. type of firm (unionized, not unionized)
7. gender (male, female)
8. political party (Republican, Democrat)
9. housing data (with and without pool)
10. method of payment (check, credit card, cash)
11. days of the week (weekday, weekend)
12. season (summer, other seasons)
13. season (summer, fall, winter, spring)
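As a minimal sketch of how such categories can be turned into 0/1 columns,
the Python snippet below encodes the sales-region example. The data values,
the function name make_dummies, and the choice of region A as the category
left without its own column are all assumptions made for illustration.

```python
# Hypothetical example data: the sales region of each observation.
regions = ["A", "B", "C", "D", "B", "A"]


def make_dummies(values, reference):
    """Return {category: [0/1, ...]} for every category except `reference`."""
    categories = sorted(set(values))
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }


# Region A is (arbitrarily) the category that gets no column of its own.
dummies = make_dummies(regions, reference="A")
for name, column in dummies.items():
    print(name, column)
```

An observation from region A is identified by having 0 in all three columns.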
Purpose
Dummy variables are used in regression models to analyze and estimate
differences among groups.
A dummy variable enters a regression as an explanatory variable, just like any
other regressor in the multiple regression framework.
Number of groups
If the number of groups is n, we can include at most n-1 dummy variables in a
regression that has an intercept. If n dummy variables are defined and
included, they add up to the intercept column, and this perfect
multicollinearity makes it impossible to estimate the regression.
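The linear dependence behind this problem can be checked directly. The sketch
below uses the method-of-payment example with made-up observations (the data
and variable names are assumptions): with one dummy per group, the dummies in
every row sum to 1, which is exactly the intercept column.

```python
# Hypothetical data: method of payment for six observations (n = 3 groups).
payments = ["check", "cash", "credit card", "cash", "check", "credit card"]
categories = sorted(set(payments))  # ['cash', 'check', 'credit card']

# One dummy column per group: n columns, i.e. one too many.
all_dummies = [[1 if p == c else 0 for c in categories] for p in payments]

intercept = [1] * len(payments)
row_sums = [sum(row) for row in all_dummies]

# Each observation belongs to exactly one group, so the n dummies add up to
# the intercept column: an exact linear dependence among the regressors.
all_sum_to_one = row_sums == intercept
print(all_sum_to_one)

# After dropping one dummy (here the first category), the remaining columns
# no longer add up to the intercept column.
kept = [row[1:] for row in all_dummies]
kept_sum_to_one = [sum(row) for row in kept] == intercept
print(kept_sum_to_one)
```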
Reference group
One of the two groups in the definition of a dummy variable is called the
``included" group and the other the ``excluded" group. The included group is
the one identified by a value of 1 in the definition of the dummy variable;
the excluded group carries a value of 0. The ``excluded" group is also
referred to as the ``control" group or the ``benchmark" group. It is the
group used as the reference for comparisons, and it is the category for which
no dummy variable is included in the regression. For instance, if d=1 for
females and d=0 for males, and we include d, then the left-out group is males,
which becomes the reference group. The results obtained must be interpreted
relative to this reference group.
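To illustrate what comparing with the reference group means, the sketch below
regresses y on a single dummy using the simple-regression formulas. The salary
figures are invented for the example. In this setting the intercept equals the
mean of the reference group (d=0), and the coefficient on d equals the
difference between the two group means.

```python
from statistics import mean

# Hypothetical salaries (in thousands); d = 1 for females, d = 0 for males.
d = [1, 0, 1, 0, 1, 0]
y = [52.0, 50.0, 55.0, 47.0, 58.0, 53.0]

# OLS estimates for y = b0 + b1*d from the simple-regression formulas.
d_bar, y_bar = mean(d), mean(y)
b1 = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sum(
    (di - d_bar) ** 2 for di in d
)
b0 = y_bar - b1 * d_bar

# Group means computed directly, for comparison with b0 and b1.
mean_ref = mean([yi for di, yi in zip(d, y) if di == 0])  # reference group
mean_inc = mean([yi for di, yi in zip(d, y) if di == 1])  # included group

print(b0, mean_ref)             # intercept vs. mean of the reference group
print(b1, mean_inc - mean_ref)  # dummy coefficient vs. difference in means
```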
Common slope
Suppose that the effect of x on y is the same for both groups, and that
regardless of the level of x there is a systematic difference between the two
groups. Graphically, the situation is depicted by two parallel lines, with
different intercepts.
(See images.)
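In equation form, this case can be sketched as follows. The symbols
\beta_0, \beta_1, \delta_0 and the error term u are notation assumed here,
and the conditional means assume E(u | x, d) = 0.

```latex
y = \beta_0 + \delta_0 d + \beta_1 x + u
\quad\Longrightarrow\quad
\begin{cases}
E(y \mid x, d = 0) = \beta_0 + \beta_1 x \\
E(y \mid x, d = 1) = (\beta_0 + \delta_0) + \beta_1 x
\end{cases}
```

Both lines have slope \beta_1, and \delta_0 is the vertical distance between
them at every level of x.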
Interactions
An interaction between a dummy variable and a quantitative variable allows the
analyst to estimate differences in slope among groups. For instance, if we
include the ``product" variable sex*experience in a salary equation, the
coefficient on that variable indicates how much the additional salary from one
more year of experience differs between males and females.
(See images.)
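A sketch of a model with an interaction term, in assumed notation (\beta_0,
\beta_1, \delta_0, \delta_1 and the error term u do not appear in the text
above), with d the dummy and x the quantitative variable:

```latex
y = \beta_0 + \delta_0 d + \beta_1 x + \delta_1 (d \cdot x) + u
\quad\Longrightarrow\quad
\frac{\partial E(y \mid x, d)}{\partial x} =
\begin{cases}
\beta_1 & \text{if } d = 0 \\
\beta_1 + \delta_1 & \text{if } d = 1
\end{cases}
```

Under this specification \delta_1 is the difference in slopes between the two
groups, so \delta_1 = 0 corresponds to the common-slope case.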