Generalized Linear Mixed Models

Think back to intro stats, when you learned to perform linear regression. You probably learned how to calculate confidence intervals and conduct hypothesis tests on regression coefficients. Whether you knew it or not, this kind of inference for the linear model usually relies on three requirements: the residuals are normally distributed, the residuals are independent, and the residuals have constant variance.

What if some of these requirements are not met? For example, what if your response is whether a shelter animal is adopted? Because this response variable is binary (rather than normal), a linear model is not appropriate. Additionally, people seeking to adopt a dog may prefer one breed over another, so adoption outcomes for dogs of the same breed may be correlated (rather than independent), which violates another requirement of the linear model. We can relax the requirements of the linear model to model data sets such as this one.

Relaxing the linear model requirements creates a new class of models. If the responses are not necessarily normal but are more generally from an exponential family, the model is described as a generalized linear model. For example, a generalized linear model can model a binary outcome, such as whether a person votes for a particular candidate. If the responses are correlated, the model is described as a mixed model. For example, siblings may be similar, and a mixed model accounts for this correlation by incorporating random effects (unobservable random variables that are typically normally distributed with mean zero). A generalized linear mixed model (GLMM) incorporates a response from an exponential family as well as fixed and random effects. GLMMs are widely used: a Google Scholar search for generalized linear mixed models returns over 2.2 million results.
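To make the shelter-animal example concrete, here is a minimal sketch, in Python rather than R, of how data from a simple GLMM could be generated. It assumes a logistic link (a common choice for binary responses) and one random intercept per group, such as one per breed, drawn from a normal distribution with mean zero and standard deviation sigma. The function name and parameters are hypothetical and are not part of the glmm package.

```python
import math
import random


def simulate_logistic_glmm(n_groups, n_per_group, beta0, sigma, seed=0):
    """Simulate binary responses from a logistic GLMM with one random
    intercept per group (hypothetical helper, for illustration only).

    u_g   ~ Normal(0, sigma^2)                 random effect for group g
    eta_g = beta0 + u_g                        fixed effect + random effect
    y_gj  ~ Bernoulli(1 / (1 + exp(-eta_g)))   binary response
    """
    rng = random.Random(seed)
    data = []  # list of (group, y) pairs
    for g in range(n_groups):
        u = rng.gauss(0.0, sigma)  # one draw shared by every member of group g
        p = 1.0 / (1.0 + math.exp(-(beta0 + u)))
        for _ in range(n_per_group):
            y = 1 if rng.random() < p else 0
            data.append((g, y))
    return data


data = simulate_logistic_glmm(n_groups=5, n_per_group=4, beta0=0.5, sigma=1.0)
```

Every observation in group g shares the same draw u_g, so responses within a group tend to move together; that shared draw is how a random effect induces correlation.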

Despite the widespread use of GLMMs, tools for frequentist likelihood-based inference with them are limited. Frequentist likelihood-based inference includes (but is not limited to) maximum likelihood estimation, calculating Fisher information, conducting hypothesis tests, and constructing confidence intervals. Most methodology and software for GLMMs does little more than maximum likelihood, or does not perform likelihood-based inference at all. The challenge lies in the likelihood function, which is often an intractable integral over the random effects. To perform the full range of frequentist likelihood-based inference, the entire likelihood function is required, not just the location of its maximum.
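To see where the integral comes from, write the likelihood with the random effects integrated out. The notation here is assumed for illustration: beta are the fixed effects, u the random effects, nu the variance components, and phi_nu the mean-zero normal density of u. The second line specializes to a logistic model with one random intercept per group. There each group contributes a one-dimensional integral, but none has a closed form; with crossed random effects the integral does not factor, and its dimension grows with the number of random effects.

```latex
% General GLMM likelihood: integrate the conditional density of y over u
L(\beta, \nu) = \int f_{\beta}(y \mid u)\, \phi_{\nu}(u)\, du

% Logistic model, one random intercept u_g ~ N(0, \sigma^2) per group g
L(\beta_0, \sigma) = \prod_{g} \int \prod_{j}
  p_{gj}^{\,y_{gj}} \left(1 - p_{gj}\right)^{1 - y_{gj}}
  \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-u_g^{2} / (2\sigma^{2})}\, du_g,
\qquad
p_{gj} = \frac{1}{1 + e^{-(\beta_0 + u_g)}}
```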

My R package glmm approximates the entire likelihood function using a Monte Carlo likelihood approximation. The package maximizes this approximation and reports Monte Carlo maximum likelihood estimates. Because the entire likelihood function is approximated, users can conduct all likelihood-based inference. You can learn how to use the R package glmm by following this introductory guide. The package is downloaded approximately 1,000 times per month.
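The idea behind a Monte Carlo likelihood approximation can be sketched in a few lines. The Python below is not the glmm package's code or algorithm: glmm is written in R and uses a more carefully chosen importance-sampling distribution. This is a simplified illustration for a logistic model with one normal random intercept per group, and every function name is hypothetical. It draws standard normal values z once and reuses them (as u = sigma * z) at every parameter value, so the approximation is a single function of (beta0, sigma) that can be evaluated anywhere and maximized.

```python
import math
import random


def softplus(x):
    # log(1 + e^x), computed without overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_bernoulli(y, eta):
    # log P(Y = y) when P(Y = 1) = 1 / (1 + e^{-eta})
    return -softplus(-eta) if y == 1 else -softplus(eta)


def logsumexp(values):
    # log(sum(exp(v))) computed stably
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mc_loglik(beta0, sigma, groups, z):
    """Hypothetical Monte Carlo approximation of the log-likelihood of a
    logistic random-intercept model.

    groups: dict mapping group id -> list of 0/1 responses
    z:      fixed standard-normal draws, reused for every (beta0, sigma)
    """
    total = 0.0
    for ys in groups.values():
        # log f(y_g | u) at each simulated random effect u = sigma * z_k
        terms = [sum(log_bernoulli(y, beta0 + sigma * zk) for y in ys) for zk in z]
        # log of the average over draws approximates log of the integral over u_g
        total += logsumexp(terms) - math.log(len(z))
    return total


rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]
groups = {0: [1, 1, 0, 1], 1: [0, 0, 1, 0], 2: [1, 1, 1, 1]}

# A coarse grid search stands in for a real optimizer here.
grid = [(b, s) for b in (-1.0, 0.0, 1.0) for s in (0.5, 1.0, 2.0)]
best = max(grid, key=lambda bs: mc_loglik(bs[0], bs[1], groups, z))
```

Reusing the same draws at every parameter value (common random numbers) is what makes the approximation a smooth surface rather than a fresh noisy estimate at each point, which is what lets you maximize it and examine it beyond the maximum.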

Special thanks to Google and its Summer of Code for supporting my R package development during the summer of 2014.
