Funnily enough, mixed-effects regression was the first type of regression analysis I learned (I was handed a huge, complex data set with no prior R experience and told to analyze it). Along the way I compiled a collection of papers, links, and books that I used to teach myself. Right now it’s a bit disorganized, but I will slowly add some structure when I have free time. My goal is to provide the links along with a description of why each was useful or why I needed that information, so that others can follow along and teach themselves too.

To begin, the following books, papers, and tutorials are MUST READs. They lay the foundation for understanding and building multilevel models and for seeing what they capture beyond ordinary least squares regression. They aren’t necessarily in reading order, and some links may need updating.

The following are extra materials that are highly relevant but that I didn’t interact much with. They may (or may not) be useful.

Finally, here are some pages that go over some of the basic questions I had during implementation. I will try to cluster them into overarching topics.

Understanding the analysis

Of course, the first search I did was to understand mixed regression in general. What does it do? Why do I need it? How is it different from other analyses?

Confidence Intervals

After running your regression, how do you get confidence intervals for your betas? Typically you use confint(model), or, if you want Wald confidence intervals (asymptotic and fast, but less precise), confint(model, method = "Wald"). Here are some links for comparing confidence intervals across other packages and on the difference between prediction intervals and confidence intervals.
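As a rough sketch of the two calls above, here is how they might look on the sleepstudy data that ships with lme4. The model formula is only an illustration I picked (it is not from any of the linked material), and parm = "beta_" restricts the output to the fixed effects.

```r
library(lme4)

# Illustrative model on lme4's built-in sleepstudy data (an assumption,
# not a model from the original post).
model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Profile-likelihood intervals: slower, but usually more accurate.
ci_profile <- confint(model, parm = "beta_", method = "profile")

# Wald intervals: fast and asymptotic, based on the estimated standard errors.
ci_wald <- confint(model, parm = "beta_", method = "Wald")

print(ci_profile)
print(ci_wald)
```

With a reasonably large, well-behaved data set the two sets of intervals tend to be close; the gap matters more with few groups or estimates near a boundary.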

(Restricted) Maximum Likelihood Estimation

An important aspect to understanding these models is how the parameters are estimated (hint: not using least squares). They use Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML).
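To make the ML/REML distinction concrete, here is a minimal sketch that fits the same illustrative model both ways with lmer's REML argument. The sleepstudy data and the formula are my own choices for demonstration.

```r
library(lme4)

# Same illustrative model, estimated two ways.
fit_reml <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = TRUE)
fit_ml   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Variance components as a data frame (columns include grp, vcov, sdcor).
vc_reml <- as.data.frame(VarCorr(fit_reml))
vc_ml   <- as.data.frame(VarCorr(fit_ml))

# Between-subject intercept variance under each estimator.
var_subject_reml <- vc_reml$vcov[vc_reml$grp == "Subject"]
var_subject_ml   <- vc_ml$vcov[vc_ml$grp == "Subject"]

print(c(REML = var_subject_reml, ML = var_subject_ml))
```

ML variance estimates are typically biased downward because they ignore the degrees of freedom used by the fixed effects, which is the usual argument for reporting REML estimates. When comparing models with different fixed effects, though, the fits generally need to be ML.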


I understand the lack of p values in these models, but I come from traditional labs, so I had to learn how to draw p-value-based inferences from them. There are several methods for this: the likelihood ratio test (LRT) for model comparison, lmerTest for both ANOVA-style and per-predictor inference, etc.
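Here is a small sketch of both routes, again on the built-in sleepstudy data with models I made up for illustration. The LRT compares nested models fit with ML via anova(); the lmerTest part is wrapped in a check because that package may not be installed.

```r
library(lme4)

# Nested models fit with ML (REML = FALSE) so the likelihoods are comparable.
m0 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Likelihood ratio test for the effect of Days.
lrt <- anova(m0, m1)
print(lrt)

# Optional: per-predictor p values with Satterthwaite degrees of freedom.
if (requireNamespace("lmerTest", quietly = TRUE)) {
  m1_test <- lmerTest::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
  print(coef(summary(m1_test)))
}
```

The p value for the comparison sits in the "Pr(>Chisq)" column of the anova() table.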

Logistic Regression

I ended up modeling trial accuracy data, which is a binary outcome variable and thus requires logistic regression models. The implementation wasn’t difficult, but interpreting the results takes practice and care. These links are general tutorials that helped me understand implementation and coefficient interpretation.
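For a feel of the interpretation step, here is a minimal sketch on simulated accuracy data. Everything about the simulation (30 subjects, 40 trials each, the variable names, the effect sizes) is an assumption made up for this example. The point is only that glmer coefficients are on the log-odds scale, so exp() gives odds ratios and plogis() gives probabilities.

```r
library(lme4)

set.seed(1)
n_subj   <- 30   # hypothetical number of participants
n_trials <- 40   # hypothetical trials per participant

d <- data.frame(
  subject   = factor(rep(seq_len(n_subj), each = n_trials)),
  condition = rep(c(0, 1), times = n_subj * n_trials / 2)
)

# Simulate accuracy: subject-specific intercepts plus a condition effect,
# all on the log-odds scale.
subj_int  <- rnorm(n_subj, mean = 0, sd = 0.5)
eta       <- 0.5 + 1.0 * d$condition + subj_int[as.integer(d$subject)]
d$correct <- rbinom(nrow(d), size = 1, prob = plogis(eta))

fit <- glmer(correct ~ condition + (1 | subject), data = d, family = binomial)
b   <- fixef(fit)

# Odds ratio for condition 1 vs. condition 0.
odds_ratio <- exp(b[["condition"]])

# Predicted accuracy for a typical subject (random intercept of 0).
p_cond0 <- plogis(b[["(Intercept)"]])
p_cond1 <- plogis(b[["(Intercept)"]] + b[["condition"]])

print(c(odds_ratio = odds_ratio, p_cond0 = p_cond0, p_cond1 = p_cond1))
```

Note that the same odds ratio translates into different changes in probability depending on the baseline, which is a large part of why interpretation takes care.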

Model Building

I keep getting mixed advice about stepwise model building and its variants. I was taught by a statistician who said stepwise approaches were OK, but I have read otherwise. They may be acceptable for exploratory work (as opposed to confirmatory work), but do what you want. I’ll just post the materials I used to understand these methods.

Model Complexity

When I first started, I wondered how crazy these models can get. Can I just throw every variable in? Are there costs/benefits/limitations to parsimony vs complexity?

Model Fits

Diagnosing whether the model fits well and how to do so is important. This typically involves some form of checking unexplained variance along with examining assumptions.


When the data cannot support the model, or the model is too complex, the fit may fail to converge. This tends to render your estimates unreliable, so it is an important issue to either fix or look into to see how bad it is.
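A few first checks I would sketch for this, using an illustrative sleepstudy model: look for a singular fit, read any stored convergence messages, and refit with a different optimizer to see whether the estimates move. The choice of bobyqa and the maxfun value are just examples, not recommendations from the linked material.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# TRUE when some variance component is estimated at (or very near) zero
# or a correlation sits at +/-1, which often signals an overly complex model.
singular <- isSingular(fit)

# Convergence messages recorded by lme4 (NULL when there were none).
conv_msgs <- fit@optinfo$conv$lme4$messages

# Refit with another optimizer and a larger iteration budget.
fit_bobyqa <- update(
  fit,
  control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
)

# If both optimizers agree closely, a warning is less likely to matter.
print(cbind(default = fixef(fit), bobyqa = fixef(fit_bobyqa)))
```

If the estimates differ noticeably between optimizers, simplifying the random-effects structure is usually the next thing to try.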

Variance Components

I’m currently working on projects where the variance components are of more interest than the betas. The variance components tell you how much the means vary across units of your random effects, e.g., if participant is a random effect, how much the participants’ intercepts vary. Important to this topic are intraclass correlations (ICC) and variance partitioning coefficients (VPC) and how they relate to each other.
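As a minimal sketch, the ICC for a random-intercept model can be computed from the variance components as between-group variance divided by total variance. The intercept-only sleepstudy model below is my own illustrative choice; for this simple model the ICC and VPC coincide.

```r
library(lme4)

# Intercept-only ("null") model: partitions variance into between-subject
# and residual (within-subject) parts.
fit <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

vc <- as.data.frame(VarCorr(fit))
var_between <- vc$vcov[vc$grp == "Subject"]   # variance of subject intercepts
var_within  <- vc$vcov[vc$grp == "Residual"]  # residual variance

# Share of total variance attributable to differences between subjects.
icc <- var_between / (var_between + var_within)
print(icc)
```

Once random slopes are added, the between-group variance depends on the value of the predictor, so a single ICC no longer summarizes the model.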


Related to variance components, the within- and between-subject variance can give you a sense of the reliability of your measure. The within-subject variance is the residual variance that isn’t captured by the model; the between-subject variance is the variance across the random-effect groupings. Not all of the links are mixed-model related, but they may be useful. Note: this is intimately related to the variance components/ICC topic above, so those sources will also help.

Power Analysis

The hardest part (for me) of starting a study is determining power, especially when the analyses consist of complex mixed models. I haven’t fully read through all of these links, but I am aggregating them to read soon.


This is an approach I’m slowly starting to look into: how to make my multilevel models Bayesian. Here are some packages that are helpful.

  • brms package
    • Uses the same R formula syntax as lmer but runs Bayesian estimation.
    • Here is another link for it, tutorial
  • glmer2stan
    • A function like lmer that uses Stan as the backend. A bit hard to use imo.
  • rstanarm
    • Very complete tutorial on how to use this package here.


This is just stuff I learned through the process that may not be directly related to mixed models.


So these are the links I found most useful, and I will update as I continue forward. And when I have more time I will make the links more descriptive as they are cryptic at the moment.

