Brock Ferguson, PhD

Data Scientist, Cognitive Scientist, Web Developer, and Entrepreneur

R Workshop on Using Linear Models, Logistic Regression, and Growth Curve Analyses to Analyze Eye-tracking Data

Update: The workshop tutorials (below) were updated on July 1, 2015 for a workshop at Northwestern University. They now use the dplyr library instead of plyr, elaborate on some important topics such as model comparison, and fix some previously incorrect nomenclature.

A few months ago (December 2014), I was invited by Chris Fennell and Tania Zamuner at the University of Ottawa to lead a one-day workshop that would give the students and faculty in attendance an understanding of growth curve analyses and how they might be used to analyze eye-tracking data.

In order to discuss growth curve models (a special case of a mixed-effects linear model looking at change over time), I knew that we were going to need to start with a more fundamental discussion about R basics, linear models, and generalized linear models.

I therefore put together a 4-part series of tutorials with this goal, organized as follows:

  1. Introduction to R
    What are dataframes and vectors? How do R functions work? How do statistical tests in R work? How can I import and export data?
     
  2. General Linear Models
    How can I fit linear models in R? When should I use aov() and when should I use lm()? How can I interpret parameter estimates (without the help of SPSS...)?
     
  3. Generalized Linear Models
    How can I use generalized linear models (e.g., logistic regression) to do time-based eye-tracking analyses? How can I use empirical logit regression and the arcsin-root transformation to approximate logistic regression? How do mixed-effects models' random effects (intercepts and slopes) work in lmer()? How can I compare nested and non-nested mixed-effects models?
     
  4. Growth Curve Analyses
    How do I look at non-linear change over time? What are the differences between natural and orthogonal polynomials? How can I interpret estimates in a growth curve model versus an empirical logit model? How can I visualize my raw data and model fits simultaneously?
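
To give a rough flavor of a few ideas from parts 3 and 4, here is a minimal sketch using only base R. The toy data, the column names (y, N, elog, wts, asin_root), and the plain weighted lm() are all assumptions made for illustration; they are not taken from the tutorials themselves, which work through the mixed-effects versions with lme4.

```r
# Hypothetical toy data: y = number of looks to the target out of N samples
# in each time bin (made up for illustration only).
bins <- data.frame(
  time = 1:6,
  y    = c(2, 4, 7, 9, 10, 10),
  N    = rep(10, 6)
)

# Empirical logit with the usual 0.5 correction, which stays finite
# even when y is 0 or N, plus the matching variance estimate.
bins$elog <- log((bins$y + 0.5) / (bins$N - bins$y + 0.5))
bins$wts  <- 1 / (bins$y + 0.5) + 1 / (bins$N - bins$y + 0.5)

# Arcsine-root transformation of the raw proportion.
bins$asin_root <- asin(sqrt(bins$y / bins$N))

# Natural (raw) versus orthogonal polynomial time terms.
natural    <- poly(bins$time, degree = 2, raw = TRUE)
orthogonal <- poly(bins$time, degree = 2)

cor(natural[, 1], natural[, 2])        # linear and quadratic terms highly correlated
cor(orthogonal[, 1], orthogonal[, 2])  # essentially zero

# Weighted linear model on the empirical logit with orthogonal time terms;
# bins with less variance receive more weight.
fit <- lm(elog ~ poly(time, 2), data = bins, weights = 1 / wts)
coef(fit)
```

Because the orthogonal terms are uncorrelated, the linear and quadratic estimates can be read independently of one another, whereas with natural polynomials the estimate for each term depends on which other terms are in the model.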

 

We covered a host of packages along the way, including dplyr, ggplot2, and lme4, gradually fitting more complex models and making better visualizations.

It was a wild day (this could honestly be an entire course worth of information) but I got the sense that people left with at least a working understanding of how to use these kinds of analyses on their own experiments.

I've therefore decided to share the workshop online here in case other people might be able to make some use of it.

As I stated in Ottawa, I'm really just trying my best to channel the tutorials and papers that I've read in order to learn these techniques, most notably from Dan Mirman, Dale Barr, and Florian Jaeger -- all credit should go to them while I take responsibility for any errors that might pop up. I also thank Mike Frank and the team behind the new Wordbank project at Stanford for making available the vocabulary data used in the first two tutorials.

If you are interested in this workshop, you can either download the data and code (RMarkdown format, with .html exports from GitHub) or browse the knitted RMarkdown files below, with source code and commentary combined in one file:

  1. Introduction to R
  2. General Linear Models
  3. Generalized Linear Models
  4. Growth Curve Analyses

 

Please let me know by email if you find any mistakes so that I can correct them. Otherwise, I hope you find the workshop useful!