Load required packages.
library(ggplot2)
library(lme4)
## Loading required package: Matrix
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 3.1.3
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Load our eyetracking dataset. These data are from the familiar world trials for 19-month-olds reported in Ferguson, Graf, & Waxman (2014, Cognition).
data <- read.csv('data-eyetracking.csv')
What kind of columns do we need for these analyses?
summary(data)
## X ParticipantName Sex Age
## Min. : 1306 ANCAT60: 1947 F:26290 Min. :18.02
## 1st Qu.:138485 ANCAT74: 1947 M:21456 1st Qu.:18.77
## Median :281476 ANCAT85: 1914 Median :19.40
## Mean :280904 ANCAT39: 1911 Mean :19.67
## 3rd Qu.:421794 ANCAT84: 1903 3rd Qu.:20.84
## Max. :563436 ANCAT77: 1901 Max. :21.86
## (Other):36223
## GazeXAvg GazeYAvg Trial Target
## Min. : 51.0 Min. :251.0 FamiliarBird :7209 Animal :30888
## 1st Qu.: 543.5 1st Qu.:546.5 FamiliarBottle:7893 Artefact:16858
## Median :1224.5 Median :623.0 FamiliarCow :8235
## Mean :1022.1 Mean :632.2 FamiliarDog :7240
## 3rd Qu.:1479.0 3rd Qu.:722.5 FamiliarHorse :8204
## Max. :1861.5 Max. :960.0 FamiliarSpoon :8965
##
## TimeFromSubphaseOnset TrialNumber Animate Inanimate
## Min. : 0 Min. :1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1250 1st Qu.:2.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :2533 Median :4.000 Median :1.0000 Median :0.0000
## Mean :2605 Mean :3.495 Mean :0.6552 Mean :0.3448
## 3rd Qu.:3917 3rd Qu.:5.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :5500 Max. :6.000 Max. :1.0000 Max. :1.0000
##
This dataset is only a few steps removed from what comes out of a standard eyetracker, though I have done some prep.
The first approach we'll take is to aggregate the data and run ANOVAs.
Pros:
* simple
* allows for generalization beyond the sampled participants and sampled items
Cons:
* we lose all timing information
* what do we do with ambiguities between analyses?
We need to aggregate across trials by target within participants:
agg_subjects <- data %>%
group_by(ParticipantName,Sex,Age,Target) %>%
summarise(PropAnimal = mean(Animate)) %>%
ungroup()
Visualize our aggregated data:
ggplot(agg_subjects, aes(x=Target, y=PropAnimal)) +
geom_point(position=position_jitter(.3))
Use our best practices from the lm() tutorial to prepare and model these data:
agg_subjects$TargetCoded <- ifelse(agg_subjects$Target == 'Artefact', -.5, .5)
There’s no need to center here because these data are balanced (every subject is present in both conditions), but this is how we would center anyway:
agg_subjects$TargetCoded <- scale(agg_subjects$TargetCoded, center=T, scale=F)
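As a quick sanity check (just a sketch; nothing here is required for the analysis), we can confirm the coded predictor behaves as intended:
# The centered code should average (roughly) zero, and the two conditions
# should still sit one unit apart (-.5 vs. .5 in this balanced design).
mean(agg_subjects$TargetCoded)
table(agg_subjects$Target, agg_subjects$TargetCoded)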
Here we use aov() because it allows for a repeated-measures Error() term. As we learned before, aov() uses Type I sums of squares but, with only one factor (i.e., no correlation issues), it’s safe.
model <- aov(PropAnimal ~ TargetCoded + Error(ParticipantName/TargetCoded), data = agg_subjects)
summary(model)
##
## Error: ParticipantName
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 29 0.5261 0.01814
##
## Error: ParticipantName:TargetCoded
## Df Sum Sq Mean Sq F value Pr(>F)
## TargetCoded 1 1.9671 1.9671 94.06 1.32e-10 ***
## Residuals 29 0.6065 0.0209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Looks good! That’s our F1 “subjects” ANOVA.
Now we can do an F2 “items” ANOVA as well. This just involves slightly changing our group_by() call:
agg_items <- data %>%
group_by(Trial,Target) %>%
summarise(PropAnimal = mean(Animate)) %>%
ungroup()
agg_items$TargetCoded <- ifelse(agg_items$Target == 'Artefact', -.5, .5)
Visualize effects by items:
ggplot(agg_items, aes(x=Target, y=PropAnimal, fill=Trial)) +
geom_point(position=position_jitter(.3))
Normally, in an F2 analysis, we would include an Error() term because we would have observed each condition within each item and thus have a sense of the size of the condition effect for each item. However, this was a study with infants and we couldn’t do that. So we won’t include an Error() term here; we’ll just run a one-way between-items ANOVA.
model <- aov(PropAnimal ~ TargetCoded, data = agg_items)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## TargetCoded 1 0.17868 0.17868 150.2 0.000254 ***
## Residuals 4 0.00476 0.00119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These F1/F2 analyses are both crystal clear (reject the null!). But, in other cases, there can be ambiguity. For example, what if one is significant and the other is marginal?
Ideally, we could have one test which allows us to control for random trial AND subject factors simultaneously. Enter lmer()…
Aggregate data by Trials (Items) and Participants (i.e., one datapoint for each trial).
agg_sub_items <- data %>%
group_by(ParticipantName,Trial,Target) %>%
summarise(PropAnimal = mean(Animate)) %>%
ungroup()
agg_sub_items$TargetCoded <- ifelse(agg_sub_items$Target == 'Artefact', -.5, .5)
Fit a mixed-effects model allowing the intercept (represented by a “1”) to vary by both Participants and Trials. By allowing the intercept to vary by subjects and items, we are allowing the model to estimate (and thus control for) each participant’s and each trial’s mean tendency to elicit looking to the animal regardless of condition. For example, Billy may just love animals while Sammy may hate them. Importantly, we want to know whether the word participants heard (represented here by TargetCoded) caused them to look more or less to the animal above and beyond these baseline preferences.
model <- lmer(PropAnimal ~ TargetCoded + (1 | ParticipantName) + (1 | Trial), data = agg_sub_items)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: PropAnimal ~ TargetCoded + (1 | ParticipantName) + (1 | Trial)
## Data: agg_sub_items
##
## REML criterion at convergence: -75
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5394 -0.7203 0.1475 0.7599 2.2517
##
## Random effects:
## Groups Name Variance Std.Dev.
## ParticipantName (Intercept) 2.932e-04 0.017123
## Trial (Intercept) 1.041e-05 0.003226
## Residual 3.517e-02 0.187543
## Number of obs: 169, groups: ParticipantName, 30; Trial, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.60097 0.01552 38.73
## TargetCoded 0.36636 0.03040 12.05
##
## Correlation of Fixed Effects:
## (Intr)
## TargetCoded -0.295
This looks good, and converges with our previous estimate of a significant effect.
Note that, in the Random effects section of the summary output, we can see that there is very little variance in our random effect estimates. This means that the model is having trouble estimating them and is therefore keeping them all very near zero.
This is likely the result of (a) small random differences between subjects and trials in this sample and (b) a relatively small dataset. We will want to continue to keep an eye on the variance of the random effects – when it is essentially 0, we may want to consider using a regular ANOVA.
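If you want to inspect those variances directly rather than scanning the full summary() output, VarCorr() extracts just the random-effect variances and standard deviations (a small sketch):
# Just the random-effect (co)variance estimates from the fitted model.
VarCorr(model)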
But in the meantime let’s dive into this model a bit more.
We can see the fixed effects:
fixef(model)
## (Intercept) TargetCoded
## 0.6009708 0.3663638
Here, because we centered the variable, the Intercept is the overall mean looking to the animal. TargetCoded represents the difference in looking to the animal between our two conditions. If we subtract 1/2 of it from the intercept, we get our mean looking to the animal when the “Artefact” was named; if we add 1/2 of it to the intercept, we get our mean looking to the animal when the “Animal” was named.
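As a quick sketch of that arithmetic, using the fixef() values printed above:
# Reconstruct the two condition means from the fixed effects.
fe <- fixef(model)
fe['(Intercept)'] - .5 * fe['TargetCoded'] # ~.42: looking to the animal when 'Artefact' was named
fe['(Intercept)'] + .5 * fe['TargetCoded'] # ~.78: looking to the animal when 'Animal' was named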
We can also see random effects (some people use these to describe individual differences):
ranef(model)
## $ParticipantName
## (Intercept)
## ANCAT139 3.067738e-05
## ANCAT18 -7.579061e-03
## ANCAT22 -1.976854e-03
## ANCAT23 5.495601e-03
## ANCAT26 6.284477e-04
## ANCAT39 -6.892545e-04
## ANCAT45 -3.207405e-04
## ANCAT50 7.079541e-03
## ANCAT53 1.759116e-03
## ANCAT55 3.962751e-03
## ANCAT58 -9.323336e-04
## ANCAT59 -8.403474e-03
## ANCAT60 3.787870e-03
## ANCAT64 2.168447e-03
## ANCAT69 3.883428e-03
## ANCAT72 6.344088e-04
## ANCAT74 -4.849614e-03
## ANCAT75 -7.808166e-05
## ANCAT76 -1.717914e-03
## ANCAT77 -2.516944e-03
## ANCAT79 -1.341781e-03
## ANCAT80 -5.990168e-04
## ANCAT81 2.998120e-03
## ANCAT82 1.101647e-04
## ANCAT84 6.864471e-04
## ANCAT85 1.635357e-03
## ANCAT86 9.915964e-04
## ANCAT88 4.561981e-03
## ANCAT90 -4.401428e-03
## ANCAT91 -5.007457e-03
##
## $Trial
## (Intercept)
## FamiliarBird -7.404748e-05
## FamiliarBottle -3.025787e-04
## FamiliarCow 1.469791e-04
## FamiliarDog 2.172776e-04
## FamiliarHorse -2.902092e-04
## FamiliarSpoon 3.025787e-04
Here they are nearly zero, which corresponds to our assessment of their variance earlier.
We can also see what the model’s predictions were, for each participant/item. Here we will get the mean prediction for each subject for each type of Target trial.
agg_sub_items$prediction <- predict(model, agg_sub_items)
ggplot(agg_sub_items, aes(x=Target, y=prediction, color=ParticipantName)) +
stat_summary(fun.y='mean', geom='point', position=position_jitter(.3))
You can see that the model is making different predictions for each subject. By allowing their intercepts to vary, the model is accounting for the fact that some kids just like looking at animals more than others. But it’s being very conservative with its estimates because, frankly, we haven’t given it much data to go on.
Importantly, with these models, we aren’t limited to varying just the intercept by specific grouping factors. We can also vary slopes (i.e., fixed effects) by participants as well. By adding “TargetCoded” as a random slope, we allow the model to vary the magnitude of the difference between Target conditions within participants.
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## (1 | Trial)
## Data: agg_sub_items
##
## REML criterion at convergence: -79.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6244 -0.5968 0.1228 0.8048 1.5375
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ParticipantName (Intercept) 0.0019166 0.04378
## TargetCoded 0.0176256 0.13276 -0.62
## Trial (Intercept) 0.0001628 0.01276
## Residual 0.0302269 0.17386
## Number of obs: 169, groups: ParticipantName, 30; Trial, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.60139 0.01709 35.18
## TargetCoded 0.36690 0.03875 9.47
##
## Correlation of Fixed Effects:
## (Intr)
## TargetCoded -0.392
We now see an additional random effect for the ParticipantName group: TargetCoded. We also see that it is actually varying between subjects, suggesting the model is becoming more confident in the differences between our subjects.
Look at which participants responded more or less strongly to the Target manipulation.
These random effects are centered with respect to the estimated fixed effect. Therefore, a negative value represents a weaker effect of TargetCoded for that participant. This could be a cool way to look at individual differences – who is responding most accurately to the words spoken?
ranef(model)
## $ParticipantName
## (Intercept) TargetCoded
## ANCAT139 -0.003371942 0.021645329
## ANCAT18 -0.047720934 0.088921906
## ANCAT22 -0.020133825 0.064220849
## ANCAT23 0.045645607 -0.120935077
## ANCAT26 0.010818645 -0.042459026
## ANCAT39 -0.031247615 0.143187095
## ANCAT45 0.005218355 -0.033157153
## ANCAT50 0.033985870 -0.030651227
## ANCAT53 0.024120929 -0.089905738
## ANCAT55 0.003523923 0.060704519
## ANCAT58 -0.027835325 0.121156739
## ANCAT59 -0.022980040 -0.052124161
## ANCAT60 0.057991830 -0.217142750
## ANCAT64 -0.005382758 0.069826623
## ANCAT69 -0.004259358 0.106035660
## ANCAT72 0.015638286 -0.066617596
## ANCAT74 -0.020874966 0.008064614
## ANCAT75 0.024264332 -0.124261091
## ANCAT76 -0.017811306 0.056915313
## ANCAT77 -0.021282991 0.056586355
## ANCAT79 0.023567163 -0.145922051
## ANCAT80 -0.003059242 0.002995218
## ANCAT81 0.002543200 0.046429263
## ANCAT82 0.016859287 -0.083210843
## ANCAT84 -0.011182862 0.069521464
## ANCAT85 -0.008033329 0.072558905
## ANCAT86 -0.004961929 0.046416624
## ANCAT88 0.018437049 0.001172374
## ANCAT90 -0.025136982 0.038460872
## ANCAT91 -0.007339073 -0.068433009
##
## $Trial
## (Intercept)
## FamiliarBird -0.001112706
## FamiliarBottle -0.004764037
## FamiliarCow 0.002354559
## FamiliarDog 0.003379948
## FamiliarHorse -0.004621802
## FamiliarSpoon 0.004764037
subject_slopes <- ranef(model)$ParticipantName
subject_slopes$subject <- factor(rownames(subject_slopes))
colnames(subject_slopes) <- c('Baseline','ConditionEffect','Subject')
ggplot(subject_slopes, aes(x=Subject, y=ConditionEffect)) +
geom_point() +
geom_text(aes(label=Subject),hjust=0,vjust=0,size=3) +
geom_hline(yintercept=0, linetype="dashed", alpha=.5) +
theme(axis.text.x=element_blank())
With this random slope included in the model, we see more variation now in our predictions by subjects. It’s even picking up on the fact that participants consistently look to the animal and the real variance lies in whether they look away from the animal when the artefact is named.
agg_sub_items$prediction <- predict(model, agg_sub_items)
ggplot(agg_sub_items, aes(x=Target, y=prediction, color=ParticipantName)) +
stat_summary(fun.y='mean', geom='point', position=position_jitter(.3))
Although you’ve just played around with your first two mixed-effects models, you’re likely already asking yourself two pressing questions…
If you are going to use mixed-effects models, you are going to require AT LEAST a random intercept for every natural “grouping” of data. However, beyond random intercepts, what random slopes should you include?
One way you can decide this is by adhering to guidelines. Unfortunately, you’ll find guidelines pull you in opposing directions.
For example, one school of thought (see Barr et al., 2013) is to “keep it maximal”. That is, include every random effect that your experimental design permits (i.e., every factor that varies within subjects or within items). Another school of thought (see Bates et al., 2015) is to keep it parsimonious: don’t overcomplicate your models with lots of random slopes, as this will make model estimates increasingly hard to reconcile with the data and risk overparameterizing your model:
# commented out because it kills knitr
# model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded + Trial | ParticipantName) + (1 | Trial), data = agg_sub_items)
A second way to decide is to think bottom-up from the data. Compare two models – one with your random slope and another without your random slope – and see if your random slope model is actually a better fit.
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
model_null <- lmer(PropAnimal ~ TargetCoded + (1 | ParticipantName) + (1 | Trial), data = agg_sub_items)
anova(model,model_null) # -2 log-likelihood ratio test, gives you Chisq(df) = X and a p-value.
## refitting model(s) with ML (instead of REML)
## Data: agg_sub_items
## Models:
## model_null: PropAnimal ~ TargetCoded + (1 | ParticipantName) + (1 | Trial)
## model: PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## model: (1 | Trial)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model_null 5 -76.736 -61.086 43.368 -86.736
## model 7 -76.433 -54.524 45.217 -90.433 3.6973 2 0.1575
We’ll talk more about model comparison – a very powerful tool – in just a minute.
Another bottom-up approach is to look at the variance of the random effect estimates (like we did before) rather than thinking dichotomously about the “significance” of the random slope.
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## (1 | Trial)
## Data: agg_sub_items
##
## REML criterion at convergence: -79.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6244 -0.5968 0.1228 0.8048 1.5375
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ParticipantName (Intercept) 0.0019166 0.04378
## TargetCoded 0.0176256 0.13276 -0.62
## Trial (Intercept) 0.0001628 0.01276
## Residual 0.0302269 0.17386
## Number of obs: 169, groups: ParticipantName, 30; Trial, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.60139 0.01709 35.18
## TargetCoded 0.36690 0.03875 9.47
##
## Correlation of Fixed Effects:
## (Intr)
## TargetCoded -0.392
A final approach – and my preferred approach – is to combine the bottom-up and top-down approaches to ask yourself, what random effects can my design actually allow me to estimate with reasonable precision? Here, by “precision”, I mean precision with respect to the subject’s true random effect.
For example, if I had only two trials (one in which I named an animal, and the other in which I named an artefact), I could calculate each subject’s exact random slope for the effect of Target. This is a very precise estimate, but it’s artificially precise – there’s no chance I’ve accurately captured this subject’s responsiveness to which target was named. With a handful of trials of each type, I’m less precise in accounting for my exact data but I’m more precise in estimating their true responsiveness.
Think about your design and what it can estimate. Include those variables that it estimates well as random slopes. Ignore those variables that you can’t estimate well and even consider collapsing across those observations which you aren’t distinguishing with a random effect.
Unlike lm() and aov(), lmer() doesn’t generate p-values… It’s unclear, in these kinds of models, how to calculate the degrees of freedom – required by the ‘t’ statistic – in order to get a p-value.
Nevertheless, we have several options for doing so:
Fit your model with the factor of interest:
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
Fit a second model without this factor:
model_null <- lmer(PropAnimal ~ 1 + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
anova(model,model_null)
## refitting model(s) with ML (instead of REML)
## Data: agg_sub_items
## Models:
## model_null: PropAnimal ~ 1 + (1 + TargetCoded | ParticipantName) + (1 | Trial)
## model: PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## model: (1 | Trial)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model_null 6 -57.654 -38.875 34.827 -69.654
## model 7 -76.433 -54.524 45.217 -90.433 20.779 1 5.154e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this test, the degrees of freedom represent the difference between the number of parameters in the models. Note that this will be [NUMBER OF LEVELS - 1] for a removed factor, i.e., it can be >1 for a single factor.
For many factors, you can save time by automating the removal of each factor with drop1():
drop1(model,~.,test="Chisq")
## Single term deletions
##
## Model:
## PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## (1 | Trial)
## Df AIC LRT Pr(Chi)
## <none> -76.433
## TargetCoded 1 -57.654 20.779 5.154e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So what’s really going on when we compare these models?
The test we are actually running is called a -2 log-likelihood ratio test. The gist of this approach is that it compares the likelihood of the data under the full model (including our parameter) to the likelihood of the data under our null model (without our parameter). If the difference is great enough, relative to the difference in the number of parameters between the models (the degrees of freedom of the test), then we can say it is significant.
Let’s do one by hand. We begin by re-fitting both of our models with the option REML=FALSE. anova() and drop1() do this automatically for us but, because we are going to do this by hand, we need to do it manually.
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items, REML=F)
model_null <- lmer(PropAnimal ~ 1 + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items, REML=F)
Second, let’s extract the log likelihood of the data under the full model and null model:
log_model <- logLik(model)
log_model
## 'log Lik.' 45.21661 (df=7)
log_null <- logLik(model_null)
log_null
## 'log Lik.' 34.82699 (df=6)
Higher is better for log-likelihoods. Higher is more likely.
Now calculate -2 times the log of the ratio of these two likelihoods (we exponentiate the log-likelihoods first to get back to likelihoods):
-2*log(exp(log_null)/exp(log_model))
## 'log Lik.' 20.77924 (df=6)
We are now left with a log-likelihood ratio test statistic of 20.77924. We can see whether this is significant by checking it against the Chi-square distribution with degrees of freedom equal to the difference in the number of parameters between our two models. The df’s can be retrieved from the values reported by logLik() (their difference), or calculated as (factors removed) * (levels per factor - 1).
log_model
## 'log Lik.' 45.21661 (df=7)
log_null
## 'log Lik.' 34.82699 (df=6)
# 1 df
pchisq(20.77924, df=1, lower.tail=F)
## [1] 5.153862e-06
Note that this technique is only appropriate for nested models – i.e., comparing a null model which is some subset of a larger model. To compare non-nested models, you should use another criterion such as AIC or BIC (also reported by drop1()).
For example, is what word the subjects heard a better predictor than a random factor I create?
agg_sub_items$RandomFactor <- ifelse(rbinom(nrow(agg_sub_items),1,.5) == 1, 'On','Off')
model1 <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items, REML=F)
model2 <- lmer(PropAnimal ~ RandomFactor + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items, REML=F)
anova(model1,model2)
## Data: agg_sub_items
## Models:
## model1: PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## model1: (1 | Trial)
## model2: PropAnimal ~ RandomFactor + (1 + TargetCoded | ParticipantName) +
## model2: (1 | Trial)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model1 7 -76.433 -54.524 45.217 -90.433
## model2 7 -55.771 -33.861 34.885 -69.771 0 0 1
For BIC and AIC, lower is better (unlike log-likelihoods). Each is a score reflecting the fit of the model after penalizing it for the number of parameters; BIC carries a stricter penalty. Interpreting the degree of difference in fit involves agreed-upon standards: for example, a BIC difference of less than 2 isn’t much evidence, a difference of 2-6 is positive evidence, 6-10 is strong, and more than 10 is very strong.
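As a quick sketch using the two models fit just above (model1 with TargetCoded, model2 with the meaningless RandomFactor), you can pull these criteria out directly:
# Extract the information criteria from each fitted (ML) model.
AIC(model1); AIC(model2)
BIC(model1); BIC(model2)
# In this run the BIC difference is roughly 20 points in favour of the
# TargetCoded model -- very strong evidence by the standards above.
BIC(model2) - BIC(model1)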
Another option is to treat the t-values reported by lmer() as z-statistics and compute p-values from the normal distribution. This is not as dangerous as it sounds with sufficient sample sizes.
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
ts <- fixef(model)/sqrt(diag(vcov(model)))
ts
## (Intercept) TargetCoded
## 35.181667 9.469065
2*pnorm(abs(ts),lower.tail=FALSE)
## (Intercept) TargetCoded
## 3.813701e-271 2.823611e-21
confint(model)
## Computing profile confidence intervals ...
## Warning in profile.merMod(object, signames = oldNames, ...): non-monotonic
## profile
## Warning in zetafun(np, ns): NAs detected in profiling
## Warning in profile.merMod(object, signames = oldNames, ...): non-monotonic
## profile
## 2.5 % 97.5 %
## .sig01 0.0000000 0.08577931
## .sig02 -1.0000000 1.00000000
## .sig03 0.0000000 Inf
## .sig04 0.0000000 0.05231130
## .sigma 0.1534425 0.19948714
## (Intercept) 0.5675518 0.63533560
## TargetCoded 0.2915550 0.44279942
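If the profile intervals are slow or produce warnings (as above), a quicker but rougher alternative is Wald confidence intervals, which confint() also supports; these are based on the estimated standard errors and are only informative for the fixed effects:
# Wald intervals: fast, fixed effects only (random-effect parameters come back as NA).
confint(model, method='Wald')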
Kenward-Roger approximation:
library(pbkrtest)
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
model_null <- lmer(PropAnimal ~ 1 + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
KRmodcomp(model,model_null)
## Note: method with signature 'sparseMatrix#ANY' chosen for function 'kronecker',
## target signature 'dgCMatrix#ngCMatrix'.
## "ANY#sparseMatrix" would also be valid
## F-test with Kenward-Roger approximation; computing time: 0.50 sec.
## large : PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## (1 | Trial)
## small : PropAnimal ~ 1 + (1 + TargetCoded | ParticipantName) + (1 | Trial)
## stat ndf ddf F.scaling p.value
## Ftest 89.5007 1.0000 7.7542 1 1.581e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
detach('package:pbkrtest',unload=TRUE)
Satterthwaite approximation:
library(lmerTest)
##
## Attaching package: 'lmerTest'
##
## The following object is masked from 'package:lme4':
##
## lmer
##
## The following object is masked from 'package:stats':
##
## step
model <- lmer(PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) + (1 | Trial), data = agg_sub_items)
summary(model)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [merModLmerTest]
## Formula: PropAnimal ~ TargetCoded + (1 + TargetCoded | ParticipantName) +
## (1 | Trial)
## Data: agg_sub_items
##
## REML criterion at convergence: -79.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6244 -0.5968 0.1228 0.8048 1.5375
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ParticipantName (Intercept) 0.0019166 0.04378
## TargetCoded 0.0176256 0.13276 -0.62
## Trial (Intercept) 0.0001628 0.01276
## Residual 0.0302269 0.17386
## Number of obs: 169, groups: ParticipantName, 30; Trial, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.60139 0.01709 5.50900 35.182 1.07e-07 ***
## TargetCoded 0.36690 0.03875 8.09500 9.469 1.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## TargetCoded -0.392
detach('package:lmerTest',unload=TRUE)
All of the analyses thus far have disregarded time as a variable of interest. Yet we know that looking changes over time and, moreover, that this change over time can offer insight into the cognitive processes underlying participants’ behaviour. So how can we look at behaviour over time?
One way to do this is to use logistic regression (family = "binomial") on our raw dataset, predicting the binomial value of looking to the Animate or not (which is either a 1 or 0). This model estimates the log-odds (i.e., probability in logit space) of looking at the animal from the fixed and random factors specified, including a factor for TimeFromSubphaseOnset.
Logistic regression solves a problem we are now going to start taking seriously. Until now, we were predicting bounded, non-linear proportional data using a linear model. With logistic regression, and future linear models in which we employ transformations, we are going to correct for this.
When we do logistic regression, we receive our slope estimates in logits. These represent the log-odds of getting a 1 versus a 0 based on some factor. Here’s a plot showing how these relate to proportions (how we typically think of probabilities):
probabilities <- seq(0.1,0.9,by=.1)
log_odds <- log(probabilities/(1-probabilities))
plot(probabilities, log_odds)
You can see that the probability of .8 roughly equals a log-odds or logit of 1.39. You can take the exponent of this number to calculate the odds when the probability is .8…
exp(1.39)
## [1] 4.01485
… and you get roughly 4:1 odds, as you would expect.
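Going back the other way – from a logit to a proportion – is just the inverse logit. A quick sketch:
# A log-odds of 1.39 maps back onto a proportion of roughly .80.
exp(1.39) / (1 + exp(1.39))
# plogis(1.39) is the built-in shortcut for the same inverse-logit calculation.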
If probabilities/proportions are so easily transformed into logits/log-odds, then why bother doing it? Well, because we are doing LINEAR modeling and log-odds are linear but proportions are not. That is, proportions have a restricted range of 0-1 thus causing ceiling and floor effects. Log-odds, on the other hand, do not.
To illustrate: A 2-to-1 increase in odds has a constant, linear effect on the log-odds regardless of where you are in logit space.
log(8/1) - log(4/1)
## [1] 0.6931472
log(16/1) - log(8/1)
## [1] 0.6931472
In contrast, a .1 increase in probabilities has a non-constant, non-linear effect in probability space:
probabilities <- seq(0.1,0.9,by=.1)
odds <- (probabilities/(1-probabilities))
plot(probabilities, odds)
This is why we see a bend at the tails when comparing proportions to log-odds – floor/ceiling effects mean that small differences in proportions correspond to larger differences in actual odds.
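To put numbers on that, here is a small sketch: the same .10 step in proportions is a much larger multiplicative step in odds near the ceiling than in the middle of the range.
# Odds ratio for moving from .85 to .95 (near the ceiling)...
(.95/.05) / (.85/.15) # ~3.4
# ...versus moving from .45 to .55 (middle of the range).
(.55/.45) / (.45/.55) # ~1.5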
Okay, that said, let’s run a logistic regression model:
data$TargetC <- ifelse(data$Target == 'Animal', .5, -.5)
data$TargetC <- scale(data$TargetC, center=T, scale=F)
# note: we are not going to Center TimeFromSubphaseOnset for now
model <- glmer(Animate ~ TargetC*TimeFromSubphaseOnset + (1 | Trial) + (1 | ParticipantName), data = data, family="binomial")
## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00236185
## (tol = 0.001, component 3)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
We are getting some warnings here. R suggests we rescale our variables (likely because TargetC ranges from -.5 to .5 while TimeFromSubphaseOnset ranges from 0 to 5500) and warns that the model failed to converge.
Let’s try fixing the scale issue…
data$TimeS <- data$TimeFromSubphaseOnset / 1000
model <- glmer(Animate ~ TargetC*TimeS + (1 | Trial) + (1 | ParticipantName), data = data, family="binomial")
summary(model)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: Animate ~ TargetC * TimeS + (1 | Trial) + (1 | ParticipantName)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 51965.4 52018.0 -25976.7 51953.4 47740
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4831 -0.7901 0.3924 0.6192 2.1863
##
## Random effects:
## Groups Name Variance Std.Dev.
## ParticipantName (Intercept) 0.22544 0.4748
## Trial (Intercept) 0.02454 0.1567
## Number of obs: 47746, groups: ParticipantName, 30; Trial, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.467259 0.110219 13.31 <2e-16 ***
## TargetC 2.112652 0.142677 14.81 <2e-16 ***
## TimeS -0.263791 0.007131 -36.99 <2e-16 ***
## TargetC:TimeS -0.144949 0.013950 -10.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) TargtC TimeS
## TargetC -0.004
## TimeS -0.184 -0.072
## TargetC:TmS -0.047 -0.269 0.197
No warnings this time! We’ll hold off on adding random slopes to this model for now (it would be really slow).
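For reference, a by-participant random-slope version might look like the commented sketch below. This exact random-effects structure is just an illustration (it isn’t fit anywhere in this tutorial), and it stays commented out because it is very slow on the full sample-by-sample dataset.
# Not run (slow): allow the TargetC and TimeS effects to vary by participant.
# model_slopes <- glmer(Animate ~ TargetC*TimeS +
#                         (1 + TargetC*TimeS | ParticipantName) + (1 | Trial),
#                       data = data, family="binomial")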
Let’s interpret these effects - what do main effects and interactions mean? Note that we centered and deviation-coded our Target variable but not our Time variable.
Therefore, the TargetC estimate represents the increase in the log-odds of looking to the animal between our Artefact and Animal conditions and, because it’s positive, infants look more to the animal when it is named than when the artefact is named. Because we did not center TimeS before fitting this model, this effect of TargetC is the model’s estimate at the beginning of the trial. Although we want a main effect, we know from the raw data that we should not be seeing this effect at the very beginning of the trial; instead, it should emerge over the trial as a Time*Target interaction.
The effect of TimeS represents the main effect of looking away from the Animal over time.
The Intercept represents the log-odds of looking at the animal at TimeS == 0 (regardless of condition).
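To make the intercept easier to read, we can convert it from log-odds back to a probability (a quick sketch using the fitted model):
# ~1.47 logits at TimeS == 0 corresponds to roughly an .81 probability of
# looking at the animal at trial onset, averaged across conditions.
plogis(fixef(model)['(Intercept)'])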
Visualize the fitted model:
data$prediction <- predict(model, data)
# in logit space...
ggplot(data, aes(x=TimeS, y=prediction, color=Target)) +
stat_summary(fun.y='mean', geom='line')
# as proportions...
data$prediction_prob <- exp(data$prediction) / (1 + exp(data$prediction) )
ggplot(data, aes(x=TimeS, y=prediction_prob, color=Target)) +
stat_summary(fun.y='mean', geom='line')
At first glance, three things are worth mentioning:
The proportion plot is slightly warped. This represents the warping that occurs in the transformation between probabilities and log-odds.
The model predictions look noisy. Why? Model predictions should be clean and constant given that we only have two factors and an interaction.
The problem is that we generated predictions for our datapoints that included the random effects. So, taking the mean of the predictions included idiosyncratic estimates for our participants. To remove these, we can simply pass the re.form=NA argument to the predict() function:
data$prediction <- predict(model, data, re.form=NA)
ggplot(data, aes(x=TimeS, y=prediction, color=Target)) +
stat_summary(fun.y='mean', geom='line')
ggplot(data, aes(x=TimeS, y=Animate, color=Target)) +
stat_summary(fun.y='mean', geom='line') +
stat_summary(aes(y=exp(prediction)/(1+exp(prediction))), fun.y='mean', geom='line')
We fit two straight lines to two curves which clearly change in a non-linear fashion over time. This not only results in this poor visual fit but also, for example, in seeing a main effect of TargetC at timepoint 0 when we know the groups to be very similar at the start of the trial.
Therefore, we need a model that will let us capture this non-linear growth over time, and tell us how our two conditions differ with respect to it (growth curves!).
Sometimes you just need to aggregate your data in a way where you must predict proportions even though your raw measure was binomial (i.e., logistic) in nature. For time analyses of eye-tracking data, this is especially common. If you aggregate across trials/items/subjects in any way, you are going to lose the sample-by-sample binomial responses collected from subjects.
When this occurs, you’ll want to implement a correction which takes the proportional data as input and returns a DV appropriate for linear models as output.
Here we’ll walk through a couple of transformations, after binning our data by subjects across time.
# bin data into 50ms bins
data$Bin <- data$TimeFromSubphaseOnset %/% 50
# equivalent: data$Bin <- floor(data$TimeFromSubphaseOnset / 50)
head(data[, c('TimeFromSubphaseOnset','Bin')])
## TimeFromSubphaseOnset Bin
## 1 0 0
## 2 17 0
## 3 33 0
## 4 50 1
## 5 67 1
## 6 83 1
Calculate the proportion of looking to the animal (PropAnimal), as well as the number of samples looking to the Animal (y), and the total number of samples (N), and the start of the Bin in ms (Time).
Of course, we’ll lose the Trial grouping variable/random effect, because we aggregate across trials.
binned <- data %>%
group_by(ParticipantName,Target,Bin) %>%
summarise(PropAnimal = mean(Animate), y=sum(Animate), N=length(Animate), TimeS=min(TimeS))
One transformation that will work for us is called the empirical logit (calculated as the log-odds with a constant value to keep from getting -Infinity):
binned$elog <- log( (binned$y + .5) / (binned$N - binned$y + .5) )
Fit an unweighted empirical logit model:
model <- lmer(elog ~ Target*TimeS + (1 | ParticipantName), data = binned)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: elog ~ Target * TimeS + (1 | ParticipantName)
## Data: binned
##
## REML criterion at convergence: 24967
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.60965 -0.89280 0.07857 0.78797 2.41023
##
## Random effects:
## Groups Name Variance Std.Dev.
## ParticipantName (Intercept) 0.2608 0.5107
## Residual 2.6410 1.6251
## Number of obs: 6526, groups: ParticipantName, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.41017 0.10895 22.121
## TargetArtefact -2.34292 0.07963 -29.423
## TimeS -0.32461 0.01790 -18.139
## TargetArtefact:TimeS 0.15459 0.02523 6.128
##
## Correlation of Fixed Effects:
## (Intr) TrgtAr TimeS
## TargtArtfct -0.366
## TimeS -0.446 0.610
## TrgtArtf:TS 0.316 -0.863 -0.709
Notice the empirical logit estimates are very similar to our actual logits from the previous logistic model.
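As an aside, a weighted empirical logit model (cf. Barr, 2008) is a common variant that gives bins with more samples more influence via inverse-variance weights. This is only a sketch – it isn’t used in the rest of this tutorial, and the names elog_wts and model_weighted are mine:
# Inverse of the approximate variance of each bin's empirical logit.
binned$elog_wts <- 1 / (1/(binned$y + .5) + 1/(binned$N - binned$y + .5))
model_weighted <- lmer(elog ~ Target*TimeS + (1 | ParticipantName),
                       data = binned, weights = elog_wts)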
We can also account for Target and TimeS random slopes by Participant. Accounting for Time as a random effect is always recommended – there’s lots of individual noise here.
model <- lmer(elog ~ Target*TimeS + (1 + Target + TimeS | ParticipantName), data = binned)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: elog ~ Target * TimeS + (1 + Target + TimeS | ParticipantName)
## Data: binned
##
## REML criterion at convergence: 23919.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.92264 -0.79136 0.08027 0.73641 2.52996
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ParticipantName (Intercept) 0.77181 0.8785
## TargetArtefact 1.16611 1.0799 -0.55
## TimeS 0.06879 0.2623 -0.63 0.00
## Residual 2.18646 1.4787
## Number of obs: 6526, groups: ParticipantName, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.39639 0.16842 14.228
## TargetArtefact -2.33756 0.21006 -11.128
## TimeS -0.31821 0.05061 -6.288
## TargetArtefact:TimeS 0.15481 0.02302 6.725
##
## Correlation of Fixed Effects:
## (Intr) TrgtAr TimeS
## TargtArtfct -0.565
## TimeS -0.657 0.070
## TrgtArtf:TS 0.186 -0.297 -0.230
Another transformation we can use, beside the empirical logit, is the Arcsine-root transformation. Just like logits and empirical logits, it “stretches out” the ceiling and floor of the distribution, transforming the natural logistic curve inherent to probabilities to be appropriate for linear models.
asin(sqrt(.5))
## [1] 0.7853982
asin(sqrt(.6))
## [1] 0.8860771
asin(sqrt(.95))
## [1] 1.345283
asin(sqrt(.99))
## [1] 1.470629
plot(seq(.0,1,by=.1),asin(sqrt(seq(0,1,by=.1))))
binned$Arcsin <- asin(sqrt(binned$PropAnimal))
head(binned)
## Source: local data frame [6 x 9]
## Groups: ParticipantName, Target
##
## ParticipantName Target Bin PropAnimal y N TimeS elog Arcsin
## 1 ANCAT139 Animal 1 1 3 3 0.05 1.94591 1.570796
## 2 ANCAT139 Animal 2 1 3 3 0.10 1.94591 1.570796
## 3 ANCAT139 Animal 3 1 3 3 0.15 1.94591 1.570796
## 4 ANCAT139 Animal 4 1 3 3 0.20 1.94591 1.570796
## 5 ANCAT139 Animal 5 1 3 3 0.25 1.94591 1.570796
## 6 ANCAT139 Animal 6 1 3 3 0.30 1.94591 1.570796
model <- lmer(Arcsin ~ Target*TimeS + (1 + Target + TimeS | ParticipantName), data = binned)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Arcsin ~ Target * TimeS + (1 + Target + TimeS | ParticipantName)
## Data: binned
##
## REML criterion at convergence: 8826.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3571 -0.7801 0.0671 0.7049 2.7309
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ParticipantName (Intercept) 0.062215 0.24943
## TargetArtefact 0.108536 0.32945 -0.48
## TimeS 0.007467 0.08641 -0.68 0.00
## Residual 0.216231 0.46501
## Number of obs: 6526, groups: ParticipantName, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.39616 0.04832 28.894
## TargetArtefact -0.58691 0.06432 -9.124
## TimeS -0.07647 0.01660 -4.608
## TargetArtefact:TimeS 0.01971 0.00724 2.723
##
## Correlation of Fixed Effects:
## (Intr) TrgtAr TimeS
## TargtArtfct -0.507
## TimeS -0.699 0.066
## TrgtArtf:TS 0.204 -0.305 -0.220
Clean up our workspace.
ls()
## [1] "agg_items" "agg_sub_items"
## [3] "agg_subjects" "binned"
## [5] "data" "devFunOnly.lmerTest.private"
## [7] "l.lmerTest.private.contrast" "log_model"
## [9] "log_null" "log_odds"
## [11] "model" "model_null"
## [13] "model1" "model2"
## [15] "odds" "probabilities"
## [17] "reml.lmerTest.private" "subject_slopes"
## [19] "ts"
rm(list=ls())