Ensuring R Generates the Same ANOVA F-values as SPSS

When switching to R from SPSS, a common concern among psychology researchers is whether R gives the "correct" ANOVA F-values. By "correct" they simply mean F-values that match those generated by SPSS. Because ANOVA F-values in R do not match those in SPSS by default, it often appears that R is "doing something wrong". This is not the case. R simply has a different default configuration than SPSS.

The differences between SPSS and R become evident when there is an unequal number of participants across factorial ANOVA cells. A few simple steps ensure that R ANOVA values do indeed match those generated by SPSS. These steps involve using Type III sums of squares for the ANOVA, but there is more to it than that. I will detail the complete process in R here; a deeper discussion of the related statistical issues is provided in the excellent free e-book Learning Statistics with R by Dan Navarro.
 

Initial R Data

> my.data <- read.csv("goggles.csv")
> my.data
   gender alcohol attractiveness
1       1       1             65
2       1       1             70
3       1       1             60
4       1       1             60
5       1       1             60
6       1       1             55
7       1       1             60
8       1       1             55
9       1       2             70
10      1       2             65
11      1       2             60
12      1       2             70
13      1       2             65
14      1       2             60
15      1       2             60
16      1       2             50
17      1       3             55
18      1       3             65
19      1       3             70
20      1       3             55
21      1       3             55
22      1       3             60
23      1       3             50
24      1       3             50
25      2       1             50
26      2       1             55
27      2       1             80
28      2       1             65
29      2       1             70
30      2       1             75
31      2       1             75
32      2       1             65
33      2       2             45
34      2       2             60
35      2       2             85
36      2       2             65
37      2       2             70
38      2       2             70
39      2       2             80
40      2       2             60
41      2       3             30
42      2       3             30
43      2       3             30
44      2       3             55
45      2       3             35
46      2       3             20
47      2       3             45
48      2       3             40

SPSS Analysis: The numbers below are the ones we want to reproduce:

You can see the F-values for gender, alcohol, and the interaction are 2.032, 20.065, and 11.911, respectively.

Outline of R Steps

There are three things you need to do to ensure ANOVA F-values in R match those in SPSS. I will briefly list these three steps and then provide a more detailed description of each.

1. Set each independent variable as a factor
2. Set the default contrast to helmert
3. Conduct analysis using Type III Sums of Squares

Step 1. Set each independent variable as a factor

By default R assumes variables are not categorical. If you have a categorical variable (as you do with ANOVA independent variables) you need to tell R so; you do this with the as.factor function. In the example below I work with a goggles data set (from Discovering Statistics Using SPSS) that investigates the effect of alcohol consumption (None, 2-pints, 4-pints) and gender (male/female) on attractiveness ratings. The categorical variables have been entered into the data file numerically such that for gender 1 is Female and 2 is Male. Likewise, for alcohol 1 is None, 2 is two pints, and 3 is four pints. Before running the ANOVA I need to let R know that gender and alcohol are factors and how the levels of those factors are labeled.

# Set the variables to factors
> my.data$gender <- as.factor(my.data$gender)
> my.data$alcohol <- as.factor(my.data$alcohol)

# Label the levels of each factor
> levels(my.data$gender) <- list("Female"=1,"Male"=2)
> levels(my.data$alcohol) <- list("None"=1,"2-pints"=2,"4-pints"=3)
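
The two steps above can also be combined into a single call to factor(), which takes the raw codes and their labels together. Below is a minimal sketch on a small made-up data frame; the name demo and its four rows are illustrative only and are not the goggles data.

```r
# Small illustrative data frame with numeric codes (hypothetical values)
demo <- data.frame(gender = c(1, 1, 2, 2), alcohol = c(1, 2, 3, 1))

# Convert to factors and label the levels in one step
demo$gender  <- factor(demo$gender,  levels = c(1, 2),
                       labels = c("Female", "Male"))
demo$alcohol <- factor(demo$alcohol, levels = c(1, 2, 3),
                       labels = c("None", "2-pints", "4-pints"))

# Check the result: both columns should now be reported as factors
str(demo)
levels(demo$alcohol)
```

Calling str() on your own data after this step is a quick way to confirm that R treats each independent variable as a factor before you fit the model.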

Step 2. Set the default contrast to helmert

When an ANOVA is conducted in R it's done using the general linear model. Consequently, the contrasts need to be specified in the same way as in SPSS if the values are to match.

You can see the default contrasts in R with the command below:

> options("contrasts")
$contrasts
        unordered           ordered 
"contr.treatment"      "contr.poly" 

We need to change the default contrast for unordered factors from "contr.treatment" to "contr.helmert". We do this with the command below:

> options(contrasts = c("contr.helmert", "contr.poly"))

You can verify that the contrast has changed by using the options command again:

> options("contrasts")
$contrasts
[1] "contr.helmert" "contr.poly"   
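
To get a feel for what this setting changes, it can help to print the two coding schemes side by side. The sketch below uses only base R functions and a three-level factor like alcohol. The point to notice is that every Helmert column sums to zero, while the treatment (dummy) columns do not; that sum-to-zero property is, as far as I understand it, what makes the Type III tests in the next step meaningful.

```r
# Treatment (dummy) coding: R's default for unordered factors
contr.treatment(3)

# Helmert coding for the same three-level factor
contr.helmert(3)

# Column sums: zero for Helmert, non-zero for treatment coding
colSums(contr.helmert(3))
colSums(contr.treatment(3))
```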

Step 3. Conduct Analysis Using Type III Sums of Squares

Conduct your analysis:

> crf.lm <- lm(attractiveness~gender*alcohol,data=my.data)

Now you want traditional ANOVA statistics using Type III Sums of Squares. These can be provided by the car package (car: Companion to Applied Regression). The first time (and only the first time) you use the car package you need to install it. The package gives you the "Anova" function; note that the capitalization in this function name is critical.

> install.packages("car",dependencies = TRUE)

Once the package is installed you only need the code below:

> crf.lm <- lm(attractiveness~gender*alcohol,data=my.data)
> library(car)
> Anova(crf.lm,type=3)
Anova Table (Type III tests)

Response: attractiveness
               Sum Sq Df   F value    Pr(>F)    
(Intercept)    163333  1 1967.0251 < 2.2e-16 ***
gender            169  1    2.0323    0.1614    
alcohol          3332  2   20.0654 7.649e-07 ***
gender:alcohol   1978  2   11.9113 7.987e-05 ***
Residuals        3488 42                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can see the F-values for gender, alcohol, and the interaction are 2.032, 20.065, and 11.911, respectively. These match the SPSS values presented above.
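
If you would like a cross-check that does not depend on the car package, one option is base R's drop1() function, which removes each term in turn from the full model and tests the change with an F-test. This is a sketch of an alternative route, not the method used above. It enters the goggles data from the listing inline (the names goggles, fit, and type3 are my own) and requests Helmert coding for this one model through the contrasts argument of lm() rather than through the global option.

```r
# Goggles data from the listing above, entered inline so the sketch is self-contained
goggles <- data.frame(
  gender  = factor(rep(c("Female", "Male"), each = 24)),
  alcohol = factor(rep(rep(c("None", "2-pints", "4-pints"), each = 8), times = 2),
                   levels = c("None", "2-pints", "4-pints")),
  attractiveness = c(
    65, 70, 60, 60, 60, 55, 60, 55,   # Female, None
    70, 65, 60, 70, 65, 60, 60, 50,   # Female, 2-pints
    55, 65, 70, 55, 55, 60, 50, 50,   # Female, 4-pints
    50, 55, 80, 65, 70, 75, 75, 65,   # Male, None
    45, 60, 85, 65, 70, 70, 80, 60,   # Male, 2-pints
    30, 30, 30, 55, 35, 20, 45, 40    # Male, 4-pints
  )
)

# Helmert coding for this model only; the global contrasts option is not touched
fit <- lm(attractiveness ~ gender * alcohol, data = goggles,
          contrasts = list(gender = "contr.helmert", alcohol = "contr.helmert"))

# Drop each term (main effects and interaction) from the full model in turn
type3 <- drop1(fit, scope = ~ ., test = "F")
type3
```

On these data the "F value" column should agree with the Anova() table above. If it does not, the most likely culprit is that one of the factors is still using treatment contrasts.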

Quick Summary

> my.data <- read.csv("goggles.csv")

> my.data$gender <- as.factor(my.data$gender)
> my.data$alcohol <- as.factor(my.data$alcohol)
> levels(my.data$gender) <- list("Female"=1,"Male"=2)
> levels(my.data$alcohol) <- list("None"=1,"2-pints"=2,"4-pints"=3)

> options(contrasts = c("contr.helmert", "contr.poly"))

> crf.lm <- lm(attractiveness~gender*alcohol,data=my.data)
> library(car)
> Anova(crf.lm,type=3)

Anova Table (Type III tests)

Response: attractiveness
               Sum Sq Df   F value    Pr(>F)    
(Intercept)    163333  1 1967.0251 < 2.2e-16 ***
gender            169  1    2.0323    0.1614    
alcohol          3332  2   20.0654 7.649e-07 ***
gender:alcohol   1978  2   11.9113 7.987e-05 ***
Residuals        3488 42                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
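
One caveat about the summary above: options(contrasts = ...) changes a session-wide setting, so any model you fit later in the same session will also use Helmert coding. If that is not what you want, the previous setting can be saved and restored. A minimal sketch using only base R (the names before and old.opts are my own):

```r
# Remember the current setting so it can be compared afterwards
before <- getOption("contrasts")

# options() returns the previous values of whatever it changes
old.opts <- options(contrasts = c("contr.helmert", "contr.poly"))

# ... fit the model and call Anova(crf.lm, type = 3) here ...

# Put the earlier contrast setting back
options(old.opts)
identical(getOption("contrasts"), before)
```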







Reading SPSS Data into R with Haven

When psychology researchers switch from SPSS to R a common first question is "Can I load SPSS data in R?". The answer is yes, and it's now easier than ever thanks to the Haven package which both reads and writes SPSS files. Previously, you might have used the foreign library and the read.spss command - I don't recommend this approach. Currently, the Haven package represents your best bet for quickly and accurately loading SPSS data. The Haven package is written by Hadley Wickham (of ggplot2 fame) and based on Evan Miller’s ReadStat. Moreover, it also reads Stata and SAS files.

As with any R package Haven is easily installed the first time you use it:

install.packages("haven")

For every R session in which you use the Haven package you need to activate it using the library command. As well, when you load a file using the Haven package, recognize that it will look for the file in R's working directory. You can set the working directory using the menus in R or RStudio. The example below illustrates how to load SPSS data from R's working directory. I load the goggles data from Discovering Statistics Using SPSS. The lines below activate the Haven package and then read the "goggles.sav" file into a data frame called "my.data".

library(haven)
my.data <- read_spss("goggles.sav")
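
If the read fails with a "file not found" style of error, it is usually because the working directory is not where you think it is. The base R sketch below shows one way to check. The folder name is purely hypothetical, so substitute the location of your own data.

```r
# Where is R currently looking for files?
getwd()

# Hypothetical folder holding the data; replace with your own location
data.dir <- file.path("~", "Documents", "goggles-study")

# Build the full path and check that the file exists before reading it
sav.path <- file.path(data.dir, "goggles.sav")
file.exists(sav.path)
```

If the last line prints TRUE, passing sav.path to read_spss should load the file regardless of the current working directory.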

If working directories are confusing for you, you might prefer the command below, which brings up a window you can use to select the data file you want to load. This is much easier to use, but slightly longer to type. A downside of this approach is that the commands differ slightly depending on whether you are an OSX or Windows user.

On OSX the R commands for loading SPSS data using a file selector window are:

library(haven)
my.data <- read_spss(file.choose())

On Windows the R commands for loading SPSS data using a file selector window are:

library(haven)
my.data <- read_spss(choose.files())

That's it! Now you know how to load SPSS data into R.  
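
One thing to be aware of after loading: variables that carry SPSS value labels typically arrive in R as labelled vectors rather than factors. Haven's as_factor function converts them using the labels stored in the file. The sketch below is self-contained, so it first writes a tiny made-up file to a temporary location; the names demo, demo.path, and demo.data and the four rows are illustrative only and are not the real goggles data.

```r
library(haven)

# Tiny stand-in data set with SPSS-style value labels (hypothetical values)
demo <- data.frame(attractiveness = c(65, 70, 50, 55))
demo$gender <- labelled(c(1, 1, 2, 2), labels = c(Female = 1, Male = 2))

# Write it out as an SPSS file, then read it back the usual way
demo.path <- tempfile(fileext = ".sav")
write_sav(demo, demo.path)
demo.data <- read_spss(demo.path)

# The labelled column becomes a factor whose levels are the SPSS value labels
class(demo.data$gender)
demo.data$gender <- as_factor(demo.data$gender)
levels(demo.data$gender)
```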

If you use OSX, I have one additional tip for you. You might want to consider getting Text Expander (also see the update below). This software allows you to use keystroke shortcuts in OSX applications. Consequently, you can set up a keystroke shortcut such that when you're in R or RStudio and you type ";load" it is automatically replaced by the two lines above (you can define any shortcut you like instead of the ";load" I suggest here). This is a quick and easy way to load files and takes much of the hassle out of this step.

Update: August 28, 2015
@hadleywickham pointed out to me that RStudio also has built-in code snippets that work similarly to Text Expander (thanks!). You can read more about them: here. I gave them a quick try this morning and they work very well. It appears they work in the script window of RStudio but not the console window, which isn't an issue for me since I script everything. Hopefully you'll find this a helpful feature.