Cohen’s d in simple linear regression models
Cohen’s d is one of the most widely used standardized measures of effect size.

Consider a scenario with two conditions, control and treatment, and some continuous outcome (e.g., weight). Cohen’s d for the difference between the two conditions is defined as

$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}} $$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two conditions and $s_{\text{pooled}}$ is the pooled standard deviation,

$$ s_{\text{pooled}} = \sqrt{\frac{\sum_i (x_{1,i} - \bar{x}_1)^2 + \sum_i (x_{2,i} - \bar{x}_2)^2}{n_1 + n_2 - 2}} $$

with $n_1$ and $n_2$ being the number of observations per condition. The definition above may not look terribly transparent, but conceptually Cohen’s d is simply the difference between the two condition means measured in units of the pooled standard deviation.
Here’s an example to illustrate this. We use data from Annette Dobson (1990), An Introduction to Generalized Linear Models (p. 9, plant weight data).
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
n <- length(weight)
The by-condition means and the plot below show that the average weight was somewhat lower in the treatment condition than in the control condition.
tapply(weight, group, mean)
  Ctl   Trt 
5.032 4.661 
The standard deviation in the treatment condition was also higher than in the control condition.
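This can be verified directly in base R (the data vectors are repeated here so that the snippet is self-contained):

```r
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
sd(ctl)  # approx. 0.583
sd(trt)  # approx. 0.794
```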

Next, we fit a simple linear model to estimate this effect.
lm(weight ~ group) -> m
summary(m)
Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0710 -0.4938  0.0685  0.2462  1.3690 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.0320     0.2202  22.850 9.55e-15 ***
groupTrt     -0.3710     0.3114  -1.191    0.249    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.07308,	Adjusted R-squared:  0.02158 
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249
With the default treatment coding, the intercept of this model is the estimated mean of the control condition (5.032), and the slope (groupTrt) is the estimated difference between the two condition means (−0.371).
Is this effect small or large? Let’s calculate Cohen’s d to find out.
The approach below is going to look a bit different from the conventional definition of Cohen’s d. The idea is to remove the effect of interest from the data and to compute the pooled standard deviation from what remains. The effect of interest is β₂, the coefficient of groupTrt (the second coefficient of the model fit above).¹
coef(m)[2] -> b           # Coefficient of interest
model.matrix(m)[,2] -> x  # Corresponding column of the design matrix
weight - b * x -> y       # The data minus the effect of interest
The variable y now contains the data but without the effect of the treatment. Next, we calculate the pooled standard deviation. The recipe is the same as for the ordinary standard deviation, except that we divide by n − 2 rather than n − 1, since two parameters (one mean per condition) were estimated from the data.
We’ll do this in two steps: first we calculate the sum of squared errors (the numerator under the square root), and then the pooled standard deviation:
sum((y - mean(y))^2) -> sse
sqrt(sse / (n-2)) -> sd.pooled
sd.pooled
[1] 0.6963895
Now, Cohen’s d is simply the coefficient of interest divided by the pooled standard deviation:
b / sd.pooled -> d
d
  groupTrt 
-0.5327478 
The result, d ≈ −0.53, would conventionally be considered a medium-sized effect (Cohen’s rough benchmarks: |d| ≈ 0.2 small, 0.5 medium, 0.8 large). The sign is negative because the average weight was lower in the treatment condition.
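As a sanity check, plugging the rounded numbers from the model summary into the definition reproduces this value:

$$ d = \frac{\hat{\beta}_2}{s_{\text{pooled}}} = \frac{-0.371}{0.6964} \approx -0.533 $$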
While the procedure above produces the correct result, there is an alternative way to calculate Cohen’s d in a simple regression model: the sum of squared errors can be obtained directly from the residuals of the fitted model.
sum(residuals(m)^2) -> sse
sqrt(sse / (n-2)) -> sd.pooled
sd.pooled
[1] 0.6963895
The result is exactly the same as before. What this illustrates is that, in this particular scenario (a simple regression with a single predictor), Cohen’s d is simply the coefficient of interest divided by the residual standard error of the model (the 0.6964 reported in the summary above).
Note, however, that the residuals-based shortcut won’t work in a multiple regression setting, where the residual variance is going to be smaller due to the other predictors present in the model; there, the effect of interest has to be removed explicitly, as in the first approach. Nevertheless, I hope that this way of thinking about Cohen’s d clarifies how it relates to the parameters of a regression model.
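The equivalence of the two approaches in the simple-regression case can be checked end-to-end in one self-contained script (repeating the data from above):

```r
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group  <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
n <- length(weight)

lm(weight ~ group) -> m
coef(m)[2] -> b              # Effect of interest
model.matrix(m)[,2] -> x
weight - b * x -> y          # Data with the effect removed

# Pooled SD computed by removing the effect of interest ...
sqrt(sum((y - mean(y))^2) / (n - 2)) -> sd1
# ... and via the residuals of the model:
sqrt(sum(residuals(m)^2) / (n - 2)) -> sd2

all.equal(sd1, sd2)  # TRUE
b / sd1              # Cohen's d, approx. -0.533
```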
Finally, a word of caution: The code above is intended for illustration purposes only. For serious use, I recommend established R packages such as the effsize package. There are also different ways to calculate Cohen’s d (variants differ, for example, in the choice of standardizer and in small-sample corrections such as Hedges’ g), and these packages implement the common options.
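A minimal sketch of the package-based route (assuming the effsize package is installed; the cohen.d function and the shape of its return value come from that package’s interface and may differ across versions):

```r
# install.packages("effsize")  # if not yet installed
library(effsize)

ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group  <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

cohen.d(weight ~ group) -> res
res$estimate  # should match the hand-computed d up to sign conventions
```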
Footnotes:
¹ For consistency with the R code, we start counting betas at 1, not 0.