9 Power
Set up our session. At minimum, make sure you have tidyverse
, lterdatasampler
, palmerpenguins
and rstatix
in your “setup.R” script.
This week’s lesson is related to power in statistical analysis.
9.0.0.1 Recommended pre-work:
Watch this video on statistical power: https://www.youtube.com/watch?v=Rsc5znwR5FA
Review the following null hypotheses for all of the tests we have used to far:
The null hypotheses for each test we have learned thus far:
Levene test - Variances across groups are equal (homogeneity of variances).
Shapiro-Wilk test - Data is drawn from a normal distribution.
t-test - Means of the two groups are equal.
ANOVA - Means of all groups are equal.
Correlation - There is no correlation between the two variables.
Simple linear regression - There is no linear relationship between the predictor and response variables.
Multiple linear regression
Overall model - All regression coefficients equal zero (i.e., there is no effect from predictors).
Individual predictors - An individual regression coefficient equals zero (no effect for a specific predictor).
Power in statistics is a measure of how effective a statistical test is at finding genuine trends/effects in your data. It tells you the likelihood that the test will correctly identify a real relationship or effect when one actually exists. In other words, power is the test’s ability to avoid missing important discoveries, an essential aspect of the reliability and accuracy of statistical analyses. Power can be influenced by the following factors:
Sample size - Larger samples provide more power.
Effect size - Larger or more dramatic effects are easier to detect.
Power is also related to our p-values: if we want our test’s conclusions to hold greater significance, we require more power.
Significance in statistics means how important or reliable a finding is. p-values help quantify this by telling us how likely it is that our results occurred by chance; smaller p-values indicate greater significance, suggesting that our findings are less likely due to chance.
9.0.1 Power in action
Let’s perform a t-test analysis on our Palmer penguins data set, where we compare bill length across two penguins species, Gentoo and Adelie.
data("penguins")
# perform the t-test
penguins %>%
filter(species %in% c("Adelie", "Gentoo")) %>%
# species must be re-formatted from a factor to a character (t-test doesn't like factors)
mutate(species = as.character(species)) %>%
t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)
## # A tibble: 1 × 15
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p df conf.low
## * <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 -8.71 38.8 47.5 bill_length_mm Adelie Gentoo 151 123 -24.7 3.09e-68 243. -9.41
## # ℹ 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>
9.0.1.1 Sample size
In our analysis above, we find that bill length is observed to be significantly higher (p-value < 0.05) in Gentoo penguins when compared to Adelie penguins. But, what if we didn’t have as many observations to perform our test on? Let’s cut our data set down to just two observations per species:
penguins %>%
filter(species %in% c("Adelie", "Gentoo")) %>%
mutate(species = as.character(species)) %>%
drop_na(bill_length_mm) %>%
group_by(species) %>%
slice_sample(n = 2) %>%
# we need to upgroup our data frame for statistical tests
ungroup() %>%
t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)
## # A tibble: 1 × 15
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p df conf.low conf.high
## * <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -7.9 38.1 46 bill_l… Adelie Gentoo 2 2 -2.13 0.222 1.38 -33.3 17.5
## # ℹ 2 more variables: method <chr>, alternative <chr>
Compared to the p-value in the previous test, this p-value is much larger and, depending on which random samples were chosen in slice_sample
, likely higher than our cut-off of 0.05 and would lead to an insignificant difference in bill length between the two species.
Therefore, as sample size decreases, so does our power in identifying true trends.
9.0.1.2 Magnitude of effect
Let’s look at a histogram of our bill lengths across our two penguin species. Note the last two lines of code I am adding a vertical line at the mean of each group, which we got from the output of the t-test.
penguins %>%
filter(species %in% c("Adelie", "Gentoo")) %>%
ggplot() +
geom_histogram(aes(x = bill_length_mm, fill = species), alpha = 0.50) +
geom_vline(xintercept = 38.8, col = "black") +
geom_vline(xintercept = 47.5, col = "black")
In the grand scheme of things the difference between these two populations is relatively small in magnitude: their histograms overlap, indicating that some penguins in both species often have similar bill lengths. But, what if the bills of all of the Gentoo penguins magically grew an extra 15 mm?
penguins %>%
filter(species %in% c("Adelie", "Gentoo")) %>%
#for all Gentoo species increase bill length by 15 mm
mutate(bill_length_mm = if_else(species == "Gentoo", bill_length_mm + 15, bill_length_mm)) %>%
ggplot() +
geom_histogram(aes(x = bill_length_mm, fill = species), alpha = 0.56) +
geom_vline(xintercept = 38.8, col = "black") +
geom_vline(xintercept = 62.5, col = "black") # I found this from the new mutated values separately
And now perform the t-test on this new magical data:
penguins %>%
filter(species %in% c("Adelie", "Gentoo")) %>%
# still need to convert species to a character for the t_test function
mutate(species = as.character(species)) %>%
drop_na(bill_length_mm) %>%
#if species == Gentoo, add 15 to bill length..
mutate(bill_length_mm = if_else(species == "Gentoo", bill_length_mm + 15,
# otherwise, keep it as is.
bill_length_mm)) %>%
t_test(bill_length_mm ~ species,
var.equal = FALSE,
detailed = TRUE)
## # A tibble: 1 × 15
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p df conf.low
## * <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 -23.7 38.8 62.5 bill_length_… Adelie Gentoo 151 123 -67.3 6.46e-159 243. -24.4
## # ℹ 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>
… when the magnitude of the difference between our observations in our groups increases, our p-value decreases. And, when you have a greater difference between populations, the number of observations required to identify significant differences generally does not have to be so high.
Lets run the above code again but keep just two observations from each group and note the p-value:
penguins %>%
filter(species %in% c("Adelie", "Gentoo")) %>%
mutate(species = as.character(species)) %>%
mutate(bill_length_mm = ifelse(species == "Gentoo", bill_length_mm + 15, bill_length_mm)) %>%
# remove NA values for weight
drop_na(bill_length_mm) %>%
group_by(species) %>%
# select 2 random observations from each species:
slice_sample(n = 2) %>%
ungroup() %>%
t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)
## # A tibble: 1 × 15
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p df conf.low conf.high
## * <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -24.2 40.1 64.3 bill_l… Adelie Gentoo 2 2 -4.15 0.147 1.02 -94.7 46.3
## # ℹ 2 more variables: method <chr>, alternative <chr>
9.0.2 Note about random selection
If you are playing around with this code on your own, you may notice that the results of each test look different than what’s written in the lesson plan: this is because we are randomly selecting a subset from our penguins populations with the slice_sample()
function. In fact, you may have encountered p-values that lead to different conclusions for you (especially those with low sample sizes). The way we select samples from a population significantly impacts our statistical results and the statistical power of our tests. If our samples are representative of the population and sufficiently large, our findings are more likely to accurately reflect reality. However, if our samples are small or not truly representative, our results may be less reliable, and our tests may lack the ability to detect real effects.
9.1 Assignment
Let’s re-explore the difference in weight of cutthroat trout in clear cut (CC) and old growth (OG) forest types from our T-test and ANOVA lesson. We want to see how sample size affects our ability to detect this difference. Therefore, our research question is: “Is there a significant difference in weight between old growth and clear cut forest types?” We will set our significance level at 0.05 (i.e., our test’s p-value must be below 0.05 for us to reject the null hypothesis).
First load in your data (from the lterdatasampler
package) and create a new variable trout
for just the trout species:
data(and_vertebrates)
trout <-
and_vertebrates %>%
#filter species (remember spelling and capitalization are IMPORTANT)
filter(species == "Cutthroat trout") %>%
# remove NA values for weight, the variable we will be measuring
drop_na(weight_g)
Next, we will select a random set of trout observations at both forest types across four different sample sizes: 5, 10, 1000, 5000. Here, I have created the first object of 5 observations per forest type for you:
trout_5 <- trout %>%
#group by section to pull observations from each group
group_by(section) %>%
slice_sample(n = 5) %>%
# ungroup the data frame to perform the statistical test
ungroup() %>%
t_test(weight_g ~ section, var.equal = FALSE, detailed = TRUE)
1. Write a function called trout_subber
that takes a user-selected number of random observations (the thing that changes) from our trout
data frame across both forest types (i.e., section) (5 pts.)
HINTS: the number of observations you want to subset to will be an argument of the function. The code I’ve written above can be used as the basis of the function. You can refer back to the Write Functions primer lessons for guidance, specifically How to write a function.
2. Build upon the previous function by adding an additional step to perform a t-test on the data set at the end, and to return the results of that t-test. (NOTE: for simplicity, use the non-parametric t-test across all sub sets). (5 pts.)
3. Map over the function above, using our sample sizes of interest (i.e., 5, 10, 1000, 5000 per forest type). Repeat the process 100 times for each sample size to account for variability. The final outcome of this exercise should be a single data frame with 400 rows that includes all of our t-test summaries stacked on top of each other. (5 pts.)
HINTS: Make use of the rep()
function to create your vector of numbers to map over…
4. Using the data frame created in exercise 3, make a histogram of p-values for each sample size group (HINT: see what column name in your final data frame you should facet by). Make note of how the p-values and their variance change with sample size. (5 pts.)
9.2 Citations
Data Source: Gregory, S.V. and I. Arismendi. 2020. Aquatic Vertebrate Population Study in Mack Creek, Andrews Experimental Forest, 1987 to present ver 14. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c78d662e847cdbe33584add8f809165
Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.