9 Power

Set up our session. At minimum, make sure you have tidyverse, lterdatasampler, palmerpenguins and rstatix in your “setup.R” script.

source("setup.R")

This week’s lesson is related to power in statistical analysis.

9.0.1 Power in action

Let’s perform a t-test analysis on our Palmer penguins data set, where we compare bill length across two penguins species, Gentoo and Adelie.

data("penguins")

# perform the t-test
penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  # species must be re-formatted from a factor to a character (t-test doesn't like factors)
  mutate(species = as.character(species)) %>% 
  t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.            group1 group2    n1    n2 statistic        p    df conf.low
## *    <dbl>     <dbl>     <dbl> <chr>          <chr>  <chr>  <int> <int>     <dbl>    <dbl> <dbl>    <dbl>
## 1    -8.71      38.8      47.5 bill_length_mm Adelie Gentoo   151   123     -24.7 3.09e-68  243.    -9.41
## # ℹ 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>

9.0.1.1 Sample size

In our analysis above, we find that bill length is observed to be significantly higher (p-value < 0.05) in Gentoo penguins when compared to Adelie penguins. But, what if we didn’t have as many observations to perform our test on? Let’s cut our data set down to just two observations per species:

penguins %>%   
  filter(species %in% c("Adelie", "Gentoo")) %>%   
  mutate(species = as.character(species)) %>%    
  drop_na(bill_length_mm) %>%   
  group_by(species) %>%   
  slice_sample(n = 2) %>% 
  # we need to upgroup our data frame for statistical tests
  ungroup() %>%   
  t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p    df conf.low conf.high
## *    <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl>    <dbl>     <dbl>
## 1     -7.9      38.1        46 bill_l… Adelie Gentoo     2     2     -2.13 0.222  1.38    -33.3      17.5
## # ℹ 2 more variables: method <chr>, alternative <chr>

Compared to the p-value in the previous test, this p-value is much larger and, depending on which random samples were chosen in slice_sample, likely higher than our cut-off of 0.05 and would lead to an insignificant difference in bill length between the two species.

Therefore, as sample size decreases, so does our power in identifying true trends.

9.0.1.2 Magnitude of effect

Let’s look at a histogram of our bill lengths across our two penguin species. Note the last two lines of code I am adding a vertical line at the mean of each group, which we got from the output of the t-test.

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  ggplot() +
  geom_histogram(aes(x = bill_length_mm, fill = species), alpha = 0.50) +
  geom_vline(xintercept =  38.8, col = "black") +
  geom_vline(xintercept = 47.5, col = "black")

In the grand scheme of things the difference between these two populations is relatively small in magnitude: their histograms overlap, indicating that some penguins in both species often have similar bill lengths. But, what if the bills of all of the Gentoo penguins magically grew an extra 15 mm?

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  #for all Gentoo species increase bill length by 15 mm
  mutate(bill_length_mm = if_else(species == "Gentoo", bill_length_mm + 15, bill_length_mm)) %>% 
  ggplot() + 
  geom_histogram(aes(x = bill_length_mm, fill = species), alpha = 0.56) +
  geom_vline(xintercept =  38.8, col = "black") +
  geom_vline(xintercept = 62.5, col = "black") # I found this from the new mutated values separately

And now perform the t-test on this new magical data:

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  # still need to convert species to a character for the t_test function
  mutate(species = as.character(species)) %>%
  drop_na(bill_length_mm) %>%
  #if species == Gentoo, add 15 to bill length..
  mutate(bill_length_mm = if_else(species == "Gentoo", bill_length_mm + 15,
                                 # otherwise, keep it as is.
                                 bill_length_mm)) %>%
  t_test(bill_length_mm ~ species,
         var.equal = FALSE,
         detailed =   TRUE)
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.           group1 group2    n1    n2 statistic         p    df conf.low
## *    <dbl>     <dbl>     <dbl> <chr>         <chr>  <chr>  <int> <int>     <dbl>     <dbl> <dbl>    <dbl>
## 1    -23.7      38.8      62.5 bill_length_… Adelie Gentoo   151   123     -67.3 6.46e-159  243.    -24.4
## # ℹ 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>

… when the magnitude of the difference between our observations in our groups increases, our p-value decreases. And, when you have a greater difference between populations, the number of observations required to identify significant differences generally does not have to be so high.

Lets run the above code again but keep just two observations from each group and note the p-value:

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  mutate(species = as.character(species)) %>% 
  mutate(bill_length_mm = ifelse(species == "Gentoo", bill_length_mm + 15, bill_length_mm)) %>%
  # remove NA values for weight
  drop_na(bill_length_mm) %>%
  group_by(species) %>%
  # select 2 random observations from each species:
  slice_sample(n = 2) %>%
  ungroup() %>%
  t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p    df conf.low conf.high
## *    <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl>    <dbl>     <dbl>
## 1    -24.2      40.1      64.3 bill_l… Adelie Gentoo     2     2     -4.15 0.147  1.02    -94.7      46.3
## # ℹ 2 more variables: method <chr>, alternative <chr>

9.0.2 Note about random selection

If you are playing around with this code on your own, you may notice that the results of each test look different than what’s written in the lesson plan: this is because we are randomly selecting a subset from our penguins populations with the slice_sample() function. In fact, you may have encountered p-values that lead to different conclusions for you (especially those with low sample sizes). The way we select samples from a population significantly impacts our statistical results and the statistical power of our tests. If our samples are representative of the population and sufficiently large, our findings are more likely to accurately reflect reality. However, if our samples are small or not truly representative, our results may be less reliable, and our tests may lack the ability to detect real effects.

9.1 Assignment

Let’s re-explore the difference in weight of cutthroat trout in clear cut (CC) and old growth (OG) forest types from our T-test and ANOVA lesson. We want to see how sample size affects our ability to detect this difference. Therefore, our research question is: “Is there a significant difference in weight between old growth and clear cut forest types?” We will set our significance level at 0.05 (i.e., our test’s p-value must be below 0.05 for us to reject the null hypothesis).

First load in your data (from the lterdatasampler package) and create a new variable trout for just the trout species:

data(and_vertebrates)

trout <- 
  and_vertebrates %>%
  #filter species (remember spelling and capitalization are IMPORTANT)
  filter(species == "Cutthroat trout") %>%
  # remove NA values for weight, the variable we will be measuring
  drop_na(weight_g) 

Next, we will select a random set of trout observations at both forest types across four different sample sizes: 5, 10, 1000, 5000. Here, I have created the first object of 5 observations per forest type for you:

trout_5 <- trout %>% 
  #group by section to pull observations from each group
  group_by(section) %>%
  slice_sample(n = 5) %>%
  # ungroup the data frame to perform the statistical test
  ungroup() %>%
  t_test(weight_g ~ section, var.equal = FALSE, detailed = TRUE)

1. Write a function called trout_subber that takes a user-selected number of random observations (the thing that changes) from our trout data frame across both forest types (i.e., section) (5 pts.)

HINTS: the number of observations you want to subset to will be an argument of the function. The code I’ve written above can be used as the basis of the function. You can refer back to the Write Functions primer lessons for guidance, specifically How to write a function.


2. Build upon the previous function by adding an additional step to perform a t-test on the data set at the end, and to return the results of that t-test. (NOTE: for simplicity, use the non-parametric t-test across all sub sets). (5 pts.)


3. Map over the function above, using our sample sizes of interest (i.e., 5, 10, 1000, 5000 per forest type). Repeat the process 100 times for each sample size to account for variability. The final outcome of this exercise should be a single data frame with 400 rows that includes all of our t-test summaries stacked on top of each other. (5 pts.)

HINTS: Make use of the rep() function to create your vector of numbers to map over…


4. Using the data frame created in exercise 3, make a histogram of p-values for each sample size group (HINT: see what column name in your final data frame you should facet by). Make note of how the p-values and their variance change with sample size. (5 pts.)


9.2 Citations

Data Source: Gregory, S.V. and I. Arismendi. 2020. Aquatic Vertebrate Population Study in Mack Creek, Andrews Experimental Forest, 1987 to present ver 14. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c78d662e847cdbe33584add8f809165

Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.