9 Power

Time to set up our session. There are no new packages this week, but double check you have tidyverse, lterdatasampler, palmerpenguins and rstatix in your “setup.R” script.

source("setup.R")

This week’s lesson is related to power in statistical analysis.

9.0.0.1 Recommended pre-work:

Watch this video on statistical power: https://www.youtube.com/watch?v=Rsc5znwR5FA
Review the following null hypotheses for all of the tests we have used to far:

The null hypotheses for each test we have learned thus far:

Levene test - Variances across groups are equal (homogeneity of variances).

Shapiro-Wilk test - Data is drawn from a normal distribution.

t-test - Means of the two groups are equal.

ANOVA - Means of all groups are equal.

Correlation - There is no correlation between the two variables.

Simple linear regression - There is no linear relationship between the predictor and response variables.

Multiple linear regression

Overall model - All regression coefficients equal zero (i.e., there is no effect from predictors).
Individual predictors - An individual regression coefficient equals zero (no effect for a specific predictor).

Power in statistics is a measure of how effective a statistical test is at finding genuine trends/effects in your data. It tells you the likelihood that the test will correctly identify a real relationship or effect when one actually exists. In other words, power is the test’s ability to avoid missing important discoveries, an essential aspect of the reliability and accuracy of statistical analyses. Power can be influenced by the following factors:

Sample size - Larger samples provide more power.
Effect size - Larger or more dramatic effects are easier to detect.

Power is also related to our p-values: if we want our test’s conclusions to hold greater significance, we require more power.

Significance in statistics means how important or reliable a finding is. p-values help quantify this by telling us how likely it is that our results occurred by chance; smaller p-values indicate greater significance, suggesting that our findings are less likely due to chance.

9.0.1 Power in action

Let’s perform a t-test analysis on our Palmer penguins data set, where we compare bill length across two penguins species, Gentoo and Adelie.

data("penguins")

# perform the t-test
penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  # species must be re-formatted from a factor to a character (t-test doesn't like factors)
  mutate(species = as.character(species)) %>% 
  t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)

## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.           group1 group2    n1    n2 statistic
## *    <dbl>     <dbl>     <dbl> <chr>         <chr>  <chr>  <int> <int>     <dbl>
## 1    -8.71      38.8      47.5 bill_length_… Adelie Gentoo   151   123     -24.7
## # ℹ 6 more variables: p <dbl>, df <dbl>, conf.low <dbl>, conf.high <dbl>,
## #   method <chr>, alternative <chr>

9.0.1.1 Sample size

In our analysis above, we find that bill length is observed to be significantly higher (p-value < 0.05) in Gentoo penguins when compared to Adelie penguins. But, what if we didn’t have as many observations to perform our test on? Let’s cut our data set down to just two observations per species:

penguins %>%   
  filter(species %in% c("Adelie", "Gentoo")) %>%   
  mutate(species = as.character(species)) %>%    
  drop_na(bill_length_mm) %>%   
  group_by(species) %>%   
  slice_sample(n = 2) %>% 
  # we need to upgroup our data frame for statistical tests
  ungroup() %>%   
  t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)

## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
## *    <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1    -10.5      39.6      50.1 bill_l… Adelie Gentoo     2     2     -4.24 0.126
## # ℹ 5 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>

Compared to the p-value in the previous test, this p-value is much larger and, depending on which random samples were chosen in slice_sample, likely higher than our cut-off of 0.05 and would lead to an insignificant difference in bill length between the two species.

Therefore, as sample size decreases, so does our power in identifying true trends.

9.0.1.2 Magnitude of effect

Let’s look at a histogram of our bill lengths across our two penguin species. Note the last two lines of code I am adding a vertical line at the mean of each group, which we got from the output of the t-test.

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  ggplot() +
  geom_histogram(aes(x = bill_length_mm, fill = species), alpha = 0.50) +
  geom_vline(xintercept =  38.8, col = "black") +
  geom_vline(xintercept = 47.5, col = "black")

In the grand scheme of things the difference between these two populations is relatively small in magnitude: their histograms overlap, indicating that some penguins in both species often have similar bill lengths. But, what if the bills of all of the Gentoo penguins magically grew an extra 15 mm?

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  #for all Gentoo species increase bill length by 15 mm
  mutate(bill_length_mm = if_else(species == "Gentoo", bill_length_mm + 15, bill_length_mm)) %>% 
  ggplot() + 
  geom_histogram(aes(x = bill_length_mm, fill = species), alpha = 0.56) +
  geom_vline(xintercept =  38.8, col = "black") +
  geom_vline(xintercept = 62.5, col = "black") # I found this from the new mutated values separately

And now perform the t-test on this new magical data:

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  # still need to convert species to a character for the t_test function
  mutate(species = as.character(species)) %>%
  drop_na(bill_length_mm) %>%
  #if species == Gentoo, add 15 to bill length..
  mutate(bill_length_mm = if_else(species == "Gentoo", bill_length_mm + 15,
                                 # otherwise, keep it as is.
                                 bill_length_mm)) %>%
  t_test(bill_length_mm ~ species,
         var.equal = FALSE,
         detailed =   TRUE)

## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.           group1 group2    n1    n2 statistic
## *    <dbl>     <dbl>     <dbl> <chr>         <chr>  <chr>  <int> <int>     <dbl>
## 1    -23.7      38.8      62.5 bill_length_… Adelie Gentoo   151   123     -67.3
## # ℹ 6 more variables: p <dbl>, df <dbl>, conf.low <dbl>, conf.high <dbl>,
## #   method <chr>, alternative <chr>

… when the magnitude of the difference between our observations in our groups increases, our p-value decreases. And, when you have a greater difference between populations, the number of observations required to identify significant differences generally does not have to be so high.

Lets run the above code again but keep just two observations from each group and note the p-value:

penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %>%
  mutate(species = as.character(species)) %>% 
  mutate(bill_length_mm = ifelse(species == "Gentoo", bill_length_mm + 15, bill_length_mm)) %>%
  # remove NA values for weight
  drop_na(bill_length_mm) %>%
  group_by(species) %>%
  # select 2 random observations from each species:
  slice_sample(n = 2) %>%
  ungroup() %>%
  t_test(bill_length_mm ~ species, var.equal = FALSE, detailed = TRUE)

## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.   group1 group2    n1    n2 statistic       p
## *    <dbl>     <dbl>     <dbl> <chr> <chr>  <chr>  <int> <int>     <dbl>   <dbl>
## 1    -20.8      40.0      60.8 bill… Adelie Gentoo     2     2     -15.5 0.00877
## # ℹ 5 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>

9.0.2 Note about random selection

If you are playing around with this code on your own, you may notice that the results of each test look different than what’s written in the lesson plan: this is because we are randomly selecting a subset from our penguins populations with the slice_sample() function. In fact, you may have encountered p-values that lead to different conclusions for you (especially those with low sample sizes). The way we select samples from a population significantly impacts our statistical results and the statistical power of our tests. If our samples are representative of the population and sufficiently large, our findings are more likely to accurately reflect reality. However, if our samples are small or not truly representative, our results may be less reliable, and our tests may lack the ability to detect real effects.

9.1 Assignment

Let’s re-explore the difference in weight of cutthroat trout in clear cut (CC) and old growth (OG) forest types from our T-test and ANOVA lesson. We want to see how sample size affects our ability to detect this difference. Therefore, our research question is: “Is there a significant difference in weight between old growth and clear cut forest types?” We will set our significance level at 0.05 (i.e., our test’s p-value must be below 0.05 for us to reject the null hypothesis).

First load in your data (from the lterdatasampler package) and create a new variable trout for just the trout species:

data(and_vertebrates)

trout <- 
  and_vertebrates %>%
  #filter species (remember spelling and capitalization are IMPORTANT)
  filter(species == "Cutthroat trout") %>%
  # remove NA values for weight, the variable we will be measuring
  drop_na(weight_g)

Next, we will select a random set of trout observations at both forest types across four different sample sizes: 5, 10, 1000, 5000.

Below is some template code for you to use to build your function, showing how to subset the data to 5 observations per forest type:

trout_5 <- trout %>% 
  #group by section to pull observations from each group
  group_by(section) %>%
  slice_sample(n = 5) %>%
  # ungroup the data frame to perform the statistical test
  ungroup() %>%
  t_test(weight_g ~ section, var.equal = FALSE, detailed = TRUE)

1. Write a function called trout_subber that returns a user-selected number of random observations (the thing that changes) from our trout data frame across both forest types (i.e., section) (5 pts.)

HINTS: the number of observations you want to subset to will be an argument of the function. The code above can be used as the basis of the function. You can refer back to the Write Functions primer lessons for guidance, specifically How to write a function.

2. Build upon the previous function by adding an additional step to perform a t-test on the data set at the end, so the function returns the results of the t-test. (NOTE: for simplicity, use the non-parametric t-test across all sub sets). (5 pts.)

3. Map over the function above, using our sample sizes of interest (i.e., 5, 10, 1000, 5000 per forest type). Repeat the process 100 times for each sample size to account for variability. The final output of this exercise should be a single data frame with 400 rows (one row for each t-test summary). (5 pts.)

HINTS: Make use of the rep() function to create your vector of numbers to map over…

4. Using the data frame created in exercise 3, make a histogram of p-values for each sample size group (HINT: see what column name in your final data frame you should use to facet by). Make note of how the p-values and their variance change with sample size. (5 pts.)

9.2 Citations

Data Source: Gregory, S.V. and I. Arismendi. 2020. Aquatic Vertebrate Population Study in Mack Creek, Andrews Experimental Forest, 1987 to present ver 14. Environmental Data Initiative. https://doi.org/10.6073/pasta/7c78d662e847cdbe33584add8f809165

Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.