Sample Size Calculation in R.

The Why of Sample Size Calculations :

  • In designing an experiment, a key question is : How many individuals/subjects do I need for my experiment ?

  • Too small a sample may fail to detect the effect of interest in our experiment.

  • Too large a sample wastes resources and involves more individuals than necessary.

  • We want our sample size to be ‘just right’.

  • The answer: Sample Size Calculation.

  • Goal: We strive to have enough samples to reasonably detect an effect if it really is there, without wasting limited resources on too many samples.

Key features of Sample Size Calculation :

  • Effect Size: magnitude of the effect under H1 (the alternative). The larger the effect size, the easier it is to detect an effect and the fewer samples are required.

  • Power: Probability of correctly rejecting H0 (the null) if it is false, i.e., $1-\beta$, where $\beta$ is the probability of a Type-II error.

  • Significance level ($\alpha$): Probability of falsely rejecting the null hypothesis even though it is true, i.e., the probability of a Type-I error.

Effect Size :

  • While Power and Significance level are usually set irrespective of the data, the effect size is a property of the sample data.

  • It is essentially a function of the difference between the means of the null and alternative hypotheses over the variation (standard deviation) in the data: $\text{Effect Size} = \dfrac{|\mu_{H_1} - \mu_{H_0}|}{\sigma}$

  • Note that the sample size can also be calculated from the confidence interval, but that technique is not covered here.

Mathematical Formulas for calculating Sample Sizes :

(A) For Estimation :

[Formula image: For Estimation]

(B) For testing :

[Formula image: For Proportion]

[Formula image: For Mean]

[Formula image: For Epidemiology Study Design]

Sample Size Calculation in R :

Table of R packages & functions for calculating Sample Size for different tests

| Name of test | Package | Function |
|---|---|---|
| One Mean T-test | pwr | pwr.t.test() |
| Two Means T-test | pwr | pwr.t.test() |
| Two Means T-test (unequal Sample) | pwr | pwr.t2n.test() |
| Paired T-test | pwr | pwr.t.test() |
| One-way ANOVA | pwr | pwr.anova.test() |
| Single Proportion Test | pwr | pwr.p.test() |
| Two Proportions Test | pwr | pwr.2p.test() |
| Two Proportions Test (unequal Sample) | pwr | pwr.2p2n.test() |
| Chi-Squared Test | pwr | pwr.chisq.test() |
| Simple Linear Regression | pwr | pwr.f2.test() |
| Multiple Linear Regression | pwr | pwr.f2.test() |
| Correlation | pwr | pwr.r.test() |
| One Mean Wilcoxon Test | pwr | pwr.t.test() + 15% |
| Mann-Whitney Test | pwr | pwr.t.test() + 15% |
| Paired Wilcoxon Test | pwr | pwr.t.test() + 15% |
| Kruskal-Wallis Test | pwr | pwr.anova.test() + 15% |
| Repeated Measures ANOVA | WebPower | wp.rmanova() |
| Multi-way ANOVA (1 Category of interest) | WebPower | wp.kanova() |
| Multi-way ANOVA (>1 Category of interest) | WebPower | wp.kanova() |
| Non-Parametric Regression (Logistic) | WebPower | wp.logistic() |
| Non-Parametric Regression (Poisson) | WebPower | wp.poisson() |
| Multilevel modeling: CRT | WebPower | wp.crt2arm() / wp.crt3arm() |
| Multilevel modeling: MRT | WebPower | wp.mrt2arm() / wp.mrt3arm() |

One Mean T-test :

  • Description: This tests if a sample mean is any different from a set value for a normally distributed variable.
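| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | Yes | No |
  • Effect size calculation: $\text{Effect Size (Cohen's } d) = \dfrac{|\mu_{H_1} - \mu_{H_0}|}{\sigma}$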

  • Example:(1) Is the average body temperature of college students any different from 98.6°F?

  • Solution:

    • Here, H0: Avg. body temp. = 98.6°F and H1: Avg. body temp. ≠ 98.6°F

    • We will guess that the effect sizes will be medium.

    • For t-tests: 0.2=small, 0.5=medium, and 0.8=large effect sizes.

    • R Package: pwr Package

    • R function: pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))

      • d= effect size
      • sig.level= significance level
      • power= power of test
      • type= type of test
    • Answer of the problem:

          library(pwr)
          Pwer_t=pwr.t.test(d=0.5, sig.level=0.05, power=0.80, type="one.sample",alternative="two.sided")
          Pwer_t
      ## 
      ##      One-sample t test power calculation 
      ## 
      ##               n = 33.36713
      ##               d = 0.5
      ##       sig.level = 0.05
      ##           power = 0.8
      ##     alternative = two.sided
          # Round up: rounding down would leave the study slightly under-powered
          print(paste0("Sample Size by rounding up is:",ceiling(Pwer_t$n)))
      ## [1] "Sample Size by rounding up is:34"
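
As a cross-check that does not need the pwr package, base R's `power.t.test()` (from the built-in stats package) can solve the same one-sample problem. This is only a sketch, not part of the original solution: `delta` and `sd` are chosen so that `delta / sd` equals the assumed effect size of 0.5, and the variable names are illustrative.

```r
# Cross-check with base R (stats package); no extra packages needed.
# delta / sd plays the role of Cohen's d, so delta = 0.5 with sd = 1 means d = 0.5.
base_check <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                           type = "one.sample", alternative = "two.sided")

n_exact  <- base_check$n      # fractional sample size
n_needed <- ceiling(n_exact)  # round up to whole subjects
print(n_needed)
```

The fractional result should agree closely with the `n` reported by `pwr.t.test()` above, since both functions are based on the non-central t distribution.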
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if the average income of college freshmen is less than Rs.20,000. You collect trial data and find that the mean income was Rs.14,500 (SD=6000).

    • (ii) You are interested in determining if the average sleep time change in a year for college freshmen is different from zero. You collect the following data of sleep change (in hours).

      | Variable | Values |
      |---|---|
      | Sleep Change | -0.55, 0.16, 2.6, 0.65, -0.23, 0.21, -4.3, 2, -1.7, 1.9 |
    • (iii) You are interested in determining if the average weight change in a year for college freshmen is greater than zero.

  • Solution:

      (i) You are interested in determining if the average income of college freshmen is less than Rs.20,000. You collect trial data and find that the mean income was Rs.14,500 (SD=6000).
      • Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD = (14{,}500 - 20{,}000)/6000 = -0.917$

      • One-tailed test

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.t.test(d=-0.917, sig.level=0.05, power=0.80, type="one.sample", alternative="less")$n,0)))
      ## [1] "The Sample Size is :9"
      (ii) Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD = (0.446 - 0)/1.96 = 0.228$
      • Two-tailed test

      • R Code:

      print(paste("The Sample Size is :",round(pwr.t.test(d=-0.228, sig.level=0.05, power=0.80, type="one.sample", alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is : 153"
      (iii) Try it yourself.

Two Means T-test :

  • Description: this tests if a mean from one group is different from the mean of another group for a normally distributed variable. AKA, testing to see if the difference in means is different from zero.
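| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 1 | Yes | No |
  • Effect size calculation: $\text{Effect Size (Cohen's } d) = \dfrac{|\text{Mean}_{H_1} - \text{Mean}_{H_0}|}{SD_{pooled}}$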

  • Example:(1) : Is the average body temperature higher in women than in men?

  • Solution:

    • Here, H0: Avg. difference in body temp. between men and women = 0°F and H1: Avg. difference in body temp. between men and women > 0°F

    • We will guess that the effect sizes will be medium.

    • For t-tests: 0.2=small, 0.5=medium, and 0.8=large effect sizes.

    • Selected greater, because we only cared to test if women’s temp was higher, not lower (group 1 is women, group 2 is men)

    • R Package: pwr Package

    • R function: pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))

      • d= effect size
      • sig.level= significance level
      • power= power of test
      • type= type of test
    • Answer of the problem:

      print(paste0("The Sample Size is :",round(pwr.t.test(d=0.5, sig.level=0.05, power=0.80,type="two.sample", alternative="greater")$n,0)))
      ## [1] "The Sample Size is :50"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if the average daily caloric intake differs between men and women. You collected trial data and found the average caloric intake for males to be 2350.2 (SD=258), while females had intake of 1872.4 (SD=420).

    • (ii) You are interested in determining if the average protein level in blood differs between men and women. You collected the following trial data on protein level (grams/deciliter).

      | Protein | Levels |
      |---|---|
      | Male Protein | 1.8, 5.8, 7.1, 4.6, 5.5, 2.4, 8.3, 1.2 |
      | Female Protein | 9.5, 2.6, 3.7, 4.7, 6.4, 8.4, 3.1, 1.4 |
    • (iii) You are interested in determining if the average glucose level in blood is lower in men than women.

  • Solution:

      (i) You are interested in determining if the average daily caloric intake differs between men and women. You collected trial data and found the average caloric intake for males to be 2350.2 (SD=258), while females had intake of 1872.4 (SD=420).
      • Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD_{pooled} = (2350.2 - 1872.4)/\sqrt{(258^2 + 420^2)/2} = 477.8/348.54 = 1.37$

      • two-tailed test

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.t.test(d=1.37, sig.level=0.05, power=0.80, type="two.sample",alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is :9"
      (ii) Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD_{pooled} = (4.59 - 4.98)/\sqrt{(2.58^2 + 2.88^2)/2} = -0.14$
      • Two-tailed test

      • R Code:

      print(paste("The Sample Size is :",round(pwr.t.test(d=-0.14, sig.level=0.05, power=0.80, type="two.sample", alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is : 802"
      (iii) Try it yourself.
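
The pooled-SD effect size for the protein-level scenario can be reproduced directly from the trial data. The following is a sketch in base R; the variable names are illustrative, and the pooled SD is taken as the square root of the mean of the two sample variances (equal group sizes), matching the hand calculation.

```r
# Trial data: protein level (grams/deciliter)
male   <- c(1.8, 5.8, 7.1, 4.6, 5.5, 2.4, 8.3, 1.2)
female <- c(9.5, 2.6, 3.7, 4.7, 6.4, 8.4, 3.1, 1.4)

# Pooled SD for equal group sizes: square root of the mean of the two variances
sd_pooled <- sqrt((var(male) + var(female)) / 2)

# Cohen's d: difference in means over the pooled SD
d <- (mean(male) - mean(female)) / sd_pooled
print(round(d, 2))  # compare with the hand-calculated effect size
```

For a two-sided test the sign of `d` does not affect the required sample size, so either `d` or `abs(d)` can be passed on to the power calculation.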

Paired T-test :

  • Description: This tests if a mean from one group is different from the mean of another group, where the groups are dependent (not independent) for a normally distributed variable. Pairing can be leaves on same branch, siblings, the same individual before and after a trial, etc.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 1 | Yes | Yes |
  • Effect size calculation: $\text{Effect Size (Cohen's } d) = \dfrac{|\text{Mean}_{H_1} - \text{Mean}_{H_0}|}{SD_{pooled}}$

  • Example:(1) Is heart rate higher in patients after a run compared to before a run?

  • Solution:

    • Here, H0: bpm(after) - bpm(before) ≤ 0 and H1: bpm(after) - bpm(before) > 0

    • We will guess that the effect sizes will be large.

    • For t-tests: 0.2=small, 0.5=medium, and 0.8=large effect sizes.

    • Selected One-tailed, because we only cared if bpm was higher after a run.

    • R Package: pwr Package

    • R function: pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))

      • d= effect size
      • sig.level= significance level
      • power= power of test
      • type= type of test
    • Answer of the problem:

      print(paste0("The Sample Size is :",round(pwr.t.test(d=0.8, sig.level=0.05, power=0.80, type="paired", alternative="greater")$n,0)))
      ## [1] "The Sample Size is :11"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if metabolic rate in patients after surgery is different from before surgery. You collected trial data and found a mean difference of 0.73 (SD=2.9).

    • (ii) You are interested in determining if heart rate is higher in patients after a doctor’s visit compared to before a visit. You collected the following trial data on heart rate before and after a visit.

      | Heart rate | Levels |
      |---|---|
      | BPM before | 126, 88, 53.1, 98.5, 88.3, 82.5, 105, 41.9 |
      | BPM after | 138.6, 110.1, 58.44, 110.2, 89.61, 98.6, 115.3, 64.3 |
  • Solution:

      (i) You are interested in determining if metabolic rate in patients after surgery is different from before surgery. You collected trial data and found a mean difference of 0.73 (SD=2.9).
      • Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD = 0.73/2.9 = 0.25$

      • Two-tailed test

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.t.test(d=0.25, sig.level=0.05, power=0.80, type="paired", alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is :128"
      (ii) Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD_{pooled} = (98.1 - 85.4)/\sqrt{(26.8^2 + 27.2^2)/2} = 12.7/27 = 0.47$
      • One-tailed test

      • R Code:

      print(paste("The Sample Size is :",round(pwr.t.test(d=0.47, sig.level=0.05, power=0.80, type="paired", alternative="greater")$n,0)))
      ## [1] "The Sample Size is : 29"

One-Way ANOVA :

  • Description: This tests if at least one mean is different among groups, where the number of groups is greater than two, for a normally distributed variable. ANOVA is the extension of the Two Means T-test for more than two groups.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 1 | 1 | >2 | 1 | Yes | No |
  • Effect size calculation: $\text{Effect Size}\,(f) = \sqrt{\dfrac{\eta^2}{1-\eta^2}}$, where $\eta^2 = \dfrac{SST}{TSS} = \dfrac{\text{Treatment Sum of Squares}}{\text{Total Sum of Squares}}$

  • Example:(1) Is there a difference in new car interest rates across 6 different cities?

  • Solution:

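    • Here, H0: difference in mean interest rates among cities = 0 and H1: difference in mean interest rates among cities ≠ 0

    • There are a total of 6 groups (cities).

    • We will guess that the effect sizes will be small.

    • For f-tests: 0.1=small, 0.25=medium, and 0.4=large effect sizes.

    • Groups assumed to be the same size; the n returned by pwr.anova.test() is the number of observations per group.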

    • R Package: pwr Package

    • R function: pwr.anova.test(k =, f = , sig.level = , power = )

      • k= number of groups
      • f= effect size
      • sig.level= significance level
      • power= power of test
    • Answer of the problem:

      print(paste0("The Sample Size is :",round(pwr.anova.test(k =6 , f =0.1 , sig.level=0.05 , power =0.80 )$n,0)))
      ## [1] "The Sample Size is :215"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if there is a difference in weight lost between 4 different surgery options. You collect the following trial data of weight lost in pounds.

      | Surgery | Weight Measures |
      |---|---|
      | A | 6.3, 2.8, 7.8, 7.9, 4.9 |
      | B | 9.9, 4.1, 3.9, 6.3, 6.9 |
      | C | 5.1, 2.9, 3.6, 5.7, 4.5 |
      | D | 1.0, 2.8, 4.8, 3.9, 1.6 |
    • (ii) You are interested in determining if there is a difference in white blood cell counts between 5 different medication regimes.

  • Solution:

      (i) Here,
      • $\eta^2 = SST/TSS = 31.47/(31.47+62.87) = 0.33$. Note that you can calculate SST & TSS by performing ANOVA on the dataset using the aov() function.

      • Effect size $(f) = \sqrt{\eta^2/(1-\eta^2)} = \sqrt{0.33/(1-0.33)} = 0.7$

      • No. of groups= 4

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.anova.test(k =4, f =0.7, sig.level=0.05, power =0.80 )$n,0)))
      ## [1] "The Sample Size is :7"
      (ii) You are interested in determining if there is a difference in white blood cell counts between 5 different medication regimes.
      • Guessed a medium effect size (0.25)

      • No. of groups= 5

      • R Code:

      print(paste("The Sample Size is :",round(pwr.anova.test(k =5, f =0.25, sig.level=0.05, power =0.80 )$n,0)))
      ## [1] "The Sample Size is : 39"
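
The conversion from sums of squares to Cohen's f used in the weight-loss scenario can be scripted. This sketch plugs in the treatment and residual sums of squares quoted in the solution (31.47 and 62.87) rather than refitting the model; the variable names are illustrative.

```r
# Sums of squares as quoted in the worked solution
ss_treatment <- 31.47
ss_residual  <- 62.87

# eta^2 = treatment SS / total SS
eta_sq <- ss_treatment / (ss_treatment + ss_residual)

# Cohen's f = sqrt(eta^2 / (1 - eta^2))
f <- sqrt(eta_sq / (1 - eta_sq))
print(round(c(eta_sq = eta_sq, f = f), 2))
```

The resulting `f` is the value passed to `pwr.anova.test()` through its `f` argument.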

Single Proportion Test :

  • Description: This tests when you only have a single proportion and you want to know if the proportions of certain values differ from some constant proportion.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 1 | N/A | No |
  • Effect size calculation: $\text{Effect Size}\,(h) = 2\arcsin\left(\sqrt{p_{H_1}}\right) - 2\arcsin\left(\sqrt{p_{H_0}}\right)$

  • Example:(1) Is there a significant difference in cancer prevalence of middle-aged women who have a sister with breast cancer (5%) compared to the general population prevalence (2%)?

  • Solution:

    • Here, H0: difference in prevalence = 0 and H1: difference in prevalence ≠ 0

    • You don’t have background info, so you guess that there is a small effect size.

    • For h-tests: 0.2=small, 0.5=medium, and 0.8=large effect sizes.

    • Selected Two-sided, because we don’t care about directionality.

    • R Package: pwr Package

    • R function: pwr.p.test(h = , sig.level = , power = , alternative = c("two.sided", "less", "greater"))

      • h= effect size
      • sig.level= significance level
      • power= power of test
      • alternative= type of tail
    • Answer of the problem:

      print(paste0("The Sample Size is :",round( pwr.p.test(h=0.2, sig.level=0.05, power=0.80, alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is :196"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if the male incidence rate proportion of cancer in North Dakota is higher than the US average (prop=0.00490). You find trial data cancer prevalence of 0.00495.

    • (ii) You are interested in determining if the female incidence rate proportion of cancer in North Dakota is lower than the US average (prop=0.00420).

  • Solution:

      (i) Here,
      • Effect size $= 2\arcsin(\sqrt{0.00495}) - 2\arcsin(\sqrt{0.00490}) = 0.0007$. Note that in R, arcsin can be calculated with the function asin(). pwr.p.test() performs a proportion power calculation for the binomial distribution using the arcsine transformation.

      • One-sided test

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.p.test(h=0.0007, sig.level=0.05, power=0.80, alternative="greater")$n,0)))
      ## [1] "The Sample Size is :12617464"
      (ii) You are interested in determining if the female incidence rate proportion of cancer in North Dakota is lower than the US average (prop=0.00420).
      • Guess a very low effect size (0.001)

        • One-tailed test

        • R Code:

        print(paste("The Sample Size is :",round(pwr.p.test(h=-0.001, sig.level=0.05, power=0.80, alternative="less")$n,0)))
        ## [1] "The Sample Size is : 6182557"
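
To see where the effect size and the very large sample size in the male-incidence scenario come from, the calculation can be sketched in base R. The sample size here uses the one-sided normal-approximation formula n = ((z_alpha + z_beta) / h)^2, which is an assumption of this sketch and should closely match, but is not guaranteed to equal, the pwr.p.test() result. Variable names are illustrative.

```r
p_nd <- 0.00495  # trial prevalence (North Dakota males)
p_us <- 0.00490  # US average

# Cohen's h via the arcsine transformation (note the square roots)
h <- 2 * asin(sqrt(p_nd)) - 2 * asin(sqrt(p_us))

# One-sided normal-approximation sample size
alpha  <- 0.05
power  <- 0.80
h_used <- round(h, 4)  # the worked solution rounds h to 0.0007
n <- ((qnorm(1 - alpha) + qnorm(power)) / h_used)^2

print(h)
print(ceiling(n))
```

Because n scales with 1/h^2, a tiny difference between two rare proportions pushes the requirement into the millions.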

Two Proportions Test :

  • Description: This tests when you only have two groups and you want to know if the proportions of each group are different from one another.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 0 | 2 | 2 | 2 | N/A | No |
  • Effect size calculation: $\text{Effect Size}\,(h) = 2\arcsin\left(\sqrt{p_{H_1}}\right) - 2\arcsin\left(\sqrt{p_{H_0}}\right)$

  • Example:(1) Is the expected proportion of students passing a stats course taught by psychology teachers different from the observed proportion of students passing the same stats class taught by mathematics teachers?

  • Solution:

    • Here, H0: difference in pass proportions = 0 and H1: difference in pass proportions ≠ 0

    • You don’t have background info, so you guess that there is a small effect size.

    • For h-tests: 0.2=small, 0.5=medium, and 0.8=large effect sizes.

    • Selected Two-sided, because we don’t care about directionality.

    • R Package: pwr Package

    • R function: pwr.2p.test(h = , sig.level = , power = , alternative = c("two.sided", "less", "greater"))

      • h= effect size
      • sig.level= significance level
      • power= power of test
      • alternative= type of tail
    • Answer of the problem:

      print(paste0("The Sample Size is :",round( pwr.2p.test(h=0.2, sig.level=0.05, power=.80, alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is :392"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if the expected proportion (P1) of students passing a stats course taught by psychology teachers is different from the observed proportion (P2) of students passing the same stats class taught by biology teachers. You collected the following data of passed tests.

      | Teaching Method | Response |
      |---|---|
      | Psychology | Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No |
      | Biology | No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes |
    • (ii) You are interested in determining if the expected proportion (P1) of female students who selected YES on a question was higher than the observed proportion (P2) of male students who selected YES. The observed proportion of males who selected yes was 0.75.

  • Solution:

      (i) Here,
      • $p_1 = 7/10 = 0.70$, $p_2 = 6/10 = 0.60$

      • Effect size $= h = 2\arcsin(\sqrt{0.60}) - 2\arcsin(\sqrt{0.70}) = -0.21$

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.2p.test(h=-0.21, sig.level=0.05, power=0.80, alternative="two.sided")$n,0)))
      ## [1] "The Sample Size is :356"
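      (ii) You are interested in determining if the expected proportion (P1) of female students who selected YES on a question was higher than the observed proportion (P2) of male students who selected YES. The observed proportion of males who selected yes was 0.75.
      • Guess that the expected proportion (p1) = 0.85

      • Effect Size $= h = 2\arcsin(\sqrt{0.85}) - 2\arcsin(\sqrt{0.75}) = 0.25$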

      • R Code:

      print(paste("The Sample Size is :",round(pwr.2p.test(h=0.25, sig.level=0.05, power=0.80, alternative="greater")$n,0)))
      ## [1] "The Sample Size is : 198"
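
For the psychology versus biology scenario, the two proportions and the effect size h can be derived from the raw Yes/No responses. This is a sketch in base R with illustrative variable names; it mirrors the hand calculation, including the order of subtraction that gives a negative h.

```r
# Pass/fail responses from the trial data
psych <- c("Yes", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", "No")
bio   <- c("No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes")

# Proportion of "Yes" in each group
p1 <- mean(psych == "Yes")
p2 <- mean(bio == "Yes")

# Cohen's h via the arcsine transformation (same subtraction order as the solution)
h <- 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))
print(c(p1 = p1, p2 = p2, h = round(h, 2)))
```

For the two-sided test used in that scenario, the sign of `h` has no effect on the sample size.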

Chi-Squared Test :

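  • Description: This tests whether the observed proportions across the categories of a variable differ from a set of expected proportions (a goodness-of-fit test). It extends the proportion tests to more than two categories.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 1 | N/A | No |
  • Effect size calculation: $\text{Effect Size}\,(w) = \sqrt{\dfrac{\chi^2}{n \times df}}$, where $\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$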

  • Example:(1) Do the observed proportions of phenotypes from a genetics experiment differ from the expected 9:3:3:1?

  • Solution:

    • Here, H0: difference between observed and expected proportions = 0 and H1: difference between observed and expected proportions ≠ 0

    • You don’t have background info, so you guess that there is a medium effect size.

    • For w-tests: 0.1=small, 0.3=medium, and 0.5=large effect sizes.

    • Degrees of freedom = (the number of proportions minus 1) = 4 (phenotypes) - 1 = 3

    • R Package: pwr Package

    • R function: pwr.chisq.test(w =, df = , sig.level =, power = )

      • w= effect size
      • df= degrees of freedom
      • sig.level= significance level
      • power= power of test
    • Answer of the problem:

      print(paste0("The Sample Size is :",round(pwr.chisq.test(w=0.3, df=3, sig.level=0.05, power=0.80)$N,0)))
      ## [1] "The Sample Size is :121"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if the ethnic ratios in a company differ by gender. You collect the following trial data from 200 employees.

      | Gender | White | Black | Am. Indian | Asian |
      |---|---|---|---|---|
      | Male | 0.60 | 0.25 | 0.01 | 0.14 |
      | Female | 0.65 | 0.21 | 0.11 | 0.03 |
    • (ii) You are interested in determining if the proportions of students by year (Freshman, Sophomore, Junior, Senior) are any different from 1:1:1:1. You collect the following trial data.

      | Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
      | Grade | Frs | Frs | Frs | Frs | Frs | Frs | Frs | Soph | Soph | Soph | Soph | Soph | Jun | Jun | Jun | Jun | Jun | Sen | Sen | Sen |

  • Solution:

      (i) Note that,
      • If they were equal, the expected ratios should be the same as the overall ethnic ratios (62.5, 23.0, 6.0, 8.5)

      • Will just focus on males

      • $\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = (60-62.5)^2/62.5 + (25-23)^2/23 + (1-6)^2/6 + (14-8.5)^2/8.5 = 8$

      • Effect size $= w = \sqrt{\chi^2/(n \times df)} = \sqrt{8/(200 \times 3)} = 0.115$

      • R Code:

        print(paste0("The Sample Size is :",round(pwr.chisq.test(w=0.115, df=3, sig.level=0.05, power=0.80)$N,0)))
      ## [1] "The Sample Size is :824"
      (ii) Note that here,
      • $\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = (7-5)^2/5 + (5-5)^2/5 + (5-5)^2/5 + (3-5)^2/5 = 1.6$

      • Effect Size $= w = \sqrt{\chi^2/(n \times df)} = \sqrt{1.6/(20 \times 3)} = 0.163$

      • R Code:

      print(paste("The Sample Size is :",round(pwr.chisq.test(w=0.163, df=3, sig.level=0.05, power=0.80)$N,0)))
      ## [1] "The Sample Size is : 410"
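
The chi-squared statistic and effect size for the student-year scenario can be computed from the raw trial data. This sketch uses base R only and follows the effect-size formula given in this section (dividing by n × df); the variable names are illustrative.

```r
# Trial data: year of each of the 20 students
grade <- c(rep("Frs", 7), rep("Soph", 5), rep("Jun", 5), rep("Sen", 3))

observed <- as.numeric(table(grade))  # counts per year
n        <- length(grade)
expected <- rep(n / 4, 4)             # 1:1:1:1 ratio

# Chi-squared statistic and degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df     <- length(observed) - 1

# Effect size w as defined in this section
w <- sqrt(chi_sq / (n * df))
print(round(c(chi_sq = chi_sq, df = df, w = w), 3))
```

`w` and `df` are then the inputs to `pwr.chisq.test()`.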

Simple & Multiple Linear Regression :

  • Description: This test determines if there is a significant relationship between two or more normally distributed numerical variables. The predictor variable is used to try to predict the response variable.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 2 or >2 | 0 | NA | NA | Yes | No |
  • Effect size calculation: $\text{Effect Size}\,(f^2) = \dfrac{R^2}{1-R^2}$, where $R^2$ is the goodness-of-fit measure (i.e., adjusted $R^2$)

  • Example:(1) Is there a relationship between height and weight in college males?

  • Solution:

    • Here, H0: relationship (slope) = 0 and H1: relationship (slope) ≠ 0

    • You don’t have background info, so you guess that there is a large effect size.

    • For f2-tests: 0.02=small, 0.15=medium, and 0.35=large effect sizes.

    • For simple regression (only one predictor variable) the numerator df = 1; for multiple regression it is the number of predictor variables.

    • Output will be the denominator degrees of freedom (v) rather than the sample size; round it up and add 2 for simple linear regression, or add p + 1 for multiple linear regression (where p = number of predictors, and the extra 1 accounts for the single dependent outcome variable) to get the sample size.

    • R Package: pwr Package

    • R function: pwr.f2.test(u =, v= , f2=, sig.level =, power = )

      • u= numerator degrees of freedom
      • v= denominator degrees of freedom
      • f2= effect size
      • sig.level= significance level
      • power= power of test
    • To calculate sample size: Sample Size (n) = denominator degrees of freedom (v) + total number of variables (predictors + 1 response)

    • Answer of the problem:

      print(paste0("The Sample Size is :",round( pwr.f2.test(u=1, f2=0.35, sig.level=0.05, power=0.80)$v,0)+2)) ##-- 2 is added because it is a simple linear regression (1 predictor + 1 response) --##
      ## [1] "The Sample Size is :25"
  • Example: (2) You are interested in determining if height (meters), weight (grams), and fertilizer added (grams) in plants can predict yield (grams of berries). You collect the following trial data. Here α = 0.05 and Power = (1 - β) = 0.80

    | Variables | Values |
    |---|---|
    | Yield | 46.8, 48.7, 48.4, 53.7, 56.7 |
    | Height | 14.6, 19.6, 18.6, 25.5, 20.4 |
    | Weight | 95.3, 99.5, 94.1, 110, 103 |
    | Fertilizer | 2.1, 3.2, 4.3, 1.1, 4.3 |
  • Solution:

    • Here, we first have to find the adjusted $R^2$ value by fitting the linear model.

    • Then, we will find the sample size.

    • R Code :

    #--Data--#
    yield= c(46.8, 48.7, 48.4, 53.7, 56.7)
    height= c(14.6, 19.6, 18.6, 25.5, 20.4)
    weight= c(95.3, 99.5, 94.1, 110, 103)
    Fert= c(2.1, 3.2, 4.3, 1.1, 4.3)
    
    #-- Fitting Linear Model: yield is the response, the other three are predictors --#
    Model= lm(yield~height + weight + Fert)
    
    #-- Extracting Adjusted R^2 Value --#
    R_Sqared= summary(Model)$adj.r.squared
    
    #-- Calculating Effect (f2): Cohen's f2 = R^2/(1 - R^2) --#
    f.2= R_Sqared/(1-R_Sqared)
    
    #--  Calculating sample size --#
    ##-- u=3 because there are 3 predictors; 4 is added to v (3 predictors + 1 dependent variable) --##
    
    print(paste0("The Sample Size is :",ceiling( pwr.f2.test(u=3, f2=f.2, sig.level=0.05, power=0.80)$v)+4))

Correlation :

  • Description: This test determines if there is a relationship between two numerical variables. It is like simple regression, but is not identical.
| Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|
| 2 | 0 | NA | NA | Yes | No |
  • Effect size calculation: Effect Size= r= Correlation Coefficient

  • Example:(1) Is there a correlation between hours studied and test score?

  • Solution:

    • Here, H0: r = 0 and H1: r ≠ 0

    • You don’t have background info, so you guess that there is a large effect size.

    • For Correlation levels (r): 0.1=small, 0.3=medium, and 0.5=large correlations.

    • Here, the approximate correlation power calculation is done using the arctanh (Fisher z) transformation.

    • R Package: pwr Package

    • R function: pwr.r.test(r = , sig.level = , power = )

      • r= correlation
      • sig.level= significance level
      • power= power of test
    • Answer of the problem:

      print(paste0("The Sample Size is :",round(pwr.r.test(r=0.5, sig.level=0.05, power=0.80)$n,0)))
      ## [1] "The Sample Size is :28"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if there is a correlation between height and weight in men.

      | Males | Measures |
      |---|---|
      | Height | 178, 166, 172, 186, 182 |
      | Weight | 165, 139, 257, 225, 196 |
    • (ii) You are interested in determining if, in lab mice, there is a correlation between longevity (in months) and average protein intake (grams).

  • Solution:

      (i) Here,
      • first, calculate the correlation value, and then calculate the sample size.

      • R Code:

      #-- Data --#
      MH= c(178,166,172,186,182)
      MW= c(165,139,257,225,196)
      
      #-- correlation value (rounded to 2 decimals: 0.37) --#
      
      r= round(cor(MH,MW),2)
      print(paste0("The Sample Size is :",round(pwr.r.test(r=r, sig.level=0.05, power=0.80)$n,0)))
      ## [1] "The Sample Size is :54"
    • (ii) You are interested in determining if, in lab mice, there is a correlation between longevity (in months) and average protein intake (grams).

      • Guessed large (0.5) correlation

      • R Code:

      print(paste("The Sample Size is :",round(pwr.r.test(r=0.5, sig.level=0.05, power=0.80)$n,0)))
      ## [1] "The Sample Size is : 28"

Non-Parametric T-tests :

  • Description: versions of the t-tests for non-parametric data.
    • One Mean Wilcoxon: sample mean against set value
    • Mann-Whitney: two sample means (unpaired)
    • Paired Wilcoxon: two sample means (paired)
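| Name | Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired |
|---|---|---|---|---|---|---|
| One Mean Wilcoxon | 1 | 0 | 0 | 0 | No | NA |
| Mann-Whitney | 1 | 1 | 2 | 1 | No | No |
| Paired Wilcoxon | 1 | 1 | 2 | 1 | No | Yes |
  • Effect size calculation: Effect Size (Cohen’s $d$) $= \dfrac{|\mu_{H_1} - \mu_{H_0}|}{\sigma}$; $\dfrac{|\mu_{H_1} - \mu_{H_0}|}{\sigma_{pooled}}$; $\dfrac{\mu_{diff}}{\sigma_{diff}}$ (for the one-mean, two-sample, and paired cases, respectively)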

  • Example:(1) (for t-tests, 0.2=small, 0.5=medium, and 0.8 large effect sizes)

      1. One Mean Wilcoxon: Is the average number of children in Grand Forks families different than 1?
    • Solution:
      • Here, H0: mean = 1 child and H1: mean > 1 child

      • You don’t have background info, so you guess that there is a medium effect size.

      • Select one-tailed (greater)

      • R Package: pwr Package

      • R function: pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired")) + 15%

        • d= effect size
        • sig.level= significance level
        • power= power of test
        • type= type of test
      • Answer of the problem:

      Pwer_t=pwr.t.test(d=0.5, sig.level=0.05, power=0.80, type="one.sample", alternative="greater")
      
      ##-- Nonparametric Correction : adding 15% --##
      print(paste0("Sample Size : ",round((Pwer_t$n*1.15),0)))
      ## [1] "Sample Size : 30"
      2. Mann-Whitney: Does the average number of snacks per day for individuals on a diet differ between young and old persons?
    • Solution:
      • Here, H0: 0 difference in snack number, and H1: ≠ 0 difference in snack number

      • You don’t have background info, so you guess that there is a small effect size

      • Select two-sided

      • R Package: pwr Package

      • R function: pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired")) + 15%

        • d= effect size
        • sig.level= significance level
        • power= power of test
        • type= type of test
      • Note: “Parametric t-test + 15% Approach” for calculating Sample Size for Non Parametric test

      • Answer of the problem:

      Pwer_t=pwr.t.test(d=0.2, sig.level=0.05, power=0.80, type="two.sample", alternative="two.sided")
      
      ##-- Nonparametric Correction : adding 15% --##
      print(paste0("Sample Size : ",round((Pwer_t$n*1.15),0)))
      ## [1] "Sample Size : 452"
      3. Paired Wilcoxon: Are genome methylation patterns different between identical twins?
    • Solution:
      • Here, H0: 0% methylation and H1: ≠ 0% methylation

      • You don’t have background info, so you guess that there is a large effect size

      • Select one-tailed (greater)

      • R Package: pwr Package

      • R function: pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired")) + 15%

        • d= effect size
        • sig.level= significance level
        • power= power of test
        • type= type of test
      • Answer of the problem:

      Pwer_t= pwr.t.test(d=0.8, sig.level=0.05, power=0.80, type="paired", alternative="greater")
      
      ##-- Nonparametric Correction : adding 15% --##
      print(paste0("Sample Size : ",round((Pwer_t$n*1.15),0)))
      ## [1] "Sample Size : 13"
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if the average number of pets in Grand Forks families is greater than 1. You collect the following trial data for pet number.

      | Variable | Values |
      |---|---|
      | Pets | 1, 1, 1, 3, 2, 1, 0, 0, 0, 4 |
    • (ii) You are interested in determining if the number of meals per day for individuals on a diet is higher in younger people than older. You collected trial data on meals per day.

      | Variable | Values |
      |---|---|
      | Young meals | 1, 2, 2, 3, 3, 3, 3, 4 |
      | Older meals | 1, 1, 1, 2, 2, 2, 3, 3 |
    • (iii) You are interested in determining if genome methylation patterns are higher in the first fraternal twin born compared to the second. You collected the following trial data on methylation level difference (in percentage).

      | Variable | Values |
      |---|---|
      | Methy.Diff(%) | 5.96, 5.63, 1.25, 1.17, 3.59, 1.64, 1.6, 1.4 |
  • Solution:

      (i) You are interested in determining if the average number of pets in Grand Forks families is greater than 1. You collect the following trial data for pet number.
      • Effect size $= (\text{Mean}_{H_1} - \text{Mean}_{H_0})/SD = (1.3 - 1.0)/1.34 = 0.224$

      • One-tailed test

      • R Code:

      Pwer_t= pwr.t.test(d=0.224, sig.level=0.05, power=0.80, type="one.sample", alternative="greater")
      
      #-- Non-parametric Correction --# 
      print(paste0("The Sample Size is :",round(Pwer_t$n*1.15,0)))
      ## [1] "The Sample Size is :143"
      2. Try it yourself.
      3. Try it yourself.
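
For scenario (i), the effect size can be recomputed directly from the trial data rather than by hand. A minimal sketch in base R (the variable names are illustrative):

```r
# Trial data for scenario (i): number of pets per family
pets <- c(1, 1, 1, 3, 2, 1, 0, 0, 0, 4)

mu0 <- 1                               # value of the mean under H0
d   <- (mean(pets) - mu0) / sd(pets)   # standardized effect size
round(d, 3)
```

The same pattern (difference in means over a standard deviation) applies to the other two scenarios, with the standard deviation chosen to match the test type.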

Kruskal-Wallis Test :

  • Description: this tests if at least one group is different from the others, where there are more than two groups and the variable is not normally distributed (AKA non-parametric ANOVA). There really isn’t a good way of calculating sample size in R, but you can use a rule of thumb:

    • Run Parametric Test
    • Add 15% to total sample size
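
The rule of thumb can be wrapped in a small helper. The function below is hypothetical (it is not part of pwr or any other package); it uses round() by default to match the examples in this tutorial, with an optional round-up for a more conservative answer.

```r
# Hypothetical helper (not part of any package): inflate a parametric
# sample size by a fixed margin for the non-parametric counterpart.
add_nonparametric_margin <- function(n_parametric, margin = 0.15, round_up = FALSE) {
  n <- n_parametric * (1 + margin)
  if (round_up) ceiling(n) else round(n)
}

add_nonparametric_margin(100)                    # 100 inflated by 15%
add_nonparametric_margin(52.4, round_up = TRUE)  # conservative: always round up
```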
Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired
1 | 1 | >2 | 1 | No | No
  • Effect size calculation: Effect Size = Same as the effect size for the ANOVA.

  • Example:(1) Is there a difference in draft rank across 3 different months?

  • Solution:

    • Here, $H_0: r = 0$ and $H_1: r \neq 0$

    • There will be a total of 3 groups (months)

    • You don’t have background info, so you guess that there is a medium effect size.

    • For f-test: 0.1=small, 0.25=medium, and 0.4=large effect sizes.

    • No Tails in ANOVA

    • Groups assumed to be the same size.

    • R Package: pwr Package

    • R function: pwr.anova.test(k =, f = , sig.level = , power = )

      • k= number of groups
      • f= effect size
      • sig.level= significance level
      • power= power of test
    • Answer of the problem:

      ##-- Balanced one-way analysis of variance power calculation --##
      Pwr_Anova=  pwr.anova.test(k =3 , f =0.25 , sig.level=0.05 , power =0.80 )
    
      #-- Non-parametric Correction --#
      print(paste0("The Sample Size is :",round((Pwr_Anova$n*1.15),0)))
    ## [1] "The Sample Size is :60"
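
To make the calculation less of a black box, here is a sketch of the same balanced one-way ANOVA power computation in base R. It assumes the usual noncentral F formulation (df1 = k - 1, df2 = k(n - 1), noncentrality k·n·f²); the function name anova_power is illustrative, not a library function.

```r
# Sketch of the balanced one-way ANOVA power calculation in base R.
# Assumption: power is evaluated with a noncentral F distribution with
# df1 = k - 1, df2 = k * (n - 1) and noncentrality lambda = k * n * f^2.
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1  <- k - 1
  df2  <- k * (n - 1)
  crit <- qf(sig.level, df1, df2, lower.tail = FALSE)   # critical F under H0
  pf(crit, df1, df2, ncp = k * n * f^2, lower.tail = FALSE)
}

# Solve for the per-group n that gives 80% power (k = 3, f = 0.25)
n_per_group <- uniroot(function(n) anova_power(n, k = 3, f = 0.25) - 0.80,
                       interval = c(2, 1000))$root

n_per_group                  # parametric n per group
round(n_per_group * 1.15)    # with the 15% non-parametric margin
```

Note that the n returned by pwr.anova.test() is the number of observations per group, so the total sample size is k times that value.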
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if there is a difference in hours worked across 3 different groups (faculty, staff, and hourly workers). You collect the following trial data of weekly hours.

      Groups | Working Hours
      Faculty | 42, 45, 46, 55, 42
      Staff | 46, 45, 37, 42, 40
      Hourly | 29, 42, 33, 50, 23
    • (ii) You are interested in determining if there is a difference in assistant professor salaries across 25 different departments.

  • Solution:

      1. Here,
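      • $\eta^2 = SST/TSS = 286.5/(286.5+625.2) = 0.314$, where SST is the between-group (treatment) sum of squares and TSS is the total sum of squares. Note that you can obtain these sums of squares by performing ANOVA on the dataset using the aov() function.

      • Effect size: $f = \sqrt{\eta^2/(1-\eta^2)} = \sqrt{0.314/(1-0.314)} = 0.677$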

      • No. of groups= 3

      • R Code:

        ##-- Balanced one-way analysis of variance power calculation --##
        Pwr_Anova=  pwr.anova.test(k =3, f =0.677, sig.level=0.05, power =0.80)
      
        #-- Non-parametric Correction --#
        print(paste0("The Sample Size is :",round((Pwr_Anova$n*1.15),0)))
      ## [1] "The Sample Size is :9"
      2. You are interested in determining if there is a difference in assistant professor salaries across 25 different departments.
      • Guess small effect size (0.10)

      • No. of groups= 25

      • R Code:

        #-- Balanced one-way analysis of variance power calculation --#
        Pwr_Anova= pwr.anova.test(k =25, f =0.10, sig.level=0.05, power =0.80)
      
        #-- Non-parametric Correction --#
        print(paste0("The Sample Size is :",round((Pwr_Anova$n*1.15),0)))
      ## [1] "The Sample Size is :104"
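
For scenario (i) above, the sums of squares, η² and f can be obtained from aov() as follows. This is a sketch using only base R; the data frame and variable names are illustrative.

```r
# Trial data for scenario (i): weekly working hours by group
work <- data.frame(
  group = factor(rep(c("Faculty", "Staff", "Hourly"), each = 5)),
  hours = c(42, 45, 46, 55, 42,   # Faculty
            46, 45, 37, 42, 40,   # Staff
            29, 42, 33, 50, 23)   # Hourly
)

fit <- aov(hours ~ group, data = work)
ss  <- summary(fit)[[1]][["Sum Sq"]]   # c(between-group SS, residual SS)

eta_sq <- ss[1] / sum(ss)              # eta^2 = SST / TSS
f      <- sqrt(eta_sq / (1 - eta_sq))  # effect size f
round(c(SS_between = ss[1], SS_within = ss[2], eta_sq = eta_sq, f = f), 3)
```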

Repeated Measures ANOVA :

  • Description: this tests if at least one mean is different among groups, where the groups are repeated measures (more than two) of a normally distributed variable. Repeated Measures ANOVA is the extension of the paired t-test to more than two groups.
Numeric Var(s) | Cat. Var(s) | Cat. Var Group # | Cat. Var # of interest | Parametric | Paired
1 | 1 | >2 | 1 | Yes | No
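  • Effect size calculation: $f = \dfrac{\sigma_m}{\sigma}$, where

    $\sigma_m = \sqrt{\dfrac{\sum_{j=1}^{K}(m_j - m)^2}{K}}$ = standard deviation of the group means,
    $m_j$ = $j$th group mean, $j = 1, \dots, K$,
    $m$ = overall mean,
    $K$ = number of groups,
    $\sigma$ = overall standard deviation.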

  • Example:(1) Is there a difference in blood pressure at 1, 2, 3, and 4 months post-treatment?

  • Solution:

    • Here, H0:0 and H1:≠0

    • 1 group, 4 measurements

    • We will guess that the effect sizes will be small.

    • For f-tests: 0.1=small, 0.25=medium, and 0.4=large effect sizes.

    • For the nonsphericity correction coefficient, 1 means sphericity is met. There are methods to estimate this, but we will go with 1 for this example.

    • R Package: WebPower Package

    • R function: wp.rmanova(n = NULL, ng = NULL, nm = NULL, f = NULL, nscor = 1, alpha = 0.05, power = NULL, type = 0)

      • n= sample size (leave as NULL to solve for it)

      • ng= number of groups

      • nm= number of measurements

      • f= effect size

      • nscor= nonsphericity correction coefficient

      • alpha= significance level of test

      • power= statistical power

      • type= (0,1,2) The value “0” is for between-effect; “1” is for within-effect; and “2” is for interaction effect.

      • Note:

        • Within-effects: variability of a particular value for individuals in a sample
        • Between-effects: examines differences between individuals
    • Answer of the problem:

    library(WebPower)
    print(paste0("The Sample Size is :",round(wp.rmanova(n=NULL, ng=1, nm=4, f=0.1, nscor=1, alpha=0.05, power=0.80, type=1)$n,0)))
    ## [1] "The Sample Size is :1092"
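
For intuition, the within-effect calculation can be sketched in base R. The degrees of freedom and noncentrality used here are an assumption about how the within-effect power is evaluated (they are not stated in this tutorial), and the function names are illustrative; treat the result as a sanity check on the value above rather than a replacement for wp.rmanova().

```r
# Sketch of the within-effect power calculation for repeated-measures ANOVA.
# Assumptions (not taken from the text above): noncentral F with
#   df1 = (nm - 1) * nscor, df2 = (n - ng) * (nm - 1) * nscor,
#   lambda = f^2 * n * nscor.
rm_within_power <- function(n, ng, nm, f, nscor = 1, alpha = 0.05) {
  df1  <- (nm - 1) * nscor
  df2  <- (n - ng) * (nm - 1) * nscor
  crit <- qf(1 - alpha, df1, df2)                 # critical F under H0
  1 - pf(crit, df1, df2, ncp = f^2 * n * nscor)   # power under H1
}

# Solve for n; the upper bound 5000 is an arbitrary search limit
solve_n <- function(ng, nm, f, power = 0.80, ...) {
  uniroot(function(n) rm_within_power(n, ng, nm, f, ...) - power,
          interval = c(ng + 1, 5000))$root
}

solve_n(ng = 1, nm = 4, f = 0.10)   # small effect, 4 measurements
solve_n(ng = 1, nm = 3, f = 0.25)   # medium effect, 3 measurements
```

Because the noncentrality grows with f², halving the effect size roughly quadruples the required sample size, which is why the small-effect guess leads to such a large n.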
  • Example:(2) Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):

    • (i) You are interested in determining if there is a difference in blood serum levels at 6, 12, 18, and 24 months post-treatment. You collect the following trial data of blood serum in mg/dL.

      MonthsBlood Serum
      6 Months38, 13, 32, 35, 21
      12 Months38, 44, 35, 48, 27
      18 Months46, 15, 53, 51, 29
      24 Months52, 29, 60, 44, 36
    • (ii) You are interested in determining if there is a difference in antibody levels at 1, 2, and 3 months post-treatment.

  • Solution:

      1. Here,

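      • Effect Size: $f = \sqrt{\dfrac{(27.8-37.3)^2+(38.4-37.3)^2+(38.8-37.3)^2+(25.2-37.3)^2}{4}} \Big/ 12.74 = 0.608$. Note that the 24-month mean of the trial data above is 44.2, not 25.2; with 44.2 the same formula gives $f \approx 0.467$, so f = 0.608 understates the required sample size.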

      • To check sphericity, run a repeated-measures ANOVA:
      library(ez)
      data=data.frame(Patient= factor(rep(c(1,2,3,4,5),4)),
                      Month= factor(c(rep("6 Months",5),rep("12 Months",5),rep("18 Months",5),rep("24 Months",5))),
                      Serum= c(38,13,32,35,21,38,44,35,48,27,46,15,53,51,29,52,29,60,44,36))
      anova3= ezANOVA(data, dv=Serum, wid=Patient, within=.(Month),detailed=TRUE)
      anova3
      ## $ANOVA
      ##        Effect DFn DFd     SSn    SSd         F           p p<.05       ges
      ## 1 (Intercept)   1   4 27825.8 1506.7 73.872171 0.001006882     * 0.9212804
      ## 2       Month   3  12   706.6  870.9  3.245378 0.060146886       0.2291032
      ## 
      ## $`Mauchly's Test for Sphericity`
      ##   Effect         W         p p<.05
      ## 2  Month 0.1556327 0.4348287      
      ## 
      ## $`Sphericity Corrections`
      ##   Effect       GGe     p[GG] p[GG]<.05       HFe      p[HF] p[HF]<.05
      ## 2  Month 0.4844127 0.1187469           0.6892662 0.09014564
        print(paste0("The Sample Size is :",round(wp.rmanova(n=NULL, ng=1, nm=4, f=0.608, nscor=1, alpha=0.05, power=0.80, type=1)$n,0)))
      ## [1] "The Sample Size is :31"
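
The hand calculation of the effect size can be checked against the trial data with a few lines of base R. This is a sketch with illustrative variable names; it follows the formula f = σm/σ, with σm computed by dividing by the number of groups K and σ taken as the overall sample standard deviation. On these data the 24-month mean comes out as 44.2 and f as about 0.467.

```r
# Trial data for scenario (i): blood serum (mg/dL), 5 patients x 4 time points
serum <- c(38, 13, 32, 35, 21,    # 6 months
           38, 44, 35, 48, 27,    # 12 months
           46, 15, 53, 51, 29,    # 18 months
           52, 29, 60, 44, 36)    # 24 months
month <- factor(rep(c("6", "12", "18", "24"), each = 5),
                levels = c("6", "12", "18", "24"))

group_means <- tapply(serum, month, mean)              # m_j
overall     <- mean(serum)                             # m
sigma_m     <- sqrt(mean((group_means - overall)^2))   # SD of group means (divide by K)
sigma       <- sd(serum)                               # overall SD

f <- sigma_m / sigma
round(c(group_means, overall = overall, sigma = sigma, f = f), 3)
```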
      2. You are interested in determining if there is a difference in antibody levels at 1, 2, and 3 months post-treatment.
      • Guess a nonsphericity correction of 1 and a medium effect size of 0.25

      • One group, three measurements, type 1

      • R Code:

        print(paste("The Sample Size is :",round(wp.rmanova(n=NULL, ng=1, nm=3, f=0.25, nscor=1, alpha=0.05, power=0.80, type=1)$n,0)))
      ## [1] "The Sample Size is : 156"
Rajesh Majumder
PhD Student, Statistician, Research Assistant