Quanto software power calculation
We also found that case-parent studies require more samples than case-control studies. Although we have not covered all plausible study designs, the estimates of sample size and statistical power computed under various assumptions in this study may be useful for determining the sample size when designing a population-based genetic association study. In genetic epidemiological research, both case-control and case-parent trio designs have been used widely to evaluate genetic susceptibility to human complex diseases and to use markers, such as single-nucleotide polymorphisms (SNPs), to localize disease gene variants [1-5].
The sample size for detecting associations between disease and SNP markers is known to be highly affected by disease prevalence, disease allele frequency, linkage disequilibrium (LD), inheritance models e. Previous studies have shown that a population-based case-control design can be more powerful than a family-based study design in identifying genes predisposing to human complex traits, both for qualitative traits and for quantitative traits [8-11].
However, some studies reported that the case-parent design is much more powerful than the case-control design in evaluating genetic risk for common complex diseases, because case-control studies are susceptible to bias due to phenotype misclassification or population stratification [12, 13]. Recently, genome-wide association studies (GWASs) using thousands of cases and controls reported many susceptibility SNPs for human traits by the end of June, www.
Since a GWAS evaluates hundreds of thousands of SNP markers, it requires a much larger sample size to achieve adequate statistical power [14-18]. Testing a large number of SNP markers involves a large number of multiple comparisons and thus increases the false positive rate. Either the Bonferroni correction or the false discovery rate is generally applied to control false positive (type I) errors.
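The two corrections mentioned above can be illustrated with a minimal sketch. The function names, the family-wise level of 0.05, and the toy p-values are assumptions made for illustration only; they are not taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests the Benjamini-Hochberg
    step-up procedure rejects at false discovery rate q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank * q / m:
            n_reject = rank
    rejected = [False] * m
    for idx in order[:n_reject]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Assumed family-wise alpha of 0.05; marker counts are illustrative.
    for n_markers in (1, 500_000, 1_000_000):
        print(n_markers, bonferroni_threshold(0.05, n_markers))
    toy_p = [0.001, 0.02, 0.03, 0.8]  # hypothetical p-values
    print(benjamini_hochberg(toy_p))
```

On the toy p-values, Bonferroni keeps only the smallest one, while the false discovery rate procedure keeps the three smallest, which is why the latter is usually described as less conservative.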
However, the Bonferroni-corrected p-value, the significance threshold set to 0. Therefore, estimating a sufficient sample size to achieve adequate statistical power is critical in the design stage of a genetic association study [20-24]. Statistical power is the probability of rejecting the null hypothesis (H0) when the alternative hypothesis (HA) is true. It is affected by many factors. For instance, a larger sample size is required to achieve sufficient statistical power.
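As a rough illustration of this definition, the sketch below computes the power of a generic two-sided z-test comparing a proportion (for example, an allele frequency) between two equally sized groups, using a normal approximation. This is not the method implemented in any particular power calculator; the function name, parameters, and example frequencies are hypothetical.

```python
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for p1 versus p2
    with n_per_group observations in each group."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Standard error of the difference under the alternative.
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    z_effect = abs(p1 - p2) / se
    # Probability that the test statistic falls in either rejection tail.
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical frequencies of 0.30 and 0.35 in the two groups.
    for n in (250, 500, 1000, 2000):
        print(n, round(power_two_proportions(0.30, 0.35, n), 3))
```

When the two proportions are equal, the function returns alpha, the type I error rate, and the power rises toward 1 as the sample size grows.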
Even when a researcher collects a large number of samples, not all of them may need to be analyzed to detect evidence for association. A large sample size improves the ability to predict disease; however, it is not cost-effective to genotype more than the effective sample size [25]. If researchers do not estimate sample size and statistical power at the research design stage, time and resources are wasted on collecting samples.
An effective sample size can be defined as the minimum number of samples that achieves adequate statistical power e. On the other hand, too small a sample size to detect true evidence for an association increases false negative rates and reduces the reliability of a study.
False negative rates are increased by multiple factors that cause systematic biases, and such biases reduce statistical power [26].
However, many researchers tend to overlook the importance of statistical power and sample size calculations. In this study, we evaluated statistical power with increasing numbers of markers analyzed under various assumptions and compared the sample sizes required in case-control studies and case-parent studies.
We computed the effective sample size and statistical power using a web-based program, the Genetic Power Calculator, developed by Purcell et al. We conducted power and sample size calculations under various assumptions about genetic models i.
The values tested for the heterozygous odds ratio (OR het) were 1. We assumed Hardy-Weinberg equilibrium at the disease-susceptibility allele. The Bonferroni p-value that was specific to the number of SNP markers tested was applied to cover 3 billion base pairs of the human genome i. We fixed the proper range of sample sizes from to 2, cases, because the power is too low when the sample size is below cases or trios , and the cost is too high to realistically collect samples when the sample size is above 2, [7, 22].
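The Hardy-Weinberg assumption fixes the genotype frequencies once the allele frequency is known. The sketch below shows that step and how genotype frequencies among cases would follow from them. It is parameterized by genotype relative risks rather than the odds ratios used in the text, which is a simplifying assumption, and all function and parameter names are hypothetical.

```python
def hwe_genotype_frequencies(p):
    """Population frequencies of carrying 0, 1 or 2 copies of an allele
    with frequency p, assuming Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)


def case_genotype_frequencies(p, prevalence, rr_het, rr_hom):
    """Genotype frequencies among cases, given disease prevalence and
    genotype relative risks (relative to zero copies of the allele)."""
    geno = hwe_genotype_frequencies(p)
    rel_risk = (1.0, rr_het, rr_hom)
    # Baseline penetrance chosen so the overall prevalence is matched.
    baseline = prevalence / sum(g * r for g, r in zip(geno, rel_risk))
    penetrance = [baseline * r for r in rel_risk]
    if max(penetrance) > 1.0:
        raise ValueError("inputs imply a penetrance above 1")
    # Bayes' rule: P(genotype | case) = P(genotype) * P(case | genotype) / K.
    return tuple(g * f / prevalence for g, f in zip(geno, penetrance))


if __name__ == "__main__":
    # Hypothetical inputs: allele frequency 0.2, prevalence 10%.
    print(hwe_genotype_frequencies(0.2))
    print(case_genotype_frequencies(0.2, 0.10, rr_het=1.5, rr_hom=2.25))
```

The difference between the genotype frequencies in cases and in the general population is the signal that an association test has to detect, so the allele frequency, prevalence, and effect size all feed into the required sample size.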
In contrast, the effective sample size to test a single SNP under the recessive model was too large to collect with a limited budget, even if the homozygous OR is greater than 4 e. This shows how difficult it is to detect a disease allele that follows a recessive mode of inheritance with a moderate sample size. As shown in Fig. A high-risk allele showing a high OR requires a smaller sample size to be detected under the same assumption. While an allele with an OR of 1.
Higher prevalence and higher LD were associated with increased statistical power: for instance, as the LD increased from 0. In many clinical settings, researchers are able to obtain more data from affected individuals than from healthy individuals. On the other hand, there are more healthy participants than participants with a disease in a population-based study.
In Table 2, we compared the number of cases required for a case-control study with the number of case-parent trios required for a trio study as the number of SNPs analyzed increases. Genetic association studies with larger numbers of SNP markers require a larger sample size to reduce false positive associations due to testing multiple hypotheses.
The sample size required in a case-parent study is generally larger than that required in a case-control study. However, the sample sizes required in both study designs increase tremendously in a GWAS. Under the same assumptions as shown above, the number of samples increased from cases for a single SNP analysis to 1, cases and 1, cases for analyses of K SNPs and 1 M SNPs, respectively, based on the p-value threshold calculated using a strict Bonferroni correction for multiple comparisons.
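The growth in required sample size as more SNPs are tested can be sketched with the textbook normal-approximation formula for comparing two proportions. The target power of 80%, the family-wise alpha of 0.05, the example frequencies, and the function name are all assumptions for illustration, so the output is not expected to reproduce the figures in Table 2.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha, power=0.8):
    """Observations needed in each of two equal groups to detect a
    difference between proportions p1 and p2 with a two-sided z-test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = nd.inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Bonferroni-corrected threshold for an increasing number of SNPs.
    for n_snps in (1, 500_000, 1_000_000):
        alpha = 0.05 / n_snps
        print(n_snps, n_per_group(0.30, 0.35, alpha))
```

Because the critical value grows only slowly as the threshold shrinks, the required sample size rises by a modest multiple rather than in proportion to the number of SNPs tested.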
The statistical power to test the same number of subjects was higher for the case-control design than for the case-parent trio design Fig.
MAF, minor allele frequency; D', linkage disequilibrium.
Both the case-control and case-parent study designs are widely used in genetic epidemiology to study associations between genetic factors and the risk of disease. Over the past 2 decades, there has been a steep increase in the number of genetic association studies, and these studies have successfully reported a number of gene variants associated with human complex diseases [1, 4, 5, 28]. Recently, GWASs, a new frontier in genetic epidemiology, have identified thousands of new gene variants related to human diseases [29].
Population-based studies with a large sample size yield estimates with smaller variance and therefore have greater statistical power. However, collecting a sufficient number of samples costs too much money and takes too long, and these large-scale studies are more likely to be affected by systematic bias and noise [25, 30].
A smaller sample size is required under the dominant model under all assumptions, while the recessive model requires too many samples under the same assumptions to achieve adequate statistical power.