Input
Discussion
Input Explained
Allocation Ratio
Continuity Correction
Output
Tolerance
Equality analysis
Population analysis
Pairwise tests
Alternative Procedure
Further Reading
For the usual two sample size calculation, either: A) give proportions in the top two boxes, numbers between 0 and 1, in which case one would mostly not give a standard deviation and mostly not apply continuity correction; or B) give averages or means, any positive number, in which case one would mostly also give standard deviations and mostly apply continuity correction. After having done one of the two, give the allocation ratio, which will be 'one' in most cases.
One sample analysis concerns testing an observed sample mean against an expected -invariant- population or historical mean: A) give proportions in the top two boxes, numbers between 0 and 1, in which case standard deviations would mostly not be given; or B) give averages or means, any positive number, in which case standard deviations would mostly be given and one mostly applies continuity correction. Input the invariant historical mean in the top 'exp' box and the sample mean in the 'obs' box.
For equality analysis give at least one mean or average, the mean for the current situation, in the top box. If required, give a second average in the second box. A tolerance level has to be given in the third box. In case the average is not a proportion, a standard deviation will normally be given in the fourth box. Then give the allocation ratio, which will be '1' (one) in most cases. Lastly, continuity correction does not function for equality analysis.
Pairwise sample size concerns two measurements on the same subjects. For proportions, give the two proportions which are on the diagonal of changers in a two by two table in the top two boxes. The sample size for the McNemar is calculated. For means or averages, give the mean of the differences in the top box and the standard deviation of these differences in the third (Std.dev 1) box. The sample size for the pairwise t-test is calculated.
Please note that in one sample and pairwise analysis the program echoes the size of the one sample. Allocation ratio is irrelevant in this case.
For determining the sample size for one or two correlations, go here
For determining the sample size for a population prevalence, go here
Discussion. The sample size calculation procedure provides a basic method of calculating sample sizes for two group comparisons. The procedures implemented here are only applicable for simple random samples and to compare two means, proportions or averages.
Simple random sampling denotes one stage, with only one sampling procedure taking place, and implies that each individual or sample 'unit' has an equal chance of being selected into the sample. The sample size calculations offered here have to be applied carefully in the case of cluster or stratified sampling. Cluster or stratified sampling presumes that we first sample groups (the clusters or strata); units are then drawn from each cluster in a second sampling procedure. In case there is no or little difference between clusters, or there are differences which are not related to, or correlated with, the variables under study, simple random procedures can be applied to estimate sample sizes. In case the difference between clusters is very large, but within clusters very small, and it concerns correlated characteristics, procedures have to be used which are not yet implemented in SISA. Sample size calculations using the procedures offered here give a good estimate of the minimum number required in any case.
Estimating a sample size prior to doing the research requires the postulation of an effect size, mostly called 'd'. This effect size might be related to a correlation, an f-value, or a non-parametric test. In the procedure implemented here 'd' is the difference between two averages, means or proportions. Effect size 'd' is mostly subjective: it is the difference you want to discover as a researcher or practitioner, and it is a difference that you find relevant. Sometimes it is less subjective, for example, if you want to confirm whether some difference reported by someone else can also be found in your environment.
However, if cost aspects are included 'd' can be calculated objectively. For example, the cost of a disease is $25. Old prophylaxis prevents 50% of cases, the cost of old prophylaxis equals $5 per case. A new prophylaxis is developed which is considered more effective, it costs $10 per case. To implement a cost effective solution the effect size should be at least 0.3, or 30%. Old prophylaxis prevents 50%, new prophylaxis should prevent at least 80%, to allow the additional costs of new prophylaxis to be offset by lower treatment costs
[d=($(treatment)*effectiveness(old_prophylaxis)+$(old_prophylaxis)-$(new_prophylaxis))/$(treatment)=($25*0.5+$5-$10)/$25=0.3].
Input Explained. Input is made as simple as possible. However, there is some flexibility which allows for special situations, based on the input consisting of proportions (A) or means (B). A) The program recognizes a proportion by there being a number between 'zero' and 'one' in the first box and a standard deviation of 'zero' in the third box, in which case the program expects to also find a proportion in the second box. A specific proportion related method will be used to estimate the required sample size. B) In the case of a value of 'one' or more in the first box a method will be used which considers that the values given are means. There are the following possibilities to give standard deviations related to the means: B.1) if a standard deviation is given in the first standard deviation box only, it is considered the correct standard deviation to do the sample size calculation; B.2) if two standard deviations are given the program uses these to calculate and apply the standard deviation of the difference between the two means, which will lead to a safe (higher) calculation of the sample size; B.3) if no standard deviation is given the mean is considered to be a rate, i.e., a number of counts which is Poisson distributed. The program then obtains the standard deviation of each of the two means by taking the square root of the mean.
If one or two standard deviations are given, the program will always treat the given values as means, even if one or both are between 'zero' and 'one'. This way the program can consider a value between 'zero' and 'one' in two ways: as a proportion, to be compared with another proportion using a special method, or as a mean, which can be compared with any other possible value but requires a standard deviation provided by the user.
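The input rules above can be summarised in a short sketch. This is not SISA's code: the function name, the labels and the return shape are made up for illustration. Using a single standard deviation for both groups, and accepting a single standard deviation from either box, are assumptions on top of the text.

```python
import math

def interpret_input(value1, value2, sd1=0.0, sd2=0.0):
    """Classify two-sample input following the rules described in the text.

    Returns (kind, sd_group1, sd_group2). All names are hypothetical.
    """
    given = [s for s in (sd1, sd2) if s > 0]
    if len(given) == 2:
        # B.2: two standard deviations given, both are used
        return ("means, two sd", sd1, sd2)
    if len(given) == 1:
        # B.1: one standard deviation given; values are treated as means,
        # even when they lie between 0 and 1.
        # Assumption: the single sd is applied to both groups.
        return ("means, one sd", given[0], given[0])
    if 0 < value1 < 1:
        # A: no sd and a value between 0 and 1, so a proportion method applies
        return ("proportions", None, None)
    # B.3: no sd and a value of one or more, treated as a Poisson rate
    # whose standard deviation is the square root of the mean
    return ("rates", math.sqrt(value1), math.sqrt(value2))

print(interpret_input(0.3, 0.5))        # proportions
print(interpret_input(4, 9))            # rates, sd = sqrt(mean)
print(interpret_input(0.3, 0.5, 0.2))   # means, because an sd is given
```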
Allocation ratio is an additional parameter. An allocation ratio of 'one' is used when you use two similar sized groups. An allocation ratio of 'two' is used when one group is twice as large as the other group. The program echoes the size for only one group. The size of the other group is the size given by the program multiplied by the allocation ratio. The allocation ratio can be any value, with or without decimals. In case of unequal variances most power can be accomplished for the lowest number of cases by allocating more cases to the group with the highest variance. The program sometimes makes a suggestion on the optimum allocation ratio. The allocation ratio is not relevant and not functional for population and pairwise analysis.
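As an illustration of how an allocation ratio enters the calculation, the sketch below uses the textbook normal approximation for comparing two means with unequal group sizes. The formula SISA itself applies is not stated here, so the function names and the formula are assumptions. The optimum ratio shown (the ratio of the two standard deviations) is the standard result for minimising the total number of cases.

```python
import math
from statistics import NormalDist

def group_sizes(mean1, mean2, sd1, sd2, ratio=1.0, alpha=0.05, power=0.80):
    """Return (n1, n2) for a double sided comparison of two means,
    where n2 = ratio * n1 (normal approximation, hypothetical helper)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    d = abs(mean1 - mean2)
    # Variance of the difference is sd1^2/n1 + sd2^2/(ratio*n1)
    n1 = math.ceil((sd1 ** 2 + sd2 ** 2 / ratio) * z_total ** 2 / d ** 2)
    return n1, math.ceil(ratio * n1)

def optimal_ratio(sd1, sd2):
    """Ratio n2/n1 that minimises n1 + n2: more cases go to the group
    with the larger standard deviation."""
    return sd2 / sd1

print(group_sizes(10, 12, 4, 4))            # equal groups
print(group_sizes(10, 12, 4, 4, ratio=2))   # second group twice as large
print(optimal_ratio(2, 4))                  # 2.0
```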
Continuity correction. In the case of means, thus not proportions, the issue is simple: continuity correction is continuity correction and should be applied when the required sample size is expected to be relatively small. For proportions, although the literature continues to discuss the issue under the header 'continuity correction', the choice is strictly speaking not between using continuity correction or not. The issue is that in statistical testing for the difference between two proportions one can use various methods, such as the Chi-square (see two by two tables), the t-test and the Fisher (or Binomial in the case of population analysis). The method without continuity correction estimates the sample size required, given a particular power and alpha level, for doing a Chi-square. Continuity correction corrects the number found this way towards doing a Fisher (Binomial), which is less powerful than the Chi-square and requires a larger sample size. The t-test, often applied to test for a difference between two proportions, is somewhere in between the two, although, particularly with smaller sample sizes and smaller proportions to test, the t-test is mostly closer to the Chi-square than to the Fisher. Please consider that the continuity correction tick box has a different function in calculating the sample size for pairwise analysis.
Output. In the output the program gives the number of cases for a certain statistical power and alpha level. Two sets of numbers are given, one for double sided and one for single sided testing. Power is the chance that, if 'd' exists in the real world, one gets a statistically significant difference in the data. Usually the power level is taken to be 80%: there is then an 80% chance to discover a really existing difference in the sample. Alpha is the chance that one would conclude one has discovered an effect or difference 'd', while in fact this difference or effect does not exist. Usually, alpha is set at 5%, which means that in 5%, or one in twenty, of projects the data signals that 'something' exists, while in fact it does not. Other levels for power and alpha can be used. Only do that if you have a good reason for it.
Single sided is used when you know the direction of the effect: new treatment is better than old treatment, small cars are cheaper than big ones. Use double sided if you do not know the direction: which treatment has fewer complications, are males different from females, which type of car is faster, the big one or the small one? Double sided is more often used, because it gives a more conservative (higher) estimate of the required sample size.
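To make the difference concrete, the sketch below computes an uncorrected sample size for two proportions with the usual normal approximation and then applies a widely used continuity correction (commonly attributed to Fleiss and colleagues) for equal group sizes. Whether SISA uses exactly these formulae is an assumption; the function names are illustrative only.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Uncorrected size per group, double sided, equal groups."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

def n_continuity_corrected(n, p1, p2):
    """Inflate an uncorrected size n (equal groups) with the
    continuity correction; the result is always larger than n."""
    d = abs(p1 - p2)
    return n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2

n = n_two_proportions(0.2, 0.4)
n_cc = n_continuity_corrected(n, 0.2, 0.4)
print(math.ceil(n), math.ceil(n_cc))  # corrected size is the larger of the two
```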
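The role of power, alpha and sidedness can be illustrated with the standard normal approximation for two equally sized groups with a common standard deviation. This is a generic textbook sketch, not necessarily the computation SISA performs; the function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, sd, alpha=0.05, power=0.80, double_sided=True):
    """Cases per group needed to detect a difference d between two means."""
    z = NormalDist().inv_cdf
    # Double sided testing splits alpha over both tails
    z_alpha = z(1 - alpha / 2) if double_sided else z(1 - alpha)
    z_beta = z(power)
    return math.ceil(2 * (sd * (z_alpha + z_beta) / d) ** 2)

print(n_per_group(2, 4))                      # double sided: 63
print(n_per_group(2, 4, double_sided=False))  # single sided: 50
print(n_per_group(2, 4, power=0.95))          # higher power needs more cases
```

The double sided number is the larger one, which is why it is described above as the more conservative choice.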
Population analysis works exactly the same as the two sample analysis, the main difference being that the value of one parameter is not an estimate but is exactly known. There are two cases in which this is considered to be so: a) the 'historical' situation where some sort of settled opinion on the numerical value of a phenomenon exists; b) the case where the numerical value is a population value, for example, the number of deaths in a community can be exactly known. In the case of 'a', 'exactly' seems a relative concept and Bayesian methods might be preferred. In the case of 'b' the methods proposed here are valid. Fill in the population proportion or mean in the top box and the postulated sample proportion or mean in the second box. Input is further the same as described under the input heading, and similar considerations as discussed above apply. 'Check' the population method.
There are some subtle differences which we will discuss now. First, allocation ratio is not relevant and is not considered in the analysis. The program gives you the required size of the one sample which is used to test an estimate against a population value.
For means or averages, give the expected population mean in the top -exp- box and the observed sample mean in the second -obs- box. If you use the population standard deviation give it in the Std.dev. 1 box (the calculation will be based on approximating the normal distribution). If you use the sample standard deviation give it in the Std.dev. 2 box (the calculation will be based on approximating the t-distribution).
In the case of proportions it should be considered that the underlying nature of the data is quite different from data used to test for a difference between two estimated proportions. In the case of two estimated proportions the data consist of a two by two table and all methods for table analysis apply. In that case, by not using or using continuity correction, SISA allows you to estimate the sample size required for Chi-square testing or for Fisher testing respectively. In the case of comparing a population value with a sample estimate, the data compare an expected with an observed distribution in a one dimensional array with two categories. Here one would usually not use the Chi-square, and using the Fisher would be impossible. The t-test also works differently. The Binomial is the most appropriate test in this situation: use it in SISA online, use SISA's MS-DOS version if you have a large sample size, or use a Poisson approximation if you have a very large sample. Alternatively, but not preferred, you can use a normal approximation of the Binomial; consult Wonnacott and Wonnacott or Blalock on how to do this. SISA's t-test procedure supports this. The problem is that the formulae available for sample size calculation for population analysis are, in the case of proportions, meant for the Chi-square and the Fisher; the one is not very appropriate for this type of data, the other impossible. Crudely, the sample size calculation without continuity correction seems reasonable for the normal approximation, and the sample size calculation with continuity correction seems reasonable for the Binomial.
Equality analysis. The more usual type of analysis considers that 'new situation' is probably different, and mostly an improvement, compared with 'old situation'. Say, however, one wants to lower the number of operators, or nurses, or make some other cost saving. In this case we could expect the change to result in a deterioration of outcomes, compared with the current situation. This leads to a different view on changes and differences, although it is up to the reader to decide to what extent which situation applies to them. In the classical model discussed above one wants to run relatively little risk of implementing a possibly ineffective change or of making a lot of fuss about a non-existing difference. Therefore one does not want to discover a difference too easily: alpha levels are set at a relatively low value (say 0.05) while a not too high power is considered acceptable (80%). In equality analysis the assumption is that one does want to discover a difference, as such a difference might mean bad news and one does not want to deny bad news. Alpha levels are therefore set at a high level (say 0.1) and a high power (95%) is also required.
Different calculations are required; these calculations are done if one 'checks' the Equality method. The input is a bit complicated. Two means or averages can be given. One is the current outcome mean, for example the proportion cured or defective, or the mean score on a measurement or assessment scale. A second mean can be given if one already knows that the change will produce changes in outcome, and if one is of the opinion that such changes are acceptable. This is an exceptional situation; mostly the two means given will be the same, as one expects no change in outcome.
A 'tolerance (delta)' parameter is subsequently given. One wants Beta percent of the observations to be within the tolerance level given, where Beta is the power level. The tolerance will mostly be set at quite a small value. The tolerance parameter is only relevant for equality analysis. In case the means do not concern proportions, a standard deviation should be given in the 'Standard Deviation 2' box. The program considers that the standard deviation is the same for both means. One does not have to give a standard deviation in the case of proportions; keep the value in the Standard Deviation 2 box at 'zero'.
If a standard deviation is not given the program will calculate one for you, according to the methods set out for the usual sample size calculation methods discussed above.
Pairwise analysis is when you do two measurements on a single individual and then compare the outcome of the two measurements. Mostly a time factor is involved, a measurement is done, something "happens", an "intervention" for example, after which the measurement is done again. The before and after measurements are compared.
In the case of means or averages the score for each individual on the measurement before is subtracted from the score on the measurement after the intervention. These differences are averaged over all individuals, producing a mean difference with an associated standard deviation. Your null hypothesis is that the mean difference is zero: overall (in net terms) the respondents did not change. The sample size calculated is the sample size required to detect a postulated net change over all individuals. Give the expected mean difference, or net change, for all individuals in the top box and the associated standard deviation in the third (Std.Dev. 1) box. A sample size for the paired t-test is calculated after clicking the paired button. The calculation will approximate the t-test. In the unlikely event you want to approximate the normal distribution, mark the continuity correction box.
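A minimal sketch of the paired means calculation, assuming the standard normal approximation for a one sample test on the differences. The optional small sample adjustment (adding half the squared alpha quantile, a correction usually attributed to Guenther) is one common way to approximate the t-test; how SISA approximates the t-distribution is not stated here, so treat both the helper and the adjustment as assumptions.

```python
import math
from statistics import NormalDist

def n_paired(mean_diff, sd_diff, alpha=0.05, power=0.80, t_adjust=True):
    """Number of subjects (pairs) for a double sided paired test.

    mean_diff: postulated mean of the after-minus-before differences
    sd_diff:   standard deviation of those differences
    """
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    n = (sd_diff * (z_a + z_b) / mean_diff) ** 2
    if t_adjust:
        # Assumed adjustment towards the t-test
        n += z_a ** 2 / 2
    return math.ceil(n)

print(n_paired(1, 2))                  # t-approximation: 34
print(n_paired(1, 2, t_adjust=False))  # normal approximation: 32
```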
For proportions the analysis concerns the numbers of people who change between two groups, here denoted as 'A' and 'B'. It is customary to study the changes in a crosstable, with the numbers of 'A' and 'B' before and after the intervention in the two marginals and the changing respondents inside the table, with a diagonal of non-changes (cells aa and bb) and a diagonal of changers (cells ab and ba). The program calculates the sample size for doing the McNemar. There are two strategies. The preferred strategy is to give the proportion of respondents on the total you expect to change from group 'A' to group 'B' (the proportion which are in cell ab on the total) in the top box and the proportion of people you expect to change from group 'B' to group 'A' in the second box.
The second and much less preferred strategy is to give the proportion of people who were in group 'A' before the intervention in the top box and the proportion of people who were in group 'A' after the intervention in the second box. It concerns the marginal change (which is in fact what we are interested in). Click the continuity correction box to do this analysis. The program then estimates the numbers of changers inside the table from the marginals; independence is presumed.
The alternative procedure is a generalization of the sample size procedure which is often used for a single parameter, such as a prevalence or an incidence. It concerns the number of cases required to estimate a single parameter with a particular pre-defined precision/certainty. In the case of the two sample procedure it concerns the difference between the two parameters, in the one sample case the difference between the expected and the observed. Input in the top five boxes works the same as for the usual sample size procedures. Next, specify the width of the confidence interval (which is a measure of the precision/certainty of an estimate) for the difference you want to estimate in the "Power" box (not as a percentage). One hundred minus the confidence level, as a percentage, is given in the "Alpha" box. Thus, 5 gives a 95% confidence interval.
Limitations: The procedure does not work for equivalence testing and is not yet implemented for pairwise analysis. Continuity correction does not work; none of the results are continuity corrected.
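Both strategies for paired proportions can be sketched as follows. The sample size formula is a commonly used normal approximation for the McNemar test based on the two proportions of changers; it is an assumption that SISA uses this particular formula. The second function derives the changers from the marginals under independence, as described above. All names are illustrative.

```python
import math
from statistics import NormalDist

def n_mcnemar(p_ab, p_ba, alpha=0.05, power=0.80):
    """Number of subjects for a double sided McNemar test.

    p_ab: proportion of all subjects changing from 'A' to 'B'
    p_ba: proportion of all subjects changing from 'B' to 'A'
    """
    z = NormalDist().inv_cdf
    p_changers = p_ab + p_ba   # diagonal of changers
    d = p_ab - p_ba            # net change
    numerator = (z(1 - alpha / 2) * math.sqrt(p_changers)
                 + z(power) * math.sqrt(p_changers - d ** 2)) ** 2
    return math.ceil(numerator / d ** 2)

def changers_from_marginals(p_a_before, p_a_after):
    """Estimate (p_ab, p_ba) from the marginals, presuming that the
    before and after classifications are independent."""
    p_ab = p_a_before * (1 - p_a_after)
    p_ba = (1 - p_a_before) * p_a_after
    return p_ab, p_ba

# Preferred strategy: give the changers directly
print(n_mcnemar(0.25, 0.10))

# Less preferred strategy: start from the marginals
p_ab, p_ba = changers_from_marginals(0.5, 0.6)
print(n_mcnemar(p_ab, p_ba))
```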
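The idea of sizing a study for precision rather than power can be sketched with the usual normal approximation for the confidence interval of a difference between two means. The text does not say whether the 'width' SISA expects is the full interval or the distance from the estimate to one limit; this sketch takes the latter (the half width) as an explicit argument. The function and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_precision(half_width, sd1, sd2, alpha_percent=5, ratio=1.0):
    """Cases in group 1 so that the confidence interval for the
    difference between two means extends half_width to either side.

    alpha_percent=5 corresponds to a 95% confidence interval.
    The second group has ratio * n1 cases.
    """
    z = NormalDist().inv_cdf(1 - alpha_percent / 100 / 2)
    variance_sum = sd1 ** 2 + sd2 ** 2 / ratio
    return math.ceil(z ** 2 * variance_sum / half_width ** 2)

print(n_for_precision(1, 4, 4))                     # 95% interval
print(n_for_precision(1, 4, 4, alpha_percent=10))   # 90% interval, fewer cases
```

Note that no power enters this calculation: only the confidence level and the desired precision determine the number of cases.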
Note: this is an unusual way to calculate a sample size. Only use this if you want to know the answer to the question of the number of cases required to estimate a single parameter with a certain predefined precision.
Campbell MJ, Julious SA, Altman DG. Estimating sample sizes for binary, ordered categorical, and continuous outcomes in two group comparisons. British Medical Journal 1995;311:1145-8.
Machin D, Campbell M, Fayers P, Pinol A. Sample size tables for clinical studies, 2nd Edition. London, Edinburgh, Malden and Carlton: Blackwell Science 1997.
Sahai H, Khurshid A. Formulae and tables for the determination of sample sizes and power in clinical trials for testing differences in proportions for the two-sample design: A review. Statistics in Medicine 1996;15:1-21.