T-test, number of cases and mean difference and standard deviation of the difference respectively. Or, number of cases and t-value of the difference.
Wilcoxon Signed Ranks Test, number of cases and sum of the negative ranks.
McNemar test, give the two integer numbers of changers from the off-diagonal of a two by two table in the top two boxes: the number of positive changers and the number of negative changers.
Pairwise tests concerns the comparison of the same group of individuals, or matched pairs, being measured twice, before and after an 'intervention'. Using this methodology the respondents or their matched 'partners' function as their own control, lowering the level of unexplained variance or 'error'. Matched or pairwise data can be presented as in the following table:
Name   | Before   | After    | D(ifference) | Sq(D-Mean) | Ranked-D
John   | 18       | 28       | 10           | 17.64      | 8
Steve  | 37       | 34       | -3           | 77.44      | -2
Liz    | 12       | 17       | 5            | 0.64       | 4.5
Mary   | 42       | 40       | -2           | 60.84      | -1
Paul   | 7        | 20       | 13           | 51.84      | 9.5
Joy    | 31       | 35       | 4            | 3.24       | 3
Mike   | 59       | 66       | 7            | 1.44       | 7
Nick   | 21       | 27       | 6            | 0.04       | 6
Linda  | 8        | 21       | 13           | 51.84      | 9.5
Peter  | 56       | 61       | 5            | 0.64       | 4.5
Total  | 291      | 349      | 58           | 265.6      | 49
       | sd=19.05 | sd=16.74 | Mean=5.8     | sd=5.43    | Signed=3
The first three columns show the data; the following columns contain a number of calculations which we will now discuss.
As the data shows, the 10 respondents improved 5.8 'points' on average after the intervention. One possible way to test whether the difference between the respondents before and after the intervention is statistically significant is to apply the t-test procedure as implemented in SISA. Giving the values above {mean1=29.1 (291/10); mean2=34.9; n1=10; n2=10; sd1=19.05; sd2=16.74}, the t-test procedure produces a t-value of 0.72, with an associated single sided ('tailed') p-value of 0.2376. According to this method of testing, the intervention does not produce a statistically significant improvement in the respondents' scores.
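These numbers can be checked outside SISA; below is a minimal Python sketch (scipy is assumed to be available, it is not part of SISA) that reproduces the ordinary two-sample t-test on the raw data.

```python
from scipy import stats

before = [18, 37, 12, 42, 7, 31, 59, 21, 8, 56]
after  = [28, 34, 17, 40, 20, 35, 66, 27, 21, 61]

# Ordinary t-test for two independent samples (this ignores the pairing).
t, p_two_sided = stats.ttest_ind(after, before)
print(t)                # about 0.72
print(p_two_sided / 2)  # single sided p-value, about 0.24
```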
However, doing it this way does not do justice to the data. Because the observations are paired, much of the variation within the two sets of data that the usual t-test (for two independent samples) takes into account has in effect already been removed. In that case the t-test is too conservative: differences between the two sets of paired observations are not declared statistically significant as quickly as they should be.
T-test for paired observations (also known as the t-test for two correlated samples). This t-test tests whether the mean change between the two measurements differs statistically significantly from zero. For the t-test procedure you have to give the number of cases, an integer number, in the top box; this is 10 in the case of the above example. In the second box the mean difference is given (5.8 in the example), and in the third box the standard deviation of the difference must be given as a positive number, with or without decimals. The standard deviation is only used for the t-test procedure. How the standard deviation is calculated is shown in the fifth column of the example above: take the sum of the squared differences between the observed differences and the mean difference, divide this sum by the number of cases minus one, and take the square root. In the case of the example: sqrt(265.6/(10-1))=5.43. One can use a calculator with statistical functions to calculate the standard deviation: enter the scores for the difference and take the sample standard deviation, the one with the "s" symbol, not the one with the funny little "sigma" symbol.
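The mean and the sample standard deviation of the differences can also be reproduced with, for instance, Python's standard library (a minimal sketch, assuming you have the raw difference scores):

```python
import statistics

d = [10, -3, 5, -2, 13, 4, 7, 6, 13, 5]  # the D(ifference) column

mean_d = statistics.mean(d)   # 5.8
sd_d   = statistics.stdev(d)  # sample sd, divides by n - 1: about 5.43
print(mean_d, sd_d)
```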
As an alternative to giving a mean and a standard deviation for the t-test, one can give the t-value of the difference and leave the value zero (0) in the standard deviation box. The t-value is calculated like this: (mean - null hypothesis)/(standard deviation/sqrt(n)) = (5.8-0)/(5.43/sqrt(10)). The null hypothesis value in this case equals zero (0): we expect no change, given the standard error of the measurement (the standard deviation divided by sqrt(n) is commonly referred to as the standard error). The t-value of the difference equals 3.378, and the program gives you a p-value of 0.00407.
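The same t-value and p-value follow directly from this formula; a small sketch (scipy assumed for the t-distribution tail):

```python
from math import sqrt
from scipy import stats

n, mean_d, sd_d = 10, 5.8, 5.43
se = sd_d / sqrt(n)                    # standard error of the mean difference
t  = (mean_d - 0) / se                 # null hypothesis: mean difference is 0
p_one_sided = stats.t.sf(t, df=n - 1)  # about 0.004
print(t, p_one_sided)
```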
The t-test is critical with regard to the characteristics of the data, the assumptions for the test. It is assumed that the differences between the two observations are at the interval level: someone who improved four points improved twice as much as someone who improved two points. It is further assumed that the differences are normally distributed. Thus, if, for example, a number of smaller differences were counterbalanced by one very large difference, or the differences were all of more or less the same size, or there were a ceiling on the maximum size of the differences, the t-test would not be valid. The following two tests are presented for the case in which the assumptions of interval level and normal distribution are not met.
Wilcoxon Matched Pairs Signed Ranks Test. This test assumes that the data are at an ordinal-metric level, i.e., that the original data can be validly ordered, that the data after the intervention can be ordered, and that the differences between the two sets of data can be validly ordered. This assumption is slightly less critical than the interval level assumption necessary for the t-test. The assumption of a normal distribution does not have to be met, which is particularly practical if the maximum change is somehow limited. A positive aspect of the Wilcoxon test is that it is a very powerful test: if all the assumptions for the t-test are met, the Wilcoxon has about 95% of the power of the t-test.
The Wilcoxon test requires as input the number of changed cases, as an integer value, in the top box, and the sum of the negative ranks, as a positive number, in the second box; no standard deviation is required. The sum of the negative ranks is calculated in the sixth column of the above table. This column shows the rank numbers of the differences between the two measurements in the fourth column. This might not be immediately obvious, and therefore we repeat the calculation here. In column four we have the following differences, D, which re-ordered from small to large in absolute size give: -2, -3, 4, 5, 5, 6, 7, 10, 13, 13. As we have 10 observations, these observations should be apportioned the rank numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. In the case of ties (two or more similar values) we average the rank numbers. We have to do this for the (D) values 5 and 13; doing so results in the following ranking: 1, 2, 3, 4.5, 4.5, 6, 7, 8, 9.5, 9.5. An individual who did not change, i.e. for whom the difference between the first and the second measurement is zero, is excluded from the analysis. Thus, cases which did not change are not present in the data, table or analysis, neither in the ordering of the data nor in the number of cases.
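The ranking, the handling of the ties and the sum of the negative ranks can be checked with a short sketch (scipy's rankdata is assumed for the tie handling):

```python
from scipy.stats import rankdata

d = [10, -3, 5, -2, 13, 4, 7, 6, 13, 5]
d = [x for x in d if x != 0]           # zero differences are dropped

ranks = rankdata([abs(x) for x in d])  # rank the absolute differences, ties get the average rank
neg_rank_sum = sum(r for r, x in zip(ranks, d) if x < 0)
print(ranks)         # 8, 2, 4.5, 1, 9.5, 3, 7, 6, 9.5, 4.5
print(neg_rank_sum)  # 3
```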
Two of the rank numbers refer to negative values, the values -2 and -3, with rank numbers 1 and 2 respectively. The sum of these rank numbers equals 3, the sum of signed ranks. Giving this value in the appropriate box shows that the expected value of the sum of signed ranks equals 27.5, with a standard deviation of 9.81. The associated z-value of the difference between the observed and expected sum of signed ranks equals 2.497, with a p-value of 0.00625. According to the Wilcoxon test, the difference between the respondents' scores before and after the intervention is statistically significant.
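The normal approximation behind these numbers can be written out as follows (a minimal sketch; it should reproduce the 27.5, 9.81, 2.497 and 0.00625 above, apart from rounding):

```python
from math import sqrt
from scipy.stats import norm

n, w_neg = 10, 3                                  # changed cases, sum of negative ranks
expected = n * (n + 1) / 4                        # 27.5
sd       = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # about 9.81
z        = (expected - w_neg) / sd                # about 2.497
p_one_sided = norm.sf(z)                          # about 0.0063
print(expected, sd, z, p_one_sided)
```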
McNemar Change Test. This test studies the change in a group of respondents measured twice on a dichotomous variable. It is customary in that case to tabulate the data in a two by two table. The following example from Siegel illustrates how the test works.
                         | Preference after TV Debate
Preference Before Debate | Reagan | Carter | Total
Reagan                   | 27     | 7      | 34
Carter                   | 13     | 28     | 41
Total                    | 40     | 35     | 75
In this table one group of voters is asked twice about their voting intention, before and after a television debate. As can be seen, 13 respondents changed their preference from Carter to Reagan, while 7 respondents changed their preference from Reagan to Carter. The test asks whether the number of respondents changing from Reagan to Carter is similar to the number changing in the other direction; thus, the question is whether there is a statistically significant difference between the 7 and the 13. Fill in the two change values (13 and 7) in the top two boxes (leave a zero in the bottom box), click the McNemar button, and the program provides you with the answers: the probability value equals 0.179, or 0.264 with Yates' continuity correction. The two candidates were not statistically significantly different in changing the voters' preference. The program also gives you a Binomial alternative to the McNemar test: the probability of getting the same number of changers or more out of the total number of changers, while the expectation was 50% one way and 50% the other way. Use the Binomial if the number of cases is small or if you have a cell with fewer than 5 observations. Note that the Binomial is a single sided test; use it if you want to test whether the change goes into a particular direction and whether the change is statistically significant in that regard. The Chi-square is a double sided test, with no prior expectation with regard to direction. Double the p-value of the Binomial test to get a double sided test. The doubled Binomial p-value will mostly be very close to the p-value of the Chi-square with Yates' correction.
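For reference, a small sketch of the McNemar and Binomial calculations for this table (Python with scipy assumed; b and c denote the two groups of changers):

```python
from scipy.stats import chi2, binom

b, c = 13, 7  # Carter-to-Reagan and Reagan-to-Carter changers
n = b + c

chi2_plain = (b - c) ** 2 / n            # 1.8
chi2_yates = (abs(b - c) - 1) ** 2 / n   # 1.25, with continuity correction
print(chi2.sf(chi2_plain, df=1))         # about 0.18
print(chi2.sf(chi2_yates, df=1))         # about 0.26

# Binomial alternative: 13 or more changers out of 20 under a 50/50 expectation.
p_binom = binom.sf(b - 1, n, 0.5)        # single sided, about 0.13
print(p_binom, 2 * p_binom)              # doubled: about 0.26
```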
What is interesting is mostly not what happens inside the table, but what happens in the marginals (the row and column totals). As you can see, before the TV debate Carter had the support of 41 of the 75 voters; after the debate this decreased to 35. Carter lost the support of 6 voters, the net difference between the voters who changed in favour of Carter and the voters who changed in favour of Reagan.
One of the problems with the McNemar is that it requires knowledge of the inside of the table, while we are mostly interested in the marginals. Machin and Campbell propose, in case the inside of the table is not known, to estimate the inside of the table from the marginals. This presumes that the likelihood of changing from Carter to Reagan, or from Reagan to Carter, is independent of being a Reagan or a Carter supporter in the first place. SISA also allows you to do the McNemar in this less preferred way: fill in the two numbers of cases, one for each marginal, in the top two boxes, give the total number of cases in the bottom box, and click the McNemar button. (Agresti discusses this same topic in his example 10.1.3; he uses a t-test instead of the Chi-square or the Binomial alternative.)
The McNemar can also be used as an alternative to the sign test. Place the number of positive changers in the top N+ box and the number of negative changers in the second N- box; disregard the number of non-changers. In the first table above (the paired data) we have 8 positive and 2 negative changers; the McNemar test gives a Chi-square for an equal number of positive and negative changers of 3.6, with a probability of 0.058. The positive direction of the change is not statistically significant at the 5% level and might still have been caused by chance fluctuation.
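A short sketch of this sign-test style use of the McNemar, starting from the D column of the first table (Python with scipy assumed):

```python
from scipy.stats import chi2

d = [10, -3, 5, -2, 13, 4, 7, 6, 13, 5]
n_pos = sum(1 for x in d if x > 0)  # 8 positive changers
n_neg = sum(1 for x in d if x < 0)  # 2 negative changers

chi2_value = (n_pos - n_neg) ** 2 / (n_pos + n_neg)  # 3.6
print(chi2_value, chi2.sf(chi2_value, df=1))         # p about 0.058
```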
The sign test, including this McNemar version of it, has the disadvantage that it is not a very powerful test. However, it is also a test with few assumptions and does not require the data to have particular characteristics; it can be used on interval, ordinal and nominal data. For example, it can be used to see whether, after some treatment, more diseased people got cured (+1) than healthy people became diseased (-1).