Simple Interactive Statistical Analysis
RxC table is a basic two-dimensional cross-table procedure. The procedure matches the values of two variables and counts how often each pair of values occurs. It then presents the results in tables and allows for various statistical tests.
Input.
Individual-level data, one case per row, has to be given in two columns: one column for the table rows and one for the table columns. Separators between the two columns can be spaces, returns, semicolons, colons and tabs. Any format within the columns will do: numbers, letters and words will all be read and classified. Numbers are treated by name, thus 10 and 10.0 are in different categories and 5 is larger than 12. For table input you have to give the number of rows and columns in your table, and the table is read unstructured, row after row. The input is presumed to consist of whole counts, without decimals or scientific notation. Separators between numbers can be spaces, commas, dots, semicolons, colons, tabs, returns and linefeeds.
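The case-per-row reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code SISA itself runs: the function name `crosstab` is hypothetical, and labels are kept as plain strings, which is what makes 10 and 10.0 different categories and sorts 5 after 12.

```python
import re
from collections import Counter

def crosstab(text):
    """Count (row label, column label) pairs from case-per-row input (hypothetical helper)."""
    # Split on the separators named in the text: spaces, returns, semicolons, colons, tabs.
    tokens = [t for t in re.split(r"[ \t\r\n;:]+", text) if t]
    # Consecutive tokens form (row, column) pairs; an unpaired trailing token is dropped.
    return Counter(zip(tokens[0::2], tokens[1::2]))

data = "john mary\npeter linda\njohn liz\nsteve mary\njohn linda\nsteve mary\nsteve linda"
counts = crosstab(data)
rows = sorted({r for r, _ in counts})
cols = sorted({c for _, c in counts})
for r in rows:
    # Counter returns 0 for combinations that never occur.
    print(r, [counts[(r, c)] for c in cols])
```

Because the labels are compared as text, `sorted(["5", "12"])` puts "12" first, which matches the "treated by name" behaviour described above.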
Descriptives.
Show Tables presents the usual cross tables, tables which count the occurrence of each combination of row label and column label. Separate tables give the cell, row and column percentages/probabilities of these combinations.
List or Flat Table gives in separate rows each unique combination of row and column labels and how often these combinations are counted. In two further columns the cell and column percentages are given. Flat table is the default format for most spreadsheet programs, it forms the basis for the pivot tables in MS Excel, and it is the most preferred input format for GLM analysis. To do the reverse, changing a flat table into a cross table, in SISA or SPSS, enter the flat table (without the sums and totals) into the data input field, two columns of labels followed by a column of counts, and weigh the labels by the counts. For the other orientation, rows first, you have to turn the table.
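As an illustration of the flat format, the sketch below turns a cross table into flat rows with the cell and column percentages. The function name `flat_table` is hypothetical, and leaving out combinations with a zero count is an assumption made here, not documented behaviour of the program.

```python
def flat_table(table, row_labels, col_labels):
    """Return (row label, column label, count, cell %, column %) tuples (hypothetical helper)."""
    total = sum(sum(row) for row in table)
    col_sums = [sum(col) for col in zip(*table)]
    flat = []
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            n = table[i][j]
            if n == 0:
                continue  # assumption: only combinations that actually occur are listed
            flat.append((r, c, n, 100 * n / total, 100 * n / col_sums[j]))
    return flat

weights = [[10, 23, 8], [6, 11, 21], [14, 8, 12]]
for line in flat_table(weights, ["john", "peter", "steve"], ["linda", "liz", "mary"]):
    print(line)
```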
Ordinal pairs form the basis of many analyses of ordinal association, such as Goodman and Kruskal's Gamma and Kendall's Tau. Concordant pairs consist of individuals paired with other individuals who score both lower on the column and lower on the row variable. Discordant pairs consist of individuals paired with other individuals who are lower on the one, and higher on the other variable. Tied pairs are individuals paired with others who have the same score on either the rows or the columns.
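The pair definitions above can be counted directly from a table of counts. The following is a minimal sketch, assuming rows and columns are already in their ordinal order; the function name and the 2x2 example table are hypothetical.

```python
def ordinal_pairs(table):
    """Count concordant, discordant and tied pairs in a table of counts (hypothetical helper)."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # partner sits in a later row
                for m in range(j + 1, n_cols):   # and a later column: concordant
                    concordant += table[i][j] * table[k][m]
                for m in range(j):               # but an earlier column: discordant
                    discordant += table[i][j] * table[k][m]
    n = sum(sum(row) for row in table)
    total_pairs = n * (n - 1) // 2
    # Every remaining pair shares a row score, a column score, or both.
    tied = total_pairs - concordant - discordant
    return concordant, discordant, tied

print(ordinal_pairs([[2, 1], [1, 3]]))
```

Each unordered pair is visited once, so the three counts add up to n(n-1)/2.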
Statistics.
Chi-squares are the usual nominal procedures to test whether rows and columns are independent.
Goodman and Kruskal's Gamma and Kendall's Tau are based on the ordinal pairs, counted with the option above. You will get the sample standard deviations and p-values for the difference between the observed association and the expected (no) ordinal association of 0 (zero). Gamma is the difference between the number of concordant and discordant pairs divided by the sum of concordant and discordant pairs; Tau-a is the difference between the number of concordant and discordant pairs divided by the total number of pairs. Gamma usually gives a higher value than Tau and is (for other reasons as well) usually considered to be a more satisfactory measure of ordinal association. The p-values are supposed to approach the exact p-value for an ordinal association asymptotically, and the program shows that they generally do that reasonably well. But beware of small numbers: the p-values for Gamma and Tau then become too optimistic!
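The two definitions translate into a couple of lines. This sketch covers only the point estimates, not the standard deviations or p-values the program reports; the function name and the pair counts (6 concordant, 1 discordant, 7 cases) are hypothetical illustration values.

```python
def gamma_and_tau_a(concordant, discordant, n_cases):
    """Goodman and Kruskal's Gamma and Kendall's Tau-a from pair counts (hypothetical helper)."""
    total_pairs = n_cases * (n_cases - 1) / 2
    # Gamma ignores tied pairs; it is undefined when there are no untied pairs.
    gamma = (concordant - discordant) / (concordant + discordant)
    # Tau-a divides by all pairs, tied ones included.
    tau_a = (concordant - discordant) / total_pairs
    return gamma, tau_a

gamma, tau_a = gamma_and_tau_a(6, 1, 7)
print(round(gamma, 3), round(tau_a, 3))
```

Because Tau-a keeps the tied pairs in the denominator, it cannot exceed Gamma in absolute value, which is why Gamma usually comes out higher.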
Goodman and Kruskal’s Lambda is an example of a Proportional Reduction in Error (PRE) measure. PRE measures work by comparing: 1) an error score in predicting someone’s most likely position (in a table) using relatively little information; with: 2) the error score after collecting more information. In the case of Lambda we compare the error made when we only have knowledge of the marginal with the reduced error after we have collected information regarding the inside of the table. Two Lambdas are produced. First, how much better are we able to predict someone’s position on the column marginal: Lambda A. Second, how much better are we able to predict someone’s position on the row marginal: Lambda B. The program gives the proportional improvement in predicting someone’s score after collecting additional information.
To guess the name of a man on the basis of the weighted table below, when the only information we have is the distribution of men's names in the sample, we would guess John, with a (38+34)/113*100=63.7% chance of an erroneous guess. However, if we know the name of the man's partner, we would guess John if the partner is Liz, with a (11+8)/42*100=45.2% chance of an error, Peter if the partner is Mary (48.8% errors), and Steve if the partner is Linda (53.3%). The average reduction in errors in the row marginal, weighted by cell size (Lambda-B), equals 23.6%; the average weighted error rate in guessing a man's name after knowing the woman's name equals 63.7*(1-0.236)=48.7%. This 48.7% can also be calculated as: (11+8+8+12+10+6)/113*100. With a p-value of 0.00668 we significantly improve our probability of guessing a man's name correctly, after considering the name of the man's partner. Same for guessing a woman's name, only now you have to use Lambda-A. Lambda is never negative, and the significance test is always single sided, because information on the inside of the table can never make the prediction worse than knowing only the marginal.
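The worked example can be checked with a short sketch. It computes only the Lambda point estimate, not the p-value; the function name `gk_lambda` and its `predict` argument are hypothetical, and the table is the weighted table from the Example section (rows john, peter, steve; columns linda, liz, mary).

```python
def gk_lambda(table, predict="row"):
    """Goodman and Kruskal's Lambda for predicting the row or column label (hypothetical helper)."""
    if predict == "column":
        table = [list(col) for col in zip(*table)]  # transpose so the target is on the rows
    n = sum(sum(row) for row in table)
    # Errors when always guessing the largest row marginal.
    errors_marginal = n - max(sum(row) for row in table)
    # Errors when guessing the largest cell within each known column.
    errors_inside = n - sum(max(col) for col in zip(*table))
    # Undefined (division by zero) if all cases fall in a single category of the target.
    return (errors_marginal - errors_inside) / errors_marginal

weights = [[10, 23, 8], [6, 11, 21], [14, 8, 12]]
print(round(gk_lambda(weights, "row"), 3))     # man's name from partner's name
print(round(gk_lambda(weights, "column"), 3))  # partner's name from man's name
```

For the man's name the errors drop from 72 to 55 out of 113 cases, so Lambda is 17/72, the 23.6% quoted above.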
Cohen's Kappa is a measure of agreement and takes on the value zero when there are no more observations on the diagonal of an agreement table than can be expected on the basis of chance. Kappa takes on the value 1 if there is perfect agreement, i.e. if all observations are on the diagonal and a row score is perfectly predictive of a column score. Kappa values lower than 0.4 are generally considered to represent poor agreement between the row and column variables, values between 0.4 and 0.75 fair to good agreement, and values higher than 0.75 excellent agreement. Kappa only works in square tables.
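A minimal sketch of the Kappa point estimate, using the standard definition (observed agreement minus chance agreement, divided by one minus chance agreement). The function name and the 2x2 agreement table are hypothetical.

```python
def cohens_kappa(table):
    """Cohen's Kappa for a square agreement table of counts (hypothetical helper)."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("Kappa needs a square table")
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(k)) / n                   # share on the diagonal
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / n**2  # diagonal share by chance
    return (observed - expected) / (1 - expected)

print(cohens_kappa([[20, 5], [10, 15]]))
```

In this example 70% of the cases are on the diagonal against 50% expected by chance, so Kappa sits right at the 0.4 boundary mentioned above.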
Bowker Chi-square tests to see if there is a difference in the scoring pattern between the upper and the lower triangle (excluding the diagonal) of a table. Each cell in the upper triangle is compared with its mirror in the lower triangle: the squared difference between the two cells is divided by their sum, and these terms are summed. If cell i,j equals cell j,i the contribution of this comparison to the Bowker Chi-square is zero. If the Bowker Chi-square is statistically significant, the pattern of scoring in the upper triangle differs from the scoring in the lower triangle beyond chance. Note that the difference in scoring pattern between the two triangles depends on two factors. First, whether there is a 'true' difference in the pattern of scoring. Second, the level of marginal heterogeneity. Marginal heterogeneity means that the marginals are different; this increases the Bowker Chi-square. The Bowker Chi-square is the same as the McNemar Chi-square in a two by two table. Bowker Chi-square only works in square tables.
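The comparison of mirrored cells can be sketched as follows. The function name and the example tables are hypothetical, and skipping mirror pairs that are both zero (and not counting them in the degrees of freedom) is an assumption; implementations differ on that point.

```python
def bowker_chi2(table):
    """Bowker Chi-square and degrees of freedom for a square table (hypothetical helper)."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("Bowker needs a square table")
    chi2, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            upper, lower = table[i][j], table[j][i]
            if upper + lower == 0:
                continue  # assumption: empty mirror pairs contribute nothing and no df
            chi2 += (upper - lower) ** 2 / (upper + lower)
            df += 1
    return chi2, df

print(bowker_chi2([[10, 4, 2], [8, 12, 3], [2, 9, 15]]))
print(bowker_chi2([[10, 5], [15, 20]]))  # 2x2 case: the McNemar Chi-square
```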
Options.
For Read weights a third column is added in the data input field, and the third value is the case weight of the previous two values. The case weights in the third column must be numerical; if not, the case, including its previous two values, is ignored. Weighted cross tables are produced and a weighting corrected Chi-square is presented. For a discussion of data weighting and the correction applied please read this paper.
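The weight-reading rule can be illustrated with a small sketch. It assumes one case per line with whitespace-separated fields, which is narrower than the separators the program accepts, and the function name is hypothetical. The weighting correction to the Chi-square is not reproduced here.

```python
from collections import defaultdict

def weighted_crosstab(text):
    """Sum case weights per (row label, column label) pair (hypothetical helper)."""
    table = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # assumption: lines without exactly three fields are skipped
        row, col, weight = fields
        try:
            w = float(weight)
        except ValueError:
            continue  # non-numerical weight: the whole case is ignored
        table[(row, col)] += w
    return dict(table)

print(weighted_crosstab("john liz 23\njohn liz 2\npeter mary x\n"))
```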
Lowercase All. Lowercase all non-numerical text characters for both the table rows and columns. Use this option if you want text data to be categorized case-insensitively.
Transpose/Turn Table. Change the rows into columns and the columns into rows.
Sort Descending. Sorts the values descending. Separate for rows and columns.
Show Rows or Columns limits the number of rows or columns displayed. Particularly relevant if you request a large table. Can also be used to exclude particularly high or low (after "Sort Descending") (missing) values from the analysis.
Solve problems into 99999.9. Change the data sequence -carriage return-line feed-tab- and the sequence -tab-carriage return-line feed- into 99999.9 if labels, or delete the case if weights. Will mostly solve the problem of system missing values in data copied and pasted from SPSS. Might cause other problems.
Example.
If you copy and paste the following data into the input field:
john mary
peter linda
john liz
steve mary
john linda
steve mary
steve linda
You get the following table:
Table of Counts
      | linda | liz | mary | Sum |
john  |   1   |  1  |  1   |  3  |
peter |   1   |  0  |  0   |  1  |
steve |   1   |  0  |  2   |  3  |
Sum   |   3   |  1  |  3   |  7  |
Pearson: 3.111 (p= 0.53941). There is no statistically significant relationship between boys' names and girls' names, although this conclusion has to be viewed with care as the table is based on very few observations.
You could count (in a flat table) how often each of the pairs of names occurs in a sample, and weigh each of the pairs with these counts.
john linda 10
john liz 23
john mary 8
peter linda 6
peter liz 11
peter mary 21
steve linda 14
steve liz 8
steve mary 12
And you get the following table:
Table of Weights
      | linda | liz | mary | Sum |
john  |  10   | 23  |  8   | 41  |
peter |   6   | 11  |  21  | 38  |
steve |  14   |  8  |  12  | 34  |
Sum   |  30   | 42  |  41  | 113 |
Weighted Pearson: 17.77 (p= 0.00137). After considering how often pairs of names occur in a sample, there is a highly significant relation between certain boys' and certain girls' names.
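The two Pearson values in this example can be reproduced with a short sketch. It uses the ordinary Pearson Chi-square on the counts and on the weights, without any weighting correction, and the function names are hypothetical. The p-value helper relies on the closed form of the Chi-square tail that holds only for an even number of degrees of freedom, which is enough for a 3x3 table (4 degrees of freedom).

```python
import math

def pearson_chi2(table):
    """Pearson Chi-square and degrees of freedom for a table of counts (hypothetical helper)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(row_sums) - 1) * (len(col_sums) - 1)

def chi2_p_even_df(x, df):
    """Upper-tail p-value of the Chi-square distribution; valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here needs a positive, even df")
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return math.exp(-x / 2) * total

for table in ([[1, 1, 1], [1, 0, 0], [1, 0, 2]], [[10, 23, 8], [6, 11, 21], [14, 8, 12]]):
    chi2, df = pearson_chi2(table)
    print(round(chi2, 3), df, round(chi2_p_even_df(chi2, df), 5))
```

The first table should give the 3.111 (p about 0.539) reported for the counts, and the second the 17.77 (p about 0.0014) reported for the weights.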
Limitation.
The formatting and tabulating of large data sets might take a while, in which case there might be warnings; just select "continue" and in the end the computer will get there.
The procedure is meant for relatively small tables. The number of cells is in principle limited to 120, but the limit might be lower depending on your browser and other settings. It is also lower with weighted data, as more information has to be transferred.
All software and text copyright by SISA