Weighting


Purpose of the program
Installing the program
How to input data
What you find in the output field
Options
Warranty
License
Glossary
Questions and answers
References
Download

Purpose. This program calculates sample weights according to the cell weighting procedure, for use in the analysis of survey sample data. Two sets of weights are presented. The first is a set of larger weights Wi, which give the number of people in the population that each respondent in the i-th stratum represents. The second is a set of smaller weights wi, which are centered around the value one ('1') and represent the weight each respondent carries as a weighted respondent. The larger weights multiplied by the sample size of each stratum add up to the population size; the smaller weights multiplied by the sample size of each stratum add up to the sample size. The program also gives the scale factor of the small weights relative to the large weights: multiplying a small weight wi by this factor gives the large weight Wi for the stratum concerned. The design effect, the design factor and the effective sample size for the resulting set of weights are also determined. It is possible to specify a value above which extreme weights will be trimmed; this value applies to the smaller weights. The untrimmed weights are then recalculated so that the larger weights still represent population numbers and the smaller weights sample numbers.

Installing the program. Double-click the program name on the SISA or Quantitative Skills website, select save, and save the file to a directory of your choice on your computer. Run the program by double-clicking the program name on your computer. You can make a shortcut to the program by right-clicking the program name on your computer and creating the shortcut; then right-click the shortcut and drag, or copy and paste, it to another location. This program does not have an installation program; what you download is the program itself. Remove the program by right-clicking it and deleting it. The program does not have an uninstallation program, nor is one required, as no additional files are installed on your computer. What you see is what you get and no more.

Input. The program expects a column of population sizes followed by a column of sample sizes, one row for each stratum. The data can be typed directly into the input field or pasted into it from an external source.
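
The sketch below shows how two-column input of this kind could be read, assuming the two numbers on each row are separated by whitespace or a tab (as when cells are pasted from a spreadsheet). The function name parse_strata is hypothetical; this is an illustration, not the program's own code.

```python
def parse_strata(text):
    """Parse pasted input: one stratum per row, population then sample size.

    Assumes whitespace- or tab-separated numbers; blank rows are skipped.
    """
    strata = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip empty rows
        if len(fields) != 2:
            raise ValueError(f"row {line_no}: expected 2 numbers, got {len(fields)}")
        population, sample = float(fields[0]), float(fields[1])
        # A stratum needs at least one respondent and cannot sample more than it holds.
        if not 0 < sample <= population:
            raise ValueError(f"row {line_no}: need 0 < sample <= population")
        strata.append((population, sample))
    return strata


pasted = "34705\t165\n25659\t560\n21773\t557\n27967\t205\n22471\t185\n29318\t193\n"
strata = parse_strata(pasted)
print(len(strata), strata[0])  # 6 (34705.0, 165.0)
```

The check that the sample does not exceed the population guards against the two columns being pasted in the wrong order.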

The table below concerns 6 strata determined by two dimensions: gender with two categories and age with three categories. If you copy the population and sample columns (the yellow cells) from this table, paste them into the weighting program's input field and press calculate, the result is a set of weights with a design effect of 1.4463 and an effective n of 1550.2. One of the small weights has a value above 2.

gender   age      population   sample
male     young    34705        165
female   young    25659        560
male     middle   21773        557
female   middle   27967        205
male     old      22471        185
female   old      29318        193

Additionally, it is possible to give a value above which extreme weights will be trimmed. If the data from the table above are used and the weight of 2.423 in the first stratum is trimmed to 2, the resulting set of weights has a design effect of 1.3468 and an effective n of 1607.1. Similarly, small weights can be raised by giving a value in the "raise at" box.
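
This help text does not spell out how the untrimmed weights are recalculated. The Python sketch below assumes one rule that reproduces the design effect quoted above: the weight removed from the trimmed stratum is spread equally over all respondents in the untrimmed strata, so that Σ wi*ni still equals the sample size. (Rescaling the untrimmed weights proportionally instead gives a design effect of about 1.371 for this table.) The function names are hypothetical and this is not the program's own code.

```python
def small_weights(strata):
    """Small weights wi = Pi/pi for a list of (population, sample) pairs."""
    N = sum(pop for pop, _ in strata)
    n = sum(smp for _, smp in strata)
    return [(pop / N) / (smp / n) for pop, smp in strata]


def design_effect(weights, sizes):
    """Design effect due to weighting: n * sum(ni*wi^2) / (sum(ni*wi))^2."""
    n = sum(sizes)
    s1 = sum(ni * wi for ni, wi in zip(sizes, weights))
    s2 = sum(ni * wi * wi for ni, wi in zip(sizes, weights))
    return n * s2 / s1 ** 2


def trim_weights(weights, sizes, cap):
    """Trim weights above `cap` and redistribute the removed weight.

    ASSUMED rule: the removed weight is spread equally over all respondents
    in the untrimmed strata, so sum(ni*wi) stays equal to n. Repeats in case
    the redistribution pushes another weight above the cap.
    """
    w = list(weights)
    trimmed = [False] * len(w)
    while True:
        over = [i for i, wi in enumerate(w) if not trimmed[i] and wi > cap]
        if not over:
            return w
        excess = sum(sizes[i] * (w[i] - cap) for i in over)
        for i in over:
            w[i] = cap
            trimmed[i] = True
        free = [i for i in range(len(w)) if not trimmed[i]]
        n_free = sum(sizes[i] for i in free)
        if n_free == 0:
            raise ValueError("cap too low: no untrimmed strata left")
        for i in free:
            w[i] += excess / n_free


strata = [(34705, 165), (25659, 560), (21773, 557),
          (27967, 205), (22471, 185), (29318, 193)]
sizes = [smp for _, smp in strata]
w = small_weights(strata)
w_trim = trim_weights(w, sizes, cap=2.0)
print(round(w[0], 3), round(design_effect(w, sizes), 4))  # 2.423 1.4463
print(round(design_effect(w_trim, sizes), 4))             # 1.3468
```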

Output. The program outputs: the total population size and the total sample size; the population size and the sample size for each stratum, as given by the user; for each stratum, the larger weight Wi, which is the number of people in the population each respondent represents; for each stratum, the smaller weight wi, which is the number of weighted respondents each unweighted respondent represents; the scale factor, which is the factor by which you have to multiply the smaller weights to obtain the larger weights; the design effect, which is the factor by which the variance of the sample will approximately increase because of the weighting; the design factor, which is the factor by which the standard error of the sample will approximately increase because of the weighting; and the effective sample size, which is the reduced n after weighting.

Options. There are two options. The split option splits the data after every n-th stratum, with n specified by the user. For this option to work, the user needs to specify an integer value at which to split the data that is larger than one and smaller than the total number of strata. There are several reasons to split a file. One is that one may want to calculate the small weights wi or the design effect separately for subgroups in the sample. For example, the resulting design effect values can be used in SISA's t-test procedure to test differences between groups while taking the different designs in each group into account. Normally the large weights Wi do not change when the data are split into subgroups. However, if one trims or raises weights, the large weights Wi will differ somewhat between the split and the unsplit analyses, as the weights changed within subgroups can be different from the weights changed in the sample overall.
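
A minimal Python sketch of what the split option does, assuming the split value (called k here) means consecutive groups of k strata in input order. With the rows ordered as in the example table (male and female within each age group), k = 2 yields the three age groups. The function names are hypothetical. The sketch shows that the large weights Wi come out the same whether computed per group or overall, while the small weights wi and the design effect change because each group has its own scale factor.

```python
def split_strata(strata, k):
    """Split the list of strata after every k-th stratum (1 < k < number of strata)."""
    if not 1 < k < len(strata):
        raise ValueError("k must be larger than 1 and smaller than the number of strata")
    return [strata[i:i + k] for i in range(0, len(strata), k)]


def weights(group):
    """Large weights Wi = Ni/ni and small weights wi = Wi/f for one group."""
    N = sum(pop for pop, _ in group)
    n = sum(smp for _, smp in group)
    f = N / n  # scale factor of this group
    large = [pop / smp for pop, smp in group]
    return large, [W / f for W in large]


def design_effect(small, sizes):
    """n * sum(ni*wi^2) / (sum(ni*wi))^2 within one group."""
    n = sum(sizes)
    s1 = sum(ni * wi for ni, wi in zip(sizes, small))
    s2 = sum(ni * wi * wi for ni, wi in zip(sizes, small))
    return n * s2 / s1 ** 2


# Rows in the order of the example table: male/female within young, middle, old.
strata = [(34705, 165), (25659, 560), (21773, 557),
          (27967, 205), (22471, 185), (29318, 193)]
overall_large, overall_small = weights(strata)

for group in split_strata(strata, 2):  # k = 2 gives the three age groups
    large, small = weights(group)
    sizes = [smp for _, smp in group]
    print([round(x, 3) for x in small], round(design_effect(small, sizes), 4))
```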

The decimals option sets the number of decimals used for numbers in the output.

Warranty. Although this program has been tested extensively, no program is ever bug- or error-free, and you should always check your results carefully. This software is provided "as is" and without warranties as to performance or merchantability or fitness for a particular purpose. The entire risk arising out of use or performance of the software remains with you. This program does not install any files on your computer or change settings without your permission.

License. This is free software. It can be freely distributed and installed. This program is provided to you by Quantitative Skills Research and Statistical Consultancy. Copyright: Quantitative Skills and Daan Uitenbroek PhD, 2008.

Glossary

Small “standardized” weights wi center around 1 and differ between strata. They give the multiplication factor by which groups of respondents are “weighted”. The weights wi are calculated by dividing the population proportion Pi of each stratum by the sample proportion pi of the same stratum, thus wi=Pi/pi. The sum over all strata of wi times the sample size ni of the stratum concerned equals the total sample size, thus: Σ wi*ni=n+, where lowercase n denotes the sample.
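
For the example table from the Input section, the small weights can be computed as follows. This is a minimal Python sketch with a hypothetical function name; it reproduces the weight of 2.423 for the first stratum and checks that Σ wi*ni equals the total sample size of 1865.

```python
def small_weights(strata):
    """wi = Pi/pi: population proportion over sample proportion per stratum."""
    N = sum(pop for pop, _ in strata)  # N+, total population
    n = sum(smp for _, smp in strata)  # n+, total sample
    return [(pop / N) / (smp / n) for pop, smp in strata]


strata = [(34705, 165), (25659, 560), (21773, 557),
          (27967, 205), (22471, 185), (29318, 193)]
w = small_weights(strata)
total = sum(wi * smp for wi, (_, smp) in zip(w, strata))
print(round(w[0], 3))   # 2.423
print(round(total, 6))  # 1865.0, the total sample size n+
```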

Large “unstandardized” weights Wi are never smaller than one and differ between strata. They give the number of people in the population each respondent represents. The weights Wi are calculated by dividing the population number of each stratum by the sample number of the same stratum, thus Wi=Ni/ni. The sum over all strata of Wi times the sample size ni of the stratum concerned equals the total population size, thus: Σ Wi*ni=N+, where capital N denotes the population.
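
A minimal Python sketch of the large weights for the same example table (the function name is hypothetical). It checks that Σ Wi*ni adds up to the total population of 161893.

```python
def large_weights(strata):
    """Wi = Ni/ni: population members represented by each respondent."""
    return [pop / smp for pop, smp in strata]


strata = [(34705, 165), (25659, 560), (21773, 557),
          (27967, 205), (22471, 185), (29318, 193)]
W = large_weights(strata)
total = sum(Wi * smp for Wi, (_, smp) in zip(W, strata))
print(round(W[0], 1))  # 210.3
print(round(total))    # 161893, the total population size N+
```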

Scale factor f is the factor by which one has to multiply the smaller weights to obtain the larger weights, Wi=f*wi. f is simply calculated by dividing the total population size by the total sample size, thus f=N+/n+.
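
Why Wi=f*wi holds follows directly from the two definitions above, writing the proportions as Pi=Ni/N+ and pi=ni/n+:

```latex
W_i = \frac{N_i}{n_i}
    = \frac{N_+}{n_+}\cdot\frac{N_i/N_+}{n_i/n_+}
    = f\,\frac{P_i}{p_i}
    = f\,w_i,
\qquad f = \frac{N_+}{n_+}
```

For the example table, f = 161893/1865 ≈ 86.806, and W1 = f*w1 ≈ 86.806 × 2.4230 ≈ 210.33.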

Trimming weights. The purpose of trimming weights, that is, specifying a maximum value the weights may have, is to address the problem that a few cases sometimes become disproportionately important in estimating overall statistics because they have a large sample weight. These large weights also make a large contribution to the design effect and correspondingly to the true sample variance. Trimming extreme weights reduces the design effect and the variance of estimates, so estimates can be determined more precisely. However, trimming weights introduces bias: estimates are no longer centered around their true values but might be a bit off. The level of bias depends on the correlation between the values of the weights and the estimated variables. To determine whether trimming weights is worth the bias, one can study whether trimming lowers the Mean Square Error of estimates (Potter, 1988). That hardly ever proves to be the case, and trimming weights is mostly not advised. It must further be noted that the larger the sample becomes, the smaller the rationale for trimming weights, although it is impossible to give an exact rule on this. After trimming a weight, the other weights have to be recalculated to ensure that Σ wi*ni=n+ and Σ Wi*ni=N+ continue to hold. The program only allows you to trim standardized weights wi (so not Wi) which have a value larger than 1, so the input must be a value above one, which may be a decimal number.
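
The Mean Square Error criterion rests on the standard decomposition of the MSE of an estimator into variance and squared bias:

```latex
\operatorname{MSE}(\hat\theta)
  = E\big[(\hat\theta-\theta)^2\big]
  = \operatorname{Var}(\hat\theta) + \big[\operatorname{Bias}(\hat\theta)\big]^2,
\qquad \operatorname{Bias}(\hat\theta) = E(\hat\theta)-\theta
```

Trimming pays off only when the reduction in variance it brings is larger than the squared bias it introduces.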

Raising weights. Raising weights is the opposite of trimming weights and concerns the specification of a minimum value for the weights. It addresses the issue of a larger number of people becoming disproportionately unimportant in estimating overall statistics because they have a small sample weight. The rationale for specifying a minimum value for weights is not as clear-cut as for trimming weights; however, raising weights will reduce the design effect. The program only allows you to raise weights wi (so not Wi) which have a value smaller than 1; the input must be a value between zero and one.
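
The redistribution rule used when raising weights is not documented here. The Python sketch below assumes, as a mirror image of trimming, that the weight added to the raised strata is taken equally from all respondents in the other strata, so that Σ wi*ni still equals the sample size. All names are hypothetical. For the example table, raising at 0.5 only affects the third stratum and lowers the design effect from about 1.446 to about 1.424.

```python
def small_weights(strata):
    """wi = Pi/pi for a list of (population, sample) pairs."""
    N = sum(pop for pop, _ in strata)
    n = sum(smp for _, smp in strata)
    return [(pop / N) / (smp / n) for pop, smp in strata]


def design_effect(weights, sizes):
    """n * sum(ni*wi^2) / (sum(ni*wi))^2."""
    n = sum(sizes)
    s1 = sum(ni * wi for ni, wi in zip(sizes, weights))
    s2 = sum(ni * wi * wi for ni, wi in zip(sizes, weights))
    return n * s2 / s1 ** 2


def raise_weights(weights, sizes, floor):
    """Raise weights below `floor` (0 < floor < 1) to `floor`.

    ASSUMED rule: the weight added is taken equally from all respondents in
    the strata that were not raised, so sum(ni*wi) stays equal to n. Repeats
    in case that pushes another weight below the floor.
    """
    if not 0 < floor < 1:
        raise ValueError("floor must lie between zero and one")
    w = list(weights)
    raised = [False] * len(w)
    while True:
        under = [i for i, wi in enumerate(w) if not raised[i] and wi < floor]
        if not under:
            return w
        deficit = sum(sizes[i] * (floor - w[i]) for i in under)
        for i in under:
            w[i] = floor
            raised[i] = True
        free = [i for i in range(len(w)) if not raised[i]]
        n_free = sum(sizes[i] for i in free)
        if n_free == 0:
            raise ValueError("no strata left to give up weight")
        for i in free:
            w[i] -= deficit / n_free


strata = [(34705, 165), (25659, 560), (21773, 557),
          (27967, 205), (22471, 185), (29318, 193)]
sizes = [smp for _, smp in strata]
w = small_weights(strata)
w_raised = raise_weights(w, sizes, floor=0.5)
print(round(design_effect(w, sizes), 3), round(design_effect(w_raised, sizes), 3))  # 1.446 1.424
```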

Design effect (DEFF) as presented in this program is the factor by which the variance of an estimated mean increases after weighting the data with the suggested weights. The design effect in this program is calculated according to formula 4.2 as suggested by Kish (1992).
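
Kish's paper is not reproduced here, so which of his expressions is numbered 4.2 is not confirmed on this page. The Python sketch below uses the common form of the design effect due to weighting, DEFF = n+ Σ ni wi² / (Σ ni wi)², which equals 1 + L with L the relative variance (squared coefficient of variation) of the weights over all respondents. For the example table both forms reproduce the design effect of 1.4463 quoted in the Input section. Function names are hypothetical.

```python
def deff_kish(small, sizes):
    """DEFF = n+ * sum(ni * wi^2) / (sum(ni * wi))^2."""
    n = sum(sizes)
    s1 = sum(ni * wi for ni, wi in zip(sizes, small))
    s2 = sum(ni * wi ** 2 for ni, wi in zip(sizes, small))
    return n * s2 / s1 ** 2


def deff_one_plus_l(small, sizes):
    """DEFF = 1 + L, L being the relative variance (squared coefficient of
    variation) of the weights over all n+ respondents."""
    n = sum(sizes)
    mean = sum(ni * wi for ni, wi in zip(sizes, small)) / n
    var = sum(ni * (wi - mean) ** 2 for ni, wi in zip(sizes, small)) / n
    return 1 + var / mean ** 2


strata = [(34705, 165), (25659, 560), (21773, 557),
          (27967, 205), (22471, 185), (29318, 193)]
N = sum(pop for pop, _ in strata)
sizes = [smp for _, smp in strata]
n = sum(sizes)
small = [(pop / N) / (smp / n) for pop, smp in strata]
print(round(deff_kish(small, sizes), 4), round(deff_one_plus_l(small, sizes), 4))  # 1.4463 1.4463
```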

The design factor (DEFFT) is the factor by which the standard error of an estimated mean increases after weighting the data with the suggested weights. The design factor minus one, times 100 ((DEFFT-1)*100), is the percentage by which a confidence interval around a mean increases due to the weighting. You can study the effect of the design factor on the confidence interval around a mean by using SISA's One Mean procedure. The design factor is the square root of the design effect, DEFFT=√DEFF.
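
A small Python sketch of the relations in this paragraph. The helper weighted_ci is hypothetical and uses a normal-approximation interval (z = 1.96); it is not SISA's One Mean procedure. For the example design effect of 1.4463, the design factor is about 1.2026, so a confidence interval around a mean becomes about 20.3% wider.

```python
import math


def design_factor(deff):
    """DEFFT = sqrt(DEFF): factor by which the standard error increases."""
    return math.sqrt(deff)


def ci_increase_percent(deff):
    """(DEFFT - 1) * 100: percentage by which a CI around a mean widens."""
    return (design_factor(deff) - 1) * 100


def weighted_ci(mean, se_unweighted, deff, z=1.96):
    """Approximate CI for a mean, inflating the unweighted SE by DEFFT."""
    half_width = z * se_unweighted * design_factor(deff)
    return mean - half_width, mean + half_width


deff = 1.4463  # design effect of the example data
print(round(design_factor(deff), 4))        # 1.2026
print(round(ci_increase_percent(deff), 1))  # 20.3
```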

The effective n^ is an estimate of the n after taking into account the extra variance caused by the weighting. The effective n^ is the unweighted sample n divided by the design effect, n^=n/DEFF.
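
Written out with the design effect in the form n+ Σ ni wi² / (Σ ni wi)² (assumed here to be the form the program uses), the effective n reduces to Kish's familiar expression in the weights alone; the last step uses Σ ni wi = n+:

```latex
\hat n = \frac{n_+}{\mathrm{DEFF}},
\qquad \mathrm{DEFF} = \frac{n_+ \sum_i n_i w_i^2}{\big(\sum_i n_i w_i\big)^2}
\quad\Longrightarrow\quad
\hat n = \frac{\big(\sum_i n_i w_i\big)^2}{\sum_i n_i w_i^2}
       = \frac{n_+^2}{\sum_i n_i w_i^2}
```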

Cell weighting. In cell weighting, all categories of each weighting variable and all crossings between these categories are incorporated in the weighting. Thus if one has three main variables on which to weight, with c, r and t categories respectively, then the number of cells 'C' and weights 'W' to consider equals C=W=c*r*t. An advantage of cell weighting is that it is relatively easy to apply and gives unbiased results. Disadvantages of cell weighting are that it requires relatively large samples, that the weighting variables should not be too skewed, and that the weights tend to have a large variance. Alternatives to cell weighting are raking and various regression methods (Kalton & Flores-Cervantes, 2003). A simple example of raking and some more discussion can be found on the SISA website.
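
The crossing of categories can be illustrated with Python's itertools.product. The sketch builds the six cells of the example table (gender by age, in the table's row order) and then counts the cells for a hypothetical third variable with 4 categories, giving C = 2*3*4 = 24 cells and as many weights.

```python
from itertools import product
from math import prod

gender = ["male", "female"]
age = ["young", "middle", "old"]

# Every crossing of categories is one cell, and each cell gets its own weight.
cells = list(product(age, gender))
print(len(cells))  # 6
print(cells[0])    # ('young', 'male')

# Hypothetical third weighting variable with 4 categories.
third_variable_categories = 4
print(prod([len(gender), len(age), third_variable_categories]))  # 24
```

Each added variable multiplies the number of cells, which is why cell weighting quickly requires large samples.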

Q&A. Questions and answers about survey data weighting can be found here.

References

Cochran WG. Sampling Techniques, 3rd edition. John Wiley, 1977.

Kalton G, Flores-Cervantes I. Weighting Methods. J Off Stat 2003;19:81-97. (http://www.jos.nu/Articles/abstract.asp?article=192081)

Kish L. Confidence intervals for clustered samples. Am Sociol Rev 1957;22:154-165.

Kish L. Weighting for Unequal Pi. J Off Stat 1992;8:183-200. (http://www.jos.nu/Articles/abstract.asp?article=82183)

Kish L. Methods for Design Effects. J Off Stat 1995;11:55-77. (http://www.jos.nu/Articles/abstract.asp?article=11155)

Potter F. A survey of procedures to control extreme sampling weights. Proceedings of the Survey Research Methods Section, Am Stat Assoc 1988. (http://www.amstat.org/Sections/Srms/Proceedings/papers/1988_083.pdf)

Potter F. A study of procedures to identify and trim extreme sample weights. Proceedings of the Survey Research Methods Section, Am Stat Assoc 1990:225-230. (http://www.amstat.org/Sections/Srms/Proceedings/papers/1990_034.pdf)

Sturgis P. Analysing Complex Survey Data: Clustering, Stratification and Weights. 2004. (http://sru.soc.surrey.ac.uk/SRU43.html)

Uitenbroek DG. Design, data weighing and designeffects in Dutch regional health surveys. 2008. (http://www.quantitativeskills.com/sisa/papers/paper7.htm)

Download

Download the program here by double clicking this link and saving the program to a directory of your choice.
