Summer 1990 // Volume 28 // Number 2 // Tools of the Trade // 2TOT2


Controlling Error in Evaluation Instruments


Emmalou Van Tilburg Norland
Assistant Professor
Department of Agricultural Education and Leader, Evaluation
Ohio Cooperative Extension Service
Ohio State University-Columbus

Probably the most difficult and time-consuming task in designing and conducting any program evaluation is developing the instrument to collect the information. Faculty usually search first for existing instruments that may "fit the bill," but often such instruments don't exist or need major revisions before they can be used. As we develop a data collection instrument, our most important concerns are that it be reliable and valid.

Reliability and validity both indicate the extent to which error is present in the instrument. Reliability indicates the precision of the instrument: does it consistently measure whatever it measures? A reliable instrument controls for random error in the measure. A valid instrument, on the other hand, controls for systematic error in the measure. Good validity assures us that we're measuring what we planned to measure.


Low reliability indicates that the score produced by the instrument, which represents the characteristic being measured (attitude, knowledge, reactions), may fluctuate greatly if we were to use the instrument again with the same group of individuals. Numbers representing characteristics aren't useful if they change each time we measure. For example, if an individual scores 50 out of 100 on a pesticide use knowledge test, then takes the test again and scores 98, the instrument used to collect that information would be unreliable. We'd have difficulty determining whether the individual's true score (actual knowledge of pesticide use) was high (98) or low (50).

Several methods are used to determine the reliability of an instrument. All involve administering the instrument to a small sample of respondents during a pilot test. A common procedure that uses two administrations is the test/re-test procedure: the instrument is given to the same group of individuals twice (about one week apart), and the two sets of scores are correlated, resulting in a coefficient of stability. A correlation above .7 would indicate acceptable reliability. Other techniques for estimating reliability are available (for example, Cronbach's Alpha, which requires only a single administration).
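As an illustration only, both coefficients named above can be computed from pilot-test data with a few lines of Python. This sketch assumes scores are stored as plain lists, with one entry per respondent in the same order for both administrations. The function names and sample scores are hypothetical and do not come from any standard package. `coefficient_of_stability` computes the Pearson correlation between the test and re-test scores. `cronbach_alpha` applies the usual formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), to a single administration of a k-item instrument.

```python
import math
from statistics import mean, variance


def coefficient_of_stability(first, second):
    """Pearson correlation between test and re-test scores.

    `first` and `second` are hypothetical lists of total scores for the
    same respondents, listed in the same order for both administrations.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    m1, m2 = mean(first), mean(second)
    # Sum of cross-products of deviations from each administration's mean
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    if ss1 == 0 or ss2 == 0:
        raise ValueError("scores must vary within each administration")
    return cov / math.sqrt(ss1 * ss2)


def cronbach_alpha(rows):
    """Cronbach's alpha from one administration.

    `rows` is a hypothetical list of respondents, each a list of k item
    scores. Sample variances are used throughout; the ratio is the same
    as with population variances as long as one kind is used consistently.
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores must vary across respondents")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical pilot-test data for five respondents
week1 = [50, 62, 71, 80, 90]
week2 = [55, 60, 75, 78, 92]
r = coefficient_of_stability(week1, week2)
print(f"coefficient of stability: {r:.2f}, acceptable: {r > 0.7}")

items = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```

The .7 cutoff in the usage line comes from the text above. In practice, a statistics package would normally be used, but writing the formulas out shows what each coefficient compares: two administrations for stability, and items within one administration for alpha.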


Validity addresses the amount of systematic, or "built-in," error contained in the measure. Here's an example of an instrument that would collect invalid information: a mail questionnaire, written at a college reading level, that is designed to evaluate a nutrition program for low-income families. Most likely, the instrument isn't measuring only knowledge of nutrition. It is also systematically measuring reading level. A thorough check for difficult words and unclear questions during instrument development should help reduce this systematic error.

This thorough check, by a panel of experts (both experts in the subject matter and experts in measurement), addresses the issue of content validity. All instruments should be reviewed by such a panel to determine whether the content of the instrument is appropriate. Are all the questions related to the focus of the instrument? Are there questions missing? Are there inappropriate questions?

Another way to check content validity is a field test: administer the instrument to a small sample of respondents to gain their reactions to the questions. Are the questions too difficult? Do the individuals in the field test interpret the questions the way the designer intended?

An additional validity concern is face validity. The question face validity addresses is: "Does the instrument look like it's measuring what it claims to be measuring?" Face validity is an important issue for Extension faculty collecting information from clientele. Poor face validity can jeopardize results when individuals completing the instrument question its "true purpose." Determine face validity by asking a sample of respondents (perhaps during the field test) to comment on the instrument.

Final Points

There's little use in collecting information full of error. Determine the validity of the instrument before its reliability. The members of the field test and the pilot test can be the same, but the two tests are conducted at different points in time (the field test first). The discussion of reliability and validity should appear in the instrument development section of any report.