December 2004 // Volume 42 // Number 6 // Feature Articles // 6FEA1


Smith Lever 3(d) Extension Evaluation and Outcome Reporting--A Scorecard to Assist Federal Program Leaders

Abstract
The Government Performance and Results Act requires that federal agencies and programs set goals and measure outcomes (USGAO, 1996); however, program managers find it difficult to make the transition from measuring program outputs to developing outcome-related measures (USGAO, 1997). The Hoffman EEOR Scorecard was developed to help federal Smith Lever 3(d) program leaders with this problem by blending the LOGIC evaluation model with Extension evaluation and outcome reporting (EEOR) ideal practices. The utility of this question-based scorecard for all Smith Lever 3(d) programs is exemplified through its use with the CSREES Extension Integrated Pest Management Implementation Program.


Bill Hoffman
Doctoral Candidate
George Washington University
Washington, DC
hoffy@gwu.edu

Barbara Grabowski
Associate Professor of Education
Penn State University
University Park, Pennsylvania
bgrabowski@psu.edu


The Government Performance and Results Act (GPRA) requires that all federal agencies and programs set goals and measure outcomes (USGAO, 1996). Goals that are the product of national leadership and stakeholder input help to clearly articulate program priorities and prevent mission creep. Measuring program outcomes can quantify productivity, determine the efficiency and effectiveness of processes used, and highlight the usefulness of programs in terms of accomplishment of program goals.

For many program managers, the most difficult aspect of GPRA implementation is the transition from measuring program outputs to developing outcome-related program measures (USGAO, 1997). The United States Department of Agriculture's Cooperative State Research, Education and Extension Service (CSREES) is one of many agencies whose program managers have found this to be a challenging mandate.

CSREES administers funding for Extension programs that intend to help the citizenry put university research to practical use through various forms of educational programming (ECOP, 1997). Extension programming is one area where outcome measurement challenges have been documented (Nelson, 1999).

The Hoffman EEOR Scorecard of LOGIC model-based questions was developed to illuminate the utilization of Extension evaluation and outcome reporting (EEOR) ideal practices by Smith Lever 3(d) programs, a subset of CSREES Extension-funded programming efforts. This scorecard was developed from an extensive review of the Extension program literature within the context of GPRA (Hoffman, 2003). This article provides a brief overview of this research, including an example of its findings for one Smith Lever 3(d) program: Extension IPM Implementation. The lead author of this publication is professionally responsible for the state reporting function of that program.

Review of Current Literature

Current literature from evaluation, GPRA implementation guidance, and Extension evaluation contributed to the development of the scorecard.

Evaluation Background

A central concept in Extension program evaluation and the GPRA is the differentiation between outcomes and outputs. Outcomes refer to results of program objectives that are defined by the underlying purpose of the federal investment (Nelson, 1999). They include variables such as improvement in agricultural profitability, increases in agricultural systems efficiency, enhanced environmental quality, and decreases in farm worker injuries. Outputs refer to the activities or efforts of a program used to produce outcomes (Nelson, 1999). They include variables such as number of training sessions held, the number of participants trained, the number of publications developed, or the number of farms visited.

Change agents such as Extension educators achieve outcomes directly through programming outputs and indirectly through secondary interpersonal educational networks that exist within social systems (Rogers, 1995). This includes program participants sharing information with peers and clients, which has the potential to multiply the effects of Extension educational activity. For this reason, Extension programming can be expected to achieve outcomes that exceed those that directly result from programming outputs.

Output information can help to contextualize outcome data by helping to explain the program's role in achieving these outcomes. However, output information in the absence of outcome data does not illuminate program effectiveness, efficiency, or productivity toward reaching an educational program's objectives (USGAO, 1996).

GPRA Implementation Guidance

The United States General Accounting Office distinguishes between different types of outcomes. "Ultimate outcomes" are those that represent the achievement of the underlying purpose of the federal investment (USGAO, 1998). An example of an ultimate outcome is decreased surface water pollution caused by dairy farming operations. Outcomes that contribute or lead to this ultimate purpose are known as "intermediate outcomes." An example of an intermediate outcome that could lead to the aforementioned ultimate outcome is the adoption of environmentally friendly manure management practices by dairy farmers.

If research supports a strong connection between intermediate and ultimate outcomes, the measurement of intermediate outcomes alone can be used to satisfy GPRA requirements (USGAO, 1998). These are commonly referred to as "proxy measures."

Evaluation models currently used in the instructional systems and Extension education evaluation fields make similar distinctions between outcomes and outputs, as well as among different types of outcomes. Examination of the LOGIC model can help to clarify these distinctions and provide guidance for federal Extension evaluation and outcome reporting.

Extension Evaluation

The University of Wisconsin's LOGIC model is pictured in Figure 1 (UWCE, 2002). The model has at its roots Kirkpatrick's four-level and Bennett's seven-level evaluation models (Kirkpatrick, 1959; Bennett, 1975).

Figure 1.
University of Wisconsin's LOGIC model. (Retrieved from http://www.uwex.edu/ces/pdande/copyright.html and reprinted according to guidelines from the publisher)

In short, the LOGIC Model states that inputs lead to outputs, which are either activities or participation, and those outputs lead to outcomes and impact in the short, medium, and long term.

The model defines three outcome types: Learning, Action, and Conditions. Though measurements of learning through pre-tests and post-tests of participants can be considered an intermediate outcome, data that describe how this learning is transferred to action are much more valuable (Holton, 1996). Action outcomes include changes in behavior and adoption of practices that have resulted, in part, from the aforementioned learning. Action outcomes generally represent intermediate outcomes that may reveal progress toward ultimate outcomes. Condition outcomes are advancements in social, economic, civic, and environmental conditions that are generally analogous to the "ultimate outcomes" described earlier.

Non-outcome categories of the LOGIC model include Inputs, Activity Outputs, Participation Outputs, External Factors, and Assumptions. Inputs of resources are invested to support learning activities (Bennett, 1975). The LOGIC model overcomes Holton's (1996) criticism of Kirkpatrick's earlier work by acknowledging the role of external factors, which include new technologies and social pressures that can slow or accelerate practice adoption.

Finally, the LOGIC model acknowledges the importance of assumptions made by educators regarding how educational programming may influence outcomes. These assumptions include the mix of educational tactics and the proper audiences to target, which the educator perceives will provide the greatest impact within resource constraints. Though these non-outcome categories do not address outcomes themselves, they describe the process and strategy used by educators to achieve outcomes through input investment.
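
For readers who prefer a concrete representation, the LOGIC model's categories can be expressed as a simple data structure. The following is a minimal, illustrative Python sketch; the class and field names are our own inventions, not part of the LOGIC model or any CSREES system. It simply separates the three outcome types from the non-outcome categories discussed above.

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class OutcomeType(Enum):
    # The three outcome categories defined by the LOGIC model
    LEARNING = "learning"      # e.g., pre-test/post-test knowledge gains
    ACTION = "action"          # e.g., adoption of recommended practices (intermediate outcomes)
    CONDITION = "condition"    # e.g., improved social, economic, civic, or environmental conditions


@dataclass
class LogicModelProgram:
    # Non-outcome categories: these describe process and strategy, not results
    inputs: List[str] = field(default_factory=list)                 # resources invested
    activity_outputs: List[str] = field(default_factory=list)       # e.g., training sessions held
    participation_outputs: List[str] = field(default_factory=list)  # e.g., producers trained
    assumptions: List[str] = field(default_factory=list)            # educator's strategy assumptions
    external_factors: List[str] = field(default_factory=list)       # e.g., new technologies, social pressures
    # Outcome categories, keyed by the three LOGIC model outcome types
    outcomes: Dict[OutcomeType, List[str]] = field(default_factory=dict)

Under this representation, GPRA-oriented reporting would draw its results from the outcomes field, while the remaining fields supply context.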

Methods

Based on the reviewed literature, three Extension evaluation and outcome reporting ideal practices were designated. From these, a series of LOGIC model-based questions, that is, a scorecard, was developed to examine their utilization. This section discusses these activities and outlines limitations of the research.

Extension Evaluation and Outcome Reporting (EEOR) Ideal Practices

Guidance provided by the GAO regarding GPRA implementation and the nature of Extension work suggests three Extension evaluation and outcome reporting (EEOR) ideal practices to be followed by federal program managers:

EEOR Ideal Practice #1--National Outcome Definition and Measurement: Define and measure national ultimate program (condition) outcomes, using research-supported proxies (learning and action outcomes) where appropriate.

EEOR Ideal Practice #2--Sub-National (State) Outcome Reporting: Have a user-friendly system for individual awardees (henceforth referred to as "state programs") or groups of state programs to report on nationally defined outcomes or proxies directly. Locally defined outcomes could be used and reported if they are consistent with and complementary to nationally defined and measured goals.

EEOR Ideal Practice #3--Sub-National (State) Non-Outcome Reporting: Report non-outcome data (outputs, inputs, external factors, assumptions) to contextualize outcomes, not as program results.

Articulating desired national outcomes and measuring progress toward them helps to clarify programmatic purposes. Measurement of intermediate (action) outcomes can be substituted for ultimate (condition) outcomes if there is a strong, research-supported link between the two phenomena. An example is measuring the action phenomenon of the number of servings of fruits and vegetables consumed per day as a proxy for the health benefits associated with this activity.

National ultimate and intermediate outcomes can often be measured through third party data, such as surveys conducted by other agencies of the federal government. A user-friendly state outcome reporting system can provide evidence of a local program's role in attaining national outcomes. Finally, non-outcome data such as number of participants and external factors can be useful to contextualize reported outcomes. While non-outcome data from all of these categories are of some potential use, these data should be used to contextualize rather than replace outcome measurement.

The aforementioned three EEOR ideal practices would not, by themselves, ensure complete GPRA compliance. However, their utilization would go a long way toward overcoming a key impediment to GPRA implementation: defining and measuring outcome goals instead of outputs.

Development of an Evaluation Scorecard

Simply asking "does the program utilize practice x?" would not yield the depth of answer desired. The LOGIC model was used to develop the Hoffman EEOR Scorecard to assess how and in what ways these programs utilize these three EEOR ideal practices. This scorecard is shown in Table 1. This table also references the components of the LOGIC model that the questions intend to illuminate.

Table 1.
The Hoffman EEOR Scorecard for Use in Illuminating EEOR Ideal Practice Utilization

EEOR Ideal Practice #1 -- NATIONAL OUTCOMES: Define and measure national ultimate program (condition) outcomes, using proxy measurements where appropriate. Evaluated by the following questions (the LOGIC model component each question illuminates appears in parentheses):

  • Does the national program leadership articulate the ultimate national outcome(s) desired by the program in terms of measurable social, economic, civic, or environmental conditions? (Condition Outcomes)

  • Does the national program leadership measure progress toward these outcomes directly on a national level? (Condition Outcomes)

  • Does the national program leadership measure progress toward these outcomes indirectly through the use of proxy measurements (learning or action outcomes) that are measured on a national level? (Learning & Action Outcomes)

EEOR Ideal Practice #2 -- Have a user-friendly system for individual or groups of state programs to report on nationally defined outcomes or proxies directly. Locally defined outcomes could be used and reported if they are consistent with and complementary to nationally defined and measured goals. Evaluated by the following questions:

  • Are state level programs asked to provide data on nationally defined outcomes? (Learning & Action Outcomes)

  • Are state level programs allowed/encouraged to define and report on their own state level outcomes?

  • Does reported data (optional or mandatory) reflect changing conditions, action, and/or participant learning?

  • Can outcome data from these state level programs be aggregated to produce national statistics?

  • Do these data provide evidence of the program's contribution to progress toward national objectives?

EEOR Ideal Practice #3 -- Report non-outcome data to contextualize outcomes, not as program results. Evaluated by the following questions:

  • Are state level programs asked to provide data on nationally defined outputs? (Activity & Participation Outputs)

  • Are state level programs allowed/encouraged to define and report on their own state level outputs?

  • Does reported data reflect program activities or program participation?

  • Can output data from these state level programs be aggregated?

  • Do these data provide evidence of the program's contribution to progress toward national objectives?

  • Are state level programs asked to provide data on additional funding sources and levels (other federal funds, state funds, local funds) that support the program? (Inputs)

  • Are state level programs asked to provide narratives that could provide a place to report program assumptions and external factors (context) that could affect program results? (Assumptions & External Factors)

  • Is output, input, assumption, & external factor reporting used as a complement to or as a substitute for outcome reporting? (Differentiation of Outcomes & Non-Outcomes)
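
Because the scorecard is simply a structured set of questions tied to LOGIC model components, it can also be encoded for record keeping. Below is a minimal Python sketch in the same illustrative style as before; only a few of the Table 1 questions are shown, and the names (ScorecardQuestion, summarize) are hypothetical rather than part of any CSREES tool.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ScorecardQuestion:
    practice: int                 # EEOR ideal practice number (1-3)
    text: str                     # question posed against the program's extant documents
    logic_component: str          # LOGIC model component the question illuminates
    answer: Optional[str] = None  # e.g., "Yes", "No", or a short narrative finding


# A few of the Table 1 questions, encoded for illustration (not the full instrument)
SCORECARD: List[ScorecardQuestion] = [
    ScorecardQuestion(1, "Are ultimate national outcomes articulated as measurable conditions?",
                      "Condition Outcomes"),
    ScorecardQuestion(1, "Is progress toward these outcomes measured directly at the national level?",
                      "Condition Outcomes"),
    ScorecardQuestion(2, "Are state level programs asked to provide data on nationally defined outcomes?",
                      "Learning & Action Outcomes"),
    ScorecardQuestion(3, "Are state level programs asked to provide data on nationally defined outputs?",
                      "Activity & Participation Outputs"),
]


def summarize(scorecard: List[ScorecardQuestion]) -> Dict[int, List[Tuple[str, str]]]:
    # Group questions by ideal practice for a quick utilization overview
    by_practice: Dict[int, List[Tuple[str, str]]] = {}
    for q in scorecard:
        by_practice.setdefault(q.practice, []).append((q.text, q.answer or "unanswered"))
    return by_practice

A program leader could fill in the answer fields from requests for applications, plans of work, and annual reports, then use the summary to see at a glance which ideal practices remain under-documented.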

Limitations of the Research

It is important to note that these questions were designed to illuminate the utilization of selected EEOR ideal practices that are consistent with GPRA compliance. Utilization of these practices alone will not guarantee complete GPRA compliance.

Answers were obtained primarily through publicly available extant data including requests for applications, plans of work, annual reports, and other components of CSREES reporting systems. To supplement this, some CSREES National Program Leaders were consulted to provide further clarification. This focus on extant data had the potential to produce less than exhaustive information regarding the program's evaluation and results reporting efforts, particularly if a majority of these efforts take place "behind the curtain" and are not publicly documented.

Abridged Example Report of Findings

The original research examined the following programs: Expanded Food and Nutrition Education, Children, Youth and Families at Risk, Extension Integrated Pest Management, Farm Safety combined with Youth Farm Safety Certification, Extension Indian Reservation Program, Sustainable Agricultural Research and Extension, and Regional Rural Development. Due to the space limitations of this forum, this article provides an abridged example of findings for the Extension Integrated Pest Management (IPM) Program. This includes a brief explanation of the IPM program and examination of compliance with each of the three EEOR practices. To aid the reader, LOGIC model components are italicized when mentioned in the regular text and included in parentheses when referred to indirectly.

Explanation of IPM Program

The Integrated Pest Management Program teaches common pest management principles to a wide variety of audiences. CSREES provides formula funding to states and territories to further these efforts. One of the co-authors works directly with the state outcome-reporting element of this program.

Program Utilization of EEOR Ideal Practice #1: National Program Outcome Definition and Measurement

The IPM Program's utilization of practice #1 is summarized in Table 2.

Table 2.
IPM Program Utilization of EEOR Ideal Practice #1 Based on Inquiry Findings

EEOR Ideal Practice #1 -- NATIONAL OUTCOMES: Define and measure national ultimate program (condition) outcomes, using proxy measurements (learning & action outcomes) where appropriate.

LOGIC model investigative questions and whether each was fulfilled:

  • Define and articulate condition-related outcomes -- Yes

  • Measure progress on condition-related outcomes directly -- No

  • Measure progress on learning or action proxies -- Yes

Utilization assessment: Defines action outcomes (proxies); new measures are currently being developed.

The Smith Lever IPM Program articulates four broad national goals:

  1. To safeguard human health and the environment through improved utilization of integrated pest management strategies and systems (condition outcomes through action outcomes).

  2. To increase the range of benefits obtained through improved utilization of integrated pest management strategies and systems (condition outcomes through action outcomes).

  3. To increase the implementation of effective integrated pest management strategies and systems (action outcome).

  4. To enhance collaborations among stakeholders interested in the development and implementation of improved integrated pest management strategies and systems (activity output to improve action outcomes). (Reprinted by permission of CSREES from the Performance Planning and Reporting Web site, 2002.)

From 1995 to 2000, the national program leadership defined and measured progress toward the intermediate outcome of IPM adoption (action outcome) through third party data. A goal of 75% nationwide IPM adoption, a research-supported proxy for reduced pesticide use, was set for the year 2000.

The program is currently concluding the stakeholder input phase of a process to define new national measures with a stronger emphasis on condition outcomes (Hoffman, 2002). These new national measures are being developed in response to a 2001 General Accounting Office report that urged a stronger tie between program objectives and reductions in pesticide use (USGAO, 2001). Results of this process will influence future measurement of condition outcomes and action outcome proxies produced and measured nationally by the program.

Program Utilization of EEOR Ideal Practice #2: IPM State Outcome Reporting

The IPM Program's utilization of practice #2 is summarized in Table 3.

Table 3.
IPM Program Utilization of EEOR Ideal Practice #2 Based on Inquiry Findings

EEOR Ideal Practice #2 -- STATE OUTCOMES: Have a user-friendly system for individual or groups of state programs to report on nationally defined outcomes or proxies directly. Locally defined outcomes could be used and reported if they are consistent with and complementary to nationally defined and measured goals.

LOGIC model investigative questions (do states report on the following?) and whether each was fulfilled:

  • Nationally defined outcomes -- Yes

  • Locally defined outcomes -- Yes

  • Changing conditions, actions, and/or learning -- Conditions and actions

  • Data that can be aggregated -- No

  • Evidence of contribution toward national objectives -- Yes

Utilization assessment: Yes, if currently proposed guidelines are adopted.

Statewide program coordinators choose commodities or pest management situations important to their state as areas of program emphasis and then decide which outcome and non-outcome indicators best match the efforts on that commodity. Maine may choose to report on pest management efforts in potatoes, sweet corn, and apples. Michigan could choose to report on broccoli, blueberries, and potatoes (all five commodities are grown in both states). The two states also choose to report progress using any or all of the following 16 Smith Lever IPM Program indicators of outcomes, outputs, inputs, and processes:

  1. Number of production units or entities using IPM (action outcome),
  2. Transition from high risk to lower risk pesticides (action outcome),
  3. Total amount of high risk pesticides applied (action outcome),
  4. Diversity of IPM practices adopted (action outcome),
  5. Economic benefit obtained (condition outcome),
  6. IPM Personnel employed (input),
  7. Satisfied IPM clientele (participation output),
  8. IPM strategies and systems validated (activity output),
  9. IPM educational materials delivered (activity output),
  10. People participating (participation output),
  11. Producers trained (participation output),
  12. Private sector personnel trained (participation output),
  13. Public sector personnel trained (participation output),
  14. Other individuals trained (participation output),
  15. Public events involving collaborations (activity output), and
  16. Non-federal dollars leveraged (input). (Reprinted by permission of CSREES from the Performance Planning and Reporting Web site, 2002)

Though numbers 1-5 can provide evidence of the individual state program's role in achieving national outcomes, this commodity and indicator selection latitude often prevents meaningful outcome data aggregation. This lack of data aggregation is important for two reasons.

First, if the national leadership of the program would like to assess its outcomes related to blueberries, the data would be incomplete unless all major blueberry-producing states choose to report on that commodity. Second, even if all major blueberry-producing states choose to report on the commodity, the data would be difficult to compile unless each state self-selected the same outcome indicators. If the program were trying to "roll up" state-reported data to produce national outcome figures, this would present a serious problem. The fact that the national program leadership measures national outcomes using third party data makes this lack of aggregation somewhat less important.

Furthermore, it is possible under current guidelines for a state to select only from indicators 6-16, thus not reporting on outcomes. Efforts are currently underway to require at least one outcome indicator for each program and to encourage one outcome indicator from each area of emphasis.
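
The aggregation problem described above can be made concrete with a short, hypothetical example. The Python sketch below uses invented state data and indicator labels purely for illustration; it rolls up state reports only where two or more states happen to have selected the same commodity and indicator, showing how selection latitude shrinks the set of indicators that can yield national figures.

from collections import defaultdict

# Hypothetical state reports: (state, commodity, indicator) -> reported value.
# Indicator labels loosely follow the 16-item list; all values are invented.
state_reports = {
    ("ME", "potatoes", "production units using IPM"): 420,
    ("ME", "apples", "high risk pesticide amount"): 1500,
    ("MI", "potatoes", "production units using IPM"): 610,
    ("MI", "blueberries", "producers trained"): 95,
}


def aggregate(reports):
    # Sum values only where states share a (commodity, indicator) pair
    totals = defaultdict(float)
    contributors = defaultdict(set)
    for (state, commodity, indicator), value in reports.items():
        totals[(commodity, indicator)] += value
        contributors[(commodity, indicator)].add(state)
    # A pair reported by a single state cannot yield a meaningful national figure
    return {key: total for key, total in totals.items() if len(contributors[key]) > 1}


print(aggregate(state_reports))
# Only ("potatoes", "production units using IPM") survives; the other entries cannot
# be rolled up because no second state selected the same commodity and indicator.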

Program Utilization of EEOR Ideal Practice #3: IPM State Non-Outcome Reporting

The IPM Program's utilization of practice #3 is summarized in Table 4.

Table 4.
IPM Program Utilization of EEOR Ideal Practice #3 Based on Inquiry Findings

EEOR Ideal Practice #3 -- STATE NON-OUTCOMES: Report non-outcome data to contextualize outcomes, not as program results.

LOGIC model investigative questions (do states report on the following?) and whether each was fulfilled:

  • Nationally defined outputs -- Yes

  • Locally defined outputs -- Yes

  • Activities and/or participation -- Both

  • Data that can be aggregated -- No

  • Evidence of progress toward national objectives -- Yes

  • Input data -- Yes

  • Assumptions and/or external factors -- Both

  • Used as a complement to or as a substitute for outcome reporting -- Usually used to complement outcomes, but outcomes are absent in some state reports

Utilization assessment: A few state programs use non-outcome data as program results; this window will close if proposed guidelines are adopted.

For the crops identified, programs can choose to report on non-outcome indicators numbers 6-16 from the 16-item list above.

In addition to this crop-specific data, the state programs are asked to provide program-wide narratives and resource information. Five-year plans of work and annual reports are used to report assumptions and external factors in narrative form, and alternate funding (input) data in numerical form.

As mentioned earlier, it is possible under current guidelines for non-outcome data to completely replace outcome measurement on the state level through local indicator selection. The national program leadership is currently attempting to close this loophole.

Results from Using the Scorecard for the Extension IPM Program

When the Extension IPM Program's evaluation and outcome reporting practices were compared to the EEOR ideal practices using the Hoffman EEOR Scorecard, three major areas for further improvement were identified:

  • Demarcation of outcome measures versus non-outcome supporting data for state level reporting to ensure the collection of both;

  • Use of third party data as an efficient measurement tool; and

  • Multi-state cooperation for goal setting and outcome measurement to foster more meaningful data collection, reporting, and aggregation.

Current proposed guidelines designed to separate outcome measures from non-outcome supporting data should be implemented as soon as practical, and/or this tactic should be a part of any future proposed changes in the state evaluation and reporting system. As new national outcome measures are formed, every effort should be made to seek out third party data as guided by the scorecard at the federal and state levels to improve the overall quality of evaluation and outcome reporting and ease the reporting burden on individual awardees. As these measures are more closely linked to condition outcomes, data availability on condition outcomes and closely linked action outcome proxies should be thoroughly investigated. Finally, cooperation among states to coordinate outcome measurement could provide greater opportunities for data aggregation and more meaningful results interpretation.

Conclusion

When a judge examines a group of dogs, chickens, or cows at an animal show, he or she typically compares each member of the class to a theoretical ideal animal. Regardless of their ranking within the class, the owners and breeders of those animals are given valuable information on ways to improve their kennel, flock, or herd so successive generations of their stock may approach that ideal. The three EEOR ideal practices described in this article, along with the scorecard to evaluate their utilization, are not unlike that theoretical ideal animal that is used for comparisons.

Using the Hoffman EEOR Scorecard and making such comparisons can help Smith Lever 3(d) program leaders identify how closely their practices come to the three EEOR ideal practices. Such a comparison is potentially useful in diagnosing where current evaluation efforts could be improved and the general direction that this improvement could take. This information can help program leaders to:

  • Alter the program's overall evaluation and outcome reporting framework to further GPRA compliance,

  • Identify third party data that can serve as outcome measurement indicators for ultimate programmatic outcomes through national goal clarification,

  • Draw clear distinctions between outcome versus non-outcome measurements that could foster clear communications to individual awardees, and

  • Ensure that reporting efforts undertaken by individual awardees and/or groups of awardees complement national measurement efforts.

For the example program documented in this article, Extension IPM, this comparison yielded three major areas for further improvement:

  • Demarcation of outcome measures versus non-outcome supporting data for state level reporting to ensure the collection of both;

  • Use of third party data as an efficient measurement tool; and

  • Multi-state cooperation for goal setting and outcome measurement to foster more meaningful data collection, reporting, and aggregation.

Based on examination by this scorecard, the Extension IPM program is pursuing these three areas of potential improvement at this time.

References

Bennett, C. F. (1975). Up the hierarchy. Journal of Extension [On-line], 13(2). Available at: http://www.joe.org/joe/1975march/index.html

Extension Committee on Organizational Policy (ECOP) (1997). Strategic directions of the cooperative extension system. Retrieved May, 2002 from: http://www.reeusda.gov/part/gpra/direct.htm

GPRA Page (CSREES Web site). (n.d.). Retrieved May, 2002 from: http://www.reeusda.gov/part/gpra/gprahome.htm

Hoffman, W. (2003). Smith Lever 3(d) program evaluation and outcome reporting -- a federal perspective. Unpublished master's thesis. Penn State University.

Holton, E. F., III (1996). The flawed four-level evaluation model. Human Resource Development Quarterly, 7(1), 5-21.

Kirkpatrick, D. L. (1998). Evaluating training programs (2nd ed.). San Francisco: Berrett-Koehler.

Minimum standards for Extension IPM implementation program annual reporting. (2002). (CSREES Working Paper) Washington, DC.

Nelson, D. E. (1999). Generic observations with regard to the 2000-2004 plans of work. Retrieved May, 2002 from: http://www.reeusda.gov/part/areera/generic.htm

Richardson, J. G. (2001). Proactively addressing accountability in Extension. The Forum [On-line], 6, 2. Retrieved July, 2002 from: http://www.ces.ncsu.edu/depts/fcs/pub/2001sp/richardson.html

North Carolina State University (NCSU). (n.d.). The performance planning and reporting system. Retrieved June, 2002 from: http://www.pprs.info

Rogers, E. M. (1995). Diffusion of innovations (4th ed.). New York: The Free Press.

Stake, R. E. (1975). Evaluating the arts in education, A responsive approach. Columbus: Merrill.

Taylor-Powell, E., Steele, S., & Douglah, M. (1996). Planning a program evaluation. Retrieved July 2002, from University of Wisconsin-Extension-Cooperative Extension, Program Development and Evaluation Unit Web site: http://www1.uwex.edu/ces/pubs/pdf/G3658_1.PDF

United States General Accounting Office. (1996). Executive guide: Effectively implementing the government performance results act. GAO/GGD-96-118. Washington, DC: USGAO.

United States General Accounting Office. (1997). Managing for results: Analytic challenges in measuring performance. GAO/HEHS/GGD-97-138. Washington, DC: USGAO.

United States General Accounting Office. (1998). Managing for results: Measuring program results that are under limited federal control. GAO/GGD-99-16. Washington, DC: USGAO.

United States General Accounting Office. (2001). Agricultural pesticides: Management improvement needed to further promote integrated pest management. GAO-01-815. Washington, DC: USGAO.

University of Wisconsin Cooperative Extension (UWCE). (2002). Evaluation logic model. Retrieved November, 2003 from: http://www.uwex.edu/ces/pdande/evaluation/evallogicmodel.html

Worthen, B. R., Sanders, J. R., & Fitzpatrick, J. L. (1997). Program evaluation: Alternative approaches and practical guidelines (2nd ed.). Boston: Addison Wesley Longman.