June 2016 // Volume 54 // Number 3 // Feature // v543a1
Better Crunching: Recommendations for Multivariate Data Analysis Approaches for Program Impact Evaluations
Abstract
Extension program evaluations often present opportunities to analyze data in multiple ways. This article suggests that program evaluations can involve more sophisticated data analysis approaches than are often used. On the basis of a hypothetical program scenario and corresponding data set, two approaches to testing for evidence of program impact are compared. These approaches are (a) a bivariate approach involving contingency table analysis (chi-square, Kendall's tau tests) and (b) a multivariate approach involving logistic regression. Both approaches address the primary evaluation questions, but the multivariate approach introduces additional variables, allowing for a more comprehensive understanding of program dynamics. Multivariate approaches can enhance insights about programs and increase opportunities for dissemination of research results.
Introduction
Data analysis is a critical aspect of the Extension program evaluation process. As evaluators gain experience with their craft, they become better acquainted with options for data analysis and appreciate that in most evaluation settings, no single correct approach to analyzing data exists. However, although a variety of data analysis approaches often can be used to sufficiently answer the fundamental questions in an evaluation, some approaches may be more powerful or more enlightening than others. This principle is the basis for my recommendations in this article.
Evaluators of Extension programs often conduct data analyses by using a fairly basic bivariate analysis framework, examining only two variables per statistical analysis. For example, t-tests may be used to compare pretest scores with posttest scores or intervention group scores with comparison group scores. These approaches are statistically valid (if assumptions are met), but there are often opportunities for more powerful approaches that can provide a multifaceted understanding of what happened as a result of a program. Multivariate approaches, in particular, can take into account multiple variables to shed light on the range of factors that influence participant outcomes. Thereby, the capacity of Extension program evaluations to provide evidence of program impacts (Workman & Scheer, 2012) may be enhanced.
In this article, I illustrate recommendations for using multivariate data analysis approaches by describing a hypothetical program scenario and showing how relevant data might be analyzed both in a standard way and in a multivariate way that involves regression analysis. This article is not intended to be a comprehensive primer on how to conduct regression analyses but rather is meant to stimulate readers to think about the possibilities inherent in multivariate approaches and the ways they can enhance understanding of Extension programs. Interested readers can consult more comprehensive sources (suggested later in this article) for specific details.
Hypothetical Scenario
For demonstration purposes, I created a hypothetical educational program and program evaluation scenario and corresponding data set.
Educational Program
The program was an eight-session Supplemental Nutrition Assistance Program Education (SNAP-Ed) course for adults, delivered at 10 local sites over 1 year. One important variation across the sites was the program delivery schedule: monthly sessions versus weekly sessions. For each site, the decision about the program delivery schedule was based on factors relating to the specific instructional setting, such as the availability of facilities or instructional staff.
Evaluation Setting and Data Collection
The data set included 200 study participants, of whom 120 participated in the SNAP-Ed course (program group) and 80 made up a comparison group. Measurement on the outcomes of interest occurred via a questionnaire administered in a group setting. The program group completed a pretest questionnaire at the beginning of the first class session and a posttest questionnaire at a group event 2 months after the program's conclusion. The 2-month delay in posttest administration was considered the most appropriate timing for capturing the occurrence of behavioral change among participants. The comparison group members participated in an Extension program unrelated to nutrition topics, and these individuals completed questionnaires in settings similar to those for the program group.
Outcomes of Interest and Evaluation Questions
Although such a program could be associated with numerous target outcomes of interest, I highlighted one: the participant's use of nonfat or lowfat milk (hereafter referred to as lowfat milk) in place of whole milk. The use of lowfat milk was addressed on the questionnaire with a single dichotomous item, which was coded with the values 0 (usually does not use lowfat milk) and 1 (usually uses lowfat milk).
There were two primary evaluation questions to be addressed by the analysis:
 Did participation in the SNAP-Ed course increase participants' use of lowfat milk, as compared to those in a group that did not participate in the SNAP-Ed course?
 Was a participant's postprogram use of lowfat milk associated with the number of program sessions he or she attended? That is, was there a program dosage effect?
There could have been numerous other evaluation questions related to behavioral outcomes of interest, but I chose to address these two questions. For the first question, the primary predictor variable (independent variable) is group membership (condition): program group or comparison group. However, in trying to ascertain what might account for participants' use of lowfat milk, several secondary evaluation questions focusing on other potential predictor variables could be applied. Two of these variables, both related to individual participant characteristics, were gender (78% of the sample was female, 22% was male) and previous experience in SNAP-Ed classes (44% of the sample had previous experience, 56% did not). Another predictor variable, related to program delivery, was the session schedule (monthly vs. weekly).
Options for Data Analysis
I implemented two data analysis strategies to answer the primary evaluation questions. I used a standard bivariate strategy to test for differences in the outcome variable (use of lowfat milk) between the program group and the comparison group at pretest and again at posttest to determine whether the relative statuses of the groups changed. I used a multivariate strategy to analyze the posttest outcome as the dependent variable in a regression analysis that simultaneously examined program participation and several other predictors of interest. The analyses in this simulation were conducted through use of the statistical package SPSS 22.
A note about clustering effects: In this evaluation, participants were clustered, or nested, within their local program sites. Cluster variables such as program site can create biases in analyses because participants within a cluster tend to be more similar to each other than to participants in other clusters. The clustering effect is measured with the intraclass correlation coefficient (ICC), and when clustering exists, it can be adjusted for in the statistical analysis (e.g., McCoach & Adelson, 2010). For simplicity's sake, for this scenario I assumed that the clustering effect within the sites was found to be very low (ICC less than .05) so that statistical adjustment was not required.
Testing for Program Impact: A Bivariate Strategy
The bivariate analysis approach I used to answer the first primary evaluation question—illustrated by Tables 1 and 2—involved the following steps:
 Compare the two groups at pretest to determine whether there is a significant difference between them in their scores on the target outcome variable. If the groups are reasonably equivalent, the difference will not be statistically significant.
 Compare the groups again at posttest to see whether they differ. If the program has been effective in creating behavior change relating to use of lowfat milk, the proportion of the program group that uses lowfat milk will be higher than the corresponding proportion of the comparison group.
When an outcome variable is continuous, often the preferred analytic approach is to compare the two groups' change scores (mean pretest-to-posttest change in the program group versus that same change in the comparison group). In this scenario, with a dichotomous outcome variable, a change score approach would be somewhat more complicated to present, so for illustrative purposes, I am presenting comparisons at pretest and comparisons at posttest separately.
Pretest Analysis
As shown through the use of a contingency table (see Table 1), at pretest, 34.2% of the program participants (41 of 120) reported that they usually use lowfat milk, compared to 33.8% of the comparison group members (27 of 80). Because these data consisted of counts and proportions, they could be analyzed through the use of a chi-square statistical test. The chi-square analysis showed that the percentages in the two groups were not significantly different from each other. I therefore concluded that the groups did not differ with regard to their use of lowfat milk at pretest.
Table 1
                        Score on pretest item
Group                   0 (usually does not use lowfat milk)   1 (usually uses lowfat milk)   Total
SNAP-Ed program group   79 (65.8%)                             41 (34.2%)                     120 (100%)
Comparison group        53 (66.3%)                             27 (33.8%)                     80 (100%)
Total                   132 (66.0%)                            68 (34.0%)                     200 (100%)
Note. Chi-square test: χ² = 0.004 (1 df), ns.
Posttest Analysis
At posttest, the percentages had changed, as shown in Table 2. At that time, 65% of the program participants (78 of 120) reported that they usually use lowfat milk, compared to 45% of the comparison group (36 of 80). The chi-square analysis showed that this difference in proportions was statistically significant (p = .005), leading me to conclude that the program group proportion was statistically significantly higher than the comparison group proportion.
Table 2
                        Score on posttest item
Group                   0 (usually does not use lowfat milk)   1 (usually uses lowfat milk)   Total
SNAP-Ed program group   42 (35.0%)                             78 (65.0%)                     120 (100%)
Comparison group        44 (55.0%)                             36 (45.0%)                     80 (100%)
Total                   86 (43.0%)                             114 (57.0%)                    200 (100%)
Note. Chi-square test: χ² = 7.834 (1 df), p = .005.
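The chi-square values reported in the notes to Tables 1 and 2 can be checked by hand from the cell counts. The following is a minimal sketch in Python (the article's own analyses were run in SPSS), assuming the Pearson chi-square without continuity correction; the function names are illustrative and not part of any statistical package.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Cell counts from Table 1 (pretest) and Table 2 (posttest):
# program group (score 0, score 1), then comparison group (score 0, score 1).
pretest = chi_square_2x2(79, 41, 53, 27)
posttest = chi_square_2x2(42, 78, 44, 36)

print(f"Pretest:  chi-square = {pretest:.3f}, p = {p_value_1df(pretest):.3f}")
print(f"Posttest: chi-square = {posttest:.3f}, p = {p_value_1df(posttest):.3f}")
```

Under these assumptions the computed statistics agree with the tabled values to three decimal places.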
Interpretation of the Pretest and Posttest Analyses
To summarize the findings, the two groups were about equal at pretest, whereas at posttest the proportion of lowfat milk users was higher in the program group than in the comparison group. This pattern supports the conclusion that the SNAP-Ed course had a positive impact on participants' use of lowfat milk. Even though the comparison group appeared to increase its use of lowfat milk as well (45% of participants at posttest, compared to only 33.8% at pretest), the difference in proportions between the two groups clearly favored the program group.
Analysis of Program Dosage Effect
Table 3 illustrates the bivariate approach I used to analyze the effect of program dosage. For the 120 individuals in the program group, the variable number of sessions attended was collapsed from eight categories to four categories (1–2, 3–4, 5–6, 7–8), which were cross-tabulated with the posttest scores on the lowfat milk use item. Inspection of the table suggests that a strong relationship existed, with higher proportions of respondents reporting that they use lowfat milk as the consistency of attendance increased. The chi-square test for this distribution was statistically significant (p < .001), indicating that the percentages across the four categories of program attendance were different from one another. Further, a common directional test used with ordered variables, Kendall's tau-c, showed significance as well.
Table 3
                                      Score on posttest item
Number of program sessions attended   0 (usually does not use lowfat milk)   1 (usually uses lowfat milk)   Total
1–2                                   16 (64.0%)                             9 (36.0%)                      25 (100%)
3–4                                   16 (59.3%)                             11 (40.7%)                     27 (100%)
5–6                                   6 (15.4%)                              33 (84.6%)                     39 (100%)
7–8                                   4 (13.8%)                              25 (86.2%)                     29 (100%)
Total                                 42 (35.0%)                             78 (65.0%)                     120 (100%)
Note. Chi-square test: χ² = 28.555 (3 df), p < .001. Directional test: Kendall's tau-c = .474, p < .001.
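Both statistics in the note to Table 3 can be recomputed from the eight cell counts. The sketch below is illustrative Python rather than the SPSS procedure used for the article. It assumes the usual Pearson chi-square and the standard definition of Kendall's tau-c (Stuart's tau-c), 2m(P − Q) / [n²(m − 1)], where P and Q are the numbers of concordant and discordant pairs and m is the smaller of the number of rows and columns. The function names are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def kendall_tau_c(table):
    """Kendall's tau-c for a table whose rows and columns are both ordered."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for i2 in range(i + 1, n_rows):      # only rows below row i
                for j2 in range(n_cols):
                    if j2 > j:                   # higher on both variables
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:                 # higher on one, lower on the other
                        discordant += table[i][j] * table[i2][j2]
    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n ** 2 * (m - 1))


# Rows: 1-2, 3-4, 5-6, 7-8 sessions; columns: posttest score 0, posttest score 1.
attendance_by_use = [[16, 9], [16, 11], [6, 33], [4, 25]]

chi2 = pearson_chi_square(attendance_by_use)
tau_c = kendall_tau_c(attendance_by_use)
print(f"chi-square = {chi2:.3f} (3 df), tau-c = {tau_c:.3f}")
```

The chi-square test ignores the ordering of the attendance categories, whereas tau-c uses it, which is why the two statistics answer slightly different questions about the same table.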
Thus, I can conclude that the number of sessions attended was indeed associated with participants' adoption of the desired behavior—use of lowfat milk. However, note that this bivariate analysis does not take into account participants' pretest scores, which are almost always a powerful predictor of posttest scores. Thus, an important limitation of this approach is that it cannot address the question of whether the effect of attendance is independent of participants' pretest statuses.
Testing for Program Impact: A Multivariate Strategy
Using a multivariate regression approach, I examined the ability to predict participants' posttest scores on the basis of several predictor variables, considered simultaneously. Once again, the predictor variable of greatest interest is condition (program group or comparison group). The test of this variable addressed the question of whether the program influenced participants' behaviors and was comparable to the chi-square tests reported in Tables 1 and 2. But I also examined the predictive power of several other individual-level characteristics, which I entered as additional independent variables in the regression. These variables included pretest score, gender, age, and previous SNAP-Ed class experience. For the analysis of effect of program dosage, predictors of interest included the four-category number of sessions attended variable, session frequency (program delivery schedule: monthly vs. weekly), and pretest score. These analyses showed the independent effect of each predictor variable, controlling statistically for the others.
For these analyses, I used a form of regression known as logistic regression, which is appropriate for use when the dependent variable is dichotomous (i.e., taking on only two possible values). For outcome variables that are ordinal or interval (Boone & Boone, 2012), the appropriate approach is often multiple linear regression, as long as several basic assumptions are met regarding how the scores are distributed within the sample (Kahane, 2007; Kuethe & Borchers, 2012). Regression analyses can be conducted using almost any major statistical package, such as SPSS (see, e.g., Aldrich & Cunningham, 2015), Stata (Acock, 2014), SAS (Delwiche & Slaughter, 2012), and others.
Logistic regression produces a statistic called an odds ratio for the different categories of each predictor variable (see, e.g., Institute for Digital Research and Education, 2015). The calculation and interpretation of the odds ratio for this scenario can be illustrated as follows: With the predictor variable being SNAP-Ed course participation (Yes = 1, No = 0) and the outcome variable being use of lowfat milk (Yes = 1, No = 0), the odds ratio for the association between these two variables was equal to the odds of using lowfat milk (lowfat use = 1) for participants (participation = 1) divided by the odds of using lowfat milk (lowfat use = 1) for nonparticipants (participation = 0).
If the odds ratio were greater than 1.00 and statistically significant, it would mean that the likelihood of using lowfat milk was greater for participants than for nonparticipants. By contrast, if the odds ratio were less than 1.00 and significant, it would mean that the likelihood of using lowfat milk was lower for participants than for nonparticipants.
In this example, nonparticipation is known as the referent, or comparative, category of the predictor variable. For predictor variables with more than two categories, one category is selected as the referent, and there is an odds ratio for each of the other categories.
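The odds ratio definition above, including the use of a referent category, can be sketched directly from the posttest counts in Tables 2 and 3. The Python below is a minimal illustration with hypothetical function names. The values it produces are unadjusted odds ratios from two-way tables, so they will generally differ from the odds ratios that a regression reports after controlling for other predictors.

```python
def odds(users, non_users):
    """Odds of the outcome: number with the outcome divided by number without."""
    return users / non_users


def odds_ratios(counts, referent):
    """Odds ratio of every non-referent category relative to the referent.

    counts maps a category label to a (users, non_users) pair.
    """
    referent_odds = odds(*counts[referent])
    return {
        category: odds(*pair) / referent_odds
        for category, pair in counts.items()
        if category != referent
    }


# Posttest counts: (usually uses lowfat milk, usually does not use lowfat milk).
condition = {"Comparison group": (36, 44), "Program group": (78, 42)}
attendance = {"1-2": (9, 16), "3-4": (11, 16), "5-6": (33, 6), "7-8": (25, 4)}

condition_or = odds_ratios(condition, referent="Comparison group")
attendance_or = odds_ratios(attendance, referent="1-2")

for label, value in {**condition_or, **attendance_or}.items():
    print(f"{label}: unadjusted OR = {value:.2f}")
```

A predictor with k categories yields k − 1 odds ratios, one for each category other than the referent, which is why the four attendance levels produce three values.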
I addressed the two primary evaluation questions by using two separate logistic regression analyses. The analysis of program impact included all 200 cases in the sample. The analysis of program dosage effect included only the 120 individuals who participated in the SNAP-Ed course.
For interested readers, the SPSS commands and output tables for these regression analyses are reproduced in the Appendix.
Regression 1: Testing for Program Impact
The results from the first regression analysis are displayed in Table 4, which illustrates a standard way of presenting logistic regression results. For each predictor variable, Table 4 includes the number of responses associated with each category, the odds ratio for each category (other than the referent categories), the significance level of the odds ratio, and the 95% confidence interval (upper and lower bounds) for the odds ratio.
Turning first to the most important statistical test, that of condition, Table 4 shows that the results were statistically significant (p = .003). Other significant predictor variables were pretest score (p = .006) and previous SNAP-Ed class experience (p = .013). The odds ratios for gender and age group were nonsignificant.
The statistically significant odds ratio for condition can be interpreted as follows: The odds that a participant in the program group will "usually use lowfat milk" are 2.55 times greater than the odds that a person in the comparison group will "usually use lowfat milk." A common misperception is that an odds ratio is a comparison of actual probabilities (i.e., program participants are 2.55 times as likely to use lowfat milk as nonparticipants), but that interpretation is incorrect. See the "Stata FAQ" page on the Institute for Digital Research and Education website (http://www.ats.ucla.edu/stat/stata/faq/oratio.htm) for a good description of the difference.
Table 4
Predictor variable                        n     OR         Sig.     95% CI
Condition
  Comparison group                        80    Referent
  Program group                           120   2.55       .003**   1.38, 4.71
Pretest score
  0 (usually does not use lowfat milk)    132   Referent
  1 (usually uses lowfat milk)            68    2.54       .006**   1.31, 4.94
Gender
  Female                                  156   Referent
  Male                                    44    .649       .244     .31, 1.34
Age group
  18–25                                   51    Referent
  26–35                                   62    .732       .445     .33, 1.63
  36–45                                   47    1.016      .972     .43, 2.40
  46–55                                   40    .928       .871     .38, 2.30
Previous SNAP-Ed class experience
  No                                      112   Referent
  Yes                                     88    2.184      .013*    1.18, 4.06
*p < .05. **p < .01.
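The distinction between an odds ratio and a ratio of probabilities, noted above for the condition variable, can be made concrete with a little arithmetic. The sketch below is illustrative Python. It takes the odds ratio of 2.55 from Table 4 and applies it to three hypothetical baseline probabilities for the comparison group. The baselines are assumptions chosen for illustration, not estimates from the model, and the function name is hypothetical.

```python
def implied_probability(baseline_probability, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO = 2.55  # odds ratio for condition reported in Table 4

# Hypothetical comparison-group probabilities of usually using lowfat milk.
for baseline in (0.10, 0.45, 0.80):
    p = implied_probability(baseline, ODDS_RATIO)
    print(
        f"baseline {baseline:.2f} -> implied {p:.3f}, "
        f"probability ratio {p / baseline:.2f}"
    )
```

With a baseline of .45, for example, the implied probability is about .68, roughly 1.5 times the baseline rather than 2.55 times. The ratio of probabilities shrinks toward 1 as the baseline rises, while the odds ratio stays fixed.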
Regression 2: Effect of Program Dosage
To examine the effect of program dosage (session attendance) on attainment of the outcome behavior, I included the four-category session attendance variable (displayed in Table 3) as a predictor variable. I also included the session frequency variable (weekly = 0, monthly = 1) to determine whether the variation in program delivery schedule influenced the outcome as well. Finally, I included pretest scores in the regression model for statistical control purposes, making the test more powerful and allowing for the testing of the effect of program dosage independent of pretest scores.
Results are presented in Table 5. The results for the number of sessions attended predictor variable provide a good illustration of how to interpret odds ratios for a predictor with multiple categories. For this analysis, within the four-category session attendance variable, the category representing the lowest level of attendance (1–2 sessions) was selected as the referent category, against which all other attendance categories were compared. The choice of which category will serve as referent is made when defining the statistical analysis, so it should be a category for which the comparisons make logical sense. The second category (3–4 sessions) had an odds ratio of .99 (very close to 1.0) and was nonsignificant, indicating that the odds for lowfat milk use among participants in this attendance category were not different from those in the lowest category (1–2 sessions). However, the odds ratios for the third category (5–6 sessions) and fourth category (7–8 sessions) were significant (p < .001), indicating that participants in these two attendance categories were more likely to be using lowfat milk at posttest than those who attended only one or two sessions.
Table 5
Predictor variable                        n     OR         Sig.       95% CI
Number of sessions attended
  1–2                                     25    Referent
  3–4                                     27    .99        .991       .28, 3.47
  5–6                                     39    13.00      <.001***   3.45, 49.00
  7–8                                     29    15.60      <.001***   3.56, 68.40
Pretest score
  0 (usually does not use lowfat milk)    79    Referent
  1 (usually uses lowfat milk)            41    4.02       .010*      1.39, 11.61
Session frequency
  Weekly                                  64    Referent
  Monthly                                 56    .23        .005**     .08, .65
*p < .05. **p < .01. ***p < .001.
Pretest scores were a significant predictor of posttest scores, as expected (OR = 4.02, p = .010). Finally, session frequency was found to be significant as well (OR = .23, p = .005). The odds ratio for the session frequency variable, considerably smaller than 1.0, indicated that participants at sites having a monthly program delivery schedule (category coded 1) were significantly less likely to be using lowfat milk at posttest than participants at sites having a weekly program delivery schedule (category coded 0). This finding suggests that for this program, a strategy of having weekly sessions is superior to a strategy of having monthly sessions, at least with respect to this specific outcome.
Discussion
Additional Conclusions Reached Through Use of the Multivariate Approach
Both data analysis approaches (bivariate and multivariate) addressed the primary evaluation questions, allowing me to conclude that (a) the SNAP-Ed course did produce increases in the targeted behavioral outcome, use of lowfat milk, and (b) the number of sessions attended was an important determinant of the extent of the program's impact. The multivariate approach, however, allowed me to draw several additional conclusions:
 There was no statistically significant relationship between either gender of participant or age of participant and the targeted behavioral outcome. The program appears to be equally effective with men and women as well as with younger and older participants.
 Participants with previous SNAP-Ed class experience were more likely to report using lowfat milk at posttest than those who were new to SNAP-Ed. This finding may warrant follow-up investigation, and it may have practical implications regarding how to identify target audiences for new SNAP-Ed programs.
 A closer look at the significance of session attendance showed that an important cut point, in terms of program effectiveness, seems to occur at around five sessions attended out of the program's total of eight. Participants who attended three or four sessions did not differ from those who attended only one or two relative to the targeted behavioral outcome, but those who attended five or more sessions showed markedly higher odds of the outcome than the group having the lowest attendance rate. This finding could have important implications for future program planning regarding what constitutes effective program dosage.
 A weekly program schedule was more effective than a monthly program schedule in producing the desired outcome behavior. This may be because weekly sessions promote higher levels of participant interest and motivation, allow for easier recall of previous material, or provide other benefits. Reasons aside, this finding has important implications for the scheduling of future programs.
Selection of Predictor Variables to Be Used in a Multivariate Approach
Which predictor variables should be included in a regression model? One reason to include a predictor is theoretical interest: A researcher wishes to know whether, and how, the predictor is related to the outcome. A second reason is statistical control: A researcher wishes to make the analysis more powerful. In the analyses described in this article, pretest status was included for control purposes and, as expected, turned out to be a powerful predictor of posttest status. Although I did not have a strong theoretical interest in the relationship of the pretest score to the posttest score, the ability to test the impact of the entire program gained in statistical power by including the pretest score. Lipsey and Hurley (2009) provide further background about experimental power. Kahane (2007) and Lewis-Beck and Lewis-Beck (2015) offer good introductory information on regression analysis.
The Multivariate Approach and Increased Opportunities for Dissemination of Research Results
A further advantage of selecting and using more powerful approaches for statistical data analysis in Extension evaluations is that doing so can increase the potential for evaluation studies to be published in high-quality, peer-reviewed journals. Journal reviewers and editors are highly attentive to issues of data analysis in a manuscript, and an analytic approach that explores an evaluation's primary questions in an appropriately complex way, bringing multiple factors into account to explain findings about program impact, will have a considerable advantage over an approach that is less informative. Some recent examples of Extension evaluations use multivariate regression models (Cutz, Campbell, Filchak, Valiquette, & Welch, 2015; Kaiser et al., 2015; Worker, 2014).
Conclusions
For most program impact evaluations in Extension, there will be multiple approaches available for analyzing your data. A bivariate approach and a multivariate approach may both represent valid, unbiased, and logically consistent ways to address your primary evaluation questions. But the multivariate approach often will be more powerful in its ability to shed light on important aspects of a program's impact, identifying other variables that are influencing the results. By giving attention and thought to your decisions about data analysis, you can "crunch your data" for maximum insight and usefulness.
Acknowledgments
I thank John Geldhof and Mary Arnold at Oregon State University and Norm Constantine at the University of California, Berkeley, for their very helpful comments on an earlier draft of this article, as well as other suggestions regarding the description of these data analyses.
References
Acock, A. C. (2014). A gentle introduction to Stata (4th ed.). College Station, TX: Stata Press.
Aldrich, J. O., & Cunningham, J. B. (2015). Using IBM® SPSS® Statistics: An interactive hands-on approach (2nd ed.). Thousand Oaks, CA: Sage.
Boone, H. N., Jr., & Boone, D. A. (2012). Analyzing Likert data. Journal of Extension [online], 50(2) Article 2TOT2. Available at: http://www.joe.org/joe/2012april/tt2.php
Cutz, G., Campbell, B., Filchak, K. K., Valiquette, E., & Welch, M. E. (2015). Impact of a 4-H Youth Development program on at-risk urban teenagers. Journal of Extension [online], 53(4) Article 4FEA8. Available at: http://www.joe.org/joe/2015august/a8.php
Delwiche, L. D., & Slaughter, S. J. (2012). The Little SAS® book: A primer (5th ed.). Cary, NC: SAS Institute Inc.
Institute for Digital Research and Education. (2015). Stata FAQ: How do I interpret odds ratios in logistic regression? UCLA: Statistical Consulting Group. Retrieved from http://www.ats.ucla.edu/stat/stata/faq/oratio.htm
Kahane, L. H. (2007). Regression basics (2nd ed.). Thousand Oaks, CA: Sage.
Kaiser, L., Chaidez, V., Algert, S., Horowitz, M., Martin, A., Mendoza, . . . Ginsburg, D. C. (2015). Food resource management education with SNAP participation improves food security. Journal of Nutrition Education and Behavior, 47(4), 374–378.e1.
Kuethe, T. H., & Borchers, A. (2012). Farmland assessment through multiple regression analysis. Journal of Extension [online], 50(3) Article 3TOT3. Available at: http://www.joe.org/joe/2012june/tt3.php
Lewis-Beck, C., & Lewis-Beck, M. (2015). Applied regression: An introduction (2nd ed.). Thousand Oaks, CA: Sage.
Lipsey, M. W., & Hurley, S. M. (2009). Design sensitivity: Statistical power for applied experimental research. In L. Bickman & D. J. Rog (Eds.), The Sage handbook of applied social research methods (2nd ed., pp. 44–76). Thousand Oaks, CA: Sage.
McCoach, D. B., & Adelson, J. L. (2010). Dealing with dependence (Part I): Understanding the effects of clustered data. Gifted Child Quarterly, 54(2), 152–155.
Worker, S. M. (2014). Evaluating adolescent satisfaction of a 4-H leadership development conference. Journal of Extension [online], 52(2) Article 2RIB4. Available at: http://www.joe.org/joe/2014april/rb4.php
Workman, J. D., & Scheer, S. D. (2012). Evidence of impact: Examination of evaluation studies published in the Journal of Extension. Journal of Extension [online], 50(2) Article 2FEA1. Available at: http://www.joe.org/joe/2012april/a1.php
Appendix
This appendix presents the SPSS command language that was used to generate the logistic regression analyses described in this article, as well as the critical tables provided in the SPSS output files. In SPSS, command language is used through the creation of an SPSS syntax file. As an alternative to command language, the analyses can also be conducted using SPSS's menu format (Analyze ➡ Regression ➡ Binary Logistic . . .). Experienced SPSS users sometimes prefer the command language option because of its efficiency and its ability to reproduce a detailed record of the procedure.
In interpreting the run commands and the output, note the following variable names that have been used in the SPSS procedures, and the codes for each level of each categorical variable. The first variable listed (UseLoFat2) is the target outcome, which is entered into the regression model as the dependent variable:
 USELOFAT2: Use of lowfat milk at posttest, coded as 1 (Yes) or 0 (No)
 CONDITION: Program group (1 = SNAP-Ed program, 0 = Comparison group)
 USELOFAT1: Use of lowfat milk at pretest (1 = Yes, 0 = No)
 PREVIOUS: Previous SNAP-Ed program experience (1 = Yes, 0 = No)
 GENDER: 1 = Female, 2 = Male
 AGEGRP: Age Group, divided into 4 categories and coded from 1 (18–25 years) to 4 (46–55 years)
 PROGDOSE: Number of lessons attended (program dosage), divided into four levels and coded from 1 (1–2 sessions) to 4 (7–8 sessions)
 LESNFREQ: Lesson frequency (1 = Weekly, 2 = Monthly)
Logistic regression 1: Testing for program impact
The command language is as follows:
LOGISTIC REGRESSION VARIABLES UseLoFat2
/METHOD=ENTER Gender AgeGrp Condition Previous UseLoFat1
/CATEGORICAL = Gender AgeGrp Condition Previous UseLoFat1
/CONTRAST (Gender)=Indicator(1)
/CONTRAST (AgeGrp)=Indicator(1)
/CONTRAST (Condition)=Indicator(1)
/CONTRAST (Previous)=Indicator(1)
/CONTRAST (UseLoFat1)=Indicator(1)
/PRINT=CI(95)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
The three tables that follow provide the summary of cases, the coding of categorical variables within the regression model (which differ from the assigned codes listed above), and the results of the logistic regression analysis. The column labeled "Exp(B)" (i.e., exponent of the beta coefficient) shows the odds ratio for each categorical level of each predictor variable. All variables were entered into the regression in a single step.
Dummy coding. Logistic regression uses "dummy variables" to classify the levels of each categorical variable. In dummy coding, the potential values are either 0 or 1. The second table ("Categorical Variables Codings") lists how the categories were recoded to create dummy variables. For example, for gender, the original, arbitrary codes for females (1) and males (2) have been changed to 0 and 1, respectively, for the regression analysis. For categorical variables that have more than two potential values (including AGEGRP and PROGDOSE in these analyses), the number of dummy variables created will be one less than the number of categories. For example, as shown in the Categorical Variables Codings table, SPSS has created three dummy variables to represent the four levels of the age variable. The first of those dummy variables [column labeled "(1)"] is coded 1 for participants aged 26–35 and 0 for all other participants; the second dummy variable is coded 1 for participants aged 36–45 and 0 for all other participants; the third dummy variable is coded 1 for participants aged 46–55 and 0 for all other participants. Participants aged 18–25 (the referent category) are identified by having a value of 0 on all three of the dummy variables.
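The indicator (dummy) coding described above can be sketched in a few lines. The Python below is an illustration of the coding scheme only, not of how SPSS implements it internally. The function name and the ASCII category labels are hypothetical.

```python
AGE_GROUPS = ["18-25", "26-35", "36-45", "46-55"]


def indicator_code(category, categories, referent):
    """Return the dummy variables for one category.

    A variable with k categories gets k - 1 dummies, one per non-referent
    category. The referent is identified by 0 on every dummy.
    """
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    non_referent = [c for c in categories if c != referent]
    return tuple(1 if category == c else 0 for c in non_referent)


# Reproduce the parameter coding for the four age groups, with 18-25 as referent.
for group in AGE_GROUPS:
    print(group, indicator_code(group, AGE_GROUPS, referent="18-25"))
```

Each row printed by the loop corresponds to one line of the age group coding in the Categorical Variables Codings table: three columns of 0s and 1s, with the referent group scoring 0 throughout.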
The direction of the category codes (i.e., who gets 1 and who gets 0 for a categorical variable) is reflected in the sign of the regression coefficients. Thus, as can be seen in the results table ("Variables in the Equation"), the beta coefficient (B) for gender is negative, indicating that lower scores on gender (that is, females rather than males) are associated with higher scores on the outcome variable (use of lowfat milk at posttest). In nontechnical language, this would be described as: Females are more likely than males to use lowfat milk at posttest. However, the relationship is not statistically significant, so this claim cannot be made, despite the direction of the relationship between the variables.
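As can be seen in the results table, a negative beta coefficient corresponds to an odds ratio [column "Exp(B)"] that is lower than 1.0, and a positive beta coefficient corresponds to an odds ratio that is greater than 1.0. Because the odds ratio is the exponentiated coefficient, its value cannot be less than zero.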
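The link between B and Exp(B) can be checked by hand. The sketch below is a minimal Python illustration, assuming that the reported 95% confidence interval is the usual Wald interval, exp(B ± 1.96 × S.E.); the function name odds_ratio_with_ci is hypothetical. It uses the rounded B and S.E. values for gender and condition from the results table, with the negative sign for gender that the text and an Exp(B) below 1.0 imply, so the results agree with the table only to within rounding.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# GENDER(1): B = -0.433 (negative, since Exp(B) < 1), S.E. = 0.372.
gender_or, gender_lo, gender_hi = odds_ratio_with_ci(-0.433, 0.372)

# CONDITION(1): B = 0.934, S.E. = 0.314.
cond_or, cond_lo, cond_hi = odds_ratio_with_ci(0.934, 0.314)

print(f"GENDER(1):    OR={gender_or:.3f}  CI=({gender_lo:.3f}, {gender_hi:.3f})")
print(f"CONDITION(1): OR={cond_or:.3f}  CI=({cond_lo:.3f}, {cond_hi:.3f})")
```

A confidence interval that includes 1.0, as for gender, corresponds to a relationship that is not statistically significant at the .05 level; an interval entirely above 1.0, as for condition, corresponds to a significant positive association.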
In the SPSS results table below ("Variables in the Equation"), values from the following columns have been included in Table 4 above: Exp(B) (the odds ratio), Sig. (significance level), and 95% C.I. for Exp(B) (95% Confidence Interval for the odds ratio).
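Case Processing Summary

| Unweighted Cases | | N | Percent |
|---|---|---|---|
| Selected Cases | Included in Analysis | 200 | 100.0 |
| | Missing Cases | 0 | .0 |
| | Total | 200 | 100.0 |
| Unselected Cases | | 0 | .0 |
| Total | | 200 | 100.0 |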
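Categorical Variables Codings

| Variable | Original code and label | Frequency | Parameter coding (1) | Parameter coding (2) | Parameter coding (3) |
|---|---|---|---|---|---|
| AGEGRP | 1 18–25 | 51 | .000 | .000 | .000 |
| | 2 26–35 | 62 | 1.000 | .000 | .000 |
| | 3 36–45 | 47 | .000 | 1.000 | .000 |
| | 4 46–55 | 40 | .000 | .000 | 1.000 |
| USELOFAT1 | 0 | 132 | .000 | | |
| | 1 | 68 | 1.000 | | |
| CONDITION | 0 Control group | 80 | .000 | | |
| | 1 Instruction group | 120 | 1.000 | | |
| PREVIOUS | 0 No previous Nut Ed | 112 | .000 | | |
| | 1 Previous Nut Ed | 88 | 1.000 | | |
| GENDER | 1 Female | 156 | .000 | | |
| | 2 Male | 44 | 1.000 | | |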
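Variables in the Equation

| | | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | 95% C.I. for Exp(B): Upper |
|---|---|---|---|---|---|---|---|---|---|
| Step 1ᵃ | GENDER(1) | -.433 | .372 | 1.357 | 1 | .244 | .649 | .313 | 1.344 |
| | AGEGRP | | | .853 | 3 | .837 | | | |
| | AGEGRP(1) | -.312 | .408 | .584 | 1 | .445 | .732 | .329 | 1.629 |
| | AGEGRP(2) | .016 | .439 | .001 | 1 | .972 | 1.016 | .430 | 2.400 |
| | AGEGRP(3) | -.075 | .463 | .026 | 1 | .871 | .928 | .375 | 2.297 |
| | CONDITION(1) | .934 | .314 | 8.839 | 1 | .003 | 2.545 | 1.375 | 4.711 |
| | PREVIOUS(1) | .781 | .316 | 6.115 | 1 | .013 | 2.184 | 1.176 | 4.057 |
| | USELOFAT1(1) | .934 | .339 | 7.611 | 1 | .006 | 2.544 | 1.310 | 4.940 |
| | Constant | -.693 | .397 | 3.052 | 1 | .081 | .500 | | |

a. Variable(s) entered on step 1: GENDER, AGEGRP, CONDITION, PREVIOUS, USELOFAT1.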
Logistic regression 2: Effects of program dosage
For the second logistic regression analysis, results of which are presented in Table 5 above, the SPSS command language, case summary table, categorical coding table, and results table are presented below. Values from the following columns in the results table have been included in Table 5 above: Exp(B), Sig., and 95% C.I. for Exp(B).
LOGISTIC REGRESSION VARIABLES UseLoFat2
/METHOD=ENTER ProgDose UseLoFat1 Gender Previous LesnFreq
/CATEGORICAL = UseLoFat1 Gender Previous LesnFreq
/CONTRAST (UseLoFat1)=Indicator(1)
/CONTRAST (Gender)=Indicator(1)
/CONTRAST (Previous)=Indicator(1)
/CONTRAST (LesnFreq)=Indicator(1)
/PRINT=CI(95)
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
Case Processing Summary

| Unweighted Casesᵃ | | N | Percent |
|---|---|---|---|
| Selected Cases | Included in Analysis | 120 | 100.0 |
| | Missing Cases | 0 | .0 |
| | Total | 120 | 100.0 |
| Unselected Cases | | 0 | .0 |
| Total | | 120 | 100.0 |
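Categorical Variables Codings

| Variable | Original code and label | Frequency | Parameter coding (1) |
|---|---|---|---|
| LESNFREQ | 1 Every week | 64 | .000 |
| | 2 Every month | 56 | 1.000 |
| GENDER | 1 Female | 94 | .000 |
| | 2 Male | 26 | 1.000 |
| PREVIOUS | 0 No previous Nut Ed | 70 | .000 |
| | 1 Previous Nut Ed | 50 | 1.000 |
| USELOFAT1 | 0 | 79 | .000 |
| | 1 | 41 | 1.000 |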
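Variables in the Equation

| | | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | 95% C.I. for Exp(B): Upper |
|---|---|---|---|---|---|---|---|---|---|
| Step 1ᵃ | PROGDOSE | .591 | .133 | 19.879 | 1 | .000 | 1.806 | 1.393 | 2.342 |
| | USELOFAT1(1) | 1.355 | .565 | 5.750 | 1 | .016 | 3.876 | 1.281 | 11.732 |
| | GENDER(1) | -1.255 | .649 | 3.736 | 1 | .053 | .285 | .080 | 1.018 |
| | PREVIOUS(1) | 1.509 | .567 | 7.088 | 1 | .008 | 4.521 | 1.489 | 13.726 |
| | LESNFREQ(1) | -1.210 | .503 | 5.785 | 1 | .016 | .298 | .111 | .799 |
| | Constant | -2.098 | .677 | 9.596 | 1 | .002 | .123 | | |

a. Variable(s) entered on step 1: PROGDOSE, USELOFAT1, GENDER, PREVIOUS, LESNFREQ.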
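To show how the coefficients in this results table combine, the following Python sketch converts them to a predicted probability of using lowfat milk at posttest. It is an illustration, not part of the SPSS output. It takes the coefficients for GENDER(1), LESNFREQ(1), and the constant as negative, as their Exp(B) values below 1.0 imply. The function name predicted_probability and the example PROGDOSE value of 4 are hypothetical, chosen only to demonstrate the arithmetic.

```python
import math

# Rounded coefficients (B) from the second "Variables in the Equation" table.
# Negative signs are those implied by Exp(B) values below 1.0.
COEFS = {
    "PROGDOSE": 0.591,    # numeric predictor
    "USELOFAT1": 1.355,   # 1 = used lowfat milk at pretest
    "GENDER": -1.255,     # 1 = male, 0 = female (referent)
    "PREVIOUS": 1.509,    # 1 = previous nutrition education
    "LESNFREQ": -1.210,   # 1 = every month, 0 = every week (referent)
}
CONSTANT = -2.098


def predicted_probability(profile):
    """Logistic model: p = 1 / (1 + exp(-(constant + sum of B * x)))."""
    logit = CONSTANT + sum(b * profile.get(name, 0) for name, b in COEFS.items())
    return 1.0 / (1.0 + math.exp(-logit))


# All predictors at zero: every categorical variable at its referent level.
p_baseline = predicted_probability({})

# Hypothetical participant: same referent levels, PROGDOSE = 4 (illustrative value).
p_dose4 = predicted_probability({"PROGDOSE": 4})

print(f"baseline: {p_baseline:.3f}, PROGDOSE = 4: {p_dose4:.3f}")
```

Under the CUT(.5) criterion in the command syntax, a case is classified as a predicted user of lowfat milk when its predicted probability is at least .5.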