October 1999 // Volume 37 // Number 5 // Research in Brief // 5RIB6
Factor Analysis Adds New Dimension to Extension Surveys
Abstract
Surveys are among the most widely used research and evaluation tools in Extension work. The traditional approach to survey analysis involves frequency counts, t-tests, correlation, and measures of central tendency. One procedure often left out, if not ignored entirely because of its reputed complexity, is factor analysis. Factor analysis is a variable-reduction statistical technique capable of probing underlying relationships among variables measured on Likert-type scales. The procedure essentially removes metric redundancies from a survey and extracts the common thread that binds a set of observed variables together. The analysis can be implemented using a powerful SAS(R) procedure called PROC FACTOR. This paper discusses the implementation of factor analysis in SAS and proposes its use as an additional statistical procedure for Likert-based Extension surveys.
Introduction
One popular technique for obtaining information on human knowledge, attitudes, behavioral preferences, and similarities or the lack of them is the inclusion of Likert-type (for example, 1 = strongly disagree, 5 = strongly agree) or dichotomous (such as yes/no) scales in survey questionnaires. Frequency analysis, t-tests, and measures of central tendency are the traditional statistical methods for analyzing survey responses (Santos, Lippke & Pope, 1998). However, such procedures do not account for correlation occurring within and/or between scale-level responses. As a result, they lack the more important ability to detect and evaluate unobservable patterns. From such patterns, one is able to describe and explain behavioral traits shared within and/or uniquely associated with some groups of respondents.
One approach to analyzing subjective perceptions, and so gaining insights from survey responses, is factor analysis (Kim & Mueller, 1978). Factor analysis is a statistical variable-reduction procedure that extracts a small number of latent variables, or "constructs," from a larger set of observed variables. This paper discusses factor analysis in SAS(R) and proposes its use as an additional statistical procedure for Likert-based surveys in Extension. (NOTE: Common factors, components, and constructs are used interchangeably in this paper.)
Zeroing in on the Solution
The terminal solution to factor analysis is achieved through a series of steps involving several PROC FACTOR runs, which can be executed together as one program. Typical procedure output includes:
Simple Statistics and Table of Eigenvalues
Once specified as a procedure of choice, PROC FACTOR automatically generates simple statistics and a table of eigenvalues (discussed later). This part contains the information on the relative sizes of variance accounted for by each extracted component.
Extraction of Initial Factors
For new users of factor analysis, the maximum likelihood method option facilitates the extraction of common factors and provides a significance test for determining the number of factors to retain (Hatcher, 1994; SAS Institute Inc., 1985). Alternative approaches to initial common factor extraction exist, such as using the NFACT= option to set the number of factors directly.
Rotation to Terminal Solution
Common factors extracted during the initial run seldom have a clear-cut loading of observed variables on them. Sometimes a number of observed variables will have two- or three-way moderate loadings, making it difficult to interpret the factor pattern generated by PROC FACTOR. In such cases, a remedial procedure referred to as rotation applies a linear transformation that maximizes each variable's loading on one construct while minimizing its loading on all the others. A rotated factor pattern usually takes on a simple structure, since the transformation minimizes multiple loadings. Based on the number of variables that loaded and the simplicity of the resulting structure, one can then tentatively retain selected common factors. At this point, any construct that does not meet the minimum requirement of at least three variables loading on it is dropped from the analysis.
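The effect of rotation can be illustrated with a minimal Python sketch for the two-factor case. It is not how PROC FACTOR performs rotation: the loadings are hypothetical, the varimax criterion is assumed here as one common choice of orthogonal rotation (the text above does not name a criterion), and the best angle is found by a simple search over whole degrees rather than by an iterative algorithm.

```python
import math

def rotate(loadings, degrees):
    """Apply an orthogonal rotation by `degrees` to a two-factor loading matrix."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over both factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squares = [row[j] ** 2 for row in loadings]
        mean = sum(squares) / p
        total += sum((x - mean) ** 2 for x in squares) / p
    return total

# Hypothetical unrotated pattern: every variable loads moderately on both factors.
unrotated = [(0.6, 0.6), (0.5, 0.5), (0.6, -0.6), (0.5, -0.5)]

# Search whole-degree angles for the rotation that maximizes the criterion.
best_angle = max(range(90), key=lambda d: varimax_criterion(rotate(unrotated, d)))
rotated = rotate(unrotated, best_angle)

for before, after in zip(unrotated, rotated):
    print(before, "->", tuple(round(x, 2) for x in after))
```

In this toy pattern, the rotation leaves each variable with one sizable loading and one near zero, while the sum of its squared loadings is unchanged.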
Choosing the Number of Factors to Retain
The three key determinants for retaining a certain number of common factors are the position of the factors in the scree plot, the proportion of variance accounted for by each factor, and the overall interpretability of all retained factors.
Position in the Scree Plot
"Scree" is a word used to describe loose stones or rocky debris at the base of a hill or cliff. In factor analysis, a scree plot charts the eigenvalue of each factor, where the eigenvalue is the amount of variance accounted for by that factor. Factors tend to separate into groups wherever a "break," an abrupt jump in eigenvalue, occurs, which makes it easy to separate the retainable constructs from those that are not useful. The larger the eigenvalue, the more meaningful the common factor is. Consequently, the plot facilitates identification of common factors eligible for retention. Since more than one break can occur in the plot, decisions based on scree plot position are usually reinforced with the other two determinants.
Proportion of Accounted Variance
The hierarchical position of factors in the table of eigenvalues is determined by the proportion of common variance in the data set accounted for by the individual factors (Table 1). The intersection of the row labeled "Proportion" and the factor number at the top gives the amount of variation attributed to that particular factor. There is no fixed rule on how much variance a common factor must account for in order to be retained. Arbitrary thresholds of at least 10% for an individual component, or 70-80% of the total variance for the retained set, are commonly used.
Table 1. A table of eigenvalues

| | Factor 1 | Factor 2 | Factor 3 ... | Factor 23 |
|---|---|---|---|---|
| Eigenvalue | 5.1772 | 3.8729 | 1.8821 ... | -4.4600 |
| Difference | 1.3042 | 1.9908 | 0.9184 ... | |
| Proportion | 0.4918 | 0.3679 | 0.1788 ... | -0.0437 |
| Cumulative | 0.4918 | 0.8598 | 1.0386 ... | 1.0000 |
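The rows of a table of eigenvalues follow from the eigenvalues alone, which the short Python sketch below tries to make concrete. The eigenvalues are hypothetical and are not taken from the survey reported here. The 10% and 70% cutoffs are the arbitrary thresholds mentioned above, and treating the largest single drop as "the" break is a simplifying assumption.

```python
def eigenvalue_table(eigenvalues):
    """Return one row per factor: (factor, eigenvalue, difference, proportion, cumulative)."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for i, value in enumerate(eigenvalues):
        # Difference is the drop to the next eigenvalue; the last factor has none.
        difference = value - eigenvalues[i + 1] if i + 1 < len(eigenvalues) else None
        proportion = value / total
        cumulative += proportion
        rows.append((i + 1, value, difference, proportion, cumulative))
    return rows

# Hypothetical eigenvalues, in decreasing order.
rows = eigenvalue_table([5.0, 3.0, 1.5, 0.5])

# Factors that individually account for at least 10% of the variance.
retained_by_share = [factor for factor, _, _, share, _ in rows if share >= 0.10]

# Smallest number of factors whose cumulative share reaches 70%.
retained_by_total = next(factor for factor, *_, running in rows if running >= 0.70)

# Number of factors sitting above the largest break in a scree plot.
largest_break_after = max(rows[:-1], key=lambda row: row[2])[0]

print(retained_by_share, retained_by_total, largest_break_after)
```

With these made-up values the three rules point to three, two, and one factor respectively, which is one reason the numeric criteria are weighed together with interpretability rather than applied mechanically.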
Interpretability
The following are the test criteria for interpretability and how they can be satisfied by candidate component factors:
1. Minimum of three variables loading per factor:
Initial extraction is usually sufficient to generate a good number of meaningful constructs. However, if it fails to produce an acceptable number of constructs, using the NFACT= option is an alternative approach. The option forces the procedure to extract a user-specified number of factors. Another option is to add more relevant question items (specifically targeted to constructs that did not meet the minimum loading requirement), re-administer the survey, and then run the factor analysis again.
2. Simplicity of structure:
All input variables loading on a specific construct should exhibit a one-way moderate to high loading (coefficient of .40 or greater) and very low complementary loadings (ideally approaching zero) on the other constructs. This is usually referred to as "having a simple structure."
3. Variables that loaded high on each construct should subscribe to the same concept that is distinctly different from those shared and measured by the variables supporting the other constructs.
To name a construct is to identify all variables that loaded high on it and then look for the predominant common theme, concept, or content that each of the variables has contributed. The process of attaching a meaning to a retained factor, as related to the objectives and science of the study, is called "interpretation." For a successful interpretation, all observed variables that loaded highly on a particular construct should share the same thematic or conceptual perspective. Such a conceptual domain must be distinctly different from the dimensions addressed by the variables loading highly on the other constructs.
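The first two criteria can be screened mechanically before the conceptual work of naming begins. The following Python sketch is one way to do so, under stated assumptions: the .40 cutoff comes from criterion 2, the .30 ceiling for "very low" complementary loadings is an assumed value (the criterion only says "ideally approaching zero"), and the item names and loadings are hypothetical.

```python
HIGH = 0.40  # moderate-to-high loading cutoff from criterion 2
LOW = 0.30   # assumed ceiling for complementary loadings

def primary_factor(loadings, high=HIGH, low=LOW):
    """Return the 1-based factor a variable loads on, or None if its structure is not simple."""
    magnitudes = [abs(value) for value in loadings]
    strong = [i for i, m in enumerate(magnitudes) if m >= high]
    if len(strong) != 1:
        return None  # loads on no factor, or on more than one
    if any(m >= low for i, m in enumerate(magnitudes) if i != strong[0]):
        return None  # complementary loading is not low enough
    return strong[0] + 1

def factors_meeting_minimum(pattern, minimum=3):
    """Count cleanly loading variables per factor; keep factors with at least `minimum`."""
    counts = {}
    for loadings in pattern.values():
        factor = primary_factor(loadings)
        if factor is not None:
            counts[factor] = counts.get(factor, 0) + 1
    return {factor: n for factor, n in counts.items() if n >= minimum}

# Hypothetical rotated factor pattern for seven questionnaire items.
pattern = {
    "q1": (0.62, 0.08, 0.11),
    "q2": (0.55, 0.12, 0.03),
    "q3": (0.48, 0.05, 0.20),
    "q4": (0.10, 0.66, 0.09),
    "q5": (0.07, 0.58, 0.14),
    "q6": (0.45, 0.43, 0.02),  # two-way loading
    "q7": (0.12, 0.18, 0.21),  # loads on nothing
}

print(factors_meeting_minimum(pattern))
```

Here only the first factor survives: the second has just two cleanly loading items, and q6 and q7 fail the simple-structure check. Criterion 3, whether the surviving items share one concept, still requires the researcher's judgment.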
Presentation of Results and Interpretation
A typical tabular presentation of the results of factor analysis would include (a) a list of loadings from the rotated factor pattern with component numbers and labels as headings, (b) a list of the final communality estimates (squared multiple correlations for predicting variables from the estimated factors) under an "h2" heading, (c) the questionnaire items aligned with their corresponding loadings and "h2" values, and (d) component names (optional). Usually the table is footnoted with the total number of respondents providing valid responses and the rating scale used. Including Cronbach's alpha coefficient (Cronbach, 1951) and the common variance accounted for by each component is also desirable. A sample presentation of results from an actual survey is shown in Table 2.
Table 2. Tabular presentation of a three-factor solution from a survey's factor analysis

| Component | Variable label | Loading on 1 | Loading on 2 | Loading on 3 | h2 |
|---|---|---|---|---|---|
| 1 | Pay farmers for planting grass | 0.42 | 0.10 | 0.05 | .19 |
| | U.S. cont subsidy on ag export | 0.74 | 0.04 | -0.05 | .54 |
| | U.S. cont subsidy on value add | 0.68 | 0.11 | -0.07 | .48 |
| | Subsidy on plant-derived fuel | 0.46 | 0.09 | 0.13 | .24 |
| | Incr funding for rural employm | 0.50 | 0.20 | 0.19 | .33 |
| 2 | Govt regulate farming practices | 0.15 | 0.46 | 0.05 | .24 |
| | Cont'n of govt reg on water qua | -0.01 | 0.68 | 0.09 | .47 |
| | Req farmers to plant grass stri | 0.13 | 0.66 | 0.09 | .44 |
| | Req farmers/keep pesticide reco | -0.01 | 0.55 | 0.06 | .30 |
| 3 | Storage/cooking instruction for | 0.16 | 0.20 | 0.59 | .41 |
| | Strengthen food inspection | 0.10 | 0.19 | 0.66 | .48 |
| | More nutritional info on food l | 0.14 | 0.22 | 0.57 | .41 |
| | Eigenvalue | 4.942 | 2.440 | 1.576 | |
| | Common var explained by component | 0.55 | 0.27 | 0.17 | |

Reliability coefficient: ____ (a)
n = 1083
Scale: 1 = Strongly agree; 5 = Strongly disagree. h2 = final communality estimates.
Component names: 1 = Government subsidy; 2 = Government regulation; 3 = Food safety.
(a) Cronbach's alpha coefficient goes here.
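Two of the quantities in such a table can be computed by hand, as the Python sketch below suggests. It assumes an orthogonal rotation, in which case a variable's communality equals the sum of its squared loadings; the article does not state which rotation was used. The alpha function follows the standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score), and the ratings passed to it are hypothetical.

```python
from statistics import variance

def communality(loadings):
    """Sum of squared loadings of one variable across the retained components."""
    return sum(value ** 2 for value in loadings)

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of ratings in respondent order."""
    k = len(items)
    item_variances = [variance(ratings) for ratings in items]
    # Each respondent's total score across the k items.
    totals = [sum(ratings) for ratings in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))

# Loadings from the first row of Table 2 ("Pay farmers for planting grass").
h2 = communality((0.42, 0.10, 0.05))

# Hypothetical ratings from five respondents on three items of one component.
alpha = cronbach_alpha([
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 5],
    [1, 3, 3, 5, 4],
])

print(round(h2, 2), round(alpha, 2))
```

For the first row the result rounds to the tabled h2 of .19. Other rows can differ in the second decimal, presumably because the published loadings are themselves rounded.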
Under traditional survey analysis, the researcher would have been satisfied with a frequency analysis of the survey's 23 Likert-type variables. It is from such frequencies that the researcher would have deduced the relative importance of each variable. The higher the frequency count on a certain issue, the more popular the issue is among the respondents. To interpret the result, the researcher would have to rank each observed variable in decreasing order of frequency and then devote the discussion to the three to five highest-ranking variables. For numeric variables, correlation of observed variables with demographic variables would have been done at this point.
However, with factor analysis and the information from the table above, one is able to identify three major areas of concern that impact all the respondents, namely (component 1) government subsidies, (component 2) government regulations, and (component 3) food safety. In the process, factor analysis detected and eliminated redundant variables (those which measure the same construct as other observed variables) and retained only those which effectively influence the three extracted components. From here on, the three components can be used as inputs to predictive models or as benchmarks for developing indices that measure social and/or behavioral attributes in surveys of a similar nature.
There are other useful implications in this exercise. One could have used only the 12 observed variables (questions) that loaded high on the three components and still come up with the same conclusion as if all 23 original Likert-type variables had been used. Also, while our example looked at a 3-component model, one should not hesitate to explore 2-, 4-, or 5-component models, which may be more appropriate for some other types of surveys.
Instances of a specific variable behaving against expectations, such as failing to measure an attribute that one intended it to measure, should serve as a signal to investigate the possible cause(s) of the unexpected outcome. Problems with questionnaire design and consistency of scale usually cause such errors. Indeed, the power of factor analysis lies in the iterative fine-tuning that one does while using the procedure. It is through such persistence that a researcher improves his/her chance of uncovering the hidden dimensions of the constructs that he/she sought to identify and measure in a survey.
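As a rough illustration of turning retained components into indices, the Python sketch below averages each respondent's ratings over the items kept for a component. This unit-weighted average is only one simple option and is an assumption here, not the method used in the survey; the item identifiers, component names, and ratings are hypothetical.

```python
from statistics import mean

# Hypothetical mapping from component name to the questionnaire items retained for it.
COMPONENT_ITEMS = {
    "subsidy": ["q1", "q2", "q3"],
    "regulation": ["q4", "q5", "q6"],
}

def component_scores(response, component_items=COMPONENT_ITEMS):
    """Average one respondent's ratings over the items retained for each component."""
    return {
        name: mean(response[item] for item in items)
        for name, items in component_items.items()
    }

# One hypothetical respondent's ratings on a 1-5 scale.
respondent = {"q1": 1, "q2": 2, "q3": 3, "q4": 4, "q5": 5, "q6": 3}
scores = component_scores(respondent)
print(scores)
```

Scores of this kind give one number per component per respondent, which can then be compared across groups or passed to a predictive model.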
Finally, the task of interpretation is easy when the criteria discussed earlier have been satisfied. Once the decision is made to retain a set of constructs, the researcher should be able to intellectually synthesize and describe the common thread that binds all the variables involved in each construct, and relate them to the objective(s) of the survey. In the end, it is the researcher's subject matter expertise, professional experience, and his/her interpretive ability that determine the utility of the set of component factors that he/she chose to retain.
Conclusion
Factor analysis has been available for decades, but surprisingly few people design their surveys to be amenable to the procedure. The most glaring deficiencies of many surveys are an inadequate sample size, too many scales, and an insufficient number of Likert-type variables allocated to each targeted construct. With the advent of computers, the availability of easy-to-use software, and the potentially useful information that can be gained from the procedure, the use of factor analysis in Likert-based Extension surveys should be encouraged.
Exploratory data analysis is the most popular application of factor analysis, but some use it to iteratively refine and confirm their models in the light of new data or current research. Others use the procedure as an intermediate process to develop indices designed to measure attributes that could not be reliably predicted by the original variables. Factor analysis is a powerful analytical tool, and using it would certainly benefit and enhance the data processing capability of any Extension program.
References
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Hatcher, L. (1994). A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. Cary, NC: SAS Institute.
Kim, J.O. & Mueller, C.W. (1978). Factor analysis: Statistical methods and practical issues. Beverly Hills, CA: Sage Publications.
Santos, J.R.A., Lippke, L., & Pope, P. (1998). PROC FACTOR: A tool for extracting hidden gems from a mountain of variables. Proceedings of the Twenty-Third Annual SAS Users Group International Conference. (pp. 1330-1335). Cary, NC: SAS Institute.
SAS Institute Inc. (1985). SAS user's guide: Statistics, Version 5 Edition. Cary, NC: SAS Institute.
Acknowledgement
Data for this paper were derived from the "1994 National Agricultural and Food Policy Preference Survey" of Dr. Lawrence Lippke (Texas Agricultural Extension Service) and Benny Lockett (Cooperative Extension Program, Prairie View A&M University).
Trademark Information
SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. (R) indicates USA registration.