August 1999 // Volume 37 // Number 4 // Ideas at Work // 4IAW2


Are Open-Ended Questions Tying You in Knots?

Customer surveys are a mainstay of the Extension Service and have long served as conduits for collecting feedback about planned, ongoing, or concluded projects. While it is usually preferable to design survey questions so that respondents can indicate their answers categorically, there are instances when a researcher wants to capture, via open-ended questions, unbiased, unconstrained, and thoughtful responses that are unavailable by any other means. Unlike responses to structured questionnaires, which are easily handled by most statistical software, responses to open-ended questions seem to defy programming logic and are usually treated as ordinary character strings by computer programmers. This paper shows a possible technique for dealing with open-ended question responses without the time-consuming manual collation usually associated with them.

J. Reynaldo A. Santos
Extension Information Technology
Texas Agricultural Extension Service
Internet address:

Diann Mitchell
Extension Information Technology
Texas Agricultural Extension Service

Paul Pope
Department of Rural Sociology

Texas A&M University
College Station, Texas


Responses to pre-categorized structured questionnaires are easily analyzed by desktop statistical software. It must be the lure of getting presumably unbiased, unconstrained, and thoughtful responses that encourages researchers to use open-ended questions whenever they like. However, such enthusiasm is usually dampened when they realize that the only way to make sense of the valued responses is to print them out as a list. A cumbersome alternative is to sit by the stack of surveys and glean the contents one form at a time. This paper shows a possible technique for dealing with open-ended question responses without the time-consuming chore of manual collation (Culp and Pilat, 1998).


A subset of three open-ended questions from a previously conducted simulated survey was used for this exercise. The data were analyzed using the SAS/BASE(R) and SAS/STAT(R) modules of SAS(R) software (SAS Institute, Cary, NC). During data entry, three essential fields, namely ID, QUESTION, and CATEGORY, were added to the original set of variables contained in each record (the responses from each respondent). The ID field tags the form and associates it with a specific respondent. ID also serves as the common field on which to merge information from other data sets, for example when analyzing responses across demographic data or when frequency analysis is needed to establish relationships with variables in other data sets.

The QUESTION field identifies the issue being addressed by a particular response. The CATEGORY field makes possible the pre-classification of the response for easy handling and programming. The latter is an optional field since categories can be generated from the results of the word frequency analysis as will be discussed in the later part of this paper. Table 1 shows a sample data entry format.

Table 1
A Sample Data Entry Format for Open-ended Questionnaire Responses. ($ after RESPONSE denotes a character string variable)
ID   QUESTION  CATEGORY  RESPONSE $
112  2         1         it boils down to economics
112  3         2         self-esteem
112  4         1         supply and demand
124  2         1         production and marketing
124  3         2         proud of one's accomplishment
124  4         3         developing the tourism industry
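The record layout in Table 1 can be read with very little code. The sketch below is written in Python purely for illustration and is not the authors' SAS program. It assumes the first three whitespace-separated tokens are ID, QUESTION, and CATEGORY and that the rest of the line is the free-text response. The names `Response` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class Response(NamedTuple):
    """One data-entry record (hypothetical container, mirrors Table 1)."""
    id: int
    question: int
    category: int
    response: str


def parse_record(line: str) -> Response:
    # Split at most three times so the free-text response keeps its spaces.
    id_, question, category, text = line.split(maxsplit=3)
    return Response(int(id_), int(question), int(category), text)


# The six sample records shown in Table 1.
RAW = """\
112 2 1 it boils down to economics
112 3 2 self-esteem
112 4 1 supply and demand
124 2 1 production and marketing
124 3 2 proud of one's accomplishment
124 4 3 developing the tourism industry
"""

records = [parse_record(line) for line in RAW.splitlines() if line.strip()]

# Responses to question 2 only, keyed by respondent ID.
question_2 = {r.id: r.response for r in records if r.question == 2}
```

Keeping ID on every record is what allows the responses to be joined later with demographic data held in a separate data set.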

The word count strategy operates on the premise that open-ended responses are strands of phrases and sentences constructed of major and minor keywords plus "extraneous" words such as "the," "is," "are," "that," "to," "a," "of," and many others. Key to capturing the essence of any response is the ability of the programmer to extract and manipulate character strings such that all unimportant components are dropped while keeping the major and minor words intact. For this exercise, numbers and punctuation marks were also excluded. Frequency analysis was then performed on the retained words from which Table 2 was derived. This procedure may be performed for each of the response variables of interest.
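The word count strategy can be sketched as follows. This is an illustrative Python version, not the authors' SAS code. The stop-word list starts from the extraneous words quoted above and adds a few assumed ones ("and," "it," "down"), and the sample responses are made up for the example. Only runs of letters are kept, so numbers and punctuation marks drop out.

```python
import re
from collections import Counter

# Extraneous words named in the text, plus a few assumed additions.
STOP_WORDS = {"the", "is", "are", "that", "to", "a", "of", "and", "it", "down"}


def keywords(response):
    """Lowercase the response, keep letter-only words, drop stop words."""
    words = re.findall(r"[a-z]+", response.lower())
    return [w for w in words if w not in STOP_WORDS]


def frequency_table(responses):
    """Return rows of (word, frequency, percent, cum. frequency, cum. percent)."""
    counts = Counter(w for r in responses for w in keywords(r))
    total = sum(counts.values())
    rows, running = [], 0
    # Most frequent first; ties broken alphabetically.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        running += freq
        rows.append((word, freq, round(100 * freq / total, 1),
                     running, round(100 * running / total, 1)))
    return rows


# Made-up responses to a single question, for illustration only.
sample = [
    "it boils down to economics",
    "supply and demand",
    "The economics of supply, 100%!",
    "high demand",
    "economics",
]

for row in frequency_table(sample):
    print("{:<12}{:>4}{:>7}{:>4}{:>7}".format(*row))
```

In this sketch the cumulative percent is computed from the cumulative frequency rather than by adding the rounded percents, so it can differ from a hand-summed column in the last decimal place.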

Table 2
Word Frequency Distribution for the Sample Data Set after Excluding the Extraneous Words, Characters, and Punctuation Marks
Q01            Frequency   Percent   Cumulative Frequency   Cumulative Percent
Economics      27          21.4      27                     21.4
Development    23          18.3      50                     39.7
Supply         20          15.9      70                     55.6
Demand         18          14.3      88                     69.9
Expenses       14          11.1      102                    81.0
Prices         10          7.9       112                    88.9
Competition    5           4.0       117                    92.9
Marketing      3           2.4       120                    95.3
Availability   2           1.6       122                    96.9
Payment        2           1.6       124                    98.5
Product        2           1.6       126                    100.0


Table 2 suggests that if one only wants a feel for the important issues that concern the respondents, then the word frequency table would probably suffice. However, by grouping closely related words together (such as product, prices, and marketing), the table also provides a platform for creating new or modified categories that may be more meaningful and more descriptive than those originally generated. It also opens up opportunities to drill down on one or several of the issues to reveal more specific opinions of the respondents.
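As a small illustration of the grouping step, the Python sketch below (not the authors' SAS code) merges the three related words mentioned above into one category and re-tallies the Table 2 frequencies. The category label `product/prices/marketing` and the function name `regroup` are hypothetical. Words without a group keep their own row.

```python
from collections import Counter

# Word frequencies copied from Table 2.
WORD_COUNTS = {
    "economics": 27, "development": 23, "supply": 20, "demand": 18,
    "expenses": 14, "prices": 10, "competition": 5, "marketing": 3,
    "availability": 2, "payment": 2, "product": 2,
}

# Grouping suggested in the text; the category label is an assumption.
GROUPS = {
    "product": "product/prices/marketing",
    "prices": "product/prices/marketing",
    "marketing": "product/prices/marketing",
}


def regroup(counts, groups):
    """Sum word counts into categories; ungrouped words are kept as they are."""
    merged = Counter()
    for word, n in counts.items():
        merged[groups.get(word, word)] += n
    return merged


merged = regroup(WORD_COUNTS, GROUPS)
for category, n in merged.most_common():
    print(f"{category:<26}{n:>4}")
```

The grand total is unchanged by regrouping, so the percentages can be recomputed from the merged counts in the same way as before.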

After word frequency analysis has been performed on the original variables, the results may indicate convergence on just a few specific issues. In such a situation, and where dichotomous or only a few divergent responses are expected, one can program the string extraction so that the presence (or absence) of a minor modifier word (such as "high") is detected alongside a specific major keyword (such as "demand"). The sequence of the modifier word followed by the major keyword (such as "high demand") can then be tracked and counted. Since dual-word extraction is only possible using the original response variables, the results from the word frequency analysis should serve only as a reference.
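The modifier-plus-keyword idea can be sketched in a few lines. The Python below is an illustration only and is not the authors' SAS code. It assumes the modifier sits immediately before the keyword, it classifies each response by the first occurrence of the keyword, and the modifier set and sample responses are made up for the example.

```python
import re
from collections import Counter


def classify(response, keyword, modifiers):
    """Return 'modifier keyword', the bare keyword, or None if the keyword is absent."""
    words = re.findall(r"[a-z]+", response.lower())
    for i, word in enumerate(words):
        if word == keyword:
            # Check only the word immediately before the keyword.
            if i > 0 and words[i - 1] in modifiers:
                return f"{words[i - 1]} {keyword}"
            return keyword
    return None


# Made-up responses, for illustration only.
sample = [
    "There is no demand for it",
    "High demand in summer",
    "demand",
    "supply is low",
    "no demand",
    "prices follow demand",
]

labels = [classify(r, "demand", {"high", "no"}) for r in sample]
tally = Counter(label for label in labels if label is not None)

for label, n in tally.most_common():
    print(f"{label:<14}{n:>3}")
```

Responses that mention the keyword without a recognized modifier are counted under the bare keyword, and responses that never mention it are left out of the tally.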

In SAS, there are at least three ways to search for a word or a combination of words within a character string. The first is to use the software's pattern-matching functions. The second is to use indexing functions, which can be programmed to perform string extraction and word concatenation. The third and simplest is to use conditional loops in combination with counter variables that tally the pattern matches. All three techniques make it possible to drill down and tally all possible responses to a particular issue (Table 3). However, this procedure may not be applicable to variables with less clear-cut responses.

Table 3
Frequency Analysis Resulting from a Drill Down Using the Indexing Technique to Search for "Demand" and its Modifiers
Q01DEMND      Frequency   Percent   Cumulative Frequency   Cumulative Percent
no demand     9           50.0      9                      50.0
high demand   6           33.3      15                     83.3
demand        3           16.7      18                     100.0

In the example above, one could have stopped the analysis after the word frequency count shown in Table 2. At that point, there would have been enough confidence to conclude that "demand" is one of the predominant concerns of respondents. Whether they feel there is high demand or no demand at all, however, cannot be deduced from Table 2 alone. With the indexing technique, the researchers were able to drill down and generate definitive, polarized responses not obtainable by the word frequency method alone.


Surveys have always been, and will remain, indispensable tools in Extension work. While closed-ended questionnaires offer an easy way of evaluating a program's success or failure, the power of open-ended questionnaires to elicit unrestricted, unbiased, and frank responses makes them a highly valuable feedback mechanism that should be harnessed to advantage.

One objection to using open-ended questions is the time-consuming and difficult task of summarizing the responses. This paper presents a simple but effective method of extracting information from surveys that should remove that constraint. The only drawback is that, while SAS provides a point-and-click user interface that takes care of the statistical analyses for most numeric variables, one still needs some experience in manipulating character strings in SAS to implement the procedure described in this paper. The good news is that programming in SAS is easy to learn. SAS is available for site licensing by universities at academic rates.


Culp, K. and Pilat, M. 1998. Converting feedback into quantifiable categories. Journal of Extension 36(5). Available on-line:

Trademark Information

SAS is a registered trademark of the SAS Institute Inc. in the USA and other countries. (R) indicates USA registration.