Program evaluation is essentially a set
of philosophies and techniques to determine if a program 'works'. It is
a practice field that has emerged, particularly in the USA, as a
disciplined way of assessing the merit, value, and worth of projects and
programs. Evaluation became particularly relevant in the 1960s during
the period of the Great Society social programs associated with the
Kennedy and Johnson administrations. Extraordinary sums were invested in
social programs, but the means of knowing what happened, and why, were
not available.
Behind the seemingly simple question of whether the program works are a
host of other more complex questions. For example, the first question
is, what is a program supposed to do? It is often difficult to define
what a program is supposed to do, so indirect indicators may be used
instead. For example, schools are supposed to 'educate' people. But what
does 'educate' mean? Give knowledge? Teach how to think? Give specific
skills? If the exact goal cannot be defined well, it is difficult to
indicate whether the program 'works'.
Another question about programs is, what else do they do? There may be
unintended or unforeseen consequences of a program. Some consequences
may be positive and some may be negative. These unintended consequences
may be as important as the intended consequences. So evaluations should
measure not just whether the program does what it should be doing, but
what else it may be doing.
Perhaps the most difficult part of evaluation is determining whether it
is the program itself that is doing something. There may be other events
or processes that are really causing the outcome, or preventing the
hoped-for outcome. However, due to the nature of the program, many
evaluations cannot determine whether it is the program itself, or
something else, that is the 'cause'.
One main reason that evaluations cannot determine causation involves
self-selection. That is, people select themselves to participate in a
program. For example, in a jobs training program, some people decide to
participate, and others, for whatever reason, do not participate. It may
be that those who do participate are those who are most determined to
find a job, or who have the best support resources, thus allowing them
to participate and allowing them to find a job. The people who
participate are somehow different from those who don't participate, and
it may be the difference, not the program, that leads to a successful
outcome for the participants, that is, finding a job.
If programs could, somehow, use random assignment, then they could
determine causation. That is, if a program could randomly assign people
to participate or not to participate in the program, then,
theoretically, the group of people who participated would be, on
average, comparable to the group who did not participate, and an
evaluation could 'rule out' other causes.
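The contrast between self-selection and random assignment can be illustrated with a small simulation. The following Python sketch rests entirely on invented assumptions: a single 'motivation' score per person, a job-finding probability that depends only on that score, and a program that has no effect at all. The function names and numbers are hypothetical and are not drawn from any real evaluation.

```python
import random

def job_rate(people):
    """Share of a group that found a job."""
    return sum(found for _, found in people) / len(people)

def simulate(assign, n=20000, seed=0):
    """Return the participant minus non-participant gap in job rates.

    Assumption for this sketch: the program has NO effect, so any gap
    comes from who ends up in the program, not from the program itself.
    """
    rng = random.Random(seed)
    participants, others = [], []
    for _ in range(n):
        motivation = rng.random()  # hypothetical trait between 0 and 1
        # The chance of finding a job depends on motivation only.
        found_job = rng.random() < 0.2 + 0.6 * motivation
        group = participants if assign(motivation, rng) else others
        group.append((motivation, found_job))
    return job_rate(participants) - job_rate(others)

def self_selection(motivation, rng):
    # More motivated people are more likely to enroll.
    return rng.random() < motivation

def random_assignment(motivation, rng):
    # A coin flip decides enrollment, regardless of motivation.
    return rng.random() < 0.5

self_selected_gap = simulate(self_selection)
randomized_gap = simulate(random_assignment)
print(f"Gap under self-selection:    {self_selected_gap:.3f}")
print(f"Gap under random assignment: {randomized_gap:.3f}")
```

Under these assumptions the self-selected participants look considerably more successful even though the program does nothing, while the gap under random assignment stays close to zero. That is the sense in which random assignment lets an evaluation rule out other causes.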
However, because most programs cannot use random assignment, causation
usually cannot be determined. Even so, evaluations can still provide
useful information.
For example, the outcomes of the program can be described. Thus the
evaluation can say something like, "People who participated in program
xyz were more likely to find a job, while people who did not participate
were less likely to find a job."
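A descriptive statement of this kind comes down to comparing rates between the two groups. The Python sketch below shows the arithmetic using counts that are invented purely for illustration; they are not results from any actual program, and the names `counts`, `rates`, and `rate_ratio` are hypothetical.

```python
# Hypothetical counts, invented purely for illustration.
counts = {
    "participants": {"found_job": 120, "total": 200},
    "non_participants": {"found_job": 90, "total": 300},
}

# Share of each group that found a job.
rates = {
    group: c["found_job"] / c["total"]
    for group, c in counts.items()
}

# How many times more likely participants were to find a job.
rate_ratio = rates["participants"] / rates["non_participants"]

for group, rate in rates.items():
    print(f"{group}: {rate:.0%} found a job")
print(f"Rate ratio: {rate_ratio:.2f}")
```

A comparison like this describes what happened in each group. It does not, by itself, show that the program caused the difference.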
If the program is fairly large, with many participants and enough data,
statistical analysis can sometimes be used to make a 'reasonable' case
for the program by showing, for example, that other causes are
unlikely.
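One common way to probe whether another cause explains the result is to compare participants and non-participants within groups that share that other cause (stratification). The Python sketch below uses simulated data under stated assumptions: a hypothetical 'support resources' factor makes people both more likely to enroll and more likely to find a job, and the program is assumed to add 0.10 to the chance of finding a job. It is an illustration of the idea, not a prescribed analysis.

```python
import random

rng = random.Random(1)

# Simulated records: (participated, has_support, found_job).
# All probabilities below are assumptions made up for this sketch.
records = []
for _ in range(20000):
    has_support = rng.random() < 0.5
    # People with support resources are more likely to enroll.
    participated = rng.random() < (0.7 if has_support else 0.3)
    # Assumed truth: the program adds 0.10, support adds 0.30.
    p_job = 0.2 + 0.1 * participated + 0.3 * has_support
    found_job = rng.random() < p_job
    records.append((participated, has_support, found_job))

def gap(rows):
    """Job rate of participants minus job rate of non-participants."""
    part = [job for p, _, job in rows if p]
    non = [job for p, _, job in rows if not p]
    return sum(part) / len(part) - sum(non) / len(non)

# Raw comparison mixes the program's effect with the effect of support.
raw_gap = gap(records)

# Stratified comparison: compare only people with the same support status.
stratum_gaps = {
    support: gap([r for r in records if r[1] == support])
    for support in (True, False)
}

print(f"Raw gap: {raw_gap:.3f}")
for support, g in stratum_gaps.items():
    print(f"Gap among has_support={support}: {g:.3f}")
```

In this simulation the raw gap overstates the program's contribution, while the gaps within each stratum sit near the assumed 0.10. Stratification only accounts for factors that were measured; unmeasured differences between the groups can still bias the comparison.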
Another approach is to use the evaluation to analyze the program
process. So instead of focusing on the outcome (for example, did people
in a jobs training program get jobs), the evaluation would focus on what
the program was doing. For example, did people seem to learn the skills
being taught? Did people stay in the program or did they drop out part
way through? Were the teachers teaching appropriate skills? And so
forth. This information could help explain how the program was operating.
People who do program evaluation come from many different backgrounds,
including sociology, psychology, economics, and social work. Some
graduate schools also have specific training programs for program
evaluation.
Program evaluations can involve quantitative methods of social research
or qualitative methods or both.
Types of evaluation
Program evaluation is often divided into types, such as formative,
process, and outcome evaluation.
Formative Evaluation occurs early in the program. The results are used
to decide how the program is delivered, or what form the program will
take. For example, an exercise program for elderly adults would seek to
learn what activities are motivating and interesting to this group.
These activities would then be included in the program.
Process Evaluation is concerned with how the program is delivered. It
deals with things such as when the program activities occur, where they
occur, and who delivers them. In other words, it asks the question: Is
the program being delivered as intended? An effective program may not
yield desired results if it is not delivered properly.
Outcome Evaluation addresses the question of what the results are. It is
common to speak of short-term outcomes and long-term outcomes. For
example, in an exercise program, a short-term outcome could be a change
in knowledge about the health effects of exercise, or it could be a
change in exercise behavior. A long-term outcome could be a lower
likelihood of dying from heart disease.
CDC framework
In 1999, the Centers for Disease Control and Prevention (CDC) published
a six-step framework for conducting evaluation of public health
programs. The publication of the framework was a result of the increased
emphasis on the evaluation of government programs in the US. The six
steps are:
1. Engage stakeholders, a term referring to anyone with an interest in
the program.
2. Describe the program.
3. Focus the evaluation.
4. Gather credible evidence.
5. Justify conclusions.
6. Ensure use and share lessons learned.