Interpreting the Impact of Program Maturity
Online learning programs are often on the cutting edge of education reform and, like any new technology, may require a period of adaptation. For example, a district might try creating a new online course and discover some technical glitches when students begin to actually use it. Course creators also may need to fine-tune content, adjusting how it is presented or explained. For their part, students who are new to online course taking may need some time to get used to the format. Perhaps they need to learn new ways of studying or interacting with the teacher to be successful. If an evaluation is under way as all of this is going on, its findings may have more to do with the program's newness than its quality or effectiveness.
Because the creators of online programs often make adjustments to policies and practices while perfecting their model, it is ideal to wait until the program has had a chance to mature. At the same time, online learning programs are often under pressure to demonstrate effectiveness right away. Although early evaluation efforts can provide valuable formative information for program improvement, they can sometimes be premature for generating reliable findings about effectiveness. Worse, if summative evaluations (see Glossary of Common Evaluation Terms, p. 65) are undertaken too soon and show disappointing results, they could be damaging to programs' reputations or future chances at funding and political support.
What steps can evaluators and program leaders take to properly interpret the effect of a program's maturity on evaluation findings?
Several of the programs featured in this guide had evaluation efforts in place early on, sometimes from the very beginning. In a few cases, evaluators found less-than-positive outcomes at first and suspected that their findings were related to the program's lack of maturity. In these instances, the evaluators needed additional information to confirm their hunch and also needed to help program stakeholders understand and interpret the negative findings appropriately. In the case of Thinkport, for example, evaluators designed a follow-up evaluation to provide more information about the program as it matured. In the case of the Algebra I Online program, evaluators used multiple measures to provide stakeholders with a balanced perspective on the program's effectiveness.
Conduct Follow-up Analyses for Deeper Understanding
In evaluating the impact of Thinkport's electronic field trip about slavery and the Underground Railroad, evaluators from Macro International conducted a randomized controlled trial (see Glossary of Common Evaluation Terms, p. 65), with the aim of understanding whether students who used this electronic field trip learned as much as students who received traditional instruction in the same content and did not use the field trip.
Initially, the randomized controlled trial revealed a disappointing finding: The electronic field trip affected student performance on a test of content knowledge no more and no less than traditional instruction did. A second phase of the study was initiated, however, when the evaluators dug deeper and analyzed whether teachers who had used an electronic field trip before were more successful than those using it for the first time. When they disaggregated the data, the evaluators found that students whose teachers were inexperienced with the electronic field trip actually learned less than students who received traditional instruction; however, students whose teachers had used the electronic field trip before learned more than the traditionally taught students. In the second semester, the evaluators were able to compare the same treatment teachers' first use of the electronic field trip with their second use. This analysis found that when teachers used the electronic field trip a second time, its effectiveness rose dramatically. The students of teachers using the field trip a second time scored 121 percent higher on a test of knowledge about the Underground Railroad than the students in the control group (see Glossary of Common Evaluation Terms, p. 65) who had received traditional instruction on the same content.
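Purely as an illustration of the kind of disaggregation described above (not the evaluators' actual analysis), the sketch below uses invented numbers to show how a pooled treatment effect can mask opposite effects for first-time and repeat users of a tool:

```python
# Hypothetical sketch of disaggregating a treatment effect by teacher
# experience. All records and scores are invented for illustration.
from statistics import mean

# Each record: (group, teacher_experience, posttest_score)
records = [
    ("treatment", "first_time", 62), ("treatment", "first_time", 58),
    ("treatment", "repeat", 88), ("treatment", "repeat", 91),
    ("control", None, 70), ("control", None, 72),
]

def mean_score(group, experience=None):
    """Average posttest score for a group, optionally one subgroup."""
    return mean(s for g, e, s in records
                if g == group and (experience is None or e == experience))

control_mean = mean_score("control")

# The pooled effect looks modestly positive...
pooled_effect = mean_score("treatment") - control_mean

# ...but the subgroups pull in opposite directions.
first_time_effect = mean_score("treatment", "first_time") - control_mean
repeat_effect = mean_score("treatment", "repeat") - control_mean

print(pooled_effect, first_time_effect, repeat_effect)
```

With these invented numbers, the pooled difference is small and positive, while first-time users' students score below the control mean and repeat users' students score well above it, which is the pattern the Thinkport evaluators uncovered.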
The evaluators also looked at teachers' responses to open-ended survey questions to better understand why second-time users were so much more successful than first-timers. Novice users reported that "they did not fully understand the Web site's capabilities when they began using it in their classes and that they were sometimes unable to answer student questions because they didn't understand the resource well enough themselves."12 On the other hand, teachers who had used the electronic field trip once before "showed a deeper understanding of its resources and had a generally smoother experience working with the site."
In this instance, teachers needed time to learn how to integrate a new learning tool into their classrooms. Although not successful at first, the teachers who got past the initial learning curve eventually became very effective in using the field trip to deliver content to students. This finding was important for two reasons. First, it suggested an area for program improvement: to counter the problem of teacher inexperience with the tool, the evaluators recommended that the program offer additional guidance for first-time users. Second, it kept the program's leaders from reaching a premature conclusion about the effectiveness of the Pathways to Freedom electronic field trip.
This is an important lesson, say Thinkport's leaders, for them and others. As Helene Jennings, vice-president of Macro International, explains, "There's a lot of enthusiasm when something is developed, and patience is very hard for people.… They want to get the results. It's a natural instinct." But, she says, it's important not to rush to summative judgments: "You just can't take it out of the box and have phenomenal success." Since the 2005 evaluation, the team has shared these experiences with other evaluators at several professional conferences. They do this, says Macro International Senior Manager Michael Long, to give other evaluators "ammunition" when they are asked to evaluate the effectiveness of a new technology too soon after implementation.
Use Multiple Measures to Gain a Balanced Perspective
Teachers are not the only ones who need time to adapt to a new learning technology; students need time as well. Evaluators need to keep in mind that students' inexperience or discomfort with a new online course or tool also can cloud evaluation efforts—especially if an evaluation is undertaken early in the program's implementation. The Algebra I Online program connects students to a certified algebra teacher via the Internet, while another teacher, who may or may not be certified, provides academic and technical support in the classroom. When comparing the experiences of Algebra I Online students and traditional algebra students, EDC evaluators found mixed results. On the one hand, online students reported having less confidence in their algebra skills. Specifically, about two-thirds of students from the control group (those in traditional classes) reported feeling either confident or very confident in their algebra skills, compared to just under half of the Algebra I Online students. The online students also were less likely to report having a good learning experience in their algebra class. About one-fifth of Algebra I Online students reported that they did not have a good learning experience in the class, compared to only 6 percent of students in regular algebra classes. On the other hand, the online students showed achievement gains at least as high as those of traditional students. Specifically, the Algebra I Online students outscored students in control classrooms on 18 of 25 posttest items, and they also tended to do better on those items that required them to create an algebraic expression from a real-world example.
In an article they published in the Journal of Research on Technology in Education, the evaluators speculated about why the online students were less confident in their algebra skills and had lower opinions of their learning experience: "It may be that the model of delayed feedback and dispersed authority in the online course led to a 'lost' feeling and prevented students from being able to gauge how they were doing."13 In other words, without immediate reassurance from the teacher of record, students may have felt they weren't "getting it," when, in fact, they were.
This example suggests that students' unfamiliarity with a new program can substantially affect their perceptions and experiences. The evaluators in this case were wise to use a variety of measures to understand what students were experiencing in the class. Taken alone, the students' reports about their confidence and learning experience could suggest that the Algebra I Online program is not effective. But when the evaluators paired the self-reported satisfaction data with test score data, they were able to see the contradiction and gain a richer understanding of students' experiences in the program.
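As a purely hypothetical sketch of pairing measures in this way (not EDC's actual analysis), the snippet below joins invented self-report and test-score data per student to flag the contradiction described above: students who lack confidence but are nonetheless performing well.

```python
# Invented per-student records pairing two measures: self-reported
# confidence and a posttest score. All values are hypothetical.
students = [
    {"id": 1, "confident": False, "posttest": 21},
    {"id": 2, "confident": False, "posttest": 19},
    {"id": 3, "confident": True, "posttest": 22},
    {"id": 4, "confident": False, "posttest": 12},
]

PASSING = 18  # hypothetical cut score on the posttest

# Pairing the measures surfaces students who feel they aren't
# "getting it" even though their scores say otherwise.
doing_better_than_they_think = [
    s["id"] for s in students
    if not s["confident"] and s["posttest"] >= PASSING
]

print(doing_better_than_they_think)
```

Looking at either measure alone would miss this group: the satisfaction data alone would suggest the program is failing them, while the scores alone would hide their discouragement.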
A lack of program maturity is not a reason to forgo evaluation. On the contrary, evaluation can be extremely useful in the early phases of program development. Even before a program is designed, evaluators can conduct needs assessments to determine how the target population can best be served. In the program's early implementation phase, evaluators can conduct formative evaluations that aim to identify areas for improvement. Then, once users have had time to adapt, and program developers have had time to incorporate what they've learned from early feedback and observations, evaluators can turn to summative evaluations to determine effectiveness.
When disseminating findings from summative evaluations, program leaders should work with their evaluator to help program stakeholders understand and interpret how program maturity may have affected evaluation findings. The use of multiple measures can help provide a balanced perspective on the program's effectiveness. Program leaders also may want to consider repeating a summative evaluation to provide more information about the program as it matures.
Many of today's online learning programs are sophisticated, requiring extraordinary amounts of time and money up front to create. This investment, combined with stakeholders' eagerness for findings, makes these programs especially vulnerable to premature judgments. Evaluators have an important message to communicate to stakeholders: Evaluation efforts at all stages of development are critical to making sure investments are well spent, but they need to be appropriate for the program's level of maturity.