Archived Information

The Quality of Vocational Education, June 1998

Research on Grouping and Tracking

De-trackers cite three kinds of studies to support their case against tracking. The first type of study is ethnographic or observational. Ethnographic studies provide narrative descriptions of classroom processes in upper- and lower-track classes. The second kind of study is the survey analysis. Researchers carrying out survey studies often use regression methods to determine how track placement influences students. The third type of study is the experimental comparison. Experimental studies examine educational outcomes for equivalent students assigned to ability-grouped and non-grouped classes.

Reviewers have recently examined the findings from each of these research traditions (Gamoran & Berends, 1987; J. Kulik, 1992; Slavin, 1987, 1990b). In this chapter, I critically examine the review findings and the conclusions that have been drawn from them. I also try to determine how relevant the findings are for vocational education. My overall goal is to provide a background for later detailed examination of studies of tracking and vocational education.

Ethnographic Studies of Tracking

The earliest ethnographic studies of tracking examined behavior in British streamed schools (Ball, 1981; Lacey, 1970; Hargreaves, 1967). Later influential studies focused on American schools and classes (Oakes, 1985; Page, 1991; Rosenbaum, 1976). Although some ethnographic studies include quantitative data, most provide only qualitative observations. Ethnographers seldom go beyond armchair analysis of their data. As Gamoran and Berends (1987) point out, ethnographers try to uncover the subjective meaning of events and patterns of life in schools through rational analysis of their observations.

Perhaps the best known example of ethnographic research on tracking is that reported by Oakes (1985) in her book Keeping Track. The observations were originally collected for a project that John Goodlad described in his 1984 book A Place Called School. They were made in 299 English and math classes (75 high track, 85 average track, 64 low track, and 75 heterogeneous classes) in a national sample of 25 junior and senior high schools. The observations covered course content, quality of instruction, classroom climate, and student attitudes in each of the classes.

Oakes saw a pattern in the results. Instruction usually seemed to be better in the higher tracks. She reported, for example, that the percentage of time spent on instruction was 81 in high-track and 75 in low-track English classes; the percentage of time spent on instruction was 81 for high-track and 78 for low-track math classes. The percentage of time off-task was 2 for high-track and 4 for low-track English classes; it was 1 for high-track and 4 for low-track math classes. More time was therefore spent on instruction, and less time was spent off-task, in the high tracks.
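
The size of these differences is easier to judge when the gaps are computed directly. The following is a minimal sketch that simply subtracts the low-track figure from the high-track figure for each measure; the dictionary layout and the function name `track_gaps` are illustrative choices, not part of Oakes's study.

```python
# Percentages reported by Oakes (1985), as given in the text.
# The data layout and names below are illustrative assumptions.
oakes = {
    "English": {
        "instruction": {"high": 81, "low": 75},
        "off_task": {"high": 2, "low": 4},
    },
    "math": {
        "instruction": {"high": 81, "low": 78},
        "off_task": {"high": 1, "low": 4},
    },
}


def track_gaps(data):
    """Return high-track minus low-track gaps, in percentage points."""
    return {
        subject: {
            measure: values["high"] - values["low"]
            for measure, values in measures.items()
        }
        for subject, measures in data.items()
    }


gaps = track_gaps(oakes)
for subject, measures in gaps.items():
    print(subject, measures)
```

On these figures, the instructional-time advantage of the high track is 6 percentage points in English and 3 in math, and the off-task gap is 2 and 3 percentage points respectively.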

Oakes (1985) also reported that there were curricular differences in high- and low-track classes. She reported, for example, that low-track classes covered less-demanding topics, whereas high-track classes covered more complex material. High-track teachers also seemed to encourage competent and autonomous thinking, whereas low-track teachers stressed low-level skills and conformity to rules and expectations. Oakes did not provide quantitative data to support these observations, however.

Gamoran and Berends (1987) have summarized the main results from the ethnographic studies of Oakes and others.  They report that ethnographers have reached four main conclusions. (a) Instruction is conceptually simplified and proceeds more slowly in lower tracks. (b) More experienced teachers and those regarded as more successful seem to be disproportionately assigned to the higher tracks. (c) Teachers view high-track students positively and low-track students negatively. (d) Most of a student's friends are found in the same track.

An important point to note is that ethnographers seldom quantify their observations. It is often hard to tell therefore whether the differences they find between upper and lower tracks are large or small. In a few cases, however, ethnographers have quantified their observations. The results suggest that the differences may be small. Gamoran and Berends (1987), for example, noted that Oakes reported only a 2 or 3 percent difference in time off-task for upper and lower tracks. They noted that this difference is not large, and in most respects, track levels appear to be much more alike than they are different.

More important, it is difficult to tell whether differences in behavior in upper- and lower-track classrooms are student- or teacher-produced. Slavin (1990a) makes this point:

"On the quality of instruction issue, the variables typically found to differentiate high- and low-track classes are ones that cannot be separated from the nature of the students themselves. For example, many studies find that there is less content covered in low-track classes. But is this by its nature an indication of low quality? Might it be that low-track classes need a slower pace of instruction?  The whole idea of ability grouping is to provide students with a level and pace of instruction appropriate to their different needs. Similarly, time on-task is found to be lower in low-track classes. Might it be that low-achieving students are more likely to be off-task no matter where they are?"  (p. 505).

Even in heterogeneous classrooms, low-achieving students might spend less time on-task than high-achieving students do. Even in heterogeneous classrooms, low-achieving students might cover less content than high-achieving students do.  Without observations of heterogeneous classrooms, it is difficult to know which classroom behaviors are student-produced and which are teacher-produced.

We could determine whether the differences were student- or teacher-produced if we had control data, but ethnographers seldom provide such data. When such data are available, they are usually illuminating. For example, some ethnographers have observed that self-esteem is low in lower-track classrooms, and they have concluded that tracking produces this low self-esteem by stigmatizing students. With control data available, we can see that the conclusion is unwarranted. Controlled studies have examined the self-esteem level of children in mixed-ability classrooms as well as tracked classrooms, and these studies have usually found that the self-esteem of lower aptitude children goes down even further when such children are taught in mixed-ability classrooms (Kulik, 1992). If anything, tracking seems to raise the self-esteem of lower aptitude students.

Lotto (1986) has noted an additional problem in applying results of these ethnographic studies to vocational programs. The ethnographic studies examine instruction in lower-track academic classes, not in vocational classes. According to Lotto, instruction in vocational classes may be more similar to high-track academic instruction than it is to low-track academic instruction. She points out that teachers use a wide variety of instructional techniques in vocational classes, and that there is much less lecturing, questioning, quizzing, and seat-work in these classes than in other classes. In other words, teachers of vocational classes use instructional techniques that are usually endorsed by experts. Lotto also points out that vocational courses are among those that students like most. No one knows for sure whether instruction in vocational classes is more like upper- or lower-track academic instruction, but Lotto concludes that it is unfair to charge vocational courses with deficiencies found in lower-track academic courses.

For vocational educators, the yield from ethnographic studies must seem rather slim. The ethnographic studies show that the amounts of time on-task are different in upper- and lower-track classrooms, and there may also be differences in teachers, in teacher reactions to students, and in instructional emphasis. One problem is that ethnographic studies do not show what lies behind these differences. Differences in instruction for fast and slower students may be appropriate adjustments, or they may reflect real differences in instructional quality in different curricular tracks. Observational research does not provide the answer. Another problem is that differences found in comparisons of upper- and lower-track academic classes may not be found in comparisons of academic and vocational classes. At least some experts believe that quality of instruction is higher in vocational classes than it is in academic ones.

Regression Studies of Tracking

Regression studies of survey data on tracking address two main questions. First, what factors influence students to enroll in different curricular tracks? Survey researchers have been especially interested in determining whether academic ability or socioeconomic status plays a more important role in track placement. Second, how much do curricular tracks influence students? Researchers have explored the influence of tracks on such educational outcomes as high school achievement, postsecondary attainment, and self-esteem.

Garet and DeLany (1988) identified and reviewed the four most influential studies on student placement into curricular tracks. These studies are those of Alexander, Cook, and McDill (1978); Hauser, Sewell, and Alwin (1976); Heyns (1974); and Rosenbaum (1980). All of these studies have one thing in common. They divide the high school curriculum into only two tracks, the college preparatory track and the noncollege track. The studies do not separate the vocational track from the general track. Gamoran and Mare (1989) drew conclusions from the studies examined by Garet and DeLany and from their own analysis of High School and Beyond data.

Although estimates of the importance of ability, socioeconomic status, and other influences on track placement differ somewhat from one study to the next, the pattern of results is fairly consistent. Four points emerge from the studies reviewed by Garet and DeLany (1988) and Gamoran and Mare (1989):

Results concerning the effects of track placement on achievement in high school are less clear. Gamoran and Berends (1987) reviewed results from 16 outcome studies that analyzed data from 10 national or state-wide surveys. Some of the studies found that a significant amount of variation in test scores was related to track membership, but others found a non-significant relationship between these variables.  All of the studies of the question, however, found that there was a significant relationship between track membership and educational attainment after high school. That is, students who are in college preparatory programs are more likely to enroll in college than are equally able students from the general and vocational tracks. Gamoran and Berends do not report any other consistent findings from national surveys of high school students.

Reviewers have criticized the studies that produced these findings on methodological grounds. Most of the studies, for example, compared achievement of students in academic and nonacademic programs. The observed difference in aptitude of students in these programs is so great that attempts to equate the groups statistically may be futile.  Slavin (1990a) has described the problem as follows:

"In my article, I discussed at length the problems with these high-track/low-track studies. One problem is statistical; when groups are very different on a covariate, the covariate does not adequately "control" for group differences. To the degree that the covariate has a reliability less than 1.0, it tends to undercontrol for group differences, but even small differences in within-group slopes of the covariate on the dependent measure can cause major errors when there are large group differences (see Reichardt, 1979). When comparing high- to low-ability groups, pretest or covariate differences of one to two standard deviations are typical. No statistician on earth would expect that analysis of covariance or regression could adequately control for such large differences" (p. 506).

Another methodological problem that comes up in comparisons of academic and nonacademic students is the failure to take into account all differences between high- and low-track students. Slavin (1990a) has also described this problem clearly:

"In addition, the logic of such comparisons is simply difficult to accept. Do students at Harvard learn more than those at East Overshoe State, controlling for SAT scores and high school grades? Are the San Francisco Forty-Niners better than the Palo Alto High School football team, controlling for height, weight, speed, and age? Such questions fall into the realm of the unknown. Comparing the achievement gains of students in existing high versus low tracks is not so different. Many factors go into track placement—achievement, behavior, attitudes, motivation, prior course selection, and so on—and each of these is likely to affect post-test achievement regardless of track placement. No study will ever adequately control for all these factors, and as a result studies comparing high to low track will always tend to show higher achievement for the high track students" (p. 506).

Beyond these methodological problems lie conceptual ones. Writers who cite regression results in arguments against tracking miss the point of the analyses. Regression analyses at best show what would happen to certain students in a track system if they were moved to another track.  Regression analyses do not show what would happen if a track system were eliminated and replaced by something different.  Effectiveness might be greater or less in an untracked system than in any of the tracks in a multi-track system. Slavin (1990a) has also written about this point:

"However, this whole discussion is in many ways beside the point. Educators are not looking for research on whether students should be assigned to the high-, middle, or low-ability group. As long as the system exists, students will be assigned to these groups by some standard. What educators want to know is the effect of the system, compared to a plausible alternative. This is precisely the comparison made in 29 studies I have emphasized. In these studies, schools did exactly what many middle and high schools are considering: They untracked, either in selected subjects or across the board. The results were clear. Comparing ability grouped to ungrouped situations, there were no differences for high, average, or low achievers" (p. 506).

It would be wrong to conclude, however, that regression analysis should never be used with achievement data from tracked students. It is true that regression methods can produce misleading results when academic and nonacademic students are being compared. But regression results are far more trustworthy when vocational and general students are being compared because vocational and general students are similar in important characteristics that influence school outcomes. Comparisons of vocational and general students have seldom received much attention in reviews of research on curricular tracking, however.
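
A small simulation may make the contrast concrete. The sketch below is an illustration under assumed values, not a reanalysis of any study cited here: the group gaps (1.5 and 0.2 standard deviations), the pretest reliability of 0.8, the noise levels, and the function name `simulate_adjusted_gap` are all assumptions. Two groups differ in true ability, track membership has no effect at all on the posttest, and the covariate is a pretest measured with error. The adjusted difference is computed in the usual analysis-of-covariance way, from the pooled within-group slope.

```python
import random
from statistics import fmean


def simulate_adjusted_gap(n=20000, gap=1.5, reliability=0.8, seed=0):
    """Covariance-adjusted group difference when the true track effect is zero.

    All parameter values are illustrative assumptions. Within each group,
    true ability has SD 1 and the pretest equals ability plus random error
    scaled so that the pretest has the requested reliability.
    """
    rng = random.Random(seed)
    error_sd = ((1 - reliability) / reliability) ** 0.5
    groups = {}
    for name, mean in (("low", 0.0), ("high", gap)):
        ability = [rng.gauss(mean, 1.0) for _ in range(n)]
        pretest = [a + rng.gauss(0.0, error_sd) for a in ability]
        # The posttest depends on ability only: no effect of track membership.
        posttest = [a + rng.gauss(0.0, 0.5) for a in ability]
        groups[name] = (pretest, posttest)

    # Pooled within-group regression slope of posttest on pretest.
    sxy = sxx = 0.0
    for x, y in groups.values():
        mx, my = fmean(x), fmean(y)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx

    (x_low, y_low), (x_high, y_high) = groups["low"], groups["high"]
    raw_gap = fmean(y_high) - fmean(y_low)
    adjusted_gap = raw_gap - slope * (fmean(x_high) - fmean(x_low))
    return slope, adjusted_gap


for gap in (1.5, 0.2):
    slope, adjusted = simulate_adjusted_gap(gap=gap)
    print(f"true gap {gap:.1f} SD: slope {slope:.2f}, adjusted gap {adjusted:.2f}")
```

Under this model the within-group slope is expected to equal the reliability, so the adjusted difference is expected to be about gap × (1 − reliability). That is roughly 0.3 standard deviations of spurious "track effect" when the groups start 1.5 standard deviations apart, but only about 0.04 when they start 0.2 apart. The second case corresponds to a comparison of groups that are similar at the outset, which is the situation assumed above for vocational and general students.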

Experimental Studies of Ability Grouping

Researchers have for many decades carried out experimental evaluations of tracked or ability-grouped classes. Slavin (1990a) has noted that these studies are the best guide that we have to the effects of tracking and grouping on students. In a typical study, a researcher assembles groups of learners who have been assigned to either ability-grouped or non-grouped classes. The researcher compares overall academic achievement in the ability-grouped classes and non-grouped classes, and the researcher may also examine effects of grouping on children at different aptitude levels.

One difficulty in drawing conclusions from the experimental literature arises from the variety of grouping programs. Such programs are not all the same. Differences among programs such as the following are too great to be ignored:

Older reviews do not distinguish adequately among grouping programs, and as a result they do not always reach the same conclusions about grouping effects (Kulik, 1992).

Recent reviews, however, have made the necessary distinctions, and the reviewers have reached consistent conclusions about what the research says. Among the most comprehensive recent reviews are the meta-analytic investigations carried out by Robert Slavin at Johns Hopkins University (e.g., Slavin, 1987, 1990b) and those conducted by my research group at the University of Michigan (e.g., C.-L. Kulik & J. Kulik, 1982, 1984; J. Kulik & C.-L. Kulik, 1984, 1987, 1991, 1992). These meta-analyses show that different kinds of programs produce different effects. The key distinction is among (a) programs in which all ability groups follow the same curriculum; (b) programs that make curricular adjustments for the special needs of highly talented learners; and (c) programs in which all groups follow curricula adjusted to background and ability.

Grouping without curricular adjustment. The Michigan meta-analyses covered 51 separate studies of XYZ classes, that is, classes stratified by aptitude level in which all groups follow the same curriculum (J. Kulik, 1992), and the Johns Hopkins analyses covered 47 studies (Slavin, 1987, 1990b). Both analyses reached the same conclusion about lower and middle ability students: These students learn the same amount in XYZ or mixed classes. The evidence from the higher aptitude groups is less clear. The Michigan meta-analysis found that higher aptitude learners make slightly larger gains in XYZ programs. A higher aptitude student who gained 1.0 years on a grade-equivalent scale after a year in a mixed class would gain 1.1 years in an XYZ class. The Johns Hopkins meta-analysis suggested that gains for higher aptitude students are equal in XYZ and mixed classes. Slavin (1990b) described effects of XYZ programs on secondary students as follows:

"Comprehensive between-class ability grouping plans have little or no effect on the achievement of secondary students, at least as measured by standardized tests. This conclusion is most strongly supported in grades 7-9, but the more limited evidence that does exist from studies in grades 10-12 also fails to support any effect of ability grouping. ... For the narrow, but extremely important purpose of determining the impact of ability grouping on standardized achievement measures, the studies reviewed here are exemplary. Six randomly assigned individual students to ability-grouped or heterogeneous classes, and nine more individually matched students and then assigned them to one or the other grouping plan. Many of the studies followed students for 2 or more years. If there had been any true effect of ability grouping on student achievement, this set of studies would surely have detected it" (p. 494).

Why are the effects of XYZ grouping on student achievement so small? It may be because the curriculum is a key determinant of learning outcomes, and XYZ programs do not prescribe different curricular materials for the stratified classes. While school personnel are usually careful in placing children into XYZ classes by aptitude level, they seldom adjust the curriculum to the aptitude levels of the classes. For example, children in the high group in a grade 5 program may be ready for work at the 6th-grade level; children in the middle group are usually ready for work at the 5th-grade level; and children in the low group may need remedial help to cover 5th-grade material. All groups work with the same materials and follow the same course of study in most XYZ programs. XYZ programs are therefore programs of differential placement but not differential treatment.

Some of the studies of XYZ classes also examined student self-concepts. The Michigan meta-analyses, for example, covered 13 studies of grouping effects on self-esteem (J. Kulik, 1992). The average overall effect of grouping in the 13 studies was to decrease self-esteem scores by a trivial amount. The average self-esteem scores in XYZ and mixed classes were therefore nearly identical. Nonetheless, XYZ classes appeared to have small effects on self-esteem that differed by aptitude level. The Michigan meta-analyses showed that self-esteem scores go up slightly for low-aptitude learners in XYZ programs, and they go down slightly for high-aptitude learners. Brighter children lose a little of their self-assurance when they are put into classes with equally talented children. Slower children gain a little in self-confidence when they are taught in classes with other slower learners.

Findings from studies of XYZ classes may be relevant to some courses taken by students in vocational programs. Like other students, vocational students take courses in core subjects such as English and math, but they sometimes find themselves in sections of these courses in which there are many other vocational students, but relatively few college-bound students. The homogeneous grouping may be the deliberate result of administrative planning or the unplanned consequence of scheduling of courses in vocational subjects.  The findings on XYZ classes suggest that these homogeneous classes should not hurt vocational students academically, as long as the sections taken by vocational students follow the same curriculum as other classes do. Findings on XYZ classes also suggest that vocational students may feel slightly better about themselves and their abilities in these homogeneous sections than they would in mixed-ability sections.

Curricular adjustment for high-aptitude learners. The Michigan meta-analyses covered 23 studies of accelerated classes for high-aptitude learners (J. Kulik, 1992). The studies compared the achievement of equivalent students in accelerated classes and non-accelerated control classes. All of the studies examined moderate acceleration of a whole class of students rather than acceleration of individual children. In each of the comparisons involving students who were initially equivalent in age and intelligence, the students in accelerated classes outperformed the students in non-accelerated classes. In the typical study, the average superiority for the students in accelerated classes was nearly one year on a grade-equivalent scale of a standardized achievement test.

The Michigan meta-analyses also covered 25 studies of enriched classes for talented students. Twenty-two of the 25 studies found that talented students achieved more when they were taught in enriched rather than regular mixed-ability classes. In the average study, students in the enriched classes outperformed equivalent students in mixed classes by about 4 to 5 months. Children receiving enriched instruction gained 1.4 to 1.5 years on a grade-equivalent scale in the same period during which equivalent control children gained only 1.0 year.
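
The conversion from months of advantage to grade-equivalent gains can be written out explicitly. The sketch below assumes the usual convention that a grade-equivalent year spans 10 school months, which is consistent with the figures in the preceding paragraph; the constant and the function name `grade_equivalent_gain` are illustrative.

```python
# Assumption: one grade-equivalent year corresponds to 10 school months.
MONTHS_PER_SCHOOL_YEAR = 10


def grade_equivalent_gain(control_gain_years, advantage_months):
    """Gain in grade-equivalent years for a group that outperforms controls.

    control_gain_years: gain of the control group over the period.
    advantage_months: additional gain of the comparison group, in months.
    """
    return control_gain_years + advantage_months / MONTHS_PER_SCHOOL_YEAR


# Enriched classes: an advantage of 4 to 5 months over a control gain of 1.0 year.
low = grade_equivalent_gain(1.0, 4)
high = grade_equivalent_gain(1.0, 5)
print(f"{low:.1f} to {high:.1f} years")
```

With a control gain of 1.0 year, an advantage of 4 to 5 months yields 1.4 to 1.5 years, matching the gains stated above.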

The strong effects of accelerated and enriched classes are probably due to curricular differentiation. In XYZ classes, curricular adjustment is minimal; in accelerated and enriched classes, it is maximal. In these classes, teachers introduce a good deal of above-grade-level material for students who are willing and able to meet the challenge. The test scores show the results. High-aptitude students benefit from taking these advanced classes, and they suffer when they are held back in regular classes.

Although these findings come from academic classes rather than vocational ones, they may be relevant for vocational educators. Vocational students take fewer advanced courses in mathematics, English, and other academic areas than college-prep students do. High-aptitude students in vocational programs may therefore be at a disadvantage on standardized tests when compared to equally talented students in college-prep programs. Vocational students as a group may therefore perform less well on tests in the core subjects because high-aptitude vocational students ordinarily take fewer advanced and enriched courses than college-prep students do.

Curricular adjustment for all students. Both the Michigan and Johns Hopkins meta-analyses found that cross-grade and within-class programs in elementary and middle schools usually produce positive results (Kulik, 1992; Slavin, 1987). The Michigan analysis, for example, covered 14 studies of cross-grade grouping and 11 studies of within-class grouping. More than 80 percent of the studies of each type reported positive results. The average gain attributable to cross-grade or within-class grouping was between 2 and 3 months on a grade equivalent scale. The typical pupil in a mixed-ability class might gain 1.0 years on a grade-equivalent scale in a year, whereas the typical pupil in a cross-grade or within-class program would gain 1.2 to 1.3 years. Effects were similar for high, middle, and low aptitude pupils.

Cross-grade and within-class programs appear to work because they provide different curricula for pupils with different aptitude. In cross-grade programs, students move up or down grades to ensure a match between their reading ability and their reading instruction. In within-class programs, teachers divide students into ability groups so that all children can work on arithmetic materials for which they are properly prepared. Curriculum varies with student aptitude in both cross-grade and within-class programs. The programs thus differ in an important respect from most programs of XYZ grouping.

These studies are only indirectly relevant to vocational education. After all, none of the studies examined instruction in senior high school, and none examined grouping in vocational subjects. Nonetheless, the studies do suggest that separation of students into ability groups will produce positive benefits for all students if the ability grouping is accompanied by appropriate curricular differentiation. These studies suggest that vocational and other tracks may produce positive benefits if the curricular tracking is used to provide students with instruction for which they are adequately prepared.

Conclusions from experimental studies. Different grouping and tracking programs produce different effects on student achievement. Some grouping programs do not raise student test scores above the usual levels, but others lead to moderate to large increases in student achievement. The programs that add little to student achievement are those in which all ability groups follow the same curriculum. Programs that have a moderate to large effect on student achievement are those in which groups follow curricula adjusted to their ability levels.

Less is known about the effects of grouping programs on student self-esteem, but experimental studies fail to support the charge that students in the lower tracks suffer irreparable damage to their self-esteem. Students in the high groups drop a little in self-esteem; students in low groups actually increase a little in self-esteem in ability-grouped classes. The finding is inconsistent with labeling or stigma theory, which predicts a drop in self-esteem for the groups with lower status. The finding is consistent, however, with predictions of social comparison theory. According to social comparison theory, people make self-evaluations by comparing themselves to those around them. The theory predicts that slow learners will feel more adequate in a slow-learning group and that fast learners will feel less special in a fast-learning group.

Conclusions

Advocates of de-tracking cite three types of studies to support the claim that curricular tracks have harmful effects on students. The studies are survey analyses of tracking, ethnographic studies of tracked classes, and experimental studies of ability grouping. Authoritative reviews of these studies do not support the claims of the de-trackers.

Authoritative reviews of national educational surveys suggest that the most important factor affecting student placement into tracks is the personal preference of the students. Between 75 percent and 85 percent of all students report choosing their curricular programs. The next most important influence on track placement is academic aptitude. Other factors, including socioeconomic background, gender, and race, play a less important role in track placement. The reviews also report one highly consistent effect of tracking:  Enrollment in a college-prep track is related to college attendance. Students from college-prep programs are more likely to enroll in college than are equally able students from general and vocational programs. Educational surveys report mixed findings on student achievement, however. Some studies report that a significant amount of variation in test scores is related to track membership, but others report a non-significant relation between test scores and curricular track. Reviewers report no other consistent findings from regression analyses of tracking effects.

Although individual ethnographers have reported that the curriculum is debased, teachers are inexperienced, and instruction is poor in lower track classes, reviewers find little concrete evidence to support the charges. When ethnographers have quantified their observations, their reports suggest that differences between instruction in upper- and lower-track academic classes are small. What is more important, the interpretation of the differences is unclear. The slight difference ordinarily found in instruction in upper- and lower-track classes may reflect a difference in instructional quality, but it may also indicate that teachers make appropriate adjustments to the different needs of young people in the two types of classes.

Reviews of experimental studies show that different grouping or tracking programs produce different effects on students. Some grouping programs have little or no effect on student achievement, but others have moderate to large effects. The key distinction is between programs in which all ability groups follow the same curriculum and programs in which groups follow curricula adjusted to their background and ability. Programs in which all ability groups follow the same curriculum have little or no effect on student achievement. Programs in which curricular adjustments are made for the aptitude level of the students usually have moderate to large effects. Few of the experimental studies, however, examine effects on vocational students as a separate group.

The overwhelming impression that one gets from literature reviews on grouping and tracking, in fact, is neglect of vocational education. Reviewers of survey research usually fail to distinguish between general and vocational tracks. Instead, they lump general and vocational classes together into a single category of nonacademic classes. Reviewers of ethnographic studies examine instruction in upper- and lower-track academic classes, but not in vocational courses. Reviewers of the experimental literature cite relatively few studies of ability grouping at the senior-high level and seldom mention the topic of vocational education.

Reviews of the literature on tracking and grouping are therefore not as helpful as they should be to vocational educators. To draw conclusions about the effect of vocational programs, vocational educators cannot simply rely on existing reviews, but instead they must look more closely at the studies themselves. In the next four chapters, I provide some help in this task by describing and analyzing results from relevant studies. My purpose is to uncover findings that are more directly relevant to the field of vocational education.

