
Answers in the Tool Box: Academic Intensity, Attendance Patterns, and Bachelor's Degree Attainment — June 1999


1. National Center for Education Statistics, High School & Beyond Sophomore Cohort: 1980-92, Postsecondary Transcripts (NCES 98-135). The CD includes not only the postsecondary transcript files, but also the high school transcript files, approximately 200 student-level variables constructed from the survey data, a labor market experience file, and an institutional file. The data from these selected files, while sufficient for most analyses of the life course histories of this cohort, can be merged with thousands of other variables on the original (1995) version of the HS&B/So restricted data set.

2. Surveys of the group were taken in 1980, 1982, 1984, 1986, and 1992. Postsecondary transcripts were gathered in 1993. Only four percent of the cohort was enrolled in postsecondary education in 1993, so for most students in the sample, the history ends at age 28/29 in 1992. A small number of students in the sample died between 1980 and 1992. The analysis file used in this study excludes those who passed away prior to 1984 on the grounds that, in terms of educational histories, they did not have the chance to complete degrees.

3. The BPS89-94 is an "event cohort" study, not an age-cohort study. The initial group consisted of a national sample of people who were true first-time postsecondary students in the academic year 1989-1990. These students ranged in age from 16 to over 50. The data collected for this group included neither high school nor college transcripts.

4. For example, during the five years of the Beginning Postsecondary Students Study (1989-1994), 18 percent of participants moved from dependent to independent status, 19 percent experienced a change in marital status, and 9 percent added children to their household (14 percent already had children when they started postsecondary education in 1989). Source: Data Analysis System, BPS90.

5. In most institutions of higher education, and certainly at a state university, the 3rd semester of calculus assumes that the student has previously studied elementary functions and analytic geometry. If one is placed in the 5th semester of college-level Russian, one can assume, at a minimum, prior study at the 4th semester level.

6. See, for example, Fernand Braudel's essay, "History and the Social Sciences," in Braudel (1980), pp.25-54.

7. Of the HS&B/So students who took more than two remedial courses in college, the Census division x urbanicity of high school cells in which one finds the largest proportions (and in relation to their share of the origins of all postsecondary students) were:

Division x Urbanicity                Proportion of        Proportion of
                                     Remedial Students    All Students
South Atlantic, suburban                  10.4%                6.8%
East North Central, suburban               9.7                12.2
Pacific, suburban                          9.6                 8.3
Mid-Atlantic, suburban                     6.8                 9.8
Mid-Atlantic, urban                        5.9                 3.8
East North Central, urban                  5.4                 3.9
South Atlantic, rural                      5.1                 4.8
West South Central, rural                  4.5                 3.1

The lessons of such detail are not only that suburban high schools can be significant producers of remedial students, but that specific Region x Urbanicity configurations are over-represented in the origins of remedial students: South Atlantic and Pacific suburban schools and East North Central and Mid-Atlantic urban schools.

8. Given the 1,000+ course taxonomy in The New College Course Map and Transcript Files (U.S. Department of Education, 1995), the following illustrates what is and is not included in the aggregate category of "remedial courses." The examples are generalized. Institutional credit and grading policies help determine what is "remedial" in any given case. If, for example, in institution X, "Grammar and Usage" is indicated as a non-additive-credit course with a nonstandard grade (e.g., "Y"), it would be classified as remedial. If the same title appeared on a student record as a junior-year course for someone who had previously taken courses in Shakespeare and creative writing, and the credits were additive and the grades standard, the course would have been classified under linguistics. There were over 300,000 course entries in the HS&B/So transcript sample. Every entry was examined in this manner.

Included in the "Remedial" Aggregate           Not Included
Basic Skills: Student Development              Student Orientation
Basic Academic Skills                          Library Skills/Orientation/Methods
Remedial English; Developmental English,       Reading & Composition, Exposition
  Punctuation, Spelling, Grammar, Basic
  Language Skills, Grammar and Usage
Basic Writing, Writing Skills                  Academic Writing, Informational Writing
Remedial/Basic Speech, Basic Oral              Fundamentals of Speech, Speech
  Communication, Listening Skills                Communication, Effective Speech
Basic Reading, Reading Skills,                 Speed Reading, Reading & Composition
  Reading Comprehension
Business Math: Pre-College, Business           Math for Business/Econ, Math for
  Arithmetic, Business Computations,             Finance, Business Algebra
  Consumer Math
Arithmetic                                     Number Systems/Structures
Pre-College Algebra                            Algebra for Teachers
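The credit-and-grade rule described in note 8 can be sketched as a small decision function. The function name and category strings below are illustrative, not drawn from the coding manual:

```python
def classify_course(default_category, additive_credit, standard_grade):
    """Sketch of the rule in note 8: a course entry is coded remedial
    when the institution awards non-additive credit AND a nonstandard
    grade (e.g. "Y"); otherwise the title keeps the category it would
    ordinarily receive under the taxonomy (e.g. linguistics)."""
    if not additive_credit and not standard_grade:
        return "remedial"
    return default_category
```

So "Grammar and Usage" carried with non-additive credit and a "Y" grade codes as remedial, while the same title with additive credit and a standard grade stays under linguistics.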

9. We can drive home the point even further by comparing student accounts from BPS to transcript accounts from the HS&B/So in the matter of the type of remedial courses taken:

                             BPS (student)    HS&B/So (transcript)
Any Remedial Mathematics          8.6%               33.7%
Any Remedial Reading              7.4                11.2

Hypothesis: students are more likely to know that they are in remedial reading than in precollegiate level mathematics.

10. To cite all the major studies that collapse high school curriculum in this manner would consume a dozen pages.

11. The procedure involves, first, matching all existing cases of students who show both SAT/ACT and senior test scores on their records, and determining the percentile on the senior test score that matched the median score on the SAT. It is not surprising that this percentile (54th) is higher than the mean for the senior test since the SAT/ACT test-taking population has been filtered by college-going intentions. The second step in imputing senior test score percentile from SAT and ACT scores is to call the median for each test the 54th percentile and to distribute the rest of the SAT/ACT scores in terms of percentiles.
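A minimal sketch of the two-step imputation, under simplifying assumptions of my own (a single pooled score list, no handling of ties or of the separate SAT and ACT scales): pin the within-sample median to the 54th percentile, then spread the remaining scores proportionally above and below that anchor.

```python
def impute_senior_percentiles(scores, anchor=54.0):
    """Map SAT/ACT scores to imputed senior-test percentiles by
    (1) ranking each score within the test-taking subsample, and
    (2) relabeling the within-sample median as the anchor percentile,
    stretching the lower half over 0..anchor and compressing the
    upper half over anchor..100."""
    scores = list(scores)
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    out = []
    for i in range(n):
        pct = 100.0 * (rank[i] + 0.5) / n  # within-sample percentile
        if pct <= 50.0:
            out.append(pct * anchor / 50.0)
        else:
            out.append(anchor + (pct - 50.0) * (100.0 - anchor) / 50.0)
    return out
```

The key property is simply that the middle of the SAT/ACT-taking distribution lands at the 54th percentile of the senior test rather than the 50th, reflecting the college-going filter on who takes the admissions tests.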

12. The average was based on grades in non-remedial mathematics courses, English, all science courses, foreign language, history, and social studies. Grades in fine and performing arts and vocational courses were not included.

13. Faced with a similar situation, and because they had a much smaller sample and wished to reduce the number of missing cases to a minimum, Alexander and Eckland (1973) used a regression weight in the opposite direction: from the school principal's report of the student's class rank, in quintiles, to the student's self-reported "grade averages."

14. The five variables involve different combinations of Carnegie units in high school subject matter as follows:

               NWBASIC1   NWBASIC2   NWBASIC3   NWBASIC4   NWBASIC5
English           4.0        ---        ---        ---        ---
Mathematics       3.0        3.0        3.0        3.0        2.0
Science           3.0        3.0        3.0        3.0        2.0
Social St.        3.0        3.0        3.0        3.0        3.0
For. Langs.       2.0        ---        2.0        ---        ---
Comput Sci.       0.5        0.5        ---        ---        ---

For four of these variables, we have no indication of how much English is involved; and for three of them, no indication of how much foreign languages or computer science may be involved. Taking these constructions at face value, there is no hierarchical difference between NWBASIC2 and NWBASIC3. In fact, while one can set NWBASIC1 aside as an ideal, the only claim to a hierarchy is implicit in the numbering of the combinations. Those interested in the frequency counts for these variables can view them in the public release Data Analysis System (DAS) for High School & Beyond/Sophomore cohort included on the National Center for Education Statistics' CD#98-074.

15. The categories of mathematics available on a high school transcript sample from the late 1960s were: Algebra 1, Algebra 2, geometry, trigonometry, calculus, general math (1, 2, 3, and 4), applied math (1 and 2), advanced math, and math not elsewhere classified (see Pallas and Alexander, 1983, p. 175). This is a very difficult list to configure in a categorical variable with intervals that clearly delineate a hierarchy. By the time the HS&B/So cohort was in high school a decade later, pre-calculus was a standard high school offering, pre-algebra courses were clearly identified, and statistics was specified, i.e. there was a great deal more specificity on the HS&B/So high school transcripts, and one could construct a HIGHMATH variable with the following values: Calculus, Pre-Calculus, Trigonometry, Algebra 2, Geometry, Algebra 1, Pre-Algebra/General Math 1 and 2/Arithmetic, and Indeterminable.

16. The early drafts of this monograph included a separate section in which the Academic Resources construction was replicated and tested using a newly-edited version of the NLS-72 high school records, and in which Altonji's work was described in more detail. At the advice of reviewers, this section was set aside for separate publication.

17. There are two versions of the high school transcripts in the HS&B/So data file, each based on a slightly different coding system. The HSTS version, on which this study relies, did not include an accounting for remedial English. The CTI version of the transcripts was merged for this variable. Fractional credits (less than 0.5) that were labeled "remedial English" in the CTI version of the transcripts were not deemed remedial since the "courses" at issue included tutorials and workshops, and these do not necessarily mean developmental work.

18. A criterion of 0.5 or more credits of computer science, as a dummy variable, was added at eight points along the scale to disaggregate lumps in the distribution. And at four points along the scale, credits earned in mathematics or science were added for the same purpose.

19. Pallas and Alexander (1983) attributed about 60 percent of the gap between men's and women's scores on the SAT-Q "to the sparse quantitative programs of study typically pursued by girls in high school". The High School & Beyond/Sophomore data, however, do not show much of a divergence in the highest level of mathematics studied by men and women in high school, nor a significant divergence in composite senior year test scores by highest level of mathematics:

                    Proportion Who Reached        Proportion Scoring in the
                    This Level of Mathematics     Highest Quintile of the
                    in High School                Senior Test Composite
                       Men       Women               Men       Women
Calculus               5.3        4.1               82.3       81.2
Pre-Calculus           4.7        3.9               66.5       63.7
Trigonometry           9.0        7.8               51.9       48.3
Algebra 2             21.3       23.9               31.0*      26.9*
< Algebra 2           59.7       60.3                6.9        5.8

20. Alexander, Riordan, Fennessey and Pallas define "high academic resources" as one standard deviation above the mean on ability (the senior test) and class rank, both within a college preparatory high school curriculum (p. 325). The principal problem with this formulation lies in class rank, since the variable is computed within-school, whereas the other components are not.

21. Using the 1994 survey of the NELS-88 cohort, i.e. two years after scheduled high school graduation, Berkner and Chavez (1997) first examined the pre-collegiate records of all students who said they had attended a 4-year college as of that date. They then judged all NELS-88 students with reference to the profiles of those who had entered 4-year colleges. Students were judged to be "college qualified" if their records evidenced at least one value on any of five criteria that would place them among the top 75 percent of 4-year college students for that criterion. The minimum values for "qualified" were: a class rank of the 46th percentile, an academic GPA of 2.7, an SAT combined score of 820, an ACT composite score of 19, or a NELS-88 test score (roughly the same test as used for the HS&B/So) of the 56th percentile. Curriculum in the form of the NWBASIC1 variable (see p. 13 above) was used to adjust degrees of "qualification," that is, it played a secondary role despite data that show it to be of primary importance. ACRES, in contrast, does not judge students with reference to isolated criteria, but rather provides an analytic indicator of the general level of academic resource development that students can reach.
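The "any one criterion meets its minimum" screen in note 21 can be made concrete. The dictionary keys below are hypothetical field names; the five minimum values are those listed in the note:

```python
QUALIFIED_MINIMUMS = {
    "class_rank_pct": 46,   # class rank percentile
    "academic_gpa": 2.7,    # academic GPA
    "sat_combined": 820,    # SAT combined score
    "act_composite": 19,    # ACT composite score
    "test_pct": 56,         # NELS-88 test score percentile
}

def college_qualified(record):
    """A student counts as 'college qualified' if ANY available
    criterion meets its minimum; criteria missing from the student's
    record are simply skipped."""
    return any(
        record.get(key) is not None and record[key] >= minimum
        for key, minimum in QUALIFIED_MINIMUMS.items()
    )
```

Under this rule, a record showing only an academic GPA of 2.8 qualifies, while one showing only an SAT of 700 and an ACT of 15 does not, which illustrates the disjunctive (threshold-by-threshold) character of the screen that ACRES avoids.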

22. The most simplistic line of these inquiries uses data from the Current Population Surveys of the Census Bureau despite the ambiguities in the way Census asks questions about "college" enrollment and (until 1992) attainment. The ambiguity produces extraordinarily volatile year-to-year enrollment rates, particularly by race, though analysts usually ignore the volatility. Other problems with time series in Census data involve the 1992 division of the question concerning secondary school completion into two categories (diploma and equivalency) and the fact that the Current Population Surveys do not contain information on immigration status (Census focuses on college enrollment for the non-institutionalized civilian population ages 18-24, and this group includes people who attended primary and/or secondary school in other countries). There are simply too many cross-currents and too much imputation in the data collection methodology of Census to rely on this source for precise estimates of college access or even degree completion (Pelavin and Kane, 1990). Other benchmark data assembled by the American College Testing Service, for example, come far closer to the longitudinal studies' estimates and evidence far less volatility (see Digest of Education Statistics, 1997, tables 183 and 184, pp. 194-195).

23. The NELS-88 longitudinal study reminds those who tend to forget the importance of this factor in initial choice of college: 71 percent of high school seniors in 1992 cited location as a primary factor in choice. Source: Data Analysis System (DAS), National Education Longitudinal Study of 1988.

24. This was true particularly for those of Hispanic background. The cultural tone of persistence decisions in this population is very difficult to model, but critical to acknowledge. None of the national longitudinal studies account for the role of family and significant others after initial access to higher education. There is no reason to believe that a student will offer anything but an honest assessment to the true/false statement, "my family encourages me to continue attending this institution" (Cabrera, Nora, and Castaneda, 1993), but the relative power of the attitudes behind the statement may be very different for students in community colleges compared with those in 4-year colleges compared with those attending more than one institution.

25. McCormick's study excluded students who attended 4-year colleges but began their careers in other types of institutions (26 percent of all 4-year students and 20 percent of the bachelor's degree recipients in the HS&B/So), as well as 4-year college students whose 12th grade educational aspirations were less than a bachelor's degree (29 percent). McCormick also excluded credits earned by examination, credit equivalents of clock-hour courses, credits earned at less than 2-year schools, and credits earned before high school graduation. Some of these exclusions are unfortunate, but they should not detract from an instructive exposition.

26. Astin, Tsui, and Avalos confined their interest to completion at the first institution of attendance; the purpose of their study was to demonstrate the difference between predicted and actual institutional graduation rates. The model is worth visiting. Starting with self-reported high school grades, and with stepwise feeding of SAT scores, gender, and race, these authors found the adjusted R2s increased from .281 to .325 in predicting 9-year completion rates within an institution (with gender adding a small amount to the R2 and race adding almost nothing). What does that mean, and how do we judge the results? An R2 of .325 means that the model accounts for about one-third of the variance in what happened to this population. Given all the intervening behaviors of a 9-year period, one-third of the variance is a very strong number. As we will see, however, models that transcend the boundaries of a single institution are both more persuasive and produce even stronger estimates.

27. However, the proportion varies widely by type of true first institution of attendance and by combinations of institutions attended. For example, 24 percent of those whose first institution of attendance was a community college indicated they had been part-time students versus 9.8 percent of those who first entered comprehensive colleges. Among those who attended only 4-year colleges, 6.5 percent indicated part-time status at some time during their undergraduate careers, versus 20 percent of those engaged in alternating or simultaneous enrollment in 4-year and 2-year colleges.

28. Source: Data Analysis System (DAS), BPS90.

29. Unfortunately, Berkner, Cuccaro-Alamin, and McCormick did not make use of one of the most important filtering variables in the BPS90, namely the question of whether the respondents categorized themselves as students who happen to be employed (63.7%) or employees who happen to be students (36.3%). The distinction ripples through the entire dataset and any interpretation of student careers. Here are some examples of how the differences in primary status played out in the first institution of attendance (1989-1990):

                              Students Who       Employees Who
                              Happen to Be       Happen to Be
                              Employed           Students
   ALL                            63.7               36.3
Level of First Institution
   4-Year                         77.5               22.5
   2-Year                         49.1*              50.9*
   < 2-Year                       38.0               62.0
Degree Working Toward
   None                           23.9               76.1
   Certificate                    37.4               62.3
   Associate's                    53.8               46.2
   Bachelor's                     76.2               23.8
Enrollment Intensity
   Full-Time                      73.6               26.2
   Part-Time                      31.6               62.0

Source: Data Analysis System (DAS), BPS90. * Not a statistically significant difference.

30. Hearn (1992) also warns us "to be suspicious of the measurement properties [validity, reliability] of aspirations, plans, and expectations indicators when the data are from the responses of middle or late adolescents." (p. 661).

31. Morgan (1996) translated the categorical variable of educational aspirations into years of schooling (e.g. a master's degree was worth 18 years, 6 of which were postsecondary). His notion was that if there were still differences in educational expectations of sub-groups after controlling for SES (which he standardized to a scale in which the mean = 0 and SD = 1 in order to merge the HS&B/So and NELS-88 cohorts), then we have to look elsewhere to explain the residual. With this methodology, Morgan found that, net of drop-outs, 29 percent of the students increased their expectations and 26 percent lowered their expectations between grades 10 and 12. This is a much less dramatic change than that revealed by the minimum educational level satisfaction questions treated as categorical variables.
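Morgan's rescaling of SES before pooling the two cohorts is ordinary z-scoring. A minimal sketch, assuming the population standard deviation (the note does not say which variant he used):

```python
import statistics

def standardize(values):
    """Rescale a variable so its mean is 0 and its SD is 1 -- the
    transformation applied to SES before merging the HS&B/So and
    NELS-88 cohorts, so the two samples share a common scale."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]
```

Once each cohort's SES is expressed in standard-deviation units, a residual difference in expectations between sub-groups cannot be attributed to the cohorts' different raw SES scales.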

32. The family income variable in the HS&B/So (as well as most other national data sets) is as tenuous as parental levels of education, but may be more important in analyses of college going, persistence, and completion. I chose to use the family income file for the HS&B/So base year (1980) prepared as a by-product of a report to NCES estimating families' capacity to finance higher education for their children (Dresch, Stowe, and Waldenberg, 1985). The "Dresch file" examined all attendant features of family and student, and removed outlying cases. The analytic problem with the "Dresch file" is that it reports income in eight unequal bands. The distribution appears reasonable, but there are no means here, no standard deviations, and no regular intervals on a continuous scale. When set on a grid against SES quintiles, we lose nearly 25 percent of the HS&B/So universe. As Sewell and Hauser (1975) effectively demonstrated, non-economic aspects of stratification are more important than the economic. So one should stick with the composite rather than lose such a large proportion of the sample.

33. NELS-88 provides both student and parent accounts, but 15 percent of the parents did not provide information on their highest levels of education, and 23 percent skipped the question on family income.

34. There were only 414 cases out of 14,825 in the data base that could be edited in this manner. In weighted numbers, the results indicate that we have been over-estimating first generation college status by a minimum of 4 percent.

35. For students of "typical college age" in the Beginning Postsecondary Students Study of 1989-1994, we find the following with respect to factors in choosing the first institution of attendance, by general type of institution the student actually entered:

                                  4-Year College   2-Year College   < 2-Year
School Close to Home an
Important Factor in Choice             59.4%            75.4%          50.1%
First Institution Was 50 or
Fewer Miles from Home                  45.7%            90.0%          78.0%

Source: National Center for Education Statistics: Beginning Postsecondary Students, 1989-1994, Data Analysis System.

36. Transcript practices with respect to study abroad are highly variable, and require hand-and-eye reading to identify. Some may be highly explicit, e.g. "University of Heidelberg" or "Monterrey Semester." Others may use an abbreviation, e.g. "Besançon," and the transcript reader simply has to know that a major French language training center is being referenced. In still other cases, the sequence of courses for a history major shifts in the spring term to "Art and Architecture of Florence," "The Age of Machiavelli," "Advanced Italian Conversation," "Il Paradiso," and "Antonioni and Italian Cinematic Realism." The reader knows that it is highly unlikely that the student is attending school in the United States.

37. How does this estimate stack up against current (1999) claims of massive numbers of bachelor's degree students doing post-baccalaureate work for credit in community colleges? No one has conducted a complete census, let alone a census with unduplicated headcounts. But let us speculate. The HS&B/So is only one high school graduating class. Assume that there are ten high school graduating classes "in play" at the present moment, and that in each class of eventual bachelor's degree recipients there has been a 10 percent increase in the proportion attending a community college after the BA. This fairly generous set of assumptions results in an estimate of about a half-million credit students, or 9 percent of community college students currently enrolled for credit.

38. The outcome effects of selectivity have been remarkably consistent for two generations of college graduates, the NLS-72 and the HS&B/So:

Relation of selectivity of bachelor's degree-granting institution to graduate school attendance and GPA in two cohorts of college graduates, 1972-1993

                                        1972-84        1982-93
Graduate School Attendance by Age 30
   Highly Selective                      39.7%          43.0%
   Selective                             34.6%          31.0%
   Non-Selective                         23.1%          17.9%
Mean (S.D.) Undergraduate GPA
   Highly Selective                    3.16 (.51)     3.13 (.42)    Effect size = .07
   Selective                           3.01 (.50)     2.96 (.45)    Effect size = .11
   Non-Selective                       2.92 (.48)     2.86 (.42)    Effect size = .13

NOTES: (1) The universes are confined to students who earned bachelor's degrees and for whom an undergraduate GPA could be determined. NLS-72 weighted N = 732k; HS&B/So weighted N = 935k. (2) Differences in estimates of graduate school attendance rates are significant at p<.05. (3) Effect sizes for changes in mean GPA indicate no change. SOURCE: NCES: (1) National Longitudinal Study of the High School Class of 1972; (2) High School & Beyond/Sophomore Cohort, NCES CD#98-135.

39. The proportion of all undergraduate grades that were drops, no-penalty withdrawals, and no-penalty incompletes rose from 4 percent for the NLS-72 cohort to over 7 percent for the HS&B/So.

40. The variable HSBSTAT, on the 1998 restricted CD release dataset, divides the entire universe of HS&B/So students into five groups with reference to their postsecondary status. Of these groups, only three are in the potential universe for analysis: (1) students for whom postsecondary transcripts were received (8,215, of whom 14 had died by 1986 and are not included), (2) students for whom transcripts were received, but the content of the transcripts was entirely GED-level or basic skills work (180, very few of whom had attended 4-year colleges), and (3) students for whom transcripts were requested but not received, yet for whom the evidence allows imputation of college attendance (478). In the basic universe for multivariate analyses, the expansion group comes from the third of these categories. Relying on survey data, it was possible to determine the order of institutional attendance for students in this third group, and hence to assign values for variables such as Transfer and No-Return.

41. Of all students who entered postsecondary education, 7 percent of the men and 16.2 percent of the women became parents by 1986. Among students who attended 4-year colleges at any time, the figures were 3.3 percent (men) and 7.5 percent (women).

42. Had I not edited the "Children" variable to exclude contradictory and out-of-scope information, its contribution to the explanatory power of the model, as well as the overall explanatory power of the model, would have been 0.2 percent higher. What this tells us, albeit very indirectly, is that the people who provided contradictory or out-of-scope information about having children probably did not complete bachelor's degrees.

43. The statistical software packages commonly used for regression models (SAS, SPSS, STATA, and others) first set up all the variables in a correlation matrix. Only those independent variables whose correlations with the dependent variable (in our case, bachelor's degree completion) are statistically significant at p<.05 are allowed into the regression equation. p<.05 is a default value. One can change this criterion to be more or less generous. I have chosen to be more generous throughout the models in this study by setting the selection criterion to p<.20. The reader will note that, under those conditions, variables of marginal significance will not hold up when they are asked to take on an explanatory role.
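The entry screen in note 43 can be sketched for a single candidate variable. The function names are illustrative, and the normal approximation to the t test is my simplification (adequate at survey sample sizes):

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def admit_variable(x, dependent, alpha=0.20):
    """Screen one candidate regressor: admit it into the equation only
    if its correlation with the dependent variable is significant at
    the entry threshold (p < .20 here, versus the p < .05 default)."""
    n = len(x)
    r = pearson_r(x, dependent)
    # Two-sided p-value, using the normal approximation to the t test.
    z = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return p < alpha
```

Raising alpha from .05 to .20 simply lets more marginal variables into the equation; as the note observes, those of marginal significance then fail to carry explanatory weight once inside the model.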

44. The California State University at Fullerton is not the California State University at Monterey Bay. Queens College of the City University of New York is not the John Jay College of Criminal Justice (also part of CUNY). Even a multi-campus institution such as Southern Illinois University presents different environments for students who move among the campuses.

45. For the academic satisfaction index, responses to questions concerning the quality of teachers, the quality of instruction, curriculum, intellectual life of the school, and personal intellectual growth were combined. For the environmental index, responses to questions concerning the social life, cultural activities, recreation facilities, and adequacy of services and other facilities were combined.

46. Cabrera, Nora, and Castaneda (1993) ran into similar difficulties with a satisfaction measure of college finances, although the question on their survey involved total financial support ("I am satisfied with the amount of financial support [grants, loans, family assistance, and jobs] I have received while attending . . .") and not costs. In an integrated persistence model, they found that the effects of this type of satisfaction were indirect, and expressed through academic integration and GPA. Even then, the structural coefficients of these effects were rather weak (.138 for academic integration and .104 for GPA).

47. White, African-American, and Asian-American students have very similar enrollment rates in community colleges, no matter how those rates are represented. In the following table, with three "parsings," it is obvious how much Latino attendance patterns differ from the others, and how much the community college means to Latino communities:

Proportions of students attending community colleges under three different notions of "attendance," by race/ethnicity, High School & Beyond/Sophomores, 1982-93

The Only
White     39%      50%      25%
Black     39       51       24
Asian     36       55       20
Latino    54       66       40
