Answers in the Tool Box: Academic Intensity, Attendance Patterns, and Bachelor's Degree Attainment (June 1999)
1. National Center for Education Statistics, High School & Beyond Sophomore Cohort: 1980-92, Postsecondary Transcripts (NCES 98-135). The CD includes not only the postsecondary transcript files, but also the high school transcript files, approximately 200 student-level variables constructed from the survey data, a labor market experience file, and an institutional file. The data from these selected files, while sufficient for most analyses of the life course histories of this cohort, can be merged with thousands of other variables on the original (1995) version of the HS&B/So restricted data set.
2. Surveys of the group were taken in 1980, 1982, 1984, 1986, and 1992. Postsecondary transcripts were gathered in 1993. Only four percent of the cohort was enrolled in postsecondary education in 1993, so for most students in the sample, the history ends at age 28/29 in 1992. A small number of students in the sample died between 1980 and 1992. The analysis file used in this study excludes those who passed away prior to 1984 on the grounds that, in terms of educational histories, they did not have the chance to complete degrees.
3. The BPS 89-94 is an "event cohort" study, not an age cohort study. The initial group consisted of a national sample of people who were true first-time postsecondary students in the academic year 1989-1990. These students ranged in age from 16 to over 50. The data collected for this group included neither high school nor college transcripts.
4. For example, during the five years of the Beginning Postsecondary Students Study (1989-1994), 18 percent of participants moved from dependent to independent status, 19 percent experienced a change in marital status, and 9 percent added children to their household (14 percent already had children when they started postsecondary education in 1989). Source: Data Analysis System, BPS90.
5. In most institutions of higher education, and certainly at a state university, the 3rd semester of calculus assumes that the student has previously studied elementary functions and analytic geometry. If one is placed in the 5th semester of college-level Russian, one can assume, at a minimum, prior study at the 4th semester level.
6. See, for example, Fernand Braudel's essay, "History and the Social Sciences," in Braudel (1980), pp. 25-54.
7. Of the HS&B/So students who took more than two remedial courses in college, the Census division x urbanicity of high school cells in which one finds the largest proportions (and in relation to their share of the origins of all postsecondary students) were:
Division x Urbanicity  Proportion of Remedial Students  Proportion of All Students
South Atlantic, suburban  10.4%  6.8%
East North Central, suburban  9.7  12.2
Pacific, suburban  9.6  8.3
Mid-Atlantic, suburban  6.8  9.8
Mid-Atlantic, urban  5.9  3.8
East North Central, urban  5.4  3.9
South Atlantic, rural  5.1  4.8
West South Central, rural  4.5  3.1
The lessons of such detail are not only that suburban high schools can be significant producers of remedial students, but that specific Region x Urbanicity configurations are overrepresented in the origins of remedial students: South Atlantic and Pacific suburban schools and East North Central and Mid-Atlantic urban schools.
8. Given the 1,000+ course taxonomy in The New College Course Map and Transcript Files (U.S. Department of Education, 1995), the following illustrates what is and is not included in the aggregate category of "remedial courses." The examples are generalized. Institutional credit and grading policies help determine what is "remedial" in any given case. If, for example, in institution X, "Grammar and Usage" is indicated as a non-additive credit course with a nonstandard grade (e.g. "Y"), it would be classified as remedial. If the same title appeared on a student record as a junior year course for someone who had previously taken courses in Shakespeare and creative writing, and the credits were additive and the grades standard, the course would have been classified under linguistics. There were over 300,000 course entries in the HS&B/So transcript sample. Every entry was examined in this manner.
Included in the "Remedial" Aggregate  Not Included
Basic Skills: Student Development  Student Orientation
Basic Academic Skills  Library Skills/Orientation/Methods
Remedial English; Developmental English, Punctuation, Spelling, Grammar, Basic Language Skills, Grammar and Usage  Reading & Composition, Exposition
Basic Writing, Writing Skills  Academic Writing, Informational Writing
Remedial/Basic Speech, Basic Oral Communication, Listening Skills  Fundamentals of Speech, Speech Communication, Effective Speech
Basic Reading, Reading Skills, Reading Comprehension  Speed Reading, Reading & Composition
Business Math: Pre-College, Business Arithmetic, Business Computations, Consumer Math  Math for Business/Econ, Math for Finance, Business Algebra
Arithmetic  Number Systems/Structures
Pre-College Algebra  Algebra for Teachers
9. We can drive home the point even further by comparing student accounts from BPS to transcript accounts from the HS&B/So in the matter of the type of remedial courses taken:
BPS (student)  HS&B/So (transcript)
Any Remedial Mathematics  8.6%  33.7% 
Any Remedial Reading  7.4  11.2 
Hypothesis: students are more likely to know that they are in remedial reading than in precollegiate level mathematics.
10. To cite all the major studies that collapse high school curriculum in this manner would consume a dozen pages.
11. The procedure involves, first, matching all existing cases of students who show both SAT/ACT and senior test scores on their records, and determining the percentile on the senior test score that matched the median score on the SAT. It is not surprising that this percentile (54th) is higher than the mean for the senior test since the SAT/ACT test-taking population has been filtered by college-going intentions. The second step in imputing senior test score percentile from SAT and ACT scores is to call the median for each test the 54th percentile and to distribute the rest of the SAT/ACT scores in terms of percentiles.
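The second step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build the analysis file: the note does not say how the remaining scores were distributed around the anchored median, so the sketch assumes a mid-rank percentile within the SAT/ACT distribution, rescaled piecewise-linearly so that the median lands on the 54th percentile. The function names and the sample scores are hypothetical.

```python
def midrank_percentile(score, scores):
    """Percentile rank of `score` within `scores` (mid-rank convention, 0-100)."""
    below = sum(1 for s in scores if s < score)
    equal = sum(1 for s in scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def impute_senior_test_percentile(score, scores, anchor=54.0):
    """Map an SAT/ACT score to an imputed senior test percentile.

    ASSUMPTION: the within-test median (50th percentile) is relabeled as
    `anchor`, and the two halves of the distribution are stretched or
    compressed linearly on either side of it.
    """
    p = midrank_percentile(score, scores)
    if p <= 50.0:
        return p * anchor / 50.0
    return anchor + (p - 50.0) * (100.0 - anchor) / 50.0


# Hypothetical combined SAT scores, for illustration only.
sat_scores = [700, 820, 900, 1010, 1200]
imputed = [impute_senior_test_percentile(s, sat_scores) for s in sat_scores]
```

Under these assumptions the median score is assigned the 54th percentile and the ordering of students is preserved on either side of it.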
12. The average was based on grades in nonremedial mathematics courses, English, all science courses, foreign language, history, and social studies. Grades in fine and performing arts and vocational courses were not included.
13. Faced with a similar situation, and because they had a much smaller sample and wished to reduce the number of missing cases to a minimum, Alexander and Eckland (1973) used a regression weight in the opposite direction: from the school principal's report of the student's class rank, in quintiles, to the student's self-reported "grade averages."
14. The five variables involve different combinations of Carnegie units in high school subject matter as follows:
NWBASIC1  NWBASIC2  NWBASIC3  NWBASIC4  NWBASIC5
English  4.0  --  --  --  --
Mathematics  3.0  3.0  3.0  3.0  2.0
Science  3.0  3.0  3.0  3.0  2.0
Social St.  3.0  3.0  3.0  3.0  3.0
For. Langs.  2.0  --  2.0  --  --
Comput. Sci.  0.5  0.5  --  --  --
For four of these variables, we have no indication of how much English is involved; and for three of them, no indication of how much foreign languages or computer science may be involved. Taking these constructions at face value, there is no hierarchical difference between NWBASIC2 and NWBASIC3. In fact, while one can set NWBASIC1 aside as an ideal, the only claim to a hierarchy is implicit in the numbering of the combinations. Those interested in the frequency counts for these variables can view them in the public release Data Analysis System (DAS) for High School & Beyond/Sophomore cohort included on the National Center for Education Statistics' CD#98074.
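The absence of a hierarchy between NWBASIC2 and NWBASIC3 can be checked with a short Python sketch. It is illustrative only: the dictionary keys and the function name are hypothetical, the placement of the foreign language and computer science units follows the table as read above, and treating an unspecified subject as zero units is an assumption made solely so that the comparison can be run.

```python
SUBJECTS = [
    "english", "mathematics", "science",
    "social_studies", "foreign_languages", "computer_science",
]

# Carnegie units per variable; subjects with no stated requirement are omitted.
NWBASIC = {
    "NWBASIC1": {"english": 4.0, "mathematics": 3.0, "science": 3.0,
                 "social_studies": 3.0, "foreign_languages": 2.0,
                 "computer_science": 0.5},
    "NWBASIC2": {"mathematics": 3.0, "science": 3.0, "social_studies": 3.0,
                 "computer_science": 0.5},
    "NWBASIC3": {"mathematics": 3.0, "science": 3.0, "social_studies": 3.0,
                 "foreign_languages": 2.0},
}


def at_least_as_demanding(a, b):
    """True if combination `a` requires at least as many units as `b` in every subject.

    ASSUMPTION: an unspecified subject counts as 0.0 units.
    """
    return all(a.get(s, 0.0) >= b.get(s, 0.0) for s in SUBJECTS)


two_over_three = at_least_as_demanding(NWBASIC["NWBASIC2"], NWBASIC["NWBASIC3"])
three_over_two = at_least_as_demanding(NWBASIC["NWBASIC3"], NWBASIC["NWBASIC2"])
```

Neither comparison holds: NWBASIC2 asks for computer science but not foreign languages, and NWBASIC3 does the reverse, so neither can be ranked above the other on content alone.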
15. The categories of mathematics available on a high school transcript sample from the late 1960s were: Algebra 1, Algebra 2, geometry, trig, calculus, general math (1, 2, 3, and 4), applied math (1 and 2), advanced math, and math not elsewhere classified (see Pallas and Alexander, 1983, p. 175). This is a very difficult list to configure in a categorical variable with intervals that clearly delineate a hierarchy. By the time the HS&B/So cohort was in high school a decade later, precalculus was a standard high school offering, prealgebra courses were clearly identified, and statistics was specified. That is, there was a great deal more specificity on the HS&B/So high school transcripts, and one could construct a HIGHMATH variable with the following values: Calculus, Pre-Calculus, Trigonometry, Algebra 2, Geometry, Algebra 1, Pre-Algebra/General Math 1 and 2/Arithmetic, and Indeterminable.
16. The early drafts of this monograph included a separate section in which the Academic Resources construction was replicated and tested using a newly edited version of the NLS-72 high school records, and in which Altonji's work was described in more detail. At the advice of reviewers, this section was set aside for separate publication.
17. There are two versions of the high school transcripts in the HS&B/So data file, each based on a slightly different coding system. The HSTS version, on which this study relies, did not include an accounting for remedial English. The CTI version of the transcripts was merged for this variable. Fractional credits (less than 0.5) that were labeled "remedial English" in the CTI version of the transcripts were not deemed remedial since the "courses" at issue included tutorials and workshops, and these do not necessarily mean developmental work.
18. A criterion of 0.5 or more credits of computer science, as a dummy variable, was added at eight points along the scale to disaggregate lumps in the distribution. And at four points along the scale, credits earned in mathematics or science were added for the same purpose.
19. Pallas and Alexander (1983) attributed about 60 percent of the gap between men's and women's scores on the SAT-Q "to the sparse quantitative programs of study typically pursued by girls in high school." The High School & Beyond/Sophomore data, however, do not show much of a divergence in the highest level of mathematics studied by men and women in high school, nor a significant divergence in composite senior year test scores by highest level of mathematics:
Proportion Who Reached This Level of Mathematics in High School (columns 1-2)  Proportion Scoring in the Highest Quintile of the Senior Test Composite (columns 3-4)
Men  Women  Men  Women
Calculus  5.3  4.1  82.3  81.2
Pre-Calculus  4.7  3.9  66.5  63.7
Trigonometry  9.0  7.8  51.9  48.3
Algebra 2  21.3  23.9  31.0*  26.9*
< Algebra 2  59.7  60.3  6.9  5.8
20. Alexander, Riordan, Fennessey and Pallas define "high academic resources" as one standard deviation above the mean on ability (the senior test) and class rank, both within a college preparatory high school curriculum (p. 325). The principal problem with this formulation lies in class rank, since the variable is computed within-school, whereas the other components are not.
21. Using the 1994 survey of the NELS-88 cohort, i.e. two years after scheduled high school graduation, Berkner and Chavez (1997) first examined the precollegiate records of all students who said they had attended a 4-year college as of that date. They then judged all NELS-88 students with reference to the profiles of those who had entered 4-year colleges. Students were judged to be "college qualified" if their records evidenced at least one value on any of five criteria that would place them among the top 75 percent of 4-year college students for that criterion. The minimum values for "qualified" were: a class rank of the 46th percentile, an academic GPA of 2.7, an SAT combined score of 820, an ACT composite score of 19, or a NELS-88 test score (roughly the same test as used for the HS&B/So) of the 56th percentile. Curriculum in the form of the NWBASIC1 variable (see p. 13 above) was used to adjust degrees of "qualification"; that is, it played a secondary role despite data that show it to be of primary importance. ACRES, in contrast, does not judge students with reference to isolated criteria, but rather provides an analytic indicator of the general level of academic resource development toward which students can reach.
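The "at least one of five criteria" rule can be written out as a short Python sketch. It uses only the minimum values quoted in this note; the field names and the function name are hypothetical, and the secondary curriculum adjustment based on NWBASIC1 is deliberately left out because the note does not describe how it was applied.

```python
# Minimum values quoted in the note for the Berkner and Chavez (1997) index.
# Field names are hypothetical labels, not variable names from NELS-88.
MINIMUMS = {
    "class_rank_percentile": 46,
    "academic_gpa": 2.7,
    "sat_combined": 820,
    "act_composite": 19,
    "nels_test_percentile": 56,
}


def is_college_qualified(record):
    """True if at least one available criterion meets its minimum value.

    Missing criteria (absent or None) are skipped rather than counted
    against the student; that handling is an assumption of this sketch.
    """
    return any(
        record.get(name) is not None and record[name] >= minimum
        for name, minimum in MINIMUMS.items()
    )


# Hypothetical student records, for illustration only.
qualifies_on_gpa = is_college_qualified({"academic_gpa": 2.7, "sat_combined": 790})
misses_everything = is_college_qualified({"sat_combined": 810, "act_composite": 18})
```

Because a single criterion is enough, a student can qualify on grades alone while falling short on every test score, which is the sense in which the index relies on isolated criteria.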
22. The most simplistic line of these inquiries uses data from the Current Population Surveys of the Census Bureau despite the ambiguities in the way Census asks questions about "college" enrollment and (until 1992) attainment. The ambiguity produces extraordinarily volatile year-to-year enrollment rates, particularly by race, though analysts usually ignore the volatility. Other problems with time series in Census data involve the 1992 division of the question concerning secondary school completion into two categories (diploma and equivalency) and the fact that the Current Population Surveys do not contain information on immigration status (Census focuses on college enrollment for the noninstitutionalized civilian population age 18-24, and this group includes people who attended primary and/or secondary school in other countries). There are simply too many crosscurrents and too much imputation in the data collection methodology of Census to rely on this source for precise estimates of college access or even degree completion (Pelavin and Kane, 1990). Other benchmark data assembled by the American College Testing Service, for example, come far closer to the longitudinal studies estimates and evidence far less volatility (see Digest of Education Statistics, 1997, tables 183 and 184, pp. 194-195).
23. The NELS-88 longitudinal study reminds those who tend to forget the importance of this factor in initial choice of college: 71 percent of high school seniors in 1992 cited location as a primary factor in choice. Source: Data Analysis System (DAS), National Education Longitudinal Study of 1988.
24. This was true particularly for those of Hispanic background. The cultural tone of persistence decisions in this population is very difficult to model, but critical to acknowledge. None of the national longitudinal studies accounts for the role of family and significant others after initial access to higher education. There is no reason to believe that a student will offer anything but an honest assessment to the true/false statement, "my family encourages me to continue attending this institution" (Cabrera, Nora, and Castaneda, 1993), but the relative power of the attitudes behind the statement may be very different for students in community colleges compared with those in 4-year colleges compared with those attending more than one institution.
25. McCormick's study excluded students who attended 4-year colleges but began their careers in other types of institutions (26 percent of all 4-year students and 20 percent of the bachelor's degree recipients in the HS&B/So), as well as 4-year college students whose 12th grade educational aspirations were less than a bachelor's degree (29 percent). McCormick also excluded credits earned by examination, credit equivalents of clock-hour courses, credits earned at less than 2-year schools, and credits earned before high school graduation. Some of these exclusions are unfortunate, but they should not detract from an instructive exposition.
26. Confining their interest to completion at the first institution of attendance, Astin, Tsui and Avalos sought to demonstrate the difference between predicted and actual institutional graduation rates. The model is worth visiting. Starting with self-reported high school grades, and with stepwise feeding of SAT scores, gender and race, these authors found the adjusted R² increased from .281 to .325 in predicting 9-year completion rates within an institution (with gender adding a small amount to the R² and race adding almost nothing). What does that mean and how do we judge the results? An R² of .325 means that the model accounts for about one-third of the variance in what happened to this population. Given all the intervening behaviors of a 9-year period, one-third of the variance is a very strong number. As we will see, however, models that transcend the boundaries of a single institution are both more persuasive and produce even stronger estimates.
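For readers who want the "share of variance" reading made concrete, the following Python sketch computes an unadjusted R-squared as one minus the ratio of residual to total variation. It is a generic illustration, not a reconstruction of the Astin, Tsui and Avalos model: the function name and the four data points are hypothetical, and the adjustment for the number of predictors that the authors report is not applied.

```python
def r_squared(actual, predicted):
    """Unadjusted R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical outcomes (1 = completed, 0 = did not) and model predictions.
completed = [0, 0, 1, 1]
predicted = [0.25, 0.25, 0.75, 0.75]
share_explained = r_squared(completed, predicted)
```

A model whose predictions never depart from the overall mean would score zero on this measure, and a model that reproduced every outcome would score one; an R-squared of .325 sits about one-third of the way between those two cases.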
27. However, the proportion varies widely by type of true first institution of attendance and by combinations of institutions attended. For example, 24 percent of those whose first institution of attendance was a community college indicated they had been part-time students versus 9.8 percent of those who first entered comprehensive colleges. Among those who attended only 4-year colleges, 6.5 percent indicated part-time status at some time during their undergraduate careers, versus 20 percent of those engaged in alternating or simultaneous enrollment in 4-year and 2-year colleges.
28. Source: Data Analysis System (DAS), BPS90.
29. Unfortunately, Berkner, Cuccaro-Alamin, and McCormick did not make use of one of the most important filtering variables in the BPS90, namely the question of whether the respondents categorized themselves as students who happen to be employed (63.7%) or employees who happen to be students (36.3%). The distinction ripples through the entire dataset and any interpretation of student careers. Here are some examples of how the differences in primary status played out in the first institution of attendance (1989-1990):
Percent Primarily Students  Percent Primarily Employees
ALL  63.7  36.3
Level of First Institution
4-Year  77.5  22.5
2-Year  49.1*  50.9*
< 2-Year  38.0  62.0
Degree Working Toward
None  23.9  76.1
Certificate  37.4  62.3
Associate's  53.8  46.2
Bachelor's  76.2  23.8
Enrollment Intensity
Full-Time  73.6  26.2
Part-Time  31.6  62.0
Source: Data Analysis System (DAS), BPS90. * Not a statistically significant difference.
30. Hearn (1992) also warns us "to be suspicious of the measurement properties [validity, reliability] of aspirations, plans, and expectations indicators when the data are from the responses of middle or late adolescents" (p. 661).
31. Morgan (1996) translated the categorical variable of educational aspirations into years of schooling (e.g. a master's degree was worth 18 years, 6 of which were postsecondary). His notion was that if there were still differences in educational expectations of subgroups after controlling for SES (which he standardized to a scale in which the mean = 0 and SD = 1 in order to merge the HS&B/So and NELS-88 cohorts), then we have to look elsewhere to explain the residual. With this methodology, Morgan found that, net of dropouts, 29 percent of the students increased their expectations and 26 percent lowered their expectations between grades 10 and 12. This is a much less dramatic change than that revealed by the minimum educational level satisfaction questions treated as categorical variables.
32. The family income variable in the HS&B/So (as well as most other national data sets) is as tenuous as parental levels of education, but may be more important in analyses of college going, persistence, and completion. I chose to use the family income file for the HS&B/So base year (1980) prepared as a byproduct of a report to NCES estimating families' capacity to finance higher education for their children (Dresch, Stowe, and Waldenberg, 1985). The "Dresch file" examined all attendant features of family and student, and removed outlying cases. The analytic problem with the "Dresch file" is that it reports income in eight unequal bands. The distribution appears reasonable, but there are no means here, no standard deviations, and no regular intervals on a continuous scale. When set on a grid against SES quintiles, we lose nearly 25 percent of the HS&B/So universe. As Sewell and Hauser (1975) effectively demonstrated, noneconomic aspects of stratification are more important than the economic. So one should stick with the composite rather than lose such a large proportion of the sample.
33. NELS-88 provides both student and parent accounts, but 15 percent of the parents did not provide information on their highest levels of education, and 23 percent skipped the question on family income.
34. There were only 414 cases out of 14,825 in the data base that could be edited in this manner. In weighted numbers, the results indicate that we have been overestimating first generation college status by a minimum of 4 percent.
35. For students of "typical college age" in the Beginning Postsecondary Students Study of 1989-1994, we find the following with respect to factors in choosing the first institution of attendance, by general type of institution the student actually entered:
4-Year College  2-Year College  < 2-Year
School Close to Home an Important Factor in Choice  59.4%  75.4%  50.1%
First Institution Was 50 or Fewer Miles from Home  45.7%  90.0%  78.0%
Source: National Center for Education Statistics: Beginning Postsecondary Students, 1989-1994, Data Analysis System.
36. Transcript practices with respect to study abroad are highly variable, and require hand-and-eye reading to identify. Some may be highly explicit, e.g. "University of Heidelberg" or "Monterrey Semester." Others may use an abbreviation, e.g. "Bensacon," and the transcript reader simply has to know that a major French language training center is being referenced. In still other cases, the sequence of courses for a history major shifts in the spring term to "Art and Architecture of Florence," "The Age of Machiavelli," "Advanced Italian Conversation," Il Paradiso, and "Antonioni and Italian Cinematic Realism." The reader knows that it is highly unlikely that the student is attending school in the United States.
37. How does this estimate stack up against current (1999) claims of massive numbers of bachelor's degree students doing postbaccalaureate work for credit in community colleges? No one has conducted a complete census, let alone a census with unduplicated headcounts. But let us speculate. The HS&B/So is only one high school graduating class. Assume that there are ten high school graduating classes "in play" at the present moment, and that in each class of eventual bachelor's degree recipients there has been a 10 percent increase in the proportion attending a community college after the BA. This fairly generous set of assumptions results in an estimate of about a half-million credit students, or 9 percent of community college students currently enrolled for credit.
38. The outcome effects of selectivity have been remarkably consistent for two generations of college graduates, the NLS-72 and the HS&B/So:
NLS-72, 1972-84  HS&B/Sophomores, 1982-93
Graduate School Attendance by Age 30
Highly Selective  39.7%  43.0%
Selective  34.6  31.0
Non-Selective  23.1  17.9
Mean (S.D.) Undergraduate GPA
Highly Selective  3.16 (.51)  3.13 (.42)  Effect size = .07
Selective  3.01 (.50)  2.96 (.45)  Effect size = .11
Non-Selective  2.92 (.48)  2.86 (.42)  Effect size = .13
NOTES: (1) The universes are confined to students who earned bachelor's degrees and for whom an undergraduate GPA could be determined. NLS-72 weighted N = 732k; HS&B/So weighted N = 935k. (2) Differences in estimates of graduate school attendance rates are significant at p < .05. (3) Effect sizes for changes in mean GPA indicate no change. SOURCE: NCES: (1) National Longitudinal Study of the High School Class of 1972; (2) High School & Beyond/Sophomore Cohort, NCES CD #98-135.
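The effect sizes in the table can be approximated with a short Python sketch. The note does not state which formula was used, so the sketch assumes a standardized mean difference with an equally weighted pooled standard deviation; the function name is hypothetical, and the HS&B/So standard deviation for the Highly Selective row is read as .42.

```python
from math import sqrt


def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Absolute mean difference divided by a pooled standard deviation.

    ASSUMPTION: the two groups are weighted equally when pooling; the
    source does not say how its effect sizes were computed.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd


# (NLS-72 mean, SD, HS&B/So mean, SD) for each selectivity level, from the table.
rows = {
    "Highly Selective": (3.16, 0.51, 3.13, 0.42),
    "Selective": (3.01, 0.50, 2.96, 0.45),
    "Non-Selective": (2.92, 0.48, 2.86, 0.42),
}
effect_sizes = {label: standardized_difference(*values) for label, values in rows.items()}
```

Under this assumption the Selective and Non-Selective rows round to the published .11 and .13. The Highly Selective row comes out slightly below the published .07, which may reflect rounding in the printed means and standard deviations or a different pooling rule. All three values are small, consistent with the note that mean GPA did not change.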
39. The proportion of all undergraduate grades that were drops, no-penalty withdrawals, and no-penalty incompletes rose from 4 percent for the NLS-72 cohort to over 7 percent for the HS&B/So.
40. The variable, HSBSTAT, on the 1998 restricted CD release dataset, divides the entire universe of HS&B/So students into five groups with reference to their postsecondary status. Of these groups, only three are in the potential universe for analysis: (1) students for whom postsecondary transcripts were received (8,215, of whom 14 had died by 1986 and are not included), (2) students for whom transcripts were received, but the content of the transcripts was entirely GED-level or basic skills work (180, of which very few had attended 4-year colleges), and (3) students for whom transcripts were requested but not received, yet for whom the evidence allows imputation of college attendance (478). In the basic universe for multivariate analyses, the expansion group comes from the third of these categories. Relying on survey data, it was possible to determine the order of institutional attendance for students in the third of these groups, hence values for variables such as Transfer and NoReturn.
41. Of all students who entered postsecondary education, 7 percent of the men and 16.2 percent of the women became parents by 1986. Among students who attended 4-year colleges at any time, the figures were 3.3 percent (men) and 7.5 percent (women).
42. Had I not edited the "Children" variable to exclude contradictory and out-of-scope information, its contribution to the explanatory power of the model, as well as the overall explanatory power of the model, would have been 0.2 percent higher. What this tells us, albeit very indirectly, is that the people who provided contradictory or out-of-scope information about having children probably did not complete bachelor's degrees.
43. The statistical software packages commonly used for regression models (SAS, SPSS, STATA, and others) first set up all the variables in a correlation matrix. Only those independent variables whose correlations with the dependent variable (in our case, bachelor's degree completion) are statistically significant at p<.05 are allowed into the regression equation. p<.05 is a default value. One can change this criterion to be more or less generous. I have chosen to be more generous throughout the models in this study by setting the selection criterion to p<.20. The reader will note that, under those conditions, variables of marginal significance will not hold up when they are asked to take on an explanatory role.
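The screening step described here can be sketched in Python. This is an illustration of the idea only and does not reproduce the routine of any statistical package: the variable names and data are hypothetical, and the p-value uses the Fisher z approximation from the standard library rather than the exact test a package would apply.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z approximation (an assumption)."""
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def screen(candidates, outcome, criterion):
    """Keep the candidates whose correlation with the outcome has p < criterion."""
    return [
        name for name, values in candidates.items()
        if approx_p_value(pearson_r(values, outcome), len(outcome)) < criterion
    ]


# Hypothetical data: ten students, 1 = completed a bachelor's degree.
completed = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
candidates = {
    "var_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],  # strongly related
    "var_b": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],   # marginally related
    "var_c": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],   # weakly related
}
strict = screen(candidates, completed, 0.05)
generous = screen(candidates, completed, 0.20)
```

With these hypothetical data the default criterion admits only the strongest variable, while the more generous p < .20 criterion also admits the marginal one, which then has to hold up inside the regression equation.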
44. The California State University at Fullerton is not the California State University at Monterey Bay. Queens College of the City University of New York is not the John Jay College of Criminal Justice (also part of CUNY). Even a multicampus institution such as Southern Illinois University presents different environments for students who move among the campuses.
45. For the academic satisfaction index, responses to questions concerning the quality of teachers, the quality of instruction, curriculum, intellectual life of the school, and personal intellectual growth were combined. For the environmental index, responses to questions concerning the social life, cultural activities, recreation facilities, and adequacy of services and other facilities were combined.
46. Cabrera, Nora, and Castaneda (1993) ran into similar difficulties with a satisfaction measure of college finances, although the question on their survey involved total financial support ("I am satisfied with the amount of financial support [grants, loans, family assistance, and jobs] I have received while attending . . . ") and not costs. In an integrated persistence model, they found that the effects of this type of satisfaction were indirect, and expressed through academic integration and GPA. Even then, the structural coefficients of these effects were rather weak (.138 for academic integration and .104 for GPA).
47. White, African-American, and Asian-American students have very similar enrollment rates in community colleges, no matter how those rates are represented. In the following table, with three "parsings," it is obvious how much Latino attendance patterns differ from the others, and how much the community college means to Latino communities:
First Institution Attended  Ever Attended  The Only Institution Attended
White  39%  50%  25%
Black  39  51  24
Asian  36  55  20
Latino  54  66  40
Note: Comparisons of Latino estimates with others are significant at p < .05; comparisons among other race/ethnicity groups are not significant. Source: National Center for Education Statistics: High School & Beyond/Sophomore cohort. NCES CD #98-135.