Answers in the Tool Box: Academic Intensity, Attendance Patterns, and Bachelor's Degree Attainment (June 1999)
There are many tables in this document, both in the text and in the notes. Some are derived or constructed from other published sources or from the Data Analysis System (DAS) presentations of data sets published on CD-ROM by the National Center for Education Statistics. But most of the tables in this publication were prepared using special analysis files created from the High School and Beyond/Sophomore Cohort (HS&B/So) longitudinal study, and it is helpful to know something about the statistical standards that lie behind these tables and the decision rules that were used in presenting the data.
The populations in all NCES age-cohort longitudinal studies are national probability samples first drawn when the students were in high school or middle school. In the case of the HS&B/So, the design involved, first, a stratified sample of secondary schools with an over-sampling of schools in minority areas and, second, a random sample of 10th-grade students within those schools. The original sample was then weighted to match the national census of all 10th-graders in 1980 (about 3.7 million people). Each participant carries a weight in inverse proportion to the probability that he or she would be selected by chance. The HS&B/So base year sample was what statisticians call "robust": 28,000. After the base year, every subsequent survey was a subset of the original, and the weights carried by participants are modified accordingly. In the penultimate survey of the HS&B/So in 1992, there were 12,640 respondents out of 14,825 surveyed (of whom 155 had died). The postsecondary transcript file for the HS&B/So has 8,395 cases, and the special version of the transcript file used in this study has 8,873 cases. These are still very robust numbers. They represent populations in the millions. By the conclusion of any of these longitudinal studies, a student is carrying a half-dozen different weights, depending on what question is asked.
For the High School and Beyond cohort, for example, I used three basic weights in the tables in this study: a "senior year" weight for a question such as the relationship between the highest level of mathematics studied in high school and whether someone eventually earns a bachelor's degree; a "primary postsecondary transcript weight" for analyses of degree attainment for anyone who, the evidence indicates, attended a 4-year college; and a "secondary transcript weight" for any question that would be compromised if students with incomplete postsecondary transcript records were included. Where correlation matrices and multivariate analyses are involved, each of these basic weights must be modified by the population that possesses positive values for all variables in an equation.
For example, the basic senior year weight for a group of students confined to those who graduated from high school on time and for whom we have SES data, known race, and all three components of ACRES (senior year test, class rank/GPA, and curriculum intensity & quality), would be modified as follows:
Weight2 = senior year weight / (2,320,762 / 8,844)
The numbers in parentheses are the weighted N and raw N derived from a simple cross-tabulation of any two of the variables in this set, using the senior year weight. Weight2 is what is carried into a correlation matrix for students with this set of variables. In the execution of the correlation matrix itself, I did not allow for pairwise missing values. The software program is instructed, "NOMISS." This decision derives from the historical approach to the data as discussed in the "Introduction" to this monograph.
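The renormalization above can be sketched in a few lines of Python; the function name is illustrative, and the figures are the weighted and raw Ns from the example.

```python
weighted_n = 2_320_762  # weighted N from the cross-tabulation
raw_n = 8_844           # raw (unweighted) N

def renormalize(senior_year_weight: float) -> float:
    """Scale a senior-year weight so the weighted total equals the raw N."""
    return senior_year_weight / (weighted_n / raw_n)

# A student carrying the average weight (weighted N / raw N) ends up with
# a Weight2 of exactly 1.0, so Weight2 sums to the raw N across all cases.
print(round(renormalize(weighted_n / raw_n), 6))  # -> 1.0
```

The division by (weighted N / raw N) is what keeps the correlation matrix from overstating its effective sample size: the weighted cases sum to 8,844, not 2.3 million.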
More important are issues of standard errors of measurement and significance testing. What you see in the tables are estimates derived from samples. Two kinds of errors occur when samples are at issue: errors in sampling itself, particularly when relatively small sub-populations are involved, and non-sampling errors. Non-sampling errors are serious matters. Good examples would include non-response to specific questions in a survey or missing college transcripts. Weighting will not address the panoply of sources of non-sampling errors.
The effects of sampling and non-sampling errors ripple through data bases, and, to judge the accuracy of any analysis, one needs to know those effects. When the unit of analysis is the student, this is a straightforward issue. When we ask questions about combinations of institutions attended, bachelor's degree completion rates by selectivity of first institution of attendance, or highest level of mathematics studied in high school, we are asking questions about non-repetitive behaviors of people who were sampled. To judge comparisons in these cases we use the classic "Student's t" statistic that requires standard errors of the mean. But because the longitudinal studies were not based on simple random samples of students, the technique for generating standard errors involves a more complex approach known as the Taylor series method. For the descriptive statistics in this report, a proprietary program incorporating the Taylor series method, called STRATTAB, was used.
It is important to note that STRATTAB will provide neither estimates nor standard errors for any cell in a table in which the unweighted N is less than 30. For those cells, the program shows "LOW N." Table 20 on page 45 illustrates the frequency of LOW N cells that occur when one is making multiple comparisons among categories of an independent variable.
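A minimal sketch of the suppression rule, assuming only the unweighted-N threshold described above (the function and its output format are illustrative, not STRATTAB's):

```python
def report_cell(estimate: float, unweighted_n: int, threshold: int = 30) -> str:
    """Suppress any cell whose unweighted N falls below the threshold."""
    if unweighted_n < threshold:
        return "LOW N"
    return f"{estimate:.1f}"

print(report_cell(63.1, 412))  # -> 63.1
print(report_cell(63.1, 18))   # -> LOW N
```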
Most of the tables in this monograph include standard errors of the estimates and/or an indication of which comparisons in the table are significant at the p<.05 level using the classic "Student's t" test. The text often discusses these cases, and, when appropriate to the argument, offers the t statistic. A reader interested in comparing categories of a dependent variable that are not discussed can use the standard errors and employ the basic formula for computing the "Student's t":

t = (P1 - P2) / sqrt(se1^2 + se2^2)

where P1 and P2 are the estimates to be compared and se1 and se2 are the corresponding standard errors. If, in this case, t > 1.96, you have a statistically significant difference such that the probability that this observation would occur by chance is less than 1 in 20. In the case of multiple comparisons, the critical value for t rises following the formula for Bonferroni Tests: if H comparisons are possible, the critical value for a two-sided test is Z(1 - .05/2H). For a table showing the Z required to ensure that p<.05/H for particular degrees of freedom, see Dunn, O.J., "Multiple Comparisons Among Means," Journal of the American Statistical Association, vol. 56 (1961), p. 55.
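The comparison of two estimates and the Bonferroni adjustment described above can be sketched as follows, using the standard normal quantile for the critical value (function names and the sample figures are illustrative, not taken from the monograph):

```python
import math
from statistics import NormalDist

def students_t(p1: float, p2: float, se1: float, se2: float) -> float:
    """t = (P1 - P2) / sqrt(se1^2 + se2^2)."""
    return (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)

def bonferroni_critical_value(h: int, alpha: float = 0.05) -> float:
    """Two-sided critical value Z(1 - alpha/2H) for H possible comparisons."""
    return NormalDist().inv_cdf(1 - alpha / (2 * h))

# Illustrative estimates and standard errors:
t = students_t(90.8, 63.1, 2.0, 2.5)
print(round(t, 2))                             # well above 1.96
print(round(bonferroni_critical_value(1), 2))  # -> 1.96 for a single comparison
print(round(bonferroni_critical_value(6), 2))  # larger when H = 6
```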
In multivariate analyses of a stratified sample such as any of the NCES longitudinal studies, it is necessary to adjust the standard errors produced by the software package (SPSS or SAS) by the average design effect, or DEFT (see Skinner, Holt, and Smith, 1989). Software packages such as SPSS or SAS assume simple random sampling when computing standard errors of parameter estimates. The DEFT for any population is computed by dividing the Taylor series standard error by the simple random sample standard error. The design effects for the HS&B/So populations used in the tables of this study range from 1.49 to 1.64.
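A sketch of the calculation, under the convention that the design-based (Taylor series) standard error is the larger of the two, so that the DEFT exceeds 1 as the 1.49 to 1.64 range implies (the numbers here are illustrative, not HS&B/So values):

```python
def deft(taylor_se: float, srs_se: float) -> float:
    """Design effect: ratio of the Taylor series SE to the simple-random-sample SE."""
    return taylor_se / srs_se

def adjusted_se(srs_se: float, design_effect: float) -> float:
    """Inflate the SRS standard error from the software package by the DEFT."""
    return srs_se * design_effect

d = deft(0.0164, 0.0100)
print(round(d, 2))                       # -> 1.64, the top of the cited range
print(round(adjusted_se(0.0100, d), 4))  # -> 0.0164
```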
These design effects are then carried over into the determination of significance in correlation matrices and regression analyses.
For correlation matrices, the formula for the standard two-tailed t-test thus becomes:
Step 1: Determine the adjusted standard error of the correlation coefficient (r).
Step 2: Determine t = r / s.e.(r)
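The two steps can be sketched as follows, assuming the conventional unadjusted standard error of r, sqrt((1 - r^2) / (N - 2)), multiplied by the design effect; this particular form for s.e.(r) is an assumption, not taken from the monograph.

```python
import math

def t_for_correlation(r: float, n: int, design_effect: float) -> float:
    # Step 1: adjusted standard error of r (assumed form: DEFT times
    # the conventional sqrt((1 - r^2) / (N - 2)))
    se_r = design_effect * math.sqrt((1 - r ** 2) / (n - 2))
    # Step 2: t = r / s.e.(r)
    return r / se_r

# e.g., r = 0.30 across 8,844 cases with the largest cited DEFT:
print(round(t_for_correlation(0.30, 8844, 1.64), 1))
```

Note that the DEFT shrinks t: the same correlation is harder to declare significant than it would be under simple random sampling.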
For regression analyses, the adjustment is analogous: the standard error of each parameter estimate is multiplied by the DEFT, so the standard two-tailed t-test becomes t = b / (DEFT x s.e.(b)), where b is the parameter estimate. Equivalently, the t statistic reported by the software package is divided by the DEFT.
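Because the adjustment simply inflates the simple-random-sample standard error by the DEFT, the package's t statistic for a coefficient can equivalently be divided by the DEFT. A sketch with illustrative numbers (not from the monograph):

```python
def adjusted_t(b: float, srs_se: float, design_effect: float) -> float:
    """t for a regression coefficient after inflating its SRS standard error."""
    return b / (design_effect * srs_se)

# Illustrative coefficient and SRS standard error:
print(round(adjusted_t(4.10, 1.0, 1.64), 2))  # -> 2.5
# Equivalently, divide the package's t statistic by the DEFT:
print(round((4.10 / 1.0) / 1.64, 2))          # -> 2.5
```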
In recent years, a variety of analyses prepared for the National Center for Education Statistics have employed a strategy of adjusting estimates by covariation among control variables in a given table. Under this strategy, each value of a variable is turned into a dichotomy and regressed on all the other variables under consideration. The parameter values in the regression equation are then used to adjust the estimate. The result basically says, "If the students in the dependent variable evidenced the same configuration of values on the independent variables as everybody else, this is the estimated percentage who would do X."
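The adjustment procedure can be sketched with invented coefficients: the same fitted regression parameters are evaluated at the overall means of the control variables rather than at the group's own means. All names and numbers here are hypothetical, for illustration only.

```python
def estimate_at(intercept: float, coefs: dict, values: dict) -> float:
    """Evaluate a fitted linear equation at a given profile of covariates."""
    return intercept + sum(coefs[k] * values[k] for k in coefs)

# Invented parameters for a dichotomized outcome regressed on two controls:
coefs = {"ses": 8.0, "risk_factors": -5.0}
group_profile = {"ses": -0.4, "risk_factors": 1.8}   # the group's own means
overall_profile = {"ses": 0.0, "risk_factors": 0.9}  # everybody else's means

unadjusted = estimate_at(60.0, coefs, group_profile)
adjusted = estimate_at(60.0, coefs, overall_profile)
print(round(unadjusted, 1), round(adjusted, 1))  # -> 47.8 55.5
```

The gap between the two figures is the crux of the argument that follows: adjustment answers a hypothetical question about students who do not, in fact, share everybody else's profile.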
In the course of this monograph, many studies employing this procedure have been cited (e.g. Cuccaro-Alamin and Choy, 1998; Horn, 1997; Horn, 1998; McCormick, 1997; McCormick, 1999). Given the fundamental question addressed in this study, I have chosen not to invoke the "adjustment of means" procedure, preferring instead to let the series of regression equations in Part V tell the story. I do so because in the world subject to practical intervention by the "tool box," people do not evidence the same configuration of characteristics or behaviors as everybody else, and the messages one might convey on the basis of adjusted percentages might not be helpful.
For example, in one table, Horn (1997, p. 42) looks at the relationship between high school mathematics course sequence and college enrollment (access) for students with one or more "risk factors" in the NELS-88 cohort. An excerpt from this table should dramatize the case.
| High School Math | Unadjusted Percentage of Students Enrolling in Postsecondary Education by 1994 | Adjusted Percentage of Students Enrolling in Postsecondary Education by 1994 |
|---|---|---|
| Algebra I and Geometry | 63.1 | 70.7 |
| At Least One Advanced Course | 90.8 | 73.9 |
The message to "at risk" students of the "adjusted percentage" is that the highest level of mathematics one reaches in high school really doesn't make that much of a difference in college access. In the real (unadjusted) world, precisely the opposite occurs. Now, which message should we be delivering to all students and particularly to those "at risk"? Enough said.