The following article appeared in Telicom, the Journal of the International Society for Philosophical Enquiry.

Mostly About the SAT as an IQ Test

by Bob Williams

 

In the August 1999 issue of Telicom (page 10), Brien Donlan made some comments that I believe deserve a response. The context is set at the turn of the century. He wrote: "Rather, the intelligent people (men in virtually all the cases) followed careers in politics, government services, the Church or became entrepreneurs." This comment and the general discussion seemed to imply that women were not as intelligent as men. Perhaps my reading of the comment is in error, but I believe that there is no reason to conclude that the average intelligence of women has ever been lower than that of men. I also wonder how Mr. Donlan determined which professions intelligent men selected at that time. Missing from his list are professions such as science, engineering, mathematics, law, medicine, and the arts. Are we to believe that these were then pursued only by unintelligent men or by women?
 

Donlan: "Your observation that the average I.Q. of teachers and physicians has dropped is not reflected in the research into intelligence testing." This may be true or not, but for some reason, Mr. Donlan did not provide us with a reference to this research. If there is credible scientific research to support his contention, where is it?
 

The very next thing he writes: "In fact the Wechsler Intelligence Scale for Children has been revised five or six times in the last sixty odd years to accommodate the continually rising I.Q. scores of American children." It seems to me that this is a bit of a logical stretch. Yes, IQ tests (not just the WISC) have been renormed to adjust for the Flynn effect, but it is rather obvious that the WISC is appropriate for testing children, not teachers and physicians. Even if one takes this comment as proof that the average IQ has risen over the past 60 years, it does not follow that the average IQs of physicians and teachers have either increased or decreased.
 

Donlan: "...but the fact remains that students are getting brighter as measured by commonly accepted testing standards. On the other hand, achievement is declining as shown by the falling scores each year on the Scholastic Aptitude Tests given to high school graduates in the U.S. each year. This is a clear example of the differentiation between the intelligence measured on I.Q. tests and the academic ability measured on achievement tests." It was this comment that inspired me to write this article and is the subject of my remaining comments. My comments are not intended so much to dispute Mr. Donlan's claim (although I will do that), as to address commonly misunderstood aspects of the SAT and the relation of the SAT to standard IQ tests.
 

Are SAT Scores Really Declining?
 

Students are scoring higher on IQ tests, as we all know and as we have discussed in Telicom in the past. In my article "The Flynn Effect" (May/June 1999), I pointed out that the Flynn Effect gains primarily affect the lower half of the IQ distribution curve. It seems inappropriate to compare the shift in average IQ scores with the shift in average SAT scores without first accounting for the different behaviors of the parts that make up the whole distribution of test results. (More on this later.)
 

An example of the effect of the size of the test population on average SAT scores can be seen by examining the 1999 test results on a state-by-state basis. Overall, 43% of high school seniors took the test. In states where the percentage was low, one would expect to see relatively high average state scores because the least intelligent students probably did not take the test. Here are some examples:(1)
 

[The first column shows the average verbal score; the second, average math score; and the third, the percentage of high school graduates tested.]
 

State   Verbal  Math  % tested      State   Verbal  Math  % tested
Ark.    563     556   6             D.C.    494     478   77
Iowa    594     598   5             Ga.     487     482   63
Mo.     572     572   8             Fla.    499     498   53
Neb.    568     571   8             Ind.    496     498   60
N.D.    594     605   5             N.J.    498     510   80
S.D.    585     588   4             N.Y.    495     502   76
Wis.    584     595   7             S.C.    479     475   61

When an IQ test is renormed, the new norms are based on the entire population, as reflected in a statistically random sample of individuals. The SAT, however, is not given to a random group of students; it is given to those who want to apply for admission to a college.(2) If the SAT is taken by only a few bright kids, the average score will be relatively higher than if it is taken by larger numbers of lower-IQ college applicants, as is clearly illustrated by the preceding data. In years when the average SAT score fell, this phenomenon accounted for most of the fall.(3)
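
The selection effect described above can be checked roughly against the state figures and against a toy model. The sketch below does two things: it computes the correlation between the percentage tested and the combined score for the fourteen states listed (a hand-picked set of extremes, not a random sample of states), and it computes the mean of the top fraction of a normal distribution. The second part assumes, purely for illustration, that test-takers are exactly the top scorers of a population with mean 100 and standard deviation 15; real self-selection is far less tidy. All function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# (state, verbal, math, percent of graduates tested), copied from the 1999 figures above.
STATES = [
    ("Ark.", 563, 556, 6), ("Iowa", 594, 598, 5), ("Mo.", 572, 572, 8),
    ("Neb.", 568, 571, 8), ("N.D.", 594, 605, 5), ("S.D.", 585, 588, 4),
    ("Wis.", 584, 595, 7), ("D.C.", 494, 478, 77), ("Ga.", 487, 482, 63),
    ("Fla.", 499, 498, 53), ("Ind.", 496, 498, 60), ("N.J.", 498, 510, 80),
    ("N.Y.", 495, 502, 76), ("S.C.", 479, 475, 61),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_of_top_fraction(p, mu=100.0, sigma=15.0):
    """Mean of the top fraction p of a normal(mu, sigma) population.

    Idealized assumption: only the highest-scoring fraction p takes the test.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return mu
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)          # cutoff in standard-deviation units
    return mu + sigma * std.pdf(z) / p


participation = [pct for _, _, _, pct in STATES]
combined = [verbal + math for _, verbal, math, _ in STATES]
r = pearson(participation, combined)
print(f"correlation of % tested with combined score: {r:.2f}")

for p in (0.05, 0.43, 0.80):
    print(f"top {p:.0%} of the toy population: mean {mean_of_top_fraction(p):.1f}")
```

In the toy model the average of the test-takers falls steadily as the tested fraction grows, even though nobody in the underlying population has changed at all.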
 

That this phenomenon is an embedded one (within the distribution of scores) can be seen by examining the PSAT data. Unlike the SAT, the PSAT is given to all high school juniors.(4) Even during periods when the average SAT scores were in decline, the PSAT scores did not decline.(5) The implication is that any losses by college-bound students were offset by the other students (the same ones who have seen the greatest Flynn Effect score gains).
 

A further examination of the components of the distribution of SAT scores reveals that the top scores are increasing. The number of students scoring above 650 has increased by 65% over the period 1941 to 1994.(6) The point is that SAT average scores cannot be properly interpreted unless the details of the yearly tests are examined and compared to identical segments in prior years.(7)
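
The need to compare identical segments can be shown with a small weighted-average calculation. In the sketch below every number is hypothetical and chosen only to make the arithmetic visible: each segment's mean rises by five points between the two years, yet the overall mean falls because the lower-scoring segment makes up a much larger share of the later pool.

```python
def overall_mean(segments):
    """Overall mean from a list of (number_of_test_takers, segment_mean) pairs."""
    total = sum(n for n, _ in segments)
    return sum(n * m for n, m in segments) / total


# Hypothetical pools: (test-takers, mean score) for a higher- and a lower-scoring segment.
earlier_year = [(100, 600), (100, 450)]
later_year = [(100, 605), (300, 455)]   # both segment means are 5 points higher

print("earlier overall mean:", overall_mean(earlier_year))
print("later overall mean:  ", overall_mean(later_year))
```

A falling overall average is therefore compatible with every segment of the test-taking pool doing better than before.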
 

But Mr. Donlan seems to imply that SAT scores have been falling for a long time, including recently. That simply is not true. Comparing SAT scores for 1989 and 1999, the following groups have experienced increases in both verbal and math scores: American Indian/Alaska Native, Asian/Pacific Islander, Black, Puerto Rican, and White. Only Latinos (excluding Mexican Americans and Puerto Ricans) and Mexican Americans had a net decrease over the period.(8)
 

What does the SAT measure?
 

Mr. Donlan says there is a difference between IQ tests and the SAT. There may or may not be a difference, but if there is a difference it is slight. The Bell Curve:(9) "The SAT was originally designed to be an intelligence test targeted for the college-going population and was originally validated against existing intelligence tests. For a modern source showing how carefully the College Board avoids saying the SAT measures intelligence while presenting the evidence that it does, see Donlon 1984."(10) In fact, the test was developed by Princeton professor Carl Brigham, who had been a member of the Army I.Q. testing team during the First World War. One of its first applications was by Harvard president James Bryant Conant in his establishment of the Harvard national scholarship program. He was looking for a way to find and admit capable students from parts of the U.S. where the university would not otherwise have looked. Newsweek reports: "There was one point about it on which Conant repeatedly demanded reassurance: was it a pure test of intelligence, rather than of the quality of the taker's education? Otherwise he was concerned that bright boys who had been born into modest circumstances and gone to poor schools would be penalized." Only after being convinced that the SAT was a pure intelligence test did Conant implement its use.(11)
 

Consider the correlations between various standard tests and the WAIS:

WAIS to Stanford Binet = 0.77

WAIS to Raven's = 0.72

WAIS to Otis = 0.78

WAIS to SAT = 0.80(12)
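
One common way to read correlations like these is to square them, which gives the proportion of variance that two tests share. The sketch below does only that arithmetic on the four figures quoted above. It makes no correction for test reliability or restriction of range, and the names used are illustrative rather than taken from any of the cited sources.

```python
# Correlations with the WAIS, as quoted in the list above.
WAIS_CORRELATIONS = {
    "Stanford Binet": 0.77,
    "Raven's": 0.72,
    "Otis": 0.78,
    "SAT": 0.80,
}


def shared_variance(r):
    """Proportion of variance shared by two measures with correlation r."""
    return r * r


for test, r in WAIS_CORRELATIONS.items():
    print(f"WAIS to {test}: r = {r:.2f}, shared variance = {shared_variance(r):.2f}")
```

On this reading the SAT overlaps with the WAIS at least as much as the other listed tests do.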
 

The designers of the SAT benchmarked it against the Otis;(13) the similarity of correlations between the SAT and the WAIS was no accident.(14) It is no wonder that high IQ societies (including Mensa, Intertel, ISPE, and TNS) have accepted the pre-1994 SAT as proof of membership qualification. TNS is presumably going to continue to accept it, with an adjusted score (to compensate for recent tinkering).
 

What should the SAT measure?
 

Francis Carter: "Psychometricians call 'IQ' tests aptitude tests. They call measures of accomplishment of specific abilities proficiency tests (e.g., licensing tests for pilots) or achievement tests (e.g., the final exam for Algebra 101)."(15) IQ tests are generally capable of measuring psychometric g, group factors,(16) and specificity. Anything not falling into one of these categories is error. If the SAT measures g, it is measuring the most reliable and useful parameter for predicting success in a wide range of areas, such as job performance and academics.(17) The measurement of group factors completes that portion of IQ measurement which is not g. Specificity, as measured by a test item, reflects specific learning directed at the question asked; it is not correlated with other items and is lumped together with measurement error.(18) So, a test of problem solving would contain more of the IQ components and less specificity, unless the task at hand were taught to the point where the problem being solved contains little novelty. Questions concerning dates, names, and other memorized material will be less g loaded and reflect greater specificity. The latter may indeed be more a matter of achievement, for the simple reason that specificity is not a measure of intelligence. The problem is that when an individual learns by rote, the material that is learned is not transferable to any other task and is not a good indicator of future accomplishment, especially in the general sense of academic achievement.
 

There is also the concern that Conant expressed--that it is more unfair to admit students on the basis of the quality of their high school circumstances than on the basis of their innate abilities, as measured by IQ tests. Conant seems to have had a good point and one that is also consistent with the well-established predictive validity of IQ tests for scholastic achievement.(19) Some 80 to 90 percent of the predictable variance in scholastic performance is accounted for by g,(20) which is the primary measurement of standard IQ tests. If the purpose of the SAT is to predict future academic success, it can best do that by emulating heavily g-loaded IQ tests.(21) One of the reasons for this is that reading comprehension has a particularly high g-loading.(22) The central role of reading comprehension in learning is self-evident.
 

Conclusions
 




 
 

References
 

Seligman, D. (1994). A Question of Intelligence: The IQ Debate in America. New York: Citadel Press.
 

Herrnstein, R. J. & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American Life. New York: Free Press.
 

Jensen, A. R. (1998). The g Factor: The Science of Mental Ability. Westport: Praeger.
 

Notes
 

1. The New York Times, "Average Math and Verbal SAT Scores" By The Associated Press, August 31, 1999.

2. The pool of students who actually took the SAT has not been consistent from one year to another. It has varied not only because the pool has grown increasingly large, but also because college admission requirements have changed. In the mid-60s, many state universities dropped the requirement that applicants submit SAT scores. This caused a decline in the numbers of middle level students who took the SAT, thereby again influencing the average scores in a way that cannot be seen without very close examination of the details. See Herrnstein, R. J. & Murray, C. 1994, P. 426.

3. Jensen 1998, P. 322.

4. ETS conducted "national norm studies" in 1955, 1960, 1966, 1974, and 1983.

5. Seligman, D. 1994, P. 179. Also, Herrnstein, R. J. & Murray, C. 1994, P. 423 "American eleventh graders as of 1983 were, as a whole, roughly as well prepared in both verbal and math skills as they had been when the college-bound SAT scores were at their peak in 1963, and noticeably stronger in their verbal skills than they had been in the first norm study in 1955."

6. Bracey, G. W. (1994). The fourth Bracey report on the condition of public education. Phi Delta Kappan, 76, 115-127. Earlier Bracey reports were printed in the October 1993, 1992, and 1991 issues of the Phi Delta Kappan.

7. The record is even more confusing, depending on the SAT score level examined, the specific years considered, and whether one examines the SAT total, math, or verbal. For a detailed dicing and slicing, see "Inequity in Equity: How 'Equity' Can Lead to Inequity for High-Potential Students" Benbow, C. P. and Stanley, J. C. Psychology, Public Policy and Law, 1996, Vol. 2, No. 2.

8. Source: The College Board; as printed in the Los Angeles Times, September 1, 1999, "Ethnic Gap Widens in SAT College Exam Scores," M. GROVES, R. COOPER.

9. Herrnstein, R. J. & Murray, C. 1994, P. 744

10. Donlon, T.E. 1984. The College Board Technical Handbook For The Scholastic Aptitude Test And Achievement Tests. New York: College Entrance Examination Board.

11. Newsweek, September 6, 1999, "Behind the SAT" By Nicholas Lemann

12. Seligman, D. 1994, P. 167

13. Herrnstein, R. J. & Murray, C. 1994, P. 39

14. Herrnstein, R. J. & Murray, C. 1994, P. 38 "In its first annual report, a Commission appointed by the College Entrance Examination Board provided a table for converting the SAT of that era to IQ scores."

15. "Notes, Quotes, and Anecdotes" by Francis Carter, The Mensa Research Journal, Winter 1999, P. 7

16. For a lengthy discussion of group factors, see Jensen 1998, chapters 3 and 4.

17. Reviewed at length in the three references to this article and in a wide variety of other publications.

18. Jensen 1998, P. 34 Also, see page 111: "By definition a given task's specificity lacks the power to predict performance significantly on any other tasks except those that are very close to the given task on the transfer gradient."

19. Jensen 1998. P. 227. Notes that Psychological Abstracts contains some 11,000 citations to support this claim.

20. Thorndike, R. L. (1984). Intelligence As Information Processing: The Mind and the Computer. Bloomington, IN: Center on Evaluation, Development, and Research.

21. Jensen 1998. P. 182 "There is no better predictor of scholastic achievement than psychometric g, even when the g factor is extracted from tests that have no scholastic content."

22. Thorndike, E. L. (1917). Reading as Reasoning: A Study of Mistakes in Paragraph Reading. Journal of Educational Psychology, 8, P. 323-332.

23. Herrnstein, R. J. & Murray, C. 1994, P. 427 note that although educational intervention does not raise IQ, "it may be within the capability of the educational system--probably with the complicity of broader social trends--to put a ceiling on, or actually dampen, the realized intelligence of those with high potential."

