Intelligence and g

Grady M. Towers

Reprinted from Lucid Vol. VIII, No. 4/5 (#42/43), Aug./Oct. 1988.
Lucid is the newsletter of the Mensa “Truth SIG.” This issue was edited by
Dale Adams. Some of Grady’s essays were printed in more than one place;
if this has been published elsewhere, a further citation will appear in Noesis.

No anthropologist believes that IQ tests measure intelligence. At best he believes that IQ tests measure only a small part of intelligence, and by far the least important part. This is because the anthropologist does not use the word intelligence in the same way as the psychometrician uses it. The anthropologist thinks of intelligence as the individual’s global capacity to adapt to his environment and to exploit it to his, and his group’s, advantage. To the anthropologist, any nonphysical ability possessed by man but not possessed by animals, or possessed in only a rudimentary way by animals, is a legitimate manifestation of intelligence. An individual’s ability to sing, dance, create art, see visions, or fashion tools is as much a part of man’s intelligence as his ability to do geometry or argue philosophy. IQ tests are good estimates of the latter, but have little correlation with the former, and in the context of man’s evolutionary history, the anthropologist considers these non-IQ attributes to be the more important. There can be little wonder, then, that anthropologists regard IQ tests with skepticism. IQ tests are not measures of general adaptability.

There’s a syllogism that one sometimes finds in books on logic that goes like this:

No horse has two tails.

Every horse has one more tail than no horse.

Therefore, every horse has three tails.

This fallacy is called here the fallacy of the undistributed middle, although logicians would more usually classify it as equivocation. The erroneous conclusion is the result of giving the expression “no horse” two entirely different meanings.

Much of the confusion that exists with regard to intelligence testing is a consequence of committing the fallacy of the undistributed middle. The reason that people disagree on whether or not IQ tests are legitimate measures of intelligence is that they are using the word intelligence in two entirely different ways.

Just as the anthropologist employs the term intelligence in his own way, most people seem to use the word as a synonym for mental power, and most of them accept without much question that mental ability tests are valid indicators of mental power. This may be one reason why special education for the gifted sometimes meets with such opposition; why give special privileges to those who are already powerful? But to equate mental ability with mental power is to commit the fallacy of the undistributed middle as surely as confusing the anthropologist’s definition with that of the psychometrician. IQ tests are not measures of mental power.

The modern theory of mental ability rests upon a mathematical procedure known as factor analysis. The fundamental idea of factor analysis is that a multitude of indicators (items in a test) may be “explained” by a relatively small number of factors. The motivation is to account for as wide a range of indicators as possible in the most economical way possible. If a factor recurs in many investigations, and on different samples of individuals, the factor takes on the status of a hypothetical construct. A good example of a hypothetical construct in the physical sciences is gravity. It can’t be detected directly, but its effects can be seen and measured, and although we may have no idea as to its ultimate nature, a great deal of practical use can be made of it and its associated measurements; the concept of gravity “explains” much. All science aims at parsimony of explanation, and the technique of factor analysis is one of the most powerful of all tools employed by social scientists to achieve this aim.

When intelligence tests are factor analyzed, seven factors are normally extracted: verbal meaning, verbal fluency, reasoning, number, space, memory, and perceptual speed. But all of these factors correlate, or overlap, with one another to such a degree that what is common to all of them accounts for most of the variance in test scores. This common property is itself a factor, the general factor, and has been given the symbol g. All of these factors now have the status of hypothetical constructs, but g is by far the most important of them. A test is an intelligence test only insofar as it is saturated with g. Psychometricians make a conceptual distinction between intelligence and g, but for all practical purposes they treat both terms as interchangeable.
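The way a general factor emerges from overlapping abilities can be sketched numerically. The example below is only an illustration under stated assumptions: the seven loadings are invented, not taken from any study, the correlation matrix is built from them under a single-factor model, and the first principal component stands in for a full factor analysis. The names `assumed_loadings` and `first_principal_factor` are hypothetical.

```python
import numpy as np

FACTORS = ["verbal meaning", "verbal fluency", "reasoning", "number",
           "space", "memory", "perceptual speed"]

# Hypothetical g loadings, invented purely for illustration.
assumed_loadings = np.array([0.85, 0.75, 0.85, 0.70, 0.70, 0.65, 0.60])

# Correlation matrix implied by one common factor: r_ij = l_i * l_j,
# with ones on the diagonal.
corr = np.outer(assumed_loadings, assumed_loadings)
np.fill_diagonal(corr, 1.0)


def first_principal_factor(corr_matrix):
    """Return loadings on the first principal component and its variance share."""
    eigvals, eigvecs = np.linalg.eigh(corr_matrix)  # eigenvalues in ascending order
    top_val = eigvals[-1]
    top_vec = eigvecs[:, -1]
    if top_vec.sum() < 0:  # the sign of an eigenvector is arbitrary; make it positive
        top_vec = -top_vec
    loadings = np.sqrt(top_val) * top_vec
    share = top_val / corr_matrix.shape[0]
    return loadings, share


g_loadings, g_share = first_principal_factor(corr)
for name, loading in zip(FACTORS, g_loadings):
    print(f"{name:17s} {loading:.2f}")
print(f"share of total variance: {g_share:.0%}")
```

With these invented numbers, every ability loads positively on the first component, and that single component accounts for more than half of the total variance, which is the sense in which what is common to all the factors dominates.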

There are no pure measures of g. The Raven Progressive Matrices, and other culture reduced tests, are currently the best approximations of pure g. Consequently, an IQ from a more traditional, culture loaded test is necessarily contaminated by other factors such as the verbal, numerical, and spatial factors. This contamination means that, due to the different degrees and kinds of factors present, scores derived from different IQ tests aren’t always interchangeable. However, these contaminating factors can also improve the predictive power of a test, making it a better test, even though it’s a less accurate measure of g. IQ tests, therefore, measure more than intelligence in the g sense, but less than intelligence in the anthropological or common sense meaning of the word.

All seven factors correlate highly with the general factor, but some correlate more highly than others. The two with the highest g loadings are the verbal and reasoning factors. Psychometricians sometimes make use of this by constructing their tests from verbal and reasoning items to the exclusion of other types of items. This can both shorten and improve a test. Those tests that load highly on the verbal factor are said to be culture loaded, and those that make use of reasoning items to the exclusion of verbal items are said to be culture reduced. (This is an oversimplification, but will have to do for now.) The Concept Mastery Test is a good example of a highly verbal, culture loaded test, and the Raven Progressive Matrices is a good example of a culture reduced test that loads highly on the reasoning factor alone. A test like the Cattell III, which Mensa uses as an entrance examination, is almost evenly divided between these two kinds of factors.

Each kind of test, culture loaded or culture reduced, has its own peculiar virtues and defects. If the purpose in employing a test is to predict immediate, academically relevant performance, the highly verbal, culture loaded test is the better choice. This is true even with members of other cultures when the criterion being predicted is performance in our own culture in the near future.

The disadvantage of culture loaded tests is that they’re less relevant in the long run and that they tend to become obsolete even in their own culture. It’s necessary to renorm them periodically.

The advantages of culture reduced tests are that they are able to predict nonacademically relevant performance better than culture loaded tests, and they’re thought to be better predictors of future, or life-long performance. They’re also able to predict, to a degree, some kinds of performance in non-Western cultures. One of their greatest advantages is that they do not tend to become obsolete, and therefore can provide accurate information about population trends over long periods of time.

The culture reduced tests’ chief disadvantages are that they do not predict culturally relevant performances as well as culture loaded tests do, and that scores on such tests fluctuate more on a day-to-day basis. Important decisions made on the basis of culture reduced testing should be made using more than one testing. This is also true of the more traditional tests, but to a lesser extent. Presumably a test like the LAIT, which takes a long time to complete, will smooth out the daily fluctuations so that this criticism is not relevant to power tests of great length.
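The benefit of repeated testing can be put in rough numerical terms with the classical Spearman-Brown formula, which estimates the reliability of a score averaged over several parallel testings. This is offered only as a sketch: the essay does not cite the formula, the reliability values are hypothetical, and the function name `spearman_brown` is chosen here for illustration. It also assumes the testings are parallel forms with independent errors.

```python
def spearman_brown(single_reliability: float, n_testings: int) -> float:
    """Estimated reliability of the mean of n parallel testings.

    Uses the Spearman-Brown formula r_n = n * r / (1 + (n - 1) * r), where r is
    the reliability of one testing. Assumes parallel forms and independent errors.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_testings < 1:
        raise ValueError("need at least one testing")
    r = single_reliability
    return n_testings * r / (1 + (n_testings - 1) * r)


# Hypothetical single-session reliability for a test whose scores
# fluctuate noticeably from day to day.
for n in (1, 2, 3, 4):
    print(n, round(spearman_brown(0.70, n), 3))
```

Under these assumptions each added testing raises the reliability of the averaged score, with diminishing returns. The same logic applies to lengthening a single test, which fits the remark above about long power tests.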

Psychometricians make use of the other factors in constructing other kinds of ability tests. Clerical ability tests, for example, are heavily saturated with the numerical and perceptual speed factors. It’s interesting to note that the numerical factor correlates just as highly with measures of spelling and grammar as it does with measures of computational ability. A better name for this factor would probably be automatized reasoning.

Test makers are always interested in improving the performance of their tests. In practical terms, this means making their tests correlate more and more highly with some real world criterion such as grades in school, highest grade completed, income, and the like. One interesting attempt to do this was L.L. Thurstone’s construction of a test to measure the Primary Mental Abilities. The idea was to give separate scores for all seven factors discovered by factor analysis. It was thought that some factor, or combination of factors, would prove especially relevant to a given task, and would therefore make a better predictor of that task than measures of g. This, indeed, turned out to be the case, but the improvement was so slight as to be not worth the additional effort in most cases.

The only really successful efforts to make use of part scores appear to be in the SAT and the GRE, where both a verbal and a quantitative score are given. The quantitative score was thought to be a better predictor in the physical sciences and engineering, and the verbal score to be a better predictor in the arts and humanities. The latest evidence tends to show that this is an oversimplification. The current findings indicate that quantitative ability is a good indicator of choice of field, but that verbal ability is a better predictor of grades, even in highly mathematical subjects. Also, contrary to popular impression, scientists and engineers are verbally superior to arts and humanities majors, as well as being mathematically superior. Possibly this erroneous impression is due to the scientist’s unwillingness to guess in the absence of data. He says nothing when he has nothing to say.

Each of the factors discovered by factor analysis can be further analyzed. This has been done by J.P. Guilford, who has isolated 120 different factors. This model of intelligence, while of some theoretical interest, has little power for predicting real world performance. In general, then, we can conclude that the most useful tests in the field of ability testing are those most highly saturated with g.

But what is g? Even if we can’t define it exactly, can we at least characterize it in some way? In Bias in Mental Testing, p. 250, A.R. Jensen summarizes what we now know about g in the following way:

By examining the surface characteristics of a great variety of tests in connection with their g loadings, we may arrive at some descriptive generalizations about the common surface features that characterize tests that have relatively high g loadings as compared with tests that have relatively low g loadings. Today we have much more test material to examine for this purpose than was available to Spearman more than half a century ago. This permits broader generalizations about g than Spearman could safely draw. Spearman characterized the most g-loaded tests essentially as those requiring the subject to grasp relationships—“the eduction of relations and correlates.” That is all perfectly correct. But now we can go further. The g factor is manifested in tests to the degree that they involve mental manipulation of the input elements (“fundaments” in Spearman’s terminology), choice, decision, invention in contrast to reproduction, reproduction in contrast to selection, meaningful memory in contrast to rote memory, long-term memory in contrast to short-term memory, and distinguishing relevant information from irrelevant information in solving complex problems. Although neither the forward nor backward digit-span test of the Wechsler Intelligence Scale, for example, has much g loading, the slightly greater mental manipulation required by backward than by forward recall of the digits more than doubles the g variance in backward as compared with forward digit span (Jensen & Figueroa, 1975). We have seen many examples in which a slight increase in task complexity is accompanied by an increase in the g loading of the task. This is true even for the most mundane and seemingly nonintellectual tasks. Virtually any task involving mental activity that is complex enough to be recognized at the commonsense level as involving some kind of conscious mental effort is substantially g loaded. It is the task’s complexity rather than its content that is most related to g.

We see, then, that g loaded tests are not measures of general adaptability, nor of mental power. A task is g loaded to the extent that it is complex, and an individual is intelligent in a psychometric sense to the extent that he can cope with complexity.