Dear Kevin Langdon:
Let me at least concentrate on areas of possible agreement with you.
Professor Jack Good solved the problem of true correction to the Ferguson formula at Cambridge University many decades ago when he suggested a modification to admissions to that august body.
This was to add the squares of the deviations for individual subjects. In our case, say the admission level were an arbitrary figure of 10 averaged over two tests. Scoring 9 and 11 earns the candidate this figure, as does scoring 10 and 10. If, however, we square these scores and then add them, we get 202 and 200, with a win to the candidate scoring 9 and 11. Consider a third candidate scoring 8 and 12, with 208. As our ideal candidate reaches closer to the single-figure cut-off on one test, s/he becomes more likely to be the accepted one. This validates the observation you have made: namely, that the tests are bound to correlate more closely at their top ends than over their entire range, once corrections are applied for the smaller range of the scores as the ceilings are approached. This holds because the correlations are themselves simply the sum of the squares, and it is in line with what one would expect.
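The arithmetic in this example can be checked with a short sketch. The candidate labels and the function name are my own, introduced only for illustration; the scores are the three pairs given above.

```python
def sum_of_squares(scores):
    """Return the sum of the squared test scores for one candidate."""
    return sum(s * s for s in scores)


# The three candidates from the example; the labels A, B, C are hypothetical.
candidates = {"A": (10, 10), "B": (9, 11), "C": (8, 12)}

# Plain totals are identical (20 each), so simple addition cannot separate them.
plain_totals = {name: sum(scores) for name, scores in candidates.items()}

# Squaring before adding rewards the more uneven profile.
squared_totals = {name: sum_of_squares(scores) for name, scores in candidates.items()}

# Rank candidates from highest to lowest squared total.
ranking = sorted(squared_totals, key=squared_totals.get, reverse=True)
```

All three candidates tie on the plain total, while the squared totals order them by how far one score pulls ahead of the other.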
We can increase the power of selection further by calculating a contributing weight for each item in the test, based simply on the power of the item to discriminate between accepted and rejected candidates. The weight would follow the R-value of the item over a limited range or, at least at first, its discrimination within the total field; the point of re-weighting can be moved up within the field as the number of candidates increases. We will thus have a restricted range, made progressively smaller as the numbers continue to build, with the power of selection continuing to grow with time. We can go further by squaring the discrimination of the individual items, giving an ``internal'' application of the ``Ferguson-Good corrected adaptation'' to the individual tests, i.e., to their items.
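One way to picture the item weighting is the rough sketch below. It rests on assumptions of mine: the "R-value" is taken to be the ordinary Pearson correlation between getting the item right (1 or 0) and being accepted (1 or 0), negative correlations are clipped to zero, and the function names and sample data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def item_weights(responses, accepted, squared=False):
    """Weight each item by its correlation with acceptance.

    responses: one row per candidate, 1 = item right, 0 = item wrong.
    accepted:  1 = candidate accepted, 0 = rejected.
    squared:   if True, square the discrimination (the "internal" variant).
    """
    weights = []
    for j in range(len(responses[0])):
        column = [row[j] for row in responses]
        r = max(pearson_r(column, accepted), 0.0)  # assumption: ignore negative discrimination
        weights.append(r * r if squared else r)
    return weights


def weighted_score(row, weights):
    """Total score for one candidate under the item weights."""
    return sum(w * x for w, x in zip(weights, row))


# Hypothetical data: four candidates, two items.
responses = [[1, 1], [1, 0], [0, 1], [0, 0]]
accepted = [1, 1, 0, 0]
weights = item_weights(responses, accepted)
```

In this toy data the first item separates accepted from rejected candidates perfectly and the second not at all, so the first carries all the weight. Restricting the range would amount to passing in only the rows for candidates near the cut-off.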
We can carry this still further by looking at the wrong answers given to the multiple-choice items by candidates approaching or exceeding the selection level. Since some distractors are more plausible than others (whether by intention or not), we can add a corrected weighting to those items which our candidates failed, thus adding more ``punch'' around the cut-off we seek. This follows the work of Giles, who discovered that most failures to respond to an item correctly occurred with a single distractor, and that to fail an item by choosing a distractor less plausible than the one chosen by some other candidate is to indicate a weakness greater than that of the other candidate. Thus items which were answered incorrectly also had value for the total predictive power of the test. To have chosen stupidly was more indicative of stupidity than to have merely failed an item.
Over two questions we might have:
Our candidate would earn marks for wrong answers, but fewer marks for less plausible responses. A 50-item test with five choices per item yields a set of 250 scorable responses. In effect, we lengthen the test, thus increasing its power to discriminate. Indeed, this process could continue even further, for the pattern of the marks obtained can be thought of as converging to the ideal.
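A minimal sketch of such option-weighted scoring follows. The credit values are invented purely for illustration and are not taken from Giles or from any actual test; the function name is likewise hypothetical.

```python
# Hypothetical credit tables, one per item: full credit for the keyed answer,
# partial credit for the more plausible distractors, none for the least plausible.
credits = [
    {"A": 1.0, "B": 0.5, "C": 0.25, "D": 0.0, "E": 0.0},
    {"A": 0.0, "B": 0.25, "C": 1.0, "D": 0.5, "E": 0.0},
]


def option_weighted_score(answers, credits):
    """Sum the credit attached to each chosen option (unknown options earn 0)."""
    return sum(item.get(choice, 0.0) for choice, item in zip(answers, credits))


# Two candidates who each get exactly one item right.
first = option_weighted_score(["B", "C"], credits)   # plausible miss on item 1
second = option_weighted_score(["D", "C"], credits)  # implausible miss on item 1

# Every option of every item is now a scorable response.
n_scorable = sum(len(item) for item in credits)
```

Under right/wrong scoring the two candidates tie at one mark each; with option weights the candidate whose miss was the more plausible one comes out ahead.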
This process can also allow a downward extension to IQs which would otherwise fall below the floor of the test.
To extend the test to the magic heights, one would need only to use items which the best of our candidates got right no more than, say, 30 or 40 percent of the time. Pure probability theory can then be used to establish the ceiling for such a super-test, for the failure level can be treated as a sequence of coin-flips onto which the right side of the bell curve may be mapped.
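One possible reading of the coin-flip idea is sketched below, under assumptions that are entirely mine: each item is treated as an independent biased coin that the best current candidates "win" with a fixed probability, the chance of reaching a given number right is a binomial tail, and that tail is mapped to a standard normal deviate. The function names and the 20-item length are hypothetical.

```python
from math import comb
from statistics import NormalDist


def tail_probability(n_items, n_correct, p_correct):
    """Binomial probability of getting at least n_correct of n_items right."""
    return sum(
        comb(n_items, i) * p_correct**i * (1 - p_correct) ** (n_items - i)
        for i in range(n_correct, n_items + 1)
    )


def tail_to_z(tail):
    """Standard normal deviate whose upper tail equals `tail` (0 < tail < 1)."""
    # By symmetry, the upper-tail z is minus the lower-tail quantile.
    return -NormalDist().inv_cdf(tail)


# Hypothetical super-test: 20 items, each passed 35 percent of the time
# by the best of the present candidates.
perfect_score_tail = tail_probability(20, 20, 0.35)
ceiling_z = tail_to_z(perfect_score_tail)
```

The resulting deviate is measured relative to the group whose pass rate was used, so it would still have to be added to that group's own standing on the bell curve; the independence of items is also a strong simplification.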