I was thinking about the Tversky and Kahneman mind trick of asking whether more words start with R or have R in the third position. It comes from their famous 1973 paper on the availability heuristic. Tversky and Kahneman write:
According to the extensive word-count of Mayzner and Tresselt (1965), there are altogether eight consonants that appear more frequently in the third than in the first position. Of these, two consonants (X and Z) are relatively rare, and another (D) is more frequent in the third position only in three-letter words. The remaining five consonants (K, L, N, R, V) were selected for investigation.
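Mayzner and Tresselt's figures come from counting letter positions across a large sample of words. A minimal sketch of that kind of count looks like this, assuming a tiny made-up word list (`SAMPLE_WORDS` is hypothetical and is not their corpus) and simple lowercase matching. The helper names are illustrative, and any real conclusion depends entirely on which word list you feed in.

```python
from collections import Counter

def position_counts(words, position):
    """Count how often each letter appears at a given 1-based position.

    Words shorter than `position` are skipped.
    """
    counts = Counter()
    for word in words:
        word = word.lower()
        if len(word) >= position and word[position - 1].isalpha():
            counts[word[position - 1]] += 1
    return counts

def compare_first_vs_third(words, letters):
    """Map each letter to a (first-position count, third-position count) pair."""
    first = position_counts(words, 1)
    third = position_counts(words, 3)
    return {ch: (first[ch], third[ch]) for ch in letters}

# Hypothetical sample for illustration only; NOT the Mayzner and Tresselt word list.
SAMPLE_WORDS = ["road", "rain", "care", "bird", "word",
                "park", "more", "lake", "kind", "ask"]

if __name__ == "__main__":
    for letter, (first, third) in compare_first_vs_third(SAMPLE_WORDS, "klnrv").items():
        print(f"{letter.upper()}: first={first}, third={third}")
```

Swapping in a different word list (a dictionary versus words drawn from running text, say) can change which letters come out ahead, which is why the choice of corpus matters.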
The Availability Bias
What did they find? When people were asked “For this particular letter, do you think it occurs more often in the first position of a word than in the third position?” about two-thirds of the (not very many) respondents said the first position. Using this “data,” the famous duo declared that the availability heuristic fails for those five letters. They called this a bias toward what is available in the mind: we can all name many more words that begin with a given letter than words that have it in the third position.
As with so much research done before about 2010, this paper has several notable (and questionable) aspects:
First, they didn’t address the obvious question: “So what? The heuristic works correctly for most of the other letters!” For a random letter, you’re better off going with the availability heuristic; it’s only a “bias” for a handful of letters (eight consonants, by Mayzner and Tresselt’s count). How could you predict which letters would occur more often in the first position than in the third? I doubt even extremely smart people have a way to do this. I don’t think it’s predictable.
Was There a Data Bias?
Second, believe it or not, in 1959 Herbert Olman came up with a completely different list! From Table 7 of his report:
Letters that occur significantly more in the first than third position: B, C, F, G, H, J, K, M, S, V, W, X, Z
Letters that occur roughly the same in first and third: A
Letters that occur significantly more in the third position than first: D, E, I, L, N, R, T, U, Y
Notes: most of these letters occur about 2 to 3 times more often in their favored position than in the other one. H is a big outlier: it is 4x more prevalent in the first position than in the third. W is even more extreme, at 5x. Amazingly, J occurs 1 percent of the time in the first position and NEVER in the third!
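This is not the same list Tversky and Kahneman used. In Olman’s counts, K and V, two of their five test letters, occur more often in the first position than in the third, and so do X and Z, which Mayzner and Tresselt counted as more frequent in the third position.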
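A quick tally of Olman’s three lists makes the comparison concrete. This is a small sketch that assumes the lists above are a faithful transcription of his Table 7; the dictionary layout and helper names are my own and purely illustrative. It computes what share of the classified letters favor each position, reports which letters the lists don’t mention, and checks which of Tversky and Kahneman’s five test letters Olman places in the first-position list.

```python
import string
from fractions import Fraction

# Olman's (1959) Table 7 categories, as transcribed in the three lists above.
OLMAN_TABLE_7 = {
    "first": set("BCFGHJKMSVWXZ"),  # significantly more often in position 1
    "same": set("A"),               # roughly equal in positions 1 and 3
    "third": set("DEILNRTUY"),      # significantly more often in position 3
}

# The five consonants Tversky and Kahneman tested as third-position-favored.
TK_TEST_LETTERS = set("KLNRV")

def share(category, table=OLMAN_TABLE_7):
    """Fraction of the classified letters that fall into `category`."""
    total = sum(len(letters) for letters in table.values())
    return Fraction(len(table[category]), total)

def unclassified(table=OLMAN_TABLE_7):
    """Letters of the alphabet that none of the lists mention."""
    classified = set().union(*table.values())
    return sorted(set(string.ascii_uppercase) - classified)

def disagreements(table=OLMAN_TABLE_7, test_letters=TK_TEST_LETTERS):
    """Test letters that Olman's lists put in the first-position category."""
    return sorted(test_letters & table["first"])

if __name__ == "__main__":
    print("favor first:", share("first"))   # 13/23
    print("favor third:", share("third"))   # 9/23
    print("not listed:", unclassified())    # ['O', 'P', 'Q']
    print("T&K letters Olman puts in 'first':", disagreements())  # ['K', 'V']
```

With the lists as transcribed, 13 of the 23 classified letters favor the first position and 9 favor the third, so a letter picked at random from them is more likely to be a first-position letter.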
What Can We Learn from This?
Understanding human behavior is hard. I believe there is good value in the rest of the paper, but the letter-position problem shows that people are simply guessing when they approach a question like this. And statistically, guessing that more words start with a given letter than have it in the third position is a pretty good rule!
I think their point was, “Yes, but it fails some of the time, see?” And sure enough, it does. But I don’t think even very smart people could take a letter and make an educated guess about whether it appears more often in the first position or the third. It’s like asking people how many satellites are orbiting Earth: they simply have nothing to go by.
Whichever data set you use, a randomly picked letter is more likely to be one that turns up more often in the first position than in the third. That’s the availability heuristic, and it works pretty well most of the time.
Please read the full paper and tell me if you find anything else interesting in it.