For puzzle people, this is huge news

ETAOIN SHRDLU


If that looks like something other than gibberish to you, then you might be a puzzle person — specifically, a puzzler who has solved your share of cryptograms. (Or played PuzzleNation’s Guessworks.) Those twelve letters are the most commonly used in the English language… or so it was widely thought for a very long time.

The original frequency list was put together by a researcher named Mark Mayzner back in 1965. The English language has surely drifted around a bit in the last fifty years — perhaps it was time to revise the list. Furthermore, computing power has improved somewhat since the mid-sixties. Mayzner’s famous list was based on just 20,000 words; a new look would obviously cast a much wider net.

And that is why Mayzner, now 85 years old, contacted Peter Norvig of Google. Using Google Books and the research tool Ngrams, Norvig redid Mayzner’s work on a grand scale, analyzing 97,565 distinct words, which were collectively used in books over 743 billion times. That breaks down to over 3.5 trillion letters. Norvig sorted them by frequency and, whoa! We have a new list!
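For the curious, the counting itself is simple to sketch. The snippet below is not Norvig's code. It is a minimal illustration that assumes you already have a table of distinct words and how often each one appears. The function name `letter_ranking` and the tiny sample counts are made up for this example.

```python
from collections import Counter


def letter_ranking(word_counts):
    """Return the letters A-Z ordered from most to least frequent.

    word_counts maps each distinct word to the number of times it occurs,
    so every letter in a word is weighted by that word's count.
    """
    totals = Counter()
    for word, count in word_counts.items():
        for ch in word.upper():
            if "A" <= ch <= "Z":  # skip apostrophes, hyphens, digits
                totals[ch] += count
    # most_common() sorts by total, highest first
    return "".join(letter for letter, _ in totals.most_common())


# Toy counts, invented purely for illustration (not real corpus data)
sample = {"tea": 5, "eat": 3, "ten": 2, "e": 1}
print(letter_ranking(sample))
```

Weighting each letter by its word's count is what lets a list of fewer than 100,000 distinct words stand in for trillions of letters of actual text.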


Mayzner’s low-tech effort holds up pretty well — we don’t see any variation for the first seven letters. After that, things change a little: I always suspected R was getting short shrift compared to H. I’m not too surprised to see L edge out D, either. And U must be bummed to have slipped a place. “I’m a vowel!” you hear him cry. “I’m just as important as E or A! Do you hear me?!”

Peter Norvig’s full report, with lots more fascinating trivia about letter and word usage, can be read here.

Update: The original frequency list apparently pre-dates Mayzner.
