Google Word Frequency Counts

From MallWiki


Below are the roughly 38,000 words with the highest frequencies in the Google corpus, that is, every token that appears with a frequency greater than one in a million. The corpus contains 1,024,908,267,229 tokens, just over a trillion. Even a word that appears only once per million tokens has therefore been seen over a million times, so the frequency estimates should be very tight. Hopefully, this will make selection artifacts largely a thing of the past (at least as far as frequency is concerned). Note that tokens differing only in capitalization have been combined in the list below.
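The arithmetic behind the "over a million times" claim can be checked with a short Python sketch. The corpus size and the one-in-a-million cutoff come from the text above; the function names are hypothetical, and the error estimate assumes that counts behave roughly like Poisson counts, which is a simplification and not something the corpus documentation states.

```python
import math

CORPUS_SIZE = 1_024_908_267_229  # total tokens, as stated above
PER = 1_000_000                  # cutoff: more than one occurrence per million tokens


def min_count(corpus_size: int = CORPUS_SIZE, per: int = PER) -> int:
    """Smallest raw count whose relative frequency exceeds 1/per.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    return corpus_size // per + 1


def relative_std_error(count: int) -> float:
    """Rough relative standard error of a count.

    ASSUMPTION: counts are treated as Poisson-distributed, so the
    standard deviation is sqrt(count) and the relative error is
    1 / sqrt(count).
    """
    return 1.0 / math.sqrt(count)


if __name__ == "__main__":
    n = min_count()
    print(f"minimum count to make the list: {n:,}")
    print(f"approx. relative error at that count: {relative_std_error(n):.4%}")
```

Under that assumption, even the rarest word on the list has a relative sampling error of roughly a tenth of a percent, which is the sense in which the estimates are tight.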

Image:GoogleWordFrequency.txt

The full count list. Note that this version keeps the different case variants of a token separate.

Image:Googlecountsbig
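The relationship between the two lists (case variants kept separate versus combined) can be illustrated with a minimal Python sketch. It assumes the counts have already been read into (token, count) pairs; the layout of the files themselves is not described on this page, so no parsing is shown. The function names are hypothetical, and the sample counts are made up for illustration and not taken from the corpus.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_case_variants(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Sum the counts of tokens that differ only in capitalization."""
    merged: Counter = Counter()
    for token, count in pairs:
        merged[token.lower()] += count
    return merged


def above_threshold(counts: Dict[str, int], corpus_size: int,
                    per: int = 1_000_000) -> Dict[str, int]:
    """Keep tokens whose relative frequency exceeds 1/per.

    The comparison is done in integers (count * per > corpus_size)
    to avoid floating-point rounding.
    """
    return {t: c for t, c in counts.items() if c * per > corpus_size}


if __name__ == "__main__":
    # Made-up counts for illustration only, NOT real corpus values.
    sample = [("The", 3), ("the", 5), ("THE", 1), ("cat", 2)]
    merged = merge_case_variants(sample)
    print(merged.most_common())
    # A toy "corpus size" of 11 with per=4 keeps tokens above 1/4 frequency.
    print(above_threshold(merged, corpus_size=11, per=4))
```

Lowercasing is only one way to combine variants; whether the combined list above was built this way is not stated here.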