Single Letter Frequencies in English

[This article was first published on "R-bloggers" via Tal Galili in Google Reader, and kindly contributed to R-bloggers].

Every time that I read a paper that discusses the frequencies of single letters in English, I feel like I should sit down and calculate them for myself from a sample of English text. Today, I finally did. Here are the probabilities and negative log probabilities of the characters in English over the corpus of Shakespeare’s plays:

[Figure: Single Letter Probabilities]
[Figure: Single Letter Inverse Probabilities]

And, for those who care, here’s the Ruby code that generates the letter counts from the plays, which I downloaded from Project Gutenberg:

def initialize_letter_counts(letter_counts)
  ('a'..'z').each do |chr|
    letter_counts[chr] = 0
  end
end
 
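# Add the number of occurrences of each letter a-z in one file to
# letter_counts, ignoring case.
def parse_file(filename, letter_counts)
  # Binary mode reads the file byte by byte, so any non-ASCII bytes in the
  # text are skipped rather than raising an encoding error. The block form
  # of File.open also closes the file when it finishes.
  File.open(filename, 'rb') do |f|
    f.each_char do |char|
      char = char.downcase
      letter_counts[char] += 1 if char.match(/[a-z]/)
    end
  end
end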
 
directory = '/Users/johnmyleswhite/Princeton/Research/Letter Frequency'
 
Dir.chdir(directory)
 
letter_counts = {}
 
initialize_letter_counts(letter_counts)
 
Dir.new('Data').entries.each do |entry|
  # Escape the dot so that only names ending in a literal ".txt" match.
  if entry.match(/\.txt$/)
    entry = File.expand_path(entry, directory + '/Data')
    parse_file(entry, letter_counts)
  end
end
 
letter_counts.keys.sort.each do |key|
  puts "'#{key}',#{letter_counts[key]}"
end
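
The script above prints raw counts only. As a rough sketch of how those counts could be turned into the probabilities and negative log probabilities shown in the figures, something like the following would work. The helper name `letter_statistics` and the toy counts are hypothetical, and the use of the natural log is an assumption, since the post does not say which base was used for the plots.

```ruby
# Hypothetical helper (not part of the original script): convert raw letter
# counts into probabilities and negative log probabilities.
def letter_statistics(letter_counts)
  total = letter_counts.values.sum.to_f
  letter_counts.keys.sort.map do |letter|
    probability = letter_counts[letter] / total
    # Natural log is an assumption; the post does not state the base used.
    # A letter with a zero count yields Infinity here.
    [letter, probability, -Math.log(probability)]
  end
end

# Toy counts for illustration only, not taken from the Shakespeare corpus.
sample_counts = { 'a' => 2, 'b' => 1, 'c' => 1 }

letter_statistics(sample_counts).each do |letter, probability, neg_log|
  puts format("'%s',%.4f,%.4f", letter, probability, neg_log)
end
```

Passing the `letter_counts` hash built by the script in place of `sample_counts` would give one row per letter in the same comma-separated style as the original output.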
