
My New Year’s Resolution was to learn Python. After taking a few online courses, I became comfortable enough with the language to tackle a small side project. Side projects are great for learning a language because they let you “own” a project from start to finish and solve a problem that genuinely interests you. Although I had wanted a Python side project for some time, it took me a while to find one that interested me.

This all changed during the COVID-19 lockdowns. To pass the time, my mother (a retired English teacher) became obsessed with Scrabble and insisted on playing game after game with me. The problem is that I hate the game, am not good at it, and kept losing. Eventually I realized that it would be straightforward to write a program in Python that looked at my rack of letters and listed the highest-scoring word I could create. Voila – my first Python side project was born!

I just wrapped up this project and decided to share it because it might help others who are interested in Python. Most people read my blog because of Choroplethr (my suite of R packages for mapping open datasets) or my various R trainings. However, over time I’ve learned that many of my readers are also interested in Python. Additionally, most data-related jobs in Industry (as opposed to Academia) use Python rather than R.

You can view the “Scrabble Cheat” project on GitHub here. The key function is get_all_words, which takes a string representing a set of tiles. It returns a list of tuples, each holding a valid word you can form from those letters along with its Scrabble score. The list is ordered so that the highest-scoring word appears first:

> get_all_words('ilzsiwl')
[('zills', 16),
('swiz', 16),
('zill', 15),
('wiz', 15),
('liz', 13),
('isz', 12),
('zs', 11),
('wills', 10),
('swill', 10),
('willi', 10),
...
]

This post will help you make sense of this output (i.e. “what is a list of tuples, and why is the data structured this way?”). But first, it’s useful to compare and contrast Base R with Python’s Built-ins.

# Base R vs. Python Built-ins

One of the central concepts in R is the distinction between “Base R” and “Packages you choose to install”. Base R, while itself a package, cannot be uninstalled, and contains core language elements like data.frame and vector. “Base R” also colloquially refers to “all the packages that ship with R and are available when you load it” such as utils, graphics and datasets.

One of the more confusing things about R is that people are increasingly moving away from Base R to third-party packages for routine tasks. For example, the utils package has a function read.csv for reading CSV files. But the read_csv function from the readr package is faster and, unlike read.csv in R versions before 4.0.0, does not automatically convert strings to factors, which is often what you want. Similarly, the graphics package has a plot function for making graphs, but the ggplot function in the ggplot2 package is much more popular.

This split between “functionality that ships with R” and “how people ‘in the know’ actually use R” is inherently confusing. Python’s equivalent of “Base R” is called “Built-ins”. (You can see the full list of Python’s Built-ins here.) But unlike in R, people generally seem happy with Python’s Built-ins and do not recreate that functionality in other packages. In fact, when I talked to my friends who teach Python, they emphasized that expertise in Python often comes down to fluency with the Built-ins.
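As a small illustration of what “built-in” means in practice, here is a sketch that uses a few Built-ins without importing anything, then checks the standard builtins module to confirm that the data structures discussed below are among them. The particular names checked are just examples I picked.

```python
import builtins

# Built-ins such as len, sorted and max work in every session with no import.
print(len('zills'), sorted('zills'), max([16, 15, 13]))

# The builtins module lists them; dict, list and tuple are all in there.
names = dir(builtins)
print(all(name in names for name in ['dict', 'list', 'tuple', 'sorted', 'open']))  # True
```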

# Python’s Built-in Data Structures

The main Built-in Data Structures that I used in this project are Dictionaries, Lists and Tuples.

### Dictionaries

Dictionaries (often just called Dicts) define a key-value relationship. For example, each Scrabble letter can be viewed as a key, and its numeric score can be viewed as its value. We can store this information in a Python Dict like this:

> letter_scores = {'a': 1,  'b': 4,  'c': 4, 'd': 2,
'e': 1,  'f': 4,  'g': 3, 'h': 3,
'i': 1,  'j': 10, 'k': 5, 'l': 2,
'm': 4,  'n': 2,  'o': 1, 'p': 4,
'q': 10, 'r': 1,  's': 1, 't': 1,
'u': 2,  'v': 5,  'w': 4, 'x': 8,
'y': 3,  'z': 10}

> letter_scores['a']
1
> letter_scores['z']
10

The Dict itself is delimited by curly braces. Inside it, a colon separates each key from its value, and commas separate the key-value pairs.
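To see why a Dict suits this job, here is a minimal sketch of a word-scoring helper built on letter_scores. The name get_word_score comes from the project’s list comprehension quoted later in this post, but this body is my own assumption, not the project’s code, and it ignores board bonuses and blank tiles.

```python
# Letter values copied from the letter_scores Dict above.
letter_scores = {'a': 1,  'b': 4,  'c': 4, 'd': 2,
                 'e': 1,  'f': 4,  'g': 3, 'h': 3,
                 'i': 1,  'j': 10, 'k': 5, 'l': 2,
                 'm': 4,  'n': 2,  'o': 1, 'p': 4,
                 'q': 10, 'r': 1,  's': 1, 't': 1,
                 'u': 2,  'v': 5,  'w': 4, 'x': 8,
                 'y': 3,  'z': 10}


def get_word_score(word):
    """Sum the Dict value of each letter (hypothetical sketch, no board bonuses)."""
    return sum(letter_scores[letter] for letter in word.lower())


print(get_word_score('zills'))   # 16
print(get_word_score('etudes'))  # 8
```

Each lookup like letter_scores['z'] is a constant-time key access, which is what makes the Dict a natural fit for a letter-to-score table.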

The page on Built-ins says that Dicts are created with dict, which is a built-in function rather than a keyword. However, they can also be created with the literal syntax { }. As a rule of thumb, Python programmers prefer such literals to calling the equivalent built-in functions.
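Here is a quick sketch showing that the literal and the built-in dict function produce the same Dict. The variable names are illustrative only.

```python
# Three ways to build the same Dict.
literal = {'a': 1, 'b': 4}                  # literal syntax
constructed = dict(a=1, b=4)                # keyword arguments become string keys
from_pairs = dict([('a', 1), ('b', 4)])     # an iterable of (key, value) pairs

print(literal == constructed == from_pairs)  # True

empty = {}  # an empty Dict ({} never means an empty set)
print(type(empty).__name__)  # dict
```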

Note that R does not really have an equivalent data structure. In the accepted answer to this question on Stack Overflow people say that a List with Names is as close as you can get. However, there are still significant differences between the two data structures:

• In a Python Dict, Keys must be unique. In R, List Names do not have to be unique.
• In a Python Dict, Keys can be of different types (e.g. int or string), as long as each Key is hashable. In R, List Names are always character strings.
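Both differences are easy to see from the Python side. In this sketch, repeating a key in a Dict literal does not create a second entry (the later value silently wins), and a single Dict mixes integer, string and tuple keys. The example keys and values are arbitrary.

```python
# Repeating a key does not create a second entry: the later value wins.
scores = {'a': 1, 'a': 5}
print(scores)  # {'a': 5}

# Keys can mix types, as long as each key is hashable (ints, strings, tuples...).
mixed = {1: 'one', 'two': 2, (3, 4): 'a tuple key'}
print(mixed[1], mixed['two'], mixed[(3, 4)])  # one 2 a tuple key
```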

### Lists

Lists are probably the most commonly used data structure in Python. They are similar to Vectors in R, in that they are meant to store multiple elements of the same type. However, R enforces this requirement (it silently coerces mixed values to a common type), while Python does not.
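For contrast, in R c(1, "a") coerces the number to the string "1". The sketch below shows a Python List holding values of several types with no coercion at all. The contents of tile_info are made up for illustration.

```python
# A Python List happily holds elements of different types side by side.
tile_info = ['z', 10, True, ('zills', 16)]
print([type(item).__name__ for item in tile_info])  # ['str', 'int', 'bool', 'tuple']

# Nothing is coerced: the integer is still an integer.
print(tile_info[1] + 1)  # 11
```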

Scrabble Cheat uses a List to store the contents of a file that contains a dictionary of English words. We then iterate over this list to see which words can be spelled with the user’s tiles. Here is code to read in the dictionary from a file:

> all_words
['a',
'aa',
'aaa',
'aah',
'aahed',
'aahing',
'aahs',
... ]

Here we open the file with open, read its contents into a single string with read, and then use split to break that string into a list of smaller strings, one per word, using whitespace as the delimiter. This kind of method chaining is very common in Python.
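The same open-read-split chain can also be written with a with block, which closes the file automatically. This is a minimal sketch, assuming a plain-text word list; the helper names parse_words and load_words and the file name words.txt are hypothetical, not taken from the project.

```python
def parse_words(text):
    """Turn the raw text of a word list into a List of lowercase words."""
    # split() with no argument splits on any run of whitespace, including
    # the newlines in a one-word-per-line file, and drops empty strings.
    return text.lower().split()


def load_words(path):
    """Read a word-list file (hypothetical helper) and return its words."""
    # The with block closes the file automatically, even if reading fails.
    with open(path, encoding='utf-8') as f:
        return parse_words(f.read())


print(parse_words('a\naa\naah\n'))  # ['a', 'aa', 'aah']
print('a\naa\naah\n'.split(' '))    # splitting on ' ' alone leaves the newlines intact

# Usage with a real file (the file name is hypothetical):
# all_words = load_words('words.txt')
```

The second print is why the no-argument form of split is the safer choice for a one-word-per-line file: split(' ') only breaks on literal spaces.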

### Tuples

Tuples are used to store data that has multiple components. For example, a location on a map has two components: longitude and latitude. Tuples are also immutable, which means that you cannot change their values after creation.
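Here is a short sketch of the map-location example: a Tuple unpacks neatly into its components, and any attempt to change one raises a TypeError. The coordinates are example values only.

```python
# A map location stored as a (longitude, latitude) Tuple; example values.
location = (-122.42, 37.77)
longitude, latitude = location  # unpack the two components into separate names
print(longitude, latitude)

# Trying to change a component raises a TypeError, because Tuples are immutable.
try:
    location[0] = 0.0
    was_modified = True
except TypeError as err:
    was_modified = False
    print('TypeError:', err)

print(location)  # still (-122.42, 37.77)
```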

Scrabble Cheat tells you each word that your tiles can make, along with the Scrabble score of that word. Each (word, score) pair is stored as a Tuple. Because each set of tiles can normally make multiple words, the return value of get_all_words is actually a List of Tuples:

> get_all_words('ttsedue')
[('etudes', 8),
('dustee', 8),
('detest', 7),
('stude', 7),
('tested', 7),
('tutees', 7),
('suede', 7),
('etude', 7),
('duets', 7),
... ]
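One way to put a List of (word, score) Tuples in highest-score-first order is sorted with a key function that picks out the score. This is a sketch of the technique, not necessarily how the project does it; the sample Tuples are taken from the output above, in a shuffled order.

```python
# A few (word, score) Tuples, in no particular order.
matches = [('suede', 7), ('etudes', 8), ('detest', 7), ('dustee', 8)]

# Sort by the second element (the score), highest first.
# sorted() is stable, so words with equal scores keep their original relative order.
ranked = sorted(matches, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('etudes', 8), ('dustee', 8), ('suede', 7), ('detest', 7)]

# Each Tuple unpacks into its two components.
for word, score in ranked[:2]:
    print(word, score)
```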

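In addition to being created with parentheses (strictly speaking, it is the commas that make a Tuple), Tuples can also be created with the built-in tuple function, which converts any iterable, such as a List, into a Tuple.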
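The sketch below shows the three forms side by side, plus the one-element gotcha: without a trailing comma, the parentheses are just grouping and you get back a plain string.

```python
pair_literal = ('etudes', 8)            # parentheses plus commas
pair_no_parens = 'etudes', 8            # the commas alone are enough
pair_from_list = tuple(['etudes', 8])   # tuple() converts any iterable

print(pair_literal == pair_no_parens == pair_from_list)  # True

# A one-element Tuple needs a trailing comma; without it you just get a string.
print(type(('etudes',)).__name__, type(('etudes')).__name__)  # tuple str
```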

# List Comprehensions

Many languages have functionality for creating a new list by transforming and filtering an existing one. Python provides a way to do this that I had not encountered before. It is called a List Comprehension and has the following template:

[ object_in_new_list
for element in old_list
if condition_is_met ]
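To make the template concrete, here is a small sketch that fills in each slot with toy values (not the project’s data) and then writes out the equivalent explicit loop for comparison.

```python
old_list = ['a', 'aa', 'zill', 'zills', 'swiz']

# object_in_new_list: the word's length; condition: the word contains a 'z'.
new_list = [len(word) for word in old_list if 'z' in word]
print(new_list)  # [4, 5, 4]

# The equivalent explicit loop, for comparison.
same_list = []
for word in old_list:
    if 'z' in word:
        same_list.append(len(word))
print(new_list == same_list)  # True
```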

Scrabble Cheat uses a List Comprehension to iterate over a list of English words and pluck out the words that can be spelled with the user’s tiles. If a word can be spelled, it is put into a Tuple along with its score. The core of the code looks like this:

[(one_word, get_word_score(one_word))
for one_word in all_words
if can_spell_word(one_word, tiles)]

(The actual code is a bit more complex, and you can see it here.)
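For readers who want something runnable, here is a self-contained sketch of how the pieces could fit together. The names get_word_score, can_spell_word and get_all_words come from this post, but every function body is my own assumption rather than the project’s code. In particular, this version passes the word list in as an all_words parameter, uses collections.Counter to check letter counts, and ignores blank tiles and board bonuses.

```python
from collections import Counter

# Letter values copied from the letter_scores Dict shown earlier in this post.
letter_scores = {'a': 1,  'b': 4,  'c': 4, 'd': 2,
                 'e': 1,  'f': 4,  'g': 3, 'h': 3,
                 'i': 1,  'j': 10, 'k': 5, 'l': 2,
                 'm': 4,  'n': 2,  'o': 1, 'p': 4,
                 'q': 10, 'r': 1,  's': 1, 't': 1,
                 'u': 2,  'v': 5,  'w': 4, 'x': 8,
                 'y': 3,  'z': 10}


def get_word_score(word):
    """Sum the letter values of a word (no board bonuses, no blank tiles)."""
    return sum(letter_scores[letter] for letter in word)


def can_spell_word(word, tiles):
    """Return True if the tiles contain every letter of word, counting repeats."""
    available = Counter(tiles)  # missing letters count as 0
    return all(available[letter] >= count
               for letter, count in Counter(word).items())


def get_all_words(tiles, all_words):
    """Return (word, score) Tuples for spellable words, highest score first."""
    matches = [(one_word, get_word_score(one_word))
               for one_word in all_words
               if can_spell_word(one_word, tiles)]
    # sorted() is stable, so equal scores keep the order of all_words.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


sample_words = ['zills', 'swiz', 'wiz', 'willi', 'zoo', 'wills']
print(get_all_words('ilzsiwl', sample_words))
# [('zills', 16), ('swiz', 16), ('wiz', 15), ('willi', 10), ('wills', 10)]
```

Counting letters with Counter matters because a rack with one “l” should not be able to spell “zill”; a simple membership test per letter would miss that.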

While I had not encountered List Comprehensions before (and they are certainly not a feature of R), they have existed in other programming languages for some time (see 1, 2).

# Wrapping Up

This was a fun project that helped solidify the book knowledge I had recently gained about Python. It gave me valuable experience with Python’s Built-ins, and writing it up deepened my understanding of some key differences between R and Python.

A small confession: the game I am actually playing with my mom is Zynga’s Words with Friends (WWF), not Hasbro’s Scrabble. I consider WWF to be a knock-off of Scrabble, and its name is clunkier to type, so I just call it Scrabble in this post. Also, the dictionary my app uses is much larger than the official WWF dictionary, so you cannot actually play many of the words the app recommends.

If this post turns out to be popular, I may write another one as I continue to learn Python. (I am currently looking for a side project that will give me some experience with Pandas, Matplotlib and/or Seaborn.)

# Interested in Learning Python?

The best resources I found for learning Python came from my friends Reuven Lerner and Trey Hunner. Both are professional Python trainers who (a) specialize in doing live corporate trainings and (b) have recently launched consumer products for individuals. Reuven’s Introductory Python course was especially helpful in getting me quickly up to speed with the basics. Trey’s Python Morsels, which sends you one problem a week, was helpful in forcing me to continue to practice Python every week. (I am not being paid to recommend these courses – I am simply passing along that they helped me).

The post Using Python to Cheat at Scrabble appeared first on AriLamstein.com.