I’ve created an app, Shinywordle.
The use of regular expressions (regex) to solve the game is interesting. As an applied statistician I can’t consider myself a regex expert, but they have helped me a lot when working with unstructured data such as text, for example in Natural Language Processing.
Each time we have a text, we can detect patterns. For example, in emails we tend to see “Hi Pacha, [Content] Regards, [Sender’s name]”. On Twitter, usernames start with “@”.
We can use those repeated patterns to our advantage, and this is where regex is very powerful. For example, I can download an archive of tweets and search for “#[Rr][Ss]tats”, which would match both “#rstats” and “#RStats”.
The previous regex can be explained in parts:
- #: literal match for “#”, the start of the hashtag
- [Rr][Ss]: matches one character that is either R or r, followed by one character that is either S or s
- tats: literal match (case sensitive)
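As a quick check of the pattern above, here is a minimal sketch in Python (not the app's own R code). The sample tweets are made up for illustration.

```python
import re

# Hypothetical sample tweets, invented for this example.
tweets = [
    "Loving the new #rstats release",
    "Join us at the #RStats meetup",
    "Nothing about R here",
    "#rstat is missing a letter",
]

# "#", then R or r, then S or s, then the literal "tats".
pattern = re.compile(r"#[Rr][Ss]tats")

# Keep only the tweets where the pattern appears somewhere in the text.
matches = [t for t in tweets if pattern.search(t)]
print(matches)
```

Only the first two tweets are kept: the third has no hashtag and the fourth lacks the final “s”.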
I could also use “[A-Za-z0-9]”, which matches any single uppercase letter, lowercase letter or digit, if for example I want to look for usernames made of letters and numbers.
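A small Python sketch of how that character class could be used to pull usernames out of a text. The “@” prefix and the “+” quantifier (one or more characters) are my additions for the example, and real Twitter handles also allow underscores, so treat this as a simplification.

```python
import re

# "@" followed by one or more letters or digits.
# Assumption for illustration: handles contain only letters and digits.
username = re.compile(r"@[A-Za-z0-9]+")

text = "Thanks @pachadotdev and @User123 for the help!"

# findall returns every non-overlapping match in the text.
found = username.findall(text)
print(found)  # ['@pachadotdev', '@User123']
```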
Regex can get complicated quite quickly. For example, “-?\d+” finds integers, including negative ones:
- -: literal match for “-” (i.e. negative numbers)
- ?: means the previous element is optional, so it can be present or not
- \d: same as [0-9], a digit
- +: there must be at least one element of the previous type, in this case a digit
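Reading the bullets above as the pattern `-?\d+` (an optional minus sign followed by one or more digits), a minimal Python sketch would be the following. The sample sentence is invented, and in Python `\d` also matches non-ASCII digits, which makes no difference for plain ASCII text like this.

```python
import re

# Optional "-", then at least one digit.
integer = re.compile(r"-?\d+")

# Hypothetical sample text.
text = "Temperatures went from -12 to 7 degrees, then 30."

numbers = integer.findall(text)
print(numbers)  # ['-12', '7', '30']
```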
The code for the Shiny app is on GitHub and it uses some of the code posted by Hadley Wickham in recent days.
How does the app work?
From the user input, it takes the green letters (i.e. correct letters in the correct position) and subsets an English dictionary to the set of words that have those letters in those positions.
From the previous subset, it keeps the words that contain the yellow letters (i.e. correct letters in the wrong position) and removes those that have a yellow letter in the i-th position where it was tried.
From the twice-filtered subset, it removes the words containing any of the gray letters (i.e. incorrect letters).
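To make the first of these steps concrete, here is a minimal Python sketch of the green-letter filter. It is not the app's R code: the helper `green_pattern`, the tiny word list and the 0-based positions are all assumptions made for illustration.

```python
import re

# Hypothetical mini-dictionary; the app uses a full English word list.
words = ["apple", "ample", "angle", "maple", "eagle"]

def green_pattern(greens, length=5):
    """Build an anchored regex from {position: letter} (0-based positions).

    Known positions use the green letter; unknown ones use "." (any character).
    """
    return "^" + "".join(greens.get(i, ".") for i in range(length)) + "$"

# Suppose "a" is green in the first position and "e" in the last.
pattern = green_pattern({0: "a", 4: "e"})  # "^a...e$"

candidates = [w for w in words if re.search(pattern, w)]
print(candidates)  # ['apple', 'ample', 'angle']
```

Anchoring with `^` and `$` ties every letter to its position, which is what separates green letters from yellow ones.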
These steps use regex. In the code, the steps are completed here, in part by using a function such that if the game tells us that the “a” in “apple” is yellow, we remove “apple”, “apricot”, etc. (words with an “a” in the first position) but keep “banana”, provided “banana” is still a possible word.
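The yellow-letter rule from the “apple” example can be sketched in Python as follows. The function name `keep_yellow` and the word list are hypothetical, and this is one possible way to express the rule rather than the function used in the app.

```python
import re

# Hypothetical mini-dictionary.
words = ["apple", "apricot", "banana", "melon"]

def keep_yellow(words, letter, position):
    """Keep words that contain `letter`, but not at the 0-based `position`."""
    contains = re.compile(re.escape(letter))
    # "^.{n}x" matches words whose character at index n is x.
    at_position = re.compile("^.{%d}%s" % (position, re.escape(letter)))
    return [w for w in words if contains.search(w) and not at_position.search(w)]

# The "a" in "apple" (first position) came back yellow.
remaining = keep_yellow(words, "a", 0)
print(remaining)  # ['banana']
```

“apple” and “apricot” are dropped because they have “a” in the first position, “melon” is dropped because it has no “a” at all, and “banana” survives.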
The rest of the filtering consists of using an additional function that inverts matches. For example, if the “a” in “apple” is gray, we have to remove any word containing an “a” in any position.
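An inverted match of this kind can be sketched in Python by keeping only the words where the pattern does not match. The helper `drop_gray` and the word list are hypothetical, not taken from the app.

```python
import re

# Hypothetical mini-dictionary.
words = ["apple", "banana", "lemon", "olive"]

def drop_gray(words, gray_letters):
    """Remove every word containing any of the gray letters, in any position."""
    if not gray_letters:
        return list(words)
    # A character class such as "[ab]" matches any one of the listed letters.
    pattern = re.compile("[" + re.escape(gray_letters) + "]")
    # Inverted match: keep the words where the pattern is NOT found.
    return [w for w in words if not pattern.search(w)]

# The "a" in "apple" came back gray.
remaining = drop_gray(words, "a")
print(remaining)  # ['lemon', 'olive']
```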
Shameless self-promotion: If you liked this post, I am an Applied Statistician with years of experience in R, Shiny, APIs, SQL and finance. If you think I can be a valuable addition to your team, I’m happy to hear from you and provide more details. My email is mavargas 11 [ at ] uc dot cl or send me a tweet to pachadotdev.