Price’s Protein Puzzle: 2023 update
One of the joys (?) of having been online for…quite some time now…is watching topics reappear every few years or so.
So what’s new?
In terms of English word matches: not much. Some new proteins but no new 9-letter words. The Twitter thread, above, contains an interesting reply about an approach using generative AI.
Other languages are somewhat constrained by (1) the quality of the word lists that I could find online (on GitHub) and (2) for some languages, the presence of characters not found in the English alphabet, which reduces the viable word list even further. That said, there are a few fun matches. I am not a linguist, so I’m relying on Google Translate and other online translators here.
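To illustrate how the viable word list shrinks, here is a minimal Python sketch of the filtering step. It is not the code used for this post. It assumes that only the 20 standard one-letter amino acid codes count as valid, so any word containing B, J, O, U, X, Z or an accented character is dropped. The function name `usable_words` and the sample words are made up for the example.

```python
# The 20 standard amino acid one-letter codes. Assumption: non-standard
# codes that can appear in sequence databases (B, O, U, X, Z) are excluded.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def usable_words(words, min_length=1):
    """Return upper-cased words spelled only with amino acid letters."""
    kept = []
    for word in words:
        candidate = word.strip().upper()
        # set(candidate) <= AMINO_ACIDS is a subset test on the letters used
        if len(candidate) >= min_length and set(candidate) <= AMINO_ACIDS:
            kept.append(candidate)
    return kept


# Hypothetical sample: "ställe" fails on the accented letter, "puzzle" on U and Z.
sample = ["gangarilla", "stallare", "ställe", "puzzle", "magnetiet"]
print(usable_words(sample))  # ['GANGARILLA', 'STALLARE', 'MAGNETIET']
```

A language whose spelling leans heavily on accented or otherwise excluded letters loses a large share of its word list at this step, before any protein is searched.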
In addition to the previously-noted 10-letter Italian word ANNIDAVATE we have:
- sp|B2II34|KATG_BEII9 – GANGARILLA (Spanish) – a company of strolling players
- sp|P40069|IMB4_YEAST – FERRAILLAI (French) – je ferraillai (I scrapped)
All of the languages have 9-letter matches except Swedish (maximum 8 letters, for example STALLARE – stabler). Spanish was a rich source of hits (452 distinct words > 7 letters), although that’s probably due in large part to the size of the Spanish word list used. Swedish had the fewest (26 distinct words > 7 letters), perhaps due to the large number of words made unusable by characters outside the amino acid alphabet.
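Counts like "distinct words > 7 letters" could be produced along the following lines. This is a rough Python sketch, not the method used for the post: it slides a window of each word length along every sequence and looks the window up in a set of words. The function name `words_in_sequences`, the sequence IDs and the sequences themselves are invented for illustration and are not real proteins.

```python
def words_in_sequences(words, sequences, min_length=8):
    """Map each word of at least min_length letters to the IDs of the
    sequences that contain it as a substring."""
    long_words = {w for w in words if len(w) >= min_length}
    lengths = sorted({len(w) for w in long_words})
    hits = {}
    for seq_id, seq in sequences.items():
        for n in lengths:
            # Slide a window of width n along the sequence
            for i in range(len(seq) - n + 1):
                chunk = seq[i:i + n]
                if chunk in long_words:
                    hits.setdefault(chunk, set()).add(seq_id)
    return hits


# Toy data only: made-up sequences with words planted in them.
toy_sequences = {
    "toy1": "MKTGANGARILLAQW",
    "toy2": "AASTALLAREKK",
}
hits = words_in_sequences(["GANGARILLA", "STALLARE", "LLAMA"], toy_sequences)
print(len(hits))     # 2 distinct words longer than 7 letters
print(sorted(hits))  # ['GANGARILLA', 'STALLARE']
```

The set lookup keeps the cost tied to sequence length and the number of distinct word lengths rather than to the size of the word list, which matters when one list is much larger than the others.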
There are 9 hits to the start of a protein. Some of these are:
- sp|O49997|1433E_TOBAC – MAESTREEN (Spanish) – you direct
- sp|Q3V0Q6|SPAG8_MOUSE – METTESTE (Italian) – you put
- sp|Q49135|FCHA_METEA – MAGNETIET (Dutch) – magnetite
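Hits to the start of a protein are a special case of the substring search: the word has to be a prefix of the sequence. All three examples in the list begin with M, which fits the fact that protein sequences usually start with methionine. A small Python sketch, with a hypothetical `start_hits` function and made-up toy sequences:

```python
def start_hits(words, sequences):
    """Return sorted (sequence ID, word) pairs where the sequence begins with the word."""
    return sorted(
        (seq_id, word)
        for seq_id, seq in sequences.items()
        for word in words
        if seq.startswith(word)
    )


# Toy data only: these are not real protein sequences.
toy_sequences = {
    "toy1": "MAGNETIETKLV",  # word at the very start: counts
    "toy2": "AAMAGNETIET",   # word present, but not at the start
}
print(start_hits(["MAGNETIET"], toy_sequences))  # [('toy1', 'MAGNETIET')]
```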
And so ends the update for another year.