Hash-tag baby
Introduction
Naming a baby can be a difficult task. It can take soon-to-be parents quite some time to agree on a name they both like. Afterwards it can also be challenging to keep the name a secret until birth.
If you are lucky, some close friends or colleagues might be expecting a child at the same time. It can be a unique occasion to share experiences. However, wouldn’t it be a shame if they had a similar name in mind? You would likely want your children to have a more or less unique name, no? In data science we could use a Universally Unique Identifier (UUID) to solve this problem. Or one might choose a very exotic name, like Æ A-Xii. Even though they basically guarantee uniqueness, these might not be appropriate solutions for naming a child…
So, given that you have chosen a more ‘conventional’ name, there is a risk it is similar to the one your friends chose. Before you start to design your birth cards, you might want to align with each other to avoid similarities. You could check that your names don’t start with the same letter. You could also compare the number of letters. But unfortunately these kinds of hints give away a little of what your name could be. Moreover, they might overlook slightly similar names and different ways of writing the same name. So how can you properly compare baby names without revealing anything?
This seems like an impossible task, which is exactly the type of task we like to solve.
Let us use hashes to check for clashes!
What is hashing?
Algorithms such as md5 and sha256 convert your baby name into a fixed-length string that looks like complete nonsense and is, for practical purposes, unique to that name. As an example, we can do the following in R:
digest::digest('Open Analytics', algo = "md5")
# "620b15afa8838824d3f396b1cff4a68c"
This conversion is called hashing. Think of it as taking a person’s fingerprint. You can easily take a fingerprint from a person, and it uniquely identifies that person. However, unless you have access to a database of known criminals, a fingerprint alone is completely meaningless. Based on the fingerprint alone, there is no way to work out which person it came from.
So, if we only share the ‘fingerprints’ of our name(s) with our friends, we can easily check whether they clash, without giving away any other information about the names.
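To make the fingerprint idea a bit more concrete, here is a minimal sketch using the digest package. The helper name `fingerprint` is made up for this illustration. One detail worth knowing: by default `digest()` serializes the R object before hashing it, so the sketch sets `serialize = FALSE` to hash the characters of the string itself, which is what other md5 or sha256 tools would do.

```r
library(digest)

# Hypothetical helper: hash the raw characters of a string.
# serialize = FALSE skips R's object serialization, so the result
# matches what non-R hashing tools produce for the same text.
fingerprint <- function(name, algo = "sha256") {
  digest(name, algo = algo, serialize = FALSE)
}

fp1 <- fingerprint("Open Analytics")
fp2 <- fingerprint("Open Analytics")
fp3 <- fingerprint("Open analytics")

identical(fp1, fp2)  # the same input always gives the same fingerprint
identical(fp1, fp3)  # changing a single letter gives a different fingerprint
nchar(fp1)           # a sha256 digest is 64 hex characters, whatever the input
nchar(fingerprint("Open Analytics", algo = "md5"))  # an md5 digest is 32
```

The fixed length is part of why a fingerprint reveals so little: it does not even tell you how long the original name was.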
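A comparison along these lines could look as follows. This is only a sketch and not necessarily how the app does it: the helper names and the normalisation step (trimming whitespace and ignoring case) are assumptions made for the example, and the names are made up.

```r
library(digest)

# Hypothetical normalisation, so that " emma " and "Emma" end up
# with the same fingerprint.
normalise_name <- function(name) {
  tolower(trimws(name))
}

name_fingerprint <- function(name) {
  digest(normalise_name(name), algo = "sha256", serialize = FALSE)
}

# Each couple computes its fingerprints privately and shares only these.
ours   <- vapply(c("Emma", "Noah"), name_fingerprint, character(1))
theirs <- vapply(c("Liam", " emma "), name_fingerprint, character(1))

# TRUE if at least one name appears on both shortlists.
clash <- any(ours %in% theirs)
clash
```

Note that this only detects names that are identical after normalisation. Catching different spellings of the same name would need a stronger normalisation step before hashing.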
The one loophole is of course that your friends can start trying out every possible name until they find a match. They would waste a whole lot of time and karma points doing this though, especially if they have no prior clue as to what your name might be. This is like brute-force hacking an account, trying each and every password until you get access.
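The loophole is easy to picture in code. The sketch below assumes the curious friend knows which hashing function was used and has a list of candidate names; the short list here is a made-up stand-in for a full list of first names.

```r
library(digest)

fingerprint <- function(name) {
  digest(tolower(name), algo = "sha256", serialize = FALSE)
}

# The only thing the curious friend has received.
shared_fingerprint <- fingerprint("Olivia")

# Hypothetical guess list; a real attempt would need far more names.
candidates <- c("Emma", "Liam", "Olivia", "Noah")

# Hash every guess and keep the ones that match the shared fingerprint.
guess_hashes <- vapply(candidates, fingerprint, character(1))
recovered <- candidates[unname(guess_hashes == shared_fingerprint)]
recovered
```

The more unusual the name, the less likely it is to be on anybody's guess list.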
Because hashing is so easy and secure, it is no wonder it is used everywhere in IT. The most common example is password storage. Instead of storing your actual password, good providers will only keep a hashed version. This is sufficient to check if you provided the correct password, without the provider ever knowing the password itself! It also means that in case of a data leak, usually only the hashed version gets leaked, so data leaks are not always directly harmful.
Does this mean you are always safe? No! Hackers have big ‘fingerprint databases’ matching passwords with their commonly found hashed forms. If you use an easy-to-guess password, chances are they’ll find a match in their database. So make sure to use unpredictable passwords! But let’s stop the technical rant here, this was just a small post about babies, no?
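Such a ‘fingerprint database’ can be pictured as a simple lookup table that is computed once and reused for every leak. The sketch below uses a plain, unsalted sha256 hash purely for illustration, and all passwords in it are made up.

```r
library(digest)

hash_password <- function(pw) {
  digest(pw, algo = "sha256", serialize = FALSE)
}

# Hypothetical list of common passwords, hashed once in advance.
# The hashes become the names of the vector, the passwords its values.
common <- c("123456", "password", "qwerty")
lookup <- setNames(common, vapply(common, hash_password, character(1)))

# Hashes as they might appear in a leak (invented for this example).
leaked <- c(hash_password("qwerty"), hash_password("c0rrect-h0rse-battery"))

# A predictable password is found immediately; an unknown one yields NA.
recovered_pw <- unname(lookup[leaked])
recovered_pw
```

The difference with plain brute force is that the hashing work is done only once, after which every lookup is instant.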
Try it out!
You can start comparing baby names here.
The app is written in R Shiny, an easy framework for creating data science apps. We then build it with shinylive, which allows it to run entirely within a browser. With GitHub Pages it can also be hosted for free. Excellent guidelines on how to do this can be found here.
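As a rough sketch of the build step, assuming the shinylive and httpuv packages are installed: the folder names `app` and `docs` and the two wrapper functions are hypothetical, chosen only for this example.

```r
# Hypothetical layout: "app" contains app.R, "docs" is the folder that
# GitHub Pages is configured to serve.
export_app <- function(app_dir = "app", site_dir = "docs") {
  # Converts the Shiny app into static files that run in the browser.
  shinylive::export(appdir = app_dir, destdir = site_dir)
}

preview_app <- function(site_dir = "docs") {
  # Serves the exported files locally to check the result before pushing.
  httpuv::runStaticServer(site_dir)
}
```

Calling `export_app()` and then committing the output folder to a repository with GitHub Pages enabled should be enough to publish the app. Whether every package an app depends on is available in the browser is worth checking per app.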
This approach might be one of the easiest free ways to deploy an application. However, keep in mind that your app will be open to the entire web and might not be as performant as it could be. If you need a more advanced open source solution for deploying apps, we recommend taking a look at ShinyProxy.
Enjoy hashing and comparing your baby names!