Find Out the Reputation Risk of Bulk Email IDs Using R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
If you work in InfoSec / cyber security, one of the things that might be part of your day job is filtering email to remove spam and phishing messages. While this can be done at several levels and in several ways, monitoring the email ID (like [email protected]) and validating its reputation, to see whether it looks risky / suspicious or authentic before letting the message reach the user's inbox, is one of the solid ways (although it is also error-prone, with false positives). In this post, we'll see how to check the reputation of an email address in your R code.
emailrep – Intro
The package that's going to help us check the reputation of an email ID is emailrep by Bob Rudis. emailrep is an R binding for the EmailRep API provided by the service emailrep.io.
emailrep.io Reputation – What does it mean?
Before we move on to the code section, it's important to understand what reputation means here. A low reputation simply means the email address hasn't been seen anywhere trustworthy on the internet, based on the assumption that trustworthy email addresses have a history and a record across multiple sources on the web.
emailrep – Installation
emailrep can be installed from Bob Rudis' CINC (which ironically stands for CINC Is Not CRAN):
install.packages("emailrep", repos = "https://cinc.rud.is")
or from one of several other online repos on various Git services:
remotes::install_git("https://git.rud.is/hrbrmstr/emailrep.git")
# or
remotes::install_git("https://git.sr.ht/~hrbrmstr/emailrep")
# or
remotes::install_gitlab("hrbrmstr/emailrep")
# or
remotes::install_bitbucket("hrbrmstr/emailrep")
# or
remotes::install_github("hrbrmstr/emailrep")
emailrep – Loading and Basic Example
Once installed, emailrep can be loaded like any other R package:
library(emailrep)
emailrep is quite simple in its structure, with one function, email_rep(), doing the job for us. Let's try to find out the reputation of the email ID [email protected]:
email_rep("[email protected]")
## $email
## [1] "[email protected]"
##
## $reputation
## [1] "high"
##
## $suspicious
## [1] FALSE
##
## $references
## [1] 22
##
## $details
## $details$blacklisted
## [1] FALSE
##
## $details$malicious_activity
## [1] FALSE
##
## $details$malicious_activity_recent
## [1] FALSE
##
## $details$credentials_leaked
## [1] TRUE
##
## $details$credentials_leaked_recent
## [1] FALSE
##
## $details$data_breach
## [1] TRUE
##
## $details$last_seen
## [1] "02/25/2019"
##
## $details$domain_exists
## [1] TRUE
##
## $details$domain_reputation
## [1] "high"
##
## $details$new_domain
## [1] FALSE
##
## $details$days_since_domain_creation
## [1] 11853
##
## $details$suspicious_tld
## [1] FALSE
##
## $details$spam
## [1] FALSE
##
## $details$free_provider
## [1] FALSE
##
## $details$disposable
## [1] FALSE
##
## $details$deliverable
## [1] FALSE
##
## $details$accept_all
## [1] FALSE
##
## $details$valid_mx
## [1] FALSE
##
## $details$spoofable
## [1] FALSE
##
## $details$spf_strict
## [1] TRUE
##
## $details$dmarc_enforced
## [1] TRUE
##
## $details$profiles
## [1] "linkedin" "angellist" "facebook" "spotify" "vimeo"
## [6] "instagram" "github" "twitter" "pinterest" "aboutme"
As we can see above, the function returns a list with a lot of basic attributes like reputation and suspicious. It also returns some interesting attributes like data_breach (whether the email ID was part of some data leak) and profiles (the places / profiles on the internet where the email ID has appeared).
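To make the structure of that list concrete, here is a minimal sketch of pulling individual fields out of such a result. So that it runs offline, it uses a hand-built list called res (a hypothetical name) that mirrors only part of the output printed above; with the real package you would presumably assign the return value of email_rep() instead.

```r
# Hand-built stand-in that mirrors part of the output printed above.
# This is an assumption for illustration; a live call would look like
# res <- email_rep(some_id).
res <- list(
  reputation = "high",
  suspicious = FALSE,
  details = list(
    blacklisted = FALSE,
    credentials_leaked = TRUE,
    data_breach = TRUE,
    spam = FALSE,
    profiles = c("linkedin", "github", "twitter")
  )
)

# Top-level fields are reached with $, nested ones with a second $.
res$reputation
res$details$data_breach

# Keep only the entries of `details` that are a single TRUE value,
# then take their names to get the list of raised flags.
raised <- names(Filter(isTRUE, res$details))
raised
```

Collecting the raised flags by name like this is just one way to summarise a result; which flags matter will depend on your own filtering policy.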
emailrep – use-case: Multiple IDs
As a data scientist, you'll rarely be dealing with a single email ID, for which the reputation could simply be looked up online. Our programming skills pay off when we have to do this for a bulk of email IDs.
Let's try to find the reputation of 3 IDs together and store the output in a data frame, so that it can be used for any further purpose like visualization.
# list of email ids
email_ids <- c("[email protected]", "[email protected]", "[email protected]")
We'll use purrr for a bit of functional programming (with map()):
library(purrr)
library(magrittr)
result_df <- map(email_ids, email_rep) %>%
  map_df(., magrittr::extract, c("email", "reputation", "suspicious"))
result_df
## # A tibble: 3 x 3
##   email             reputation suspicious
##   <chr>             <chr>      <lgl>
## 1 [email protected] medium     TRUE
## 2 [email protected] none       TRUE
## 3 [email protected] high       FALSE
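With a longer list of IDs, a single failed lookup (a malformed address, a network hiccup) would abort the whole map() call. The following is a rough sketch, in base R, of one way to keep going and record NA for the IDs that fail. The names bulk_reputation, lookup, pause and fake_lookup are all hypothetical and not part of emailrep; fake_lookup is only an offline stand-in so the example runs without calling the API.

```r
# Look up many IDs, keeping one row per ID even when a lookup fails.
# `lookup` is any function that takes an email ID and returns a list
# with `reputation` and `suspicious` fields (assumed shape, matching
# the output shown earlier).
bulk_reputation <- function(ids, lookup, pause = 0) {
  rows <- lapply(ids, function(id) {
    res <- tryCatch(lookup(id), error = function(e) NULL)
    if (pause > 0) Sys.sleep(pause)  # optional delay between requests
    if (is.null(res)) {
      # Failed lookup: record NA instead of aborting the whole batch.
      return(data.frame(email = id, reputation = NA_character_,
                        suspicious = NA, stringsAsFactors = FALSE))
    }
    data.frame(email = id, reputation = res$reputation,
               suspicious = res$suspicious, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Offline stand-in for a real lookup function, for demonstration only.
fake_lookup <- function(id) {
  if (!grepl("@", id, fixed = TRUE)) stop("not an email address")
  list(email = id, reputation = "high", suspicious = FALSE)
}

out <- bulk_reputation(c("a@example.com", "not-an-email"), fake_lookup)
out
```

With the package loaded, the same helper could presumably be called as bulk_reputation(email_ids, email_rep, pause = 1). The one-second pause is an arbitrary choice here, not a documented rate limit of the service.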
Summary
Thus, we learnt how to use emailrep to identify, in bulk, the reputation and other such risk attributes of email IDs. This should help in data security, in validating emails for email marketing, in Salesforce Automation, and in many other instances depending upon your area of business.