Find Out the Reputation Risk of Email IDs in Bulk Using R

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers].

If you work in InfoSec / cyber security, part of your day job may be filtering email to remove spam and phishing messages. This can be done at several levels and in several ways. One solid approach is to monitor the sender's email ID (like [email protected]) and check its reputation to see whether it looks risky / suspicious or authentic before letting the message reach the user's inbox, although this approach is also error-prone and produces false positives. In this post, we'll see how to check the reputation of an email address from your R code.

emailrep – Intro

The package that will help us check the reputation of an email ID is emailrep by Bob Rudis. emailrep is an R binding for the EmailRep API provided by the service emailrep.io.

emailrep.io Reputation – What does it mean?

Before we move on to the code section, it's important to understand what reputation means here. A poor reputation simply means the email address hasn't been seen anywhere trustworthy on the internet, under the assumption that trustworthy email addresses have a history and a record across multiple sources on the web.
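To make the idea of "more" or "less" reputation concrete, the labels can be treated as an ordered scale. The sketch below is only an illustration: it assumes four labels (none, low, medium, high), of which only none, medium and high appear in this post's output, and is_below() is a hypothetical helper, not part of emailrep.

```r
# Assumed ordering of reputation labels, weakest to strongest.
# ("low" is an assumption; it does not appear in this post's output.)
rep_levels <- c("none", "low", "medium", "high")

# Hypothetical helper: TRUE when a label ranks below the given threshold.
is_below <- function(reputation, threshold = "medium") {
  match(reputation, rep_levels) < match(threshold, rep_levels)
}

is_below(c("medium", "none", "high"))
```

With the default threshold, only the "none" entry is reported as falling below "medium".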

emailrep – Installation

emailrep can be installed from Bob Rudis' CINC repository (which, ironically, stands for CINC Is Not CRAN):

install.packages("emailrep", repos = "https://cinc.rud.is")

or from one of several other online repositories hosted on various Git services:

remotes::install_git("https://git.rud.is/hrbrmstr/emailrep.git")
# or
remotes::install_git("https://git.sr.ht/~hrbrmstr/emailrep")
# or
remotes::install_gitlab("hrbrmstr/emailrep")
# or
remotes::install_bitbucket("hrbrmstr/emailrep")
# or
remotes::install_github("hrbrmstr/emailrep")
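If the installation step lives in a script that is run repeatedly, it may be convenient to install only when the package is missing. The following is a minimal sketch in base R; ensure_installed() is a hypothetical helper name, not something provided by emailrep or remotes.

```r
# Hypothetical helper: install a package only if it is not already available,
# then report whether it can be loaded.
ensure_installed <- function(pkg, repos) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Example call (needs network access if emailrep is missing):
# ensure_installed("emailrep", repos = "https://cinc.rud.is")
```

The call is left commented out so that sourcing the snippet does not trigger a download by itself.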

emailrep – Loading and Basic Example

Once installed, emailrep can be loaded like any other R package:

library(emailrep)

emailrep is quite simple in its structure, with a single function, email_rep(), doing the job for us. Let's find out the reputation of an email ID:

email_rep("[email protected]")
## $email
## [1] "[email protected]"
## 
## $reputation
## [1] "high"
## 
## $suspicious
## [1] FALSE
## 
## $references
## [1] 22
## 
## $details
## $details$blacklisted
## [1] FALSE
## 
## $details$malicious_activity
## [1] FALSE
## 
## $details$malicious_activity_recent
## [1] FALSE
## 
## $details$credentials_leaked
## [1] TRUE
## 
## $details$credentials_leaked_recent
## [1] FALSE
## 
## $details$data_breach
## [1] TRUE
## 
## $details$last_seen
## [1] "02/25/2019"
## 
## $details$domain_exists
## [1] TRUE
## 
## $details$domain_reputation
## [1] "high"
## 
## $details$new_domain
## [1] FALSE
## 
## $details$days_since_domain_creation
## [1] 11853
## 
## $details$suspicious_tld
## [1] FALSE
## 
## $details$spam
## [1] FALSE
## 
## $details$free_provider
## [1] FALSE
## 
## $details$disposable
## [1] FALSE
## 
## $details$deliverable
## [1] FALSE
## 
## $details$accept_all
## [1] FALSE
## 
## $details$valid_mx
## [1] FALSE
## 
## $details$spoofable
## [1] FALSE
## 
## $details$spf_strict
## [1] TRUE
## 
## $details$dmarc_enforced
## [1] TRUE
## 
## $details$profiles
##  [1] "linkedin"  "angellist" "facebook"  "spotify"   "vimeo"    
##  [6] "instagram" "github"    "twitter"   "pinterest" "aboutme"

As we can see above, the function returns a list with basic top-level attributes such as reputation and suspicious. It also returns some interesting attributes nested under details, such as data_breach (whether the email ID was part of a data leak) and profiles (the sites on the internet where the email ID has appeared).
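Because the result is an ordinary nested list, individual attributes can be pulled out with `$`. The sketch below uses a small hand-built list, rep_info, that mimics part of the structure shown above. It is a stand-in for illustration, not a live API response, and the address in it is a placeholder.

```r
# Hand-built stand-in with the same shape as part of the output above
# (placeholder values, not a real lookup result).
rep_info <- list(
  email = "someone@example.com",
  reputation = "high",
  suspicious = FALSE,
  details = list(
    data_breach = TRUE,
    credentials_leaked = TRUE,
    profiles = c("linkedin", "github", "twitter")
  )
)

# Top-level attributes are reached with a single `$`.
rep_info$reputation

# Attributes under `details` need one more level of `$`.
rep_info$details$data_breach

# `profiles` is a character vector, so the usual vector tools apply.
length(rep_info$details$profiles)
"github" %in% rep_info$details$profiles
```

The same access pattern should apply to the object returned by email_rep(), since the printed output above has this list structure.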

emailrep – use-case: Multiple IDs

As a data scientist, you will rarely deal with a single email ID whose reputation could simply be looked up online. Our programming skills pay off when we have to do this for email IDs in bulk.

Let's find out the reputation of three IDs together and collect the output in a data frame so that it can be used for further purposes such as visualization.

# list of email ids

email_ids <- c("[email protected]", 
               "[email protected]",
               "[email protected]")

We'll use purrr for a bit of functional programming (with map()):

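library(purrr)     # map(), map_df()
library(magrittr)  # extract(), a function form of `[`

# Query each address, then keep three top-level fields per result
# and row-bind them into a single data frame.
result_df <- map(email_ids, email_rep) %>%
  map_df(magrittr::extract, c("email", "reputation", "suspicious"))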

result_df
## # A tibble: 3 x 3
##   email                     reputation suspicious
##   <chr>                     <chr>      <lgl>     
## 1 [email protected]       medium     TRUE      
## 2 [email protected]   none       TRUE      
## 3 [email protected] high       FALSE
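Each lookup is a network request, so in a larger batch a single failed call could abort the whole run. The following is a minimal sketch of one way to guard against that in base R, using tryCatch() and an optional pause between requests. To keep it runnable offline, it uses fake_email_rep(), a hypothetical stand-in that returns the same three top-level fields and deliberately fails for one address; safe_lookup() and the example addresses are likewise made up for illustration.

```r
# Hypothetical stand-in for email_rep(): returns the three top-level fields
# used above and fails for one address to simulate an API error.
fake_email_rep <- function(email) {
  if (grepl("fail", email, fixed = TRUE)) stop("simulated API error")
  list(email = email, reputation = "none", suspicious = TRUE)
}

# Look up one address; on error, return a row of NAs instead of stopping.
safe_lookup <- function(email, lookup = fake_email_rep, pause = 0) {
  Sys.sleep(pause)  # optional delay between requests, in seconds
  res <- tryCatch(lookup(email), error = function(e) NULL)
  if (is.null(res)) {
    return(data.frame(email = email, reputation = NA_character_,
                      suspicious = NA, stringsAsFactors = FALSE))
  }
  data.frame(email = res$email, reputation = res$reputation,
             suspicious = res$suspicious, stringsAsFactors = FALSE)
}

ids <- c("a@example.com", "fail@example.com", "b@example.com")
bulk_df <- do.call(rbind, lapply(ids, safe_lookup))

# Rows flagged as suspicious (failed lookups are excluded).
flagged <- bulk_df[!is.na(bulk_df$suspicious) & bulk_df$suspicious, ]
```

To run this against the real service, one would pass lookup = email_rep (and probably a non-zero pause) to safe_lookup(). The failed address stays in bulk_df with NA values, so it can be retried later.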

Summary

Thus, we learnt how to use emailrep to identify, in bulk, the reputation and other risk attributes of email IDs. This should help in data security, in validating addresses for email marketing, in Salesforce automation, and in many other situations depending on your area of business.
