Big Data Insights: Tale of IT Investments and Returns

July 11, 2016

(This article was first published on Coastal Econometrician Views, and kindly contributed to R-bloggers)

Once again, this post brings a predictive analytics insight drawn from huge volumes of information technology security data belonging to two Fortune 500 companies with broadly similar characteristics. As quick background, the analytical interest was to understand how each organization invested in its IT security over time, and what ROI (return on investment) it obtained.

My earlier Big Data Insight post prompted many queries about the data, so here I am publishing the data used for plotting, for quick play in R. As mentioned above, the volumes were huge, and all initial processing was done on the Apache Spark stack in a cloud environment. As usual, the analysis below was carried out with R components, viz., R-3.3.1, RStudio (my favorite IDE), and the ggplot2 package for plotting.

Now, let's understand the plot below. The x-axis shows the year, ranging from 1999 to 2015, and the y-axis shows the counts of major threats and IT security employees at both organizations (Org). Starting at the year 2000, it is evident that Org A had more threats than Org B, while both organizations had roughly 10 IT security employees (Org A had only a few more than Org B; in the earlier year 1999, Org B actually had one more employee than Org A). Over the next 2-3 years, however, Org A increased its IT security employees to about 20, whereas Org B kept more or less the same number for the next 10 years. As a result, Org B reached a stage where its number of major threats exploded beyond the existing team's control, whereas Org A's initial investment in employees paid off: its number of major threats stayed more or less stable or decreased over time (don't forget that achieving zero is impossible, given the new technologies and applications arriving every year).

Data employed for the plot:

# Output of dput(IT_threats_returns); the assignment below recreates the data frame
IT_threats_returns <- structure(list(Year = c(1999, 1999, 1999, 1999, 2000, 2000, 2000,
2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002, 2003, 2003,
2003, 2003, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2006,
2006, 2006, 2006, 2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008,
2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011, 2011,
2011, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014, 2014,
2014, 2014, 2015, 2015, 2015, 2015), Numeric_Value = c(28, 11,
9, 10, 36, 26, 13, 7, 28, 26, 17, 9, 26, 29, 21, 10, 32, 21,
19, 9, 25, 34, 19, 10, 30, 35, 20, 10, 22, 27, 19, 10, 31, 42,
19, 11, 29, 47, 19, 11, 28, 45, 22, 11, 25, 55, 23, 13, 30, 51,
21, 14, 25, 49, 22, 13, 32, 60, 22, 19, 25, 53, 25, 24, 19, 49,
25, 29), Desc = c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps"
)), .Names = c("Year", "Numeric_Value", "Desc"), row.names = c(NA,
68L), class = "data.frame")
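
The reading of the plot can also be checked numerically. Below is a minimal sketch in base R that rebuilds the same 68 values compactly and computes major threats per IT security employee for each organization and year. The ratio is only an illustrative measure assumed for this sketch, not the ROI metric of the study, and the names `series`, `pick`, and `threats_per_emp` are hypothetical.

```r
# Series labels, in the order they repeat within each year of the data
series <- c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
            "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps")

# Same values as the dput() listing, four per year from 1999 to 2015
values <- c(28, 11,  9, 10,   36, 26, 13,  7,   28, 26, 17,  9,
            26, 29, 21, 10,   32, 21, 19,  9,   25, 34, 19, 10,
            30, 35, 20, 10,   22, 27, 19, 10,   31, 42, 19, 11,
            29, 47, 19, 11,   28, 45, 22, 11,   25, 55, 23, 13,
            30, 51, 21, 14,   25, 49, 22, 13,   32, 60, 22, 19,
            25, 53, 25, 24,   19, 49, 25, 29)

IT_threats_returns <- data.frame(Year = rep(1999:2015, each = 4),
                                 Numeric_Value = values,
                                 Desc = rep(series, times = 17),
                                 stringsAsFactors = FALSE)

# Hypothetical helper: the values of one series, ordered by year
pick <- function(label) {
  rows <- IT_threats_returns[IT_threats_returns$Desc == label, ]
  rows$Numeric_Value[order(rows$Year)]
}

# Illustrative ratio (an assumption of this sketch):
# major threats per IT security employee, per organization and year
threats_per_emp <- data.frame(
  Year  = sort(unique(IT_threats_returns$Year)),
  Org_A = round(pick(series[1]) / pick(series[3]), 2),
  Org_B = round(pick(series[2]) / pick(series[4]), 2)
)
print(threats_per_emp)
```

For example, in 2010 Org B recorded 55 major threats with 13 employees, against 25 threats with 23 employees at Org A.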

# code used for plotting

library(ggplot2)
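# One dashed line per series; the legend is hidden because the lines are labelled directly.
# Note: in ggplot2 3.4.0 and later, use linewidth = 1 in place of size = 1 for lines.
p <- ggplot(IT_threats_returns, aes(x = Year, y = Numeric_Value, col = Desc)) +
  geom_line(linetype = 5, size = 1) +
  theme_light() +
  theme(legend.position = "none") +
  ylab("") + xlab("")

# Label each line in place, using the colour ggplot2 assigned to that series by default
p + annotate("text",
             x = c(2012, 2012, 2004.5, 2012.5),
             y = c(47, 34, 18, 10.5),
             label = c(" `Org_B` : No_of_Major_Threats", " `Org_A` : No_of_Major_Threats",
                       " `Org_A` : No_of_IT_Security_Emps", " `Org_B` : No_of_IT_Security_Emps"),
             col = c("#C77CFF", "#7CAE00", "#F8766D", "#00BFC4"))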
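
The four hex codes passed to annotate() appear to be ggplot2's default discrete colours, which are assigned to the values of Desc in alphabetical order. A minimal sketch, assuming the commonly used base-R recipe for that palette (evenly spaced hues in HCL space), shows how the label colours could be derived rather than hard-coded. `default_hues` and `label_cols` are hypothetical names, not ggplot2 functions.

```r
# Hypothetical helper: n evenly spaced hues, the usual base-R recipe for
# approximating ggplot2's default discrete palette
default_hues <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  grDevices::hcl(h = hues, c = 100, l = 65)[seq_len(n)]
}

# Desc values in alphabetical order, which is the order ggplot2 colours them in
series <- sort(c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
                 "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps"))

# Named vector: series label -> colour
label_cols <- setNames(default_hues(length(series)), series)
print(label_cols)
```

Indexing `label_cols` by series name, for example `label_cols["Org_B _ No_of_Major_Threats"]`, keeps each annotation in step with its line if the set of series ever changes.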
