Keeping Up with Your Data Science Options

April 12, 2017

(This article was first published on R –, and kindly contributed to R-bloggers)

The field of data science is changing so rapidly that it’s quite hard to keep up with it all. When I first started tracking The Popularity of Data Science Software in 2010, I followed only ten packages, all of them classic statistics software. The term data science hadn’t caught on yet, and data mining was still a new thing. One of my recent blog posts covered 53 packages, and choosing them from a list of around 100 was a tough decision!

To keep up with the rapidly changing field, you can read the information on a package’s web site, see what people are saying on blog aggregators, and if it sounds good, download a copy and try it out. What’s much harder to do is figure out how the packages all relate to one another. A helpful source of information on that front is the book Disruptive Analytics, by Thomas Dinsmore.

I was lucky enough to be the technical reviewer for the book, during which time I ended up reading it twice. I still refer to it regularly as it covers quite a lot of material. In a mere 262 pages, Dinsmore manages to describe each of the following packages, how they relate to one another, and how they fit into the big picture of data science:

  • Alluxio
  • Alpine Data
  • Alteryx
  • Apex
  • Arrow
  • Caffe
  • Cloudera
  • Deeplearning4J
  • Drill
  • Flink
  • Giraph
  • Hadoop
  • HAWQ
  • Hive
  • IBM SPSS Modeler
  • Ignite
  • Impala
  • Kafka
  • KNIME Analytics Platform
  • Kylin
  • MADlib
  • Mahout
  • MapR
  • Microsoft R Server
  • Phoenix
  • Pig
  • Python
  • R
  • RapidMiner
  • Samza
  • SAS
  • Skytree Server
  • Spark
  • Storm
  • Tajo
  • TensorFlow
  • Tez
  • Theano
  • Trafodion

As you can tell from the title, a major theme of the book is how open source software is disrupting the data science marketplace. Dinsmore’s blog, ML/DL: Machine Learning, Deep Learning, extends the book’s coverage as data science software changes from week to week.

I highly recommend both the book and the blog. Have fun keeping up with the field!
