I was chatting with some cyber-mates at a recent event and the topic of cyber attacks on the U.S. power grid came up (as it often does these days). The conversation was brief, but the topic made its way into active memory and resurfaced when I saw today’s Data ...
Over the last couple of days, I’ve been fettling the build scripts for the TM351 VM, which typically uses Vagrant to build a VirtualBox VM from a set of shell scripts, so they can be used to build a single Docker container that runs all the TM351 services, specifically ...
The secret is out: Nina Zumel and I are busy working on Practical Data Science with R2, the second edition of our best-selling book on learning data science using the R language. Our publisher, Manning, has a great slide deck describing the book (and a discount code!!!) here: We ...
Here is the course link.
Course Description
As a data scientist, you will often find yourself working with non-numerical data, such as job titles, survey responses, or demographic information. This type of data is qualitative and can be ordinal, if...
Here is the course link.
Course Description
In this course, you'll learn to work with data using tools from the tidyverse in R. By data, we mean your own data, other people's data, messy data, big data, small data - any data with rows and columns that comes your way! ...
Here is the course link.
Course Description
In this course, you will learn to model with data. Models attempt to capture the relationship between an outcome variable of interest and a series of explanatory/predictor variables. Such models can be used for both explanatory purposes, e.g. "Does knowing professors' ...
I spent an exciting day at the satRday conference in Amsterdam and would like to share my thoughts with you. TL;DR: The event was cool and my presentation went fine. General: Wow, I don’t remember an event where I had so much fun. From most of the talks, I took ...
The AUC, or concordance statistic c, is the most commonly used measure of the diagnostic accuracy of quantitative tests. It is a discrimination measure that tells us how well we can classify patients into two groups: those with and those without the outcome of interest. Since the measure is based on ...
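As a rough illustration of the concordance idea in the excerpt above, the statistic can be read as the share of (with outcome, without outcome) patient pairs in which the patient with the outcome has the higher test value, with ties counted as half. The sketch below is written in Python rather than R, and the function name `concordance` and the scores are made up for illustration; it is not taken from the original post.

```python
from itertools import product


def concordance(scores_with, scores_without):
    """Share of (with, without) pairs ranked correctly; ties count as 0.5.

    Hypothetical helper for illustration only.
    """
    pairs = list(product(scores_with, scores_without))
    if not pairs:
        raise ValueError("both groups need at least one score")
    credit = sum(1.0 if w > wo else 0.5 if w == wo else 0.0 for w, wo in pairs)
    return credit / len(pairs)


# Made-up test values for patients with and without the outcome.
with_outcome = [0.9, 0.8, 0.4]
without_outcome = [0.7, 0.3, 0.2, 0.4]

c = concordance(with_outcome, without_outcome)
print(c)
```

A value of 0.5 would mean the test ranks pairs no better than chance, while 1.0 would mean every patient with the outcome scores above every patient without it.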
Background Information Brands like Rolex employ a number of methods to maintain their position as a luxury watchmaker. They are selective about the brand ambassadors they hire, the location and decoration of flagship stores, the events they sponsor, and, most importantly, the price tag of the watches. With such effort, ...
For my PhD project, I want to use Supervised Machine Learning (SML) to replicate my manual coding efforts onto a larger data set.
That means, however, that I need to put in some manual coding effort before the SML algorithms can do their magic!
I used ...
Today, we will look at the GDP data that is released every quarter or so by the Bureau of Economic Analysis (BEA), and get familiar with the BEA API (see the documentation here). For a primer on GDP in general, BEA publishes this guide.
To access the BEA API, we ...
This blog post announces the release of the udpipe R package version 0.7 on CRAN. udpipe is an R package which does tokenization, part-of-speech tagging, lemmatization, morphological feature tagging and dependency parsing. Its main feature is that it is a lightweight R package which works on more than 50 languages and ...
Azure HDInsight was recently updated with version 9.3 of ML Services in HDInsight, which provides integration with R and Python. In particular, it makes it possible to run R and Python within HDInsight's managed Spark instance. The integration provides: R and Python support, with interaction via Visual Studio, VS Code, or ...
Is your deep convolutional network misclassifying images? You can find out why with a heatmap of class activation overlaid on its misclassified pictures.
A heatmap overlay shows parts of an image most activated in a neural network’s last convolution...
When modeling frequency outcomes, we often need to go beyond the standard Poisson regression due to the strict distributional assumption and to consider more flexible alternatives. In general, there are two broad categories of modeling approaches in light of practical concerns about frequency outcomes. The first category of models are ...
A first update to the AsioHeaders package arrived on CRAN today. Asio provides a cross-platform C++ library for network and low-level I/O programming. It is also included in Boost – but requires linking when used as part of Boost. This standalone v...
I won’t write a very long introduction; we all know that Excel is ubiquitous in business, and that it has a lot of very nice features, especially for business practitioners that do not know any programming. However, when people use Excel for purposes it was not designed for, it ...