## RcppEigen 0.3.2.9.1

March 15, 2017

A new maintenance release, 0.3.2.9.1, of RcppEigen, still based on Eigen 3.2.9, is now on CRAN and will go into Debian soon. This update ensures that RcppEigen and the Matrix package agree on their #define statements for the CholMod / SuiteSparse l...

## Plotly for R workshop at Plotcon 2017

March 15, 2017

Carson Sievert, the lead developer of the Plotly package for R, will be hosting a workshop at https://plotcon.plot.ly/. Here’s an outline of the material he will be covering during the workshop. More details here. The workshop will be based on Carson’s Plotly for R book. Broad Topic Details A tale of 2 interfaces Converting ggplot2

## Unit Testing in R

March 14, 2017

Software testing describes several means to investigate program code regarding its quality. The underlying approaches provide means to handle errors once they occur. Furthermore, software testing also shows techniques to reduce the probability of that. R is becoming an increasingly prominent programming language. This not only includes pure statistical settings but also machine learning, dashboards …

## How Reproducible Data Analysis Scripts Can Help You Route Around Data Sharing Blockers

March 14, 2017

For aaaagggggggeeeeeeesssssss now, I’ve been wittering on about how just publishing “open data” is okay insofar as it goes, but it’s often not that helpful, or at least, not as useful as it could be. Yes, it’s a Good Thing when a dataset is published in support of a report; but have you ever tried

March 14, 2017

Happy pi day!

## How to choose a project to practice data science

March 14, 2017

Projects can be great for mastering data science, but you have to choose your projects carefully. This article will give you tips on how to choose a project that's appropriate for your skill level (and tell you some pitfalls to watch out for).

## FSelectorRcpp on CRAN

March 14, 2017

FSelectorRcpp, an Rcpp (free of Java/Weka) implementation of FSelector's entropy-based feature selection algorithms with sparse matrix support, has finally arrived on CRAN after a year of development. It is also equipped with a parallel backend. Big th...

## 2D contours of several penalty functions in statistics as GIF images

March 13, 2017

Many statistical modeling problems reduce to a minimization problem of the general form: or where $f$ is some type of loss function, $\mathbf{X}$ denotes the data, and $g$ is a penalty, also referred to by other names, such as “regularization term” (problems (1) and (2-3) are often equivalent by the way). Of course both, $f$ and $g$, may depend on further...
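
The displayed equations did not survive in this excerpt. As a sketch only, the penalized form (1) and the constrained form (2-3) referred to above are commonly written as follows. The parameter vector $\boldsymbol{\beta}$, the tuning parameter $\lambda \ge 0$, and the bound $t$ are assumed notation and are not taken from the post:

$$
\min_{\boldsymbol{\beta}} \; f(\boldsymbol{\beta}; \mathbf{X}) + \lambda \, g(\boldsymbol{\beta}) \tag{1}
$$

$$
\min_{\boldsymbol{\beta}} \; f(\boldsymbol{\beta}; \mathbf{X}) \tag{2}
$$

$$
\text{subject to} \quad g(\boldsymbol{\beta}) \le t \tag{3}
$$

Under this reading, each value of $\lambda$ in (1) typically corresponds to some bound $t$ in (3), which is the sense in which the two formulations are often equivalent.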

## Jobs for “Data Science” Up 7-fold, for “Statistician” Down by Half

March 13, 2017

The Bureau of Labor Statistics projects that jobs for statisticians will grow by 34% between 2014 and 2024. However, according to the nation’s largest job web site, the number of companies looking for “statisticians” is actually in sharp decline. Those …

## How to Start an R Project

March 13, 2017

R is the most widely used programming language in data analysis and data mining. When you first get started with R, it can be a little bit intimidating if you are a newbie, and sometimes even for statistics pros, as the syntax can be a little bit new. There are several ways you can access R. You can install...

## Writing manuscripts in RStudio, easy citations

March 13, 2017

Intro and setup: This is a simple explanation of how to write a manuscript in RStudio. Writing a manuscript in RStudio is not ideal, but it has gotten better over time. It is now relatively easy to add citations to documents in RStudio. The goal...

## Benchmarking rxNeuralNet for OCR

March 13, 2017

The MicrosoftML package introduced with Microsoft R Server 9.0 added several new functions for high-performance machine learning, including rxNeuralNet. Tomaz Kastrun recently applied rxNeuralNet to the MNIST database of handwritten digits to compare its performance with two other machine learning packages, h2o and xgboost. The results are summarized in the chart below: In addition to having the best performance...

## EARL abstract submissions close in two weeks

March 13, 2017

If you’ve been doing exciting things in your organisation with the help of R and you’ve been thinking about submitting an abstract, now’s the time to put yours together! Based on the abstracts received so far, EARL Conferences this year …

## Upcoming Talk: 2017 ACS Data Users Conference

March 13, 2017

My abstract was recently accepted for the 2017 ACS Data Users Conference, and on May 12 I will be giving a talk titled Mapping ACS Data in... The post Upcoming Talk: 2017 ACS Data Users Conference appeared first on AriLamstein.com.

## On Programming Languages; Why My Dad Went From Programming to Driving a Bus

March 13, 2017

I discuss, with stories, why programming language choice matters and provide guidelines, along with an infographic of popular 2017 languages.

## Release mongolite 1.0

March 13, 2017

After 2.5 years of development, version 1.0 of the mongolite package has been released to CRAN. The package is now stable, well documented, and will soon be submitted for peer review to be onboarded in the rOpenSci suite. MongoDB in R and mongolite I started working on mongolite in September 2014, and it was first announced at the rOpenSci

## 3rd Birthday of Warsaw R Enthusiasts Group

March 13, 2017

This Thursday, the Warsaw R Enthusiasts Group (in Polish: Spotkania Entuzjastów R - SER - cheese) celebrated its 3rd birthday! Check this post to find out what we have prepared for this special occasion. Summary of three years of the group activity ...

## Predicting RentHop Apartment Listings in the Two Sigma Kaggle Competition

March 12, 2017

Team POWER CHROME AMAZING: Johnna Ayres, Bill Best, Mark Fridson, Trent Jerde, Marshall Yi. Introduction: Ever searched for an apartment and … The post Predicting RentHop Apartment Listings in the Two Sigma Kaggle Competition appeared first on NYC Data Science Academy Blog.

## First release of mlrMBO – the toolbox for (Bayesian) Black-Box Optimization

March 12, 2017

We are happy to finally announce the first release of mlrMBO on CRAN after quite a long development time. For the theoretical background and a nearly complete overview of mlrMBO's capabilities you can check our paper on mlrMBO that we presubmitted to arXiv. The key features of mlrMBO are: Global optimization of expensive Black-Box functions. Multi-Criteria Optimization. Parallelization...

## Tic Tac Toe Simulation — Random Moves

March 12, 2017

Tic Tac Toe might be a futile children’s game but it can also teach us about artificial intelligence. Tic Tac Toe, or Naughts and Crosses, is a zero-sum game with perfect information. Both players know exactly what the other did and when nobody …

## satRday Cape Town

March 12, 2017

SA's first R conference - Almost a full month has passed since satRday Cape Town. Time races on (as always) bringing plenty of challenges and opportunities with each new day, but part of me wants to go back and re-live the experience, just to have some more time to take it all...

## Unit testing in R using testthat library Exercises

March 12, 2017

testthat is a testing framework developed by Hadley Wickham, which makes unit testing easy for developers. Test scripts can be re-run after debugging or changing the functions, without the hassle of writing the test code again. testthat has a hierarchical structure made up of expectations, tests and contexts.
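
As a minimal sketch of that hierarchy, the snippet below groups several expectations into two tests. The function `safe_divide` is a hypothetical example and does not come from the exercise set. A context would usually label the whole file, and the `context()` call is left as a comment here because newer testthat versions deprecate it.

```r
library(testthat)

# Hypothetical function under test (an assumption for illustration only).
safe_divide <- function(x, y) {
  if (y == 0) stop("division by zero")
  x / y
}

# A context labels a group of related tests, usually one per test file:
# context("safe_divide")

# A test groups related expectations under one description.
test_that("safe_divide returns the quotient", {
  # Each expect_*() call is a single expectation.
  expect_equal(safe_divide(6, 3), 2)
  expect_equal(safe_divide(-1, 4), -0.25)
})

test_that("safe_divide rejects a zero divisor", {
  # The second argument is a regular expression matched against the error message.
  expect_error(safe_divide(1, 0), "division by zero")
})
```

Because the tests live in a script, they can be re-run unchanged after every edit to `safe_divide`.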

## RSentiment

March 12, 2017

Every system needs continuous improvement. Feedback, positive or negative, plays an important role in that improvement. Humans are fairly instinctive in interpreting the tone of feedback, but teaching a machine to do the same is highly complex. Various algorithms and tools are available today to automatically identify and categorize the opinions in any textual feedback. The

## Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

March 11, 2017

We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described. We are very sorry for any confusion, trouble, or wasted effort bringing in Java software (something we are very familiar with, but forget not everybody …

## Introducing the PWFSLSmoke Package

March 11, 2017

Mazama Science has just released the PWFSLSmoke package. Source code is available on GitHub. Here is the package description: Utilities for working with air quality monitoring data with a focus on small particulates (PM2.5) generated by wildfire smoke. Functions are provided for …

## Peter Lee (1940?-2017)

March 11, 2017

Just heard the sad news that Peter Lee, British Bayesian and author of Bayesian Statistics: An Introduction, passed away last night. While I did not know him, I remember meeting him at a few conferences in the UK and spending a hilarious evening at the pub. When the book came out, I thought it

## Native support for candlestick charts in Plotly and R

March 11, 2017

Plotly.js now supports candlestick charts as a chart type, and in this post we’ll highlight how to use this feature in R. We’ll use the quantmod package to retrieve data as well as generate some technical trading signals. Don’t forget to install the latest version of plotly from GitHub: Syntax The syntax is straightforward: