## (Linear Algebra) Do not scale your matrix

June 2, 2017

In this post, I will show you that you generally don’t need to explicitly scale a matrix. Maybe you wanted to know more about WHY matrices should be scaled when doing linear algebra. I will recall that at the beginning, but the rest will focus on HOW to avoid explicitly scaling matrices. We will apply our findings to...

## Hacking the principles of #openscience #workshops

June 2, 2017

In a previous post, I discussed the key elements that really stood out for me in recent workshops associated with open science, data science, and ecology. Summer workshop season is upon us, and here are some principles to consider that can be used to hack a workshop. These hacks can be applied a priori as an

## Teach kids about R with Minecraft

June 2, 2017

As I mentioned earlier this week, I was on a team at the rOpenSci Unconference (with Brooke Anderson, Karl Broman, Gergely Daróczi, and my Microsoft colleagues Mario Inchiosa and Ali Zaidi) to work on a project to interface the R language with Minecraft. The resulting R package, miner, is now available to install from GitHub. The goal of the...

## EARL Program Optimisation

June 2, 2017

Next week will be the first EARL conference in San Francisco. The team here at Mango have worked hard to put together an excellent program with great speakers. However, with so many interesting talks, it can be hard to decide which to attend. For example, Rich Pugh is a huge fan of Shiny and would like to attend all...

## Weather forecast with regression models – part 1

June 2, 2017

In this tutorial we are going to analyse a weather dataset to produce exploratory analysis and forecast reports based on regression models. We are going to take advantage of a public dataset which is part of the exercise datasets of the “Data Mining and Business Analytics with R” book (Wiley) written by Johannes Ledolter. In...

## The code (and other stuff…)

June 2, 2017

I've received a couple of emails and comments on one of the General Election posts asking me to share the code I've used. In general, I think this is a bit dirty and lots could be done in a more efficient way; effectively, I'm doing this out of my own curiosity and while I think the model is...

## Bringing Together People and Projects at Unconf17

June 2, 2017

We held our 4th annual unconference in Los Angeles, May 25-26, 2017. Scientists, R users and developers, and open data enthusiasts from academia, industry, government, and non-profits came together for two days to hack on projects they dreamed up and to give our online community an opportunity to connect in person. The result? 70 people from 7 countries on 3 continents...

## Calculate Inflation with the blscrapeR Package

June 1, 2017

The Consumer Price Index (CPI) is the main standard for tracking the inflation of the U.S. dollar. The various CPI measures are published monthly by the Bureau of Labor Statistics. For this walk-through, we will be using the blscrapeR package to downl...

## A Shiny App for Exploring Commodities Prices and Economic Indicators, via Quandl

June 1, 2017

In a previous post, we created an R Notebook to explore the relationship between the copper/gold price ratio and 10-year Treasury yields (if you’re curious why we might care about this relationship, have a quick look at that previous post), relying on data from Quandl. Today, we’ll create a Shiny app that lets users choose which different commodities ratios...

## Python and R top 2017 KDnuggets rankings

June 1, 2017

The results of KDnuggets' 18th annual poll of data science software usage are in, and for the first time in three years Python has edged out R as the most popular software. While R increased its share of usage from 45.7% in last year's poll to 52.1% this year, Python's usage among data scientists increased even more, from 36.6%...

## Canada Labour Market: Future Perspectives

June 1, 2017

For anyone looking for job opportunities, it is nice to have an idea of how the job market will perform in the future for your chosen career or industry. Many countries have open data sets that offer this kind of data. In these exercises we will use R to analyze the future perspective of Canadian labour...

## In defense of wrapr::let()

June 1, 2017

Saw this the other day. In defense of wrapr::let() (originally part of replyr, and still re-exported by that package), I would say: let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the column names will come later …

## A Primer in functional Programming in R (part -2)

June 1, 2017

In the last exercise set, we saw how powerful functional programming principles can be, how dramatically they can increase the readability of code, and how easily you can work with them. In this set of exercises we will look at functional programming principles with purrr. purrr comes with a number of interesting features and...

## A tidy model pipeline with twidlr and broom

June 1, 2017

@drsimonj here to show you how to go from data in a data.frame to a tidy data.frame of model output by combining twidlr and broom in a single, tidy model pipeline. The problem: different model functions take different types of inputs (data.frames, matrices, etc.) and produce different types of output! Thus, we’re often confronted with the very untidy challenge presented in...

## Correcting bias in meta-analyses: What not to do (meta-showdown Part 1)

June 1, 2017

tl;dr: Publication bias and p-hacking can dramatically inflate effect size estimates in meta-analyses. Many methods have been proposed to correct for such bias and to estimate the underlying true effect. In a large simulation study, we found out which methods do not work well under which conditions, and give recommendations on what not to use. Estimated...

## Simple bash script for a fresh install of R and its dependencies in Linux

June 1, 2017

I’ve been working with Linux for some time but always in a dual boot setup. While I used Linux Mint at home, my university computer always had Windows. Most of the time it was not a problem...

## A Partial Remedy to the Reproducibility Problem

May 31, 2017

Several years ago, John Ioannidis jolted the scientific establishment with an article titled, “Why Most Published Research Findings Are False.” He had concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but that conversation opened the floodgates, and many more issues …

## U.S. Residential Energy Use: Machine Learning on the RECS Dataset

May 31, 2017

Contributed by Thomas Kassel. He is currently enrolled in the NYC Data Science Academy remote bootcamp program taking place from January to May 2017. This post is based... The post first appeared on the NYC Data Science Academy Blog.

## Complete Subset Regressions, simple and powerful

May 31, 2017

By Gabriel Vasconcelos. Complete subset regressions (CSR) is a forecasting method proposed by Elliott, Gargano and Timmermann in 2013. It is a very simple but powerful technique. Suppose you have a set of variables and you want to forecast …

## Euler Problem 23: Non-Abundant Sums

May 31, 2017

A solution in the R language to Euler Problem 23: find the sum of all the positive integers which cannot be written as the sum of two abundant numbers. The post first appeared on The Devil is in the Data.

## Mapping County Unemployment with blscrapeR

May 31, 2017

The blscrapeR package makes it easy to produce choropleth maps of various employment and unemployment rates from the Bureau of Labor Statistics (BLS). It’s easy enough to pull a metric for a certain county. The code below pulls the unemployment rates...

## My new DataCamp course: Forecasting Using R

May 31, 2017

For the past few months I’ve been working on a new DataCamp course teaching Forecasting using R. I’m delighted that it is now available for anyone to do. Course blurb Forecasting involves making predictions about the future. It is required in man...

## Calculate Wages and Benefits with blscrapeR

May 31, 2017

The most difficult thing about working with BLS data is gaining a clear understanding of what data are available and what they represent. Some of the more popular data sets can be found on the BLS Databases, Tables & Calculations website. The selec...

## Conditional Generative Adversarial Network with MXNet R package

May 31, 2017

This tutorial shows how to build and train a Conditional Generative Adversarial Network (CGAN) on MNIST images. How GAN works A Generative Adversarial Model simultaneously trains two models: a generator that learns to output fake samples from an unkn...

## To eat or not to eat! That’s the question? Measuring the association between categorical variables

May 31, 2017

1. Introduction I serve as a reviewer for several ISI and Scopus indexed journals in Information Technology. Recently, I was reviewing an article wherein the researchers had made a critical mistake in data analysis. They converted the original categor...

## let there be progress

May 31, 2017

The 'wrapr' package for use with dplyr programming - UPDATED POST I’m the first to admit I’m not an R expert (even duffers can blog about it though), but when I began thinking about writing some dplyr functions to help me create and anal...

## Shiny: data presentation with an extra

May 31, 2017

Shiny is an application based on R/RStudio which enables an interactive exploration of data through a dashboard with drop-down lists and checkboxes, with no programming required. The apps can be useful for both the data analyst and the public. Shiny apps are based on the Internet: this allows for private consultation of the data on one’s own browser as …