## Which science is all around? #BillMeetScienceTwitter

May 19, 2017

I’ll admit I didn’t really know who Bill Nye was before yesterday. His name sounds a bit like Bill Nighy’s, that’s all I knew. But, well, science is all around, and quite often scientists on Twitter start interesting campaigns. Remember the #actua...

## New R Users group in Münster!

May 19, 2017

This is to announce that Münster now has its very own R users group! If you are from the area, come join us (or if you happen to know someone who is and who might be interested, please forward the info). You can find us on meetup.com: https://ww...

## A Primer in Functional Programming in R Exercises (Part – 1)

May 19, 2017

In the exercises below we cover the basics of functional programming in R (part 1 of a two-part series of exercises on functional programming). We consider recursion in R, the apply family of functions, and higher-order functions such as Map, Reduce, and Filter. Answers to the exercises are available here. If you obtained...
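
As a rough illustration of the topics this exercise set names, here is a minimal base-R sketch of recursion, one member of the apply family, and `Map`, `Reduce`, and `Filter`. The function and variable names are made up for this example and are not taken from the exercises themselves.

```r
# Recursion: a factorial defined in terms of itself
factorial_rec <- function(n) {
  if (n <= 1) return(1)
  n * factorial_rec(n - 1)
}

# apply family: sapply applies a function to each element and simplifies the result
squares <- sapply(1:5, function(x) x^2)

# Map applies a function element-wise over several vectors and returns a list
pairwise_sums <- Map(function(x, y) x + y, 1:3, 4:6)

# Reduce folds a vector into a single value with a binary function
total <- Reduce(`+`, 1:5)

# Filter keeps only the elements for which the predicate is TRUE
evens <- Filter(function(x) x %% 2 == 0, 1:10)
```

All of these take a function as an argument, which is what makes them higher-order functions.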

## Text Mining with R: A Tidy Approach

About the book This book applies tidy data principles to text analysis. The aim is to present tools to make many text mining tasks easier, more effective, and consistent with tools already in use, and in particular it presents the tidytext R pack...

## AzureDSVM: a new R package for elastic use of the Azure Data Science Virtual Machine

May 19, 2017

by Le Zhang (Data Scientist, Microsoft) and Graham Williams (Director of Data Science, Microsoft) The Azure Data Science Virtual Machine (DSVM) is a curated VM which provides commonly-used tools and software for data science and machine learning, pre-installed. AzureDSVM is a new R package that enables seamless interaction with the DSVM from a local R session, by providing functions...

## R/Finance 2017 livestreaming today and tomorrow

May 19, 2017

If you weren't able to make it to Chicago for R/Finance, the annual conference devoted to applications of R in the financial industry, don't fret: the entire conference is being livestreamed (with thanks to the team at Microsoft). You can watch the proceedings at aka.ms/r_finance, and recordings will be available at the same link after the event. Check out...

## Improving automatic document production with R

May 19, 2017

In this post, I describe the latest iteration of my automatic document production with R. It improves upon the methods used in Rtraining, and previous work on this topic can be read by going to the auto deploying R documentation tag. The post Improving automatic document production with R appeared first on Locke Data. Locke Data are...

## How to interpret correspondence analysis plots (it probably isn’t the way you think)

May 19, 2017

Correspondence analysis is a popular data science technique. It takes a large table, and turns it into a seemingly easy-to-read visualization. Unfortunately, it is not quite as easy to read as most people assume. In How...

## 2017-01 Variable-width lines in R

May 18, 2017

This document describes the ‘vwline’ package, which provides an R interface for drawing variable-width curves. The package provides functions to draw line segments through a set of locations, or a smooth curve relative to a set of control points, with the width of the line allowed to vary along the length of the line. Paul … Continue...

## Review of Efficient R Programming

May 18, 2017

In the crowded market space of data science and R language books, Lovelace and Gillespie’s Efficient R Programming (2016) stands out from the crowd. Over the course of ten comprehensive chapters, the authors address the primary tenets of developing efficient R programs. Unless you happen to be a...

## A Note on on.exit()

I have used `on.exit()` for several years, but it was not until the other day that I realized a very weird thing about it: you’d better follow the default positions of its arguments `expr` and `add`, i.e., the first argument has to be `expr` and the second has to be `add`. `on.exit(expr = NULL, add = FALSE)` If you do...
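
For readers who have not used it, the following is a minimal sketch of `on.exit()` called with its arguments in the default order, `expr` first and `add` second. The `events` vector and the function `f` are hypothetical names introduced here for illustration; the sketch does not reproduce the problem the post goes on to describe.

```r
# Records what happens, in order, so the effect of on.exit() is visible
events <- character(0)

f <- function() {
  # First exit handler: expr is passed as the first argument
  on.exit(events <<- c(events, "first"))
  # add = TRUE appends a second handler instead of replacing the first
  on.exit(events <<- c(events, "second"), add = TRUE)
  events <<- c(events, "body")
  invisible(NULL)
}

f()
# events now holds "body", "first", "second"
```

Without `add = TRUE`, the second call would overwrite the first handler, and only "second" would be recorded after "body".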

## continental divide

May 18, 2017

While the Riddler puzzle this week was anticlimactic, as it meant filling all digits in the above division towards a null remainder, it came as an interesting illustration of how differently division is taught in the US versus France: when I saw the picture above, I had to go and check an American primary school

May 18, 2017

This set of exercises will help you improve your skills with character functions in R. Most of the exercises are related to text mining, a technique that analyses text using statistics. If you find them interesting, I would suggest checking the library tm, which includes functions designed for this task. There...

## shinydashboard 0.6.0

Shinydashboard 0.6.0 is now on CRAN! This release of shinydashboard was aimed at both fixing bugs and also bringing the package up to speed with users’ requests and Shiny itself (especially fully bringing bookmarkable state to shinydashboard’s sidebar). In addition to bug fixes and new features, we also added a new “Behavior” section to the

## New EARL Conference app

May 18, 2017

Get the most out of the EARL Conference with the new phone app, available on iTunes and Google Play. View the agenda, speakers, and sponsors in the palm of your hand. Bookmark the sessions you’d like to attend and rate … Continue reading →

## On indexing operators and composition

May 18, 2017

In this article I will discuss array indexing, operators, and composition in depth. If you work through this article you should end up with a very deep understanding of array indexing and the deep interpretation available when we realize indexing is an instance of function composition (or an example of permutation groups or semigroups: some … Continue...

## Unsupervised Learning and Text Mining of Emotion Terms Using R

May 18, 2017

Unsupervised learning refers to data science approaches that involve learning without prior knowledge about the classification of sample data. In Wikipedia, unsupervised learning has been described as “the task of inferring a function to describe hidden structure from ‘unlabeled’ data (a classification or categorization is not included in the observations)”. The overarching objectives of...

## x + x is not 2x

May 18, 2017

A few days ago, Joel Courtheyn posted the following issue in the errors package repository on GitHub: Experimenting with the new package I detected a difference in calculation of the error depending on the way a formula was written. Originally I tried to calculate the error for z1 <- (x^3 - 2y)/x^0.5 but this gave me… Continue reading...
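
One way a discrepancy of this kind can arise, sketched here as an assumption rather than as the post's own explanation, is first-order error propagation that treats every operand as independent. Under that assumption the two occurrences of x in x + x are handled as unrelated quantities, so the result differs from scaling x by the exact constant 2. The sketch uses plain base R, not the errors package, and `sigma_x` is an arbitrary illustrative value.

```r
# Standard uncertainty of x (arbitrary illustrative value)
sigma_x <- 0.1

# x + x with operands treated as independent: variances add
sigma_sum <- sqrt(sigma_x^2 + sigma_x^2)   # equals sqrt(2) * sigma_x

# 2 * x with 2 as an exact constant: the uncertainty scales linearly
sigma_scaled <- 2 * sigma_x

# The ratio between the two results is sqrt(2)
ratio <- sigma_scaled / sigma_sum
```

The two forms agree only if the correlation between the operands is taken into account, since x is perfectly correlated with itself.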

## Generating metropolitan subsets of Census data with R and tigris

May 18, 2017

Commonly, studies that use US Census data focus on topics at the scale of the metropolitan area. However, subsetting Census geographic data by metropolitan area is not always straightforward. Such a workflow for Census tracts might...

## Euler Problem 21: Amicable Numbers

May 17, 2017

A solution in the R language to Euler Problem 21 which asks to evaluate the sum of all the amicable, or friendly, numbers under 10000. Continue reading → The post Euler Problem 21: Amicable Numbers appeared first on The Devil is in the Data.
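
The linked post has its own solution; the following is only a minimal brute-force sketch of the problem as stated, with helper names invented for this example. Two different numbers a and b are amicable when the sum of the proper divisors of a is b and the sum of the proper divisors of b is a.

```r
# Sum of the proper divisors of n (divisors strictly smaller than n)
sum_proper_divisors <- function(n) {
  if (n < 2) return(0)
  candidates <- seq_len(n %/% 2)   # no proper divisor exceeds n / 2
  sum(candidates[n %% candidates == 0])
}

# TRUE when a belongs to an amicable pair (a, b) with a != b
is_amicable <- function(a) {
  b <- sum_proper_divisors(a)
  b != a && sum_proper_divisors(b) == a
}

limit <- 10000
amicable <- Filter(is_amicable, seq_len(limit - 1))
total <- sum(amicable)
```

The divisor search is deliberately simple; checking candidates only up to the square root of n would be faster but is not needed at this scale.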

## How I Find, Manage, and Use GIFs

A few months ago Jenny wanted me (and Karthik, if I remember correctly) to share some experience with GIFs. I have been busy with writing the blogdown book recently and don’t really have much time, so I’m going to write a quick post just to take a short break. I may expand this post in the future. First...

## An Introduction to Spatial Data Analysis and Visualization in R

May 17, 2017

The Consumer Data Research Centre, the UK-based organization that works with consumer-related organisations to open up their data resources, recently published a new course online: An Introduction to Spatial Data Analysis and Visualization in R. Created by James Cheshire (whose blog Spatial.ly regularly features interesting R-based data visualizations) and Guy Lansley, both of University College London Department of Geography,...

May 17, 2017

Creating the experimental design for a max-diff experiment is easy in R. This post describes how to create and check a max-diff experimental design. If you are not sure what this is, it would be best to read A beginner's guide to max-diff first.

## Training Neural Networks with Backpropagation. Original Publication.

May 17, 2017

Neural networks have been a very important area of scientific study that has evolved through different disciplines such as mathematics, biology, psychology, computer science, etc. The study of neural networks leapt from theory to practice with the emergence of computers. Training a neural network by adjusting the weights of the connections is computationally very expensive so its...

## Introduction to copulas Exercises (Part-2)

May 17, 2017

Copulas are a powerful statistical tool commonly used in the finance sector to generate samples from a given multivariate joint distribution. The principal advantage of using those types of functions over other methods is that copulas describe the multivariate joint distribution in terms of its margins and the dependence structure between them, which gives the user the...

## R-Lab #3: A Shiny for the Milano Municipality Budget Data | Milan, May 27th

May 17, 2017

We are ready for the third R-Lab, the monthly appointment where we work together on a real data science problem using R. This time the R-Lab is promoted by none other than the Assessorato alla Partecipazione, Cittadinanza Attiva e Open Data of the Comune di Milano! We will access their municipality budget data, and use one day of joint work...

## xts Cheat Sheet: Time Series in R

May 17, 2017

Even though the data.frame object is one of the core objects to hold data in R, you'll find that it's not really efficient when you're working with time series data. You'll find yourself wanting a more flexible time series class in R that offers a variety of methods to manipulate your data. xts, or the Extensible Time Series, is one of...

## Count models in JAGS

May 17, 2017

Looks like I’ll be diving into some Bayesian analyses using JAGS. This post is primarily intended as a collection of links to useful information, but also includes a few initial thoughts (I might update it occasionally with new links). In terms of R packages, a very brief play suggests that R2jags is more user

## GSoC 2017 : Biodiversity data cleaning

May 17, 2017

By Ashwin Agrawal. URL of the Project Idea: https://github.com/rstats-gsoc/gsoc2017/wiki/Biodiversity-data-cleaning Introduction: An increasing number of scientists use R for their data analyses; however, the skill set required to handle biodiversity data in R varies considerably. Since users need to retrieve, manage and assess high-volume data with complex structure (Darwin Core standard, DwC), only