The first stable version of the new mongolite package has appeared on CRAN. Mongolite builds on jsonlite to provide a simple, high-performance MongoDB client for R, which makes storing and accessing small or large data as easy as converting it ...

There’s been lots of buzz about “statebin” maps of late. A recent tweet by @andrewxhill referencing work by @dannydb pointed to a nice shapefile that ends up being a really great way to handle statebin maps (and I feel like a fool for not considering it for a more generic solution earlier). Here is the

by Joseph Rickert A strong case can be made that base R graphics, supplemented with either the lattice library or ggplot2 for plotting by subgroups, provide everything a statistician might need for both exploratory data analysis and for developing clear, crisp graphics for communicating results. However, it is abundantly clear that web-based graphics, driven to a large extent by...

I’ll be the first to admit that the topic of plotting ordination results using ggplot2 has been visited many times over. As is my typical fashion, I started creating a package for this purpose without completely searching for existing solutions. Specifically, the ggbiplot and factoextra packages already provide almost complete coverage of plotting results from

Computerworld's Sharon Machlis published today a very useful list of R packages that every R user should know. The list covers packages for data import, data wrangling, data visualization and package development, but for beginning R users the biggest challenge is usually just dealing with data. To that end, I thought it was worth listing the package for data...

We better keep an eye on this one: she is tricky (Michael Banks, talking about Mary Poppins) Professor Bertrand teaches Simulation and one day asks his students: Given a circumference, what is the probability that a chord chosen at random is longer than a side of the equilateral triangle inscribed in the circle? Since they must reach the …
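The question can be explored with a short simulation. What follows is only a minimal sketch, and it rests on one assumption that is not in the post: "a chord chosen at random" is taken to mean a chord joining two endpoints drawn uniformly on the circumference. Other ways of picking a random chord lead to different probabilities, which is what makes the problem tricky. The variable names, sample size, and seed are illustrative choices.

```r
# Monte Carlo sketch of the chord question on a circle of radius 1.
# ASSUMPTION: a "random chord" joins two endpoints drawn uniformly on the
# circumference; other definitions of randomness give other answers.
set.seed(1)                        # arbitrary seed, for reproducibility
n <- 100000                        # number of simulated chords

theta1 <- runif(n, 0, 2 * pi)      # angle of the first endpoint
theta2 <- runif(n, 0, 2 * pi)      # angle of the second endpoint

# Length of the chord between two points on the unit circle
chord <- 2 * abs(sin((theta1 - theta2) / 2))

# Side of the equilateral triangle inscribed in the unit circle
side <- sqrt(3)

# Estimated probability that the chord is longer than the side
p_hat <- mean(chord > side)
print(p_hat)
```

Under this particular definition the estimate should settle near 1/3, since the chord is longer than the side exactly when the two endpoints are separated by more than a third of the circumference.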

Introduction: In clustering, you let data be grouped according to their similarity. A cluster model is a group of segments (clusters) containing cases (such as clients, patients, cars, etc.). Once a cluster model is developed, one question arises: How can I describe my model? Here we present a way to approach this question, through the implementation of Coordinate Plot in R...

I have put together some basic material on survival analysis. It is available as an .html document with highlighted syntax, a printer-ready .pdf document, and a GitHub repository with all the source files. The main motivation was that I wanted to learn the basics myself; also, it's tricky to find simple examples of survival models fitted in ...

We will be showcasing our RMOA package at the next R User conference in Aalborg.
For R users who are unfamiliar with streaming modelling and want to stay ahead of the Gartner Hype Cycle, or who want to evaluate existing streaming machine learning models, RMOA lets you build, run and evaluate streaming classification models which are built in

Join RStudio Chief Data Scientist Hadley Wickham at the University of Illinois at Chicago, on Wednesday May 27th & 28th for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers. As of this post, the workshop is two-thirds sold out. If you’re in or near Chicago

I was recently asked to write a survey on copulas for financial time series. The paper is, so far, unfortunately, in French, and is available on https://hal.archives-ouvertes.fr/. There is a description of various models, including some graphs and statistical outputs, obtained from real data. To illustrate, I’ve been using weekly log-returns of (crude) oil prices, Brent, Dubaï and Maya...

Introduction: Recently I found a good introduction to the Schelling Segregation Model and to Agent Based Modelling (ABM) for Python (Binpress Article by Adil). The model follows an ABM approach to simulate how urban segregation can be explained. I will concentrate on the R code; if you want to know more about the Schelling Segregation Model (which brought

by Gregory Vandenbrouck, Software Engineer at Microsoft This post is the first in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL or Microsoft SQL Server) to an R client on Windows or Linux. We’ll start with a relatively simple case of pulling data from SQL Azure to an R client on...

I think you’ll agree with me if I say: It’s HARD to know whether to use Python or R for data analysis. And this is especially true if you’re a newbie data analyst looking for the right language to start with. It turns out that there are many good resources that can help you to figure out the

Introduction: In this post I am going to show how to extract data from web pages in table format, transform these data into spatial objects in R and then plot them in maps. Procedure: For this project we need the following two packages: XML and raster. The first package is used to extract data from HTML pages, in particular from the sections marked...

In my previous post I discussed how Longley-Cook, an actuary at an insurance company in the 1950s, used Bayesian reasoning to estimate the probability for a mid-air collision of two planes. Here I will use the same model to get started with Stan/RStan, a probabilistic programming language for Bayesian inference. Last week my prior was given as...

In this post I will show you how to deploy Shiny Apps easily with a simple git push. But, what’s a git push? I’m referring to the git command used with remote repositories. With this command you can deploy apps easily with a PaaS (Platform as a Service) like Heroku. If you have never heard about

An elastic infrastructure for distributed R Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response

KDnuggets is once again running its annual poll of data science software tools, now in its 16th year. If you'd like to participate, visit the KDnuggets poll page and answer the question, "What Predictive Analytics, Data Mining, Data Science software/tools you used in the past 12 months?". The poll allows you to select up to 20 tools from the...

Devtools 1.8 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. You can learn more about developing packages at http://r-pkgs.had.co.nz/. Get the latest version of devtools with: install.packages("devtools") There are three main improvements: More helpers to get you