Getting started with MongoDB in R

May 14, 2015

The first stable version of the new mongolite package has appeared on CRAN. Mongolite builds on jsonlite to provide a simple, high-performance MongoDB client for R, which makes storing and accessing small or large data as easy as converting it ...

GeoJSON Hexagonal “Statebins” in R

May 14, 2015

There’s been lots of buzz about “statebin” maps of late. A recent tweet by @andrewxhill referencing work by @dannydb pointed to a nice shapefile that ends up being a really great way to handle statebin maps (and I feel like a fool for not considering it for a more generic solution earlier). Here is the

A first look at htmlwidgets

May 14, 2015

by Joseph Rickert. A strong case can be made that base R graphics, supplemented with either the lattice library or ggplot2 for plotting by subgroups, provides everything a statistician might need for both exploratory data analysis and for developing clear, crisp graphics for communicating results. However, it is abundantly clear that web based graphics, driven to a large extent by...

Reinventing the wheel for ordination biplots with ggplot2

May 14, 2015

I’ll be the first to admit that the topic of plotting ordination results using ggplot2 has been visited many times over. As is my typical fashion, I started creating a package for this purpose without completely searching for existing solutions. Specifically, the ggbiplot and factoextra packages already provide almost complete coverage of plotting results from

Computerworld’s list of R packages for data wrangling

May 13, 2015

Computerworld's Sharon Machlis published today a very useful list of R packages that every R user should know. The list covers packages for data import, data wrangling, data visualization and package development, but for beginning R users the biggest challenge is usually just dealing with data. To that end, I thought it was worth listing the package for data...

What is Data Science? Can Topic Modeling Help?

May 13, 2015

Predictive analytics often serves as an introduction to data science, but it may not be the best exemplar given its long history and origins in statistics. David Blei, on the other hand, struggles to define data science through his work on topic modeli...

Bertrand or (The Importance of Defining Problems Properly)

May 13, 2015

We better keep an eye on this one: she is tricky (Michael Banks, talking about Mary Poppins). Professor Bertrand teaches Simulation and one day asks his students: given a circumference, what is the probability that a chord chosen at random is longer than a side of the equilateral triangle inscribed in the circle? Since they must reach the …
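
The question leaves "chosen at random" undefined, which is what the post's title hints at. As a minimal sketch, assuming one particular definition (both endpoints of the chord drawn uniformly on a unit circle), the probability can be estimated by simulation. The example is in Python rather than R, and the function name and its parameters are hypothetical, not taken from the post.

```python
import math
import random


def chord_longer_than_side(n_trials=100_000, seed=0):
    """Estimate P(chord > triangle side) for chords with uniform random endpoints."""
    rng = random.Random(seed)
    # Side of the equilateral triangle inscribed in a unit circle.
    side = math.sqrt(3.0)
    hits = 0
    for _ in range(n_trials):
        # Assumption: each endpoint is an independent uniform angle on the circle.
        a = rng.uniform(0.0, 2.0 * math.pi)
        b = rng.uniform(0.0, 2.0 * math.pi)
        # Length of the chord between two points on a unit circle.
        chord = 2.0 * abs(math.sin((a - b) / 2.0))
        if chord > side:
            hits += 1
    return hits / n_trials


estimate = chord_longer_than_side()
print(round(estimate, 3))
```

Under this endpoint definition the exact answer is 1/3. Other common definitions of a random chord (a random point along a radius, or a random midpoint inside the circle) give 1/2 and 1/4, so the result depends on how the problem is defined.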

Advertising a Few Systematic ETFs (Strictly Of My Own Volition)

May 13, 2015

This post will introduce several ETFs from Alpha Architect and Cambria Funds (run by Meb Faber) that I think readers …

Data Science – Short lesson on cluster analysis

May 13, 2015

Introduction: In clustering, you let data be grouped according to their similarity. A cluster model is a group of segments (clusters) containing cases (such as clients, patients, cars, etc.). Once a cluster model is developed, one question arises: how can I describe my model? Here we present a way to approach this question through the implementation of a Coordinate Plot in R...

Survival analysis: basic terms, the exponential model, censoring, examples in R and JAGS

May 13, 2015

I have put together some basic material on survival analysis. It is available as an .html document with highlighted syntax, a printer-ready .pdf document, and a GitHub repository with all the source files. The main motivation was that I wanted to learn the basics myself; also, it's tricky to find simple examples of survival models fitted in ...

streaming machine learning with RMOA: stream_in > train > predict

We will be showcasing our RMOA package at the next R User conference in Aalborg. For R users who are unfamiliar with streaming modelling and want to be ahead of the Gartner Hype Cycle, or who want to evaluate existing streaming machine learning models, RMOA lets you build, run and evaluate streaming classification models which are built in

OpenMP, OS-X and R

May 12, 2015

This is a quick technical post that is as much about disseminating the information as putting it in a place where I can find it again in the future. I have been trying to use OpenMP in an R package …

Hadley Wickham Master R Developer Workshop in Chicago – Space Limited

May 12, 2015

Join RStudio Chief Data Scientist Hadley Wickham at the University of Illinois at Chicago, on Wednesday May 27th & 28th for this rare opportunity to learn from one of the R community’s most popular and innovative authors and package developers. As of this post, the workshop is two-thirds sold out. If you’re in or near Chicago

Copulas and Financial Time Series

May 12, 2015

I was recently asked to write a survey on copulas for financial time series. The paper is, so far, unfortunately, in French, and is available on https://hal.archives-ouvertes.fr/. There is a description of various models, including some graphs and statistical outputs, obtained from real data. To illustrate, I’ve been using weekly log-returns of (crude) oil prices: Brent, Dubaï and Maya....

Agent Based Modelling with data.table OR how to model urban migration with R

May 12, 2015

Introduction: Recently I found a good introduction to the Schelling Segregation Model and to Agent Based Modelling (ABM) for Python (Binpress article by Adil). The model follows an ABM approach to simulate how urban segregation can be explained. I will concentrate on the R code; if you want to know more about the Schelling Segregation Model (which brought

The 2015 Strata + Hadoop World London

May 12, 2015

By Mark Sellors, Mango UK. On Tuesday 5th of May, O’Reilly Media and Cloudera, a distributor of a Hadoop based big data platform, brought their ‘Strata + Hadoop World’ conference to London. The conference features a mixture of Data Science, …

Using Azure as an R data source, Part 1

May 12, 2015

by Gregory Vandenbrouck, Software Engineer at Microsoft. This post is the first in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL or Microsoft SQL Server) to an R client on Windows or Linux. We’ll start with a relatively simple case of pulling data from SQL Azure to an R client on...

Choosing R or Python for data analysis? An infographic

May 12, 2015

I think you’ll agree with me if I say: It’s HARD to know whether to use Python or R for data analysis. And this is especially true if you’re a newbie data analyst looking for the right language to start with. It turns out that there are many good resources that can help you to figure out the

Ninth Torino R net meeting and free modelling areal data tutorial

May 12, 2015

On 4 June 2015 at 14:30 there will be a free tutorial: analysing and modelling areal data with ‘spdep’. Starting at 16:30 there will be the Ninth Torino R net meeting. Events will take place at Campus Luigi Einaudi, Università degli Studi di Torino.

Presentations of the eighth Torino R net meeting are online

May 12, 2015

Presentations of the eighth Torino R net meeting are now available online in the Downloads section. Thank you to all who attended the meeting on Thursday 17th September in Torino, and special thanks to the presenters.

Global Economic Maps

May 12, 2015

Introduction: In this post I am going to show how to extract data from web pages in table format, transform these data into spatial objects in R and then plot them in maps. Procedure: For this project we need the following two packages: XML and raster. The first package is used to extract data from HTML pages, in particular from the sections marked...

Hello Stan!

May 12, 2015

In my previous post I discussed how Longley-Cook, an actuary at an insurance company in the 1950s, used Bayesian reasoning to estimate the probability of a mid-air collision of two planes. Here I will use the same model to get started with Stan/RStan, a probabilistic programming language for Bayesian inference. Last week my prior was given as...

Git pushing Shiny Apps with Docker & Dokku

May 11, 2015

In this post I will show you how to deploy Shiny Apps easily with a simple git push. But what’s a git push? I’m referring to the git command used with remote repositories. With this command you can deploy apps easily with a PaaS (Platform as a Service) like Heroku. If you have never heard about

quantile functions: mileage may vary

May 11, 2015

When experimenting with various quantile functions in R, I was shocked by how widely the execution times would vary, to the point of blaming a completely different feature of R. Borrowing from Charlie Geyer’s webpage on the topic of probability distributions in R, here is

Centering and Standardizing: Don’t Confuse Your Rows with Your Columns

May 11, 2015

R uses the generic scale( ) function to center and standardize variables in the columns of data matrices. The argument center=TRUE subtracts the column mean from each score in that column, and the argument scale=TRUE divides by the column standard devi...
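
The arithmetic described above can be sketched in a few lines. This is a plain-Python illustration of column-wise centering and standardizing, not R's scale() itself; the function name standardize_columns and the sample matrix are hypothetical. It assumes the sample standard deviation (n - 1 denominator).

```python
import statistics


def standardize_columns(rows):
    """Center each column on its mean, then divide by the column's sample SD."""
    columns = list(zip(*rows))  # transpose: rows -> columns
    scaled_columns = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.stdev(col)  # sample SD, n - 1 denominator
        scaled_columns.append([(x - mean) / sd for x in col])
    # Transpose back so the result has the same row/column layout as the input.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical 3 x 2 data matrix: rows are cases, columns are variables.
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 60.0],
]
scaled = standardize_columns(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

After this transformation each column has mean 0 and standard deviation 1. To operate on rows instead, the same function could be applied to the transposed data.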

Scaling R clusters? AWS Spot Pricing is your new best friend

May 11, 2015

An elastic infrastructure for distributed R. Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response

What data science software tools do you use?

May 11, 2015

KDnuggets is once again running its annual poll of data science software tools, now in its 16th year. If you'd like to participate, visit the KDnuggets poll page and answer the question, "What Predictive Analytics, Data Mining, Data Science software/tools you used in the past 12 months?". The poll allows you to select up to 20 tools from the...

Stata’s Academic Growth Nearly as Fast as R’s

May 11, 2015

by Bob Muenchen. Analytics tools take significant effort to master, so once learned, people tend to stick with them for much of their careers. This makes the tools used in academia of particular interest in the study of future trends …

devtools 1.8.0

May 11, 2015

Devtools 1.8 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. You can learn more about developing packages at http://r-pkgs.had.co.nz/. Get the latest version of devtools with install.packages("devtools"). There are three main improvements: more helpers to get you
