Twitter’s R package for detecting breakouts in time series

November 24, 2014

With so many more devices and instruments connected to the "Internet of Things" these days, there's a whole lot more time series data available to analyze. But time series are typically quite noisy: how do you distinguish a short-term tick up or down from a true change in the underlying signal? To solve this problem, Twitter created the BreakoutDetection...
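
To make the idea of a breakout concrete, here is a minimal base-R sketch that locates a single shift in level. It picks the split point that minimises the within-segment sum of squares. This is a much simpler, swapped-in technique, not the algorithm BreakoutDetection implements, and the helper `find_breakout()`, its `min_size` parameter and the simulated series are all hypothetical.

```r
# Hypothetical helper (not part of BreakoutDetection): find the single split
# point that best separates a series into two segments with different means,
# using a least-squares criterion.
find_breakout <- function(x, min_size = 10) {
  n <- length(x)
  stopifnot(n >= 2 * min_size)
  # Candidate splits leave at least `min_size` points on each side
  candidates <- seq(min_size, n - min_size)
  # Total within-segment sum of squares for each candidate split
  cost <- vapply(candidates, function(k) {
    left <- x[1:k]
    right <- x[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  candidates[which.min(cost)]  # index of the last point before the breakout
}

set.seed(42)
# Simulated noisy series whose level jumps from 0 to 3 after point 120
x <- c(rnorm(120, mean = 0), rnorm(80, mean = 3))
find_breakout(x)
```

A least-squares criterion like this one is sensitive to outliers, which is one reason robust, median-based approaches are attractive for noisy production data.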

rvest: easy web scraping with R

November 24, 2014

rvest is a new package that makes it easy to scrape (or harvest) data from HTML web pages, inspired by libraries like Beautiful Soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces. Install it with install.packages("rvest"). rvest in action: to see rvest…

R and Data Mining Workshop at AusDM 2014, Brisbane, 27 November

November 24, 2014

R and Data Mining Workshop at AusDM 2014 (http://ausdm14.ausdm.org/workshop): There will be a half-day workshop on R and Data Mining at the AusDM 2014 conference in Brisbane on Thursday afternoon, 27 November. The workshop will be composed of several sessions on …

GTrendsR package to Explore Google trending for Field Dependent Terms

November 24, 2014

My friend, Steve Simpson, introduced me to Philippe Massicotte and Dirk Eddelbuettel’s GTrendsR GitHub package this week. It’s a pretty nifty wrapper to the Google Trends API that enables one to search phrase trends over time. The trend indices that …

an ABC experiment

November 23, 2014

In a Cross Validated forum exchange, I used the code below to illustrate the working of an ABC algorithm: … Hence I used the median and the mad as my summary statistics. And the outcome is rather surprising, for two reasons: the first one is that the posterior on the mean μ is much wider than…
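
The post's own code is not part of this excerpt. As a hedged stand-in, here is a minimal ABC rejection sketch in base R that uses the median and the mad as summary statistics, as described above. The normal model, the priors, the number of simulations and the "keep the closest 1%" acceptance rule are all assumptions made for illustration.

```r
# Minimal ABC rejection sampler (a sketch, not the post's code).
# Assumptions (hypothetical): data ~ Normal(mu, sigma), priors
# mu ~ Normal(0, 10^2) and sigma ~ Exponential(1); summaries are the
# median and the mad; the `keep` closest simulations are accepted.
abc_reject <- function(obs, n_sim = 2e4, keep = 200) {
  n <- length(obs)
  s_obs <- c(median(obs), mad(obs))
  # Draw parameters from the priors
  mu <- rnorm(n_sim, mean = 0, sd = 10)
  sigma <- rexp(n_sim, rate = 1)
  # Simulate one pseudo-dataset per draw and compute its summaries
  s_sim <- t(vapply(seq_len(n_sim), function(i) {
    z <- rnorm(n, mean = mu[i], sd = sigma[i])
    c(median(z), mad(z))
  }, numeric(2)))
  # Euclidean distance between simulated and observed summaries
  dist <- sqrt(rowSums(sweep(s_sim, 2, s_obs)^2))
  accepted <- order(dist)[seq_len(keep)]
  data.frame(mu = mu[accepted], sigma = sigma[accepted])
}

set.seed(1)
obs <- rnorm(50, mean = 2, sd = 1)  # hypothetical observed data
post <- abc_reject(obs)
summary(post$mu)
```

Lowering `keep` (a smaller tolerance) or raising `n_sim` tightens the approximation at the cost of more simulation.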

Interpreting regression coefficient in R

November 23, 2014

Linear models are a very simple statistical technique and are often (if not always) a useful starting point for more complex analyses. It is, however, not so straightforward to understand what the regression coefficients mean, even in the simplest case when there are no interactions in the model. If we are not only fishing for…
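
Here is a minimal sketch of the no-interaction case, using simulated data whose true coefficients are known (the data-generating model is hypothetical). The fitted coefficient on `x1` estimates the expected change in `y` for a one-unit increase in `x1` with `x2` held fixed, which the `predict()` comparison at the end checks directly.

```r
set.seed(123)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
# Hypothetical data-generating process: y = 1 + 2*x1 - 0.5*x2 + noise
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
coef(fit)

# Two predictions that differ only by one unit of x1 (x2 fixed at 0.3)
newdata <- data.frame(x1 = c(0, 1), x2 = c(0.3, 0.3))
pred <- predict(fit, newdata)
unname(diff(pred))  # the same value as coef(fit)[["x1"]]
```

Without interactions the difference does not depend on the value chosen for `x2`; adding an `x1:x2` term would change that.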

Slides of keynote speeches, tutorials and panelist presentations at IEEE Big Data 2014

November 23, 2014

Slides of keynote speeches, tutorials and panelist presentations at the 2014 IEEE International Conference on Big Data can be found at the conference website at links below. (1) Keynote speech http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm - Never-Ending Language Learning, Tom Mitchell – E. Fredkin …

Calculating population growth rate λ along element changes

November 23, 2014

The previous article introduced the sensitivity and elasticity of a seasonal matrix model for an imaginary annual plant. Both sensitivity and elasticity are partial derivatives. This means the values can only predict …
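
Here is a minimal sketch of the idea, assuming a hypothetical two-stage projection matrix rather than the post's seasonal model. λ is the dominant eigenvalue. Sensitivities come from the right and left eigenvectors, s_ij = v_i w_j / <v, w>, and elasticities rescale them by a_ij / λ. Recomputing λ exactly as one element changes shows where the first-order (partial-derivative) prediction drifts away.

```r
# Hypothetical 2-stage projection matrix (not the post's seasonal model)
A <- matrix(c(0.0, 1.5,
              0.5, 0.8), nrow = 2, byrow = TRUE)

eA <- eigen(A)                      # eigenvalues sorted by decreasing modulus
lambda <- Re(eA$values[1])          # population growth rate
w <- Re(eA$vectors[, 1])            # right eigenvector: stable stage structure
v <- Re(eigen(t(A))$vectors[, 1])   # left eigenvector: reproductive values
S <- outer(v, w) / sum(v * w)       # sensitivities d(lambda)/d(a_ij)
E <- A * S / lambda                 # elasticities; these sum to 1

# Recompute lambda exactly while a_12 changes by delta, and compare it with
# the first-order prediction lambda + S[1, 2] * delta
delta <- seq(-1, 1, by = 0.5)
exact <- sapply(delta, function(d) {
  B <- A
  B[1, 2] <- B[1, 2] + d
  Re(eigen(B)$values[1])
})
linear <- lambda + S[1, 2] * delta
cbind(delta, exact, linear)
```

For this matrix λ is a concave function of a_12, so the linear prediction overshoots the exact value on both sides of the current value, and the gap grows with |delta|.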

When should I change to snow tires in the Netherlands

November 23, 2014

The Royal Netherlands Meteorological Institute has weather information by day for a number of Dutch stations. In this post I want to use those data for a practical problem: when should I switch to winter tires? (or is that snow tires? In any case nails...

proper use of GOSemSim

November 22, 2014

One day, I was looking for R packages that can analyze PPI (protein-protein interaction) data, and after searching, I found the ppiPre package on CRAN. The functionality of this package is not impressive, and I already knew of some related work, including http://intscore.molgen.mpg.de/. The authors of this web server contacted me about the usage of GOSemSim when they were developing it.

Nov 20 Data Science Talklet: Incorporating Text Data into Your Feature Set

November 22, 2014

As promised, here are the slides and notes from my DSDC talklet on strategies for incorporating text data into the feature set of a predictive model. Materials: Slides, Notes, GitHub. Thanks to Harlan for asking, and to Dan and David for...
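
One common strategy, sketched here in base R as an illustration (not necessarily one the talk covers), is to turn each document into word counts and bind those columns onto the other features. The example documents and the `doc_length` feature are hypothetical.

```r
# Hypothetical example documents (not from the talk)
docs <- c("Great product, fast shipping",
          "Slow shipping and poor product",
          "Great price")

# Lowercase, split on anything that is not a letter, and drop empty tokens
tokens <- lapply(tolower(docs), function(d) {
  tk <- strsplit(d, "[^a-z]+")[[1]]
  tk[nzchar(tk)]
})
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one count column per word
dtm <- t(vapply(tokens, function(tk) {
  as.numeric(table(factor(tk, levels = vocab)))
}, numeric(length(vocab))))
colnames(dtm) <- vocab

# The count columns can then sit alongside ordinary numeric features
features <- cbind(doc_length = lengths(tokens), dtm)
features
```

With a realistic corpus the vocabulary grows quickly, so in practice one would usually drop rare words or use a sparse matrix instead of a dense one.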

Simulating scientists doing experiments

November 22, 2014

Following a discussion on Gelman's blog, I was playing around with simulating scientists looking for significant effects. Suppose each of 1000 scientists runs 200 experiments in their lifetime, and suppose that 20% of the experiments are such that the n...
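
The excerpt is cut off before saying what distinguishes the 20%. This vectorised sketch therefore assumes that those experiments study a real effect. It also assumes that each experiment yields a z-statistic that is N(0, 1) without an effect and N(2, 1) with one, and that a result is called significant when p < 0.05. All of these are hypothetical choices.

```r
set.seed(2014)
n_scientists <- 1000
n_experiments <- 200
n_total <- n_scientists * n_experiments

# Assumption (hypothetical): 20% of experiments study a real effect
real <- runif(n_total) < 0.20
# Assumption (hypothetical): z ~ N(2, 1) with a real effect, N(0, 1) without
z <- rnorm(n_total, mean = ifelse(real, 2, 0))
significant <- 2 * pnorm(-abs(z)) < 0.05   # two-sided p < 0.05

# Share of "significant" findings that are false positives
false_discovery <- mean(!real[significant])
# Average number of significant results per scientist
sig_per_scientist <- sum(significant) / n_scientists
c(false_discovery = false_discovery, sig_per_scientist = sig_per_scientist)
```

Under these assumptions the power is about 0.52. Roughly 28% of the significant findings should therefore be false positives, with around 29 significant results per scientist. Changing the assumed effect size or the share of real effects moves both numbers.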

Flowers/Fractals

November 21, 2014

Last week, I attended a "Flower Fest" where I had the opportunity to admire some of the most beautiful, award-winning flowers, orchids, and decorative plants. Surprisingly, I had never thought of flowers as fractals the way I did this time. Fractals attract a lot of interest, especially from mathematicians who actually spend some time …

New package: curl. High performance http(s) streaming in R.

November 21, 2014

A while ago I blogged about new streaming features in jsonlite:

    library(jsonlite)
    diamonds2 <- stream_in(url("http://jeroenooms.github.io/data/diamonds.json"))

The same blog post also mentioned that R currently does not support https connections. The RCurl package does support https, but does not have a connection interface. This bothered me, so I decided to write one....

Information Density and Custom Chart Designs

November 21, 2014

I’ve been doodling today with some charts for the Wrangling F1 Data With R living book, trying to see how much information I can pack into a single chart. The initial impetus came simply from thinking about a count of laps led in a particular race by each driver; this morphed…

Ford uses R for data-driven decision making

November 21, 2014

Mike Cavaretta is Ford Motor Company’s Chief Data Scientist, and was tasked by the incoming CEO Alan Mulally to help change the culture so that "important decisions within the company had to be based on data". In a feature article at Dataconomy, he reveals that R is a big part of this revolution at Ford: On the statistical side,...

Free Stanford online course on Statistical Learning (with R) starting on 19 Jan 2015

November 21, 2014

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and …

Visualization of probabilistic forecasts

November 21, 2014

This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts”, which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are…

Synchronization for R with the flock Package

November 20, 2014

Have you tried synchronizing R processes? I did, and it wasn’t straightforward. In fact, I ended up creating a new package, flock. One of the improvements I made not too long ago to my R back-testing infrastructure was to start using a database to store the results. This way I can compute all interesting…

Tips & Tricks 5: Extracting Classifiers Using Substring

November 20, 2014

Today's exercise is another easy one, and is inspired by a question from Ariel Marcy of the University of Queensland. Exercise 5: How to extract classifiers from names of specimens. Well-organised morphometricians will have a consistent naming system for th...
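
Here is a minimal base-R sketch of the substring approach, assuming a hypothetical fixed-width naming scheme: a 3-letter site code, a 1-letter sex code and a 3-digit specimen number (e.g. "QLDF001"). The names and positions are illustrative only. With a different scheme, only the start and stop positions change.

```r
# Hypothetical fixed-width specimen names: site (1-3), sex (4), number (5-7)
specimens <- c("QLDF001", "QLDM002", "NSWF003", "NSWM004", "VICF005")

site <- substr(specimens, 1, 3)   # characters 1 to 3
sex  <- substr(specimens, 4, 4)   # character 4
id   <- substr(specimens, 5, 7)   # characters 5 to 7

# Collect the extracted classifiers as factors, ready for grouping
classifiers <- data.frame(specimen = specimens,
                          site = factor(site),
                          sex = factor(sex),
                          id = id,
                          stringsAsFactors = FALSE)
table(classifiers$site, classifiers$sex)
```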

An xpd-tion into R plot margins

November 20, 2014

This is a guest post by Prasad Patil that answers the question: how to put a shape in the margin of an R plot? The help page for R's par() function is a somewhat impenetrable list of abbreviations that allow you to manipulate anything and everything in the plotting device. You may have used this function in the past to create an...
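
Here is a minimal sketch of one way to do this with base graphics, using the parameter the title puns on. By default, drawing is clipped to the plot region. Setting `par(xpd = TRUE)` clips to the figure region instead, so a shape placed beyond `par("usr")` shows up in the margin. The null PDF device, the rectangle's position and its colour are arbitrary choices for illustration.

```r
pdf(NULL)  # null device: nothing is written, so this runs without a screen
plot(1:10, 1:10, main = "A shape in the right margin")

usr <- par("usr")  # plot-region limits in user coordinates: x1, x2, y1, y2
# Right edge of the figure region, converted to user coordinates
fig_right <- grconvertX(1, from = "nfc", to = "user")

# Default xpd = FALSE clips drawing to the plot region; xpd = TRUE clips to
# the figure region, so the rectangle below is drawn in the right margin.
old <- par(xpd = TRUE)
margin_width <- fig_right - usr[2]
rect(xleft = usr[2] + 0.25 * margin_width, ybottom = 4,
     xright = usr[2] + 0.75 * margin_width, ytop = 6,
     col = "tomato", border = NA)
par(old)              # restore the previous clipping setting
xpd_restored <- par("xpd")
invisible(dev.off())
```

Setting `xpd = NA` instead clips to the whole device, which also allows drawing into the outer margins.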

Geomorph update 2.1.2 Now Available!

November 20, 2014

Geomorph users, we have uploaded version 2.1.2 to CRAN. The Windows and Mac binaries have been compiled and the tarball is available. Version 2.1.2 comes with some small changes and new features: New functions advanced.procD.lm() for statis...

RNA-seq Data Analysis Course Materials

November 20, 2014

Last week I ran a one-day workshop on RNA-seq data analysis in the UVA Health Sciences Library. I set up an AWS public EC2 image with all the necessary software installed. Participants logged into AWS, launched the image, and we kicked off the morning ...

LA R meetup – January 21: RStudio’s Shiny with Joe Cheng

November 20, 2014

At this meetup we’re fortunate to have RStudio’s lead developer Joe Cheng flying in from...

Partying R Style with Sqor Sports, R on Azure, and data.table

November 20, 2014

by Joseph Rickert We usually have a pretty good time at the monthly Bay Area useR Group (BARUG) meetings, but this month's meeting was a bit more of a party than usual. The very well connected PR team at Sqor Sports, our host company for the evening, secured San Francisco's très trendy 111 Minna Gallery for the venue. There...

Stacking files made easy

November 20, 2014

Every campaign cycle I usually do similar things: go to a repository, download a bunch of data, merge it, and store it in an existing RData file for later analysis. I already wrote about this topic some time ago, but this time I think my script became simpler. Set the directory: Let's assume you're not in …
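
Here is a minimal base-R sketch of the stacking step. It lists the files in a directory, reads each one, records which file each row came from, binds the pieces by row and saves the result to an RData file. The temporary directory, the `poll_*.csv` files and their columns are hypothetical stand-ins for the downloaded campaign data, not the author's script.

```r
# Create a temporary directory with a few hypothetical CSV files
dir <- file.path(tempdir(), "campaign_files")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(state = c("A", "B"), votes = i * c(10, 20)),
            file.path(dir, sprintf("poll_%d.csv", i)), row.names = FALSE)
}

# List the files, read each one, record its source, and stack them by row
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
stacked <- do.call(rbind, lapply(files, function(f) {
  df <- read.csv(f, stringsAsFactors = FALSE)
  df$source <- basename(f)
  df
}))

# Save the combined data for later analysis
rdata_file <- file.path(dir, "stacked.RData")
save(stacked, file = rdata_file)
```

Calling `rbind` once through `do.call` avoids growing a data frame inside a loop. This assumes every file has the same columns.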

R Debugging GUI Improvements for Bio7

November 20, 2014

20.11.2014: In my last post I presented the first version of a debugging GUI for R in Bio7, which is just a visual wrapper for the default R debugging functions. In the meantime I added some new methods and also improved the visualization. For the upcoming Bio7 release I have to make some cleanups, but…

The three types of Reddit posts, and how they make it to the front page

November 19, 2014

Todd Schneider's blog post on solving the traveling salesman problem with R hit the front page of reddit.com. This is a big deal: front-page placement on the popular social news site can drive a ton of traffic (in Todd's case, 1.3 million pageviews). But what factors determine which of reddit's contributed links make it to the front page? (There...

The ensurer package (validation inside pipes)

November 19, 2014

Guest post by Stefan Holst Milton Bache on the ensurer package. If you use R in a production environment, you have most likely experienced that some circumstances change in ways that will make your R scripts run into trouble. Many things can go wrong: package updates, external data sources, daylight saving time, etc. There is a general…
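
ensurer's own functions are not shown in this excerpt. Rather than guess at its interface, here is a hypothetical base-R helper, `check_that()`, that illustrates the general idea of validating a value inside a pipe. It checks each condition, stops with an informative error if one fails, and otherwise passes the value through unchanged. It uses R's native pipe `|>` (R 4.1 or later), and the data are made up.

```r
# Hypothetical helper (not ensurer's API): apply each condition function to
# the value flowing through the pipe; stop if one fails, else return x as-is.
check_that <- function(x, ...) {
  conditions <- list(...)
  for (i in seq_along(conditions)) {
    if (!isTRUE(conditions[[i]](x))) {
      stop(sprintf("check %d failed: %s", i,
                   paste(deparse(conditions[[i]]), collapse = " ")),
           call. = FALSE)
    }
  }
  x
}

# Validation sits in the middle of the pipeline, before the data are used
result <- data.frame(price = c(10, 12, 9)) |>
  check_that(function(d) nrow(d) > 0,
             function(d) all(d$price >= 0)) |>
  transform(price_with_tax = price * 1.2)
result
```

Because the helper returns its input unchanged when all checks pass, it can be dropped into an existing pipeline without altering what flows downstream.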
