854 search results for "parallel"

Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

December 10, 2012

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre L'Ecuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and I acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting

Please stop using Excel-like formats to exchange data

December 7, 2012

I know that “officially” data scientists always work in “big data” environments with data in a remote database, streaming store, or key-value system. But in day-to-day work, Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my ...

R and the SGeMS blockdata format

December 7, 2012

The popular geostatistical software SGeMS has some options for working with non-point support (block) data through the BGeost set of algorithms by Yongshe Liu (see his PhD thesis), and published in Liu and Journel (2009). A specific but ...

R in the Cloud

December 6, 2012

I've been having some great fun parallelizing R code on Amazon's cloud. Now that things are chugging away nicely, it's time to document my foibles so I can remember not to fall into the same pits of despair again. The goal was to perform lots of trials of a randomized statistical simulation. The jobs were independent and fairly chunky, taking...

ggplot2 0.9.3 and plyr 1.8 have been released!

December 6, 2012

We’re pleased to announce new versions of ggplot2 (0.9.3) and plyr (1.8). To get up and running with the new versions, start a clean R session without ggplot2 or plyr loaded, and run install.packages(c("ggplot2", "gtable", "scales", "plyr")). Read on to find out what’s new. ggplot2 0.9.3: Most of the changes in version 0.9.3 are bug fixes. Perhaps

Loading Big files in R

December 5, 2012

As far as I remember, today was the first day in my life that I succeeded in loading a text file bigger than 1.5 GB into R (~5 million lines and 18 columns). My computer is no small machine (I’m using a MacBook Pro with roughly 8 GB of RAM), but the reason why I couldn’t

Big Data Trees with Hadoop HDFS

December 4, 2012

Last month's release of Revolution R Enterprise 6.1 added the capability to fit decision and regression trees on large data sets (using a new parallel external memory algorithm included in the RevoScaleR package). It also introduced the possibility of applying this and the other big-data statistical methods of RevoScaleR to data files distributed in Hadoop's HDFS file system*,...

pbdR Updates – Distributed lm.fit() and More

December 3, 2012

Over the weekend, we updated all of the pbdR packages currently available on CRAN. The updates include tons of internal housecleaning as well as many new features. Notably, pbdBASE_0.1-1 and pbdDMAT_0.1-1 were released, which contain lm.fit() methods. This function in particular has been available on my GitHub for over a month, but didn't make its way to the...

forecast package v4.0

December 2, 2012

A few days ago I released version 4.0 of the forecast package for R. There were quite a few changes and new features, so I thought it deserved a new version number. I keep a list of changes in the Changelog for the package, but I doubt that many people look at it. So for the record, here are...

Trading with Support Vector Machines (SVM)

November 30, 2012

Finally, all the stars have aligned and I can confidently devote some time to back-testing new trading systems, and Support Vector Machines (SVM) are the new “toy” which is going to keep me busy for a while. SVMs are a well-known tool from the area of supervised Machine Learning, and they are used both
