high-performance computing

Using R for Map-Reduce applications in Hadoop

May 4, 2011 | David Smith

Data scientist Antonio Piccolboni recently published this comparison of the various languages and interfaces available for programming Big Data analysis tasks in the map-reduce framework. The interfaces he reviewed included: Java Hadoop (mature and efficient, but verbose and difficult to program); Cascading (brings an SQL-like flavor to Java programming with ... [Read more...]
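For readers new to the map-reduce pattern mentioned above, here is a minimal single-machine sketch in base R. It is only an illustration of the map, shuffle and reduce steps on a toy word count, not code from the reviewed comparison and not how any of the Hadoop interfaces are called; the input data and the helper name map_fn are made up for the example.

```r
# Toy input: each element stands in for one record of a much larger data set.
records <- c("big data with r", "r and hadoop", "big r")

# Map step (hypothetical helper): emit one (word, 1) pair per word in a record.
map_fn <- function(record) {
  words <- strsplit(record, " ", fixed = TRUE)[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Apply the mapper to every record and flatten the result into one list of pairs.
pairs <- unlist(lapply(records, map_fn), recursive = FALSE)

# Shuffle step: group the emitted values by key.
keys <- vapply(pairs, function(p) p$key, character(1))
values <- vapply(pairs, function(p) p$value, integer(1))
grouped <- split(values, keys)

# Reduce step: combine the values for each key into a single count.
counts <- vapply(grouped, function(v) Reduce(`+`, v), integer(1))

print(counts)
```

In a real Hadoop job the map and reduce functions run on many machines and the framework performs the shuffle; here everything runs in one R session so the three steps are easy to see.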

Parallel processing in R for Windows

March 4, 2011 | David Smith

The doSMP package (and its companion package, revoIPC), previously bundled only with Revolution R, is now available on CRAN for use with open-source R under the GPL2 license. In short, doSMP makes it easy to do SMP parallel processing on a Windows box with multiple processors. (It works on Mac ... [Read more...]
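As a rough illustration of the kind of multi-core processing described above, the sketch below uses the parallel package that ships with current versions of R. This is a swapped-in technique, not the doSMP API from the post; the worker count of 2 and the toy squaring task are assumptions chosen only for the example.

```r
library(parallel)

# Start a small socket cluster (assumed size: 2 workers); this also works on Windows.
cl <- makeCluster(2)

# Run a toy task on the workers: square each input value.
res <- parLapply(cl, 1:4, function(i) i^2)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(res))
```

For a task this small the cost of starting the workers outweighs any speedup; the pattern pays off when each call does substantial work.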

Packages for By-Group Processing in R

February 24, 2011 | David Smith

Analyst and BI expert Steve Miller takes a look at the facilities in R for doing "by-group" processing of data. The task consisted of: ... read several text files, merge the results, reshape the intermediate data, calculate some new variables, take care of missing values, attend to meta data, execute a ... [Read more...]

Using R and Hadoop to analyze VOIP data

November 8, 2010 | David Smith

Last month, the newest member of Revolution's engineering team, Saptarshi Guha, gave a presentation at Hadoop World 2010 on using R and Hadoop to analyze 1.3 billion voice-over-IP packets to identify calls and measure call quality. Saptarshi, of course, is the author of RHIPE, which lets R programmers write map-reduce algorithms in ... [Read more...]

Making sense of MapReduce

September 24, 2010 | Joseph Rickert

From guest blogger Joseph Rickert. Last night I went to hear Ken Krugler of Bixolabs talk about Hadoop at the monthly meeting of the Software Developers Forum. Maybe because Ken is an unusually lucid speaker, or maybe because I just reached some sort of cumulative tipping point through the prep ... [Read more...]

Guidelines for efficient R programming

September 22, 2010 | David Smith

R is designed to make it easy to clearly express statistical ideas in code, but when it comes to writing code that runs as fast as possible, there are a few tips, tricks and caveats to be aware of. As part of the Bioconductor conference this past summer, Martin Morgan ... [Read more...]

Saptarshi Guha on Hadoop, R

September 20, 2010 | David Smith

Saptarshi Guha (author of the Rhipe package) joins the likes of eBay, Yahoo, Twitter and Facebook as one of just 37 presenters at the Hadoop World conference. (Revolution Analytics is proud to sponsor Saptarshi's presence at this event, which takes place in New York on October 12.) He'll be talking about ... [Read more...]

plyr and reshape: better, faster, more productive

September 10, 2010 | David Smith

Hadley Wickham has just released updates to his data-manipulation packages for R, plyr and reshape (now called reshape2), that are much faster and more memory-efficient than the previous incarnations. The reshape2 package lets you flexibly restructure and aggregate data using just three functions (melt, acast and dcast), whereas the plyr ... [Read more...]

Slides and replay for “Big Data with Revolution R”

August 25, 2010 | David Smith

Thanks to everyone who attended our webinar this morning, Big Data Analysis for R Using Revolution R Enterprise, and in particular thanks for all the thoughtful questions during the Q&A session. If you missed the live broadcast, a replay is now available (requires the ability to view WMV files), ... [Read more...]

Webinar: Big Data Analysis with Revolution R

August 24, 2010 | David Smith

Don't forget that I'll be hosting a webinar tomorrow talking about the new RevoScaleR package included with the forthcoming Revolution R Enterprise 4.0. The webinar will also feature a live demonstration from Joseph Rickert. The full details are below, and you can register for the webinar here. Big Data Analysis for ... [Read more...]

Taking R to the Limit: Parallelism and Big Data

August 23, 2010 | David Smith

In a two-part series at the Los Angeles R User Group[*], Ryan Rosario took a look at the many ways you can take the R language to the limits of high-performance computing. In Part I (see video at this link; slides and code also available), Ryan focuses on the various ... [Read more...]

Wanted: Big-data beta testers

July 15, 2010 | David Smith

We're nearing completion of the package of statistical tools for very large data sets that I gave an early preview of at R/Finance 2010. It will be released for Revolution R Enterprise later this year, but we're looking for some R users with big data sets to put the 1.0 version ... [Read more...]

How to peg 7 cores with doSMP

June 28, 2010 | David Smith

Statistics PhD student Nathan VanHoudnos has an 8-core laptop and, by his own admission, takes "an almost unhealthy pleasure in pushing [his] computer to its limits". It seems he's found an outlet for this passion in the new doSMP library included with Revolution R, which allows him to use ... [Read more...]

Making Data Work online conference

June 3, 2010 | David Smith

O'Reilly is hosting a conference on June 9 on the topic of the analysis of large data sets. The title of the conference is Making Data Work: Ever since Hal Varian proclaimed that data analysis is the sexy career for the coming decade, people have been talking about data. And big ... [Read more...]

Prediction in the cloud: turbulent

May 19, 2010 | David Smith

While Microsoft rolled out its Technical Computing Initiative -- promising new tools for distributed parallel computing on large data sets in the cloud -- with much fanfare earlier this week, Google made a rather more understated response. In a post to the developer-focused Google Code Blog, they quietly announced two ... [Read more...]

Parallel Computing with R for Life Sciences

May 18, 2010 | David Smith

I hadn't heard of the CloudAsia 2010 conference before, but from the programme, the workshop Master Class on HPC Application For Life Sciences looked interesting. One workshop session in particular caught my eye: Practical Parallel Computing in R by Xie Chao and Tan Tin Wee (from the National ... [Read more...]
