Monthly Archives: May 2012

knitr-Example: Use World Bank Data to Generate Report for Threatened Bird Species

May 2, 2012

I'll use the script below, which retrieves data on threatened bird species from the World Bank via its API and does some processing, plotting and analysis. There is a package (WDI) that allows you to access the data easily.
# world bank indicators for sp...

EU rules that computer languages cannot be copyrighted

May 2, 2012

The European Court of Justice has published its decision in SAS v WPL; the title of the press release says it all: “The functionality of a computer program and the programming language cannot be protected by copyright”. To summarise the background, World Programming Ltd developed a system that was capable of emulating the input/output behavior...

Big Data, R and SAP HANA: Analyze 200 Million Data Points and Later Visualize in HTML5 Using D3 – Part III

May 2, 2012

Mash-up Airlines Performance Data with Historical Weather Data to Pinpoint Weather-Related Delays. For this exercise, I combined the following four separate blogs that I did on BigData, R and SAP HANA. Historical airlines and weather dat...

Function to Generate a Random Data Set

May 2, 2012

Often I find myself needing data sets to try out functions and code on, or for teaching purposes. I have a few stand-bys, such as the mtcars and CO2 data sets in R's base packages, but sometimes I …

Finding Earth II

May 2, 2012

By 2030, we will have found approximately 10,000 exoplanets. "If it is just us... seems like an awful waste of space." That line is from the movie Contact (1997), based on the book Contact by Carl Sagan. By the year 2030, it's possible that over ten th...

Computational Journalism Server – The Way Forward

May 2, 2012

As I’ve noted previously, the Computational Journalism Server “wants to be a Platform-as-a-Service (PaaS) when it grows up.” In plotting the way forward to that goal, I’ve looked at three options: Remain on openSUSE / SUSE Studio and ...

Speeding up R with Intel’s Math Kernel Library (MKL)

May 2, 2012

I did some comparisons of the generic BLAS with Intel's MKL (both sequential and parallel) on a Dell PowerEdge 610 server with dual hyperthreading 6-core 3.06GHz Xeon X5675 processors. Here are the results from an R benchmarking script (Normal R indicates the generic BLAS, sMKL is the sequential (single-core) Intel MKL, and pMKL is the parallel Intel MKL using...

2nd round of call for chapter proposals for the book Data Mining Applications with R: due by 31 May

May 2, 2012

2nd CALL FOR CHAPTERS: proposals due by 31 May 2012
Data Mining Applications with R
A book to be published by Elsevier
http://www.RDataMining.com/books/book2
Introduction
R is one of the most widely used data mining tools in scientific and business …

Measuring time series characteristics

May 2, 2012

A few years ago, I was working on a project where we measured various characteristics of a time series and used the information to determine what forecasting method to apply or how to cluster the time series into meaningful groups. The two main papers to come out of that project were: Wang, Smith and Hyndman (2006) Characteristic-based clustering for...
