R code for Chapter 1 of Non-Life Insurance Pricing with GLM

March 1, 2012

Insurance pricing is backwards and primitive, harking back to an era before computers. One standard (and good) textbook on the topic is Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson. We have been doing some work in this area recently. Needing a robust internal training course and documented methodology, we have...

Parallelizing Voting simulation

March 1, 2012

Last week I compared synchronous and asynchronous implementations of the NetLogo Voting model. An interesting afterthought is that the synchronous implementation can easily be made much faster using vectorization. The two versions of the Voting synchr...

I see high frequency data

March 1, 2012

In the previous post I shared an example of how to get high-frequency data from the IB broker (well, it is the retail version of HFD: it has only best bid/ask and the trades). Now, once you have saved some data, what should you do next? The next logical step would be a data sanity check and visualization.

Bad Science at Strata 2012

March 1, 2012

Ben Goldacre, the physician and biostatistician behind the always-excellent Bad Science column in the Guardian, gave a barnburner of a talk at Strata 2012 yesterday, "The Information Architecture of Medicine is Broken". For anyone not aware of the problems caused by publication bias in clinical trials (for example, ineffective drugs with a wide variety of side-effects coming to market),...

First Milano R net meeting

March 1, 2012

May 8, 2012, 18:00 - 21:00, Fiori Oscuri Bistrot & Bar, Via Fiori Oscuri, 3 - Milano (Zona Brera)

Example 9.22: shading plots and inequalities

March 1, 2012

A colleague teaching college algebra wrote in the R-sig-teaching list asking for assistance in plotting the solutions to the inequality x^2 - 3 > 0. This type of display is handy in providing a graphical solution to accompany an analytic one. R: The plot...
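
As a rough illustration of the kind of display described here, the following base-R sketch shades the regions where x^2 - 3 is positive. It is not the code from the original post; the plotting range, the grid resolution, and the helper name `shade` are assumptions chosen for the example.

```r
# Sketch: shade the x-ranges where f(x) = x^2 - 3 is positive.
f <- function(x) x^2 - 3

x <- seq(-4, 4, length.out = 801)    # assumed plotting range
y <- f(x)
root <- sqrt(3)                      # f(x) > 0 exactly when |x| > sqrt(3)

plot(x, y, type = "l", lwd = 2, xlab = "x", ylab = "f(x) = x^2 - 3")
abline(h = 0, lty = 2)

# Hypothetical helper: fill the area between the curve and the x-axis
# for the grid points selected by the logical vector idx.
shade <- function(idx) {
  polygon(c(x[idx], rev(x[idx])), c(y[idx], rep(0, sum(idx))),
          col = "grey80", border = NA)
}
shade(x < -root)
shade(x > root)
lines(x, y, lwd = 2)                 # redraw the curve on top of the shading

solution <- x[y > 0]                 # grid points that satisfy the inequality
```

The shaded areas sit to the left of -sqrt(3) and to the right of sqrt(3), which matches the analytic solution of the inequality.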

First Milano R net meeting details

March 1, 2012

First Milano R net meeting. When: May 8, 2012, from 18:00 to 21:00. Where: Fiori Oscuri Bar & Bistrot, Via Fiori Oscuri 3, Milano.

R Tutorial Series: Centering Variables and Generating Z-Scores with the Scale() Function

March 1, 2012

Centering variables and creating z-scores are two common data analysis activities. While they are relatively simple to calculate by hand, R makes these operations extremely easy thanks to the scale() function. Tutorial Files: Before we begin, you may wan...
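
To make the two operations concrete, here is a minimal sketch of centering and z-scoring with scale(). The sample vector `x` is made up for illustration and is not the tutorial's data set.

```r
# Made-up sample data (an assumption for this sketch).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Centering only: subtract the mean, leave the spread unchanged.
centered <- scale(x, center = TRUE, scale = FALSE)

# z-scores: subtract the mean and divide by the sample standard deviation.
z <- scale(x)

# The same z-scores computed by hand, for comparison.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes, so convert it
# to a plain numeric vector before comparing.
all.equal(as.numeric(z), z_manual)
```

Note that scale() returns a matrix, not a vector, which is worth remembering when the result is assigned back into a data frame column.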

doSMP pulled

March 1, 2012

They have finally pulled that buggy unreliable piece of code that was doSMP from the CRAN mirrors while (I hear) Revolutions are re-writing it. To use all your cores for analysis on the Windows platform, you can try doSNOW instead; my code is something like the fragment below. Neither option is as attractive...

Kölner R User Meeting 30 March 2012

March 1, 2012

On 30 March 2012 I would like to organise the first Cologne R user meeting. I have attended the meetings in London over the past few years and hope to find like-minded people in Cologne as well, who would like to talk over a Kölsch about R and...

Generation of correlated random numbers: recommended article

February 29, 2012

This quick blog entry shares an excellent article by Thijs van den Berg entitled Generating Correlated Random Numbers. The author nicely describes how to generate sequences of correlated random numbers using the Cholesky decomposition and an Eigenvector …
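
The Cholesky approach mentioned here can be sketched in a few lines of base R. This is an illustration only, not code from the recommended article; the target correlation of 0.7, the sample size, and the seed are assumptions.

```r
set.seed(42)                               # arbitrary seed, for reproducibility

# Assumed target correlation matrix for two variables.
target <- matrix(c(1.0, 0.7,
                   0.7, 1.0), nrow = 2)

n <- 10000
z <- matrix(rnorm(n * 2), ncol = 2)        # independent standard normal draws

# chol() returns the upper-triangular factor U with t(U) %*% U == target.
U <- chol(target)

# Each row of z has identity covariance, so each row of z %*% U has
# covariance t(U) %*% U, which is the target matrix.
x <- z %*% U

cor(x)                                     # sample correlation, close to the target
```

The same idea extends to more than two variables as long as the target matrix is positive definite, which chol() requires.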

R turns 12; R 2.14.2 is out

February 29, 2012

As promised by the R Core Group, R 2.14.2 is out. This is the final patchlevel of the R 2.14.x series (R 2.15.0 is due on March 30), and so R 2.14.2 will be the R engine for the next release of Revolution R Enterprise in a couple of months. Today also marks the 12th anniversary since R 1.0.0...

Massive Increase in Ethanol Production

February 29, 2012

Description: Yearly production of Ethanol in the United States since 1980. Data: http://www.ethanolrfa.org/ Analysis: When it comes to fuel - especially for transportation - oil is king. In 2010, the United States imported 180.8 billion gallons ...

ROracle 1.1-1 Delivers Performance Improvements

February 29, 2012

The Oracle R Advanced Analytics team is happy to announce the release of the ROracle 1.1-1 package on the Comprehensive R Archive Network (CRAN).  We’ve rebuilt ROracle from the ground up, working hard to fix bugs and add optimizations. The new version introduces key improvements for interfacing with Oracle Database from open-source R. Specific improvements in ROracle 1.1-1 include:...

A Direct Marketing In-flight Forecasting System

February 29, 2012

This is an edited version of A Direct Marketing In-flight Forecasting System. The original article was written by Shannon Terry and Ben Ogorek, Nationwide Insurance, in order to enter the “Applications of R in Business” contest organised by Revolution Analytics. This is the winning entry of the contest. I added some notes in the third

Programmatically rename files (or do other stuff to them) in R

February 29, 2012

In order to do something to a bunch of files at once, we first need a vector which contains the file paths of just the files we are interested in.

    startingDir <- "/myDirectory"
    filez <- list.files(startingDir, pattern = "searchPattern")
    head(filez)
    "/m...
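
A self-contained sketch of the whole idea, from collecting the paths to renaming the files, might look like the following. It is not the post's own code: the temporary directory, the example file names, and the `renamed_` prefix are all assumptions made so that the example runs anywhere without touching real files.

```r
# Work in a throwaway directory so the sketch cannot harm real files.
workDir <- file.path(tempdir(), "rename_demo")
dir.create(workDir, showWarnings = FALSE)
file.create(file.path(workDir, c("run1.txt", "run2.txt", "notes.md")))

# full.names = TRUE returns complete paths, not just the file names;
# the pattern is a regular expression that keeps only the .txt files.
oldPaths <- list.files(workDir, pattern = "\\.txt$", full.names = TRUE)

# Build the new paths by adding a prefix to each file name.
newPaths <- file.path(dirname(oldPaths),
                      paste0("renamed_", basename(oldPaths)))

# file.rename() is vectorised and returns TRUE for each successful rename.
ok <- file.rename(oldPaths, newPaths)

list.files(workDir)
```

Checking the logical vector returned by file.rename() is a cheap way to spot files that could not be renamed, for example because of permissions.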

Functional ANOVA using INLA – update

February 29, 2012

INLA author Håvard Rue wrote me to point out a problem in the Functional ANOVA code given in this post. I made a mistake in setting the precision of the fixed effects (I used “default” instead of “prec”). I’ve put Håvard’s corrected version of the code below.  

Graphical message boxes with R package tcltk

February 29, 2012

Sometimes you just need a graphical message box... know what I mean? If only because it pops up in front of all the other open windows and alerts you to the fact that your R script is waiting for you to do something, or is finished doing something el...

ABC in Roma [R lab #1]

February 29, 2012

Here are the R codes of the R labs organised by Serena Arima as a supplement to my lectures. This is quite impressive and helpful to the students, as illustrated by the first example below (using the abc software). I am having a great time teaching this “ABC in Roma” course, in particular because of the

Statistics project ideas for students

February 29, 2012

Here are a few ideas that might make for interesting student projects at all levels (from high-school to graduate school). I’d welcome ideas/suggestions/additions to the list as well. All of these ideas depend on free or scraped data, which means tha...

XYZ geographic data interpolation, part 2

February 29, 2012

Having recently received a comment on a post regarding geographic xyz data interpolation, I decided to return to my original "xyz.map" function and open it up for easier interpretation. This should make the method easier to adapt and follow. The above graph shows the distance to Mecca as interpolated from 1000 randomly generated lat/lon...

A minimum variance portfolio in 2011

February 29, 2012

2011 was a good vintage for minimum variance, at least among stocks in the S&P 500. Previously: The post “Realized efficient frontiers” included, of course, a minimum variance portfolio. That portfolio seemed interesting enough to explore some more. “What does ‘passive investing’ really mean” suggests that minimum variance should be considered a form of passive …

Custom Amazon EC2 config for Rstudio

February 29, 2012

Introduction: This post is a work in progress building on the previous post. It's my attempt to simultaneously learn Amazon's AWS tools and set up R and Rstudio Server on a customized "cloud" instance. I look forward to testing some R jobs that have la...

Expanding Visualization of published system edges (R)

February 28, 2012

I happened to be looking over a revised text by a systems author I follow. I will be a bit vague about specifics, as the system itself is based on well-known ideas, but I'll leave the reader to research related systems. The basic message...

Parsing R code: Freedom of expression is not always a good idea

February 28, 2012

With my growing interest in R it was inevitable that I would end up writing a parser for it. The fact that the language is relatively small (the add-on packages do the serious work) hastened the event because it did not look like much work; famous last words. I knew about R’s design and implementation

Webinar tomorrow: Big-data statistics with Revolution R with IBM Netezza

February 28, 2012

As explained in detail by Michele Chambers at the IBM Netezza blog, there are two keys to getting fast performance with statistical analysis on massive data sets with R: (1) massive parallelization: break the problem down into small pieces, and run them in parallel; (2) bring the R engine to the data (not the other way around), to avoid data transfer...

PCA for NIR Spectra_part 006: "Mahalanobis"

February 28, 2012

Outliers have an important influence on the PCs; for this reason they must be detected and examined. We have just the spectra, without lab data, and we have to check whether any of the sample spectra is an outlier (a noisy spectrum, a sample which belongs ...
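
One common way to carry out such a check is to compute Mahalanobis distances on the PCA scores. The sketch below is not the post's code and uses simulated data in place of real NIR spectra; the number of retained components `k`, the planted noisy sample, and the chi-square cutoff are all assumptions chosen for illustration.

```r
set.seed(1)                                    # arbitrary seed

# Simulated stand-in for NIR spectra: 50 samples at 20 wavelengths,
# built from two smooth shapes plus a little measurement noise.
n <- 50
p <- 20
wl <- seq(0, 1, length.out = p)
spectra <- outer(rnorm(n, mean = 1, sd = 0.1), sin(pi * wl)) +
  outer(rnorm(n, sd = 0.1), wl) +
  matrix(rnorm(n * p, sd = 0.01), nrow = n)

# Plant one noisy spectrum (sample 7) so there is something to detect.
spectra[7, ] <- spectra[7, ] + rnorm(p, sd = 0.5)

# PCA on the mean-centred spectra; keep the first k score columns.
pca <- prcomp(spectra, center = TRUE, scale. = FALSE)
k <- 3                                         # assumed number of components
scores <- pca$x[, 1:k]

# Squared Mahalanobis distance of each sample from the score centroid.
d2 <- mahalanobis(scores, colMeans(scores), cov(scores))

# Assumed rule of thumb: flag samples beyond the 97.5% chi-square quantile.
cutoff <- qchisq(0.975, df = k)
outliers <- which(d2 > cutoff)
outliers
```

Because the outlier itself inflates the covariance estimate, flagged samples are best inspected by eye and the PCA recomputed without them before drawing conclusions.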

People voice about Lynas Malaysia through Twitter Analysis with R CloudStat

February 28, 2012

CloudStat Analysis: This is a Twitter analysis report for “Lynas” from 21 to 28 February 2012, generated by the CloudStat Twitter Application. Lynas was a hot topic, espec...
