## Bounding sums of random variables, part 1

September 27, 2012
For the last course MAT8886 of this (long) winter session, on copulas (and extremes), we will discuss risk aggregation. The course will focus mainly on bounding the distribution (or some risk measure, say the Value-at-Risk) of the sum of two random variables with given marginal distributions. For instance, suppose we have two Gaussian risks. What could be the worst-case scenario...
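To make the question concrete, here is a minimal Python sketch of one classical answer. It is offered as an illustration and is not necessarily the approach the course will take. The Makarov bound gives the smallest possible value of P(X + Y ≤ s) over all dependence structures with fixed marginals F and G: sup over x of max(F(x) + G(s − x) − 1, 0). Inverting that lower bound at level α gives a worst-case Value-at-Risk. Several choices here are assumptions made for illustration: two standard N(0, 1) risks, the grid range, and the helper names `makarov_lower_cdf` and `worst_case_var`.

```python
from statistics import NormalDist

# Assumption: both risks are standard Gaussian N(0, 1).
STD_NORMAL = NormalDist()


def makarov_lower_cdf(s, F=STD_NORMAL.cdf, G=STD_NORMAL.cdf,
                      lo=-10.0, hi=10.0, n=2001):
    """Makarov lower bound on P(X + Y <= s) over all couplings of X ~ F, Y ~ G:
    sup_x max(F(x) + G(s - x) - 1, 0), with the sup approximated on a grid."""
    step = (hi - lo) / (n - 1)
    best = 0.0  # the bound is never below 0
    for i in range(n):
        x = lo + i * step
        best = max(best, F(x) + G(s - x) - 1.0)
    return best


def worst_case_var(alpha, s_lo=0.0, s_hi=20.0, iters=60):
    """Smallest s whose Makarov lower bound reaches alpha, i.e. the largest
    VaR_alpha(X + Y) compatible with the marginals. Bisection is valid
    because the bound is non-decreasing in s."""
    for _ in range(iters):
        mid = 0.5 * (s_lo + s_hi)
        if makarov_lower_cdf(mid) >= alpha:
            s_hi = mid
        else:
            s_lo = mid
    return s_hi


if __name__ == "__main__":
    alpha = 0.95
    print("worst case  :", round(worst_case_var(alpha), 3))
    print("comonotonic :", round(2 * STD_NORMAL.inv_cdf(alpha), 3))
    print("independent :", round(2 ** 0.5 * STD_NORMAL.inv_cdf(alpha), 3))
```

In this symmetric Gaussian case the bound works out by hand to 2Φ(s/2) − 1 for s > 0. At α = 0.95 the sketch should therefore return roughly 2Φ⁻¹(0.975) ≈ 3.92. That is above the comonotonic value 2Φ⁻¹(0.95) ≈ 3.29, which shows that perfect positive dependence is not the worst case for VaR.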

## Simplest possible heatmap with ggplot2

September 27, 2012
Featuring the lovely “spectral” palette from Colorbrewer. This really just serves as a reminder of how to do four things I frequently want to do: make a heatmap of some kind of matrix, often a square correlation matrix; reorder a factor vari...

## Calling Minimum Correlation Algorithm from Excel using RExcel & VBA

September 26, 2012
I want to show an example of calling the Minimum Correlation Algorithm from Excel. I will use RExcel to connect R and Excel, and I will create a small VBA cell array function to communicate between Excel and R. I have previously discussed the concept of connecting R and Excel in the “Calling Systematic Investor Toolbox...

## eeptools 0.1 Available on CRAN Now!

September 26, 2012
eeptools 0.1 is available now on CRAN! You can install it by simply typing `install.packages('eeptools')` in your R console. The package allows users to play with a number of built-in datasets for folks in education beginning to learn R, custom themes...

## structure and uncertainty, Bristol, Sept. 26

September 26, 2012
Another day full of interesting and challenging talks (challenging in the sense that they generated new questions for me) at the SuSTain workshop. After another (dry and fast) run around the Downs, Leo Held started the talks with one of my favourite topics, namely the theory of g-priors in generalized linear models. He did bring a new perspective on

## Association Rule Learning and the Apriori Algorithm

September 26, 2012
Association Rule Learning (also called Association Rule Mining) is a common technique used to find associations between many variables. It is often used by grocery stores, retailers, and anyone with a large transactional database. It's the same way that Target knows you're pregnant, or that Amazon.com knows what else you want when you're buying an item...
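The core of Apriori is a level-wise search. An itemset can only be frequent if all of its subsets are frequent. Candidates of size k are therefore built from the frequent itemsets of size k − 1, and they are pruned before any counting is done. The minimal Python sketch below shows that idea together with rule confidence. It is written for illustration and is not taken from the original post. The basket data and the names `apriori` and `confidence` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Count step: keep candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


if __name__ == "__main__":
    # Hypothetical grocery baskets.
    baskets = [{"bread", "milk"}, {"bread", "diapers", "beer", "eggs"},
               {"milk", "diapers", "beer", "cola"}, {"bread", "milk", "diapers", "beer"},
               {"bread", "milk", "diapers", "cola"}]
    freq = apriori(baskets, min_support=0.6)
    print(confidence(freq, {"diapers"}, {"beer"}))  # about 0.75
```

The pruning step is what keeps the search tractable. No itemset that contains an infrequent subset is ever counted against the database.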

## Merging Data Sets Based on Partially Matched Data Elements

September 26, 2012
A tweet from @coneee yesterday about merging two datasets using columns of data that don’t quite match got me wondering about a possible R recipe for handling partial matching. The data in question related to country names in a datafile that needed fusing with country names in a listing of ISO country codes. The original
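One possible recipe is sketched here in Python rather than R. Score each name in the data file against every name in the ISO listing with the standard library's `difflib.get_close_matches`, and accept the best match only above a similarity cutoff. The country spellings, the 0.6 cutoff, and the `fuzzy_merge` helper are illustrative assumptions, not the data or method from the original post.

```python
from difflib import get_close_matches


def fuzzy_merge(names, lookup, cutoff=0.6):
    """Map each name to lookup[best close key], or to None when no key in
    `lookup` reaches the similarity cutoff (difflib's ratio, 0..1)."""
    keys = list(lookup)
    merged = {}
    for name in names:
        match = get_close_matches(name, keys, n=1, cutoff=cutoff)
        merged[name] = lookup[match[0]] if match else None
    return merged


if __name__ == "__main__":
    # Hypothetical excerpt of an ISO 3166 code listing.
    iso = {"United Kingdom": "GB", "France": "FR", "Germany": "DE", "Finland": "FI"}
    # Hypothetical misspelled names from the data file.
    print(fuzzy_merge(["Untied Kingdom", "Frnace", "Germany", "Atlantis"], iso))
```

Unmatched names come back as `None`, so they can be reviewed by hand rather than silently merged. In R, base functions such as `agrep()` or `adist()` play a similar role.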

## R courses in Belgium

Every year, the Leuven Statistics Research Center (Belgium) offers short courses for professionals and researchers in statistics and statistical tools. The following link gives an overview of the courses: http://lstat.kuleuven.be/consulting/shortcourses/ENcourse%20overview.htm or get it here in PDF: http://lstat.kuleuven.be/consulting/shortcourses/BRO_LSTAT_2012-2013.pdf This year, BNOSAC is presenting the course on Advanced R Programming Topics, which will be held on October 18-19. This course...

## Creating Scientific Posters using R, Latex, Beamer and Beamerposter

September 26, 2012
A while ago I needed to produce some posters that included lots of data (scientific style). Having recently got back into R and started learning LaTeX, I googled for a way to do this using R. Here's what I found and ended up with, using R, LaT...

## Using R in production: industry experts share their experiences

September 26, 2012
I had a great time yesterday moderating the "R in Action" panel discussion at the DataWeek conference in San Francisco. Each of the panelists represented a company that is actively using R and/or Revolution R Enterprise. Here (from memory, since I couldn't take notes) are some of the things they shared: Jesse Bridgewater from eBay talked about how R is...

## R Studio and Revolution R impressions

September 26, 2012
I have used R for about six years now. Over the years I’ve done the majority of my coding in Linux and so R has been nothing more than a terminal. I enjoy the simplicity and purity of the terminal but...

## Some regressions on school data

September 26, 2012
Eric and I have been exchanging emails about potential analyses for the school data, and he published a first draft model in Offsetting Behaviour. I have kept on doing mostly data exploration while we get a definitive full dataset, and …

## rasterVis to the rescue

September 26, 2012
Programmers like Oscar Perpiñán Lamigueiro are the reason I love R! Oscar is the maintainer of the rasterVis package, and in this post I'll explain why it is a must-have package for anyone working with the raster package in R. My latest project is focused on NOAA's Climate Reference Network. The details can

## Predict Bounce Rate based on Page Load Time in Google Analytics

September 26, 2012
Welcome to the second part. In the last blog post on Linear Regression with R, we discussed what regression is and how it is used. Now we will apply that learning to a specific prediction problem. In this post, I will create a basic model to predict bounce rate as a function

## Linear Regression using R

September 26, 2012
In this post I am going to explain how linear regression works. Let us start with what regression is and how it works. Regression is widely used for prediction and forecasting in the field of machine learning. Its focus is the relationship between a dependent variable and one or more independent variables. The “dependent variable”
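For the single-predictor case, least squares has a closed form. The slope is the covariance of x and y divided by the variance of x, and the intercept is chosen so that the fitted line passes through the point of means. Below is a minimal Python sketch of that formula. It is an illustration, not the post's own R code, and the names `fit_simple_ols` and `predict` are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y ≈ b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = cov(x, y) / var(x); the common (n - 1) factors cancel out.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Fitted values b0 + b1 * x for each new x."""
    return [intercept + slope * xi for xi in x_new]


if __name__ == "__main__":
    b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
    print(b0, b1)  # 1.0 2.0
```

For a single predictor, R's `lm(y ~ x)` reports the same two coefficients. The matrix form of least squares generalizes this to several independent variables.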

## Modifying select off-diagonal items in a matrix

September 25, 2012
This is something I have had occasion to do, and I never remember how, so this is legitimately a reminder to my future self of how to work with the off-diagonal elements of a matrix. Selecting rows and columns is easy: mat or mat, for...
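The underlying idea is to address individual entries by explicit (row, column) pairs rather than by whole rows or columns. In R, one common way to do this is to index with a two-column matrix, e.g. `mat[cbind(i, j)] <- value`. The sketch below mirrors that idea in plain Python on a list-of-lists matrix. The helper names, and the option to mirror each change across the diagonal (handy for correlation matrices), are assumptions made for illustration.

```python
def set_off_diagonal(mat, pairs, value, symmetric=True):
    """Return a copy of square matrix `mat` (list of lists) with each (i, j)
    in `pairs` set to `value`; if symmetric, (j, i) is set as well.
    Pairs on the diagonal are rejected."""
    out = [row[:] for row in mat]  # copy so the input is not mutated
    for i, j in pairs:
        if i == j:
            raise ValueError(f"({i}, {j}) is on the diagonal")
        out[i][j] = value
        if symmetric:
            out[j][i] = value
    return out


def off_diagonal(mat):
    """All entries with row index != column index, in row-major order."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(n) if i != j]


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    corr = set_off_diagonal(identity, [(0, 2)], 0.5)
    print(corr)
```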

## Minimum Correlation Algorithm Speed comparison

September 25, 2012
The Minimum Correlation Algorithm is a heuristic method discovered by David Varadi. Below I will benchmark the execution speed of two versions of the Minimum Correlation Algorithm against traditional minimum variance optimization, which relies on solving a quadratic programming problem. I have run the code above for n=10 (10 assets), n=100 (100 assets), n=500

September 25, 2012
The development of Armadillo 3.4.* continues with bug fixes and more sparse matrix support. Conrad released 3.4.2 this morning. I wrapped up the corresponding RcppArmadillo 0.3.4.2 before leaving for work, and this version should now have all CRAN mirr...

## R Helper Functions

September 25, 2012
If you do a lot of R programming, you probably have a list of R helper functions set aside in a script that you include on R startup or at the top of your code. In some cases helper functions add capabilities that aren’t otherwise available. In other cases, they replicate functionality that is available elsewhere without loading unnecessary...

## Playing with The Circular Law in Julia

September 25, 2012
Statistically-trained readers of this blog will be very familiar with the Central Limit Theorem, which describes the asymptotic sampling distribution of the mean of a random vector composed of IID variables. Some of the most interesting recent work in mathematics has been focused on the development of increasingly powerful proofs of a similar law,

## Guest post: Visualizing data using a 3D printer

September 25, 2012
In a break from my usual obsessions and interests here is a guest blog post by Ian Walker. I'm posting it because I think it is rather cool and hope it will be of interest to some of my regular readers. Ian is perhaps best known (in the blogosphere) f...

## Spatial segregation in cities – An explanation by a neural network model (Demographics & neural network)

September 25, 2012
## Two particular courses and other upcoming events

September 25, 2012
I’ll be leading two courses in the near future: Value-at-Risk versus Expected Shortfall 2012, October 30-31, London. 30th: “Addressing the critical challenges and issues raised by the Basel proposal to replace VaR with Expected Shortfall”. 31st: “Variability in Value-at-Risk and Expected Shortfall”, led by Patrick Burns. Details at CFP Events. Finance with R Workshop …

## Thanks to our guest bloggers

September 25, 2012
I'm back from a very relaxing holiday in Australia. Many thanks to our guest bloggers for filling in over the last couple of weeks with some great information about R while I was away. If you missed any of the posts, be sure to check them out: Douglas McNair, "Population health management with RevoScaleR" Yihui Xie, "Integrate data and...

## Specifying Variables in R

September 25, 2012
R has several ways to specify which variables to use in an analysis. Some of the most frustrating errors can result from not understanding the order in which R searches for variables. This post demonstrates that order, hopefully smoothing your …

## Learning Kernels SVM

September 25, 2012
A common application of machine learning (ML) is the learning and classification of a set of raw data features by an ML algorithm or technique. In this context an ML kernel acts to the ML algorithm …

## Visually-weighted regression plots, with Zelig

September 25, 2012
As a follow-up to yesterday’s post on producing visually-weighted regression plots, here is some code which illustrates the production of similar plots, but using Zelig’s convenient modeling and simulation functions. This code was produced...