A matrix in R is created with the matrix, rbind, or cbind function. These functions are described as follows: matrix - transforms concatenated data into a matrix of compatible dimensions. rbind - short for row bind, that binds a conca...
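As a minimal sketch of the three functions named above, the following builds the same 2 x 3 matrix in three ways. The values and the variable names (`x`, `m1`, `m2`, `m3`) are illustrative assumptions, not taken from the original post.

```r
# Illustrative data: six values to arrange as a 2 x 3 matrix
x <- c(1, 2, 3, 4, 5, 6)

# matrix(): reshape a concatenated vector; fills column by column by default
m1 <- matrix(x, nrow = 2, ncol = 3)

# rbind(): bind vectors together as rows
m2 <- rbind(c(1, 3, 5), c(2, 4, 6))

# cbind(): bind vectors together as columns
m3 <- cbind(c(1, 2), c(3, 4), c(5, 6))

print(m1)
```

All three objects hold the same values in the same positions; passing `byrow = TRUE` to `matrix()` would fill row by row instead.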

Maybe you have encountered this situation: you run a large-scale study over the internet, and out of curiosity, you frequently check the correlation between two variables. My experience with this practice is usually frustrating, as in small sample sizes (and we will see what “small” means in this context) correlations go up and down, change sign,
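One way to see this instability is a small simulation. The sketch below is an assumption-laden illustration, not the author's analysis: the true correlation (`rho = 0.3`), the maximum sample size, and all variable names are hypothetical choices.

```r
set.seed(42)

n_max <- 500   # hypothetical final sample size
rho   <- 0.3   # hypothetical true correlation

# Simulate two standard-normal variables with population correlation rho
x <- rnorm(n_max)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n_max)

# Recompute the correlation each time one more observation "arrives"
ns <- 10:n_max
running_r <- sapply(ns, function(n) cor(x[1:n], y[1:n]))

# Plot the trajectory; early estimates typically wander far from rho
plot(ns, running_r, type = "l",
     xlab = "Sample size", ylab = "Estimated correlation")
abline(h = rho, lty = 2)
```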

Soil survey data are typically built upon a foundation of soil-landscape relationships that have been verified in the field. SSURGO data contain several geomorphic descriptions of landscape, landform, hillslope position, and surface shape for each...

The results of the 2013 KDNuggets software poll are in, with RapidMiner and R in a near-tie for first place. Of a record 1880 respondents, 737 reported using Rapid-I RapidMiner/RapidAnalytics, and 704 reported using R. Excel came in third: with 527 respondents, it was the lone commercial tool in the top 5. You can see the top 10 responses...

I have written a little function that allows users to run R scripts out of Dropbox directly from any location. It was aided by this post on biobucket. The reason I am particularly interested in this feature is because I am often using a ser...

The Frisch–Waugh–Lovell (FWL) theorem is of great practical importance for econometrics. FWL establishes that it is possible to re-specify a linear regression model in terms of orthogonal complements. In other words, it permits econometricians to partial out right-hand-side, or control, variables. This is useful in a variety of settings. For example, there may be cases
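The partialling-out idea can be checked numerically. The sketch below uses simulated data; the coefficients, sample size, and variable names are illustrative assumptions. It compares the coefficient on `x1` from the full regression with the coefficient obtained by regressing the residuals of `y` on the residuals of `x1`, after both have been purged of the control `x2`.

```r
set.seed(1)

n  <- 200
x2 <- rnorm(n)                         # control variable
x1 <- 0.5 * x2 + rnorm(n)              # regressor of interest, correlated with x2
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)   # hypothetical data-generating process

# Full regression with both right-hand-side variables
full <- lm(y ~ x1 + x2)

# Partial x2 out of both y and x1, then regress residuals on residuals
ry <- resid(lm(y ~ x2))
rx <- resid(lm(x1 ~ x2))
partial <- lm(ry ~ rx)

# The two estimates of the x1 coefficient agree up to floating-point error
print(c(full = unname(coef(full)["x1"]),
        partial = unname(coef(partial)["rx"])))
```

The standard errors reported by the residual regression differ slightly from those of the full model because the degrees of freedom are not adjusted for the partialled-out variable.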

Armadillo release 3.900.0 was provided by Conrad yesterday. It has been rolled into a new RcppArmadillo release 0.3.900.0 which is now on CRAN and in Debian. It has a number of nice changes, mostly on the performance side of things (see below) an...

Since R holds its data in the computer's RAM, it can handle only rather small data sets. Nevertheless, some packages make it possible to handle larger volumes, and the best solution is to connect R with a Big Data environment. This …

The statistical software R has an ever-expanding array of packages that provide pre-programmed functions and datasets. One such package is named Lahman, bundling the contents of the Lahman database into a quick-and-easy resource for R users. In addition to the data tables, the package resources also contain a variety of analyses and graphics undertaken using...

Next on modelling survival data from Veterinary Epidemiologic Research: semi-parametric analyses. With non-parametric analyses, we could only evaluate the effect of one or a small number of variables. To evaluate multiple explanatory variables, we analyze data with a proportional hazards model, the Cox regression. The functional form of the baseline hazard is not specified, which make

Last week, we had a discussion with some colleagues about the fact that, in order to prepare for the SOA exams, we have not had time (so far) to mention results on extreme values in our actuarial program. I did give an introduction in my nonlife actuarial models class, but it was only an introduction, in three...

Highlighted: Value at Risk and Expected Shortfall. A two-day course exploring Value at Risk and Expected Shortfall, and their role in risk management. 2013 June 25 & 26, London. Led by Patrick Burns. Details at the CFP Events site. New Events: Thalesians, San Francisco, 2013 June 5. Jesse Davis on “Risk Model Imposed Manager-to-Manager …

I subscribe to the Ensembl blog and found, in my feed reader this morning, a post which linked to the Variant Effect Predictor (VEP). The original blog post, strangely, has disappeared. Not to worry: so, the VEP takes genotyping data in one of several formats, compares it with the Ensembl variation + core databases and

In a paper arXived on Friday, Roberto Fontana relates the generation of Sudoku grids to that of Latin squares (which is unsurprising) and to maximum cliques of a graph (more surprising). The generation of a random Latin square proceeds in three steps: generate a random Latin square L with identity permutation matrix on symbol

If you're thinking about starting a project (for example, a report or paper) using the R language for analysis, the Nice R code blog has some great advice. Following the principles of reproducible research, Macquarie University postdocs Rich FitzJohn and Daniel Falster suggest: creating a directory structure to separate R code, data, reports, and output; treating data as read-only...

In this blogpost, I will be talking briefly about Predictive Analytics and why it holds value from a web analytics perspective. Broadly speaking, Predictive Analytics is a set of methodologies that assist us in anticipating customer behavior. The customer behavior of interest could be anything ranging from spend, buying habits, page views, response to a

Adam Duncan. Also available on R-bloggers.com. Setting up a Jekyll/Jekyll Bootstrap blog site is a very worthwhile experience. Should you choose to use Jekyll as your blogging platform, you will find many resources out there describing the setup process. This post is not about getting set up using Jekyll or Jekyll Bootstrap. It’s about establishing a good workflow...

When I first learned about Granger-causality this past February, I was bemused and quite skeptical of the whole procedure. I felt it belonged on the scrapheap of impractical academic endeavors, preferring to possibly use an ARIMA transfer function model for the same task. However, several contemporaries threw the red challenge flag and upon further review, my initial impressions have...