I don't do much GIS, but I like to. It's rather enjoyable and draws on a broad skill set. Often you will find yourself grabbing data sets from some site, then scraping, cleaning and reshaping the data, and graphing it.

Credit scoring is the practice of analysing a person's background and credit application in order to assess that person's creditworthiness. One can take numerous approaches to analysing creditworthiness. In the end it basically comes down to first selecting the correct independent variables (e.g. income, age, gender) that lead to a given level of creditworthiness. In...

R and Python have different strengths. There's little you can do in R you absolutely can't do in Python and vice versa, but there's a lot of stuff that's really annoying in one and nice and simple in the other. I'm sure simulations can be run in R, but it seems frightfully tricky. Recently I wrote a simple Tennis simulator...

A few days ago I came across Jack Heinemann and collaborators’ article (Sustainability and innovation in staple crop production in the US Midwest, Open Access) comparing the agricultural sectors of the USA and Western Europe. While the article is titled around the word sustainability, the main comparison stems from the use of Genetically Modified crops in

I showed an example of Extending Commodity time series back in 2012. Since then, the web site that I used to get the Thomson Reuters/Jefferies CRB Index data is no longer working. But there are a few alternatives: Thomson Reuters / Jefferies CRB Index. To get data, first select “TRJ/CRB Index-Total Return”, next click “See

Akaike’s Information Criterion (AIC) is a very useful model selection tool, but it is not as well understood as it should be. I frequently read papers, or hear talks, which demonstrate misunderstandings or misuse of this important tool. The following points should clarify some aspects of the AIC, and hopefully reduce its misuse. The AIC is a penalized likelihood,...
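To make the "penalized likelihood" idea concrete, here is a minimal sketch in Python using the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The toy data and the two Gaussian models being compared are hypothetical, chosen only for illustration; they do not come from the original post.

```python
import math


def aic(log_likelihood, n_params):
    """Standard AIC: 2k - 2 ln(L_hat); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of i.i.d. Normal(mu, sigma2) observations."""
    n = len(data)
    rss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


# Hypothetical toy data (an assumption, not from the post).
data = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0]
n = len(data)

# Model A: mean fixed at 0, only the variance is estimated (k = 1).
sigma2_a = sum(x ** 2 for x in data) / n
aic_a = aic(gaussian_loglik(data, 0.0, sigma2_a), 1)

# Model B: mean and variance both estimated by maximum likelihood (k = 2).
mu_b = sum(data) / n
sigma2_b = sum((x - mu_b) ** 2 for x in data) / n
aic_b = aic(gaussian_loglik(data, mu_b, sigma2_b), 2)

# Model B pays a larger penalty (one extra parameter) but fits far better.
print("AIC, model A:", round(aic_a, 2))
print("AIC, model B:", round(aic_b, 2))
```

Only differences in AIC between models fitted to the same data are meaningful; the absolute values carry no interpretation on their own.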

For some time now, I’ve advocated for the view that non-life loss reserving constitutes a categorized linear regression. I’ll emphasize that the idea of a linear regression isn’t remotely novel. Further, the categorization is the de facto approach. I’m merely recognizing it and suggesting instances where a decision may be made about the optimality of

In his book Quantum Computing Since Democritus, Scott Aaronson poses the following question: Suppose that you’re at a party where every guest is given a hat as they walk in. Each hat has either a pineapple or a watermelon on top, picked at random with equal probability. The guests don’t get to see the fruit

In our previous webinar, we discussed predictive analytics and the basic steps needed to perform predictive analysis. We also discussed an eCommerce problem and how it can be solved using predictive analysis. In this post, I will explain the R script that I used to perform predictive analysis during the webinar. Before I explain the R script,

Back to R (!) for the current Le Monde puzzle: Given an unknown permutation of the set {1,…,6}, written on the faces of a cube, there exists a sequence of vertices such that increasing by one unit the three numbers on the faces sharing the successive vertices in the sequence leads to identical values over

A paper published today in The R Journal discusses a fundamental limitation affecting the reliability and reproducibility of R code. It explains how the lack of dependency versioning causes R-based applications to break down, Sweave documents to stop working, and CRAN to hit scaling problems. The paper suggests several solutions inspired by other open-source communities that could ...

Lately, I have been working with finite mixture models for my postdoctoral work on data-driven automated gating. Although I had barely scratched the surface of mixture models in the classroom, I am becoming increasingly comfortable with them. With this in mind, I wanted to explore their application to classification because there are times when a single class is clearly made up of...

R can also be used as a scripting tool. We just need to add a shebang in the first line of a file (script): #!/usr/bin/Rscript and then the R code should follow. Often we want to pass arguments to such a script, which can be collected in the script by the c...

Like your .bashrc, .vimrc, or many other dotfiles you may have in your home directory, your .Rprofile is sourced every time you start an R session. On Mac and Linux, this file is usually located in ~/.Rprofile. On Windows it's buried somewhere in the R...

Analyzing Likert scale responses really comes down to what you want to accomplish (e.g. are you trying to provide a formal report with probabilities, or are you simply trying to understand the data better?). Sometimes a couple of graphs are sufficient and a formal statistical test isn’t even necessary. However, with how easy it is

Diego Salmerón and Juan Antonio Cano from Murcia, Spain (check the movie linked to the above photograph!), kindly included me in their recent integral prior paper, even though I mainly provided (constructive) criticism. The paper has just been arXived. A few years ago (2008 to be precise), we wrote together an integral prior paper, published

If you use R and ssh into other machines a lot, e.g. for doing some big data stuff on ec2, ess-remote is a great tool. Just use M-x ssh to ssh into the remote machine, then launch R. Now just M-x ess-remote and you can use the R process just like a local process! Productivity win. Also see