R and Salesforce

February 25, 2012

Introduction: R is widely used among scientists and statisticians to perform statistical analysis, while Salesforce.com is one of the leading CRM software packages used for marketing and sales force automation. Salesforce.com contains vital information regarding Leads, Customers, Contacts, Opportunities and Cases. Currently this data is mainly used for operational purposes by Sales and Marketing professionals. How

A Roma

February 25, 2012

Today, I am going to Rome for a week to teach the PhD course on ABC that I first gave in Paris. The course takes place at La Sapienza Università di Roma, from Monday to Thursday. There will be an R lab in addition to the lectures. (I have no further information at the moment.)

PCA for NIR Spectra_part 003: "NIPALS"

February 25, 2012

> X<-yarn$NIR
> X_nipals<-nipals(X,a=10,it=100)
Two matrices are generated (P and T). As in other posts, we are going to look at the loadings and scores for the first three principal components:
> wavelengths<-seq(1,268,by=1)
> matplot(w...
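For readers who want to see what a NIPALS iteration involves, here is a minimal base-R sketch. It is not the `nipals` function called in the excerpt (that one comes from a package). The name `nipals_sketch`, the `tol` argument and the simulated matrix are assumptions made for illustration. The arguments `a` (number of components) and `it` (maximum iterations) mirror the names used in the call above.

```r
# Minimal NIPALS sketch (illustrative only, not the packaged implementation).
# Returns scores (T) and loadings (P) for the first `a` principal components.
nipals_sketch <- function(X, a = 2, it = 100, tol = 1e-8) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # mean-centre columns
  Tm <- matrix(0, nrow(X), a)
  P <- matrix(0, ncol(X), a)
  for (h in seq_len(a)) {
    # Start from the column with the largest variance
    t_h <- X[, which.max(apply(X, 2, var)), drop = FALSE]
    for (i in seq_len(it)) {
      p_h <- crossprod(X, t_h) / sum(t_h^2)  # project X onto current scores
      p_h <- p_h / sqrt(sum(p_h^2))          # normalise loadings to unit length
      t_new <- X %*% p_h                     # update scores
      done <- sum((t_new - t_h)^2) < tol * sum(t_new^2)
      t_h <- t_new
      if (done) break
    }
    Tm[, h] <- t_h
    P[, h] <- p_h
    X <- X - t_h %*% t(p_h)                  # deflate: remove this component
  }
  list(T = Tm, P = P)
}

# Simulated data (assumption: stands in for a spectra matrix)
set.seed(1)
X_demo <- matrix(rnorm(200), 50, 4) %*% diag(c(5, 2, 1, 0.5))
res <- nipals_sketch(X_demo, a = 2, it = 100)
```

The scores in `res$T` should match those of `prcomp(X_demo)$x` up to sign, which is a convenient sanity check for a sketch like this.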

Why I don’t like Dynamic Typing

February 25, 2012

A lot of people consider the static typing found in languages such as C, C++, ML, Java and Scala as needless hairshirtism. They consider the dynamic typing of languages like Lisp, Scheme, Perl, Ruby and Python as a critical advantage (ignoring other features of these languages and other efforts at generic programming such as the

Creating beautiful maps with R

February 24, 2012

Spanish R user and solar energy lecturer Oscar Perpiñán Lamigueiro has written a detailed three-part guide to creating beautiful maps and choropleths (maps color-coded with regional data) using the R language. Motivated by the desire to recreate this graphic from the New York Times, Oscar describes how he creates similar high-quality maps using R. In Part 1, Oscar grabbed...

I’m Hiring!

February 24, 2012

I direct the Bioinformatics Core at the University of Virginia, and I'm hiring. Visit this link on the UVA Jobs website for more information. Here's the description: The University of Virginia Bioinformatics Core is seeking a full-time position as a bio...

How to save high frequency data in mongodb

February 24, 2012

Are you looking for a way to save real-time, high-frequency data taken from the Interactivebrokers.com API? I built an example in C++ which saves all incoming data in MongoDB. Check this link if you are interested: https://github.com/kafka399/TwsMongo

Synchronous vs. asynchronous agent activation example

February 24, 2012

This time I have implemented the NetLogo Voting model to verify how the agent activation scheme influences the results. The code executing the simulation is given below. It simulates two types of voter preferences encoded as 1 and -1. In this way...

Analyzing weblog data with R

February 23, 2012

The R-chart blog explains how to read a weblog file into R, so you can analyze traffic to a website. For example, here's a page request chart created with R: Now, charts like this are stock-in-trade for tools like Google Analytics, but this is still useful if you want to look at the performance of a site that hasn't...

GSoC Project #2 for 2012

February 23, 2012

In my prior post, I discussed the origins of the first GSoC project I posted this year. The second GSoC project I’ve proposed is around the writing and code of Attilio Meucci, an adjunct professor at Baruch College – CUNY and an excellent speaker (I saw him at the University of Chicago when he spoke

Large-scale Inference

February 23, 2012

Large-scale Inference by Brad Efron is the first IMS Monograph in this new series, coordinated by David Cox and published by Cambridge University Press. Since I read this book immediately after Cox and Donnelly’s Principles of Applied Statistics, I was thinking of drawing a parallel between the two books. However, while neither of them can

Pocketbook costs of software

February 23, 2012

I have always been provided with SAS as part of my job, so I never really realized how much it costs. I’ve bought Stata before, and of course R. I recently found out how much a reasonable bundle of SAS modules along with base SAS costs per year per seat, at least under the GSA.

Ternary ifelse ( ?: ) in different languages

February 23, 2012

AWK
$ awk 'ORS=NR%3?",":"\n"' student-marks
Perl / PHP
$result = ($a > $b) ? $x : $y;
In Perl 6, use double ? and ! instead.
$result = ($a > $b) ?? $x !! $y;
R
ifelse(a>0,a,0)
Ternary operator (if?true:false)
bash/linux
ternary operator ? : is ju...
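To expand on the R line above, here is a short sketch of the two usual ways to write a ternary-style choice in R. The variable names (`a`, `b`, `clipped`, `label`) are made up for illustration. `ifelse` works element by element on a vector, while `if ... else` is an expression that handles a single condition.

```r
# Vectorised choice: one result per element of `a`
a <- c(-2, 0, 3)
clipped <- ifelse(a > 0, a, 0)

# Scalar choice: `if ... else` returns a value, so it can be assigned directly
b <- 5
label <- if (b > 0) "positive" else "non-positive"

print(clipped)
print(label)
```

For a single condition, `if ... else` is usually the closer match to `?:` in C-like languages, because it evaluates only the branch that is selected.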

PCA for NIR Spectra_part 002: "Score planes"

February 23, 2012

The idea of this post is to compare the score plots for the first 3 principal components obtained with the “svd” algorithm with the score plots of other chemometric software (Win ISI in this case). Previously I had exported the yarn spectra t...

Prediction: the Lasso vs. just using the top 10 predictors

February 23, 2012

One incredibly popular tool for the analysis of high-dimensional data is the lasso. The lasso is commonly used in cases when you have many more predictors than independent samples (the n ≪ p problem). It is also often used in the context of predictio...

Example 9.21: The birthday "problem" re-examined

February 23, 2012

The so-called birthday paradox or birthday problem is simply the counterintuitive discovery that the probability of (at least) two people in a group sharing a birthday goes up surprisingly fast as the group size increases. If the group is only 23 peo...
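As a quick illustration of how fast that probability grows, here is a minimal R sketch. It assumes 365 equally likely birthdays and ignores leap years. The helper name `p_shared` is made up for this example, while `pbirthday` is the function that ships with base R's stats package.

```r
# Probability that at least two of n people share a birthday,
# assuming `days` equally likely birthdays (no leap years).
p_shared <- function(n, days = 365) {
  1 - prod((days - seq_len(n) + 1) / days)
}

p22 <- p_shared(22)       # just under one half
p23 <- p_shared(23)       # just over one half (about 0.507)
p23_base <- pbirthday(23) # base R computes the same quantity

print(c(p22, p23, p23_base))
```

Under these assumptions, 23 is the smallest group size for which a shared birthday is more likely than not.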

Gini index and Lorenz curve with R

February 23, 2012

You can do anything pretty easily with R, for instance, calculate concentration indexes such as the Gini index or display the Lorenz curve (dedicated to my students). Although I did not explain it during my lectures, calculating a Gini index or displaying the Lorenz curve can be done very easily with R. All you have
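As a rough illustration of the calculation mentioned above, here is a base-R sketch of a Gini index and the points of a Lorenz curve. It is not the code from the post, which may rely on a dedicated package. The function names `gini` and `lorenz` and the sample `income` vector are assumptions for this example, and the formula used is the plain population version with no small-sample correction.

```r
# Gini index of a non-negative numeric vector (population formula)
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Lorenz curve points: cumulative population share (p)
# against cumulative share of the total (L)
lorenz <- function(x) {
  x <- sort(x)
  list(p = c(0, seq_along(x) / length(x)),
       L = c(0, cumsum(x) / sum(x)))
}

income <- c(0, 0, 0, 1)   # made-up, extremely unequal sample
g_equal <- gini(c(1, 1, 1, 1))
g_unequal <- gini(income)
lc <- lorenz(income)

print(c(g_equal, g_unequal))
```

To draw the curve, something like `plot(lc$p, lc$L, type = "l")` followed by `abline(0, 1, lty = 2)` shows the Lorenz curve against the line of perfect equality.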

Maps with R (III)

February 23, 2012

In my previous posts (1 and 2) I wrote about maps with complex legends but without any kind of interactivity.

another X’idated question

February 23, 2012

An X’idated reader of Monte Carlo Statistical Methods had trouble with our Example 3.13, the very one our academic book reviewer disliked so much as to “diverse a 2 star”. The issue is with computing the integral when f is the Student’s t(5) distribution density. In our book, we compare a few importance sampling solutions,

Visualization in regression analysis

February 23, 2012

Visualization is a key to success in regression analysis. This is one of the (many) reasons I am also suspicious when I read an article with a quantitative (econometric) analysis without any graph. Consider for instance the following dataset, obtained from http://data.worldbank.org/, with, for each country, the GDP per capita (in some common currency) and the infant mortality rate...

Setting up FastRWeb on Mac OS X

February 23, 2012

FastRWeb is an infrastructure that allows any webserver to use R scripts for generating dynamic content, such as web pages and graphics. In this post, you’ll learn how to install and set up FastRWeb on a Mac. This tutorial is extendable to any Unix-like operating system. It is an adaptation from Jay Emerson’s post, Setting

What to do about publishing data?

February 22, 2012

There seems to be a scale that’s tipping at the moment: Data or code that should be published* is often not. This should seem like an odd statement for anyone in science, but it should be easy to show that most publications …

Introduction to R and Revolution R Enterprise: Slides

February 22, 2012

If you missed this morning's webinar, Revolution R Enterprise, 100% R and More, I've embedded the slides below. Interestingly, about half of today's participants were SAS users, and the remainder R users. The first section introduces open-source R, and the second describes the additional features of Revolution R Enterprise. Unfortunately we had a...

Diamonds vs. water smackdown in playitbyr-powered podcast

February 22, 2012

Apparent Reason, my new monthly podcast, is a boisterous and non-technical discussion of economics and statistics. In that format I don't have the luxury of showing charts and graphs to complement my discussion, so I use the playitbyr package to represent the data as sound. (Apparently February is a great month to start R-related podcasts!

A Sequence Clustering Model in R

February 22, 2012

I’ve just released my first R package! Over the past 1.5 years or so, I’ve been studying an obscure statistical model for ranking data (full or partial) called Mallows’ model. It hypothesizes that a set of sequence data has a “modal” sequence about which the data cluster, and that the data fall away from that

PCA for NIR Spectra_part 001: "Plotting the loadings"

February 22, 2012

There are different algorithms to calculate the Principal Components (PCs). Kurt Varmuza & Peter Filzmoser explain them in their book: “Introduction to Multivariate Statistical Analysis in Chemometrics”. I'm going to apply one of them, to...

The New York Yankees Payroll vs Everyone Else (Major League Baseball)

February 22, 2012

Description: Major League Baseball payrolls for all teams since 1985. The New York Yankees payroll is highlighted with results defined by the shape of the point. Data: http://www.baseball-databank.org/ Analysis: For years fans of Major League Baseball (MLB) have been crying...

Non overlapping labels on a ggplot scatterplot

February 22, 2012

This is a very quick post to share a tip on how to add non-overlapping labels to a scatterplot in ggplot using a great package called directlabels. The trick is to make each point a single-member group using an aesthetic like colour and then apply the direct.label function with the first.qp …
