Media monitoring 2

May 6, 2013
(This article was first published on Learning Data Science, and kindly contributed to R-bloggers) A short monitoring update from our media observatory on Twitter, covering Mediapart, Le Monde, Le Figaro and Le Parisien, plus an overall view. The code used for this post: To leave a comment for the author, please follow the link and comment on their blog: Learning...

Creating a QGIS-Style (qml-file) with an R-Script

May 6, 2013
How do you get from a txt file with short names and labels to a QGIS style (qml file)? I used the R script below to create a style from this legend table, copy-pasting the parts I needed into a txt file, e.g. for WRB-FULL (WRB-FULL: Full soil code o...

The half variance approximation for mean returns

May 6, 2013
What’s that thing about arithmetic and geometric returns and the variance? Previously: an introduction to the difference between simple and log returns is “A tale of two returns”. Issue: Suppose you are predicting the mean annual return of an asset for some number of years. To simplify the discussion, let’s buy into the fantasy that … Continue reading...
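The approximation in the title is usually stated as: geometric mean return ≈ arithmetic mean return − variance / 2. A minimal R check of that rule of thumb, assuming normally distributed simple returns with an illustrative 8% mean and 15% standard deviation (values chosen here, not taken from the post):

```r
# Illustrative check of the half-variance approximation:
#   geometric mean return ~ arithmetic mean return - variance / 2
set.seed(42)
r <- rnorm(1e5, mean = 0.08, sd = 0.15)  # assumed: normal simple returns

arithmetic <- mean(r)                    # average simple return
geometric <- exp(mean(log1p(r))) - 1     # compounded (geometric) mean return
approx <- arithmetic - var(r) / 2        # half-variance approximation

round(c(arithmetic = arithmetic, geometric = geometric, approx = approx), 4)
```

Under these assumptions the geometric mean and the approximation should agree to within a fraction of a percentage point; the gap widens as volatility grows, because the higher-order terms the rule ignores become larger.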

analyze the social security administration public use microdata files (ssapumf) with r

May 5, 2013
the social security administration (ssa) must be overflowing with quiet heroes, because their public-use microdata files are as inconspicuous as they are thorough.  sure, ssa publishes enough great statistical research of their own that outside re...

Google Analytics + R = FUN!

May 5, 2013
The aim of this post is to show how simple it is to get data out of Google Analytics and create your own reports (ideally at least semi-automated) and your favourite statistical graphs (those that GA currently lacks). As you already know, R is a favourite tool

… ridiculously photogenic factors (heatmap with p-values)

May 5, 2013
Some months ago, I had to explore a large number of categorical variables before running some multivariate analyses. One good way to get to know your raw data, form new hypotheses, etc., is to compute pairwise “crude” chi-square tests of independence … Continue reading →
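A base-R sketch of the idea, not the author's code: run a chi-square test of independence for every pair of variables, store the p-values in a matrix and draw that matrix with image(). The mtcars columns treated as factors here are stand-in data chosen for illustration.

```r
# Pairwise chi-square tests of independence between categorical variables,
# collected into a matrix of p-values and drawn as a heatmap.
vars <- c("cyl", "vs", "am", "gear", "carb")        # stand-in categorical columns
cats <- as.data.frame(lapply(mtcars[vars], factor))

p_values <- matrix(NA_real_, length(vars), length(vars),
                   dimnames = list(vars, vars))
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      tab <- table(cats[[i]], cats[[j]])
      # small expected counts trigger an approximation warning; silenced here
      p_values[i, j] <- suppressWarnings(chisq.test(tab)$p.value)
    }
  }
}

# Heatmap written to a temporary PDF; low p-values (strong association) are red
pdf(tempfile(fileext = ".pdf"))
image(seq_along(vars), seq_along(vars), p_values, axes = FALSE,
      xlab = "", ylab = "", main = "Pairwise chi-square p-values",
      col = heat.colors(20))
axis(1, at = seq_along(vars), labels = vars)
axis(2, at = seq_along(vars), labels = vars)
invisible(dev.off())
```

Being "crude" tests, the p-values are only a screening device; with many pairs and small cells they should not be read as formal inference.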

How to Calculate a Partial Correlation Coefficient in R: An Example with Oxidizing Ammonia to Make Nitric Acid

Introduction: Today, I will talk about the math behind calculating partial correlation and illustrate the computation in R with an example involving the oxidation of ammonia to make nitric acid, using a built-in data set in R called stackloss. In a separate post, I will also share an R function that I wrote to estimate partial correlation.
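Two standard ways to compute a partial correlation in base R, sketched on the same stackloss data. This is not the R function the author promises in a separate post; taking stack.loss and Water.Temp as the pair of interest, with Air.Flow and Acid.Conc. as controls, is an illustrative assumption.

```r
# Partial correlation between stack.loss and Water.Temp,
# controlling for Air.Flow and Acid.Conc. (built-in stackloss data).

# Method 1: correlate the residuals of two regressions on the control variables
res_loss <- resid(lm(stack.loss ~ Air.Flow + Acid.Conc., data = stackloss))
res_temp <- resid(lm(Water.Temp ~ Air.Flow + Acid.Conc., data = stackloss))
pcor_resid <- cor(res_loss, res_temp)

# Method 2: read it off the inverse of the correlation matrix
P <- solve(cor(stackloss))
pcor_inv <- -P["stack.loss", "Water.Temp"] /
  sqrt(P["stack.loss", "stack.loss"] * P["Water.Temp", "Water.Temp"])

c(residuals = pcor_resid, inverse = pcor_inv)
```

Both routes give the same number: the residual method makes the "remove the linear effect of the controls" idea explicit, while the inverse-matrix route yields every pairwise partial correlation at once.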

R, D3.js and SNA Course

I took the SNA (social network analysis) course by Lada Adamic on Coursera. It's a super interesting course. In fact, I had been using networks only as a visualization tool, which makes me a little embarrassed, because there is much more to them. You can detec...

R/Finance 2013 Is Coming Quickly…

May 5, 2013
There are about two weeks remaining until R/Finance 2013, being held on May 17th and 18th at UIC in Chicago. Make sure you register beforehand to secure a spot, and yes, you do want to come to the conference dinner on Friday. I am particularly excited about the lineup of keynotes

Simulation shows gain of clmm over ANOVA is small

May 5, 2013
After setting up a simulation in the last post, it is now time to look at how the models compare. To my disappointment, with my simple simulations of assessors' behavior the gain is minimal. Unfortunately, the simulation took much more time than I ...

Volatility Regimes: Part 2

Adam Duncan, January 2013. Also available on R-bloggers.com. Strategy Implications: In this part of the volatility regimes analysis, we’ll use the regime identification framework established in part 1 to draw conclusions about which strategies work best in each regime. That should prove useful to us and goes a long way toward answering the question, “What strategies should I be...

Quandl Package – 5,000,000 free datasets at the tip of your fingers!

May 5, 2013
# Yes, you read that correctly, and no, Quandl (http://www.quandl.com/) did not pay me anything.
# Quandl is a new database management tool which seeks to become the place to find datasets. They boast of having over 5x10^6 data sets available t...

AIC & BIC vs. Crossvalidation

May 4, 2013
Model selection is a process of seeking the model in a set of candidate models that gives the best balance between model fit and complexity (Burnham & Anderson 2002). I have always used AIC for that. But you can also… Read more →
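A small base-R comparison along those lines, assuming simulated data whose true model is quadratic: AIC() and BIC() are computed on the full fits, and the cross-validation score is a hand-rolled 10-fold mean squared prediction error. The data, the polynomial family and the fold count are all choices made for this illustration.

```r
# Compare AIC, BIC and 10-fold cross-validation for choosing a polynomial degree
set.seed(1)
n <- 100
x <- runif(n, -2, 2)
y <- 1 + 0.5 * x - 0.8 * x^2 + rnorm(n, sd = 0.5)  # assumed: true model is quadratic
d <- data.frame(x, y)
degrees <- 1:5

# Information criteria on fits to the full data
fits <- lapply(degrees, function(k) lm(y ~ poly(x, k), data = d))
aic <- sapply(fits, AIC)
bic <- sapply(fits, BIC)

# 10-fold cross-validated mean squared prediction error
folds <- sample(rep(1:10, length.out = n))
cv_mse <- sapply(degrees, function(k) {
  fold_mse <- sapply(1:10, function(f) {
    train <- d[folds != f, ]
    test <- d[folds == f, ]
    fit <- lm(y ~ poly(x, k), data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  })
  mean(fold_mse)
})

data.frame(degree = degrees, AIC = aic, BIC = bic, CV_MSE = cv_mse)
```

BIC penalizes extra parameters more heavily than AIC, so it tends to favor smaller models; cross-validation estimates prediction error directly at the cost of refitting each model ten times.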

A Prototype of Monotonic Binning Algorithm with R

May 4, 2013
I’ve been asked many times if I have a piece of R code implementing the monotonic binning algorithm, similar to the one that I developed with SAS (http://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (http://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today, I finally had time to draft a quick prototype with 20 lines of R code, which is however barely usable without the
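A rough sketch of the general idea in base R, not the author's 20-line prototype: start from quantile-based bins and reduce their number until the bad rate is strictly monotone across bins. The function name monotonic_bin(), the 20-bin starting point, the simulated data and the direct monotonicity check (rather than, say, a rank-correlation criterion) are assumptions of this sketch.

```r
# Monotonic binning sketch: start from many quantile bins and reduce the
# number of bins until the bad rate is strictly monotone across bins.
monotonic_bin <- function(x, y, max_bins = 20) {
  for (n_bins in max_bins:2) {
    breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1),
                              names = FALSE))
    bins <- droplevels(cut(x, breaks, include.lowest = TRUE))
    rates <- tapply(y, bins, mean)          # bad rate per bin
    steps <- diff(rates)
    if (length(steps) > 0 && (all(steps > 0) || all(steps < 0))) {
      return(data.frame(bin = levels(bins),
                        n = as.vector(table(bins)),
                        bad_rate = as.vector(rates)))
    }
  }
  NULL  # no monotone binning with at least two bins was found
}

# Usage on simulated data (assumption): the bad probability rises with x
set.seed(2013)
x <- rnorm(5000)
y <- rbinom(5000, 1, plogis(-1 + 1.5 * x))
monotonic_bin(x, y)
```

The resulting bins could then feed a WoE transformation, as in the SAS and Python versions linked above.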

Backporting R 3.0.0 to Quantal, Precise, and Lucid

May 4, 2013
Today (May 4, 2013) I will begin the process of backporting R 3.0.0 to Quantal, Precise, and Lucid. This will include all the recommended packages and the packages for R found in the universe repository for Ubuntu. Things to keep in mind: If you do...

LaTeX in R graphs

May 3, 2013
A nice post was recently published on the rsnippets blog about the tikzDevice R package. This package is indeed awesome, even if it has been removed from CRAN. Of course, it can be downloaded from the archive folder, on http://cran.r-project.org/…, but also (for a more recent version) on http://download.r-forge.r-project.org/…. But first, it is necessary to install...

Animation, from R to LaTeX

May 3, 2013
Just a short post, to share some code used to generate animated graphs with R. Assume that we would like to illustrate the law of large numbers and the convergence of the average value of a binomial sample. We can generate samples $X_{i,j}\sim\mathcal{B}(1/2)$ using
> n=200
> k=1000
> set.seed(1)
> X=matrix(sample(0:1,size=n*k,replace=TRUE),n,k)
Each row will be a trajectory of heads and...
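Continuing from that generation step, one way to get from X to animation frames: compute the running average along each row, then save one plot per selected number of flips. The frame times, the 20 plotted trajectories and the PDF output are assumptions made for this sketch; assembling the frames into an animation (in LaTeX or elsewhere) is left out.

```r
# Running averages of coin flips: the law of large numbers, frame by frame.
n <- 200    # number of trajectories (rows)
k <- 1000   # number of flips per trajectory (columns)
set.seed(1)
X <- matrix(sample(0:1, size = n * k, replace = TRUE), n, k)

# Row-wise cumulative sums divided by the number of flips so far
M <- t(apply(X, 1, cumsum)) / matrix(seq_len(k), n, k, byrow = TRUE)

# One PDF frame per selected time step, written to a temporary directory
frame_dir <- tempdir()
for (t_max in c(10, 50, 100, 500, 1000)) {
  pdf(file.path(frame_dir, sprintf("lln_%04d.pdf", t_max)))
  matplot(t(M[1:20, 1:t_max, drop = FALSE]), type = "l", lty = 1,
          xlim = c(1, k), ylim = c(0, 1),
          xlab = "number of flips", ylab = "running average")
  abline(h = 0.5, lty = 2)  # the limit 1/2
  dev.off()
}
```

Fixing xlim and ylim across frames keeps the axes still, so only the trajectories move when the frames are played in sequence.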

Old Post with New d3 Life–GARCH and MA Performance

May 3, 2013
Parallel coordinates become much more useful when they are interactive, so I recreated one of my favorite blog posts, “Trend is Not Your Friend” Applied to 48 Industries, and converted the chart to a living, breathing d3 parallel coordinates chart courtesy ...

Extending RevoScaleR for Mining Big Data – Naive Bayes

May 3, 2013
by Derek McCrae Norton, Senior Sales Engineer. In this third installment (following part 1 and part 2) of Extending RevoScaleR for Mining Big Data, we look at how to use the building blocks provided by RevoScaleR to create a Naive Bayes model. Motivation: Fit a Naive Bayes model to big data. Naive Bayes is a simple probabilistic classifier based...
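Naive Bayes suits big data because it only needs counts: class frequencies and, for each predictor, a contingency table against the class. Those counts can be computed chunk by chunk and summed. The sketch below illustrates that in plain base R; it does not use the RevoScaleR API, and the Titanic data set, the four-way chunk split and the helper names (count_chunk(), merge_counts(), nb_predict()) are hypothetical stand-ins.

```r
# Naive Bayes from summable counts, fitted over data processed in chunks.
titanic <- as.data.frame(Titanic)                        # stand-in data
d <- titanic[rep(seq_len(nrow(titanic)), titanic$Freq),  # one row per person
             c("Class", "Sex", "Age", "Survived")]
predictors <- c("Class", "Sex", "Age")
target <- "Survived"

# Counts for one chunk: class totals and predictor-by-class tables
count_chunk <- function(chunk) {
  list(prior = table(chunk[[target]]),
       cond = lapply(setNames(predictors, predictors),
                     function(v) table(chunk[[v]], chunk[[target]])))
}
# Counts from two chunks simply add up
merge_counts <- function(a, b) {
  list(prior = a$prior + b$prior, cond = Map(`+`, a$cond, b$cond))
}

# Simulate a chunked pass over the data
chunks <- split(d, rep(1:4, length.out = nrow(d)))
model <- Reduce(merge_counts, lapply(chunks, count_chunk))

# Predict the class with the highest log posterior score
nb_predict <- function(model, newdata, laplace = 1) {
  classes <- names(model$prior)
  log_prior <- log(as.vector(model$prior) / sum(model$prior))
  scores <- vapply(seq_along(classes), function(ci) {
    cl <- classes[ci]
    s <- rep(log_prior[ci], nrow(newdata))
    for (v in names(model$cond)) {
      tab <- model$cond[[v]]
      # Laplace-smoothed P(predictor level | class)
      probs <- (tab[, cl] + laplace) / (sum(tab[, cl]) + laplace * nrow(tab))
      s <- s + log(probs[as.character(newdata[[v]])])
    }
    s
  }, numeric(nrow(newdata)))
  scores <- matrix(scores, nrow = nrow(newdata))
  classes[apply(scores, 1, which.max)]
}

pred <- nb_predict(model, d)
mean(pred == d$Survived)  # training accuracy
```

Because factor columns keep all their levels, every chunk produces tables of the same shape, which is what lets merge_counts() add them without any alignment step.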

All About Spherically Distributed Regression Errors

May 2, 2013
This post is based on a handout that I use for one of my courses, and it relates to the usual linear regression model, y = Xβ + ε. In our list of standard assumptions about the error term in this linear multiple regression...
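For reference, the textbook form of the "spherical errors" assumption for a sample of n observations (standard notation, not reproduced from the handout): the errors have zero conditional mean and a scalar covariance matrix, i.e. equal variances (homoskedasticity) and zero covariances (no autocorrelation).

```latex
y = X\beta + \varepsilon, \qquad
\mathrm{E}[\varepsilon \mid X] = 0, \qquad
\mathrm{Var}(\varepsilon \mid X) = \mathrm{E}[\varepsilon\varepsilon' \mid X] = \sigma^{2} I_{n}
```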

Improved R Profiling Summaries

May 2, 2013
In my last post I mentioned that I had improved on R’s summaryRprof() function with a custom function called proftable(). I’ve updated proftable() to take advantage of R 3.0.0’s ability to record line numbers while profiling. I’ve put it on GitHub; you can get it there or below. proftable reads in a file generated by...

…learning LaTeX, from scratch!

May 2, 2013
“LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation, and is the de facto standard for the communication and publication of scientific documents.” It is also… free and open. Especially … Continue reading →

How R Grows

May 2, 2013
by Joseph Rickert. Saturday morning I was drinking my coffee, wondering how much effort goes into R worldwide. (It’s my job.) I noticed that there were 4469 packages on CRAN, and it occurred to me that tabulating the packages by publication date would give some indication of how much effort is being expended to improve packages and keep them...

Changing The Presidential Election with R in the Browser

May 2, 2013
After I finished the tutorial post “d3 <- R with rCharts and slidify” and then saw “R creates d3/javascript charts in IPython Style Notebook”, a light went on. I could finally answer the lingering question I have had ever since I saw the NYT ...

Do Torontonians Want a New Casino? Survey Analysis Part 1

May 2, 2013
Toronto City Council is in the midst of a very lengthy process of considering whether or not to allow the OLG to build a new casino in Toronto, and where. The process started in November of 2012, and set … Continue reading →

Why Blog?

May 2, 2013
The Blog Review Process. A series of events in my life has led me to reconsider the value of blogging. The Back Story. Short story: I got fired. Long story: Recently I was hired to write occasional blog posts for Quandl. They probably figured that due to m...