How does logistic regression work?

May 31, 2013
In discussion with a non-statistician colleague, it became clear that logistic regression is not intuitive. Some basic questions came up: - Why not use the linear model? - What is the logistic function? - How can we compute by hand, step by step t...
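
The first two questions in the excerpt can be illustrated with a minimal sketch, written here in Python. The coefficients `b0` and `b1` are hypothetical values chosen only for illustration and are not taken from the post. The sketch shows that a linear predictor is unbounded, while the logistic function maps any real number into the interval (0, 1), so its output can be read as a probability.

```python
import math


def linear_predictor(x, b0, b1):
    """Plain linear model: the result can be any real number."""
    return b0 + b1 * x


def logistic(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(-z) could overflow, so use the equivalent form.
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    b0, b1 = -3.0, 0.8  # hypothetical coefficients, for illustration only
    for x in (0, 2, 5, 10):
        z = linear_predictor(x, b0, b1)
        # z is unbounded; logistic(z) always stays between 0 and 1.
        print(x, round(z, 2), round(logistic(z), 4))
```

Using the linear predictor directly as a probability would give values below 0 or above 1 for some inputs, which is one common answer to the "why not the linear model" question.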

Generating Nice Looking Tree Diagrams in R

May 31, 2013
This function generates nice-looking tree diagrams (see sample below) from tree objects (generated by the tree package). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option)

Using the rasterVis package for raster plotting (in R)

May 31, 2013
Here is a post discussing the possibilities of the rasterVis package: http://rpubs.com/Lionel/6374

Snowfall

May 31, 2013
Yesterday I had a short post reminding EViews users that their package (versions 7 or 8) will access all of the cores on a multi-core machine. I've been playing around with parallel processing in R on my desktop machine at work over the last few days. It's something I've been meaning to do...

The arteries of the world, in Tweets

May 31, 2013
What happens when you plot billions of geotagged Tweets on a map? You can see the arteries of the world. Here's Europe: According to creator Miguel Rios (Engineering Manager, Data Visualization at Twitter), the dots on this chart represent every geotagged Tweet since 2009. The color represents the number of tweets in the region, and the intensity shows where people...

Are parallel computations worth it?

May 31, 2013
Yesterday, Daniel Marcelino published an interesting post on his blog, entitled "Parallel Processing: When does it worth?". I was asking myself the same question for a chapter I am currently writing. I liked his approach, so I tried to do the same on my computer. I used three packages to run parallel R code, >...

Visualizing a One-Way ANOVA using D3.js

May 31, 2013
A while ago I was playing around with the JavaScript package D3.js, and I began with this visualization—that I never really finished—of how a one-way ANOVA is calculated. I wanted to make the visualization interactive, and I did integrate some interactive elements. For instance, if you hover over a data point it will show the residual, and its value will be highlighted in...

Regression regularization example

May 31, 2013
Recently I needed a simple example showing when applying regularization in regression is worthwhile. Here is the code I came up with (along with a basic use of parallel code execution). Assume you have 60 observations and 50 expla...

accurate ABC: comments by Oliver Ratman [guest post]

May 31, 2013
Here are comments by Olli following my post: I think we found a general means to obtain accurate ABC in the sense of matching the posterior mean or MAP exactly, and then minimising the KL distance between the true posterior and its ABC approximation subject to this condition. The construction works on an auxiliary probability

Version control, gitbucket and SourceTree style

May 31, 2013
Last time I wrote about version control using Subversion (and its implementation in Eclipse). I still haven’t given up on it, but since I’m using a private repository, sharing code has been a bit tedious. I was introduced to git a while ago, but somehow decided to go for Subversion. A few days ago a

“How to draw the line” with ggplot2

May 30, 2013
In a recent tutorial in the eLife journal, Huang, Rattner, Liu & Nathans suggested that researchers who draw scatterplots should start providing not one but three regression lines. I quote, Plotting both regression lines gives a fuller picture of the data, and comparing their slopes provides a simple graphical assessment of the correlation coefficient. Plotting

my 1st post for the Guardian Australia

May 30, 2013
I’ll be contributing a piece about once a week for the Guardian Australia, under a part of the web site we’re calling The Swing. The set of graphs from my 1st effort were rendered in-line and rather low-res. Bigger, full res versions appear below; click on the in-line versions. It would be great to find

Uncovering the Unreliable Friend Distribution

May 30, 2013
Head down to your local hardware store and pick up a smoke detector. Pop off the cover and look inside. You’ll see a label that mentions Americium 241, a radioactive isotope. Put on your HEV suit, grab a pair of tweezers and a fine-tipped pen, and remove the 0.3 millionths of a gram of Americium.

Scenario analysis for option strategies.

May 30, 2013
This is an introduction to the project I am working on at the moment. It is more a playground for option strategies than a "project" at this stage. The idea is to develop accurate scenario analysis for a portfolio based on a single underlying. In the past I ...

If…then in Japan

May 30, 2013
If Japan starts to spiral out of control, then what do they do? A spiral would be a sudden move higher in JGB rates with a simultaneous crash in the Japanese Yen. Their response would be to try to slow the positive feedback loop through external interv...

Ryan Sheftel: "R on the Trading Desk"

May 30, 2013
by Joseph Rickert. In a post last week, I offered some first impressions about R/Finance 2013. Apparently, I was way off in estimating that 30% of the attendees were academics. The R/Finance organizers were quick to point out that the percentage of academics attending the conference has been a constant 10% over the years, and this year was no different....

Are the Current Criteria for Empirically Supported Treatments Too Lenient?

May 30, 2013
The practice of classifying treatments as empirically supported (ESTs) has been widely debated for a long time. Recently Jessica Nasser published an article in the Journal of Contemporary Psychotherapy titled “Empirically Supported Treatments and Efficacy Trials: What Steps Do We Still Need to Take?” In the article the author raises several concerns and suggestions regarding the current use of...

Couch, apis and all that

May 30, 2013
It is getting easier to get data directly into R from the web. Often R packages that retrieve data from the web return useful R data structures like a data.frame or plot. This is a good thing of course to make things user friendly. However, what if you want to drill down into the data that's returned from a query...

Are Fox News Polls Biased?

May 29, 2013
Especially after the outcome of the mid-term election, I think there is a common contention among some groups that there is something wonky about Fox News when it comes to reporting polls relative to President Obama and the Democratic Party in general....

Parallel Processing: When does it worth?

May 29, 2013
Most computers nowadays have a few cores that greatly help us with our daily computing tasks. However, when does statistical software use parallelization to analyze data faster? R, my preferred analytical package, does not take much advantage of multicore processing by default. In fact, R has been inherently a “single-processor” package until recently. Stata, another

Two forthcoming R books

May 29, 2013
Today I learned about two forthcoming R books that I'm now looking forward to. The first is Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Max Kuhn is the author of the caret package, an extremely useful and powerful R package for fitting and optimizing all kinds of predictive models in R. It's available now on Amazon Kindle...

Shiny talk by Joe Cheng

May 29, 2013
Shiny is a framework for creating web applications with R. Joe Cheng of RStudio, Inc. presented on Shiny last evening in Zillow's offices 30 stories up in the former WaMu Center. Luckily, the talk was interesting enough to compete with the view ...

Look Familiar? Mapping in R

May 29, 2013
For those who have been following R-Bloggers, this picture should be yesterday's news, but there was an article on the BBC about it; full story HERE. I find it interesting how it was on Drudge Report (www.drudgereport.com) which gets over 1 million hits...

Futile.logger 1.3.3 RC available

May 29, 2013
This is a preview release of futile.logger 1.3.3 so some people with feature requests can try out the code before …

Convert IP addresses to geolocation, latitude and longitude etc etc

May 29, 2013
Whoa! Now this is cool. It turns out there is a database at freegeoip.net which you can query for the location of a particular IP address. And as it has a neat little API for batches of IPs, you can …

Data sonification with R: the sound of Twitter data

May 29, 2013
What does a tweet sound like? Not the kind that flies around in the air, but the kind that zips to and from our mobile devices. I’m intensely interested in finding ways to make sense of data. Sonification of data – representing data with sound – offers one way to do that. This post steps

SAS Dominates Analytics Job Market; R up 42%

May 29, 2013
I’m continuing to gather and analyze data to update The Popularity of Data Analysis Software. In this installment I cover the latest employment figures. Employment is important to us all, so what software skills are employers seeking? A thorough answer …

Analysis of Cable Morning Trade Strategy

May 29, 2013
A couple of years ago I implemented an automated trading algorithm for a strategy called the “Cable Morning Trade”. The basis of the strategy is the range of GBPUSD during the interval 05:00 to 09:00 London time. Two buy stop orders are placed 5 points above the highest high for this period; two sell stop
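
A minimal sketch of the buy-side level described in the excerpt, written in Python for illustration. The bar format (time, high), the inclusive treatment of the 09:00 boundary, and the point size of 0.0001 are assumptions that the excerpt does not state. The sell-side rule is cut off in the excerpt and is not reconstructed here.

```python
from datetime import time

POINT = 0.0001  # assumption: one "point" in GBPUSD equals 0.0001


def buy_stop_level(bars, start=time(5, 0), end=time(9, 0),
                   offset_points=5, point=POINT):
    """Return the buy stop price: highest high in the window plus an offset.

    `bars` is a hypothetical list of (london_time, high) tuples.
    The window boundaries are treated as inclusive (an assumption).
    """
    highs = [high for (t, high) in bars if start <= t <= end]
    if not highs:
        raise ValueError("no bars inside the 05:00 to 09:00 window")
    return max(highs) + offset_points * point


if __name__ == "__main__":
    # Made-up prices, only to show how the function is called.
    sample = [
        (time(4, 0), 1.5300),   # before the window, ignored
        (time(5, 0), 1.5210),
        (time(7, 30), 1.5250),  # highest high inside the window
        (time(9, 0), 1.5240),
        (time(10, 0), 1.5400),  # after the window, ignored
    ]
    print(buy_stop_level(sample))
```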

Stepping up to Big Data with R and Python: A Mind Map of All the Packages You Will Ever Need

May 29, 2013
On May 8, we kicked off the transformation of R Users DC to Statistical Programming DC (SPDC) with a meetup at iStrategyLabs in Dupont Circle. The meetup, titled “Stepping up to big data with R and Python,” was an …