Experiences with using SAS and R in insurance and banking

August 23, 2011

Hong Ooi talks about some of the more interesting projects that he has used R for in the last year. These include fitting models for mortgage loss given default, a Monte Carlo application for stress-testing loan portfolios (in combination with Excel an...


A warning on the R save format

August 23, 2011

The save() function in the R platform for statistical computing is very convenient and I suspect many of us use it a lot. But I was recently bitten by a “feature” of the format which meant I could not recover my data. I recommend that you save data in a data format (e.g. CSV or CDF), not using...


Maiden voyage

August 23, 2011

Who Me. I'm an associate professor of Statistics at Youngstown State University in Youngstown, Ohio, USA. I've been using R for about 7 years, Emacs about 3 years, git about 1 year, and Org-Mode for less than a year. What I want this blo...


Subjugation to the Sigmas

August 23, 2011

No doubt you've heard about the tyranny of the 9s in reference to computer system availability. You're probably also familiar with the phrase six sigma, either in the context of manufacturing process quality control or the improvement of business processes. As we discovered in the recent Guerrilla Data Analysis Techniques class, the two concepts are related.


Popular topics at the BioStar Q&A site

August 23, 2011

Which topics are the most popular at the BioStar bioinformatics Q&A site? One source of data is the tags used for questions. Tags are somewhat arbitrary of course, but fortunately BioStar has quite an active community, so “bad” tags are usually edited to improve them. Hint: if your question is “How to find SNPs”, then


Drawdown Visualization

August 22, 2011

Drawdown is my favorite measure of risk.  It picks up extended autocorrelated pain often not seen in risk measures, and best illustrates frustration, panic, and loss of confidence (Drawdown Control Can Also Determine Ending Wealth).  I though...


More useR! 2011 roundups

August 22, 2011

If you missed last week's worldwide R user conference at the University of Warwick, several attendees have posted informative roundups of the event. Check out these posts from Patrick Burns, Karl Broman, Colin Gillespie, Pairach Piboonrungroj and Richie Cotton (which features a rare, good Statistics joke). My own roundup of the conference was posted on Friday, in case you...


Webinar Wednesday Aug 24: Revolution R Enterprise, 100% R and More

August 22, 2011

A heads-up that I'll be giving a free webinar this Wednesday, August 24. In 30 minutes, I'll give an overview of the open-source R project and the additional features of Revolution R Enterprise: R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution...


Bayesian analysis: Comparing algorithms Part 1?

August 22, 2011

I recently had the opportunity to engage in some Bayesian analysis at work. I was able to state the problem in terms of the lognormal distribution, and took advantage of JAGS and its integration with "R" using the R2jags package. The client was very ha...


Tenure track position in systematics at the University of Vermont

August 22, 2011

There is an awesome position opening up for an assistant professor in systematics at the University of Vermont. Below is the announcement, and see the original post at the Distributed Ecology blog. Why is this related to R? One can do a lot of systemat...


RTextTools v1.3 Released + Rstem Now Available on CRAN

RTextTools v1.3 was released on August 21, and the package binaries are now available on CRAN. This update fixes a major bug with the stemmers, and it is highly recommended you upgrade to the latest version. Other changes include optimization of existing functions and improvements to the documentation. Additionally, Duncan Temple Lang has graciously released Rstem on CRAN, meaning that the...


SIGKDD 2011 Conference — Day 1 (Graph Mining and David Blei/Topic Models)

August 22, 2011

I have been waiting for the KDD conference to come to California, and I was ecstatic to see it held in San Diego this year. AdMeld did an awesome job displaying KDD ads on the sites that I visit, sometimes multiple times per page. That’s good targeting!

Mining and Learning on Graphs Workshop 2011

I had originally planned to attend the...


R Code for Bow Tie Plots

August 22, 2011

Earlier, I discussed the nice properties of bow tie plots for visualizing and understanding inferences from simple randomized treatment experimental designs. R code to quickly create these plots is available here. You can use the command source("htt...


Last and final on Richter’s painting

August 22, 2011

For a quick recap, Pierre and I supervised a team project at Ensae last year, on a statistical critique of the abstract painting 1024 Colours by painter Gerhard Richter. The four students, Clémence Bonniot, Anne Degrave, Guillaume Roussellet and Astrid Tricaud, did an outstanding job. Here is a selection of graphs and results they produced.


Recession forecasting II: Assessing Hussman’s Accuracy

August 22, 2011

In my last post on recessions, I implemented John Hussman's Recession Warning Composite in R. In this post I will examine how well this index performs and discuss how we might improve it. If you would like to follow along at home, be sure to run the ...


More useless statistics

August 22, 2011

Over at the ExploringDataBlog, Ron Pearson just wrote a post about the cases when means are useless. In fact, it’s possible to calculate a whole load of stats on your data and still not really understand it. The canonical dataset for demonstrating this (spoiler alert: if you are doing an intro to stats course, you


A view of useR!2011

August 22, 2011

Start

Brian Ripley

The conference was opened with a talk by Brian Ripley. I’ll distort his talk into 3 points that came across to me.

1. R Core is finite

The time available from R Core members is a strictly limited good. The more that is pushed onto R Core, the less attention to details. …


The performance cost of a for-loop, and some alternatives

August 21, 2011

I’ve recently been spending a lot of time running various simulations in R. Because I often use snow to perform simulations across several computers/cores, results typically come back in the form of a list object. Summarizing the results from a list …


tty Connection + sas7bdat: useR! 2011 Presentation Slides

August 21, 2011

Experimenting with a tty Connection for R

I presented twice at this year's useR!. The first was a regular talk on the tty connection patch for R. The talk went smoothly, despite a live demonstration using the DLP-232PC data acquisition module (datasheet). The slides for this presentation are here: shotwell-tty-useR-2011.pdf. The image above is a


Prime testing function in R

August 20, 2011

I was hoping to begin tinkering a bit with the multicore package in R beyond some extremely trivial examples.  Thanks to a combination of R’s dumb quirkiness (for example, being worthless on loops), my poor planning, and general bad programming, my Saturday afternoon tinkering project is ultimately worthless in fulfilling that purpose. I was really


useR! Conference 2011 highlights

August 20, 2011

I was at the useR! Conference at The University of Warwick in Coventry, UK, last week. My goal in going was to learn the latest things regarding (simple) dynamic graphics, (simple) web-based apps, parallel computing, and memory management (dealing with big data sets). I got just what I was hoping for and more. There are


Statistical Analysis Functions in R

August 20, 2011

Lately, I've been using statistical tests on a daily basis. I've noticed that I have to format my data the same way in order to get it into R (tab-delimited flat file essentially). Every other change in order to prep that data structure for any sort of...


When are averages useless?

Of all possible single-number characterizations of a data sequence, the average is probably the best known.  It is also easy to compute and in favorable cases, it provides a useful characterization of “the typical value” of a sequence of numbers.  It is not the only such “typical value,” however, nor is it always the most useful one: two other...


Reading & plotting phylogenies

August 20, 2011

This is a basic procedure, but it could come in handy. I have been reading and doing basic manipulations with phylogenetic trees a lot lately, so here is a chunk of code for this.

> library (ape)   # ape is a …


Statistical construction error

August 20, 2011

Yes, the title is meant to have two readings.

The effect

The Numbers Guy, among other examples, talks about the UK Office for National Statistics needing to revise its estimate for the construction sector output because of an error.

Original: 2.3% growth
Corrected: 0.5% growth

Here is the Telegraph article cited by The Numbers Guy. …


useR!2011

August 19, 2011

useR!2011 ended yesterday. First of all, many thanks to the organizers, who managed to run a conference with 400+ participants from 41 countries smoothly. Thumbs up! It was great to meet some people from the R blog-O-sphere in person, like …

