New Powerball (lottery) Rules Will Cost You More

December 16, 2011

Popular news outlets are reporting that the Multi-State Lottery Commission (MUSL) will change the rules for their lottery game Powerball, effective Jan. 15, 2012. I sent an email to the MUSL (at 8:00 am, Dec. 14th) asking for the new official rules, but haven't received a response yet (as of 10:30 am, Dec. 16th). Hence, these

Optimal regularization for smoothing splines

December 16, 2011

In the smooth.spline procedure one can use the df or spar parameter to control the smoothing level. Usually they are not set manually, but recently I was asked which one of them is a better measure of regularizatio...

SVN Version Control, R, and some rambling thoughts on AWS, Rscripts

December 16, 2011

I do a lot of my modelling on RStudio hosted on EC2 instances. If you don’t use it, I would highly recommend it. A brilliant tool. Kudos to the RStudio team. I have made a personal and professional pledge to obsessively use version control. I hope to show...

CrossValidated: A place to post your statistics questions

December 16, 2011

Seth Rogers writes: I am a member of an online community of statisticians where I burn a great deal of time (and a recovering cog sci researcher). Our community website is a peer-reviewed Q and A spanning stats topics ranging from applications to mathematical theory. Our online community consists of mostly university faculty, grad The post CrossValidated:...

Psycho dice and Monte Carlo

December 16, 2011

Following Pierre’s post on psycho dice, I want to see here by what average margin repeated plays might be said to be influenced by the will of the mind. The rules are the following (excerpt from the novel Midnight in the Garden of Good and Evil, by John Berendt): You take four dice and call out four numbers between one

The Bay Area R User Group Meeting on Data Mining with R

December 16, 2011

By Joseph Rickert. Put up a poster that says something like “Data Mining with R” anywhere in the Bay Area and you will surely draw a crowd. But it was still a bit of a surprise that the monthly meeting of the Bay Area R User’s group was so well attended. At one point there were 160 people on...

Backtesting Rebalancing methods

December 15, 2011

I wrote about Rebalancing in the Asset Allocation Process Summary post. Deciding how and when to rebalance (update the portfolio to the target mix) is one of the critical steps in the Asset Allocation Process. I want to study the portfolio performance and turnover for the following Rebalancing methods: Periodic Rebalancing: rebalance to the target

Update your Windows PATH – revisited

December 15, 2011

Yihui got me psyched a little about GitHub. After my last post about running your R infrastructure from a USB drive, he commented on my function that would update the Windows PATH (which is at least important for R and Rtools). Now I found some time to polish it a little. Feel free to test … Continue reading...

EMC survey differentiates BI and Data Science

December 15, 2011

EMC last week published the results of a survey of 462 IT decision makers who self-identified as either a data scientist or business intelligence professional (plus 35 invitees who were attendees at the EMC Data Scientist Summit and/or Kaggle competitors). There's a nice summary of the conclusions at the EMC blog (where data scientists are described as "The New...

Bayesian inference and the parametric bootstrap

December 15, 2011

This paper by Brad Efron came to my attention when I was looking for references on Bayesian bootstrap to answer a Cross Validated question. After reading it more thoroughly, “Bayesian inference and the parametric bootstrap” puzzles me, which most certainly means I have missed the main point. Indeed, the paper relies on parametric bootstrap—a frequentist

Query a MySQL Database from R using RMySQL

December 15, 2011

I use this all the time, and the setup is dead simple. Follow the code below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to ...

With Size, Does Risk–>Return?

December 15, 2011

A basic tenet in finance is that higher risk should lead to higher return as the time horizon stretches to infinity.  However, in bonds, higher risk has not meant higher return with either credit risk (high-yield) or long duration risk (maturity &...

RMySQL: using the latest MySQL version

December 15, 2011

In order to connect my R-related stuff to a webserver and MySQL I went all the way from using xampp to setting up my own (W)AMPP (Apache MySQL PHP Perl) and finally back to xampp. And I’m quite happy about this very last switch to xampp! Why I like xampp: it can be run … Continue reading...

Conversion of Several Variables to Factors

December 15, 2011

...often needed when preparing data for analysis (and usually forgotten until I need it the next time). With the code below I convert a set of variables to factors; there may be slicker ways to do it (if you know one, let me know!) > da...

R / Finance 2012 Call for Papers

December 15, 2011

Last night, the text below went out to r-sig-finance along with updates to the R/Finance website and its Call for Papers page, followed by some tweeting and Google+'ing (and please do feel free to retweet and share at will...) Call for Papers: ...

Volatility estimation and time-adjusted returns

December 15, 2011

Do non-trading days explain the mystery of volatility estimation? Previously: The post “The volatility mystery continues” showed that volatility estimated with daily data tends to be larger (in recent years) than when estimated with lower frequency returns. Time adjusting: One of the comments, from Joseph Wilson, was that there is a problem with … Continue reading...

R pitfall #3: friggin’ factors

December 15, 2011

I received an email from one of my students expressing deep frustration with a seemingly simple problem. He had a factor containing names of potato lines and wanted to set some levels to NA. Using simple letters as example names … Continue reading →

R/Finance 2012 Call for Papers

December 15, 2011

I'm excited to share the call for papers for the upcoming R/Finance conference. Even if you don't submit a presentation, I hope to see you there! Call for Papers: R/Finance 2012: Applied Finance with R, May 11 and 12, 2012, University of Illinois, Chic...

More orthodox ARMA/GARCH trading

December 14, 2011

The system described in the earlier series for ARMA trading was in fact an “extreme” version of the more common, orthodox approach prevailing in the literature. Recently I tried using R to reproduce the results of a particular paper, and that led to a lot of new developments … How is ARMA trading typically simulated?

CalendaR 2012 with ggplot2

December 14, 2011

Season’s Greetings! Hi, dear R-bloggers and readers. Here in Japan it’s very cold now. The end

CloudStat’s Infographic with…

December 14, 2011

CloudStat’s Infographic with Piktochart: http://visual.ly/cloudstat-infographic CloudStat is a cloud-based statistical platform that allows users to analyze data in the cloud anywhere, any time, across multiple platforms while connecting with the exp...

List of public databases from the Washington Post

December 14, 2011

The Washington Post has put together an excellent list of publicly-accessible databases, ideal for analysis using the R language. You'll find databases from the domains of Census and Demographics (for example, the American Community Survey), Crime and Courts (like this list of federal prosecutions), Transportation and Development (e.g. older driver crash statistics); Health and Safety; Real Estate and Business;...

Revolution Newsletter: December 2011

December 14, 2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full December edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Applications of R in Business Contest. A judging panel of industry experts and R...

… And now for solution 17, still using Rcpp

December 14, 2011

Here comes yet another sequel to the code optimization problem from the R wiki, still using Rcpp, but with a different strategy this time. Essentially, my previous version (15) was using stringstream although we don't really need its functionality ...

R / Finance 2012 Call for Papers

December 13, 2011

Last night, the text below went out to r-sig-finance along with updates to the R/Finance website and its Call for Papers page, followed by some tweeting and Google+'ing (and please do feel free to retweet and share at will...) Call for Papers: ...

RcppArmadillo 0.2.34

December 13, 2011

Another quick bugfix release by Conrad Sanderson brought Armadillo to version 2.4.2. This is included in RcppArmadillo release 0.2.34, which got to CRAN this morning. The NEWS entry below summarises the changes. 0.2.34 2011-12-12 o Upgr...

Maximum Covariance Analysis (MCA)

December 13, 2011

Maximum Covariance Analysis (MCA) (Mode 1; scaled) of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S. MCA coefficients (scaled) are below. The mode represents 94% of the squared covariance fraction (SCF). Maximum Correlation Analysis...

Data is the new gold

December 13, 2011

We need more data journalism. How else will we find the nuggets of data and information worth reading? Life should become easier for data journalists, as the Guardian, one of the data journalism pioneers, points out in this article about the new open ...

Unshorten (almost) any URL with R

December 13, 2011

Introduction: I was asked by a friend how to find the full final address of a URL which had been shortened via a shortening service (e.g., Twitter’s t.co, Google’s goo.gl, Facebook’s fb.me, dft.ba, bit.ly, TinyURL, tr.im, Ow.ly, etc.). I replied I had no idea and maybe he should have a look over on StackOverflow.com or, possibly,
