More on higher moments: rolling skewness of S&P 500 daily returns

October 15, 2011

In this post, Portfolio Probe explores a way to decide whether market kurtosis and skewness are predictable. Market skewness, in naive financial modeling, measures how asymmetrically (daily) returns are distributed around the average market return. A higher skewness would tend to indicate a denser distribution of higher returns, compared to lower

Read more »

Once you’re comfortable with 2-arrays and 2-matrices, you…

October 15, 2011

Once you’re comfortable with 2-arrays and 2-matrices, you can move up a dimension or two, to 4-arrays or 4-tensors. You can move up to a 3-array / 3-tensor just by imagining a matrix which “extends back into the blackboard”. Like a 5 × 5 ma...

Read more »

Principal component analysis : Use extended to Financial economics : Part 1

October 15, 2011

While working on my financial economics project I came across an elegant tool called principal component analysis (PCA), which is extremely powerful when it comes to reducing the dimensionality of a data set comprising highly correlated var...

Read more »

Random art on the web

October 15, 2011

Since we explored some statistics of an abstract painting with Pierre (we even have an article in the last issue of Variances!), I became more sensitive to art linked to randomness. Here are some pointers to related websites I have dug up. Random.org, mentioned here by Pierre, is, as it reads, a true random number service that

Read more »

Free auditing of Stanford AI and Machine Learning Courses w/Peter Norvig

October 14, 2011

Just wanted to notify viewers of a few great courses that are being offered free for auditing and/or participation by well known industry experts, including co-author of the classic text on AI, 'Artificial Intelligence: A Modern Approach,' Peter Norvig...

Read more »

Maximum Loss and Mean-Absolute Deviation risk measures

October 14, 2011

When constructing a typical efficient frontier, risk is usually measured by the standard deviation of the portfolio’s return. Maximum Loss and Mean-Absolute Deviation are alternative measures of risk that I will use to construct an efficient frontier. I will use methods presented in Comparative Analysis of Linear Portfolio Rebalancing Strategies: An Application to Hedge Funds by

Read more »

Trading Mean Reversion with Augen Spikes

October 14, 2011

One of the more interesting things I have come across is the idea of looking at price changes in terms of recent standard deviation, a concept put forward by Jeff Augen. The gist is to express a close to close return as a function of the standard devia...

Read more »

New food web dataset

October 14, 2011

So, there is a new food web dataset out that was put in Ecological Archives here, and I thought I would play with it. The food web is from Otago Harbour, an intertidal mudflat ecosystem in New Zealand. The web contains 180 nodes, with 1,924 links. Fu...

Read more »

Implementing K-means clustering for Hadoop in R and Java

October 14, 2011

At the Bay Area R User Group meeting this week, Antonio Piccolboni gave an overview of the design goals and implementation of the RHadoop Project packages that connect Hadoop and R: rhdfs, rhbase and rmr. (The image above was captured from Antonio's slides.) The most revealing part of the talk for me was the comparison of implementing the K-means...

Read more »

Tomorrow: ACM Data Mining Camp at eBay

October 14, 2011

If you're in the Bay Area, tomorrow would be a great day to head down to San José for the ACM Data Mining Camp. Hundreds of data scientists, data hackers and data miners will be there for a fun "unconference", with talks and practical sessions organized on the spot according to demand. Revolution Analytics is proud to be a...

Read more »

Mining Lending Club’s Goldmine of Loan Data Part I of II – Visualizations by State

October 14, 2011

I have a few friends that keep bragging about their 14% annual returns by investing their money with Lending Club, a peer-to-peer lending service that cuts out the complexities and difficulties of getting approved for a loan through a bank. To give you an idea of the sheer amount of volume Lending Club has been

Read more »

Another Mystery: sas7bdat != sd2

October 14, 2011

I received an email from a very inconvenienced statistician a few weeks ago. The problem was an old data file with the extension .sd2. Apparently, this is an obsolete data storage format used by past versions of SAS. A quick glance at the file contents revealed that this sd2 formatted file is incompatible with the

Read more »

principles of uncertainty

October 13, 2011

“Bayes Theorem is a simple consequence of the axioms of probability, and is therefore accepted by all as valid. However, some who challenge the use of personal probability reject certain applications of Bayes Theorem.” J. Kadane, p.44. Principles of uncertainty by Joseph (“Jay”) Kadane (Carnegie Mellon University, Pittsburgh) is a profound and mesmerising book on

Read more »

plyr, ggplot2 and triathlon results, part II

October 13, 2011

I ended my previous post by mentioning how one could imagine other ways of looking at the triathlon data with plyr and ggplot2. I couldn’t help but carry on playing with it so here are more stats and graphs from …

Read more »

System in 10 Minutes After Twitter

October 13, 2011

On Twitter last night, I spotted @milktrader from www.algorithmzoo.com doing some range research on equity indexes. I offered a tweet on the crazy Russell 2000 17% move over 7 days. Within 10 minutes, I discovered a signal that worked very ...

Read more »

Maximum likelihood

October 13, 2011

This post is one of those ‘explain to myself how things work’ documents, which are not necessarily completely correct but are close enough to facilitate understanding. Background: Let’s assume that we are working with a fairly simple linear model, where …

Read more »

There’s a lot to like about R

October 13, 2011

I once heard John Chambers (the inventor of the S language, and member of the R Core Group) say, "Show me a programming language no-one complains about, and I'll show you a language no-one uses". The R language has its fair share of complainants, to be sure -- and that's to be expected for a language with more than...

Read more »

Waiting in line, waiting on R

October 13, 2011

I should state right away that I know almost nothing about queuing theory. That’s one of the reasons I wanted to do some queuing simulations. Another reason: when I’m waiting in line at the bank, I tend to do mental calculations for how long it should take me to get served. I look at the

Read more »

Example 9.9: Simplifying R using the mosaic package (part 1)

October 13, 2011

While both SAS and R are powerful systems for statistical analysis, they can be frustrating to new users or those learning statistics for the first time. The mosaic package is designed to help simplify the interface for such new users, while allowing ...

Read more »

Phylogenetic community structure: PGLMMs

October 13, 2011

So, I've blogged about this topic before, way back on 5 Jan this year. Matt Helmus, a postdoc in the Wootton lab at the University of Chicago, published a paper with Anthony Ives in Ecological Monographs this year (abstract here). The paper addres...

Read more »

Modelling with R: part 4

October 13, 2011

In part 3, we ran a logistic model to determine the probability of default of a customer. We also saw the outputs and tried to judge the performance of the model by plotting the ROC curve. Let's try a different approach today. How about a decision tree?...

Read more »

Introduction to Asset Allocation

October 12, 2011

This is the first post in the series about Asset Allocation, Risk Measures, and Portfolio Construction. I will use simple and naive historical input assumptions for illustration purposes across all posts. In this series I plan to discuss: Maximum Loss, MAD, CVaR, CDaR, and Omega risk measures; 130:30 long/short portfolios and cardinality constraints; Arithmetic and Geometric

Read more »

S&P 500 components heatmap in R

October 12, 2011

In this article, Hans Gilde presents a clever use of a heatmap hidden in the Bioconductor library. In his example, he describes a way to show different ‘observations’ on subjects over time. Financial indices, like the S&P 500 or the Dow Jones indices, are mathematically some kind of measure of overall market

Read more »

A true data-doodler – Christophe Ladroue (R ddply and plyr on Triathlon Results)

October 12, 2011

To me, this post by Christophe Ladroue personifies what data doodlers do. They take a dataset that is of interest to them (in his case, his triathlon results) and then they manipulate the numbers to see what insights can be drawn. Most bloggers only sho...

Read more »

Typos in Introduction to Monte Carlo Methods with R

October 12, 2011

The two translators of our book into Japanese, Kazue & Motohiro Ishida, contacted me about some R code mistakes in the book. The translation is nearly done and they checked every piece of code in the book, an endeavour for which I am very grateful! Here are the two issues they have noticed (after incorporating

Read more »

Bay Area R Users group has 1300 members

October 12, 2011

Impressive. You are not alone!

Read more »

Percentage of Organic Farming Operations by State

October 12, 2011

Using data from the USDA on certified organic farms for 2008, I created a map with the Geo Map function from the googleVis API package available in R. I’ve copied and pasted the image below as WordPress.com sites don’t support …

Read more »

Slides and replay for "Introduction to R for SAS and SPSS users"

October 12, 2011

If you missed last week's webinar from Bob Muenchen, "Introduction to R for SAS and SPSS users", you missed a great overview of the R Project and how it compares to commercial statistical software. Bob's slides are below, and you can download the slides and replay from the Revolution Analytics website. Bob pointed out a couple of really useful...

Read more »