In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with the solution and decided to show this numerically. It can be done in four steps: identifying four points lying...
Recently I stumbled on the help description of the randu data from the datasets package. It contains pseudorandom numbers that are flawed. The help says that "In three dimensional displays it is evident that the triples fall on 15 paralle...
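A quick numerical check of the 15-plane claim can be sketched as follows. This is not the four-step procedure the post describes (the excerpt is cut off before the steps are listed). It is a shortcut that assumes the columns x, y, z of randu are successive RANDU outputs scaled to (0, 1), so that the known RANDU recurrence makes 9x - 6y + z an integer up to the rounding of the stored values. The variable names are illustrative.

```r
# Assumption: randu's x, y, z are successive RANDU outputs scaled to (0, 1),
# so 9*x - 6*y + z should be (nearly) an integer for every row.
plane_value <- with(datasets::randu, 9 * x - 6 * y + z)

# Nearest integer = index of the plane the triple lies on
plane_index <- round(plane_value)

# Largest distance from an integer; small because the data are stored rounded
max_deviation <- max(abs(plane_value - plane_index))

# Number of distinct planes actually hit by the 400 triples
n_planes <- length(unique(plane_index))

print(max_deviation)
print(table(plane_index))
```

Since x, y and z all lie strictly between 0 and 1, 9x - 6y + z lies strictly between -6 and 10, which leaves room for at most 15 integer values and hence at most 15 planes.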
THIS IS NOT INVESTMENT ADVICE AND WILL PROBABLY WIPE OUT ALL YOUR MONEY IF PURSUED. While exploring utilities, I discovered a strange phenomenon that I have not quite thoroughly understood, but I attribute to the business cycle. If I dust o...
In yesterday's webinar, "New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis", Sue Ranney demonstrated the features of the RevoScaleR big data analysis package included with Revolution R Enterprise. In the webinar, she showed how to use the rxImport function to import big data sets from SAS, SPSS ...
I gave a talk today on doing very basic phylogenetics in R, including getting sequence data, aligning sequence data, plotting trees, doing trait evolution stuff, etc. Please comment if you have code for doing Bayesian phylogenetic inference in R. ...
The below is taken from a work in progress: The Polya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified format, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are ...
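The idea of drawing balloons with probability proportional to volume can be sketched as below. The excerpt is cut off before the scheme is spelled out, so this follows the standard Blackwell-MacQueen form of the urn as an assumption: one special balloon of volume alpha stands for "a new colour", and every draw inflates the drawn colour by one unit of volume. The function name polya_urn and the parameter alpha are hypothetical.

```r
# Hypothetical helper: simulate n draws from a Polya urn of "balloons".
# Assumption (not from the excerpt): volumes[1] is a special balloon of
# volume alpha that yields a new colour; volumes[k + 1] is colour k's volume.
polya_urn <- function(n, alpha = 1) {
  labels <- integer(n)   # colour label assigned to each draw
  volumes <- alpha       # start with only the "new colour" balloon
  n_colours <- 0L
  for (i in seq_len(n)) {
    # Draw one balloon with probability proportional to its volume
    pick <- sample.int(length(volumes), size = 1, prob = volumes)
    if (pick == 1L) {
      # New colour: add a balloon of unit volume
      n_colours <- n_colours + 1L
      volumes <- c(volumes, 1)
      labels[i] <- n_colours
    } else {
      # Existing colour: inflate its balloon by one unit
      volumes[pick] <- volumes[pick] + 1
      labels[i] <- pick - 1L
    }
  }
  labels
}

set.seed(1)
draws <- polya_urn(100, alpha = 2)
print(table(draws))
```

Larger values of alpha make the "new colour" balloon more likely to be drawn, so more distinct colours appear among the same number of draws.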
Converting HTML to plain text usually involves stripping out the HTML tags whilst preserving the most basic of formatting. I wrote a function to do this which works as follows (code can be found on github): The above uses an XPath approach to achieve its goal. Another approach would ...
Of course, a few days before I leave for a much needed vacation, USA Today released their updated NCAA coaching salary database. For sports junkies, there’s an unlimited number of analyses and visualizations that can be done on the data. I took a quick break from packing to condense ...
During the final stage of the asset allocation process we have to decide how to implement our desired allocation. In many cases we will allocate capital to the mutual fund managers who will invest money according to their fund’s mandate. Usually there is no perfect relationship between asset classes and ...
The setup: Dan Meyer, a (former?) math teacher with some extraordinary ideas, has a nifty concept for teaching expected values: “So one month before our formal discussion of expected value, I’d print out this image, tack a spinner to it, …
The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full November edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. R Training from Hadley Wickham: The ...
Gene Expression Omnibus is NCBI's repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data direc...
Artificial neural networks are commonly thought to be used just for classification because of the relationship to logistic regression: neural networks typically use a logistic activation function and output values from 0 to 1 like logistic regression. However, the worth …
I don't know, of course, because the evidence at hand is based on my experience. But, I'll leave the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than the frequentist confidence intervals, because the Bayesian inference is a ...
When looking for functions whose exact name is unknown:
# Functions related to "shrinkage" methods
help.search("shrinkage")
Package sos does a great job in finding functions:
install.packages("sos")
library(sos)
shrinkageResults ...
Reading data into R when dealing with column types and values that need to be considered as NA. Below are code snippets to introduce a few arguments of the read.csv function in R. # Create sample data strVals ...
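The original snippets are cut off in this excerpt, so the following is only a minimal sketch of the kind of call being described, using made-up sample data. It shows two read.csv arguments: colClasses to fix the type of each column, and na.strings to list the values that should be read as NA. The column names and the sentinel values "N/A" and "-999" are assumptions chosen for illustration.

```r
# Made-up sample data: "N/A", "-999" and the empty field stand for missing values
csv_text <- "id,score,grade
1,10.5,A
2,N/A,B
3,-999,
4,7.25,C"

df <- read.csv(
  text = csv_text,                                    # read from a string instead of a file
  colClasses = c("integer", "numeric", "character"),  # one type per column, in order
  na.strings = c("N/A", "-999", "")                   # values to treat as NA
)

str(df)
```

Declaring score as numeric while listing its sentinel values in na.strings keeps the column numeric, where it would otherwise be read as character because of the "N/A" entry.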
A reminder that Sue Ranney will be presenting the webinar New Features in Revolution R Enterprise 5.0 (Including RevoScaleR) to support Scalable Data Analysis tomorrow (Thursday) at 11AM Pacific time. To whet your appetite, here's another video demonstration of more of the new big data analysis features, including the rxDataStep function ...
This is a follow-up to the post "Power of running world records". As suggested by Andrew, plotting running world records could benefit from a change of variables. More exactly, the use of different variables sheds light on a [now] well-known [to me] sports result provided in a 2000 Nature paper by ...
After reading Bloomberg’s article, JPMorgan Chase & Co. and Goldman Sachs Group Inc., among the world’s biggest traders of credit derivatives, disclosed to shareholders that they have sold protection on more than $5 trillion of debt globally. ...