Data is everywhere!

November 19, 2011

I was writing earlier today that I am getting really fed up with using the same datasets over and over again. Of course, using the same data over time with different methods (e.g., look at this) is very useful for comparison, but in a web world we can still use other data. For example, you ...

Public vote open for Mendeley-PLoS Binary Battle: vote rOpenSci!

November 19, 2011

http://www.surveygizmo.com/s3/722753/Mendeley-PLoS-Binary-Battle-Public-Vote

randu dataset, part 2

November 19, 2011

In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with the solution and decided to show this numerically. It can be done in four steps: identifying four points lying...
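
One way to check the plane structure numerically, which is not necessarily the four-step approach the post goes on to describe, uses a known property of the RANDU generator: consecutive outputs satisfy x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31), so after scaling to (0, 1) the combination 9x - 6y + z should be an integer. A minimal sketch, assuming each row of `randu` holds three consecutive draws, as the dataset's help page describes:

```r
# randu ships with R's datasets package (400 rows, columns x, y, z).
data(randu)

# For RANDU, 9*x - 6*y + z is an integer for every consecutive triple.
# The dataset is rounded to 6 decimals, so expect small rounding error.
plane_index <- with(randu, 9 * x - 6 * y + z)
max_deviation <- max(abs(plane_index - round(plane_index)))

# Each distinct integer value corresponds to one parallel plane.
planes <- sort(unique(round(plane_index)))

print(max_deviation)              # rounding error only
print(planes)                     # the planes actually hit by the 400 points
print(table(round(plane_index)))  # how many points fall on each plane
```

Because x, y and z all lie in (0, 1), the integer 9x - 6y + z can only take values from -5 to 9, which is where the count of 15 planes comes from.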

Plotting randu dataset

November 18, 2011

Recently I stumbled on the help description of the randu data from the datasets package. It contains pseudorandom numbers that are flawed. Help says that "In three dimensional displays it is evident that the triples fall on 15 paralle...

Let the Lagging Lead

November 18, 2011

THIS IS NOT INVESTMENT ADVICE AND WILL PROBABLY WIPE OUT ALL YOUR MONEY IF PURSUED.  While exploring utilities, I discovered a strange phenomenon that I have not quite thoroughly understood, but I attribute to the business cycle.  If I dust o...

Analyzing birth rates from census data from RevoScaleR

November 18, 2011

In yesterday's webinar, "New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis", Sue Ranney demonstrated the features of the RevoScaleR big data analysis package included with Revolution R Enterprise. In the webinar, she showed how to use the rxImport function to import big data sets from SAS, SPSS or ODBC, how to use the rxDataStep function...

My talk on doing phylogenetics in R

November 18, 2011

I gave a talk today on doing very basic phylogenetics in R, including getting sequence data, aligning sequence data, plotting trees, doing trait evolution stuff, etc. Please comment if you have code for doing Bayesian phylogenetic inference in R.  ...

Why balloons are better than balls (in urn schemes)

November 18, 2011

The below is taken from a work in progress: The Polya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified format, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are preferred because their volume may

htmlToText(): Extracting Text from HTML via XPath

November 18, 2011

Converting HTML to plain text usually involves stripping out the HTML tags whilst preserving the most basic of formatting. I wrote a function to do this which works as follows (code can be found on GitHub): The above uses an XPath approach to achieve its goal. Another approach would be to use a regular expression. These

FBS Coaches Avg. Salary

November 18, 2011

Of course, a few days before I leave for a much needed vacation, USA Today released their updated NCAA coaching salary database. For sports junkies, there’s an unlimited number of analyses and visualizations that can be done on the data. I took a quick break from packing to condense the data to a csv and

Style Analysis

November 17, 2011

During the final stage of the asset allocation process we have to decide how to implement our desired allocation. In many cases we will allocate capital to mutual fund managers who will invest money according to their fund’s mandate. Usually there is no perfect relationship between asset classes and fund managers. To determine the true

Spinner Doctor

November 17, 2011

The setup Dan Meyer, a (former?) math teacher with some extraordinary ideas, has a nifty concept for teaching expected values: “So one month before our formal discussion of expected value, I’d print out this image, tack a spinner to it, … Continue reading →

Revolution Newsletter: November 2011

November 17, 2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full November edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. R Training from Hadley Wickham: The R guru (and author of ggplot2, plyr and several...

GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

November 17, 2011

Gene Expression Omnibus is NCBI's repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data direc...

Using neural network for regression

November 17, 2011

Artificial neural networks are commonly thought to be used just for classification because of the relationship to logistic regression: neural networks typically use a logistic activation function and output values from 0 to 1 like logistic regression. However, the worth … Continue reading →

Bayesian vs. Frequentist Intervals: Which are more natural to scientists?

November 17, 2011

I don't know, of course, because the evidence at hand is based on my experience. But, I'll leave the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than the frequentist confidence intervals, because the Bayesian inference is a probability statement about a parameter.

Finding functions in R

November 17, 2011

When looking for functions whose exact name is unknown:

    # Functions related to "shrinkage" methods
    help.search("shrinkage")

Package sos does a great job of finding functions:

    install.packages("sos")
    library(sos)
    shrinkageResults <- findFn("shrinkage", maxPages = 1)
    shrinkageResults  # This opens a webpage in your browser with the results

The table in the webpage created above has sortable columns.
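
When part of the name is already known, base R alone may be enough. The sketch below uses `apropos()` and `find()`; unlike `findFn()`, these only look at packages that are currently attached, so they are a complement to the sos search rather than a replacement. The search patterns are illustrative choices.

```r
# apropos() matches object names on the search path against a regular expression.
mean_like <- apropos("^mean")
print(mean_like)

# Restrict the search to functions and report their search-path positions.
var_funs <- apropos("^var", mode = "function", where = TRUE)
print(var_funs)

# find() reports which attached package provides a given name.
median_home <- find("median")
print(median_home)
```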

Missing values and column types when reading data into R

November 17, 2011

Reading data into R when dealing with column types and values that need to be considered as NA.

Below are code snippets to introduce a few arguments of the read.csv function in R:

    # Create sample data
    strVals <- do.call("c", lapply(1:1000, function(x) paste(sample(letters, sample(5:20, 1)), collapse = "")))
    miscVals <- sample(c("", "999", "----", "MISS"), 100, replace = TRUE)
    numVals <- rnorm(1000)

    # Scenario 1 : Pure numeric and strings
    dataTemp<-data.frame(numericVals
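
As a small illustration of the kind of arguments involved, the sketch below reads an inline CSV with `na.strings` and `colClasses`. The column names and values are made up for this example and are not taken from the post's own scenarios.

```r
# Small inline CSV; "999" and "MISS" stand in for missing-value codes.
csv_text <- "id,score,label
1,3.5,a
2,999,b
3,MISS,
4,2.1,d"

dat <- read.csv(
  text       = csv_text,
  na.strings = c("", "999", "MISS"),                 # strings to treat as NA
  colClasses = c("integer", "numeric", "character")  # force the column types
)

str(dat)
print(is.na(dat$score))
```

Without `na.strings`, the "MISS" entry would prevent `score` from being read as numeric. Note that `na.strings` applies to every column, so a legitimate value of 999 anywhere in the file would also become NA.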

Webinar Tomorrow: What’s new in Revolution R Enterprise 5.0

November 16, 2011

A reminder that Sue Ranney will be presenting the webinar New Features in Revolution R Enterprise 5.0 (Including RevoScaleR) to support Scalable Data Analysis tomorrow (Thursday) at 11AM Pacific time. To whet your appetite, here's another video demonstration of more of the new big data analysis features, including the rxDataStep function to preprocess a data set using R functions...

Power-laws: choose your x and y variables carefully

November 16, 2011

This is a follow-up to the post "Power of running world records". As suggested by Andrew, plotting running world records could benefit from a change of variables. More precisely, the use of different variables sheds light on a well-known sports result provided in a 2000 Nature paper by Sandra Savaglio and Vincenzo

Update on Scary Derivatives

November 16, 2011

After reading Bloomberg’s article, JPMorgan Chase & Co. and Goldman Sachs Group Inc., among the world’s biggest traders of credit derivatives, disclosed to shareholders that they have sold protection on more than $5 trillion of debt globally. ...

an easy way to writing data.frame to Excel

November 16, 2011

You can write it as

    write.table(r.data.frame, "excel.file.xls", sep="\t", na="", row.names=F)

which I can usually open in Excel just by clicking on it.

Credit: http://tolstoy.newcastle.edu.au/R/help/05/04/3388.html
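
A quick way to see what this call produces is to write a small data frame and read it back. In the sketch below the data frame and the temporary file location are illustrative assumptions; the result is a tab-delimited text file that merely carries an .xls extension, so some Excel versions may show a format warning before opening it.

```r
# Example data with a missing value, to show the effect of na = "".
df <- data.frame(id = 1:3, value = c(1.5, NA, 3.2), name = c("a", "b", "c"))

# Write to a temporary location rather than the working directory.
out_file <- file.path(tempdir(), "excel.file.xls")
write.table(df, out_file, sep = "\t", na = "", row.names = FALSE)

# Inspect the raw lines, then read the file back as tab-delimited text.
print(readLines(out_file))
back <- read.delim(out_file, stringsAsFactors = FALSE)
print(back)
```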

Using SyntaxHighlighter and R Brush in Blogger

November 16, 2011

If you're thinking it is time to give the code examples in your blog a more readable look, you may follow this path and use the SyntaxHighlighter. First thing: check the SyntaxHighlighter website for the basics.

Performance measurement is about decisions

November 16, 2011

The return of a hypothetical fund was 17.9% in 2010.  We want to know if that is good or bad. The benchmark method The assets in the portfolio are constituents of the S&P 500, so we can compare our fund return to the return of the index. Figure 1: 2010 returns of: the fund and … Continue reading...

fgui: Automatically Creating Widgets for Arguments of a Function – A Quick Example

November 16, 2011

Here’s something I came across by accident, an R package called fgui which has the ability to automatically create a widget just by passing it a function with parameters, e.g.: The GUI produced from the code above looks like this: I love how easy that was to do, very cool, and useful too! The package

Lambert’s W function and the generalised logarithm

November 16, 2011

Yesterday I ran into an equation that was a sum of an exponential and a linear term: It doesn’t take long to figure out that there is no analytical solution, and so I set out to write some crappy numerical code. After wasting some time with a fixed point iteration that did not really work,
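
The excerpt does not show the equation itself, so the following is only a sketch under an assumed form, x + exp(x) = c. In that case the solution can be written as x = c - W(exp(c)), where W is the principal branch of Lambert's W function, defined by W(z) * exp(W(z)) = z. The helper names `lambert_w0` and `solve_exp_linear` are hypothetical, and the Newton iteration is a generic choice rather than the method the post arrives at.

```r
# Principal branch of Lambert's W for z >= 0, via Newton's method on
# f(w) = w * exp(w) - z. Starting at log1p(z) puts the iterate to the
# right of the root, where f is increasing and convex.
lambert_w0 <- function(z, tol = 1e-12, max_iter = 100) {
  stopifnot(is.numeric(z), all(z >= 0))
  w <- log1p(z)
  for (i in seq_len(max_iter)) {
    ew <- exp(w)
    step <- (w * ew - z) / (ew * (w + 1))
    w <- w - step
    if (all(abs(step) < tol)) break
  }
  w
}

# Solve x + exp(x) = c (assumed form) using x = c - W(exp(c)).
solve_exp_linear <- function(c) c - lambert_w0(exp(c))

x <- solve_exp_linear(3)
print(x)
print(x + exp(x))  # should reproduce c = 3 up to numerical error
```

For large c the term exp(c) overflows, so a production version would need to work on a log scale.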

Weather forecast and good development practices

November 16, 2011

Inspired by this tutorial, I thought that it would be nice to have the possibility to have access to weather forecast directly from the R command line, for example for a personalized start-up message such as the one below: Weather summary for Trieste, Friuli-Venezia Giulia: The weather in Trieste is clear. The temperature is currently 14°C (57°F). Humidity: 63%. Fortunately,...

PhD defense on copulas

November 15, 2011

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers) To leave a comment for the author, please follow the link and comment on his blog: Freakonometrics - Tag - R-english. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave,...
