Interacting with bioinformatics webservers using R

September 8, 2011

In an ideal world, all bioinformatics tools would be made available via the Web as a web service with an API, as well as a standalone package to download for local use. This is rarely the case and sometimes, even where one or the other is available, factors such as cost come into play. So

Read more »

A brief history of S&P 500 beta

September 8, 2011

Data: The data are daily returns starting at the beginning of 2007. There are 477 stocks for which there is full and seemingly reliable data. Estimation: The betas are all estimated on one year of data. The times that identify the betas mark the point at which the estimate would become available. So the betas …

Read more »
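
For readers who want to see what a one-year beta estimate looks like in code, here is a minimal sketch in base R. It is not the author's code: the returns are simulated, the 252-day window is an assumed stand-in for "one year of data", and names such as `beta_at` are hypothetical. Beta is computed as the covariance of stock and market returns divided by the variance of market returns, and each estimate is labelled by the last day of its window.

```r
# Hypothetical simulated data: daily returns for a market index and one stock.
set.seed(1)
n_days <- 504                                  # about two trading years (assumed)
market <- rnorm(n_days, mean = 0, sd = 0.01)
stock  <- 1.2 * market + rnorm(n_days, mean = 0, sd = 0.01)

window <- 252                                  # assumed length of "one year" of daily data

# Beta over the window ending on day `end`: cov(stock, market) / var(market)
beta_at <- function(end) {
  idx <- (end - window + 1):end
  cov(stock[idx], market[idx]) / var(market[idx])
}

# Each estimate is identified by the last day of its window, that is,
# the first day on which it could have been computed.
ends  <- window:n_days
betas <- vapply(ends, beta_at, numeric(1))
```

Plotting `betas` against `ends` would then show how the estimate drifts as the window rolls forward.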

Multiple plots with subplot in R

September 8, 2011

I'm in the middle of creating a poster and wanted to compress the content by transforming some of the charts into subplots of other charts. I looked around and found that the TeachingDemos package on CRAN fits my needs. Well, the parameterization of the functions is a bit tricky, but after a few tries...

Read more »

Shared and reproducible computing with OpenCPU

September 7, 2011

While looking for an online computing provider, I bumped into OpenCPU.org: OpenCPU is a new initiative to make innovations in statistics, visualization and data-science more widely applicable. I guess the idea of online analysis and visualization, and of an online cloud R computing platform, isn’t really new anymore, but the real incentive is the

Read more »

Analyzing big data in R: two presentations from useR! 2011

September 7, 2011

At last month's useR! 2011 conference at Warwick University, there were two talks on the RevoScaleR package for big data statistics in R. The first was a keynote presentation from Revolution Analytics' Chief Scientist, Lee Edlefsen. Here is the overview of his talk, Scalable Data Analysis in R: For the past several decades the rising tide of technology --...

Read more »

Information Transmission in a Social Network: Dissecting the Spread of a Quora Post

September 7, 2011

tl;dr See this movie visualization for a case study on how a post propagates through Quora. How does information spread through a network? Much of Quora’s appeal, after all, lies in its social graph — and when you’ve got a network of users, all broadcasting their activities to their neighbors, information can cascade in multiple

Read more »

Hey! I made you some Wiener processes!

September 7, 2011

Check them out. Here are thirty homoskedastic ones: > homo.wiener for (j in 1:30) {  for (i in 2:length(homo.wiener)) {          homo.wiener for (j in 1:30) {        plot( homo.wiener,           type = "l", col = rgb(.1,....

Read more »
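
As a rough illustration of the idea, here is one way thirty constant-variance (homoskedastic) Wiener-style paths could be simulated and plotted in base R. This is not the original post's code: the number of steps, the seed, the line colour, and the names `homo_wiener` and `increments` are all assumptions made for the sketch.

```r
set.seed(42)
n_paths <- 30      # thirty paths, as in the post
n_steps <- 1000    # assumed number of time steps

# Independent Gaussian increments with the same standard deviation at every
# step (homoskedastic); one column per path.
increments <- matrix(rnorm(n_paths * n_steps, mean = 0, sd = 1),
                     nrow = n_steps, ncol = n_paths)

# A discretised Wiener path is the running sum of its increments, started at 0.
homo_wiener <- rbind(0, apply(increments, 2, cumsum))

# Draw all paths on one set of axes with a semi-transparent dark line.
matplot(homo_wiener, type = "l", lty = 1,
        col = rgb(0.1, 0.1, 0.1, 0.3),
        xlab = "step", ylab = "value")
```

A heteroskedastic variant would only need the `sd` argument to vary by step instead of staying fixed.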

Link to StatDNA Guest Post

September 7, 2011

The post is officially up on the StatDNA blog. Go check it out. As I said in my previous post, this is a very rough and preliminary model. This is why my work was not any sort of formal entry, just some fun with some great data. I used a Vector Genera...

Read more »

R is a cool sound editor!

September 7, 2011

The capabilities of R are definitely endless! After my previous posts about some easy image editing in R (they are here, and here), now is the time to explore whether R is capable of sound editing! Just for fun, here I created a function that receives a phone number (or another sequence of numbers), and returns the equivalent melody...

Read more »

A simple example for writing parallel code

September 7, 2011

Today, programmers have to deal with multi-core and multi-computer technologies. Several people claim that software developers are far behind hardware technologies. My two favorite posts on this claim are "Editor’s Desk: Software Lags Behind Hardware, But That’s a Good Thing" and "A Hacker’s Craic - Why is software so far behind hardware?". Parallel computing is not that

Read more »

Google Spreadsheets API: Listing Individual Spreadsheet Sheets in R

September 7, 2011

In Using Google Spreadsheets as a Database Source for R, I described a simple function for pulling data into R from a Google Visualization/Chart tools API query language query applied to a Google spreadsheet, given the spreadsheet key and worksheet ID. But how do you get a list of sheets in a spreadsheet, without opening

Read more »

2011 Perth City to Surf Stats

September 6, 2011

Like every year, August sees thousands taking part in the Perth City to Surf, and with that comes the chance for some stats. Why? Curiosity more than anything, and to convince myself that my time in the 12km run …

Read more »

Fortune: Data Science is the hot new job

September 6, 2011

An article in the September 5 issue of Fortune Magazine notes that despite the economy, companies are scrambling to hire data scientists: Data scientists have been a fixture at online companies like Google (GOOG) and Amazon (AMZN) for years. But these days organizations as diverse as Wal-Mart (WMT) and Foursquare are hiring computer science experts who can analyze all...

Read more »

Bayes-250, Edinburgh [day 2]

September 6, 2011

After a terrific run this morning to the top of Arthur’s Seat, and then around (the ribs are feeling fine, now!), the Bayes-250 talks were exhilarating and challenging. Jim Smith gave an introduction to the challenges of getting different experts to collaborate on a complex risk assessment, much in the spirit of his book, that

Read more »

Webinar: Leveraging R in Hadoop Environments

September 6, 2011

On Wednesday, September 21, Revolution Analytics' CTO David Champagne will give a live webinar introducing three new open-source packages for R and Hadoop, which make it possible to work with Hadoop data in R, and bring in-database R analytics to Hadoop. Here are the details:
Date: Wednesday, September 21st
Time: 10:00AM - 10:30AM Pacific Time
Presenter: David Champagne, Chief...

Read more »

Example 9.4: New stuff in SAS 9.3 – MI FCS

September 6, 2011

We begin the new academic year with a series of entries exploring new capabilities of SAS 9.3, and some functionality we haven't previously written about. We'll begin with multiple imputation. Here, SAS has previously been limited to multivariate norma...

Read more »

Free R Book Collection

September 6, 2011

I have just encountered some R PDF books that seem quite interesting. One of them is written by Venables himself.
The Art of R Programming by Norman Matloff
An Introduction to R by W.N. Venables and D. M. Smith
The R Inferno by Patrick Burns
The R Guide by...

Read more »

Salesforce.com and Analytics

September 5, 2011

Salesforce.com has become one of the most successful cloud applications. I am quite astounded by its massive penetration into a myriad of industries. It is being used by leading organizations not only to implement their customer relationship management systems but also to develop their own applications running in the cloud. But the complete absence of meaningful analytical

Read more »

KDNuggets: R most commonly used software for data mining & analytics

September 5, 2011

In a poll with 570 respondents conducted last month at KDNuggets, the R software was the most frequent response to the question, "What programming languages you used for data mining / data analysis in the past 12 months?". The results are tabled below (respondents could select more than one response): In another poll conducted earlier this year, KDNuggets also...

Read more »

Review of “Risk and Meaning” by Nicolas Bouleau

September 5, 2011

The subtitle is: Adversaries in Art, Science and Philosophy. Executive Summary: Genius or madness? I haven’t decided. Irreversibility of interpretation: The book drives home that once we decide how something is we can’t go back to our state of innocence. Figures 1 through 3 exhibit this idea via a randomly generated polygon. Look at Figure …

Read more »

A misleading title…

September 4, 2011

When I received this book, Handbook of fitting statistical distributions with R, by Z. Karian and E.J. Dudewicz, from/for the Short Book Reviews section of the International Statistical Review, I was obviously impressed by its size (around 1700 pages and 3 kilos…). From briefly glancing at the table of contents, and the list of standard

Read more »

googleVis 0.2.9

September 4, 2011

We have published googleVis 0.2.9 on CRAN. The new version updates the package for the new features of the Google Visualisation API and brings a new in-page editor option. Here is a simple example, displaying the participants of the R user Conference...

Read more »

Ladies and Gents: GDP has finally gotten its long awaited forecast

September 4, 2011

Today we will finally create our long-awaited GDP forecast. To create this forecast, we have to combine the forecast from our deterministic trend model with the forecast from our de-trended GDP model. Our model for the trend is:t...

Read more »

Scatter plots with images

September 4, 2011

Edward Tufte has written extensively on the presentation of data covering good and bad practice. He has made a number of suggestions for adaptations of regularly used graph types to assist with the interpretation and understanding of data. One idea for enhancing scatter plots covered in Tufte’s book Beautiful Evidence is the use of images

Read more »

Microfinance in India: Getting a sense of the geographic distribution

September 3, 2011

I am working on a review paper on microfinance in India and use data from the MIX market. Today, I was amazed by how quickly I conjured a map of India with the headquarters of the microfinance institutions that report data to the MIX market depicted on that map. Ideally, I would have more geolocation

Read more »

The Problems with Pairing R + Java

A core focus of the RTextTools project has been to make the package as accessible and user-friendly as possible. In its early iterations, the package contained dependencies such as RWeka, openNLP, and

Read more »

An example of ROC curves plotting with ROCR

September 3, 2011

I decided to start my GitHub with a ROC curve plotting example. There is not one ROC curve but several, according to the number of comparisons (classifications); a legend with the maximal and minimal ROC AUC is also added to the plot. ROC curves and ROC AU...

Read more »

rmongodb – R Driver for MongoDB

September 3, 2011

The source code to rmongodb (home page at http://cnub.org/rmongodb.ashx), a driver to MongoDB for the R language, has been released as open source at GitHub: https://github.com/gerald-lindsly/rmongodb.  This portable full-featured package was developed on top of the mongodb.org supported C driver. It runs almost entirely in native code so you can expect high performance.  Plans are to submit rmongodb to CRAN soon for pre-built binary distribution, but first I would...

Read more »