R’s Distrotheque

November 28, 2011

(Update: The csound package is now available on CRAN.) Do your random variables need to groove more? Of course they do. That's why I've been working on the upcoming csound package for R, which connects to Csound computer synthesis software to make any sound imaginable. Your computer'll be the hippest sample space on the randomized

Read more »

Retrieve GBIF Species Occurrence Data with Function from dismo Package

November 28, 2011

The dismo package is awesome: with a few short lines of code you can easily read and map species distribution data from GBIF (the Global Biodiversity Information Facility):

Read more »

Course: Financial Data Modeling and Analysis in R

November 28, 2011

The University of Washington is holding a web-based course which will be of interest to anyone who wants to learn about financial modeling with R: Financial Data Modeling and Analysis in R (AMATH 542) is a comprehensive introduction to the R statistical programming language for computational finance offered by the University of Washington Computational Finance program and taught by...

Read more »

Where the Worlds of Dentistry and Cartography Collide

November 28, 2011

As I was getting a root canal last week, my dental X-rays reminded me anew of an optical illusion that stumped us for a short time recently when we were developing our heatmapping engine. My X-rays, before, during, and after a recent root canal. The...

Read more »

Predicting Gender

November 28, 2011

If there are two classes (this can be generalized to n) and both follow the same distribution but with different parameters, it is possible to predict which class an observation comes from. Here I’ll try to predict a sample’s gender based on their height. The distribution of a person’s height is more or less normal. There

Read more »
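
A minimal sketch of the idea in this teaser, not the post's own code: each class gets a normal density for height, and an observation is assigned to the class with the higher posterior probability. The means and standard deviations below are made-up illustrative values, and equal prior probabilities are assumed.

```r
# Hypothetical class parameters in cm (illustrative values, not from the post)
params <- list(
  female = c(mean = 165, sd = 7),
  male   = c(mean = 178, sd = 7)
)

# Posterior probability of each class for the given heights,
# assuming normal class densities and equal priors.
posterior <- function(height, params) {
  dens <- sapply(params, function(p) {
    dnorm(height, mean = p[["mean"]], sd = p[["sd"]])
  })
  # Force a heights-by-classes matrix, even for a single height
  dens <- matrix(dens, nrow = length(height),
                 dimnames = list(NULL, names(params)))
  dens / rowSums(dens)
}

# Pick the class with the largest posterior probability
predict_class <- function(height, params) {
  post <- posterior(height, params)
  colnames(post)[max.col(post, ties.method = "first")]
}

predict_class(c(160, 170, 185), params)
```

With more than two classes, only the `params` list needs to grow; unequal priors would multiply each density column by its class prior before normalizing.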

Another aspect of speeding up loops in R

November 28, 2011

Any frequent reader of R-bloggers will have come across several posts concerning the optimization of code - in particular, the avoidance of loops. Here's another aspect of the same issue. If you have experience programming in other languages besides R, this is probably a no-brainer, but for laymen, like myself, the following example was...

Read more »

A nice short article on memory in R

November 28, 2011

There is a nice short article on memory issues in R at http://www.matthewckeller.com/html/memory.html. If you use R to process large data, you might find it helpful. It introduces: - checking how much memory an object is taking; - the memory …

Read more »
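
The first item on that list, checking how much memory an object takes, can be illustrated with base R's `object.size()`. This is only a small sketch; the vector `x` is made up for illustration, and the linked article may rely on other tools.

```r
# A numeric vector of one million doubles (8 bytes each, plus a small header)
x <- numeric(1e6)

# Size in bytes
object.size(x)

# The same size printed in megabytes
print(object.size(x), units = "Mb")
```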

Prime Number in R Language (CloudStat)

November 28, 2011

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself. R Language Code: The Prime Function: prime = function(n){ n = as.integer(n); if(n > 1e8) stop("n too large"); primes = re...

Read more »
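
The post's function is cut off in this teaser, so the following is a separate sketch rather than a reconstruction of it: a plain Sieve of Eratosthenes that returns all primes up to n. The name `primes_upto` is hypothetical; the upper limit of 1e8 mirrors the guard visible in the excerpt.

```r
# Sieve of Eratosthenes: all primes <= n (a sketch, not the post's function)
primes_upto <- function(n) {
  n <- as.integer(n)
  if (n > 1e8) stop("n too large")
  if (n < 2L) return(integer(0))

  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE          # 1 is not prime

  i <- 2L
  while (i * i <= n) {
    if (is_prime[i]) {
      # Cross out multiples of i, starting at i^2
      is_prime[seq.int(i * i, n, by = i)] <- FALSE
    }
    i <- i + 1L
  }
  which(is_prime)
}

primes_upto(30)
```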

A Story of Life and Death. On CRAN. With Packages.

November 27, 2011

The Comprehensive R Archive Network, or CRAN for short, has been a major driver in the success and rapid proliferation of the R statistical language and environment. CRAN currently hosts around 3400 packages, and is growing at a rapid rate. Not too ...

Read more »

Regression via Gradient Descent in R

November 27, 2011

In a previous post I derived the least squares estimators using basic calculus, algebra, and arithmetic, and also showed how the same results can be achieved using the canned functions in SAS and R or via the matrix programming capabilities offered by ...

Read more »

Basic Econometrics in R and SAS

November 27, 2011

Regression Basics: y = b0 + b1*X (‘regression line we want to fit’). The method of least squares minimizes the squared distance between the line ‘y’ and the individual data observations y_i. That is, minimize: ∑ e_i^2 = ∑ (y_i - b0 - b1 X_i...

Read more »
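
As a rough illustration of the least squares idea in this teaser (not the post's code), the sketch below computes b0 and b1 from the standard closed-form solution of that minimization and compares them with `lm()`. The data are simulated, and the true coefficients 2 and 0.5 are arbitrary choices.

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)   # simulated data, arbitrary coefficients

# Closed-form least squares estimates for a single regressor
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The canned function should agree with the hand computation
fit <- lm(y ~ x)
c(b0 = b0, b1 = b1)
coef(fit)
```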

Gradient Descent in R

November 27, 2011

In a previous post I discussed the concept of gradient descent.  Given some recent work in the online machine learning course offered at Stanford,  I'm going to extend that discussion with an actual example using R-code  (the actual code...

Read more »
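
The post's own code is not shown in this teaser. As a stand-in, here is a minimal sketch of batch gradient descent for simple linear regression, checked against `lm()`. The simulated data, the learning rate `alpha`, and the iteration count `iters` are all assumptions chosen for illustration.

```r
set.seed(42)
x <- runif(100, -1, 1)
y <- 1 + 3 * x + rnorm(100, sd = 0.1)   # simulated data
X <- cbind(1, x)                         # design matrix with intercept column

# Batch gradient descent on the cost (1 / (2m)) * sum((X %*% theta - y)^2)
gradient_descent <- function(X, y, alpha = 0.1, iters = 5000) {
  theta <- rep(0, ncol(X))
  m <- length(y)
  for (k in seq_len(iters)) {
    grad <- drop(crossprod(X, X %*% theta - y)) / m
    theta <- theta - alpha * grad
  }
  theta
}

theta <- gradient_descent(X, y)
theta
coef(lm(y ~ x))
```

A learning rate that is too large makes the iterates diverge, and one that is too small needs many more iterations, so `alpha` usually has to be tuned to the scale of the regressors.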

Dealing with Non-Positive Definite Matrices in R

November 27, 2011

Last time we looked at the Matrix package and dug a little into chol(), the Cholesky decomposition function. I noted that often in finance we do not have a positive definite (PD) matrix. The chol() function in both the Base and Matrix...

Read more »

Cleaning time-series and other data streams

The need to analyze time-series or other forms of streaming data arises frequently in many different application areas.  Examples include economic time-series like stock prices, exchange rates, or unemployment figures, biomedical data sequences like electrocardiograms or electroencephalograms, or industrial process operating data sequences like temperatures, pressures or concentrations.  As a specific example, the figure below shows four data sequences:...

Read more »

GTA R Users Group – Using R for Data Mining Competitions

November 27, 2011

Here are the presentation slides I used for my talk on “Using R for Data Mining Competitions” at Ryerson University as part of the Greater Toronto Area (GTA) R User’s Meetup Group. Presentation (Prezi) Presentation (PDF) Meetup Event page Special thanks to Anthony Goldbloom from Kaggle and various competition winners for sharing their code and

Read more »

Analytics using R: Most active in my Twitter list

November 27, 2011

I follow some 80-odd people and news sources on my Twitter account. For a while I wondered which of these sources are most active on Twitter. I picked a simple metric, '# of status messages posted to Twitter', as the measure of activity. Using R I quickly wrote a program to generate my top 10 most active...

Read more »

Putting it all together: concise code to make dotplots with weighted bootstrapped standard errors

November 27, 2011

I analyze a lot of experiments and there are many times when I want to quickly look at means and standard errors for each cell (experimental condition), or the same for each cell and individual-level attribute level (e.g., Democrat, Independent, …

Read more »

A Quick Geo-Trick for GoogleMaps in R (using dismo)

November 26, 2011

I thought this geocoding bit might be worth sharing (found HERE when searching the web for dismo documentation).

Read more »

Comparing StackOverflow and the R-help mailing list

November 26, 2011

Only recently I discovered StackOverflow. I know, for a nerd who has been programming for many years, that is quite late. For those who are not familiar with StackOverflow (aka SO), it is a Question and Answer site for programmers. It is…

Read more »

Count different positions between two strings of equal length

November 26, 2011

This is another pretty simple function, written to help me solve the simplest representation of a trivial but tedious task. Most biologists are probably familiar with this task. How many nucleotide differences exist between two given sequences? I only faced the easiest part of the problem, i.e. I do not perform alignment, I just assume that

Read more »
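
The task described in this teaser can be sketched in a few lines of base R. This is not the post's function; `count_diffs` is a hypothetical name, the two sequences are invented, and, as in the teaser, no alignment is performed: the strings are simply compared position by position.

```r
# Count positions at which two equal-length strings differ (no alignment)
count_diffs <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  chars_a <- strsplit(a, "")[[1]]
  chars_b <- strsplit(b, "")[[1]]
  sum(chars_a != chars_b)
}

count_diffs("ACGTTGCA", "ACGATGCC")
```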

Deductive imputation with the deducorrect package

November 26, 2011

Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases however, the missings need not be estimated at all, since they can be derived with certainty from other …

Read more »

int64: 64 bit integer vectors for R

November 26, 2011

The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on Google's open source blog. The package defines classes i...

Read more »

The Global Earthquake Desktop

November 25, 2011

One of the first things I do over coffee each morning is scroll through the USGS earthquake RSS feeds.  In the era of free data and open source computing I asked myself, "Wouldn't it be better to visualize all of the earthquakes around the world r...

Read more »

Some More Regex Examples Added to Collection

November 25, 2011

The examples below have been added to my list of regex examples HERE.

Read more »

Working with isTRUE

November 25, 2011

This week I was running computations transforming some input files into output files. The problem was that it was a repeated process. If new input files were generated or old ones were updated I needed to calculate new output files. The transformation ...

Read more »

ConPA uses cloudnumbers.com as calculation backend

November 25, 2011

ConPA is an asset allocation application using the classic Markowitz approach. For the calculations the open-source statistical programming language R is used. R scripts are executed on cloudnumbers.com’s computer clusters in the Cloud and the results are displayed by the ConPA frontend. ConPA allows you to set the investment date of the portfolio, the target return and

Read more »

Sending Email from R (using sendEmail)

November 25, 2011

Like a lot of other R users I’ve felt the need for sending email from R. I haven’t surveyed CRAN for such a package but looked for the possibility of sending command line email in Windows. I found a nice application called sendEmail that can be found here. Below are code snippets in R that will

Read more »

Pseudo-Random vs. Random Numbers in R

November 25, 2011

Earlier, I found an interesting post from Bo Allen on pseudo-random vs random numbers, where the author uses a simple bitmap (heat map) to show that the rand function in PHP has a systematic pattern and compares these to truly random numbers obtained from random.org. The post’s results suggest that pseudo-randomness in

Read more »
