A brief introduction to R for SAS and SPSS users

September 29, 2011

If you've used SAS or SPSS and want a jump-start into the basics of the popular R language, next week's webinar, Introduction to R for SAS and SPSS Users, will be of interest to you. While R, SAS and SPSS are all software systems for data analysis and graphics, the underlying concepts in R are quite different to...

Connect JAVA to R part 2

September 29, 2011

To follow on from the earlier post on using R through Java, it is even easier to get jri up and running as a NetBeans module. Why is this useful? Well, the platform that the NetBeans IDE is built on …

Paired sample t-test in R

September 28, 2011

Let’s walk through using R and Student’s t-test to compare paired sample data. The book Statistics: The Exploration & Analysis of Data (6th edition, p. 505) presents the longitudinal study “Bone mass is recovered from lactation to postweaning in adolescent mothers …

ttrTests This is a Test–Test 1 and Test 2

September 28, 2011

Just to remind everyone, THIS IS NOT INVESTMENT ADVICE AND ANY ACTIONS TAKEN BASED ON THIS DISCUSSION WILL PROBABLY RESULT IN SIGNIFICANT LOSSES. We had fun with the ttrTests package in two previous posts ttrTests: Its Great Thesis and Incredible Poten...

The R Graph Gallery goes social

September 28, 2011

The R Graph Gallery, the website from Romain François that showcases hundreds of examples of data visualization with R, has new social features. Now, when you come across a graph or chart you find appealing or useful, you can "Like" it on Facebook or "+1" it on Google+. This should be a great way of highlighting the best charts and...

Is the “Long Tail” a Useless Concept?

In response to my last post, “The Long Tail of the Pareto Distribution,” Neil Gunther had the following comment: “Unfortunately, you've fallen into the trap of using the ‘long tail’ misnomer. If you think about it, it can't possibly be the length of the tail that sets distributions like Pareto and Zipf apart; even the negative exponential and Gaussian...

Data Science: a literature review

September 28, 2011

Just what is Data Science, anyway? Here's one take: Ever since the term "Data Scientist" was coined by DJ Patil and Jeff Hammerbacher in 2009, there's been a vigorous debate on what the term actually means. More than 80% of statisticians consider themselves data scientists, but Data Science is more than just Statistics. (My own take is that Data...

Polyploidy in sugarcane

September 28, 2011

While reading UseR conference abstracts I came across this sentence: "Sugarcane is polyploid, i.e., has 8 to 14 copies of every chromosome, with individual alleles in varying numbers." Wow! This generates a really complex genotype system. Say we have a biallelic gene with alleles A and B. In diploids the possible genotypes are AA, AB, and BB. Given the...
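
To make the genotype counting concrete, here is a minimal Python sketch. It assumes a genotype is fully described by which alleles are present and in how many copies (so AB and BA count as the same genotype); the function name `genotypes` is hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement


def genotypes(alleles: str, ploidy: int) -> list[str]:
    """List the unordered genotypes for the given alleles and ploidy level.

    Assumption: allele order within a genotype does not matter, so each
    genotype is a multiset of `ploidy` alleles.
    """
    return ["".join(g) for g in combinations_with_replacement(alleles, ploidy)]


# Diploid, biallelic gene: the three genotypes listed in the text.
print(genotypes("AB", 2))  # ['AA', 'AB', 'BB']

# With 8 copies of the chromosome, a biallelic gene already has 9 genotypes.
print(len(genotypes("AB", 8)))  # 9
```

Under this assumption a biallelic gene at ploidy p has p + 1 genotypes, one for each possible number of copies of allele A.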

Bessel integral

September 28, 2011

Pierre Pudlo and I worked this morning on a distribution related to phylogenetic trees and got stuck on the following Bessel integral, where I_n is the modified Bessel function of the first kind. We could not find anything better than formula 6.611(4) in Gradshteyn and Ryzhik, which is for a=0… Anyone in for a closed form?

Using transparency for data count intuition

September 27, 2011

This is an illustration of representing point count in a graphic using transparency. This is easy to do in ggplot2 if you use one of the bar-chart-type geoms. However, I think there are other situations where it would be useful to apply aesthetics based on point count. Since Hadley did a lot of

World Tourism Day, and Google Public Data Explore

September 27, 2011

Today is World Tourism Day! So let’s talk about some tourism-related datasets – and others. Among other nice functions, Google offers a Public Data Explore in a beta version, which provides a collection of datasets from OECD, IMF, Eurostat, …

Five new local R user groups

September 27, 2011

Looks like there's been a lot of activity in the R user community in the Northern hemisphere now that the summer break is over. I've just added several new groups to the Local R User Group Directory: Tokyo, Japan: The Tokyo.R R study group has already had 17 meetings, but has just been added to the directory. Shanghai/East China:...

Tikz Introduction

September 27, 2011

The pgf drawing package for LaTeX provides facilities for drawing simple or complicated pictures within a LaTeX document. There are many options available within the package and in this post we consider some of the basics to get up and running. As with all LaTeX documents we need to select a

Basic line chart with ggplot2

September 27, 2011

ggplot2 is a package for R which easily draws plots that are easier on the eyes than R’s built-in plotting functions, though the grammar is different from what is commonly used in R. This code demonstrates how to prepare a …

Ghastly R code

September 27, 2011

My R package, R/qtl, contains about 33k lines of R code (and 21k lines of C code). Some of it is quite good; some of it is terrible. Here’s another example of the terrible. I’ve long needed to revise the function scantwo, for performing a two-dimensional genome scan for pairs of loci. I was looking

Project Euler: problem 6

September 27, 2011

The sum of the squares of the first ten natural numbers is 1^2 + 2^2 + ... + 10^2 = 385. The square of the sum of the first ten natural numbers is (1 + 2 + ... + 10)^2 = 55^2 = 3025. Hence the difference between the sum of the squares o...
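
The arithmetic for the first ten natural numbers can be checked with a short Python sketch. The function name `sum_square_difference` is hypothetical, and the code only reproduces the n = 10 figures quoted in the excerpt.

```python
def sum_square_difference(n: int) -> int:
    """Return (1 + 2 + ... + n)^2 minus (1^2 + 2^2 + ... + n^2)."""
    numbers = range(1, n + 1)
    sum_of_squares = sum(k * k for k in numbers)  # 385 for n = 10
    square_of_sum = sum(numbers) ** 2             # 55^2 = 3025 for n = 10
    return square_of_sum - sum_of_squares


print(sum_square_difference(10))  # 3025 - 385 = 2640
```

The same function works for any positive n, since both sums are computed directly rather than from a closed form.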

Example 9.7: New stuff in SAS 9.3– Frailty models

September 27, 2011

Shared frailty models are a way of allowing correlated observations into proportional hazards models. Briefly, instead of l_i(t) = l_0(t)e^(x_iB), we allow l_ij(t) = l_0(t)e^(x_ijB + g_i), where observations j are in clusters i, g_i is typically norma...

Obama recruiting analysts and R is one preferred skill

September 27, 2011

Barack Obama is recruiting analysts for his 2012 re-election campaign. Their job is to analyze the campaign’s data to guide election strategy and develop quantitative, actionable insights that drive decision-making. R is mentioned as one of the tools to use. Analytics …

Time series equivalence of brains and markets

September 27, 2011

fMRI data from 90 locations in the brain look somewhat like daily closing prices on 116 stocks if you squint just right. Marginal Revolution was nice enough to point to “Topological isomorphisms of human brain and financial market networks”. I’ve only just glanced through the paper. I find it interesting, but I’m fairly skeptical. The …

Hipster programming languages

September 26, 2011

If you look at the programming languages that are popular these days, a few patterns emerge. I'm not talking about languages that have the most hits on the job sites. I'm talking about what the cool kids are coding in - the folks that hang out on hacke...

Revolution Analytics partners with Cloudera

September 26, 2011

Revolution Analytics today announced that it has partnered with Cloudera, the leader in Apache Hadoop-based software and services, to make big-data analytics with Hadoop and R available to Revolution R Enterprise users. As we announced earlier this month, we have created three open-source R packages which make it possible for R users to write map-reduce programs in the R...

ttrTests: Its Great Thesis and Incredible Potential

September 26, 2011

I stumbled on the ttrTests R package as mentioned in my post ttrTests Experimentation.  I did not recognize its potential until I spent much more time absorbing the basis of the package—David St. John’s thesis Technical Analysis Based on Movin...

workshop in Columbia [day 3]

September 26, 2011

Although this was only a half-day of talks, the third day of the workshop was equally thought-challenging and diverse. (I managed to miss the first ten minutes by taking a Line 3 train to 125th Street, having overlooked the earlier split from Line 1… Crossing south Harlem on a Sunday morning is a fairly mild

Using Inkscape to Post-edit Labels in R Graphs

September 26, 2011

I discuss how to use Inkscape to easily shift around labels on graphs produced in R.

Gamified

September 26, 2011

Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. An interesting feature of these sites is that, in addition to voting up and down on the questions and answers, one accrues

Visualizing Sampling Distributions

September 25, 2011

Teacher: “How variable is your estimate of the mean?” Student: “Uhhh, it’s not. I took a sample and calculated the sample mean. I only have one number.” Teacher: “Yes, but what is the standard deviation of sample means?” Student: “What do you mean means, I only have the one friggin number.” Statisticians have a habit
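
The teacher's question can be illustrated by simulation: draw many samples, compute each sample's mean, and look at how those means vary. The Python sketch below is only an illustration. The normal population, its parameters (mean 50, standard deviation 10), the sample size of 25, and the function name `sample_means` are all assumptions, not taken from the post.

```python
import random
import statistics


def sample_means(population_mean, population_sd, sample_size, n_samples, seed=0):
    """Draw n_samples samples from a normal population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


# Assumed setup: 2000 samples of size 25 from a Normal(50, 10) population.
means = sample_means(population_mean=50.0, population_sd=10.0, sample_size=25, n_samples=2000)

observed_sd = statistics.stdev(means)  # spread of the simulated sample means
theoretical_sd = 10.0 / 25 ** 0.5      # sigma / sqrt(n) = 2.0

print(f"standard deviation of sample means: {observed_sd:.3f}")
print(f"theoretical standard error:         {theoretical_sd:.3f}")
```

A single sample gives one number, but the simulated means spread around the population mean with a standard deviation close to sigma / sqrt(n), which is what the teacher is asking about.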

Accessing and plotting World Bank data with R

September 25, 2011

Over the past couple of days I played around with the data sets of the World Bank, and I have to admit that I am blown away by them. It is amazing to see what is available on their web site. It is worth visiting their Data Visualisation Tools page. It i...

rrdf 1.5: Accessing SMW SPARQL end points behind LDAP authentication

September 25, 2011

We are using a Semantic MediaWiki (SMW) for the Gold Compound selection task by the ToxBank in the SEURAT-1 cluster, funded by Colipa and the EC. I do stress that despite being funded by Colipa, they have no control over my research; they just co-...

Arc Diagram and spatiotemporal data mining visualization

September 23, 2011

I won't spend too much time discussing this fascinating topic other than to say it relates very much to prior discussions about pattern discovery via visual data mining (see lexical dispersion plots for example).  I happened across an interesting ...
