# July 2009

### Simple visualization of an 11X5 table (for WordPress 2.9 Features Vote Results)

July 31, 2009 |

I guess this is not the number one post I would like to start with on this blog, but I feel the time is right for it (community-wise). I’ll move on to the subject matter in a moment, but first a short intro: This blog is written by Tal ...

### Kruskal-Wallis one-way analysis of variance

July 31, 2009 |

If you have to compare multiple groups, but you cannot run an ANOVA for multiple comparisons because the groups do not follow a normal distribution, you can use the Kruskal-Wallis test, which can be applied when you cannot make ... [Read more...]
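
As a rough illustration of the test named in this excerpt, the sketch below calls base R's `kruskal.test()` on three small groups. The group names and values are invented for illustration and are not taken from the original post.

```r
# Hypothetical measurements for three independent groups (made-up values)
group_a <- c(2.9, 3.0, 2.5, 2.6, 3.2)
group_b <- c(3.8, 2.7, 4.0, 2.4)
group_c <- c(2.8, 3.4, 3.7, 2.2, 2.0)

# kruskal.test() accepts a list of numeric vectors, one per group
kw <- kruskal.test(list(group_a, group_b, group_c))
print(kw)
```

A small p-value in the printed result would suggest that at least one group differs in location from the others.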

### Analysis of variance: ANOVA, for multiple comparisons

July 30, 2009 |

The ANOVA model can be used to compare the means of several groups with each other, using a parametric method (assuming that the groups follow a Gaussian distribution). Proceed with the following example: The manager of a supermarket chain wants to see if the ... [Read more...]
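
A minimal sketch of a one-way ANOVA in base R, assuming a numeric response and a single grouping factor. The variable names `values` and `group` and all numbers are hypothetical; they do not reproduce the supermarket example from the post.

```r
# Hypothetical data: one numeric response measured in three groups
values <- c(11, 13, 12, 14, 12,  15, 17, 16, 18, 16,  10, 9, 11, 10, 12)
group  <- factor(rep(c("A", "B", "C"), each = 5))

# Fit the one-way model and extract the ANOVA table
fit <- aov(values ~ group)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

The table's first row corresponds to the grouping factor and the second to the residuals.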

### Comparison of two proportions: parametric (Z-test) and non-parametric (chi-squared) methods

July 29, 2009 |

Consider, for example, the following problem. The owner of a betting company wants to verify whether a customer is cheating or not. To do this, the owner wants to compare the number of successes of one player with the number of successes of one of his employees, of w... [Read more...]

### Wilcoxon signed rank test

July 29, 2009 |

Non-parametric statistical hypothesis test for the comparison of the means between 2 paired samples. The mayor of a city wants to see if pollution levels are reduced by closing the streets to car traffic. This is measured by the rate of pollution every 60 minutes (8am to 10pm: a total of 15 measurements) in a ... [Read more...]

### Beta Version of tikzDevice Released!

July 28, 2009 |

The tikzDevice package provides a new graphics device for R which enables direct output of graphics in a LaTeX-friendly way. The device output consists of files containing instructions for the TikZ graphics language and may be imported directly into LaTeX documents using the \input{} command. The beta version of tikzDevice ... [Read more...]

### I know it’s been so long…

July 28, 2009 |

Hey, I know it's been so long since last time I posted something in here, but I was really busy with my thesis and some other stuff, but now that I have more time I promise I'll post some interesting stuff in here, by the way, I found such an ... [Read more...]

### Corpus Linguistics with R, Day 2

July 28, 2009 |

R Lesson 2 text gsub ("second", "third", text) SEARCH-REPLACE-SUBJECT [1] "This is a first example sentence." [2] "And this is a third example sentence." __ gsub ("n", "X", text) [1] "This is a first example seXteXce." [2] "AXd this is a secoXd example seXteXce." __ gsub ("is", "was", text) [1] "Thwas was a first example [...] [Read more...]
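
The flattened session above can be sketched as a runnable snippet. The contents of `text` are not shown in the excerpt; the two sentences below are inferred from the printed output and should be read as an assumption.

```r
# Input vector inferred from the printed output in the excerpt (an assumption)
text <- c("This is a first example sentence.",
          "And this is a second example sentence.")

# gsub(pattern, replacement, x) replaces every match in each element of x
third  <- gsub("second", "third", text)
with_x <- gsub("n", "X", text)
was    <- gsub("is", "was", text)

print(third)
print(with_x)
print(was)
```

Note that `gsub("is", "was", text)` also rewrites the "is" inside "This", which is why the excerpt shows "Thwas".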

### Corpus Linguistics with R, Day 1

July 28, 2009 |

(This post documents the first day of a class on R that I took at ESU C&T. It is posted here purely for my own use.) R Lesson 1 __ 2+3; 2/3; 2^3 [1] 5 [1] 0.6666667 [1] 8 --- Fundamentals - Functions __ log(x=1000, base=10) [1] 3 --- (Formals describes the syntax of other [...] [Read more...]

### Wilcoxon-Mann-Whitney rank sum test (or U test)

July 27, 2009 |

Comparison of the averages of two independent groups of samples for which we cannot assume a Gaussian distribution; it is also known as the Mann-Whitney U-test. You want to see if the mean number of goals conceded by two football teams over the years is the same. Below are the ... [Read more...]

### Beautiful Data

July 27, 2009 |

O'Reilly's recent publication Beautiful Data has a chapter by Jeff Jonas which is enough reason in itself for me to recommend it. The chapter, Data Finds Data, is also available as a PDF download. [Read more...]

### R Snippet for Sampling from a Dataframe

July 27, 2009 |

It took me a while to figure this out, so I thought I'd share. I have a dataframe with millions of observations in it, and I want to estimate a density distribution, which is a memory intensive process. Running my kde2d function on the full dataframe throws an error ... [Read more...]
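
One common way to take a random subset of rows in base R is to sample row indices and then subset. This is a sketch under stated assumptions, not necessarily the snippet from the post: the data frame `df`, its columns, and the sample size `n_keep` are all hypothetical.

```r
set.seed(42)  # for reproducibility of this illustration only

# Hypothetical stand-in for a large data frame
df <- data.frame(x = rnorm(100000), y = rnorm(100000))

# Draw 1000 row indices without replacement, then subset the rows
n_keep <- 1000
idx <- sample(nrow(df), n_keep)
df_sample <- df[idx, ]

dim(df_sample)
```

A density estimate could then be run on `df_sample` instead of the full data frame, at the cost of some sampling noise.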

### biomaRt

July 27, 2009 |

I use R and Bioconductor for most of my work. I am also increasingly replacing things I would have done before in Perl with R. One such example of this is the Bioconductor module biomaRt. As the name suggests, it allows for access to BioMart via R. BioMart is a ... [Read more...]

### Book now shipping from Amazon

July 27, 2009 |

Amazon now reports that the book is in stock! The current discount is 13%. Or, order from the publisher. If you are an ASA member, you can use the online discount code 634LH to obtain a 15% discount.

### Paired Student’s t-test

July 26, 2009 |

Comparison of the means of two sets of paired samples, taken from two populations with unknown variance. A school athletics team has taken on a new instructor and wants to test the effectiveness of the new type of training proposed by comparing the average time...

### Select operations on R data frames

July 26, 2009 |

The R language is weird - particularly for those coming from a typical programmer's background, which likely includes OO languages in the curly-brace family and relational databases using SQL. A key data structure in R, the data.frame, is used somethin...

### Rosetta Code

July 26, 2009 |

Today I'd like to suggest the interesting Rosetta Code site: Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and diff...

### Two sample Student’s t-test #2

July 25, 2009 |

Comparison of the averages of two independent groups, drawn from two populations with unknown variance; the sample variances are not homogeneous. We want to compare the heights in inches of two groups of individuals. Here are the measurements: A: 175, 168, 168...
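
When the two sample variances are not assumed equal, base R's `t.test()` runs the Welch version of the two-sample test. The sketch below uses invented values, since the post's measurements are truncated in this excerpt; `group_a` and `group_b` are hypothetical names.

```r
# Hypothetical values for two independent groups (not the post's data)
group_a <- c(10.1, 11.4, 9.8, 12.0, 10.7, 11.1)
group_b <- c(13.5, 9.0, 15.2, 8.1, 14.8, 10.3, 16.0)

# var.equal = FALSE (the default) requests the Welch test
welch <- t.test(group_a, group_b, var.equal = FALSE)
print(welch)
```

Setting `var.equal = TRUE` would instead pool the variances, which is appropriate only when they can be treated as homogeneous.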

### Example 7.7: Tabulate binomial probabilities

July 25, 2009 |

Suppose we wanted to assess the probability P(X=x) for a binomial random variate with n = 10 and with p = .81, .84, ..., .99. This could be helpful, for example, in various game settings. In SAS, we find the probability that X=x using differences in t...
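
In R, one possible way to build such a table is with `dbinom()` and `outer()`. This is a sketch and not necessarily the code from the post; the object names `probs`, `x`, and `p` are chosen here for illustration.

```r
n <- 10
p <- seq(81, 99, by = 3) / 100   # 0.81, 0.84, ..., 0.99
x <- 0:n

# Rows index x, columns index p; each cell is P(X = x)
probs <- outer(x, p, function(x, p) dbinom(x, size = n, prob = p))
dimnames(probs) <- list(x = x, p = p)

round(probs, 4)
```

Each column is a full probability distribution over x = 0, ..., 10, so it sums to 1.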