## Two sample Student’s t-test #1

July 24, 2009

A t-test to compare the means of two groups, under the assumption that both samples are random, independent, and come from normally distributed populations with unknown but equal variances. Here I will use the same data seen in a previous post. The data ...
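As a rough illustration of the equal-variance test described in this excerpt, here is a minimal sketch in R. The two vectors are hypothetical values invented for the example (the post's own data is not shown in the excerpt), and the hand computation is included only to make the pooled-variance formula visible.

```r
# Hypothetical measurements for two independent groups (not the post's data)
group_a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
group_b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# Student's two-sample t-test with pooled variance.
# var.equal = TRUE is required: R's default is the Welch test.
result <- t.test(group_a, group_b, var.equal = TRUE)

# The same statistic computed by hand from the pooled variance
n_a <- length(group_a)
n_b <- length(group_b)
pooled_var <- ((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
  (n_a + n_b - 2)
t_manual <- (mean(group_a) - mean(group_b)) /
  sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(result)
print(t_manual)
```

The test has `n_a + n_b - 2` degrees of freedom, which `t.test` reports in `result$parameter`.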

## One sample Student’s t-test

July 23, 2009

Comparison of the sample mean with a known value, when the variance of the population is not known. Consider the exercise we have just seen. An intelligence test was given to 10 subjects, and here are the results obtained. The average result of ...
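A minimal sketch of a one-sample t-test in R might look like the following. Both the scores and the reference value are hypothetical placeholders, since the excerpt cuts off before the data; only the call to `t.test` with its `mu` argument is standard R.

```r
# Hypothetical test scores for 10 subjects (not the post's data)
scores <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)

# Hypothetical reference value to compare the sample mean against
reference_mean <- 75

# One-sample t-test: the population variance is estimated from the sample
result <- t.test(scores, mu = reference_mean)

# The same statistic by hand: (sample mean - reference) / standard error
t_manual <- (mean(scores) - reference_mean) /
  (sd(scores) / sqrt(length(scores)))

print(result)
print(t_manual)
```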

## Two sample Z-test

July 22, 2009

Comparison of the means of two independent groups of samples, taken from two populations with known variance. We are asked to compare the average heights of two groups. The first group (A) consists of individuals of Italian nationality (the variance of the ...
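Base R has no built-in two-sample Z-test, so one possible approach is a small helper like the one sketched here. The function name `z_test_two_sample`, the heights, and the two population variances are all assumptions made for illustration; none of them come from the post.

```r
# Hypothetical helper: two-sample Z-test with known population variances
z_test_two_sample <- function(x, y, var_x, var_y) {
  # Standard error of the difference between the two sample means
  se <- sqrt(var_x / length(x) + var_y / length(y))
  z <- (mean(x) - mean(y)) / se
  # Two-sided p-value from the standard normal distribution
  list(statistic = z, p.value = 2 * pnorm(-abs(z)))
}

# Hypothetical heights in cm for groups A and B (not the post's data)
heights_a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
heights_b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# The population variances (5 and 8.5) are assumed values
res <- z_test_two_sample(heights_a, heights_b, var_x = 5, var_y = 8.5)
print(res)
```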

## Massively parallel database for analytics

July 22, 2009

This is by far the best description of why traditional parallel databases (like Teradata, Greenplum et al.) are an evolutionary dead end. But much more than a theoretical discussion, they have built a solution which they call HadoopDB. It is based on Hadoop, PostgreSQL, and Hive and is completely Open Source. Alternative, column-based, backends to PostgreSQL...

## One sample Z-test

July 21, 2009

Comparison of the sample mean with a known population mean and standard deviation. Suppose that 10 volunteers have taken an intelligence test; here are the results obtained. The mean obtained on the same test by the entire population is 75. You want to c...
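A one-sample Z-test along these lines could be sketched as below. The population mean of 75 comes from the excerpt; the scores, the population standard deviation of 18, and the helper name `z_test_one_sample` are assumptions made for illustration, because the excerpt is truncated before those details.

```r
# Hypothetical helper: one-sample Z-test with known population mean and sd
z_test_one_sample <- function(x, mu, sigma) {
  # Standardize the sample mean using the known population sd
  z <- (mean(x) - mu) / (sigma / sqrt(length(x)))
  # Two-sided p-value from the standard normal distribution
  list(statistic = z, p.value = 2 * pnorm(-abs(z)))
}

# Hypothetical scores for the 10 volunteers (not the post's data)
scores <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)

# mu = 75 is stated in the excerpt; sigma = 18 is an assumed value
res <- z_test_one_sample(scores, mu = 75, sigma = 18)
print(res)
```

The result would then be compared with the critical value for the chosen significance level, for example `qnorm(0.975)` for a two-sided test at 5%.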

## RGG#155, 156 and 157

July 21, 2009

I pushed 3 more graphics from Biecek Przemyslaw to the graphics gallery: a list of popular names for colors from the packages RColorBrewer, colorRamps, grDevices; a set of examples of a few low-level graphical parameters: lend, ljoin, xpd, adj, lege...

## Score with scoring rules

July 21, 2009

INCENTIVES TO STATE PROBABILITIES OF BELIEF TRUTHFULLY

We have all been there. You are running an experiment in which you would like participants to tell you what they believe. In particular, you’d like them to tell you what they believe to be the probability that an event will occur. Normally, you would ask them. But

## Geometric and harmonic means in R

July 20, 2009

Compute the geometric mean and harmonic mean in R of this sequence:

10, 2, 19, 24, 6, 23, 47, 24, 54, 77

These functions are not present in the standard packages of R, although they are easily available in some add-on packages. However, it is easy to calculate the...
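One way the two means could be computed in base R for the sequence given in this excerpt is sketched below. This is not necessarily the approach the full post takes; the variable names are chosen for the example, and the formulas assume that all values are strictly positive.

```r
# The sequence given in the excerpt
x <- c(10, 2, 19, 24, 6, 23, 47, 24, 54, 77)

# Geometric mean: the n-th root of the product of the values,
# computed on the log scale to avoid overflow with long vectors
geometric_mean <- exp(mean(log(x)))

# Harmonic mean: the reciprocal of the mean of the reciprocals
harmonic_mean <- 1 / mean(1 / x)

print(geometric_mean)
print(harmonic_mean)
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, so comparing the results with `mean(x)` is a quick sanity check.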

## Adding a legend to a plot

July 20, 2009

It's pretty easy!

    plot(c(1968,2010),c(0,10),type="n", # sets the x and y axes scales
         xlab="Year",ylab="Expenditures/GDP (%)") # adds titles to the axes
    lines(year,defense,col="red",lwd=2.5) # adds a line for defense expenditures
    lines(year,health,col="...

## Example 7.6: Find Amazon sales rank for a book

July 20, 2009

In honor of Amazon's official release date for the book, we offer this blog entry. Both SAS and R can be used to find the Amazon Sales Rank for a book by downloading the desired web page and ferreting out the appropriate line. This code is likely to br...

## ggplot2: more wicked-cool plots in R

July 20, 2009

As far as I know there are 3 different systems for producing figures in R: (1) base graphics, included with R, (2) the lattice package, and (3) ggplot2, one of the newer plotting systems which is, according to the creator Hadley Wickham, "based on the grammar of graphics, which tries to take the good parts of base and lattice...

## Let us practice with some functions of R

July 18, 2009

Given the following data set, compute the arithmetic mean, median, variance, and standard deviation; find the largest and the smallest value, the sum of all values, the square of the sum of all values, and the sum of the squares of all values; assign the ranks...

## Book excerpts now posted

July 18, 2009

We've posted excerpts from the book on the book website. The excerpts include Chapter 3 (regression and ANOVA) in its entirety. This demonstrates how the entries (the generic descriptions of software functions) and the worked examples reinforce each ...

## Parsing GEO SOFT files with Python and Sqlite

July 17, 2009

NCBI's GEO database of gene expression data is a great resource, but its records are very open ended. This lack of rigidity was perhaps necessary to accommodate the variety of measurement technologies, but makes getting data out a little tricky. But, a...

## Simple Data Visualization

July 16, 2009

OK, so, I know I already raved about one Hadley Wickham project and how it has changed my life last week. But what can I say, the man is a genius. And if you are using R (and let’s face it, you should be) and you want simple sexy graphs made quick, the man has

## Influence.ME: Simple Analysis

July 16, 2009

With the introduction of influence.ME, our new package for influential data, I’m currently writing a manual for the package. This manual will address topics for both experienced and inexperienced users. I will also present much of the content ...

## Missing data, logistic regression, and a predicted values plot (or two)

July 15, 2009

miss attach miss result1 summary(result1)

    Call:
    glm(formula = a ~ b, family = binomial(logit))

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.8864  -1.2036   0.7397   0.9425   1.4385

    Coefficients: ...

July 15, 2009

This plot was created using the following R code:

    plot(q9e~q8, type = "n",xlim = c(1,13), ylim = c(1,13),cex.lab=1.25,cex.axis=0.75, col.lab = "#333333", xlab = "Obama job grade",ylab = "Congressional job grade", xaxt ="n", yaxt="n",main="Obama and Co...
