Two sample Student’s t-test #1

July 24, 2009

A t-test to compare the means of two groups, under the assumption that both samples are random, independent, and drawn from normally distributed populations with unknown but equal variances. Here I will use the same data seen in a previous post. The data ...
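As a minimal sketch of the test this excerpt describes, the pooled-variance t-test can be run in base R with `t.test(..., var.equal = TRUE)`. The two vectors below are hypothetical illustration data, not the data from the post.

```r
# Hypothetical measurements for two independent groups (not the post's data).
a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
b <- c(6.3, 5.9, 7.1, 6.8, 6.4, 7.0)

# Student's two-sample t-test. var.equal = TRUE encodes the equal-variance
# assumption; the default (FALSE) would run Welch's test instead.
res <- t.test(a, b, var.equal = TRUE)

# The same statistic computed by hand from the pooled variance.
n1 <- length(a)
n2 <- length(b)
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)
t_manual <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(res)
```

The manual calculation is included only to show where the statistic comes from; the degrees of freedom are n1 + n2 - 2.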

One sample Student’s t-test

July 23, 2009

Comparison of the sample mean with a known value, when the variance of the population is not known. Consider the exercise we have just seen. An intelligence test was given to 10 subjects, and here are the results obtained. The average result of ...
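A one-sample t-test of this kind might look as follows in base R. This is only a sketch: the scores and the reference value `mu0` are hypothetical, since the excerpt does not show them.

```r
# Hypothetical test scores for 10 subjects and a hypothetical reference mean.
scores <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
mu0 <- 75

# One-sample t-test: the population variance is estimated from the sample.
res <- t.test(scores, mu = mu0)

# By hand: t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom.
n <- length(scores)
t_manual <- (mean(scores) - mu0) / (sd(scores) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value

print(res)
```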

Two sample Z-test

July 22, 2009

Comparison of the means of two independent samples, taken from two populations with known variance. We are asked to compare the average heights of two groups. The first group (A) consists of individuals of Italian nationality (the variance of the ...
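Base R has no built-in z-test, so a two-sample version is usually written by hand. The helper `z_test2` below is a hypothetical name, and the heights and variances are made-up values used only to make the sketch runnable.

```r
# Two-sample z-test for known population variances (helper name is hypothetical).
z_test2 <- function(x, y, var_x, var_y) {
  z <- (mean(x) - mean(y)) / sqrt(var_x / length(x) + var_y / length(y))
  p <- 2 * pnorm(-abs(z))  # two-sided p-value from the standard normal
  list(statistic = z, p.value = p)
}

# Hypothetical heights (cm) and hypothetical known population variances.
a <- c(172, 168, 181, 176, 169, 174, 178, 171)
b <- c(180, 175, 183, 177, 186, 179, 174, 182)
res <- z_test2(a, b, var_x = 40, var_y = 45)

print(res)
```

Because the variances are treated as known, the statistic is referred to the standard normal distribution rather than to a t distribution.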

Massively parallel database for analytics

July 22, 2009

This is by far the best description of why traditional parallel databases (like Teradata, Greenplum et al.) are an evolutionary dead end. But much more than a theoretical discussion, they have built a solution which they call HadoopDB. It is based on Hadoop, PostgreSQL, and Hive and is completely open source. Alternative, column-based backends to PostgreSQL...

One sample Z-test

July 21, 2009

Comparison of the sample mean with a known population mean and standard deviation. Suppose that 10 volunteers have taken an intelligence test; here are the results obtained. The mean obtained on the same test by the entire population is 75. You want to c...
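A one-sample z-test can be sketched by hand in a few lines. The population mean of 75 comes from the excerpt; the scores and the population standard deviation `sigma = 18` are hypothetical, as is the helper name `z_test1`.

```r
# One-sample z-test for a known population mean and standard deviation
# (helper name is hypothetical).
z_test1 <- function(x, mu, sigma) {
  z <- (mean(x) - mu) / (sigma / sqrt(length(x)))
  list(statistic = z, p.value = 2 * pnorm(-abs(z)))  # two-sided p-value
}

# Hypothetical scores for the 10 volunteers; sigma is also an assumption.
scores <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
res <- z_test1(scores, mu = 75, sigma = 18)

print(res)
```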

RGG#155, 156 and 157

July 21, 2009

I pushed 3 more graphics from Biecek Przemyslaw to the graphics gallery: a list of popular names for colors from the packages RColorBrewer, colorRamps, and grDevices; a set of examples of a few low-level graphical parameters: lend, ljoin, xpd, adj, lege...

Score with scoring rules

July 21, 2009

INCENTIVES TO STATE PROBABILITIES OF BELIEF TRUTHFULLY

We have all been there. You are running an experiment in which you would like participants to tell you what they believe. In particular, you’d like them to tell you what they believe to be the probability that an event will occur. Normally, you would ask them. But

Geometric and harmonic means in R

July 20, 2009

Compute the geometric mean and harmonic mean in R of this sequence: 10, 2, 19, 24, 6, 23, 47, 24, 54, 77. These functions are not present in the standard packages of R, although they are readily available in some add-on packages. However, it is easy to calculate the...
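One possible base R calculation for the sequence above is sketched here. The variable names are illustrative, and this is not necessarily the approach taken in the full post.

```r
x <- c(10, 2, 19, 24, 6, 23, 47, 24, 54, 77)

# Geometric mean: the n-th root of the product, computed on the log scale
# so that long vectors do not overflow.
geo_mean <- exp(mean(log(x)))

# Harmonic mean: the reciprocal of the mean of the reciprocals.
harm_mean <- 1 / mean(1 / x)

print(c(geometric = geo_mean, harmonic = harm_mean))
```

Both formulas require strictly positive values; for such data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.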

Adding a legend to a plot

July 20, 2009

It's pretty easy!

plot(c(1968,2010),c(0,10),type="n", # sets the x and y axes scales
     xlab="Year",ylab="Expenditures/GDP (%)") # adds titles to the axes
lines(year,defense,col="red",lwd=2.5) # adds a line for defense expenditures
lines(year,health,col="...
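The excerpt is cut off before the legend itself appears, so here is a self-contained sketch of how `legend()` is typically called after such a plot. The data values, the blue color for the health series, and the legend labels are all assumptions made for illustration.

```r
# Avoid writing a plot file when this is run as a script.
if (!interactive()) pdf(NULL)

# Hypothetical series standing in for the post's data, which is not shown.
year    <- seq(1968, 2010, by = 6)
defense <- c(9.0, 5.5, 4.9, 6.0, 4.6, 3.0, 3.8, 4.7)
health  <- c(0.7, 1.2, 1.6, 1.9, 2.6, 3.3, 3.6, 4.9)

# Empty plotting frame, then one line per series.
plot(c(1968, 2010), c(0, 10), type = "n",
     xlab = "Year", ylab = "Expenditures/GDP (%)")
lines(year, defense, col = "red", lwd = 2.5)
lines(year, health, col = "blue", lwd = 2.5)  # color is an assumption

# legend(): position first, then labels and the line attributes that
# match each series, in the same order.
leg <- legend("topright", legend = c("Defense", "Health"),
              col = c("red", "blue"), lwd = 2.5, lty = 1)
```

The key point is that `col`, `lwd`, and `lty` in `legend()` must be listed in the same order as the labels so each legend entry matches its line.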

Example 7.6: Find Amazon sales rank for a book

July 20, 2009

In honor of Amazon's official release date for the book, we offer this blog entry. Both SAS and R can be used to find the Amazon Sales Rank for a book by downloading the desired web page and ferreting out the appropriate line. This code is likely to br...

ggplot2: more wicked-cool plots in R

July 20, 2009

As far as I know there are 3 different systems for producing figures in R: (1) base graphics, included with R, (2) the lattice package, and (3) ggplot2, one of the newer plotting systems which is, according to the creator Hadley Wickham, "based on the grammar of graphics, which tries to take the good parts of base and lattice...

Probability exercise: negative binomial distribution

July 19, 2009

What is the probability of getting the 4th tail before the 3rd head when flipping a coin? The mathematical formula for solving this exercise, which follows a negative binomial distribution, is: $$f(x)=P(X=x)=\begin{pmatrix} x+y-1\\ y-1 \end{pmatrix} \cdot p^x ...
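Under one reading of the question as stated (a fair coin, and "before" meaning the 4th tail arrives while fewer than 3 heads have been seen), the probability can be sketched with R's negative binomial functions. The full post may set the calculation up differently, since its formula is cut off here.

```r
p_tail <- 0.5  # fair coin (assumption)

# Let X be the number of heads seen before the 4th tail.
# Then X ~ NegBin(size = 4, prob = p_tail), and the 4th tail comes
# before the 3rd head exactly when X <= 2.
p_nb <- pnbinom(2, size = 4, prob = p_tail)

# Cross-check: equivalently, at least 4 tails among the first 6 flips.
p_binom <- sum(dbinom(4:6, size = 6, prob = p_tail))

print(c(negative_binomial = p_nb, binomial_check = p_binom))  # both 22/64 = 0.34375
```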

New RInside release

July 19, 2009

I just rolled up a new release of RInside, my C++ wrapper classes which facilitate embedding R into your own C++ application. This release owes a big Thank you! to Miguel Lechón, who not only noticed errant behaviour and occasional segfaults with overly long commands sent to the embedded R, but even traced it to an oversight of mine in a simple memory buffer class...

David Varadi’s RSI(2) alternative

July 19, 2009

Here's a quick R implementation of David Varadi's alternative to the RSI(2). Michael Stokes over at the MarketSci blog has three great posts exploring this indicator: "Varadi’s RSI(2) Alternative: The DV(2)", "RSI(2) vs. DV(2)", and "Last Couple..."

A probability exercise on the Bernoulli distribution

July 18, 2009

What is the probability of obtaining the sequence HHTTTHTT when flipping a coin 8 times? (H = head; T = tail) Theory tells us that to solve this question, we can simply use the following formula: $$f(x)=P(X=x)=B(n,p)=\begin{pmatrix}n\\ x \end{pmatrix} \cd...
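The formula in the excerpt is cut off, so the following is only a sketch, assuming a fair coin, of the two quantities such a calculation can refer to: the probability of this exact ordered sequence, and the probability of the same number of heads in any order.

```r
p <- 0.5  # probability of a head on each flip (fair coin, an assumption)

flips <- strsplit("HHTTTHTT", "")[[1]]
n <- length(flips)       # 8 flips
x <- sum(flips == "H")   # 3 heads

# Exact ordered sequence: a product of independent Bernoulli trials.
p_sequence <- p^x * (1 - p)^(n - x)

# Any ordering with x heads in n flips: the binomial probability, which
# multiplies the value above by choose(n, x).
p_any_order <- dbinom(x, size = n, prob = p)

print(c(sequence = p_sequence, any_order = p_any_order))  # 1/256 and 56/256
```

The two values differ by the factor `choose(8, 3) = 56`, the number of distinct orderings of 3 heads among 8 flips.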

Let us practice with some functions of R

July 18, 2009

Given the following data set, compute the arithmetic mean, median, variance, and standard deviation; find the largest and the smallest value, the sum of all values, the square of the sum of all values, and the sum of the squares of all values; assign the ranks...
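The quantities listed in this exercise map directly onto base R functions. The data set is not shown in the excerpt, so the vector below is hypothetical.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)  # hypothetical data set

stats <- list(
  mean      = mean(x),    # arithmetic mean
  median    = median(x),
  variance  = var(x),     # sample variance (n - 1 denominator)
  sd        = sd(x),      # sample standard deviation
  largest   = max(x),
  smallest  = min(x),
  sum       = sum(x),     # sum of all values
  sq_of_sum = sum(x)^2,   # square of the sum
  sum_of_sq = sum(x^2),   # sum of the squares
  ranks     = rank(x)     # ties receive the average rank by default
)

print(stats)
```

Note the difference between `sum(x)^2` and `sum(x^2)`: the first squares the total, the second totals the squares.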

Book excerpts now posted

July 18, 2009

We've posted excerpts from the book on the book website. The excerpts include Chapter 3 (regression and ANOVA) in its entirety. This demonstrates how the entries (the generic descriptions of software functions) and the worked examples reinforce each ...

Parsing GEO SOFT files with Python and Sqlite

July 17, 2009

NCBI's GEO database of gene expression data is a great resource, but its records are very open ended. This lack of rigidity was perhaps necessary to accommodate the variety of measurement technologies, but makes getting data out a little tricky. But, a...

Simple Data Visualization

July 16, 2009

OK, so, I know I already raved about one Hadley Wickham project and how it has changed my life last week. But what can I say, the man is a genius. And if you are using R (and let’s face it, you should be) and you want simple sexy graphs made quick, the man has

Influence.ME: Simple Analysis

July 16, 2009

With the introduction of influence.ME, our new package for influential data, I’m currently writing a manual for the package. This manual will address topics for both experienced and inexperienced users. I will also present much of the content ...

Missing data, logistic regression, and a predicted values plot (or two)

July 15, 2009

miss attach miss result1

summary(result1)

Call:
glm(formula = a ~ b, family = binomial(logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8864  -1.2036   0.7397   0.9425   1.4385

Coefficients: ...

Job grade plot

July 15, 2009

This plot was created using the following R code:

plot(q9e~q8, type = "n", xlim = c(1,13), ylim = c(1,13), cex.lab=1.25, cex.axis=0.75, col.lab = "#333333", xlab = "Obama job grade", ylab = "Congressional job grade", xaxt ="n", yaxt="n", main="Obama and Co...

Example 7.5: Replicating a prettier jittered scatterplot

July 15, 2009

The scatterplot in section 7.4 is a plot we could use repeatedly. We demonstrate how to create a macro (SAS, section A.8) and a function (R, section B.5) to do it more easily.

SAS

%macro logiplot(x=x, y=y, data=, jitterwidth=.05, smooth=50);
data lp1;
set...
