This post will evaluate signals based on the rank regression hypotheses covered in the last post. The last time around, … Continue reading →

Today I am giving away 10 sessions of free, online, one-on-one R help. My hope is to get a better understanding of how my readers use R, and the issues they face when working on their own projects. The sessions will be over the next two weeks, online and 30-60 minutes each. I just purchased Screenhero, The post

After you check the distribution of the data by ploting the histogram, the second thing to do is to look for outliers. Identifying the outliers is important becuase it might happen that an association you find in your analysis can be explained by the presence of outliers. The best tool to identify the outliers is

The quest for income microdata For a separate project, I've been looking for source data on income and wealth inequality. Not aggregate data like Gini coefficients or the percentage of income earned by the bottom 20% or top 1%, but the sources used to calculate those things. Because it's sensitve personal financial data either from surveys or tax...

Introduction This should be last in the series of posts based on my R package cricketr. That is, unless some bright idea comes trotting along and light bulbs go on around my head. In this post cricketr adapts to the Twenty20 International format. Now cricketr can handle stats from all 3 formats of the game

Introduction In this post my package ‘cricketr’ takes a swing at One Day Internationals(ODIs). Like test batsman who adapt to ODIs with some innovative strokes, the cricketr package has some additional functions and some modified functions to handle the high strike and economy rates in ODIs. As before I have chosen my top 4 ODI

R allows you to create different plot types, ranging from the basic graph types like density plots, dot plots, bar charts, line charts, pie charts, boxplots and scatter plots, to the more statistically complex types of graphs such as probability plots, mosaic plots and correlograms. In addition, R is pretty known for its data visualization The post

This afternoon, we’ve seen in the training on data science that it was possible to use AIC criteria for model selection. > library(splines) > AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="log"))) 438.6314 > AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="identity"))) 436.3997 > AIC(glm(dist ~ bs(speed), data=train_cars, family=poisson(link="log"))) 425.6434 > AIC(glm(dist ~ bs(speed), data=train_cars, family=poisson(link="identity"))) 428.7195 And I’ve been asked...

I don’t understand why any researcher would choose not to use panel/multilevel methods on panel/hierarchical data. Let’s take the following linear regression as an example: , where is a random effect for the i-th group. A pooled OLS regression model for the above is unbiased and consistent. However, it will be inefficient, unless for all

e-mails with the latest R posts.

(You will not see this message again.)