Forbes: Top 20 influencers in Big Data

February 3, 2012

Haydn Shaughnessy at The Forbes blog provides a list of the "Top 20 Influencers in Big Data", and I'm humbled to report that yours truly is listed there at #2. It's an instantaneous ranking based on the social-media tracking tool Traackr, but it's still great to be listed alongside writers for SiliconAngle, GigaOM, and KDNuggets (and even Mashable!). I...

Read more »

New R User Groups in Austin, Adelaide

February 3, 2012

It's awesome to see so many local R user groups kicking off in 2011! Yet another is the Austin R User Group in Austin, Texas. They've already held their first informal get-together, and the first formal meeting on February 23 will be devoted to data management techniques in R. Props to Sandy Donlon for organizing the group! And I'm...

Read more »

Why don’t we hear more about Adrian Dantley on ESPN? This graph makes me think he was as good an offensive player as Michael Jordan.

February 3, 2012

In my last post I complained about efficiency not being discussed enough by NBA announcers and commentators. I pointed out that some of the best scorers have relatively low FG% or TS%. However, via the comments it was pointed out that top scorers need ...

Read more »

Large search spaces using R

February 3, 2012

I'm working on some really interesting stuff at the moment, the details of which I can't discuss for reasons of national security (not really). However, one of the things I've been doing a lot of is searching through lots of different combina...

Read more »

How many pages in Scott Walker Recall Petition PDF files?

February 3, 2012

Computer Assisted Reporting: In an online press release on Tuesday, the Wisconsin Government Accountability Board announced they would put all 153,335 pages of PDF copies of the Scott Walker recall petition online later that day. The GAB announced the PD...

Read more »

Green Disk Sizing

February 3, 2012

I finally got around to completing item 5 on my 2011 list concerning electrical power consumed by a magnetic hard disk drive (HDD). The semi-empirical statement is: $\text{Power} \propto N_{\text{platters}} \times \Omega^{2.8} \times D^{4.6}$ (1), where $N_{\text{platters}}$ is the number of platters on the spindle, $\Omega$ is the rotational speed in revolutions per minute (RPM) and D...
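As a rough illustration of proportionality (1), the sketch below compares two drives that differ only in spindle speed. The function name `relative_power` and the example numbers are assumptions made for illustration; the constant of proportionality is not given in the excerpt, so only ratios are meaningful.

```r
# Relative power under proportionality (1): N_platters * Omega^2.8 * D^4.6.
# The proportionality constant is unknown, so the value is only useful in ratios.
# `d` stands for the D of the excerpt (its definition is cut off there).
relative_power <- function(n_platters, rpm, d) {
  n_platters * rpm^2.8 * d^4.6
}

# Hypothetical comparison: same platter count and same D, 5400 vs 7200 RPM.
ratio <- relative_power(4, 5400, 3.5) / relative_power(4, 7200, 3.5)
ratio  # equals (5400 / 7200)^2.8, since the other factors cancel
```

Because the platter count and D cancel in the ratio, the result depends only on the speed exponent 2.8.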

Read more »

Japan Quake Map 2010-2011

February 2, 2012

1 Introduction: The 3.11 Tohoku Earthquake did serious damage to Japan. I have attempted gaining

Read more »

Commonly used R commands (statistics)

February 2, 2012

When I say Ease of Use Improved, I mean you can simply copy, paste and run the code in this post, without referring to other places and without downloading a data file and reading it into R. This is how I like a blog article to be. You don’t need to read the whole article. You

Read more »

speed of R, C, &tc.

February 2, 2012

My Paris colleague (and fellow-runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6 if you consider scilab and octave as different from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, using EM and the Baum-Welch algorithms. His conclusions are that matlab is a lot

Read more »

"R": Plotting the spectra (Gasoline) – 002

February 2, 2012

"R" has a package called "ChemometricsWithR", where we can get data from different analytical instruments, including Near Infrared (NIR). Follow the steps to plot the spectra of a gasoline data set. In this other case we plot the spectra of the NIR shootout 2002:
> data(shootout)
> wavelengths <- seq(600, 1898, by = 2)
> matplot(wavelengths, shootout$calibrate.1, xlab = "wavelength (nm)", ylab = "log(1/R)")
>...

Read more »

R Chart featured in Facebook IPO

February 2, 2012

Page 7 of Facebook's 213-page S-1 filing for their record-breaking IPO includes the following chart, under the headline: "Our Mission: To make the world more open and connected". This chart was created using the R language and Hadoop by Facebook intern Paul Butler. (Thanks to the blog IOER Tools for first noticing the inclusion of the chart.) And speaking...

Read more »

Serious Stats book and blog update

February 2, 2012

This is a quick update to announce my new blog Serious Stats. This is a companion to my forthcoming book of the same name: Baguley, T. (2012, in press). Serious stats: A guide to advanced statistics for the behavioral sciences. Basingstoke: Palgrav...

Read more »

R graphic used for Facebook IPO

February 2, 2012

Apparently, former Facebook intern Paul Butler's graphic of the Facebook social network graph is being used for Facebook's IPO. The social network graphic is featured on Page 7 of the IPO filing. His graphic was featured on mashable an...

Read more »

On linear models with no constant and R2

February 2, 2012

In econometrics courses we always tell our students that "if you fit a linear model with no constant, then you might have trouble. For instance, you might have a negative R-squared". So I tried to find databases on the internet such that, when we ...
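A minimal sketch of how a negative R-squared can arise, using made-up data rather than anything from the post: when a model is fitted through the origin but R-squared is still computed as 1 - SSE/SST with the usual mean-centered SST, the fit can be worse than simply predicting the mean. Note that `summary.lm()` switches to an uncentered total sum of squares for no-intercept models, so the value it reports stays between 0 and 1.

```r
# Made-up data: y hovers around 10 and is essentially unrelated to x.
x <- c(-2, -1, 0, 1, 2)
y <- c(10, 11, 9, 10, 10)

# Linear model with no constant (regression through the origin).
fit <- lm(y ~ 0 + x)

# "Textbook" R-squared: 1 - SSE / SST, with SST centered on mean(y).
sse <- sum(residuals(fit)^2)
sst_centered <- sum((y - mean(y))^2)
r2_centered <- 1 - sse / sst_centered

# R-squared as reported by summary.lm(), which uses sum(y^2) as the
# total sum of squares when the model has no intercept.
r2_reported <- summary(fit)$r.squared

r2_centered  # strongly negative for this data
r2_reported  # small but non-negative
```

The two numbers answer different questions: the first compares the model with a constant-only fit, the second with a model that predicts zero everywhere.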

Read more »

R meets HANA

February 2, 2012

If you read my last blog, HANA meets R, you will remember that we read data from HANA into R directly using ODBC, without having to download a .csv file. This time, we're going to read data from HANA as well, but after doing some nice tricks in R, we're going to post back the...

Read more »

Great Maps with ggplot2

February 2, 2012

The above map (and this one) was produced using R and g

Read more »

HANA meets R

February 2, 2012

In my previous HANA and R blogs, I have been forced to create .csv files from HANA and read them in R: an easy but boring procedure, especially if your R report is supposed to run on a regular basis. Having to create a .csv file every time you need to run your report is not a nice thing. After...

Read more »

tenured research position with ABC skills!

February 2, 2012

I just received this announcement for the opening of a (tenured/civil servant) position in the national research institute in biostatistics, genetics, and agronomy, INRA: Position opening with profile Approximate inference techniques in complex systems Key activities and required skills: You will develop methodological research in the field of statistical inference for models used in environmental

Read more »

Landscape Metrics with R, SDMTools, ImageJ and Bio7

February 2, 2012

01.02.2012 Landscape metrics were developed to analyze spatial patterns of landscapes (e.g. composition and spatial arrangement). In R it is possible to calculate these metrics with the “SDMTools” package. Bio7 offers an easy to use interface to R and ImageJ and can use these tools to simplify a workflow to analyze image data (e.g. vegetation

Read more »

Two courses in R programming by Ken Rice and Thomas Lumley

February 2, 2012

Ken Rice and Thomas Lumley will give a course on advanced R programming in two locations this summer. 1. In Edinburgh, June 13-15 (the week before the International Conference in Quantitative Genetics). See http://www.eisg2012.org.uk/ 2. In Seattle, July 23-25, as part of the Summer Institute in Statistical Genetics. See http://www.biostat.washington.edu/suminst/sisg/general The course is about 60% lecture and 40% lab session (BYO R),...

Read more »

Analytic applications are built by data scientists

February 1, 2012

Ventana Research analyst David Menninger was on the judging panel for the Applications of R in Business contest. In a post on the Ventana research blog, he offers his perspectives on the contest, noting that R, as a statistical package, includes many algorithms for predictive analytics, including regression, clustering, classification, text mining and other techniques. The contest submissions supported...

Read more »

Cochran Q Test for k related samples in R

February 1, 2012

To run the Cochran Q Test in R, we first need to install a package for it, since the test is not built into R. The package is RVAideMemoire, authored by Maxime Hervé. Here's how to do it. Code: Here we are installing the package named RVAideMem...
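For readers who want to see what the test computes, here is a base-R sketch of the Cochran Q statistic. It is not the RVAideMemoire implementation (the post's own code is cut off in the excerpt); the function name `cochran_q` and the small data set are hypothetical.

```r
# Cochran's Q for a subjects-by-treatments matrix of 0/1 responses.
# Q = (k - 1) * (k * sum(C_j^2) - N^2) / (k * N - sum(R_i^2)),
# compared against a chi-squared distribution with k - 1 degrees of freedom.
cochran_q <- function(m) {
  k <- ncol(m)            # number of related samples (treatments)
  col_tot <- colSums(m)   # successes per treatment, C_j
  row_tot <- rowSums(m)   # successes per subject, R_i
  n_tot <- sum(m)         # total number of successes, N
  q <- (k - 1) * (k * sum(col_tot^2) - n_tot^2) /
    (k * n_tot - sum(row_tot^2))
  list(statistic = q,
       df = k - 1,
       p.value = pchisq(q, df = k - 1, lower.tail = FALSE))
}

# Hypothetical data: 6 subjects (rows), 3 treatments (columns).
responses <- matrix(c(1, 1, 0,
                      1, 0, 0,
                      1, 1, 1,
                      1, 0, 0,
                      1, 1, 0,
                      0, 0, 0), ncol = 3, byrow = TRUE)

result <- cochran_q(responses)
result$statistic
result$p.value
```

The chi-squared reference distribution is an approximation, so for very small samples the p-value should be read with caution.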

Read more »

Transformation of Several Variables in a Dataframe

February 1, 2012

This is how I transform several columns of a dataframe, e.g., from count data into binary-coded data (this would also apply to any other conversion). count1
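The post's own code is truncated in this excerpt, so the following is only a sketch of one common way to do such a conversion in base R. The data frame and the column name `count2` are made up for illustration; `count1` echoes the name visible in the excerpt.

```r
# Hypothetical data frame with two count columns and one label column.
df <- data.frame(
  site   = c("a", "b", "c"),
  count1 = c(0, 3, 1),
  count2 = c(2, 0, 0)
)

# Columns to convert; any other columns are left untouched.
count_cols <- c("count1", "count2")

# Apply the same transformation to each selected column:
# 1 if the count is positive, 0 otherwise.
df[count_cols] <- lapply(df[count_cols], function(v) as.integer(v > 0))

df
```

Swapping the anonymous function for another one (for example `log1p` or `scale`) applies a different conversion to the same set of columns.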

Read more »

Vectorized R vs Rcpp

February 1, 2012

In my previous post, I tried to show that Rcpp is 1000 times faster than pure R, and that generated a fuss in the comments. Being lazy, I didn’t vectorize the R code, so in the end I was comparing apples and oranges. To fix that problem, I built a new script, where I’m trying to compare
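To make the "apples vs oranges" point concrete, here is a small sketch of the difference between a loop-based and a vectorized R function. The sum-of-squares task is an assumption chosen for illustration and is not the benchmark from the post; Rcpp itself is left out so the example runs without a compiler.

```r
# Loop version: accumulate the sum of squares one element at a time.
sum_sq_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

# Vectorized version: the arithmetic and the reduction both run in
# R's compiled internals instead of the interpreter loop.
sum_sq_vec <- function(x) sum(x^2)

x <- as.numeric(1:1000)

# Both versions must agree before their speed is compared.
stopifnot(isTRUE(all.equal(sum_sq_loop(x), sum_sq_vec(x))))

# Rough timings; the numbers depend on the machine and the R version.
system.time(for (i in 1:200) sum_sq_loop(x))
system.time(for (i in 1:200) sum_sq_vec(x))
```

A fair comparison against compiled code would use the vectorized version as the pure-R baseline.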

Read more »

Are Recessions Environmentally Beneficial?

February 1, 2012

Description: Total energy consumption in the United States by sector. Vertical gray lines represent periods of recession. Data: http://www.eia.gov/totalenergy/data/annual/index.cfm#consumption http://en.wikipedia.org/wiki/List_of_recessions_in_the_Uni...

Read more »

MAT886 mean excess function (and reinsurance)

February 1, 2012

Tomorrow, in the course on extreme value, we will focus on applications. We will discuss reinsurance pricing. Consider a random variable, a threshold, and define the mean excess function. This function is known in life insurance as the average ...
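The symbols are missing from the excerpt, so as an assumption write X for the random variable and u for the threshold; the mean excess function is then usually defined as e(u) = E[X - u | X > u]. A minimal sketch of its empirical counterpart, with a hypothetical function name and toy losses:

```r
# Empirical mean excess: average amount by which observations above
# the threshold u exceed it. Returns NA when nothing exceeds u.
mean_excess <- function(x, u) {
  exceed <- x[x > u]
  if (length(exceed) == 0) {
    return(NA_real_)
  }
  mean(exceed - u)
}

# Toy losses (made up): only 4 and 5 exceed the threshold 3,
# by 1 and 2 respectively.
losses <- c(1, 2, 3, 4, 5)
mean_excess(losses, 3)  # 1.5
```

Evaluating `mean_excess` over a grid of thresholds gives the mean excess plot commonly used to inspect the tail of a loss distribution.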

Read more »

"R": Looking at the Data (Gasoline) – 001

February 1, 2012

Like other software, "R" has nice tools for looking at the data before developing the calibration. Statistics for the "Y" variable (in this case octane number) such as maximum, minimum, ..., standard deviation, ... are important:
> library(ChemometricsWithR)
> data(gasoline)
> summary(gasoline$octane)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  83.40   85.88   87.75   87.18   88.45   89.60
> sd(gasoline$octane)
1.530078
And of course the histogram:
> hist(gasoline$octane)

Read more »

Confirming SSR, SSE, and SST using matrix in R

February 1, 2012

The code below was written in our regression laboratory class. Here, we first run the data in SPSS and take the ANOVA output, where we can find the computed values of SSR, SSE, and SST. ANOVAb Model Sum of Squares df Mean Square F Sig. 1 Regress...
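The post's data and code are cut off in this excerpt, so the sketch below uses R's built-in `cars` data set as a stand-in. It computes the three sums of squares as quadratic forms in the hat matrix H and the all-ones matrix J, then cross-checks them against `lm()`; the variable names are illustrative.

```r
# Stand-in data: stopping distance regressed on speed (built-in `cars`).
y <- cars$dist
X <- cbind(1, cars$speed)   # design matrix with an intercept column
n <- length(y)

H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix
J <- matrix(1, n, n)                   # n x n matrix of ones
I <- diag(n)                           # identity matrix

# Sums of squares as quadratic forms in y.
SST <- drop(t(y) %*% (I - J / n) %*% y)  # total
SSR <- drop(t(y) %*% (H - J / n) %*% y)  # regression
SSE <- drop(t(y) %*% (I - H) %*% y)      # error

# Cross-check against the fitted model.
fit <- lm(dist ~ speed, data = cars)
c(SSR = SSR, SSE = SSE, SST = SST)
anova(fit)
```

The `Sum Sq` column of `anova(fit)` should show the same SSR and SSE, and the two add up to SST.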

Read more »

R Training Course in the Bay Area

February 1, 2012

An introduction to R for software developers and data analysts. Saturday, March 10th, 2012, 8:30-5:00pm, eBay, 2161 North 1st Street, San Jose, California. I will be presenting a one day professional development workshop on R programming for software developers and …

Read more »