Structural Equation Modeling: Separating the General from the Specific (Part II)

August 26, 2012

As promised in Halo Effects and Multicollinearity (my last post), I will show how to run a confirmatory factor analysis in R to test our bifactor model.  In addition, I will include a dependent variable and fit a structural equation mode...

A Choropleth Map of Free and Reduced Price Lunch in R

August 26, 2012

Charles Blow has an excellent op-ed in the New York Times about public education this week. The most important point he makes is that the defunding of public education is coming at precisely the time when American school children are most vulnerable: No...

Walmart Invasion

August 26, 2012

As an invasion biologist, the process of spatial spread is at the heart of what I do. When I came across this dataset of Walmart store openings since 1962 I couldn’t help but see it as an invasion front which looks a lot like a biological invasion or (albeit slow) epidemic. The video shows monthly

Kaggle Prospect – Harvard Business Review

August 25, 2012

This post is meant for submitting a visual analysis for the Harvard Business Review Contest on Kaggle. I used the subject lines for all the articles and all the years and mapped the articles into one of the following 18 categories: Business Ethics, Business Management, Crisis, Emerging Markets, Financial Performance, Health Care, Information Technology, Labor, Leadership, Management Systems, Marketing Strategy, Regulation, Social Media, Stock Market, Strategic Planning, Supply Chain, United States & World, Women & Management. Changes in...

Economic geography of the eastern USA circa 1999, median incomes…

August 25, 2012

Economic geography of the eastern USA circa 1999, median incomes by zip code. Code and data source to follow in a longer post.

Why R for Mass Spectrometrist and Computational Proteomics

August 25, 2012

Why R: Integrating the statistical analysis of the resulting data with in silico predictions is common practice, both in your manuscripts and in your daily research. Mass spectrometrists, biologists and bioinformaticians c...

Love for ProjectTemplate

August 25, 2012

The advantage of writing a blog post about the tools you wish you’d used throughout grad school is that, well, it makes you check them out. I went through the ProjectTemplate tutorial, and I’m hooked. Here are the advantages as …

London 2012 Olympics — Medals vs GDP and population

August 25, 2012

It’s already midnight. I’m sitting near my bed. And before going to bed, I’ll type my last post on the London 2012 Olympics. Olympic games are not only individual competitions, but also reflections of countries’ strength. This is one reason why Olympics data …

Exporting ctree object to Asymptote

August 25, 2012

When producing regression or classification trees (standard rpart, or ctree from the party package) in GNU R, I am often unsatisfied with the default plots they produce. One of many possible solutions is to export a tree plot to Asymptote. The ...

Count data and GLMs: choosing among Poisson, negative binomial, and zero-inflated models

August 24, 2012

Ecologists commonly collect data representing counts of organisms. Generalized linear models (GLMs) provide a powerful tool for analyzing count data. The starting point for count data is a GLM with Poisson-distributed errors, but

Commandeering a map from PDF or EPS, using Inkscape and R

August 24, 2012

I love Nathan Yau’s tutorial on making choropleths from an SVG file. However, if you don’t have an SVG handy already and instead you want to repurpose a map from another vector format such as PDF or EPS, there are …

Toy Example with GScholarScraper_3.1

August 24, 2012

A commenter on my blog brought up this nice idea of how to use the GScholarScraper function for bibliometrics. I altered the code a little so that you can set a year from which results should be returned, and added a field to the output collectin...

MPK Analytics – putting the R into analytics

August 24, 2012

Welcome to the blog of MPK Analytics – the consulting and training company whose mission it is to help clients in academia, industry and government to transform their data into insight using

Does playing baseball shorten your lifespan? (Answer: No.)

August 24, 2012

A National Institute for Occupational Safety and Health study, published in March, found that professional American football (NFL) players lived longer, on average, than similar "mere mortals" in the general population. Football is a dangerous sport, so that might seem surprising at first, until you consider the fact that NFL players are elite sportsmen: only the strongest, fastest and...

Finding the Best Subset of a GAM using Tabu Search and Visualizing It in R

August 24, 2012

Finding the best subset of variables for a regression is a very common task in statistics and machine learning. There are statistical methods based on asymptotic normal theory that can help you decide whether to add or remove a variable at a time. The ...

Visualizing the Arctic Sea Ice Extent Decline

August 24, 2012

Understanding what is happening to Arctic sea ice is critical to recognizing the serious consequences of global warming. So I want to help people visualize the 30+ year trend in Arctic sea ice extent. The source data file is here: …

CRAN might get tenure at Yale?

August 24, 2012

From one of the R lists I follow: Today (2012-08-23) on CRAN: “Currently, the CRAN package repository features 4001 available packages.” These packages are maintained by approximately 2350 different folks.

Previous milestones:
2011-05-12: 3,000 packages
2009-10-04: 2,000 packages
2007-04-12: 1,000 packages
2004-10-01: 500 packages
2003-04-01: 250 packages

http://cran.r-project.org/web/packages/

Data analysis using R – course in Essex

August 24, 2012

This course is running 1-5 October at the University of Essex. There doesn’t seem to be a website, but you can register by writing to [email protected]. Here’s what they say in their e-mail: Lecturers: Dr Werner Adler (University of Erlangen-Nuremberg; Co-author …

ggplot2 Self-deprecation

August 24, 2012

I've been in China working for a few weeks (where this blog is (oddly) blocked). So, I haven't been able to post much over the summer. To kick things off for the new (academic) year, I thought I might just re-post something good I saw on the Book of Sa...

Comparing hist() and cut() R functions

August 24, 2012

The other day a question about faceting data came up in the Dallas R Users group (link of conversation). The hist() function is more efficient and uses less memory than the cut() function. Additionally, hist() returns an object that makes...
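
To make the comparison concrete, here is a minimal sketch (not the code from the original thread) that bins the same vector both ways using base R. The data, the breaks, and the variable names are assumptions chosen for illustration. It only shows that the two approaches can produce the same counts; it does not measure speed or memory.

```r
set.seed(42)
x <- runif(1000, min = 0, max = 10)   # illustrative data, not from the post
breaks <- seq(0, 10, by = 2)          # five equal-width bins

# hist() with plot = FALSE returns the bin counts without drawing anything.
hist_counts <- hist(x, breaks = breaks, plot = FALSE)$counts

# cut() builds a factor with one label per observation; table() tallies it.
# include.lowest = TRUE matches the default behaviour of hist().
cut_counts <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))

# Both approaches should agree bin by bin.
all(hist_counts == cut_counts)
```

Note that cut() materialises a label for every observation before anything is counted, whereas hist() goes straight to the per-bin totals, which is one plausible reason for the difference the post describes.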

Momentum with R: Part 1

August 23, 2012

Time really flies… it is hard to believe that it has been over a month since my last post. Work and life in general have consumed much of my time lately and left little time for research and blog posts. Anyway, on to the post! This post will be the first in a series of …

Revolution Analytics receives Top Innovator award for Data Science Technology

August 23, 2012

A big thank-you to all the R users out there who voted for Revolution R Enterprise in the DataWeek Awards. We're so pleased to be recognized by the voters and the DataWeek judging panel with the Top Innovator Award for Data Science Technology. We're looking forward to the awards ceremony next week at DataWeek SF (in San Francisco, September 24-27)....

difference between NA and NaN in R

August 23, 2012

We usually see NA and NaN in R. What's the difference between them? Here is a good post on that topic: http://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na

In summary: NaN ("Not a Number") is the result of an undefined numeric operation such as 0/0. NA ("Not Available") is generally interpreted as a missing value and has various forms - NA_integer_, NA_real_, etc. Therefore, NaN ≠ NA, and there is a need for both NaN and NA. is.na() returns TRUE for both NA...
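
The distinction can be checked directly at the R console. The snippet below is a minimal sketch using only base R; the variable names are illustrative and do not come from the linked post.

```r
# NaN arises from undefined numeric operations; NA marks a missing value.
nan_value <- 0 / 0        # NaN
missing_value <- NA       # logical NA

# is.na() is TRUE for both, while is.nan() is TRUE only for NaN.
is.na(nan_value)          # TRUE
is.na(missing_value)      # TRUE
is.nan(nan_value)         # TRUE
is.nan(missing_value)     # FALSE

# Typed missing values exist for the atomic vector types.
typeof(NA_integer_)       # "integer"
typeof(NA_real_)          # "double"
```

In practice this means is.na() is the safe check when either kind of value should be treated as missing, and is.nan() is the one to use when the two need to be told apart.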

Bonds Much Sharpe -r Than Buffett

August 23, 2012

Mebane Faber’s post Buffett’s Alpha points out Warren Buffett’s 0.76 Sharpe Ratio discussed in the similarly titled paper Buffett’s Alpha. I of course immediately think about the 8th Wonder of the World – the US Bond Market, whose Sharpe ...

R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web

August 23, 2012

In this last post of my little series (see my latest post) on R and the web, I explain how to extract data from a website (web scraping/screen scraping) with R. If the data you want to analyze are part of a web page, for example an HTML table (or hundreds of...

Ebola in the Congo

August 23, 2012

Ebola has also appeared in the Democratic Republic of the Congo. The WHO report from 21 August reports 15 cases and 10 deaths, mostly in the town of Isiro. This outbreak is in no way related to the Ugandan outbreak …

How robust is logistic regression?

August 23, 2012

Logistic Regression is a popular and effective technique for modeling categorical outcomes as a function of both continuous and categorical variables. The question is: how robust is it? Or: how robust are the common implementations? (Note: we are using "robust" in the more standard English sense of "performs well for all inputs", not in the ...

London 2012 Olympics — Men and Women 400-metre medley

August 23, 2012

Alan brought up the suspicion surrounding Ye’s world record in the women’s 400-metre individual medley. And I quote: “Her last split caused controversy (deep suspicion of doping) as she swam it faster than the fastest male swimmer. I wonder how commonly this …

Open Research Data Processes: KMi Crunch – Hosted RStudio Analytics Environment

August 23, 2012

One of the possible barriers to widespread adoption of open notebook science is knowing where to start. Video reports of lab experiments hosted on YouTube can be easily embedded in a hosted WordPress blog; a MediaWiki wiki can be used to provide one page per experiment, with change tracking/history on each page and a shadow
