PCA to PLS modeling analysis strategy for WIDE DATA

March 2, 2013

Working with wide data is already hard enough; add row outliers and things can get murky fast. Here is an example of an analysis of a wide data set, 24 rows x 84 columns. We use imDEV, written in R, to calculate and visualize a principal components analysis (PCA) on this data set. We find that

Read more »

About This Blog

March 2, 2013

My name is Isaac and I'm a Ph.D. student in Clinical Psychology. Why am I writing about fantasy football and data analysis? Because fantasy football involves the intersection of two things I love: sports and statistics. With this blog, I...

Read more »

Percentage Winner

March 2, 2013

I know, it has often been said by people much brighter and more competent than I will ever be: the least important figure to look at is the Percentage Winner. Personally I find it very challenging and difficult to follow a system - or even provide see...

Read more »

PowerBuilder and R get together

The other day I was thinking about writing a blog using PowerBuilder, but couldn't decide which other technology I should integrate it with...of course...R came to my mind...My journey started around 4 days ago...when I started looking for ways to call R ...

Read more »

RcppArmadillo 0.3.800.0

March 2, 2013

A new Armadillo version 3.800.0 is now out. Conrad picked a new numbering scheme to coincide with the relicensing from LGPL to MPL 2.0. The new version 0.3.800.0 of the corresponding RcppArmadillo package (which still uses GPL 2 or later) is now on ...

Read more »

Adding Labels to Points in a Scatter Plot in R

What’s the Scatter? A scatter plot displays the values of 2 variables for a set of data, and it is a very useful way to visualize data during exploratory data analysis, especially (though not exclusively) when you are interested in the relationship between a predictor variable and a target variable.  Sometimes, such data come with categorical

Read more »

The options mechanism in R

March 2, 2013

Customization in R. Basics Several features benefit from being customizable — either because of personal taste or specifics of the environment. The way R implements this flexibility is through the options function.  This both sets and reports options.  For example, we can see the names of the options that are set by default: > names(options()) The post The...

Read more »

making a random walk geometrically ergodic

March 1, 2013

While a random walk Metropolis-Hastings algorithm cannot be uniformly ergodic in a general setting (Mengersen and Tweedie, AoS, 1996), because it needs more energy to leave far away starting points, it can be geometrically ergodic depending on the target (and the proposal). In a recent Annals of Statistics paper, Leif Johnson and Charlie Geyer designed

Read more »

Tools for making a paper

March 1, 2013

Since it seems to be the fashion, here’s a post about how I make my academic papers. Actually, who am I trying to kid? This is also about how I make slides, letters, memos and “Back in 10 minutes” signs to pin on the door. Nevertheless it’s for making academic papers that I’m going to

Read more »

R 2.15.3 is released

March 1, 2013

Follows is the announcement today from Peter Dalgaard, for the R Core Team: The build system rolled up R-2.15.3.tar.gz (codename “Security Blanket”) at 9:00 this morning. This is intended to be the final round-up release of the 2.15 series, and in fact of the entire 2.x.y series which started 2004-10-04. The list below details the changes in this release. You can get...

Read more »

Using ENCODE methylation data (RRBS) in R

March 1, 2013

The ENCODE project has generated reduced-representation bisulfite sequencing data for multiple cell lines. The data is organized in an extended BED format with additional columns denoting % methylation and coverage per base. Luckily, this sort of generic...

Read more »

R 2.15.3 is released

March 1, 2013

The final installment of the R 2.x series is now available: R 2.15.3 was released this morning. If you build R yourself, the source files can be downloaded from CRAN now; pre-built binaries for Windows, Mac and Linux will be available from the various CRAN mirrors over the next few days. This update mainly fixes a few minor bugs,...

Read more »

Counts may be ratio, but not importance

March 1, 2013

I can see from those of you who have contacted me that there is still some confusion about the claims made by Sawtooth that MaxDiff estimates can be converted to ratio-scale probabilities.  Many of you seem to believe that if attribute A...

Read more »

Data Visualization: Shiny Spiced Consulting

March 1, 2013

If you haven’t already heard, RStudio has developed an incredibly easy way to deploy R on the web with its Shiny Package. For those who have heard, this really isn’t new as bloggers have already been blogging about it for some … Continue reading → The post Data Visualization: Shiny Spiced Consulting appeared first on Data Community DC.

Read more »

Overlapping Histogram in R

March 1, 2013

While preparing a class exercise involving overlaid histograms, I searched Google for an article or discussion on the topic. Luckily, I found a blog where the author demonstrated an R function to create an overlapping histogram...

Read more »

Data Visualization: Shiny Spiced Consulting

March 1, 2013

If you haven't already heard, RStudio has developed an incredibly easy way to deploy R on the web with its Shiny Package. For those who have heard, this really isn't new as bloggers have already been blogging about it for some months now, but I have primarily seen a focus on how to build Shiny apps, and feel...

Read more »

Using Rcpp with Boost.Regex for regular expression

March 1, 2013

Gabor asked about Rcpp use with regular expression libraries. This post shows a very simple example, based on one of the Boost.RegEx examples. We need to set linker options. This can be as simple as Sys.setenv("PKG_LIBS"="-lboost_regex") With that, the following example can be built: // cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp #include <Rcpp.h> #include <string> #include <boost/regex.hpp> bool validate_card_format(const std::string& s) { static const boost::regex e("(\\d{4}){3}\\d{4}"); ...

Read more »

Shading and Points with xtsExtra plot.xts

February 28, 2013

For some reason, I feel like I have much better control with the plot.xts function from the xtsExtra package described here than with some of the other more refined R graphical packages. Maybe it is just my simple mind, but recently I wanted to shade holding per...

Read more »

ETS models now in EViews 8

February 28, 2013

The ETS modelling framework developed in my 2002 IJF paper (with Koehler, Snyder and Grose), and in my 2008 Springer book (with Koehler, Ord and Snyder), is now available in EViews 8. I had no idea they were even working on it, so it was quite a surprise to be told that EViews now includes ETS models. Here is the blurb...

Read more »

Statistical computation in JavaScript — am I nuts?

February 28, 2013

Over the past couple weeks, I’ve been considering alternatives to R. I’d heard Python was much faster, so I translated a piece of R code with several nested loops into Python (it ran an order of magnitude faster). To find out more about Mathematica 9, I had an extended conversation with some representatives from Wolfram

Read more »

Shapefiles in R

February 28, 2013

Let's learn how to use Shapefiles in R. This will allow us to map data for complicated areas or jurisdictions like zipcodes or school districts. For the United States, many shapefiles are available from the Census Bureau. Our example will map U.S. nati...

Read more »

EPL Table Motion Chart

February 28, 2013

The Shiny package provides great user interactivity, and another boost to its attractiveness has come with its integration with googleVis. Markus Gesmann provides some background in a blog article with coded examples which he, along with fellow googleVis creator Diego de Castillo and lead Shiny developer Winston Chang, has furnished. There are at least three

Read more »

Using R in LaTeX with knitr and RStudio

February 28, 2013

Hi, today I presented at the INSEE R user group (FLR) how to use knitr (an evolution of Sweave) for writing documents which are self-contained with respect to the source code: your data changed? No big deal, just compile your .Rnw file again and you are done with an updated version of your paper! is easy. Some

Read more »

Summary of My First Trip to Strata #strataconf

February 28, 2013

In this post I am going to summarize some of the things that I learned at Strata Santa Clara 2013. For now, I will only discuss the conference sessions as I have a much longer post about the tutorial sessions that I am still working on and will post at a later date. I will add to this post...

Read more »

Slides from "Big Data Real Time Predictive Analytics"

February 28, 2013

At Tuesday's Data Driven Business Day at the Strata conference I gave my talk, Real-time Big Data Predictive Analytics: From Deployment to Production. My goal in the talk was to explain the buzz-phrases "real time", "big data" and "predictive analytics" in the context of a specific example: why are some web ads today uncannily targeted at our personal interests...

Read more »

Pollination effectiveness landscape

February 28, 2013

I want to show you a pollination landscape. This is not a pollinator landscape with flowers and nesting sites, but a plot showing two components of pollination: quantity and quality. A recent paper by Pedro Jordano (see here for other … Continue reading →

Read more »

Classifying Emails as Spam or Ham using RTextTools

February 28, 2013

Recently, I read an article on R-bloggers, titled Classifying Breast Cancer as Benign or Malignant using RTextTools, by Timothy P. Jurka, who is the author of both that article and the RTextTools package. Having reproduced the results using the...

Read more »
