Distribution of colors by flag

October 22, 2012

A story: We showed you how to use R to assess flag similarity and make a scatter plot of raster images. Dr. Wickham referred us to the set of 2400 flag icons made available by GoSquared, and then (probably jokingly) challenged us to replicate the cool...

Going to the Movies…

October 22, 2012

Today, let us have a look at movies. The Internet Movie Database (IMDb) has some data dumps available on their website. It's a subset of the information available on the IMDb site, but it's more than enough. I will spare you my code to convert these da...

Predict User’s Return Visit within a day part-2

October 22, 2012

Welcome to the second part of the series on predicting whether a user will revisit the website. In my earlier blog post, Logistic Regression with R, I discussed what logistic regression is. In the first part of the series, we applied logistic regression to an available data set. The problem statement there was whether a user will return in

Predict User’s Return Visit within a day part-1

October 22, 2012

In my earlier blog post, I discussed what logistic regression is and how a logistic model is generated in R. Now we will apply that learning to a specific prediction problem. In this post, I will create a basic model to predict whether a user will return to the website in the next 24 hours. This

Classes and Objects in R

October 21, 2012

Welcome back! In this blog post I'm going to try to tackle the concept of objects in R. R is said to be an “object-oriented” language. I touched on this in my last post when we discussed the concatenate function c(), and I'll go a bit beyond that this time. Speaking of the c() function, I'll begin this...
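
As a minimal illustration of the ideas the post introduces (not the author's own code), the snippet below shows how c() coerces mixed inputs to one common type, how class() and typeof() report what an object is, and how an S3 class is simply a class attribute with a method attached. The class name `myclass` and the method `print.myclass` are hypothetical names chosen for the example.

```r
# c() combines values; mixed inputs are coerced to one common type
v_num  <- c(1, 2, 3)          # numeric
v_chr  <- c(1, "a", TRUE)     # coerced to character: "1" "a" "TRUE"
v_num2 <- c(TRUE, FALSE, 2)   # logicals coerced to numeric: 1 0 2

class(v_num)    # "numeric"
class(v_chr)    # "character"
class(v_num2)   # "numeric"

# A data frame is a list underneath, carrying the class "data.frame"
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
class(df)       # "data.frame"
typeof(df)      # "list"

# An S3 class is just a class attribute on an ordinary object
# ("myclass" and print.myclass are hypothetical names for illustration)
obj <- structure(list(name = "demo", values = 1:3), class = "myclass")
print.myclass <- function(x, ...) {
  cat("myclass object:", x$name, "with", length(x$values), "values\n")
  invisible(x)
}
print(obj)      # dispatches to print.myclass
```

Because print() is a generic, defining a function named print.<class> is enough for R to pick it up for any object carrying that class.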

Logistic Regression with R

October 21, 2012

In my first blog post, I explained what regression is and how a linear regression model is generated in R. In this post, I will explain what logistic regression is and how a logistic regression model is generated in R. Let’s first understand logistic regression. Logistic regression is one of the
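
A minimal sketch of fitting a logistic regression in base R with glm() and family = binomial. It uses the built-in mtcars data as a stand-in (the post's own data set is not part of this excerpt) and checks that applying the logistic function 1 / (1 + exp(-eta)) to the linear predictor eta gives the same probabilities as predict(..., type = "response").

```r
# Stand-in data: mtcars$am is 1 for a manual gearbox, 0 for automatic
data(mtcars)

# Logistic regression: binomial family, logit link (the default for binomial)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit)

# The linear predictor is on the log-odds scale: eta = b0 + b1 * wt
eta <- predict(fit)

# The logistic (inverse-logit) function maps log-odds to probabilities
p_manual <- 1 / (1 + exp(-eta))
all.equal(p_manual, predict(fit, type = "response"))

# Predicted probability of a manual gearbox for a car with wt = 3 (3,000 lb)
predict(fit, newdata = data.frame(wt = 3), type = "response")

# Odds ratio for a one-unit (1,000 lb) increase in weight
exp(coef(fit)["wt"])
```

The coefficients are on the log-odds scale, which is why exponentiating a slope gives an odds ratio rather than a change in probability.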

Basics of JavaScript and D3 for R Users

October 21, 2012

Hadley Wickham, creator of the ggplot2 R package, has been learning JavaScript and its D3 library for the next iteration of ggplot2 (tentatively titled r2d3?)… so I suspect it’s only a matter of time before he pulls the rest of the …

Player timelines with ggplot

October 21, 2012

Timelines can be quite a handy way of getting an overview of a player’s career in terms of when they played, with which team, and who their contemporaries were. As is often the case, I turned to Stack Overflow to set me on my way for an R solution. In this instance, I did not take

ggmcmc – diagnostic plots for MCMC with ggplot2

October 21, 2012

Xavier Fernández i Marín, who maintains the jags package on Gentoo Linux, writes to tell me he is developing the R package ggmcmc. This package is for visualizing Markov Chain Monte Carlo output using ggplot2 graphics and should complement the …

Looking to the PCA scores with GGobi

October 21, 2012

In this post I continue the unsupervised exploration of oil spectra that we began in the previous post (PCA with "ChemoSpec" – 001). In the manual "ChemoSpec: An R Package for Chemometric Analysis of Spectroscopic Data" (page 23), there is a brie...

Momentum in R: Part 2

October 20, 2012

Many of the sites I linked to in the previous post have articles or papers on momentum investing that investigate the typical ranking factors: 3-, 6-, 9-, and 12-month returns. Most (not all) of the articles seek to find which is the “best” look-back period for ranking the assets. Say that the outcome of …
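
To make the look-back idea concrete, here is a small sketch that ranks assets by trailing 3-, 6-, 9- and 12-month returns. The prices are simulated and the asset names `A` to `D` are hypothetical; the trailing return is taken simply as P_now / P_(now - k) - 1 from month-end prices, which may differ from how the linked articles define it.

```r
set.seed(42)
assets   <- c("A", "B", "C", "D")   # hypothetical asset names
n_months <- 24

# Simulated month-end prices (rows = months, columns = assets)
prices <- sapply(seq_along(assets), function(j)
  100 * cumprod(1 + rnorm(n_months, mean = 0.005 * j, sd = 0.04)))
colnames(prices) <- assets

# Trailing k-month return measured from the most recent month-end
lookback_return <- function(prices, k) {
  n <- nrow(prices)
  prices[n, ] / prices[n - k, ] - 1
}

lookbacks <- c(3, 6, 9, 12)

# rank(-r): rank 1 = strongest momentum (highest trailing return)
ranks <- sapply(lookbacks, function(k) rank(-lookback_return(prices, k)))
colnames(ranks) <- paste0(lookbacks, "m")
ranks   # one column of ranks per look-back period
```

Comparing the columns shows directly how much the ranking, and therefore the portfolio picked from it, depends on the look-back choice.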

Le Monde puzzle (rainy Sunday!)

October 20, 2012

On October 14, the weekend edition of Le Monde had the following puzzle: consider four boxes that contain all integers between 1 and 9999 in such a way that, for any N, the numbers N, 2N, 3N, and 4N are in four different boxes. If 1, 2, 3, and 4 are in boxes labelled 1, 2, 3, and 4, respectively, in

Carl Morris Symposium on Large-Scale Data Inference (2/3)

October 20, 2012

Continuing the summary of last week’s symposium on statistics and data visualization (see part 1 and part 3)… Here I describe Dianne Cook’s discussion of visual inference, and Rob Kass’ talk on statistics in cognitive neuroscience. [Edit: I've added a few …

Recoding Variables in R: Pedagogic Considerations

October 20, 2012

I was creating a dataset last week in which I had to partition the observed responses to show how the ANOVA model partitions the variability. I had the observed Y (in this case, prices for 113 bottles of wine), …
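
The partition the post describes can be sketched in a few lines: each response is split into grand mean + group effect + residual, and the matching sums of squares add up to the total (SS_total = SS_between + SS_within). The built-in PlantGrowth data (plant weight by treatment group) stands in here for the wine prices, which are not included in this excerpt.

```r
# Stand-in data: PlantGrowth$weight by PlantGrowth$group (3 treatment groups)
y <- PlantGrowth$weight
g <- PlantGrowth$group

grand_mean <- mean(y)
group_mean <- ave(y, g)              # each observation's own group mean
effect     <- group_mean - grand_mean
residual   <- y - group_mean

# Every observation decomposes as grand mean + group effect + residual
all.equal(y, grand_mean + effect + residual)

# Sums of squares: total = between-group + within-group
ss_total   <- sum((y - grand_mean)^2)
ss_between <- sum(effect^2)
ss_within  <- sum(residual^2)
all.equal(ss_total, ss_between + ss_within)

# The same between/within sums of squares appear in R's ANOVA table
anova(lm(y ~ g))
```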

PCA with "ChemoSpec" – 001

October 20, 2012

In my last post about the "ChemoSpec" package (Hierarchical Cluster Analysis (ChemoSpec) - 02), we saw two cluster groups (one for olive oil, the other for sunflower oil), as well as sub-clusters within the sunflower oil. Continuing with the manual "Che...

CFP: DMApps 2013 – Workshop on Data Mining Applications in Industry and Government, submission due by Jan 6, 2013

October 19, 2012

CALL FOR PAPERS
DMApps 2013: the International Workshop on Data Mining Applications in Industry & Government
In conjunction with PAKDD 2013, Gold Coast, Australia, April 14-17, 2013
http://dmapps2013.rdatamining.com
The 2013 International Workshop on Data Mining Applications in Industry & Government …

Introduction to Bayesian lecture: Accompanying handouts and demos

October 19, 2012

I recently posted the slides from a guest lecture that I gave on Bayesian methods for biologists/ecologists. In an effort to promote active learning, the class was not a straightforward lecture, but rather a combination of informational input from me and opportunities for students to engage with the concepts via activities and discussion of

Tidbit: Correlation and Simple Linear Regression

October 19, 2012

In business, "correlation" is used generically to mean a mutual relationship or connection between two or more things; statistically speaking, correlation is the interdependence of variable quantities. I often hear end users request information on the correlation of variables for use in prediction; what they are actually referring to is simple linear regression. I don't mean to outline all
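
One way to see the link the post points at: for simple linear regression with an intercept, the fitted slope equals the correlation rescaled by the two standard deviations, b1 = r * sd(y) / sd(x), and R-squared equals r^2. The sketch below checks both on R's built-in cars data (stopping distance vs. speed), chosen here purely as an example.

```r
# Example data: cars$speed (mph) and cars$dist (stopping distance, ft)
x <- cars$speed
y <- cars$dist

r   <- cor(x, y)   # Pearson correlation: unitless and symmetric in x and y
fit <- lm(y ~ x)   # simple linear regression of y on x

# The regression slope is the correlation rescaled by the standard deviations
slope_from_r <- r * sd(y) / sd(x)
all.equal(unname(coef(fit)[2]), slope_from_r)

# In simple linear regression, R-squared equals the squared correlation
all.equal(summary(fit)$r.squared, r^2)

# Only the regression yields a prediction rule: y_hat = b0 + b1 * x
predict(fit, newdata = data.frame(x = 21))
```

Correlation alone says how strongly the two variables move together; the regression adds the intercept and units needed to turn that into a prediction.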

Because it’s Friday: 7 billion-person ‘continents’

October 19, 2012

The population of the world has been over 7 billion for about a year now. But those seven billion aren't distributed equally around the globe. 1.2 billion people, about …, live in India alone (despite it having just 2% of the world's land area). At the other end of the spectrum, the entire continent of Australia houses about 0.3% of the world's population....

Venturing in text mining with ‘R’

October 19, 2012

Background: Hello friends, I hope all of you are doing great. I decided to make my footprint in the blog space out of a desire to share a few very basic steps of text mining with all of you. I am neither a nerd nor a statistician nor an established data scientist, and if you are one of them, well,...

Stella Copeland’s Intro to Mixed Models in R

October 19, 2012

In D-RUG today, Stella Copeland gave a quick introduction to mixed models in R. Here’s the script that she presented. Get the data file for this script here. Stella also recommends this paper by Ben Bolker as a quick introduction to the topic.

Up and Coming R User Group meetings

October 19, 2012

Mango Solutions are pleased to announce the forthcoming R user group meetings that they will be hosting or participating in. To attend, please see the registration information on the relevant websites:

1. Greater Boston useR Group (http://www.meetup.com/Boston-useR/)
Date: Tuesday 23rd October
Venue: IBM Cambridge, 1 Rogers Street, Cambridge, MA
Time: 6.15pm
Presentation: Creating and Designing Applications...

Soccer is all about money (?) – Part 3: More plots & analyses

October 19, 2012

Let's play around a bit more with the dataset we built in Part 1 of this series. Now we are going to compare data from more championships in Europe. Let's check out the first divisions from the following countries:
- Germany (1. Bundesliga)
- England (Premier League)
- Spain (Primera División)
- Italy (Serie A)
- France (Ligue 1)
If you want to replicate the...

Adding a background to your ggplot

October 19, 2012

I really enjoy using the DW-NOMINATE data for examples, as I do here. Sometimes it’s useful to indicate regions in the background of a plot — perhaps two-dimensional regions of interest, perhaps one-dimensional periods in time. It’s...

Visualizing colors()

October 19, 2012

The other day I learnt about the existence of the colors() vector in R, which lists all the built-in named colors such as “lightblue”, “black”, etc. So I made a simple plot to visualize them all. Here’s the code:
mat <- matrix(1:length(colors()), ncol = 9, byrow = TRUE)
df <- data.frame(col = colors(), ...

Company Valuation using Discounted Cash Flows

October 18, 2012

Today I want to show a simple example of how we can value a company using Discounted Cash Flow (DCF) analysis. The idea is to compute the company’s intrinsic value based on its discounted future cash flows. To compute future cash flows, I will use the historical Free Cash Flow growth rate. To compute the present value of
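
A minimal DCF sketch under stated assumptions: free cash flow grows at a rate estimated from historical FCF (here a compound annual growth rate), each projected year is discounted at a fixed rate r, and a Gordon-growth terminal value is added after the projection period. The terminal-value step, the 5-year horizon, the 3% terminal growth, the 10% discount rate, the function name `dcf_value`, and all cash-flow figures are hypothetical choices for illustration; the excerpt does not say how the post handles them. In formula form, V = sum over t = 1..N of FCF0 (1 + g)^t / (1 + r)^t + [FCF_N (1 + g_T) / (r - g_T)] / (1 + r)^N.

```r
# Hypothetical DCF helper: projected FCF + discounted Gordon-growth terminal value
dcf_value <- function(fcf0, g, r, years = 5, g_terminal = 0.03) {
  stopifnot(r > g_terminal)                   # perpetuity formula needs r > g_terminal
  t      <- seq_len(years)
  fcf    <- fcf0 * (1 + g)^t                  # projected free cash flows, years 1..N
  pv_fcf <- sum(fcf / (1 + r)^t)              # discount each year's cash flow to today
  tv     <- fcf[years] * (1 + g_terminal) / (r - g_terminal)  # value at end of year N
  pv_tv  <- tv / (1 + r)^years                # discount the terminal value to today
  pv_fcf + pv_tv
}

# Hypothetical historical free cash flows (oldest first)
hist_fcf <- c(100, 110, 118, 130, 141)

# Compound annual growth rate over the historical window
n_years <- length(hist_fcf) - 1
g_hist  <- (hist_fcf[length(hist_fcf)] / hist_fcf[1])^(1 / n_years) - 1

dcf_value(fcf0 = 141, g = g_hist, r = 0.10)
```

As a sanity check, with g = 0 and g_terminal = 0 the function reduces to a plain perpetuity, fcf0 / r, for any horizon.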

Using FAFSA Data to Define Competitor Density

October 18, 2012

I have been thinking a lot about how to define and discuss competition at the undergraduate level. I will save the chat on which dataset is better (ASQ, Student Clearinghouse, social media, etc.) for another day. One common question I get as an analyst in Enrollment Management is how to “define” competition. While it’s

The rapidly increasing ideology of the US Republican Party

October 18, 2012

The chart below comes by way of the is.R blog and shows the average ideology of the members of the United States House of Representatives within the Republican (red) and Democratic (blue) parties. (Other parties are shown in green.) The chart is shown as a time series, from the first US congress in 1789, to the most recent full...

Benchmarking distance calculation in R

October 18, 2012

A typical step in many data mining methods is the calculation of a distance between entities. For example, in the nearest-neighbor method it is crucial to do this calculation very efficiently, because it is the most time-consuming step of the procedure. Just imagine you want to compute the Euclidean distance between a constantly changing database...
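
The bottleneck the post describes (distances from a query point to every entry of a database) can be sketched in base R by comparing a row-by-row loop with a vectorized rowSums() version. The data are simulated and the function names `dist_loop` and `dist_vec` are hypothetical; timings vary by machine, so no numbers are quoted. Base R's dist() computes all pairwise distances, which is usually far more than a single-query nearest-neighbor search needs.

```r
set.seed(1)
db <- matrix(rnorm(10000 * 10), ncol = 10)  # simulated "database": 10,000 points, 10 dims
q  <- rnorm(10)                             # simulated query point

# Loop version: Euclidean distance from q to one row at a time
dist_loop <- function(db, q) {
  out <- numeric(nrow(db))
  for (i in seq_len(nrow(db))) out[i] <- sqrt(sum((db[i, ] - q)^2))
  out
}

# Vectorized version: rep(q, each = nrow) lines q[j] up with column j
dist_vec <- function(db, q) sqrt(rowSums((db - rep(q, each = nrow(db)))^2))

d1 <- dist_loop(db, q)
d2 <- dist_vec(db, q)
all.equal(d1, d2)

# Rough benchmark: repeat each version a few times
system.time(for (k in 1:5) dist_loop(db, q))
system.time(for (k in 1:5) dist_vec(db, q))

# Index of the nearest neighbour of q in the database
which.min(d2)
```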
