self-organizing map in R

July 19, 2012

This is my first SOM figure :) Thanks to the som package and example code from Jun Yan. Here is my code for the figure:

require(som)
rpkm <- Tx_rpkm
rpkm.f <- filtering(rpkm, lt=10, ut=30000, mmr=2, mmd=10)
# rpkm.f=log(rpkm.f+0.1) # t...

Read more »

Random Forest Variable Importance

July 19, 2012

Random forests ™ are great. They are one of the best "black-box" supervised learning methods. If you have lots of data and lots of predictor variables, you can do worse than random forests. They can deal with messy, real data. If there are lots of extraneous predictors, they have no problem. They automatically do a good job...

Read more »

A weighting function for ‘nls’ / ‘nlsLM’

July 19, 2012

Standard nonlinear regression assumes homoscedastic data, that is, all response values are normally distributed with the same variance. In the case of heteroscedastic data (i.e. when the variance depends on the magnitude of the data), weighting the fit is essential. In nls (or nlsLM of the minpack.lm package), weighting can be conducted by two different methods: 1) by supplying

Read more »
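
As a rough illustration of the weighting idea in the excerpt above, the sketch below passes a vector to the weights argument that nls accepts. The simulated data, the exponential model, the starting values, and the choice of weights 1/y^2 are all assumptions made for this example; they are not taken from the post.

```r
# Simulated heteroscedastic data (assumption): noise grows with the mean
set.seed(1)
x  <- seq(1, 10, length.out = 50)
mu <- 5 * exp(0.3 * x)
y  <- mu * (1 + rnorm(length(x), sd = 0.05))

# Unweighted fit: large responses dominate the residual sum of squares
fit_u <- nls(y ~ a * exp(b * x), start = list(a = 4, b = 0.25))

# Weighted fit: weights proportional to 1 / variance, here approximated by 1 / y^2
fit_w <- nls(y ~ a * exp(b * x), start = list(a = 4, b = 0.25),
             weights = 1 / y^2)

# Compare the parameter estimates of the two fits
rbind(unweighted = coef(fit_u), weighted = coef(fit_w))
```

With relative (multiplicative) noise like this, the weighted fit gives the small responses a say in the estimates instead of letting the largest values drive them.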

Video: knitr, R Markdown, and R Studio: Introduction to Reproducible Analysis

July 19, 2012

This post presents the video of a talk that I presented in July 2012 at Melbourne R Users on using knitr, R Markdown, and R Studio to perform reproducible analysis. I also provide links to a github repository where the R markdown examples can be examin...

Read more »

Universal portfolio, part 8

July 18, 2012

We extend the analysis of part 7 by calculating the final wealth for all tuples of 3 and 4 stocks. This is a simple extension, but it also shows the most important problem of the Universal portfolio algorithm: its exponential complexity in the number of...

Read more »

Mapping Public Opinion: A Tutorial

July 18, 2012

At the upcoming 2012 summer meeting of the Society of Political Methodology, I will be presenting a poster on Isarithmic Maps of Public Opinion. Since last posting on the topic, I have made major improvements to the code and robustness of the modeling approach, and written a tutorial that illustrates the production of such maps. This …

Read more »

Time zones

July 18, 2012

Say we have the following raw data. It consists of a timestamp and a corresponding value. There is a peak at exactly midnight (00:00:00). Each timestamp is fully specified. It contains a date, a time of day, and a time zone offset indication. In this case +0000, meaning the data is 0 hours away from the UTC time zone.

"timestamp","value"
"25-04-2012...

Read more »
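
To make the role of the offset concrete, here is a minimal sketch of parsing such a timestamp in base R. The full timestamp layout is cut off in the excerpt, so the format string "%d-%m-%Y %H:%M:%S %z" and the two example strings are assumptions, not the post's actual data.

```r
# Assumed layout: day-month-year, time of day, then a signed UTC offset (%z)
fmt <- "%d-%m-%Y %H:%M:%S %z"

# Midnight with a +0000 offset, stored as an instant and displayed in UTC
ts_utc <- as.POSIXct("25-04-2012 00:00:00 +0000", format = fmt, tz = "UTC")

# The same instant written with a +0200 offset (hypothetical second record)
ts_plus2 <- as.POSIXct("25-04-2012 02:00:00 +0200", format = fmt, tz = "UTC")

# Both strings describe the same moment once the offset is taken into account
ts_utc == ts_plus2

# The stored instant can be displayed in any zone without changing it
format(ts_utc, tz = "Australia/Melbourne", usetz = TRUE)
```

The point of the sketch is that the offset belongs to the parsing step: once the instant is stored, the display time zone is only a formatting choice.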

Course at Monash (#1)

July 18, 2012

Here are the slides for the first day of my course at Monash University, Melbourne, in the Special Lectures in Econometrics. They are very similar to the slides of my course at Wharton two years ago. (Be sure to check slide 67! If the update on slideshare works from my flat in Melbourne…)

Read more »

Gamification Quantification

July 18, 2012

Surveys become engaging when they become games, or at least, take on some of the characteristics of games.  This is the argument made by those advocating the gamification of marketing research [http://researchaccess.com/2011/12/market-researc...

Read more »

bubble plot in R

July 18, 2012

Motivated by the post from FlowingData (http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/), I made this plot with the R code below:

par(mfrow=c(3,1), mar=c(4,6,4,4))
for(ty in c("protein_coding","lincRNA","piRNA")){
    ...

Read more »

A closer look at data suggests Johns Hopkins is still the #1 US hospital

July 18, 2012

The US News best hospital 2012-2013 rankings are out. The big news is that Johns Hopkins has lost its throne. For 21 consecutive years Hopkins was ranked #1, but this year Mass General Hospital (MGH) took the top spot, displacing Hopkins to #2. Howeve...

Read more »

Preparing public data for analysis with R

July 18, 2012

In most data science applications, preparing the data is at least half the job. Finding where the data lives, figuring out how to access it, finding the right records, filtering, cleaning and transforming the data ... all of this has to be done before the statistical analysis can even begin. Fortunately, the R language has many tools for data...

Read more »

How to track Twitter unfollowers in R

July 18, 2012

I have a Twitter account and it is relatively easy to see new followers or subscribers. However, I was looking for ways to know who the unfollowers are. I have noticed that some (un)subscriptions happen in bulk, which made me think that either I tweeted some bullshit and upset a bunch of people or spam bots work

Read more »

Johns Hopkins Coursera Statistics Courses

July 18, 2012

Computing for Data Analysis
Data Analysis
Mathematical Biostatistics Bootcamp

Read more »

Monitor with R: Moisture in Sunflower Seeds Intact

July 18, 2012

I had the opportunity today to check the performance of a calibration (moisture in intact sunflower seed in reflectance). This is always an exciting moment: is the performance of the calibration for the new validation set as expected duri...

Read more »

Project Euler — problem 15

July 18, 2012

The 15th problem in Project Euler. Starting in the top left corner of a 2×2 grid, there are 6 routes (without backtracking) to the bottom right corner. How many routes are there through a 20×20 grid? Mmm… walk in the …

Read more »
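
One way to approach the counting question above, sketched in base R. This is not necessarily the approach taken in the post: it uses the standard observation that a route through an n×n grid consists of 2n steps, n of which go right, and cross-checks that against a small dynamic-programming table. The function names are made up for this example.

```r
# Closed form: choose which n of the 2n steps go right
lattice_paths <- function(n) choose(2 * n, n)

# Cross-check: routes to a node = routes to the node above + routes to the node on the left
lattice_paths_dp <- function(n) {
  counts <- matrix(1, nrow = n + 1, ncol = n + 1)  # first row and column have one route each
  for (i in seq_len(n) + 1) {
    for (j in seq_len(n) + 1) {
      counts[i, j] <- counts[i - 1, j] + counts[i, j - 1]
    }
  }
  counts[n + 1, n + 1]
}

lattice_paths(2)     # 6, matching the 2x2 example in the problem statement
lattice_paths_dp(2)  # 6 again, from the table

# The 20x20 case, printed without scientific notation
format(lattice_paths(20), big.mark = ",")
```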

50 Shades of Grey Wordcloud

July 17, 2012

Sometimes you just want to see what all the fuss is about. File this under the 'because I can' category: I proudly (?) present - a wordcloud produced from the text of E. L. James' "50 Shades of Grey". For a book which is getting all this press about bei...

Read more »

Trends in run scoring, NL edition (more R)

July 17, 2012

Last time around I used R to plot the average runs per game for the American League, starting in 1901. Now I’ll do the same for the National League. I'll save a comparison of the two leagues for my next post. A fundamental principle of programming is that code can be repurposed for different sets of data. So...

Read more »

Create an R package in under 6 minutes

July 17, 2012

Storing your favorite R functions is best done by creating your own R package. This is a quick way to get started with two example functions. Enjoy!

Read more »

R for Ecologists: Creating a Site x Species Matrix

July 17, 2012

Today I’m going to address a fairly common problem in ecology that has been coming up frequently as of late. The issue is how to create a site x species matrix for community composition analysis (i.e. ordination). Moreover, ecologists have to do …

Read more »
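
A minimal base-R sketch of the kind of reshaping described above, not necessarily the method used in the post. It assumes the raw data is in long format with one row per observation; the data frame and its column names (site, species, abundance) are invented for illustration.

```r
# Hypothetical long-format field data: one row per site/species observation
obs <- data.frame(
  site      = c("A", "A", "B", "B", "C"),
  species   = c("sp1", "sp2", "sp1", "sp3", "sp2"),
  abundance = c(3, 1, 2, 5, 4)
)

# Cross-tabulate: sites become rows, species become columns, abundances are summed;
# site/species combinations that were never observed are filled with 0
site_by_species <- as.data.frame.matrix(
  xtabs(abundance ~ site + species, data = obs)
)

site_by_species
```

The result has one row per site and one column per species, which is the layout that ordination functions generally expect.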

Plotting the Frequency of Twitter Hashtag Usage Over Time with R and ggplot2

July 17, 2012

The 20th annual ISMB meeting was held over the last week in Long Beach, CA. It was an incredible meeting with lots of interesting and relevant talks, and lots of folks were tweeting the conference, usually with at least a few people in each concurrent ...

Read more »

The R packages in a data scientist’s toolbox

July 17, 2012

John Myles White, self-described "statistics hacker" and co-author of "Machine Learning for Hackers", was interviewed recently by The Setup. In the interview, he describes some of his go-to R packages for data science: Most of my work involves programming, so programming languages and their libraries are the bulk of the software I use. I primarily program in R,...

Read more »

Hierarchical Cluster Analysis (ChemoSpec) – 01

July 17, 2012

In previous posts I have been using the ChemoSpec package with some oil data (olive and sunflower). My spectra now range from 1100 nm to 2200 nm and are raw (not treated mathematically). I want to start using the ChemoSpec package to start using the ...

Read more »

“Computing for Data Analysis” with R on coursera

July 17, 2012

Just stumbled across a course on Coursera titled “Computing for Data Analysis”, taught by Roger D. Peng of the Johns Hopkins Bloomberg School of Public Health. Here is the description of the course: In this course you will learn how to program in R and how to use R for effective data analysis. You will learn …

Read more »

Criticism 5 of NHST: p-Values Measure Effort, Not Truth

July 17, 2012

Introduction

In the third installment of my series of criticisms of NHST, I focused on the notion that a p-value is nothing more than a one-dimensional representation of a two-dimensional space in which (1) the measured size of an effect and (2) the precision of this measurement have been combined in such a way that

Read more »

Optical Art with R

July 16, 2012

Last week, in a post entitled Bridget Riley exhibition in London, the author Markus Gesmann wrote an R script reproducing one of Riley's famous art pieces: Movement in Squares. This reminded me of my own first "brush" with Op art. It was in art class ye...

Read more »

Factor Attribution to improve performance of the 1-Month Reversal Strategy

July 16, 2012

Today I want to show how to use Factor Attribution to boost the performance of the 1-Month Reversal Strategy. The paper Short-Term Residual Reversal by D. Blitz, J. Huij, S. Lansdorp, M. Verbeek (2011) presents the idea and discusses the results as applied to the US stock market since 1929. To improve 1-Month Reversal Strategy performance authors

Read more »

Data mining for network security and intrusion detection

July 16, 2012

In preparation for the “Haxogreen” hackers' summer camp, which takes place in Luxembourg, I was exploring the world of network security. My motivation was to find out how data mining is applicable to network security and intrusion detection. The Flame virus, Stuxnet, and Duqu proved that static, signature-based security systems are not able to detect very advanced, government-sponsored

Read more »

Convenient access to Gapminder’s datasets from R

July 16, 2012

In April, Hans Rosling examined the influence of religion on fertility. I used R to replicate a graphic of his talk:

> library(datamart)
> gm <- gapminder()
> #queries(gm)
> #
> # babies per woman
> tmp <- query(gm, "TotalFertilityRate")
> babies <- as.vector(tmp)
> names(babies) <- names(tmp)
> babies <- babies
> countries <- names(babies)
> #
> # income per capita, PPP adjusted
> tmp <- query(gm, "IncomePerCapita")
>...

Read more »
