Using ARPACK to compute the largest eigenvalue of a matrix

February 7, 2013

Thanks to Gábor Csárdi, author of the R interface to ARPACK, for this example of using (the R/igraph interface to) ARPACK to find the largest eigenvalue of a matrix. The key insight is that arpack solves the function passed to …
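ARPACK never needs the matrix itself, only a routine that returns the product A %*% x for a given vector x, which is why the R interface accepts a function. The simplest eigensolver built on that same idea is power iteration. The base-R sketch below uses that simpler method, not ARPACK's implicitly restarted Arnoldi/Lanczos algorithm, and `power_iteration` is a hypothetical helper. It estimates the eigenvalue of largest magnitude for a small symmetric matrix and compares the result with eigen().

```r
# Power iteration: repeatedly apply the matrix-vector product and renormalise.
# `matvec` is any function returning A %*% x, so the matrix itself is never
# accessed directly (hypothetical helper, for illustration only).
power_iteration <- function(matvec, n, tol = 1e-10, maxiter = 1000) {
  x <- rep(1, n) / sqrt(n)            # unit-length starting vector
  lambda <- 0
  for (i in seq_len(maxiter)) {
    y <- as.vector(matvec(x))
    lambda_new <- sum(x * y)          # Rayleigh quotient x'Ax (x has unit length)
    x <- y / sqrt(sum(y^2))           # renormalise for the next step
    if (abs(lambda_new - lambda) < tol * abs(lambda_new)) break
    lambda <- lambda_new
  }
  list(value = lambda_new, vector = x, iterations = i)
}

A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)

res <- power_iteration(function(x) A %*% x, n = nrow(A))
res$value                              # about 4.732, i.e. 3 + sqrt(3)
eigen(A, symmetric = TRUE)$values[1]   # same value from a dense solver
```

For a large sparse matrix, only the function passed in would change; the iteration loop itself stays the same.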

Creating ‘Tags’ For PPC Keywords

February 7, 2013

When performing search engine marketing, it is usually beneficial to build a system for making sense of keywords and their performance. While one could construct Bayesian Belief Networks to model the process of consumers clicking on ads, I have found that using ‘tags’ to categorize keywords is just as useful for conducting post-hoc analysis on the effectiveness of marketing...

Data analysis class

February 7, 2013

I've been writing software to help others do data analysis for a number of years and at the same time trying to work up my nerve to try my own analysis. Why let other people have all the fun? So, when I saw that Jeffrey Leek, biostatistician at Johns Hopkins and coauthor of Simply Statistics, was teaching...

R Bootcamp @ Sector67 in Madison

February 6, 2013

I am pleased to announce that, together with Justin Meyer (also from the Wisconsin Department of Public Instruction), I will be presenting a two-hour version of the R Bootcamp. Sector67 is a collaborative maker/hacker space in Madison, and is a great venue...

Ryan Peek on using xts and ggplot for time-series data

February 6, 2013

At Davis R Users’ Group today, Ryan Peek gave a presentation on how he takes data from his field instruments and visualizes it in R. Here are his notes. The original *.Rmd file and data can be found here. Short how-to on using xts and ggplot for time-series data: xts is a very helpful package...

Set operations on more than two sets in R

February 6, 2013

Problem: set operations are a commonplace thing to do in R, and the enabling functions in the base package are intersect(x, y), union(x, y), setdiff(x, y) and setequal(x, y). That said, you'll note that each takes only two arguments, i.e. set X and set Y, ...
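One common base-R workaround for the two-argument limit (not necessarily the solution the post arrives at) is to fold the binary function over a list of sets with Reduce(), so that intersect(intersect(a, b), c) is built up automatically for any number of sets. The example sets below are made up for illustration.

```r
# Made-up example sets; the list can hold any number of them.
sets <- list(a = c(1, 2, 3, 4, 5),
             b = c(2, 3, 4, 6),
             c = c(3, 4, 7))

# Reduce() applies a two-argument function cumulatively, left to right.
common     <- Reduce(intersect, sets)  # in every set:          3 4
everything <- Reduce(union, sets)      # in at least one set:   1 2 3 4 5 6 7

# setdiff is order-dependent: elements of the first set found in no later set.
only_first <- Reduce(setdiff, sets)    # 1 5
```

Because setdiff() is not symmetric, the order of the list matters for `only_first`, whereas intersect() and union() give the same elements in any order.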

Modelling memory and news trajectories

February 6, 2013

Modelling memory: In the text below, I present two models I've made to quantify and visualise the diverging trajectories of memory and news events, and conclude that linear regression may be used to test which model best describes the story. First, though, I contextualise this with an illustration from the...

Oracle R Enterprise 1.3 gives predictive analytics an in-database performance boost

February 6, 2013

The recently released Oracle R Enterprise 1.3 adds packages to R that enable even more in-database analytics. These packages provide horizontal, commonly used techniques that are blazingly fast in-database for large data. With Oracle R Enterprise 1.3, Oracle makes R even better and more usable in enterprise settings. (You can download ORE 1.3 here and documentation here.) ...

Make building R packages easier with devtools

February 6, 2013

If you're writing any significant amount of R code, you might want to start thinking about bundling it up into packages. An R package combines functions, data, documentation and unit tests, and is a convenient and reliable system to manage and version collections of R content that could otherwise become unwieldy. And if you want to share your code...

The new Stan 1.1.1, featuring Gaussian processes!

February 6, 2013

We just released Stan 1.1.1 and RStan 1.1.1. As usual, you can find download and install instructions at http://mc-stan.org/. This is a patch release and is fully backward compatible with Stan and RStan 1.1.0. The main thing you should notice is that the multivariate models should be much faster and all the bugs reported for...

xts object – subscript out of bounds

February 6, 2013

I bet you have seen this error a few times. When I compared large xts objects with different numbers of observations, it would hit me right towards the end of the analysis. I wrote a small function that allows me to check the length of the sets in advance...
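The excerpt does not show the author's function. A hypothetical stand-in in base R could compare row counts up front and fail early with a readable message. NROW() works on vectors, matrices and data frames, and should also work on xts objects because they are matrix-based; treat that last point as an assumption to check against your own data.

```r
# Hypothetical helper (not the post's function): stop early when objects have
# different numbers of rows, rather than hitting "subscript out of bounds"
# deep inside an analysis. NROW() also counts the elements of plain vectors.
check_same_nrow <- function(...) {
  n <- vapply(list(...), NROW, numeric(1))
  if (length(unique(n)) > 1) {
    stop("objects have different numbers of observations: ",
         paste(n, collapse = ", "), call. = FALSE)
  }
  invisible(n[1])
}

a <- matrix(0, nrow = 10, ncol = 2)
b <- matrix(0, nrow = 9, ncol = 2)
check_same_nrow(a, a)        # passes silently, returns 10 invisibly
try(check_same_nrow(a, b))   # error message lists the row counts 10, 9
```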

A book about Rcpp

February 5, 2013

Some little birds had already been whispering about it, but I didn't want to jinx it and told myself I would wait with an announcement until the booksellers had (at least) placeholder pages. And as I learned from Duncan Murdoch via email earlier today...

Counting numbers in an interval

February 5, 2013

Say I have a list of values and I cut them at some break points. How do I know the number of values in each interval? We know the cut() function in R works for this purpose. For example,
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
> ...
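Continuing that example, one way to get the counts is to pass the intervals produced by cut() to table(), which tabulates every interval, including empty ones. This is a sketch; the break points below are hypothetical, chosen only for illustration.

```r
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)            # 0 repeated 9 times, 1 repeated 4 times, ...

# Hypothetical break points giving the intervals (-1,2], (2,5] and (5,8]
breaks <- c(-1, 2, 5, 8)
counts <- table(cut(x, breaks = breaks))
counts                        # 19, 18 and 13 values fall in the three intervals
```

cut() uses right-closed intervals by default (right = TRUE), so a value equal to a break point is counted in the interval that ends there.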

Learning RStudio for R Statistical Computing

February 5, 2013

"Learning RStudio for R Statistical Computing" will teach you how to quickly and efficiently create and manage statistical analysis projects, import data, develop R scripts, and generate reports and graphics. R developers will learn about package development, coding principles, and …

Dallas R Users: Learn Shiny this Saturday, 2/9

February 5, 2013

Just a heads-up for any R users in the Dallas/Fort Worth Metroplex: I’ll be presenting at the Dallas R Users Group this Saturday, 2/9/2013 at 10:00AM at the University of Dallas (1845 East Northgate Drive, Irving, TX). I’ll be talking about how to use RStudio’s new Shiny framework to create R-powered web applications. For the...

Collinearity and stepwise VIF selection

February 5, 2013

Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of variation, or parameter estimates may include substantial amounts...
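The variance inflation factor of predictor j is VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. A stepwise selection in this spirit repeatedly drops the predictor with the largest VIF until every VIF falls below a threshold. The base-R sketch below assumes a data frame of numeric predictors only; `vif_values`, `vif_step` and the default threshold of 10 are illustrative choices, not the post's code.

```r
# VIF for each column: regress it on all other columns and use 1 / (1 - R^2).
vif_values <- function(df) {
  vapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Drop the worst predictor one at a time until all VIFs are below `threshold`.
vif_step <- function(df, threshold = 10) {
  while (ncol(df) > 1) {
    v <- vif_values(df)
    if (max(v) < threshold) break
    df <- df[, names(df) != names(which.max(v)), drop = FALSE]
  }
  names(df)
}

# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated.
set.seed(1)
n <- 200
x1 <- rnorm(n)
sim <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(n, sd = 0.05),
                  x3 = rnorm(n))
round(vif_values(sim), 1)   # x1 and x2 have very large VIFs
vif_step(sim)               # keeps x3 and one of x1 / x2
```

Recomputing every VIF after each removal matters: dropping one variable changes the R^2 values of all the others.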

Learn about R through data mining

February 5, 2013

If you're in San Francisco for this week's DeveloperWeek conference, our own Joe Rickert will also be giving a presentation on Wednesday at 2:10PM on Predictive Modeling with Big Data in R, which will feature several demos of data mining massive data sets using Revolution R Enterprise. Incidentally, the whole team at Revolution Analytics was proud to receive the Top...

Natura non facit saltus

February 5, 2013

(see John Wilkins’ article on the – interesting – history of that phrase http://scienceblogs.com/evolvingthoughts/…). This week in class, we will see several smoothing techniques for insurance ratemaking. As a starting point, assume that we do not want to use segmentation techniques: everyone will pay exactly the same price. And that price should be related to...

Relearn boxplot and label the outliers

February 5, 2013

Although the box plot is used almost everywhere and is taught in undergraduate statistics classes, I recently had to re-learn it in order to know how to label the outliers. This Stack Overflow post was where I found how...
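The post's own solution sits behind the link, so this is only one base-R way to do it, not necessarily the Stack Overflow answer it cites. boxplot.stats() returns in `$out` the points lying more than 1.5 times the hinge spread beyond the box, which are the points boxplot() draws individually, and text() can then write a label beside each one. The data and labels below are made up for illustration.

```r
# Made-up data: one obvious outlier among otherwise small values
values <- c(1, 2, 3, 4, 5, 100)
labels <- c("a", "b", "c", "d", "e", "f")

# Indices of the points that boxplot() would draw as outliers
out_idx <- which(values %in% boxplot.stats(values)$out)
labels[out_idx]   # "f"

# Draw the box plot (centred at x = 1) and label each outlier to its right
boxplot(values)
text(x = 1, y = values[out_idx], labels = labels[out_idx], pos = 4)
```

Matching on values rather than positions means every observation equal to an outlying value gets a label, which is usually what you want when the labels are row names or IDs.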

New Rcpp page on upcoming events — including Master Class in New York

February 5, 2013

Lots of exciting things are happening with and around Rcpp. I just added a new page about Upcoming Events to the recently created Rcpp site. This events page has lots to cover: an upcoming talk at Columbia on March 8 (details still TBD), a day-long...

MCMSki IV, Jan. 6-8 (9?), 2014, Chamonix (news #3)

February 5, 2013

In case you have not been constantly tracking the changes on the MCMSki IV webpage, here is some news: the number of invited and accepted contributed sessions in the program has increased considerably, to the point of almost filling two parallel sessions for the whole duration of the meeting. This includes an exciting round-table on...

2011 Census Open Atlas Project

February 5, 2013

This month has seen the release of the 2011 census data for England and Wales at Output Area level. This offers the possibility of mapping various attributes of people and places for very small geographic areas. Output Areas represent the most detailed geography for which Census data are released and are the building blocks for many popular products...

Tables from R into Word

February 5, 2013

A good-looking table matters! This tutorial shows how to create a neat table in Word by combining knitr and R Markdown. I'll be using my own function, htmlTable, from the Gmisc package. Background: Because most journals that I submit to want...

Proposed techniques for communicating the amount of information contained in a statistical result

February 5, 2013

A couple of weeks ago, I posted about how much we can expect to learn about the state of the world on the basis of a statistical significance test. One way of framing this question is: if we’re trying to come to scientific conclusions on the basis of statistical results, how much can we update...

Next Kölner R User Meeting: 6 February 2013

February 5, 2013

Quick reminder: The next Cologne R user group meeting is scheduled for tomorrow, 6 February 2013. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to...

Tracking Number of Historical Clusters in DOW 30 and S&P 500

February 4, 2013

In the Tracking Number of Historical Clusters post, I looked at how three different methods were able to identify clusters across the universe of 10 major assets. Today, I want to share the impact of clustering on the larger universe. Below, I examine the historical time series of the number of clusters in the DOW 30 and...

Visualizing networks in R: arc diagrams and hive plots

February 4, 2013

Arc diagrams are an alternative way of representing two-dimensional graphs. Rather than scattering the nodes across the page connected by straight edges, you can instead arrange the nodes along a one-dimensional axis, and replace the straight edges with arcs between the nodes. While an arc diagram might not give as good a sense of the connections between the nodes...
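To make the layout idea concrete, here is a minimal base-graphics sketch, not the post's code (which may rely on a dedicated package). `arc_diagram` is a hypothetical helper that places nodes 1..n on a horizontal line and draws each edge as a semicircle whose diameter spans its two endpoints, so connections between distant nodes produce taller arcs.

```r
# Hypothetical helper (base graphics only): nodes sit on a horizontal axis
# and each edge is drawn as a semicircle joining its two endpoints.
arc_diagram <- function(edges, n_nodes, labels = seq_len(n_nodes)) {
  plot(NA, xlim = c(0.5, n_nodes + 0.5),
       ylim = c(-0.5, (n_nodes - 1) / 2 + 0.5),
       asp = 1, axes = FALSE, xlab = "", ylab = "")
  theta <- seq(0, pi, length.out = 50)
  for (k in seq_len(nrow(edges))) {
    centre <- mean(edges[k, ])                     # midpoint of the two nodes
    radius <- abs(edges[k, 2] - edges[k, 1]) / 2   # half the distance between them
    lines(centre + radius * cos(theta), radius * sin(theta))
  }
  points(seq_len(n_nodes), rep(0, n_nodes), pch = 19)
  text(seq_len(n_nodes), rep(0, n_nodes), labels = labels, pos = 1)
  invisible(NULL)
}

# Five nodes and four undirected edges, one (from, to) pair per row
edges <- matrix(c(1, 2,
                  1, 4,
                  2, 5,
                  3, 5), ncol = 2, byrow = TRUE)
arc_diagram(edges, n_nodes = 5)
```

Node order drives readability here: reordering nodes so that connected ones sit close together keeps the arcs short.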
