In Praise of Substantive Expertise in Data Science

November 14, 2014

Substantive expertise makes it into the Data Science Venn Diagram from DataCamp's infographic on how to become a data scientist. It's one of the three circles of equal size along with programming and statistics. Regrettably, substantive expertise ...

Read more »

R leaps to #12 in Tiobe language popularity index

November 14, 2014

The R language has jumped to number 12 in the November 2014 TIOBE Index of programming language popularity. This is R's highest ranking in the history of the TIOBE index, which has been ranking languages since 2003. A high ranking is an impressive achievement for R given that it is a domain-specific language (designed for data science applications), and...

Read more »

Runoff in Uruguay: FA is expected to win a third mandate

November 14, 2014

Within 2 weeks, voters in Uruguay will go to the polls for the runoff election between FA and PN. According to the published polling data, it's very likely Uruguayans will give FA a third mandate. I ran the following forecast model, which suggests that the difference between the two parties is huge; even greater than the number …

Read more »

Volatility Risk Premium: Sharpe 2+, Return to Drawdown 3+

November 14, 2014

First, before starting this post, I’d like to give one last comment about my previous post: I called Vanguard to …

Read more »

What size will you be after you lose weight?

November 14, 2014

REDDITORS’ BEFORE AND AFTER MEASUREMENTS ANALYZED. How many pounds do you need to lose in order to reduce your waistline by one inch? How many kilos do you need to lose to reduce your waistline by one centimeter? We wanted to find out. We were having trouble finding published data (though we ...

Read more »

Dynamic occupancy models in Stan

November 14, 2014

Occupancy modeling is possible in Stan as shown here, despite the lack of support for integer parameters. In many Bayesian applications of occupancy modeling, the true occupancy states (0 or 1) are directly modeled, but this can be avoided by marginalizing out the true occupancy state. The Stan manual (pg. 96) gives an example of this kind...

Read more »

Scatter Plot Matrices in R

November 13, 2014

One of our graduate students asked me how he can check for correlated variables in his dataset. Using R, this can be done in three (3) ways. First, he can use the cor function of the stats package to calculate the correlation coefficient between vari...

Read more »
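
For the scatter plot matrix entry above, here is a minimal sketch of the first approach it mentions, `cor` from the stats package, followed by a base-graphics scatter plot matrix. The built-in `mtcars` data and the chosen columns are assumptions for illustration; they stand in for the student's dataset, which the excerpt does not show.

```r
# Illustrative only: mtcars stands in for the student's dataset.
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pairwise Pearson correlation coefficients (stats::cor).
cor_mat <- cor(vars)
print(round(cor_mat, 2))

# Scatter plot matrix of the same variables (graphics::pairs).
pairs(vars, main = "Scatter plot matrix (mtcars subset)")
```

Coefficients near 1 or -1 in `cor_mat` flag pairs of variables worth inspecting in the corresponding panel of the plot.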

How to become a data scientist in 8 easy steps: the infographic

November 13, 2014

This post was written by the team behind DataCamp, the online interactive learning platform for data science. After being dubbed “sexiest job of the 21st Century” by Harvard Business Review, data scientists have stirred the interest of the general public. Many people are intrigued by this job, namely because the name has an interesting

Read more »

CALL FOR PRESENTATIONS: EARL CONFERENCE (Effective Applications of the R Language), London, 15-16th September 2015

November 13, 2014

Further to the success of the EARL2014 Conference we are delighted to announce that EARL2015 will be held in London on the 15-16th September 2015. Abstracts are invited for presentations on topics related to the commercial usage and applications of the R Language. Presenters of accepted presentations will be entitled to a free conference pass for the day of their...

Read more »

A look at the igraph package

November 13, 2014

by Joseph Rickert. The igraph package has become a fundamental tool for the study of graphs and their properties, the manipulation and visualization of graphs, and the statistical analysis of networks. To get an idea of just how firmly igraph has become embedded in the R package ecosystem, consider that igraph currently lists 72 reverse depends, 59 reverse imports...

Read more »

How to Summarize a 2D Posterior Using a Highest Density Ellipse

November 13, 2014

Making a slight digression from last month’s Probable Points and Credible Intervals, here is how to summarize a 2D posterior density using a highest density ellipse. This is a straightforward extension of the highest density interval to the situation where you have a two-dimensional posterior (say, represented as a two-column matrix of samples) and you want...

Read more »

Building Blocks: A Compelling Image for Clustering with Nonnegative Matrix Factorization (NMF)

November 12, 2014

Would hierarchical clustering be as popular without the dendrogram? Cannot the same be said of finite mixture modeling with its multidimensional spaces populated by normal distributions? I invite you to move your mouse over the figure on the introducto...

Read more »

In case you missed it: October 2014 Roundup

November 12, 2014

In case you missed them, here are some articles from October of particular interest to R users. R hits a new milestone with 6,000 CRAN packages, and R 3.1.2 released. Revolution Analytics announces Revolution R Open, a supported and enhanced downstream distribution of R. (Learn more at the webinar on Wednesday November 12.) Some benchmarks on the performance improvements...

Read more »

Solutions on github

November 12, 2014

See this page. We're not done with them all, but chapters 3 and 4 are there and the regression chapters are not too far behind. The Rnw files (using knitr LaTeX) are there along with the corresponding pdf files. You may have better solutions than ...

Read more »

AusDM 2014 Conference Program

November 12, 2014

The Program of AusDM 2014 Conference is now available at http://ausdm14.ausdm.org/program. It features two keynote talks, one on Learning in Sequential Decision Problems by Prof Peter Bartlett from UC Berkeley, and the other on Making Sense of a Random World through …

Read more »

Mobility from Mobile Phones

November 12, 2014

I have worked on big data in my work with QuBit in London. In my research I increasingly find the tools I learnt there to be extremely useful. The keywords are smart data management for big data, such as Hadoop and Hive for querying just the right set of data to work with. I am

Read more »

Convergence of a Series

November 12, 2014

Let us explore using simulation some of the concepts of basic asymptotic theory as presented in Wooldridge 2012, Chapter 3.

Definition: A sequence of nonrandom numbers $\{a_N : N = 1, 2, \ldots\}$ converges to $a$ if for all $\epsilon > 0$ there exists $N_\epsilon$ such that if $N > N_\epsilon$, then $$|a_N - a| < \epsilon$$

Paraphrase: A sequence converges on point a, if you can choose any positive number (any epsilon...

Read more »
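
The definition in the entry above can be checked numerically. The sketch below is not taken from the post: it assumes a hypothetical sequence a_N = 1 + 1/N with limit a = 1, for which |a_N - a| = 1/N, so N_epsilon = ceiling(1/epsilon) is a valid choice. The check covers only a finite range of N beyond N_epsilon, so it illustrates the definition rather than proving convergence.

```r
# Hypothetical example sequence: a_N = 1 + 1/N, with limit a = 1.
a_N <- function(N) 1 + 1 / N
a <- 1

# For this sequence |a_N - a| = 1/N, so N_epsilon = ceiling(1/epsilon) works.
N_epsilon <- function(epsilon) ceiling(1 / epsilon)

# Check |a_N - a| < epsilon for a finite run of N strictly greater than N_epsilon.
check_epsilon <- function(epsilon, n_check = 1000) {
  N <- N_epsilon(epsilon) + seq_len(n_check)
  all(abs(a_N(N) - a) < epsilon)
}

epsilons <- c(0.5, 0.1, 0.01, 0.001)
results <- vapply(epsilons, check_epsilon, logical(1))
print(data.frame(epsilon = epsilons,
                 N_epsilon = N_epsilon(epsilons),
                 holds = results))
```

Smaller values of epsilon require a larger N_epsilon, which is the trade-off the paraphrase describes.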

Copying files with R

November 11, 2014

Following on from my recent experience with deleting files using R, I found myself needing to copy a large number of raster files from a folder on my computer to a USB drive so that I could post them to …

Read more »
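
As a rough illustration of the task described in the entry above, base R's `list.files` and `file.copy` can copy a batch of files in one call. This is a sketch under stated assumptions, not the post's own code: the folder names, the `.tif` extension, and the placeholder files are hypothetical, and temporary directories stand in for the source folder and the USB drive.

```r
# Hypothetical folders: temporary directories stand in for the
# source folder and the USB drive.
src_dir  <- file.path(tempdir(), "rasters")
dest_dir <- file.path(tempdir(), "usb_drive")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dest_dir, showWarnings = FALSE)

# Create a few empty placeholder files so the example is self-contained.
file.create(file.path(src_dir, paste0("layer_", 1:3, ".tif")))

# List the files to copy, then copy them in one vectorised call.
to_copy <- list.files(src_dir, pattern = "\\.tif$", full.names = TRUE)
copied  <- file.copy(from = to_copy, to = dest_dir, overwrite = TRUE)

# file.copy returns one logical per file; FALSE marks a failed copy.
print(data.frame(file = basename(to_copy), copied = copied))
```

Checking the returned logical vector is worthwhile with large batches, since a failed copy does not raise an error.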

Some Thoughts on “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?”

November 11, 2014

Sorry for the blogging break. I’ve got a few posts planned for the next few weeks based on some work I’ve been doing. In the meantime, you should check out “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” by Manuel Fernandez-Delgado et al. in JMLR. They took a large number of classifiers and ran them against...

Read more »

3D Plots with ggplot2 and Plotly

November 11, 2014

by Matt Sundquist, Plotly co-founder. Plotly is a platform for data analysis, graphing, and collaboration. You can use ggplot2, Plotly's R API, and Plotly's web app to make and share interactive plots. Now you can also make 3D plots. Immediately below are a few examples of 3D plots. In this post we will show how to make...

Read more »

Unknown pleasures

November 11, 2014

Have I missed unknown pleasures in Python by focusing on R? A comment on my blog post of last week suggested just that. Reason enough to explore Python a little. Learning another computer language is like learning another human language - it takes time...

Read more »

analyze the national incident-based reporting system (nibrs) with r and monetdb

November 11, 2014

in 2012, more than one quarter of the united states population lived in the jurisdiction of a police department that submitted details about every crime to a central repository maintained by the fbi.  a production of the uniform crime reports (ucr) program, the national incident-based reporting system (nibrs) compiles statistics from police agencies in thirty five states...

Read more »

3D-Harmonographs In Motion

November 10, 2014

I would be delighted to co write a post (Andrew Wyer)

One of the best things about writing a blog is that occasionally you get to know very interesting people. Last October 13th I published this post about the harmonograph, a machine driven by pendulums which creates amazing curves. Two days later someone called Andrew Wyer made this comment

Read more »

R, R with Atlas, R with OpenBLAS and Revolution R Open: which is fastest?

November 10, 2014

In this short post, I benchmark different “versions” of R. I compare the execution speeds of R, R linked against OpenBLAS, R linked against ATLAS and Revolution R Open. Revolution R Open is a new open source version of R made by Revolution Analytics. It is linked against MKL and should offer huge speed improvements over...

Read more »
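
A benchmark of the kind described in the entry above typically times a dense linear algebra operation, since that is where the linked BLAS library matters. The sketch below is an assumption about what such a timing might look like, not the post's benchmark script; the matrix size and the choice of `crossprod` are illustrative.

```r
# Illustrative timing of a BLAS-bound operation; n is an arbitrary choice.
set.seed(1)
n <- 500
A <- matrix(rnorm(n * n), nrow = n, ncol = n)

# crossprod(A) computes t(A) %*% A and delegates to the linked BLAS.
timing <- system.time(B <- crossprod(A))

# Elapsed wall-clock seconds; the value depends on the machine and BLAS.
print(timing["elapsed"])
```

Running the same script under each R build and comparing the elapsed times gives a first-order comparison; a single run is noisy, so repeated runs are advisable.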

Ongoing learning with user groups

November 10, 2014

Cross-posted from the Software Carpentry Blog. For the past two years I’ve run the UC Davis R Users’ Group (D-RUG). In this post, I’ll (1) outline the structure of D-RUG, (2) summarize some lessons learned, and (3) discuss how such users’ groups could act to support and complement SWC’s workshops. Per Bill’s suggestion, we...

Read more »

“LaF”-ing about fixed width formats

November 10, 2014

If you have ever worked with US government data or other large datasets, it is likely you have faced fixed-width format data. This format has no delimiters in it; the data look like strings of characters. A separate format file defines which columns of data represent which variables. It seems as if the format is

Read more »
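
To make the fixed-width idea in the entry above concrete, here is a small sketch using base R's `read.fwf` from the utils package rather than the LaF package the post discusses. The records, column names, and widths are hypothetical; in practice the widths would come from the separate format file the excerpt mentions.

```r
# Hypothetical fixed-width records: no delimiters, columns defined by position.
fwf_file <- tempfile(fileext = ".txt")
writeLines(
  sprintf("%03d%-10s%03d", 1:3, c("Alice", "Bob", "Carol"), c(34L, 27L, 45L)),
  fwf_file
)

# Hypothetical format definition: column names and their character widths.
col_names  <- c("id", "name", "age")
col_widths <- c(3, 10, 3)

# Base R reader for fixed-width files; strip.white trims the padding.
dat <- read.fwf(fwf_file, widths = col_widths, col.names = col_names,
                strip.white = TRUE, stringsAsFactors = FALSE)
print(dat)
```

`read.fwf` is convenient for small files; for the large government datasets the entry has in mind, a dedicated reader is likely to be faster.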

Rcpp11 3.1.2.0

November 10, 2014

Rcpp11 3.1.2.0 was released to CRAN, as the ultimate C++11 companion to R 3.1.2 on which it depends. The NEWS extract follows:

# Rcpp11 3.1.2

* New `wrap` implementation for `std::tuple<Args...>` (#195)
* `colnames` and `rownames` setters for matrices (#210).
* Most sugar functions are now processing the expression in parallel.
* Forbidden symbols from the C/R API are no...

Read more »

RSiteCatalyst Version 1.4.1 Release Notes

November 10, 2014

Changes: Version 1.4.1 of RSiteCatalyst is now available on CRAN. There were a handful of bug fixes and new features added, including: Fixed bug in QueueRanked function where only 10 results were returned when requesting multiple element reports. Function now returns up to 50,000 per breakdown (API limit). Created better error message to inform user

Read more »

Benchmarking Revolution R Open on Linux

November 10, 2014

We recently shared some benchmarks for Revolution R Open on the Windows platform, which showed significant improvements compared to R downloaded from CRAN. Those performance gains mainly come from multi-threading: Revolution R Open is linked to the Intel Math Kernel Library, which uses all available cores (rather than just one core) to compute matrix and vector operations in parallel....

Read more »