Visual Complexity

March 10, 2015

Oh, can it be, the voices calling me, they get lost and out of time (Little Black Submarines, The Black Keys). Last October I did this experiment about complex domain coloring. Since I like giving my posts a touch of randomness, I have done this experiment. I plot four random functions of the form p1(x)*p2(x)/p3(x) where pi(x) are polynomials … Continue reading...

Read more »

Introducing a Wishlist for Scientific R Packages

March 10, 2015

There are two things that make R such a wonderful programming environment - the vast number of packages to access, process and interpret data, and the enthusiastic individuals and subcommunities (of which rOpenSci is a great example). One, of course, flows from the other: R programmers write R packages to provide language users with more features, which makes everyone's jobs easier...

Read more »

Basics of Lists

March 9, 2015

Lists are a data type in R that is perhaps a bit daunting at first, but soon becomes amazingly useful. They are especially wonderful once you combine them with the powers of the apply() functions. This post will be part 1 of a two-part series on the u...

Read more »

The frequentist case against the significance test, part 2

March 9, 2015

The significance test is perhaps the most used statistical procedure in the world, though it has never been without its detractors. This is the second of two posts exploring Neyman's frequentist arguments against the significance test; if you have not read Part 1, you should do so before continuing (“The frequentist case against the significance test, part 1”). Neyman...

Read more »

New R Package – ipapi (IP/Domain Geolocation)

March 9, 2015

I noticed that the @rOpenSci folks had an interface to ip-api.com on their ToDo list so I whipped up a small R package to fill said gap. Their IP Geolocation API will take an IPv4, IPv6 or FQDN and kick back an ASN, lat/lon, address and more. The ipapi package exposes one function – geolocate

Read more »

R 3.1.3 is released (+ easy upgrading for Windows users with the installr package)

March 9, 2015

R 3.1.3 (codename “Smooth Sidewalk”) was released today. You can get the latest binary version from here (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below. Upgrading to R 3.1.3 on Windows: If you are using Windows you can easily upgrade to the latest version of R using the installr package. … Continue reading...

Read more »

What’s the Point of an API?

March 9, 2015

Trying to clear my head of code on a dog walk after a couple of days tinkering with the nomis API, I started to ponder what an API is good for. Chris Gutteridge and Alex Dutton’s open data excuses bingo card and Owen Boswarva’s Open Data Publishing Decision Tree both suggest that not having

Read more »

Introduction to my New IKReporting Package

March 9, 2015

This post will introduce my up-and-coming IKReporting package, and functions that compute and plot rolling returns, which are … Continue reading →

Read more »

R 3.1.3 now available

March 9, 2015

R 3.1.3, the final update in the R 3.1 series, has been released. As of this writing only the source distribution is currently available, but expect binary builds for Windows, Mac and various Linux platforms to appear soon on your local CRAN mirror. As has become usual in March, this release is primarily for minor bugs and improvements in...

Read more »

Econometrics Sim – 1: Endogeneity

March 9, 2015

Introduction: This is the first post in a series devoted to explaining basic econometric concepts using R simulations. The topic in this post is endogeneity, which can severely bias regression estimates. I will specifically simulate endogeneity caused by an omitted variable. In future posts in this series, I’ll simulate other specification issues such as heteroskedasticity, multicollinearity, and collider … Continue reading...

Read more »

Going deeper with dplyr: New features in 0.3 and 0.4 (video tutorial)

March 8, 2015

In August 2014, I created a 40-minute video tutorial introducing the key functionality of the dplyr package in R. dplyr continues to be my "go-to" package for data exploration and manipulation because of its intuitive syntax, blazing fast performance, ...

Read more »

Sparse Quadratic Programming with Ipoptr

March 8, 2015

This post is a follow-up to my last post on quadratic programming facilities in R. A commenter pointed me to the ipoptr project which exposes an R interface to the COIN-OR optimization routine Ipopt. COIN-OR is a suite of optimization utilities implemented in C++ and supported by a back-end of configurable FORTRAN linear...

Read more »

Some More Results on the Theory of Statistical Learning

March 8, 2015

Yesterday, I mentioned a popular graph discussed when studying the theoretical foundations of statistical learning. But there is usually another one, which is the following. Let us get back to the underlying formulas. On the training sample, we have some empirical risk, defined as for some loss function . Why is it complicated? From the law of large...

Read more »

SAS PROC MCMC example in R: Nonlinear Poisson Regression Multilevel Random-Effects Model

March 8, 2015

I am slowly working my way through the PROC MCMC examples. Regarding these data, the SAS manual says: 'This example uses the pump failure data of Gaver and O’Muircheartaigh (1987) to illustrate how to fit a multilevel random-effects model with PROC MCMC. The number of failures and the time of operation ...

Read more »

Some Intuition About the Theory of Statistical Learning

March 7, 2015

While I was working on the Theory of Statistical Learning, and the concept of consistency, I found the following popular graph (e.g. from those slides, here in French). The curve below is the error on the training sample, as a function of the size of the training sample. Above, it is the error on a validation sample. Our learning...

Read more »

Weight-Length & Condition Chapters Updated

March 7, 2015

Thanks to a couple of great reviews, I have updated the Weight-Length and Condition chapters of the forthcoming Introduction to Fisheries Analysis with R book. The suggestions resulted in some changes to the FSA package so you may want to … Continue reading →

Read more »

Streamgraph package now supports continuous x axis scale

March 7, 2015

A post on StackOverflow asked about using a continuous variable for the x-axis (vs dates) in my streamgraph package. While I provided a workaround for the question, it helped me bump up the priority for adding support for continuous x axis scales. With the DBIR halfway behind me now, I kicked out a new rev

Read more »

Alternative measure to compare difference between performance between two interventions

March 7, 2015

Dear all, Click on the GitHub site to see my new post. It is about a new alternative measure to compare the difference in performance between two interventions. Any comments or criticisms are welcome. Dr Suman Kumar Pramanik (@sumankumarpram1)

Read more »

CRAN download statistics of any packages #rstats

March 7, 2015

Hadley Wickham announced on Twitter that RStudio now provides CRAN package download logs. I was wondering about the download numbers of my package and wrote some code to extract that information from the logs… The first code snippet is taken from the log website itself: Then I downloaded all files into a folder: Unzipping did

Read more »

The R documentation is bad

March 6, 2015

I have been using R for some time now and can still find it frustrating to work with. Over the years I have come to the conclusion that this is primarily due to the documentation being bad. I offer no actual solutions here, but thought I would try and write down exactly what I dislike about it. The docs are more...

Read more »

Why the Ban on P-Values? And What Now?

March 6, 2015

Just recently, the editors of the academic journal Basic and Applied Social Psychology have decided to ban p-values: that’s right, the nexus for inferential decision making… gone! This has created quite a

Read more »

Motor Vehicle Collision Density in NYC

March 6, 2015

In a previous post, I visualized crime density in Boston using R’s ggmap package. In this post, I use ggmap to visualize vehicle accidents in New York City. R code and data are included in this post. The data comes from NYC Open Data. My data cut ranges from 2012 to 2015. The data tracks the type of … Continue reading...

Read more »

How to Use R for Connecting Neo4j and Tableau (A Recommendation Use Case)

March 6, 2015

Introduction: The year is just a little more than two months old and we got the good news from Tableau: beta testing for version 9.0 has started. But it looks like one of my most favored features didn’t manage...

Read more »

Getting Data From An Online Source

March 6, 2015

Getting Data From One Online Source. Robert Norberg. Hello world. It’s been a long time since I posted anything here on my blog. I’ve been busy getting my Master’s degree in statistical computing and I haven’t had much free time to blog. But I’ve been writing R code as much as ever. Now, with graduation approaching, I’m job hunting and I thought it would...

Read more »

Text bashing in R for SQL

March 6, 2015

Fairly often, a coworker who is strong in Excel but weak in writing code will come to me for help with special details about customers in their datasets. Sometimes the reason is to call, email, or snail mail a survey, … Continue reading →

Read more »

Generating an academic CV with R and YAML

March 6, 2015

For the past couple years, I’ve been using Kieran Healy’s lovely template for my academic CV. Kieran’s code is a customised *.tex file which, of course, has the virtue of simplicity. All a person needs to do is update it with glorious achievements from time to time and re-compile; this is exactly what

Read more »

Welcome to the Hadleyverse

March 6, 2015

It's fair to say that Hadley Wickham, chief scientist at RStudio and a new member of the R Foundation, has made great contributions to the R community. Not only is he the author of several R-related books including Advanced R, Hadley is also the author of dozens of R packages which have transformed the way that data scientists work...

Read more »

Visualising a Classification in High Dimension

March 6, 2015

So far, when discussing classification, we’ve been playing on my toy-dataset (actually, I should not claim it’s mine, it is inspired by the one used in the introduction of Boosting, by Robert Schapire and Yoav Freund). But in real life, there are more observations, and more explanatory variables. With more than two explanatory variables, it starts to be more complicated...

Read more »

Rcpp 0.11.5

March 6, 2015

The new release 0.11.5 of Rcpp just reached the CRAN network for GNU R, and a Debian package has also been uploaded. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 345 packages on CRAN depend on Rcpp for making analyses go faster...

Read more »