## A Stata HTML syntax highlighter in R

August 12, 2013

I have been having difficulty getting my Stata code to look the way I want when I post it to my blog. To remedy this, I have written an HTML encoder in R. I don't know much about HTML so it is likely to be a little ...

## A beginner’s video introduction to R, from Google

August 12, 2013

If you're an absolute beginner to the R language, this Intro to R video series from Google Developers is a great place to get started. Just download R for your system, start the playlist below, and follow along with the on-screen examples. (The video uses the MacOS X version of R, but you should be able to follow along...

## Short tales of two NCAA basketball conferences (Big 12 and West Coast) using graphs

August 12, 2013

Having been at the University of Kansas (Kansas Jayhawks) as a student and now working at Gonzaga University (Gonzaga Bulldogs), discussions about college basketball are inescapable. This post uses R, ggmap, ggplot2 and the shiny server to graphically ...

## Variable importance in neural networks

August 12, 2013

If you’re a regular reader of my blog you’ll know that I’ve spent some time dabbling with neural networks. As I explained here, I’ve used neural networks in my own research to develop inference into causation. Neural networks fall under two general categories that describe their intended use. Supervised neural networks (e.g., multilayer feed-forward networks)

## Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R


Introduction: Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series. I will define and calculate the 5-number summary in 2 different ways that are commonly used in R. (It turns out that different methods arise from
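
As a rough illustration of how two 5-number summaries can disagree, here is a minimal sketch using base R only. The excerpt does not say which two methods the post compares, so the choice of `fivenum()` (Tukey's hinges) and `quantile()` (type 7 interpolation, the default) is an assumption, and the data are made up.

```r
# Toy data (hypothetical) chosen so the two conventions disagree on the quartiles.
x <- 1:10

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
tukey <- fivenum(x)

# Quantile-based summary: quantile() uses type 7 interpolation by default.
quart <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

print(tukey)
print(quart)
```

With these ten values the minimum, median, and maximum agree, while the hinges and the interpolated quartiles do not.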

## Identifying Potential Customers with Classification Techniques in R Language

August 12, 2013

Data mining techniques and algorithms such as Decision Tree, Naïve Bayes, Support Vector Machine, Random Forest, and Logistic Regression are “most commonly used for predicting a specific outcome such as response / no-response, high / medium / low-value customer, likely to buy / not buy.”1 In this article, we will demonstrate how to use R

## Time Series Decomposition

August 12, 2013

In the last post on the changepoint package, I concluded with a brief example of time series decomposition with the "decompose" command.  After further reading, I discovered the "stl" command, which to me appears a superior method.  STL stand...
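
To make the comparison concrete, here is a small sketch that runs both commands on `co2`, a monthly series shipped with base R. The dataset and the `s.window = "periodic"` setting are assumptions for illustration, not necessarily what the post uses.

```r
# co2 is a monthly time series from the datasets package (an assumed example).
fit_stl <- stl(co2, s.window = "periodic")
fit_dec <- decompose(co2)

# stl() returns a matrix with seasonal, trend, and remainder columns.
print(head(fit_stl$time.series))

# decompose() estimates the trend with a moving average, so it is NA at both ends.
print(anyNA(fit_dec$trend))

# The loess-based trend from stl() covers the whole series.
print(anyNA(fit_stl$time.series[, "trend"]))
```

One practical difference this shows: the `stl` trend has no missing values at the ends of the series, which can matter when the most recent observations are the ones of interest.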

## analyze the national plan and provider enumeration system (nppes) with r and monetdb

August 12, 2013

the national plan and provider enumeration system (nppes) contains information about every provider, insurance plan, and clearinghouse actively operating in the united states healthcare industry.  did i just see the ears of all the health workforce researchers in the room perk up?  it's freely downloadable, courtesy of the department of health and human services' implementation of the...

## Some belated spring cleaning

August 11, 2013

A very busy spring has transitioned into a very busy summer, so let me recap a few topics that probably deserve more time than I’ll give them here. Here are the things I’m overdue on, in no particular order: Publications: In the March edition of the Journal of Risk, Kris Boudt, Brian Peterson and I

## Twitter Movie Review – Chennai Express

August 11, 2013

In the spirit of my first post (Pappu Vs. Feku) I will continue to explore the use of Twitter in providing an eye into events of contemporary interest, and movies certainly capture the interest of a large majority of the Indian audience. So I am looking at Chennai Express, which was released last week...

## Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part three.

August 11, 2013

In part one and part two of Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model I developed a model for the number of goals in football matches from five seasons of La Liga, the premier Spanish football league. I’m now reasonably happy with the model and want to use it to rank...

## Software carpentry

August 11, 2013

I would never call myself a programmer, but as an ecologist I manage moderately big and complicated datasets, and that requires interacting with my computer to get the most out of them. I taught myself most of the things I need …

## Finding Correlations in Data with Uncertainty

August 11, 2013

A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when

## Enhanced meboot package, simulating regression standard errors

August 11, 2013

In my June 25 post I described R code (i) to change scale without changing the mean, and (ii) to make a probability distribution symmetric by modifying order statistics. Both are problems commonly encountered by R programmers. My coauthor Javier Lopez-de-Lacalle of Spain has incorporated an efficient version of my code inside the maximum entropy bootstrap (meboot) package in R. See the package...

## XML in R – A (German) tutorial / XML in R – ein Tutorial auf Deutsch

August 10, 2013

I used knitr to hack together a very short tutorial about XML in R. It's in German. And it's not very long. But, hey, it's free :) I hope it can be of help to someone who wants to get started with XML processing in R. Please feel free to post or send any ...

## Pappu Vs. Feku – Twitter Wars

August 10, 2013

In my quest to practice R and learn text mining, I am looking at one of the popular Twitter Wars between two political personalities of India who are fondly known in the TwitterVerse as ‘Pappu’ and ‘Feku’ which is basically their ‘ghar ka naam’ or ‘pyar wala naam’. Anyway, the discussion about the origin of the...

## In case you missed it: July 2013 Roundup

August 9, 2013

In case you missed them, here are some articles from July of particular interest to R users: A new 90-second, Creative Commons video helps R enthusiasts share the history, community and applications of R. Analyst group Butler Analytics reviews 10 predictive analytics platforms, and says that "real analysts use R". An excellent example of Simpson's Paradox: US median wages...

## PIMCO Rolling Correlation, d3, R, gridSVG, lattice | Gets An Axis

August 9, 2013

Where else will you hear Pimco, rolling correlation, R, gridSVG, lattice, and d3 all in one post?  Let’s mix them all together to see what might happen.  For those here for the geekery, we will add a d3 axis for our y and it will follow the mouse.  For those who care nothing about d3 and R, you might...

## Approximate string matching in R

August 9, 2013

I have released a new version of the stringdist package. Besides some new string distance algorithms, it now contains two convenient matching functions: amatch: Equivalent to R's match function but allowing for approximate matching. ain: Similar to R's %in% …

## R-Squared for a VBGM

August 9, 2013

Recently, a fishR user asked me the following question: After fitting the age-length data into VBGM, I overviewed the results. But I can’t find the coefficient of determination (R²) for the VBGM fitting. Because some reviewer want the coefficient …

## inline 0.3.13

August 9, 2013

A minor maintenance release of inline is now on CRAN, and has already been included in Debian. This release contains a patch kindly contributed by Mikhail Umorin which fixes the use of \code{cfunction} with lists of signatures and function bodies. ...

## Data Scientists and Statisticians: Can’t We All Just Get Along

August 9, 2013

It seems that the title “data science” has taken the world by storm.  It’s a title that conjures up almost mystical abilities of a person garnering information from oceans of data with ease.  It’s where a data scientist can wave his or her hand like a Jedi Knight and simply tell the data what it

## Google Developers R Programming Video Lectures

August 8, 2013

I found these Google Developers R Programming Video Lectures via Stephen's blog, Getting Genetics Done. A very useful R tutorial for beginners! Short and efficient. Here is what I learned after watching the lectures: 4.3 - Add a Warning or Stop the Func...

## R/gridSVG/d3 Line Reverse Data Bind

August 8, 2013

I veer from finance to tech, so let’s use some data from FRED/OECD this time.  I do not think I need to comment much on what has happened to New Car Registrations in Greece. Reverse data binding a line plot from ggplot2 or lattice is slightly more difficult than what we saw in the last post I Want...

## Absolute Deviation Around the Median

August 8, 2013

Median Absolute Deviation (MAD), or Absolute Deviation Around the Median as stated in the title, is a robust measure of variability. Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions. Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. This
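
Here is a minimal sketch of the outlier claim, using base R and made-up numbers. The helper `raw_mad` is a hypothetical name introduced for illustration; base R's own `mad()` additionally multiplies by a consistency constant of 1.4826 unless `constant = 1` is passed.

```r
# Hypothetical data: the same sample with and without one extreme value.
x_clean   <- c(2, 3, 3, 4, 5, 6)
x_outlier <- c(x_clean, 100)

# Raw MAD: the median of absolute deviations from the median.
# (raw_mad is an illustrative helper, not a function from the post.)
raw_mad <- function(x) median(abs(x - median(x)))

print(raw_mad(x_clean))
print(raw_mad(x_outlier))

# Base R equivalent; constant = 1 switches off the default 1.4826 scaling.
print(mad(x_outlier, constant = 1))

# The standard deviation, by contrast, is pulled up sharply by the outlier.
print(sd(x_clean))
print(sd(x_outlier))
```

In this toy sample the raw MAD is the same with and without the extreme value, while the standard deviation grows by more than an order of magnitude.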

## R on Your iPhone (Not the Way You Think)

August 8, 2013

If you really love R, you should put it on your iPhone. Apple gives the measurements for its products here. Let's use a little grid magic with ggplot2 to make a chart for the back of your iPhone similar to this.

    require(grid)
    require(ggplot2)
    # thanks for the Apple measurements
    # https://developer.apple.com/resources/cases/
    x11( height = as.numeric(convertX(unit(58.55, "mm"), "in")),...

## Bio7 1.7 for Windows Released!

August 8, 2013

07.08.2013 A new Windows version of Bio7 is available. This version comes with a lot of new features and improvements for Java, R and ImageJ. One highlight is that you can now interpret Jython (Python) code with Bio7. In addition a new console implementation is available which offers access to a native shell, different Java

## Summarize content of a vector or data.frame every n entries

August 8, 2013

I imagine that the same result can be achieved by a proper use of quantile, but I like to have an easy way to obtain summary statistics every n entries of my dataset be it a vector or data.frame. The function takes three parameters: the R object on which we need to obtain statistics (x),
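
The excerpt stops before the function itself is described, so the following is only a sketch of one way the idea could work for a plain vector. The function name `summarize_every_n` and the parameters `n` and `FUN` are assumptions, not the post's actual interface, and data.frame input is not handled here.

```r
# Hypothetical helper: apply FUN to consecutive blocks of n entries of a vector.
summarize_every_n <- function(x, n, FUN = mean) {
  # Block index for every element: 1,1,...,1,2,2,...; the last block may be shorter.
  groups <- ceiling(seq_along(x) / n)
  tapply(x, groups, FUN)
}

# Mean of each block of 5 entries.
print(summarize_every_n(1:10, n = 5))

# A different statistic, with a shorter final block.
print(summarize_every_n(1:10, n = 4, FUN = max))
```

Grouping by `ceiling(seq_along(x) / n)` keeps a trailing partial block instead of dropping it, which seemed the less surprising choice for a quick summary.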

## Data Science MD July Recap: Python and R Meetup

August 8, 2013

For July’s meetup, Data Science MD was honored to have Jonathan Street of NIH and Brian Godsey of RedOwl Analytics come discuss using Python and R for data analysis. Jonathan started off by describing the growing ecosystem of Python data …