## Classi-Compare of Raster Satellite Images – Before and After

August 13, 2013
For my research on the effect of power outages on fertility, we study a period of extensive power rationing that lasted for almost a whole year and affected most of Latin America, and Colombia in particular. The key difficulty was to determine which areas were exposed to the power outage and the extent to

## Finding Correlations in Data with Uncertainty: Classical Solution

August 13, 2013
Following up on my previous post as a result of an excellent suggestion from Andrej Spiess. The data are indeed very heteroscedastic! Andrej suggested that an alternative way to attack this problem would be to use weighted correlation with weights being the inverse of the measurement variance. Let’s look at the synthetic data first. This is
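
The weighted correlation mentioned in this teaser can be sketched as follows. This is only an illustration in plain Python, not the post's own R code: the function name `weighted_corr` and the assumption of a single measurement variance per data point are hypothetical choices made for the example.

```python
import math

def weighted_corr(x, y, variances):
    """Weighted Pearson correlation with weights w_i = 1 / variance_i.

    Assumption for this sketch: each point carries one measurement
    variance, so noisier points contribute less to the estimate.
    """
    w = [1.0 / v for v in variances]
    total = sum(w)
    # Weighted means of x and y
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted sums of cross products and squares about the weighted means
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)
```

With equal variances the result reduces to the ordinary Pearson correlation, which is a quick sanity check on any implementation.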

## When Discussing Confidence Level With Others…

August 13, 2013
This post spawned from a discussion I had the other day. Confidence intervals are a notoriously difficult topic for those unfamiliar with statistics. I can’t really think of another statistical topic that is so widely published in newspaper articles, television, and elsewhere that so few people really understand. It’s been this way since the moment

## Genetic drift simulation

August 13, 2013
While preparing for the new teaching semester I have created an implementation of NetLogo GenDrift P local in GNU R. The model works as follows. Initially a square grid having side size is randomly populated with n types of agen...

## Free R Graphics Workshop, Copenhagen, Denmark, 26th August

August 13, 2013
Mango Solutions are pleased to announce a free R Graphics Workshop in Copenhagen on Monday 26th August (6-8pm). The workshop is open to all and any interested R users or those wishing to learn more about R. The workshop will focus on using R to create powerful graphics, specifically covering: • An introduction to R • Getting data into...

## Reverse IP Address Lookups With R (From Simple To Bulk/Asynchronous)

August 12, 2013
R lacks some of the more “utilitarian” features found in other scripting languages that were/are more geared—at least initially—towards systems administration. One of the most frustrating missing pieces for security data scientists is the lack of ability to perform basic IP address manipulations, including reverse DNS resolution (even though it has nsl() which is just

## A Stata HTML syntax highlighter in R

August 12, 2013
So I have been having difficulty getting my Stata code to look the way I want it to look when I post it to my blog. To alleviate this I have written an HTML encoder in R. I don't know much about HTML so it is likely to be a little ...

## A beginner’s video introduction to R, from Google

August 12, 2013
If you're an absolute beginner to the R language, this Intro to R video series from Google Developers is a great place to get started. Just download R for your system, start the playlist below, and follow along with the on-screen examples. (The video uses the MacOS X version of R, but you should be able to follow along...

## Short tales of two NCAA basketball conferences (Big 12 and West Coast) using graphs

August 12, 2013
Having been at the University of Kansas (Kansas Jayhawks) as a student and now working at Gonzaga University (Gonzaga Bulldogs), discussions about college basketball are inescapable. This post uses R, ggmap, ggplot2 and the shiny server to graphically ...

## Variable importance in neural networks

August 12, 2013
If you’re a regular reader of my blog you’ll know that I’ve spent some time dabbling with neural networks. As I explained here, I’ve used neural networks in my own research to develop inference into causation. Neural networks fall under two general categories that describe their intended use. Supervised neural networks (e.g., multilayer feed-forward networks)

## Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R

Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series. I will define and calculate the 5-number summary in 2 different ways that are commonly used in R. (It turns out that different methods arise from
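
The teaser is cut off before it names the two methods. A plausible reading, stated here as an assumption, is that they are Tukey's hinges (as in R's `fivenum()`) and interpolated quantiles (as in R's default `quantile()`, type 7). A minimal Python sketch of both, with hypothetical function names, shows how they can disagree on the quartiles:

```python
import math

def fivenum(data):
    """Five-number summary using Tukey's hinges (mirrors R's fivenum())."""
    x = sorted(data)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions of min, lower hinge, median, upper hinge, max
    d = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(k) - 1] + x[math.ceil(k) - 1]) for k in d]

def quantile7(data, p):
    """Linear-interpolation quantile (R's default, type 7)."""
    x = sorted(data)
    h = (len(x) - 1) * p          # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(x) - 1)  # guard the upper end when p == 1
    return x[lo] + (h - lo) * (x[hi] - x[lo])

def five_number_quantile(data):
    """Five-number summary built from type-7 quantiles."""
    return [quantile7(data, p) for p in (0, 0.25, 0.5, 0.75, 1)]
```

For the data 1 through 6, the hinge method gives quartiles 2 and 5, while the interpolated method gives 2.25 and 4.75; the minimum, median, and maximum agree.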

## Identifying Potential Customers with Classification Techniques in R Language

August 12, 2013
Data mining techniques and algorithms such as Decision Tree, Naïve Bayes, Support Vector Machine, Random Forest, and Logistic Regression are “most commonly used for predicting a specific outcome such as response / no-response, high / medium / low-value customer, likely to buy / not buy.”1 In this article, we will demonstrate how to use R

## Time Series Decomposition

August 12, 2013
In the last post on the changepoint package, I concluded with a brief example of time series decomposition with the "decompose" command.  After further reading, I discovered the "stl" command, which to me appears a superior method.  STL stand...

## analyze the national plan and provider enumeration system (nppes) with r and monetdb

August 12, 2013
the national plan and provider enumeration system (nppes) contains information about every provider, insurance plan, and clearinghouse actively operating in the united states healthcare industry.  did i just see the ears of all the health workforce researchers in the room perk up?  it's freely downloadable, courtesy of the department of health and human services' implementation of the...

## Some belated spring cleaning

August 11, 2013
A very busy spring has transitioned into a very busy summer, so let me recap a few topics that probably deserve more time than I’ll give them here. Here are the things I’m overdue on, in no particular order. Publications: In the March edition of the Journal of Risk, Kris Boudt, Brian Peterson and I

## Twitter Movie Review – Chennai Express

August 11, 2013
In the spirit of my first post (Pappu Vs. Feku) I will continue to explore the use of Twitter in providing an eye into events of contemporary interest, and movies are certainly something that captures the interest of a large majority of the Indian audience. So I am looking at Chennai Express, which was released last week...

## Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part three.

August 11, 2013
In part one and part two of Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model I developed a model for the number of goals in football matches from five seasons of La Liga, the premier Spanish football league. I’m now reasonably happy with the model and want to use it to rank...

## Software carpentry

August 11, 2013
I would never call myself a programmer, but as an ecologist I manage moderately big and complicated datasets, and that requires interacting with my computer to get the most out of them. I self-taught most of the things I need …

## Finding Correlations in Data with Uncertainty

August 11, 2013
A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when

## Enhanced meboot package, simulating regression standard errors

August 11, 2013
In my June 25 post I described R code to (i) change scale without changing the mean, and (ii) make a probability distribution symmetric by modifying order statistics. Both are problems commonly encountered by R programmers. My coauthor Javier Lopez-de-Lacalle of Spain has incorporated an efficient version of my code inside the maximum entropy bootstrap (meboot) package in R. See the package...

## XML in R – A (German) tutorial / XML in R – ein Tutorial auf Deutsch

August 10, 2013
I used knitr to hack together a very short tutorial about XML in R. It's in German. And it's not very long. But, hey, it's free :) I hope it can be of help to someone who wants to get started with XML processing in R. Please feel free to post or send any ...

## Pappu Vs. Feku – Twitter Wars

August 10, 2013
In my quest to practice R and learn text mining, I am looking at one of the popular Twitter Wars between two political personalities of India who are fondly known in the TwitterVerse as ‘Pappu’ and ‘Feku’, which are basically their ‘ghar ka naam’ (family nickname) or ‘pyar wala naam’ (affectionate name). Anyway, the discussion about the origin of the...

## In case you missed it: July 2013 Roundup

August 9, 2013
In case you missed them, here are some articles from July of particular interest to R users: A new 90-second, creative commons video helps R enthusiasts share the history, community and applications of R. Analyst group Butler Analytics reviews 10 predictive analytics platforms, and says that "real analysts use R". An excellent example of Simpson's Paradox: US median wages...

## PIMCO Rolling Correlation, d3, R, gridSVG, lattice | Gets An Axis

August 9, 2013
Where else will you hear Pimco, rolling correlation, R, gridSVG, lattice, and d3 all in one post?  Let’s mix them all together to see what might happen.  For those here for the geekery, we will add a d3 axis for our y and it will follow the mouse.  For those who care nothing about d3 and R, you might...

## Approximate string matching in R

August 9, 2013
I have released a new version of the stringdist package. Besides some new string distance algorithms it now contains two convenient matching functions: amatch: Equivalent to R's match function but allowing for approximate matching. ain: Similar to R's %in% …

## R-Squared for a VBGM

August 9, 2013
Recently, a fishR user asked me the following question: After fitting the age-length data into VBGM, I overviewed the results. But I can’t find the coefficient of determination ($R^2$) for the VBGM fitting. Because some reviewer want the coefficient …

## inline 0.3.13

August 9, 2013
A minor maintenance release of inline is now on CRAN, and has already been included in Debian. This release contains a patch kindly contributed by Mikhail Umorin which fixes the of `cfunction` with lists of signatures and function bodies. ...

## Data Scientists and Statisticians: Can’t We All Just Get Along

August 9, 2013
It seems that the title “data science” has taken the world by storm.  It’s a title that conjures up almost mystical abilities of a person garnering information from oceans of data with ease.  It’s where a data scientist can wave his or her hand like a Jedi Knight and simply tell the data what it

## Google Developers R Programming Video Lectures

August 8, 2013
I got these Google Developers R Programming Video Lectures from Stephen's blog - Getting Genetics Done. Very useful R tutorial for beginners! Short and efficient. Here is what I learned after watching the lectures: 4.3 - Add a Warning or Stop the Func...