So I have been having difficulty getting my Stata code to look the way I want it to when I post it to my blog. To fix this I have written an HTML encoder in R. I don't know much about HTML so it is likely to be a little ...

If you're an absolute beginner to the R language, this Intro to R video series from Google Developers is a great place to get started. Just download R for your system, start the playlist below, and follow along with the on-screen examples. (The video uses the MacOS X version of R, but you should be able to follow along...

If you’re a regular reader of my blog you’ll know that I’ve spent some time dabbling with neural networks. As I explained here, I’ve used neural networks in my own research to develop inference into causation. Neural networks fall under two general categories that describe their intended use. Supervised neural networks (e.g., multilayer feed-forward networks)

Introduction

Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series. I will define and calculate the 5-number summary in 2 different ways that are commonly used in R. (It turns out that different methods arise from
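
As a rough illustration of how two 5-number summaries can disagree in R, here is a minimal sketch. The excerpt does not say which two methods the post uses; the sketch assumes they are Tukey's hinges (`fivenum()`) and the default quantile interpolation (`quantile()`, type 7), and the data vector is made up for the example.

```r
# Example data (made up for illustration)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
tukey <- fivenum(x)

# Quantile-based summary using R's default (type 7) interpolation
quart <- unname(quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)))

print(tukey)  # values: 1, 3, 5.5, 8, 10
print(quart)  # values: 1, 3.25, 5.5, 7.75, 10
```

The minimum, median, and maximum agree; only the quartile-like values differ, because hinges are medians of the lower and upper halves while `quantile()` interpolates between order statistics.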

Data mining techniques and algorithms such as Decision Tree, Naïve Bayes, Support Vector Machine, Random Forest, and Logistic Regression are “most commonly used for predicting a specific outcome such as response / no-response, high / medium / low-value customer, likely to buy / not buy.”1 In this article, we will demonstrate how to use R

the national plan and provider enumeration system (nppes) contains information about every provider, insurance plan, and clearinghouse actively operating in the united states healthcare industry. did i just see the ears of all the health workforce researchers in the room perk up? it's freely downloadable, courtesy of the department of health and human services' implementation of the...

In the spirit of my first post (Pappu Vs. Feku) I will continue to explore the use of Twitter in providing an eye into events of contemporary interest, and movies certainly capture the interest of a large majority of the Indian audience. So I am looking at Chennai Express, which was released last week...

In part one and part two of Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model I developed a model for the number of goals in football matches from five seasons of La Liga, the premier Spanish football league. I’m now reasonably happy with the model and want to use it to rank...

A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when

In my June 25 post I described R code to (i) change scale without changing the mean, and (ii) make a probability distribution symmetric by modifying order statistics. Both are problems commonly encountered by R programmers. My coauthor Javier Lopez-de-Lacalle of Spain has incorporated an efficient version of my code inside the maximum entropy bootstrap (meboot) package in R. See the package...

I used knitr to hack together a very short tutorial about XML in R. It's in German. And it's not very long. But, hey, it's free :) I hope it can be of help to someone who wants to get started with XML processing in R. Please feel free to post or send any ...

In my quest to practice R and learn text mining, I am looking at one of the popular Twitter Wars between two political personalities of India who are fondly known in the TwitterVerse as ‘Pappu’ and ‘Feku’ which is basically their ‘ghar ka naam’ or ‘pyar wala naam’. Anyway, the discussion about the origin of the...

In case you missed them, here are some articles from July of particular interest to R users: A new 90-second, Creative Commons video helps R enthusiasts share the history, community and applications of R. Analyst group Butler Analytics reviews 10 predictive analytics platforms, and says that "real analysts use R". An excellent example of Simpson's Paradox: US median wages...

Where else will you hear Pimco, rolling correlation, R, gridSVG, lattice, and d3 all in one post? Let’s mix them all together to see what might happen. For those here for the geekery, we will add a d3 axis for our y and it will follow the mouse. For those who care nothing about d3 and R, you might...

I have released a new version of the stringdist package. Besides some new string distance algorithms it now contains two convenient matching functions: amatch: Equivalent to R's match function but allowing for approximate matching. ain: Similar to R's %in% …

A minor maintenance release of inline is now on CRAN, and has already been included in Debian. This release contains a patch kindly contributed by Mikhail Umorin which fixes the use of \code{cfunction} with lists of signatures and function bodies. ...

I got these Google Developers R Programming video lectures from Stephen's blog, Getting Genetics Done. A very useful R tutorial for beginners! Short and efficient. Here is what I learned after watching the lectures: 4.3 - Add a Warning or Stop the Func...

I veer from finance to tech, so let’s use some data from FRED/OECD this time. I do not think I need to comment much on what has happened to New Car Registrations in Greece. Reverse data binding a line plot from ggplot2 or lattice is slightly more difficult than what we saw in the last post I Want...

Median Absolute Deviation (MAD) or Absolute Deviation Around the Median, as stated in the title, is a robust measure of variability (statistical dispersion), not of central tendency. Robust statistics are statistics with good performance for data drawn from a wide range of non-normally distributed probability distributions. Unlike the standard mean/standard deviation combo, MAD is not sensitive to the presence of outliers. This
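
To make the outlier claim concrete, here is a minimal sketch in base R. The data vector is made up for illustration and is not from the original post; note that `mad()` in R multiplies by a consistency constant of 1.4826 by default, so `constant = 1` is needed to get the raw median of absolute deviations.

```r
# Example data with one extreme outlier (made up for illustration)
x <- c(1, 2, 3, 4, 100)

# MAD by hand: median of absolute deviations from the median
mad_manual <- median(abs(x - median(x)))

# Built-in equivalent without the normal-consistency scaling
mad_raw <- mad(x, constant = 1)

# Default mad() scales by 1.4826 so it estimates sigma for normal data
mad_scaled <- mad(x)

# The standard deviation, by contrast, is dominated by the outlier
sd_x <- sd(x)

print(c(mad_manual = mad_manual, mad_raw = mad_raw,
        mad_scaled = mad_scaled, sd = sd_x))
```

Here the raw MAD is 1 and the scaled MAD is 1.4826, while the standard deviation is far larger because of the single value 100.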

If you really love R, you should put it on your iPhone. Apple gives the measurements for its products here. Let's use a little grid magic with ggplot2 to make a chart for the back of your iPhone similar to this.
require(grid)
require(ggplot2)
# thanks for the Apple measurements
# https://developer.apple.com/resources/cases/
x11( height = as.numeric(convertX(unit(58.55, "mm"), "in")),...

07.08.2013 A new Windows version of Bio7 is available. This version comes with a lot of new features and improvements for Java, R and ImageJ. One highlight is that you can now interpret Jython (Python) code with Bio7. In addition a new console implementation is available which offers access to a native shell, different Java

For July’s meetup, Data Science MD was honored to have Jonathan Street of NIH and Brian Godsey of RedOwl Analytics come discuss using Python and R for data analysis. Jonathan started off by describing the growing ecosystem of Python data …