Printing R help files in the console or in knitr documents

June 18, 2013
By

Yesterday, I was creating a knitr document based on a script, and was looking for a way to include content from an R help file. The script, which was a teaching document, had a help() command for when the author wanted to refer readers to R documentation. I wanted that text in my final document, though. There’s no...
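
The excerpt stops before the post's solution. As a hedged illustration only, one way to get a help page's text is to render its parsed Rd source with tools::Rd2txt(). The helper name help_text() is hypothetical. It also assumes the unexported utils:::.getHelpFile(), an internal function that may change between R versions.

```r
# Hypothetical helper: render an R help page as plain text for the console or
# a knitr chunk. Assumption: relies on the unexported utils:::.getHelpFile(),
# which is internal to R and may change between versions.
help_text <- function(topic, package = NULL) {
  # Build the help() call programmatically so `topic` is treated as a string
  args <- list(topic = topic)
  if (!is.null(package)) args$package <- package
  files <- as.character(do.call(utils::help, args))
  if (length(files) == 0) stop("No help page found for topic: ", topic)

  rd <- utils:::.getHelpFile(files[1])  # parsed Rd object for the first match
  out <- tempfile(fileext = ".txt")
  # Turn off underlined titles, whose backspace sequences garble knitr output
  tools::Rd2txt(rd, out = out, options = list(underline_titles = FALSE))
  lines <- readLines(out)
  unlink(out)
  lines
}

# In the console or a knitr chunk:
# cat(help_text("mean"), sep = "\n")
```

For HTML or LaTeX output, tools::Rd2HTML() or tools::Rd2latex() could be swapped in for Rd2txt() along the same lines.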

Resources for getting started with R

June 18, 2013
By

As you may know, we are holding a webinar tomorrow (June 19th, 2013) on Predictive Analytics. During this webinar, you will be introduced to R, learn how to build a predictive model, and see how to carry out insightful analysis through visualization. As learning a new language can be a really difficult

BCEA 1.3.0

June 18, 2013
By

After months of work (although to be fair, we haven't worked 100% full time on this), Andrea and I are nearly ready to publish the next release of BCEA. Andrea has done a brilliant job and is responsible for most of the good new features (NB: see ...

PivotalR Improves the Scalability and Performance of In-Database Analytics

June 18, 2013
By

One of the greatest challenges while working with big datasets concerns the need to move information out of storage for analysis. To this end, the recent announcement of PivotalR 0.1 extends Pivotal HD's capabilities, allowing users of the statistical programming language R to perform in-database analytics without leaving the command line.

R GIS: Terrain Analysis for Polygons as Simple as it Gets!

June 18, 2013
By

library(rgdal)
library(raster)
alt <- …
gadm <- …
gadm_sub <- …
plot(alt)
plot(gadm_sub, add=T)
asp <- …
slo <- …
> extract(slo, gadm_sub, fun = mean, na.rm = T, small = T, df = T)
  ID     slope
1  1  9.959053
2  2  1.047443
3  3  7.456165
4  4  1.673786
5  5 11.946553
> extract(asp, gadm_sub, fun = mean, na.rm = T, small...

The Green Number Effect

June 18, 2013
By

Following up on a suggestion from my previous post, here are the statistics for medal count versus age. The colour of each point on the plot (see legend on right) gives the number of athletes who have achieved a given number of medals by a particular age. There is clear evidence of a Green Number Effect: many

Quickly read Excel worksheets into R (Windows only…sorry)

June 18, 2013
By

I suppose most companies use the Microsoft Office suite of programs, and my office is no exception. It’s easy to import data from an API or a database into R, but importing data from an Excel workbook is a different story. There are a few R packages for reading Excel files, but I’ve had problems

Job opening! Come work with us!

June 18, 2013
By

Postdoctoral position in statistical modeling of social networks.
A full-time postdoctoral position is available beginning Fall 2014 in the research group of Tian Zheng and Andrew Gelman working on statistical analysis and modeling of social network data, in close cooperation with our experimental collaborators. Four key papers of this project so far are:
http://www.stat.columbia.edu/~gelman/research/published/overdisp_final.pdf
http://nersp.osg.ufl.edu/~ufruss/documents/mccormick_salganik_zheng10.pdf
…

June 18, 2013
By

Here's a little R script to conveniently download high-quality digital elevation data, e.g. for the Alps, from HERE:
require(XML)
dir.create("D:/GIS_DataBase/DEM/")
setwd("D:/GIS_DataBase/DEM/")
doc <- …
urls <- …
names <- …
# download each file to its matching local name (binary mode for zip archives)
for (i in seq_along(urls)) download.file(urls[i], names[i], mode = "wb")
# unzip all files in dir and delete them afterwards
zips <- list.files(pattern = "\\.zip$")
sapply(zips, unzip)
unlink(zips)
p.s.: Also check raster::getData, which pulls SRTM data at 90 m resolution for a location or region!

Evaluating Optimization Algorithms in MATLAB, Python, and R

June 18, 2013
By

As those of you who read my last post know, I’m at the NIMBioS-CAMBAM workshop on linking mathematical models to biological data here at UT Knoxville. Day 1 (today) was on parameter estimation and model identifiability. Specifically, we (quickly) covered …
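
The excerpt cuts off before any code, so as a hedged illustration of what evaluating optimizers on a parameter-estimation problem can look like in R, here is a sketch comparing two methods of base R's optim() on a least-squares fit of an exponential decay. The model, simulated data and starting values are hypothetical, not material from the workshop.

```r
# Hypothetical example: compare two optim() methods on a least-squares
# parameter-estimation problem. Model, data and starting values are made up.
set.seed(1)
t_obs <- seq(0, 10, by = 0.5)
a_true <- 5
k_true <- 0.3
y_obs <- a_true * exp(-k_true * t_obs) + rnorm(length(t_obs), sd = 0.1)

# Objective: sum of squared residuals for y = a * exp(-k * t)
sse <- function(par) {
  pred <- par[1] * exp(-par[2] * t_obs)
  sum((y_obs - pred)^2)
}

start <- c(a = 2, k = 0.5)
opt_methods <- c("Nelder-Mead", "BFGS")
fits <- setNames(lapply(opt_methods, function(m) optim(start, sse, method = m)),
                 opt_methods)

# Estimates, final objective value, evaluation counts and convergence code per method
t(sapply(fits, function(f) c(f$par, sse = f$value, f$counts, convergence = f$convergence)))
```

Comparing the evaluation counts alongside the final objective value is one simple way to weigh speed against accuracy; a fuller comparison would also vary the starting values.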

googleVis 0.4.3 released with improved Geocharts

June 18, 2013
By

The Google Charts Tools provide two kinds of heat map charts for geographical data, the Flash based Geomap and the HTML5/SVG based Geochart. I prefer the Geochart as it doesn't require Flash, but so far there have been two shortcomings with it: I couldn't add additional tooltip information and the default Mercator projection shows Greenland the...

Software Packages for Graphs and Charts

June 17, 2013
By

Graphs can be an important feature of analysis. A well-designed graph can make summary statistics much more readable and easier to interpret. It also makes reports and articles look more professional. There are many software packages available for designing great graphs and charts. This seems to

Computerworld’s Beginners Guide to R

June 17, 2013
By

Sharon Machlis is not only the online managing editor at Computerworld, she's also a budding data scientist who recently started learning the R language. To the benefit of all other new R users, she's shared her learnings in an excellent 6-part beginner's guide to R, published by Computerworld. It's jam-packed with useful information for anyone getting started with R,...

Zombie Apocalypse Survival Test – R-Powered (using Concerto)

June 17, 2013
By

This test is the first attempt to seriously assess the ability of individuals to survive a zombie apocalypse.  This test is administered using the R powered open-source testing platform Concerto developed at the University of Cambridge. The t...

Bayesian computational tools

June 17, 2013
By

I just updated my short review on Bayesian computational tools I first wrote in April for the Annual Review of Statistics and Its Applications. The coverage is quite restricted, as I took advantage of two phantom papers I had started a while ago, one with Jean-Michel Marin, on hierarchical Bayes methods and on ABC. (As

Dave Harris on Maximum Likelihood Estimation

June 17, 2013
By

At our last Davis R Users’ Group meeting of the quarter, Dave Harris gave a talk on how to use the bbmle package to fit mechanistic models to ecological data. Here’s his script, which I ran through the spin function in knitr:
# Load data
library(emdbook)
## Loading required package: MASS
## Loading required package: lattice
library(bbmle)
## Loading required package:...
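
The script is truncated here, so as a hedged sketch of the underlying idea, the example below uses stats4::mle() from base R. bbmle::mle2() extends this same stats4 interface. The Poisson count data are simulated and the parameterisation is an illustrative choice, not Dave Harris's model.

```r
# Minimal maximum-likelihood sketch with stats4::mle(); the data are simulated.
library(stats4)

set.seed(42)
counts <- rpois(100, lambda = 3)

# Negative log-likelihood, parameterised on the log scale so lambda stays positive
nll <- function(log_lambda) {
  -sum(dpois(counts, lambda = exp(log_lambda), log = TRUE))
}

fit <- mle(nll, start = list(log_lambda = 0), method = "BFGS")
summary(fit)
exp(coef(fit))  # back-transformed estimate; analytically equal to mean(counts)
```

Fitting on the log scale avoids bound constraints; a mechanistic model would replace the constant rate with a function of covariates inside nll().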

Oracle R Connector for Hadoop 2.1.0 released

June 17, 2013
By

(This article was first published on Oracle R Enterprise, and kindly contributed to R-bloggers) Oracle R Connector for Hadoop (ORCH), a collection of R packages that enables Big Data analytics using HDFS, Hive, and Oracle Database from a local R environment, continues to make advancements. ORCH 2.1.0 is now available, providing a flexible framework while remarkably improving performance and...

Model Selection in Bayesian Linear Regression

June 17, 2013
By

Previously I wrote about performing polynomial regression and also about calculating marginal likelihoods. The data in the former and the calculations of the latter will be used here to exemplify model selection. Consider data generated by and suppose we wish to fit a polynomial of degree 3 to the data. There are then 4 regression …
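
The post's data-generating equation and code are missing from this excerpt, so the following is only a generic sketch of marginal-likelihood model selection for polynomial degree. It assumes a conjugate Gaussian prior on the coefficients, beta ~ N(0, tau^2 I), and a known noise standard deviation sigma. The data, sigma and tau are hypothetical choices.

```r
# Generic sketch: choose a polynomial degree by marginal likelihood, assuming
# beta ~ N(0, tau^2 I) and known noise sd sigma. Data, sigma, tau are hypothetical.
set.seed(1)
n <- 50
x <- seq(-1, 1, length.out = n)
y <- 1 - 2 * x + 3 * x^3 + rnorm(n, sd = 0.2)

# log p(y | degree): with beta integrated out, y ~ N(0, sigma^2 I + tau^2 X X^T)
log_marglik <- function(degree, x, y, sigma = 0.2, tau = 1) {
  X <- outer(x, 0:degree, `^`)                    # columns 1, x, ..., x^degree
  S <- sigma^2 * diag(length(y)) + tau^2 * tcrossprod(X)
  L <- chol(S)                                    # S = t(L) %*% L
  alpha <- backsolve(L, y, transpose = TRUE)      # alpha = solve(t(L), y)
  -sum(log(diag(L))) - 0.5 * sum(alpha^2) - 0.5 * length(y) * log(2 * pi)
}

degrees <- 0:6
lml <- sapply(degrees, log_marglik, x = x, y = y)
data.frame(degree = degrees, log_marglik = round(lml, 2))
degrees[which.max(lml)]  # degree with the highest marginal likelihood
```

The Cholesky factor gives both the log-determinant and the quadratic form of the Gaussian density without forming an explicit inverse; higher degrees are penalised automatically because their prior spreads probability over more possible datasets.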

Stashing and playing with raw data locally from the web

June 17, 2013
By

It is getting easier to get data directly into R from the web. R packages that retrieve data from the web often return useful R data structures, like a data.frame. This is, of course, a good thing, as it makes them user friendly. However, what if you want to drill down into the data that's returned from a query...
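
The excerpt does not show the post's approach. As a hedged sketch of the general idea of stashing raw results locally, here is a hypothetical helper, cached_fetch(), that runs a fetch function once, saves its raw result as an .rds file, and reads the stash on later calls. The default cache directory is a session temp directory, so a persistent path would be needed to keep the stash between sessions.

```r
# Hypothetical helper for stashing raw web results locally. The first call runs
# fetch() and saves its raw result as .rds; later calls read the stash instead.
# `key` must be a safe file name.
cached_fetch <- function(key, fetch, cache_dir = file.path(tempdir(), "raw_cache")) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the stashed raw data
  }
  raw <- fetch()           # e.g. the unparsed response body
  saveRDS(raw, path)
  raw
}

# Usage (needs network access):
# raw_lines <- cached_fetch("r_project_home",
#                           function() readLines("https://www.r-project.org/"))
```

Keeping the unparsed response means you can re-parse or inspect fields that a package's tidy data.frame output drops, without re-querying the web service.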

analyze the pesquisa de orcamentos familiares (pof) with r

June 17, 2013
By

for the unlucky among us born without a portuguese mother tongue, the pesquisa de orcamentos familiares (pof) translates to survey of household budgets.  this data set captures brazilian family consumption habits, allocation of expenses, and incom...

Annotating select points on an X-Y plot using ggplot2

June 16, 2013
By

or, Is the Seattle Mariners outfield a disaster?
The Backstory
Earlier this week (2013-06-10), a blog post by Dave Cameron appeared at USS Mariner under the title “Maybe It's Time For Dustin Ackley To Play Some Outfield”. In the first paragraph, Cameron describes the Seattle Mariners outfield this season as “a complete disaster” and Raul Ibanez as...
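
The excerpt is narrative only, so here is a hedged sketch of the general technique named in the title: pass a subset of the data to geom_text() so that only the chosen points get labels. The player names and the offense/defense columns are made-up placeholders, not the Mariners data from the post.

```r
# Minimal sketch: label only selected points with ggplot2.
# Player names and both metrics are hypothetical placeholders.
library(ggplot2)

players <- data.frame(
  name    = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  offense = c(0.310, 0.295, 0.250, 0.270, 0.330),
  defense = c(-5, 2, 8, -12, 4)
)

# Choose the points to annotate
highlight <- subset(players, name %in% c("Player A", "Player D"))

p <- ggplot(players, aes(x = offense, y = defense)) +
  geom_point() +
  geom_point(data = highlight, colour = "red") +                # emphasise selected points
  geom_text(data = highlight, aes(label = name), vjust = -0.8)  # label only those points

# print(p) draws the chart
```

Because each layer can take its own data argument, the full data set drives the scatter while the subset drives the labels.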

Exploratory Data Analysis: Combining Box Plots and Kernel Density Plots into Violin Plots for Ozone Pollution Data

Introduction
Recently, I began a series on exploratory data analysis (EDA), and so far I have written about descriptive statistics, box plots, and kernel density plots. As previously mentioned in my post on box plots, there is a way to combine box plots and kernel density plots. This combination results in violin plots, and I
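
The excerpt ends before any code. As an illustration of the combination it describes, here is a base-R sketch that mirrors a kernel density estimate around each group's position and overlays a narrow box plot. It uses R's built-in airquality data (ozone in ppb, May to September) as an assumed stand-in. The post's own data and plotting code may differ, and violin_coords() is a hypothetical helper.

```r
# Base-R violin plot sketch: mirrored KDE per group plus a narrow box plot.
# Uses the built-in airquality data as an assumed stand-in for the post's data.
violin_coords <- function(v, center, half_width = 0.4) {
  d <- density(v, from = min(v), to = max(v))  # KDE restricted to the data range
  w <- d$y / max(d$y) * half_width             # scale density to the half-width
  list(x = c(center - w, rev(center + w)), y = c(d$x, rev(d$x)))
}

oz <- split(airquality$Ozone, airquality$Month)
oz <- lapply(oz, function(v) v[!is.na(v)])     # drop missing ozone readings

plot(NA, xlim = c(0.5, length(oz) + 0.5), ylim = range(unlist(oz)),
     xaxt = "n", xlab = "Month", ylab = "Ozone (ppb)")
axis(1, at = seq_along(oz), labels = month.abb[as.integer(names(oz))])
for (i in seq_along(oz)) {
  vc <- violin_coords(oz[[i]], center = i)
  polygon(vc$x, vc$y, col = "grey85", border = "grey40")
  boxplot(oz[[i]], at = i, add = TRUE, boxwex = 0.1, axes = FALSE,
          outline = FALSE, col = "white")
}
```

Scaling each density to the same maximum half-width makes shapes comparable across months but hides differences in group size; scaling by a common factor instead would preserve them.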

Dynamic Data Visualizations in the Browser Using Shiny

June 16, 2013
By

After being busy the last two weeks teaching and attending academic conferences, I finally found some time to do what I love, program data visualizations using R. After being interested in Shiny for a while, I finally decided to pull the trigger and build my first Shiny app! I wanted to make a proof of

General Regression Neural Network with R

June 16, 2013
By

Similar to the back-propagation neural network, the general regression neural network (GRNN) is a good tool for function approximation in the modeling toolbox. Proposed by Specht in 1991, GRNN has the advantages of instant training and easy tuning: a GRNN is formed instantly with a single pass of training over the development data.
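
The one-pass training described above follows from how a GRNN predicts. Each training case acts as a pattern unit with a Gaussian kernel, and the output is the kernel-weighted average of the training targets. Below is a minimal base-R sketch of that formulation, with sigma as the single smoothing parameter. The function name grnn_predict() and the simulated data are hypothetical, not the post's code.

```r
# Minimal GRNN sketch: Gaussian pattern units, output = weighted average of targets.
# Function and argument names are hypothetical.
grnn_predict <- function(X_train, y_train, X_new, sigma) {
  X_train <- as.matrix(X_train)
  X_new <- as.matrix(X_new)
  apply(X_new, 1, function(x) {
    d2 <- rowSums(sweep(X_train, 2, x)^2)  # squared distance to each training case
    w <- exp(-d2 / (2 * sigma^2))          # pattern-layer activations
    sum(w * y_train) / sum(w)              # summation layer divided by total weight
  })
}

# "Training" is just storing the data: one pass, no iterative fitting
set.seed(3)
x <- seq(0, 2 * pi, length.out = 40)
y <- sin(x) + rnorm(40, sd = 0.1)
x_grid <- seq(0, 2 * pi, length.out = 5)
round(grnn_predict(x, y, x_grid, sigma = 0.3), 3)
```

Tuning reduces to choosing sigma: small values track individual training cases, large values shrink predictions toward the overall mean. A very small sigma can make every weight underflow to zero and return NaN, so in practice sigma is usually chosen by validation.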

Scenario analysis and trading options using R

June 16, 2013
By

I present you with my restructured project on options trading and scenario analysis. You are more than welcome to try it out. Firstly, I will give a small presentation that will reveal what you can do with it and whether you need to continue reading. T...

The scaling of Expected Shortfall

June 16, 2013
By

Getting Expected Shortfall given the standard deviation or Value at Risk.
Previously
There have been a few posts about Value at Risk and Expected Shortfall. Properties of the stable distribution were discussed.
Scaling
One way of thinking of Expected Shortfall is that it is just some number times the standard deviation, or some other number …
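
To make the "some number times the standard deviation" idea concrete, here is a hedged sketch under a normal assumption, which is only one possible distributional choice and not necessarily the one the post uses. For a zero-mean normal loss with standard deviation sigma, VaR at level alpha is sigma * qnorm(alpha), and Expected Shortfall is sigma * dnorm(qnorm(alpha)) / (1 - alpha). Both are therefore fixed multiples of sigma, and ES is also a fixed multiple of VaR. The function name normal_var_es() is hypothetical.

```r
# Sketch under a normal assumption: VaR and Expected Shortfall of a zero-mean
# normal loss as multiples of sigma. Function name is hypothetical.
#   VaR = sigma * qnorm(alpha)
#   ES  = sigma * dnorm(qnorm(alpha)) / (1 - alpha)
normal_var_es <- function(alpha, sigma = 1) {
  z <- qnorm(alpha)
  c(VaR = sigma * z, ES = sigma * dnorm(z) / (1 - alpha))
}

levels <- c(0.95, 0.975, 0.99)
multiples <- sapply(levels, normal_var_es)
colnames(multiples) <- levels
round(multiples, 3)                                # VaR and ES per unit of sigma
round(multiples["ES", ] / multiples["VaR", ], 3)   # ES as a multiple of VaR
```

Heavier-tailed distributions, such as the stable distributions mentioned above, give larger multipliers, so these normal values are a lower-tail-risk baseline rather than a general rule.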

Distribution of car weights

June 16, 2013
By

Two weeks ago I described car data, including the weight distribution of cars in the Netherlands. At that time, I only made plots. In the meantime, I decided I wanted to model trends. As a first step, I decided to fit distributions for these da...
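
The excerpt stops before the fitting itself. As a hedged sketch of one common way to fit and compare candidate distributions in R, the example below uses MASS::fitdistr() for maximum-likelihood fits and compares them by AIC. The weights are simulated placeholders, not the Dutch registration data, and the two candidate families are illustrative choices.

```r
# Minimal sketch: fit candidate distributions by maximum likelihood and compare
# by AIC. The weights are simulated placeholders, not the post's data.
library(MASS)

set.seed(2013)
weights_kg <- rlnorm(500, meanlog = log(1200), sdlog = 0.25)

fits <- list(
  normal    = fitdistr(weights_kg, "normal"),
  lognormal = fitdistr(weights_kg, "lognormal")
)

sapply(fits, AIC)        # lower AIC indicates the better-supported distribution
fits$lognormal$estimate  # meanlog and sdlog on the log-kg scale
```

For the normal and lognormal families, fitdistr() uses closed-form estimates. Other families are fitted numerically with optim(), which may need sensible starting values or rescaled data for values in the thousands of kilograms.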

Modeling an Infant’s Feeding Schedule with Periodic Smoothing Splines

June 15, 2013
By

Feeding Schedule
While on paternity leave I had an opportunity to test out periodic smoothing splines (within the framework of generalized additive models) on an interesting time series: an infant's feeding schedule.
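
As a hedged sketch of a periodic smoothing spline within a GAM, the example below uses mgcv's cyclic cubic regression spline (bs = "cc"), which forces the fitted curve to join up at the ends of the cycle. The simulated feeding amounts and the Gaussian response are simplifying assumptions. The post's actual data and model specification are not shown in this excerpt.

```r
# Minimal sketch: periodic smoothing spline as a cyclic cubic regression spline
# in an mgcv GAM. Data are simulated; the Gaussian response is an assumption.
library(mgcv)

set.seed(7)
hour <- runif(300, 0, 24)                                     # time of day of each record
amount <- 2 + sin(2 * pi * hour / 24) + rnorm(300, sd = 0.3)  # hypothetical feeding amount

# knots = list(hour = c(0, 24)) sets the ends of the 24-hour cycle
fit <- gam(amount ~ s(hour, bs = "cc", k = 10), knots = list(hour = c(0, 24)))

grid <- data.frame(hour = seq(0, 24, by = 3))
round(predict(fit, newdata = grid), 2)
```

Without the explicit knots, the cycle would be taken from the observed range of hour, which need not span exactly 0 to 24, so midnight would not be guaranteed to match.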

Some days ago H. Wickham (Chief Scientist at RStudio) posted an article about the RStudio CRAN mirror with …