Mapping San Francisco crime

December 30, 2014

When I was working as a data scientist at Apple in Silicon Valley, I’d drive up to San Francisco on nights and weekends to meet a girl for dinner or go to a meetup. I sort of fell in love with the city, and ... The post Mapping San Francisco crime appeared first on SHARP SIGHT LABS.

Read more »

Cluster Analysis of the NFL’s Top Wide Receivers

December 29, 2014

“The time has come to get deeply into football. It is the only thing we have left that ain't fixed.” (Hunter S. Thompson, Hey Rube Column, November 9, 2004) I have to confess that I haven’t been following the NFL this year as much as planned or hoped. On only 3 or 4 occasions this year have I been able to...

Read more »

OpenCPU release 1.4.6: gzip and systemd

December 29, 2014

OpenCPU server version 1.4.6 has been released to launchpad, OBS, and dockerhub (more about docker in a future blog post). I also updated the instructions to install the server or build from source for rpm or deb. If you have a running deployme...

Read more »

top posts for 2014

December 29, 2014

Here are the most popular entries for 2014: 17 equations that changed the World (#2), 995; Le Monde puzzle, 992; “simply start over and build something better”, 991; accelerating MCMC via parallel predictive prefetching, 990; Bayesian p-values, 960; posterior predictive p-values, 849; Bayesian Data Analysis, 846; Bayesian programming, 834; Feller’s shoes

Read more »

WrightMap and TAM – Example continued…

December 29, 2014

As a follow-up to the previous post about integrating the TAM and WrightMap packages, we received a message from one of the TAM developers, Alexander Robitzsch, suggesting that it is possible to generate the Wright Map directly from the MML estimated distribution (instead of the WLE estimates used in the previous post). Let’s start with the same setup: library(TAM) library(WrightMap) data( sim.rasch...

Read more »

First Day of the Month, Using R

December 29, 2014

Future-proofing is an important concept when designing automated reports. One thing that can get out of hand over time is when you accumulate so many periods of data that your charts start to look overcrowded. You can solve for this by limiting the num...
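As a rough illustration of the idea (not necessarily the approach taken in the post), the sketch below finds the first day of a month in base R and uses it to keep only the most recent months of data. The helper names `first_of_month` and `last_n_months`, and the toy data frame, are assumptions made up for this example.

```r
# First day of the month containing each date (base R only).
# `first_of_month` is a hypothetical helper name, not from the post.
first_of_month <- function(x) {
  as.Date(format(as.Date(x), "%Y-%m-01"))
}

# Hypothetical helper: keep rows from the last `n` calendar months,
# counting the month of `today` as the first one.
last_n_months <- function(df, date_col, n, today = Sys.Date()) {
  # Step back whole months from the 1st of the current month
  cutoff <- seq(first_of_month(today), by = "-1 month", length.out = n)[n]
  df[df[[date_col]] >= cutoff, , drop = FALSE]
}

# Toy data, invented for illustration
d <- data.frame(
  day   = as.Date(c("2014-09-30", "2014-10-01", "2014-12-29")),
  value = c(1, 2, 3)
)

first_of_month("2014-12-29")   # "2014-12-01"
recent <- last_n_months(d, "day", n = 3, today = as.Date("2014-12-29"))
recent                         # keeps the October and December rows
```

Anchoring the cutoff to the first of the month, rather than to "today minus 90 days", keeps each charted period complete no matter which day the report runs.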

Read more »

Multivariate Medians

December 29, 2014

I'll bet that in the very first "descriptive statistics" course you ever took, you learned about measures of "central tendency" for samples or populations, and these measures included the median. You no doubt learned that one useful feature of the median is that, unlike the (arithmetic, geometric, harmonic) mean, it is relatively "robust" to outliers in the data. (You...
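A small sketch of the robustness claim, using made-up numbers: one gross outlier moves the mean a long way but barely shifts the median. The last lines show a coordinate-wise median for a two-column matrix, which is only one of several possible multivariate definitions and is an assumption here, not necessarily the one the post discusses.

```r
# Made-up sample, then the same sample with one gross outlier
x     <- c(2, 3, 5, 7, 11)
x_out <- c(x, 1000)

mean(x)        # 5.6
mean(x_out)    # about 171.3: dragged far away by the outlier
median(x)      # 5
median(x_out)  # 6: barely moves

# Coordinate-wise (marginal) median of bivariate data:
# take the median of each column separately.
m <- cbind(a = c(1, 2, 3, 100), b = c(10, 20, 30, 40))
cw_median <- apply(m, 2, median)
cw_median      # a = 2.5, b = 25
```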

Read more »

R wins a 2014 Bossie Award

December 29, 2014

I missed this when it was announced back on September 29, but R won a 2014 Bossie Award for best open-source big-data tools from InfoWorld (see entry number 5): A specialized computer language for statistical analysis, R continues to evolve to meet new challenges. Since displacing lisp-stat in the early 2000s, R is the de-facto statistical processing language, with...

Read more »

Making Static & Interactive Maps With ggvis (+ using ggvis maps w/shiny)

December 29, 2014

Even though it’s still at version 0.4, the ggvis package has quite a bit of functionality and is highly useful for exploratory data analysis (EDA). I wanted to see how geographical visualizations would work under it, so I put together six examples that show how to use various features of ggvis for presenting static &

Read more »

Getting R and Java 1.8 to work together on OSX

December 29, 2014

Hey Mac OSX users with Java 1.8 installed. Did R just request a Java 1.6 installation and then promptly crash your session? If so, read on… The Problem: A few days ago I was attempting to use the mallet package for topic models and I found that typing > library(mallet) caused two things to happen:

Read more »

rfoaas 0.0.5

December 29, 2014

A new version of rfoaas is now on CRAN. The rfoaas package provides an interface for R to the most excellent FOAAS service--which provides a modern, scalable and RESTful web service for the frequent need to tell someone to eff off. This version align...

Read more »

How to extract a data.frame from string data

December 28, 2014

A guest article by Asher Raz, PhD, CareerHarmony. Sometimes subjects' data are recorded on a server (e.g. SQL server) as string records, one per subject. In some cases we need only part of each subject's string data, and we need it as numerical data (e.g. as a data.frame). How can we get the required...

Read more »

RcppArmadillo 0.4.600.0

December 28, 2014

Conrad produced another minor release 4.600 of Armadillo. As before, I had created a GitHub-only pre-release(s) of his pre-release(s), and tested a pre-release as well as the actual release against the now over one hundred CRAN dependents of our RcppArmadillo package. The tests passed fine as usual with less than a handful of checks not...

Read more »

A time series contest attempt

December 28, 2014

I saw the post "A time series contest" on Rob J Hyndman's blog. Since I still want to play around with some bigger data sets, I went to the source website https://drive.google.com/folderview?id=0BxmzB6Xm7Ga1MGxsdlMxbGllZnM&usp=shar...

Read more »

[NYC] Featured R experts Meetup, R classes and 12 week Data Science Bootcamp

December 28, 2014

There are a few exciting announcements I would love to share with the R community. We feel very honored to host a meetup and a class offered by Owen Zhang, the #1 ranked data scientist on Kaggle, and Max Kuhn, author of the book Applied Predictive Modeling. Featured R experts meetup: featured talk given by Kaggle world ranked #1 Owen Zhang

Read more »

Sequence of shopping carts in-depth analysis with R – Clustering

December 27, 2014

This is the second part of the in-depth sequence analysis. In the previous post, we processed data to the required format, plotted a Sankey diagram, and did some distribution, frequency, time lapse and entropy analysis with visualization. For dessert, clustering! Clustering is an exploratory data analysis method aimed at automatically finding homogeneous groups or clusters in...

Read more »

Introductory R Presentation

December 27, 2014

I put together a short intro presentation for some people explaining a little bit about R from an introductory point of view. Slides put together with R/markdown and ioslides. Presentation here.

Read more »

Snowdoop/partools Update

December 27, 2014

I’ve put together an updated version of my partools package, including Snowdoop, an alternative to MapReduce algorithms. You can download it here, version 1.0.1. To review: The idea of Snowdoop is to create your own file chunking, rather than having something like Hadoop do it for you, and then using ordinary R coding to perform …

Read more »

Fitting a mixture of independent Poisson distributions

December 26, 2014

This is an example from Zucchini & MacDonald’s book on Hidden Markov Models for Time Series (exercise 1.3). The data are annual counts of earthquakes of magnitude 7 or greater, which exhibit both overdispersion relative to a Poisson distribution (whose mean equals its variance) and serial dependence. The aim is to fit a mixture of...
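To make the idea concrete, here is a minimal sketch that fits a two-component mixture of independent Poissons with the EM algorithm. It is an illustration under stated assumptions: the counts are simulated (they are not the earthquake series), the starting values are arbitrary, and EM is swapped in for simplicity, so it may differ from the estimation method used in the post or the book. The function names `mix_dens` and `fit_pois_mix` are hypothetical.

```r
set.seed(1)
# Simulated counts from two Poisson regimes (NOT the earthquake data)
x <- c(rpois(60, 5), rpois(40, 20))

# Overdispersion check: a single Poisson would have variance close to the mean
c(mean = mean(x), variance = var(x))

# n x m matrix; column j holds delta[j] * Poisson pmf of x with mean lambda[j]
mix_dens <- function(x, lambda, delta) {
  sapply(seq_along(lambda), function(j) delta[j] * dpois(x, lambda[j]))
}

# EM for an m-component mixture of independent Poissons
fit_pois_mix <- function(x, lambda, delta, max_iter = 500, tol = 1e-8) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    dens   <- mix_dens(x, lambda, delta)
    loglik <- sum(log(rowSums(dens)))
    # E-step: posterior probability that each count came from each component
    resp <- dens / rowSums(dens)
    # M-step: update mixing weights and component means
    delta  <- colMeans(resp)
    lambda <- colSums(resp * x) / colSums(resp)
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  # Log-likelihood at the final parameter values
  loglik <- sum(log(rowSums(mix_dens(x, lambda, delta))))
  list(lambda = lambda, delta = delta, loglik = loglik, iterations = iter)
}

fit <- fit_pois_mix(x, lambda = c(3, 15), delta = c(0.5, 0.5))
fit$lambda  # estimated component means
fit$delta   # estimated mixing proportions (sum to 1)
```

A mixture like this can account for the overdispersion, but because it treats the counts as independent it says nothing about serial dependence.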

Read more »

Create an R-tree data structure using Rcpp and Boost::Geometry

December 26, 2014

Introduction: The purpose of this post is to show how to use the Boost::Geometry library, which was recently introduced in Rcpp. In particular, we focus on the R-tree data structure for searching objects in space, because the R-tree is currently the only spatial index implemented in this library. Boost.Geometry, which is part of the Boost C++ Libraries, gives us algorithms for solving geometry problems. In this library, the...

Read more »

Animations and GIFs using ggplot2

December 26, 2014

Tracing a regression line; Diverging density plots; Happy New Year plot. Happy New Year everyone! For the last post of the year, I thought I'd have a little fun with the new animation package in R. It's actually really easy to use. I recently had some fun w...

Read more »

rfoaas 0.0.4.20141225 — not on CRAN

December 25, 2014

A new version of rfoaas was prepared for CRAN, but refused on the grounds of having been updated within 24 hours. Oh well. To recap, the rfoaas package provides an interface for R to the most excellent FOAAS service -- which provides a modern, scalable and RESTful web service for the frequent need to tell someone to...

Read more »

Principal Component Analysis on Imaging

December 25, 2014

Ever wonder what mathematics lies behind face recognition on gadgets like digital cameras and smartphones? Well, for the most part it has something to do with statistics. One statistical tool capable of doing this is Principal Component Analysis (PCA). In this post, however, we will not do (sorry to disappoint you) face recognition as...

Read more »

rfoaas 0.0.4.20141224

December 24, 2014

A new version of rfoaas is now on CRAN. The rfoaas package provides an interface for R to the most excellent FOAAS service -- which provides a modern, scalable and RESTful web service for the frequent need to tell someone to eff off. This is minor up...

Read more »

Time Stacking and Time Slicing in R

December 24, 2014

Time lapses are a fun way to quickly show a long period of time. They typically involve setting up your camera on a tripod and taking photos at a regular interval, like every 5 seconds. After all the photos have been taken, they are combined into a mov...

Read more »

Visualizing APA 6 Citations: qdapRegex 0.2.0 & qdapTools 1.1.0

December 24, 2014

qdapRegex 0.2.0 & qdapTools 1.1.0 have been released to CRAN. This post covers some of the packages’ updates and features and gives an integrated demonstration of extracting and viewing in-text APA 6 style citations from an MS Word (.docx) document. qdapRegex …

Read more »

Plotting Fundamentals Chapter

December 24, 2014

I have uploaded a draft of Chapter 3, Plotting Fundamentals, for the Introduction to Fisheries Analyses with R (IFAR) book (note the new name). This chapter is meant to be a quick introduction to plotting in base R for plots …

Read more »

Explore a comet with R’s "rgl" package

December 24, 2014

Last month, the Philae lander touched down on comet Churyumov–Gerasimenko. In the process, the lander and the orbiting Rosetta probe captured detailed data on the geometry of the comet, which the ESA published as a shape file. You can use the rgl package to visualize and explore such shape files quite simply as follows: Then you can manipulate the...

Read more »

Update on improving examples in base-R

December 24, 2014

Last month I was ranting about the state of some of the examples in base-R, particularly the paste function. Martin Maechler has now kindly taken my suggested examples and added them into R. Hopefully this will reduce the number of newbie questions about “how do I join these strings together”. Since Martin showed some interest
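For readers with the "how do I join these strings together" question, a few base R calls cover the common cases. These are illustrative examples written for this listing, not the examples that were added to R's documentation.

```r
# Join with a single space (the default separator)
paste("a", "b")                          # "a b"

# Join with no separator
paste0("a", "b")                         # "ab"

# Vectorised: pairs up elements, one result per pair
paste(c("x", "y"), 1:2, sep = "_")       # "x_1" "y_2"

# Collapse a whole vector into a single string
paste(c("a", "b", "c"), collapse = ", ") # "a, b, c"
```

The distinction that tends to trip people up is `sep` versus `collapse`: `sep` goes between the arguments of each result, while `collapse` joins the results themselves into one string.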

Read more »