Animated choropleths to visualize mortality rates of children under 5 and gender differences using rMaps

September 17, 2014

This post displays two animated choropleths: one for global mortality rates for children under 5 (per 1000 live births) and a second for the difference in global mortality rates between male and female children under 5 (per 1000). Please click here: ...

BCEA 2.1

September 17, 2014

We're about to release the new version of BCEA, which will contain some major changes. A couple of changes in the basic code should improve the computational speed. In general, BCEA doesn't really run into trouble because most of the computations ...

Applications of R presentations at Dataweek

September 17, 2014

I'm speaking at the DataWeek conference in San Francisco today. My talk follows Skylar Lyon from Accenture — I'm really looking forward to hearing how he uses Revolution R Enterprise with Teradata Database to run R in-database with 400 million rows of data. Update: Here are Skylar's slides. The slides for my talk on other companies' applications of R...

Migrating Table-oriented Web Scraping Code to rvest w/XPath & CSS Selector Examples

September 17, 2014

I was offline much of the day Tuesday and completely missed Hadley Wickham’s tweet about the new rvest package: Are you an #rstats user who misses python's beautiful soup? Please try out rvest (http://t.co/PeiIHr3jDW) and let me know what you think.— Hadley Wickham (@hadleywickham) September 12, 2014 My intrepid colleague (@jayjacobs) informed me of this (and didn’t...

Bayes says “don’t worry” about Scotland’s Referendum

September 17, 2014

Just a few hours before Scots head to the polls, the anti-independence vote does not have an overwhelming advantage. Actually, the margin is narrower than the last time I looked at it, but despite the growing trend in favor of the "Yes" campaign in recent weeks, the "No" side still has an edge. To …

Using great circles and ggplot2 to map arrival/departure of 2014 US Open Tennis Players

September 17, 2014

Please click on the image for information on how to use R and ggplot2 to generate this plot. 

Maximal Information Coefficient (Part II)

September 17, 2014

A while back, I wrote a post simply announcing a recent paper that described a new statistic called the "Maximal Information Coefficient" (MIC), which is able to describe the correlation between paired variables regardless of linear or nonlinear relationship. This turned out to be quite a popular post, and included a lively discussion...

The Traveling Salesman with Simulated Annealing, R, and Shiny

September 17, 2014

I built an interactive Shiny application that uses simulated annealing to solve the famous traveling salesman problem. You can play around with it to create and solve your own tours at the bottom of this post. Here's an animation of the annealing process finding the shortest path through the 48 state capitals of the contiguous...

Changes to FSA — Size Structure

September 16, 2014

I have added a (very rough) first draft of the Size Structure chapter of the forthcoming Introductory Fisheries Science with R book to the book’s fishR webpage. Accompanying this chapter are major changes to all of the proportional size distribution …

PerformanceAnalytics update released to CRAN

September 16, 2014

Version number 1.4.3541 of PerformanceAnalytics was released on CRAN today. If you’ve been following along, you’ll note that we’re altering our version numbering system.  From here on out, we’ll be using a “major.cran-release.r-forge-rev” form so that when issues are reported it will be easier for us to track where they may have been introduced. Even

New members for R-core and R Foundation

September 16, 2014

The R Foundation for Statistical Computing, the Vienna-based non-profit organization that oversees the R Project, has just added several new "ordinary members". (Ordinary members participate in R Foundation meetings and provide guidance to the project.) The new members are: Dirk Eddelbuettel, Torsten Hothorn, Marc Schwartz, Hadley Wickham, Achim Zeileis, Martin Morgan, and Michael Lawrence. The R Core group,...

R package to convert statistical analysis objects to tidy data frames

September 16, 2014

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject. R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to...

3D Sine Wave

September 16, 2014

Had a headache last night, so I decided to take things easy and just read posts on Google+. Then I came across this post, which seemed interesting, so I thought I would play around before heading to bed. ...

Notes from the Kölner R meeting, 12 September 2014

September 16, 2014

Last Friday we had guests from Belgium and the Netherlands joining us in Cologne. Maarten-Jan Kallen from BeDataDriven came from The Hague to introduce us to Renjin, and the guys from DataCamp in Leuven, namely Jonathan, Martijn and Dieter, gave an overview of their new online interactive training platform. Renjin: Maarten-Jan gave a fascinating introduction to Renjin,...

Using SQLite in R

September 16, 2014

Working on big data requires a clean and robust approach to storing and accessing the data. SQLite is an all-inclusive, serverless database system in a single file. This is very convenient for data exchange between colleagues. Here is a workflow for accessing and storing SQLite data in R. Connect to an SQLite database ...

Nuts and Bolts of Quantstrat, Part II

September 16, 2014

Last week, I covered the boilerplate code in quantstrat. This post will cover parameters and adding indicators to strategies in …

how to provide a variance calculation on your public-use survey data file without disclosing sampling clusters or violating respondent confidentiality

September 16, 2014

this post and accompanying syntax would not have been possible without dan oberski. read more, find out why. thanks dan. dear survey administrator: someone sent you this link because you work for an organization or a government agency that c...

Why Are We Still Teaching t-Tests?

September 15, 2014

My posting about the statistics profession losing ground to computer science drew many comments, not only here in Mad (Data) Scientist, but also in the co-posting at Revolution Analytics, and in Slashdot.  One of the themes in those comments was that Statistics Departments are out of touch and have failed to modernize their curricula.  Though

Interview with Romain Francois at useR! 2014

September 15, 2014

At the useR! 2014 conference, without a doubt one of the overriding themes was R’s...

If the typing monkeys have met Mr Markov: probabilities of spelling "omglolbbq" after the digital monkeys have read Dracula

September 15, 2014

Over the weekend, randomly after watching Catching Fire, I remembered the problem of the typing monkeys (the infinite monkey theorem), which can basically be defined as follows (thanks to Wiki). INTRODUCTION: The infi...

Using Reddit’s JSON API to analyze post popularity

September 15, 2014

Graduate student Clay McLeod decided to find out what makes a post on the social-sharing site Reddit popular. These are the questions he seeks to answer: What’s in a post? Reddit pulls in around 115 million unique visitors each month, amassing a staggering 5 billion page views per month. For a long time, I’ve wondered what factors draw people...

Creating a map showing land covered by rising sea levels

September 15, 2014

I joined the Geekli.st climate Hackathon this weekend at the Hub Westminster (my favorite venue for Hackathons). While the organizers had lots of enthusiasm, they had very little in the way of data for us to work on. No problem: ever since the Flood-relief hackathon I have wanted to use the SRTM ‘whole Earth’ elevation

Mapping every IPv4 address

September 15, 2014

During July I was working with a commercial data source that provides extra data around IP addresses and it dawned on me: rather than pinging billions of IP addresses and creating a map, I could create a map from all the geolocation data I had at my fingertips. At a high level I could answer “Where are all the IPv4 addresses worldwide?” But in...

PCA / EOF for data with missing values – a comparison of accuracy

September 15, 2014

Not all Principal Component Analysis (PCA) (also called Empirical Orthogonal Function analysis, EOF) approaches are equal when it comes to dealing with a data field that contains missing values (i.e. is "gappy"). The following post compares several methods by assessing the accuracy of the derived PCs to reconstruct the "true" data set, as was similarly...

How do you say π^π^π?

September 15, 2014

Well, not that you probably really want to know how to say such an absurdly large number. However, for those of you who are interested (allowing for rounding), it is: one quintillion, three hundred forty quadrillion, one hundred sixty-four trillion, one h...

√(x²−1)(x²−k²), x,k∈ℂ (actually just going over the unit…

September 15, 2014

√(x²−1)(x²−k²), k∈ℂ x²√(x²−1)(x²−k²), x,k∈ℂ (actually just going over the unit circle, not all of ℂ). Edit: hey, are these showing up as moving GIFs for you? Code: require(animation); source("wegert.R") # where I define "plat" and "Z", standard for...

One datavis for you, ten for me

September 14, 2014

Over the years of my graduate studies I made a lot of plots. I mean tonnes. To get an extremely conservative estimate I grep’ed for every instance of “plot(” in all of the many R scripts I wrote over the past five years. The actual number is very likely orders of magnitude larger as 1) many

Trying a prefmap

September 14, 2014

Preference mapping is a key technique in sensory and consumer research. It links the sensory perception of products to the liking of products and hence provides clues for the development of new, good-tasting products. Even though it is a key technique,...

RDataMining Slides Series

September 14, 2014

By Yanchang Zhao, RDataMining.com. I have made a series of slides on R and data mining, based on my book titled R and Data Mining: Examples and Case Studies. The slides will be used at my presentations at seminars …
