Le Monde puzzle [#815]

April 11, 2013

The last puzzle was as follows: Take a card stack with 32 cards and divide it into five non-empty piles. A move consists of doubling a pile's size by taking cards from a single larger pile. Is it possible to recover the original stack by repeatedly using moves? Same question for 100 cards and five

Read more »

Dropbox & R Data

April 11, 2013

I'm always looking for ways to download data from the internet into R. Though I prefer to host and access plain-text data sets (CSV is my personal favourite) from GitHub (see my short paper on the topic), sometimes it's convenient to get data stored on Dropbox. There has been a change in the way Dropbox...

Read more »

Reserving with negative increments in triangles

April 11, 2013

A few months ago, I published a post on negative values in triangles, and how to deal with them when using a Poisson regression (the post was published in French). The idea was to use a translation technique: fit a model not on the original increments but on the increments shifted by some constant, use that model to make predictions, and then...

Read more »

Stepwise Regression for Big Data with RevoScaleR

April 11, 2013

by Joseph Rickert. In a recent blog post, Revolution's Thomas Dinsmore announced stepwise regression for big data as a new feature of Revolution R Enterprise 6.2 that is scheduled for general availability later this month. Today, I would like to provide a simple example of doing stepwise regression with rxLinMod() (the RevoScaleR analog of lm()), using a 100,000 row...

Read more »

High Obesity levels found among fat-tailed distributions

April 11, 2013

In my never-ending quest to find the perfect measure of tail fatness, I ran across this recent paper by Cooke, Nieboer, and Misiewicz. They created a measure called the “Obesity index.” Here’s how it works: Step 1: Sample four times from a distribution. The sample points should be independent and identically distributed (did your

Read more »

Spring Cleaning Data: 4 of 6- Combining the files & Changing the Dates/Credit Type

April 11, 2013

So far the individual files have been left on their own; it is now time to combine them using the rbind function, which is simple enough after all we have done so far, followed by a quick check with summary. Now that we have one data frame, it is time to make larger changes to ...

Read more »

Summarizing Data in R

April 10, 2013

When working with large amounts of data structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature used for this purpose. Of course, R also has similar calculations that can be used to

Read more »

In case you missed it: March 2013 Roundup

April 10, 2013

In case you missed them, here are some articles from March of particular interest to R users. Facebook used R to analyze profile photo changes to create a map of same-sex marriage support in the USA. Joe Rickert contrasts random sampling with fitting models directly to large data sets. A presentation by Carlos Somohano summarizes the history, skills and...

Read more »

A quick introduction to ggplot2

April 10, 2013

My friend Jonah asked me to guest lecture in his R seminar aimed at grad students and postdocs in Integrative Biology. I gave Jonah a bunch of topic options ranging from reproducible research with R to data manipulation. The consensus was data visualization, so I put together a 2-hour talk/hands-on presentation for ggplot2

Read more »

Tweaking Movie Subtitles with R

April 10, 2013

I use R to fix subtitles that are not in sync with my movies. In the example below the subs were showing too early, so I added some time to each sequence in the srt file. For simplicity I used exactly 1 second in the example below. You'll see that I use my function dl_from_dropbox(), on which I wrote...

Read more »

Download Files from Dropbox Programmatically with R

April 10, 2013

Here is a useful snippet that I stole from qdap::url_dl to download files from my Dropbox to the working directory. Argument x is the document name and d the document key. dl_from_dropbox require(RCurl) ...

Read more »

Are knuckleballers more volatile?

April 10, 2013

For years, the Blue Jays have been also-rans in the AL East but splashed out this season, turning prospects into established stars in the hope of reaching the World Series. Seven games in, the 2-5 start has the perennial doubts resurfacing, particularly as none of the much-vaunted starters has yet to pitch a seventh

Read more »

R and social media

April 10, 2013

R is a piece of software, but it is also a community. Help community: The most visible aspect of the R community is help. This is also the most useful to new users. The initial sense of cooperation with R was driven mainly by people helping each other. You don’t need to actively participate in ...

Read more »

A few lists for data scientists and statisticians

April 10, 2013

Looking for more resources on the web or people to follow on Twitter? Here are some lists you may find useful: 100 Savvy Sites on Statistics, which includes 17 sites that focus on R Programming. Kalido offers this list of 30 influential data scientists on Twitter. Big Data Republic is taking votes for this list of 100 influential tweeters...

Read more »

Highlight cells in markdown tables

April 10, 2013

Although I have always wanted to add such a feature to pander, a recent question on SO urged me to create some helper functions so that users could easily highlight some rows, columns or even just a few cells in a table and export the result to markdown,...

Read more »

Video: Using R for causal inference in a study of expensive public policy decisions

April 10, 2013

This post shares the video from a talk presented on 9th April 2013 by Jim Savage at Melbourne R Users. Billions of dollars a year are spent subsidising tuition of Australian university students. A controversial report last year by the … Continue reading →

Read more »

Spring Cleaning Data: 3 of 6- The Little but Big Correction

April 10, 2013

Building on the previous posts (post 1 & post 2), I found 12 instances in the 2010 Q4 data where the type of credit was "Primary*", which means the lender borrowed twice in the same day. It would seem simple enough in Excel, ...

Read more »

Global Distribution of Breast Cancer: some initial considerations

April 10, 2013

As mentioned in a previous post, I am interested in analysing whether people’s ‘unhealthy’ lifestyle is associated with new cases of cancer diagnosed globally. The outcome variable I want to explore (at least for now) is the number of new cases of breast cancer per 100,000 female residents. I have this data for 173...

Read more »

highlight 0.4.1

April 10, 2013

The highlight package has been missing from CRAN for quite some time. Now it is back, with fewer dependencies. It used to depend on Rcpp and parser, but since the code logic from parser has been brought to R, highlight … Continue reading →

Read more »

Mobile version of the graph gallery

April 10, 2013

The R Graph Gallery has been a popular website for many years now. The number of graphics keeps growing as people send me their code. When browsing the website with a mobile device the experience was frustrating, as too much … Continue reading →

Read more »

Milano (Italy). April 18, 2013. Third Milano R net meeting: agenda

April 10, 2013

April 18, 2013, 18:00 - 21:00
Fiori Oscuri Bistrot & Bar (www.fiorioscuri.it), Via Fiori Oscuri, 3 - Milano (Zona Brera)
18.00 - 18.15 Registration
18.15 - 18.30 Welcome presentation, Andrea Spanò, Partner at Quantide
18.30 - 19.00 Digit recognition Machine … Continue reading →

Read more »

Finding the Distribution Parameters

April 9, 2013

This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine whether the data comes from a normal distribution. Rather than focusing on hypothesis testing and determining if a distribution is actually the said distribution

Read more »

2013-4 Generating Structured and Labelled SVG

April 9, 2013

This article discusses the importance of providing structure and labelling within SVG code, particularly when the SVG code is generated indirectly by a high-level system and when the SVG code describes a complex image such as a statistical plot. We … Continue reading →

Read more »

Second edition of Crawley’s The R Book

April 9, 2013

The second edition of Michael Crawley's The R Book is available from Wiley. According to the publisher, the new edition boasts the following features: "Features full colour text and extensive graphics throughout. Introduces a clear structure with numbered...

Read more »

Some R User Group Presentations from Europe

April 9, 2013

by Joseph Rickert. I am beginning to get excited about going to Spain for useR 2013, which will be held at the University of Castilla-La Mancha, so I have been using the links on Revolution's local user directory webpage to see what the European R user groups are doing. Here are just a few highlights of materials that...

Read more »

Behind the NCAA Visualizer: Python, R and JavaScript

April 9, 2013

Rodrigo Zamith's NCAA Tournament Visualizer is a great example of an interactive data visualization. If you want to create something similar, Rodrigo has shared detailed behind-the-scenes information on how it was created. He used a mix of tools: Python was used to scrape team statistics from the NCAA website; R was used to prepare the data for analysis, and...

Read more »

Matrix Cumulative Coherence: Fourier Bases, Random and Sensing Matrices

April 9, 2013

Compressive sampling (CS) is revolutionizing the way we process analog to digital conversion, our understanding of linear systems and the limits of information theory. One of the key concepts in CS is that a signal can be represented in a sparse basis o...

Read more »

Spring Cleaning Data: 2 of 6- Changing Column Names and Adding a Column

April 9, 2013

In the first post (found here) we downloaded the data and imported it into R using the gdata package. In this post we will change the column names to make them more reasonable and add a quarter variable. The reason for changing the column names is bec...

Read more »

Happy biRthday

April 9, 2013

Today is my birthday. It’s also the birthday of a close friend. What an incredible coincidence! Or wait, maybe it is just expected. Once again R comes to our help, because it has a built-in function to answer our question. … Continue reading →

Read more »
