Mathematical abstraction and the robustness to assumptions

April 12, 2013

I’ve been showing my new favourite toys to just about anyone foolish enough to actually engage me in conversation. I described how my shiny new set of non-transitive dice work here, complete with a map showing all the relevant probabilities. All was neat and tidy and wonderful until fellow ecologist, Aaron Ball, tried to burst...

Stan 1.3.0 and RStan 1.3.0 Ready for Action

April 12, 2013

The Stan Development Team is happy to announce that Stan 1.3.0 and RStan 1.3.0 are available for download. Follow the links on the Stan home page: http://mc-stan.org/ Please let us know if you have problems updating. Here’s the full set of release notes.

v1.3.0 (12 April 2013)
======================================================================
Enhancements
----------------------------------
Modeling Language
* forward sampling (random ...

Extending RevoScaleR for Mining Big Data – Discretization

April 12, 2013

by Derek McCrae Norton, Senior Sales Engineer. In this second installment of Extending RevoScaleR for Mining Big Data, we look at how to use the building blocks provided by RevoScaleR to transform continuous variables into discrete ones. Motivation: discretize continuous variables on big data. Discretization is a technique to convert continuous variables into discrete variables, and it is sometimes useful...
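
As a rough illustration of what discretization means, here is a minimal base-R sketch. It is not RevoScaleR code and does not address the big-data setting the post is about: it simply bins an in-memory vector at its quantiles with `cut()`. The helper name `discretize_by_quantile` and the default of four bins are assumptions made for this example, and the sketch assumes the quantile breaks are all distinct.

```r
# Discretize a continuous vector into bins with roughly equal counts.
# discretize_by_quantile() is a hypothetical helper using base R only.
discretize_by_quantile <- function(x, n_bins = 4) {
  # Bin edges at evenly spaced quantiles of the data
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum value inside the first bin
  cut(x, breaks = breaks, include.lowest = TRUE)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
binned <- discretize_by_quantile(x)
table(binned)
```

The result is a factor, so it can be passed straight to functions that expect a categorical variable.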

Spring Cleaning Data: 5 of 6- 2 ifelse vs Merge

April 12, 2013

This post in the data cleaning series looks at separating out the Federal Reserve Districts. What I wanted was two additional columns, where I had the name of the city and the number for each district. Since I was on a separation kick I thought it would...

Using the RcppArmadillo-based Implementation of R’s sample()

April 12, 2013

Overview and Motivation: All of R’s (r*, p*, q*, d*) distribution functions are available in C++ via the R API. R is written in C, and the R API has no concept of a vector (at least not in the STL sense). Consequently, R’s sample() function can’t just be exported via the R API, despite its importance and usefulness....

Travis CI for R! (not yet)

April 12, 2013

A few days ago I wrote about Travis CI, and was wondering if we could integrate the testing of R packages into this wonderful platform. A reader (Vincent Arel-Bundock) pointed out in the comments that Travis runs on Ubuntu, which allows you to install software packages at will. I took a look at the documentation, and realized...

Processing ABI .fsa files in R, part 1.

April 11, 2013

I’ve been working on a lot of AFLP data this winter. I’d really like to be able to do all the analysis in R, for a few reasons. First, it would mean no more fighting with GeneMapper, which is incredibly frustrating: it’s Windows-only, expensive, closed-source and painfully underpowered for the job. Second, presumably if I...

Download File from Google Drive/Docs Programmatically with R

April 11, 2013

Following up on my latest post on how to download files from the cloud with R...

dl_from_GoogleD
## Arguments:
## output = output file name
## key = Google document key
## format = output format (pdf, rtf, doc, txt..)
## Note: File must be shareable!
...

Le Monde puzzle [#815]

April 11, 2013

The last puzzle was as follows: Take a card stack with 32 cards and divide it into five non-empty piles. A move consists of doubling a pile size by taking cards from a single, larger pile. Is it possible to recover the original stack by repeatedly using moves? Same question for 100 cards and five...
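
The move rule can be sketched in R as follows. This only encodes a single move and lists the legal ones for a given configuration; it makes no claim about the answer to the puzzle. The function names `apply_move` and `legal_moves` are made up for illustration, and "larger" is read here as strictly larger.

```r
# One move of the puzzle: pile `to` doubles in size by taking cards from a
# single, strictly larger pile `from`. Function names are hypothetical.
apply_move <- function(piles, from, to) {
  stopifnot(from != to, piles[from] > piles[to])
  taken <- piles[to]
  piles[to] <- piles[to] + taken      # receiving pile doubles
  piles[from] <- piles[from] - taken  # donor pile shrinks by the same amount
  piles
}

# All (from, to) index pairs that form a legal move for this configuration.
legal_moves <- function(piles) {
  idx <- expand.grid(from = seq_along(piles), to = seq_along(piles))
  idx[piles[idx$from] > piles[idx$to], , drop = FALSE]
}

piles <- c(1, 2, 4, 9, 16)                    # 32 cards in five non-empty piles
after <- apply_move(piles, from = 5, to = 1)  # pile 1 doubles using pile 5
```

Because the donor pile must be strictly larger, a move never empties a pile and the total number of cards is unchanged.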

Dropbox & R Data

April 11, 2013

I'm always looking for ways to download data from the internet into R. Though I prefer to host and access plain-text data sets (CSV is my personal favourite) from GitHub (see my short paper on the topic) sometimes it's convenient to get data stored on Dropbox. There has been a change in the way Dropbox...

Reserving with negative increments in triangles

April 11, 2013

A few months ago, I published a post on negative values in triangles, and how to deal with them when using a Poisson regression (the post was published in French). The idea was to use a translation technique: Fit a model not on ‘s but on , for some , Use that model to make predictions, and then...

Stepwise Regression for Big Data with RevoScaleR

April 11, 2013

by Joseph Rickert. In a recent blog post, Revolution's Thomas Dinsmore announced stepwise regression for big data as a new feature of Revolution R Enterprise 6.2 that is scheduled for general availability later this month. Today, I would like to provide a simple example of doing stepwise regression with rxLinMod() (the RevoScaleR analog of lm()), using a 100,000 row...

High Obesity levels found among fat-tailed distributions

April 11, 2013

In my never-ending quest to find the perfect measure of tail fatness, I ran across this recent paper by Cooke, Nieboer, and Misiewicz. They created a measure called the “Obesity index.” Here’s how it works: Step 1: Sample four times from a distribution. The sample points should be independent and identically distributed (did your...

Spring Cleaning Data: 4 of 6- Combining the files & Changing the Dates/Credit Type

April 11, 2013

So far the individual files have been left on their own; it is now time to combine them using the rbind function, which is simple enough after all we have done so far, followed by a quick check with summary. Now that we have one data frame, it is time to make larger changes to ...

Summarizing Data in R

April 10, 2013

When working with large amounts of data structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature used for this purpose. Of course, R also has similar calculations that can be used to...
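
A small base-R sketch of this kind of pivot-table summary follows. It is only an illustration and not necessarily the approach the post takes. It uses the built-in `mtcars` data set; the choice of columns is an assumption made for this example.

```r
# Mean miles per gallon by number of cylinders, using the built-in mtcars data.
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# The same kind of summary as a two-way table of means (cylinders by gears),
# which is closer to the layout of a pivot table. Combinations with no cars
# show up as NA.
by_cyl_gear <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, gear = mtcars$gear), mean)

by_cyl
by_cyl_gear
```

`aggregate()` returns a data frame, which is convenient for further processing, while `tapply()` returns a matrix that reads more like a spreadsheet summary.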

In case you missed it: March 2013 Roundup

April 10, 2013

In case you missed them, here are some articles from March of particular interest to R users. Facebook used R to analyze profile photo changes to create a map of same-sex marriage support in the USA. Joe Rickert contrasts random sampling with fitting models directly to large data sets. A presentation by Carlos Somohano summarizes the history, skills and...

A quick introduction to ggplot2

April 10, 2013

My friend Jonah asked me to guest lecture in his R seminar aimed at grad students and postdocs in Integrative Biology. I gave Jonah a bunch of topic options ranging from reproducible research with R to data manipulation. The consensus was data visualization, so I put together a 2-hour talk/hands-on presentation for ggplot2...

Tweaking Movie Subtitles with R

April 10, 2013

I use R to fix subtitles that are not in sync with my movies. In the example below the subs were showing too early, so I added some time to each sequence in the srt file. For simplicity I used exactly 1 second. You'll see that I use my function dl_from_dropbox(), on which I wrote...
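
The idea of delaying every subtitle by a fixed amount can be sketched like this. It is not the post's code: the helper names `to_ms`, `from_ms` and `shift_srt` are hypothetical, and the sketch assumes standard SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm` already read into a character vector (for instance with `readLines()`).

```r
# Convert one "HH:MM:SS,mmm" stamp to milliseconds.
to_ms <- function(stamp) {
  parts <- as.numeric(strsplit(stamp, "[:,]")[[1]])
  ((parts[1] * 60 + parts[2]) * 60 + parts[3]) * 1000 + parts[4]
}

# Convert milliseconds back to "HH:MM:SS,mmm" (clamped at zero).
from_ms <- function(ms) {
  ms <- round(max(ms, 0))
  sprintf("%02d:%02d:%02d,%03d",
          as.integer(ms %/% 3600000),
          as.integer((ms %% 3600000) %/% 60000),
          as.integer((ms %% 60000) %/% 1000),
          as.integer(ms %% 1000))
}

# Shift every timing line by offset_seconds; all other lines are untouched.
shift_srt <- function(lines, offset_seconds = 1) {
  stamp <- "[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}"
  pattern <- paste0("^(", stamp, ") --> (", stamp, ")[[:space:]]*$")
  is_timing <- grepl(pattern, lines)
  lines[is_timing] <- vapply(lines[is_timing], function(line) {
    start <- to_ms(sub(pattern, "\\1", line)) + offset_seconds * 1000
    end <- to_ms(sub(pattern, "\\2", line)) + offset_seconds * 1000
    paste(from_ms(start), "-->", from_ms(end))
  }, character(1), USE.NAMES = FALSE)
  lines
}

srt <- c("1", "00:00:59,500 --> 00:01:01,000", "Hello", "")
shifted <- shift_srt(srt, offset_seconds = 1)
```

A negative `offset_seconds` moves the subtitles earlier instead; stamps that would fall before the start of the film are clamped to zero.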

Download Files from Dropbox Programmatically with R

April 10, 2013

Here is a useful snippet that I stole from qdap::url_dl to download files from my Dropbox to the working directory. Argument x is the document name and d the document key. dl_from_dropbox require(RCurl) ...

Are knuckleballers more volatile?

April 10, 2013

For years, the Blue Jays have been also-rans in the AL East but splashed out this season, turning prospects into established stars in the hope of reaching the World Series. Seven games in, the 2-5 start has the perennial doubts resurfacing, particularly as none of the much-vaunted starters has yet to pitch a seventh...

R and social media

April 10, 2013

R is a piece of software, but it is also a community. Help community: The most visible aspect of the R community is help. This is also the most useful to new users. The initial sense of cooperation with R was driven mainly by people helping each other. You don’t need to actively participate in...

A few lists for data scientists and statisticians

April 10, 2013

Looking for more resources on the web or people to follow on Twitter? Here are some lists you may find useful: 100 Savvy Sites on Statistics, which includes 17 sites that focus on R Programming; Kalido offers this list of 30 influential data scientists on Twitter; Big Data Republic is taking votes for this list of 100 influential tweeters...

Highlight cells in markdown tables

April 10, 2013

Although I have always wanted to add such a feature to pander, a recent question on SO urged me to create some helper functions so that users could easily highlight some rows, columns or even just a few cells in a table and export the result to markdown,...

Video: Using R for causal inference in a study of expensive public policy decisions

April 10, 2013

This post shares the video from a talk presented on 9th April 2013 by Jim Savage at Melbourne R Users. Billions of dollars a year are spent subsidising tuition of Australian university students. A controversial report last year by the …

Spring Cleaning Data: 3 of 6- The Little but Big Correction

April 10, 2013

Building on the previous posts (post 1 & post 2) I found there were 12 instances with the type of credit where there was a "Primary*" which means the lender borrowed twice in the same day, in the 2010 q4 data. It would seem simple enough in Excel, ...

Global Distribution of Breast Cancer: some initial considerations

April 10, 2013

As mentioned in a previous post, I am interested in analysing whether people’s ‘unhealthy’ lifestyle is associated with new cases of cancer diagnosed globally. The outcome variable I want to explore (at least for now) is the number of new cases of breast cancer per 100,000 female residents. I have this data for 173...

highlight 0.4.1

April 10, 2013

The highlight package has been missing from CRAN for quite some time. Now it is back, with fewer dependencies. It used to depend on Rcpp and parser, but since the code logic from parser has been brought to R, highlight …

Mobile version of the graph gallery

April 10, 2013

The R Graph Gallery has been a popular website for many years now. The number of graphics keeps growing as people send me their code. When browsing the website with a mobile device the experience was frustrating, as too much …

Milano (Italy). April 18, 2013. Third Milano R net meeting: agenda

April 10, 2013

April 18, 2013 - 18:00 - 21:00
Fiori Oscuri Bistrot & Bar (www.fiorioscuri.it), Via Fiori Oscuri, 3 - Milano (Zona Brera)
18.00 - 18.15 Registration
18.15 - 18.30 Welcome presentation: Andrea Spanò, Partner at Quantide
18.30 - 19.00 Digit recognition Machine …
