Following up on news stories with choroplethr and R

August 25, 2015

By Ari Lamstein, consultant specializing in software engineering and data analysis and author of the free email course Learn to Map Census Data in R.

One of my favorite things about R is that it allows me to follow up on interesting news stories. Consider this interview on EconTalk about the history of fracking in America. Russ Roberts interviewed...

Visualising the predictive distribution of a log-transformed linear model

August 25, 2015

Last week I presented visualisations of theoretical distributions that predict ice cream sales statistics based on linear and generalised linear models, which I introduced in an earlier post.

Theoretical distributions

Today I will take a closer look at t...
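
A log-transformed linear model assumes that log(y) is normally distributed with mean a + b*x and standard deviation sigma, so the predictive distribution of y itself is log-normal. Quantiles carry over simply by exponentiating, but the mean picks up an extra sigma^2/2 term and so sits above the median. The Python sketch below is not the post's code, and the coefficient values are made-up assumptions chosen only to illustrate that difference:

```python
from math import exp
from statistics import NormalDist

def lognormal_predictive(a, b, sigma, x, probs=(0.05, 0.5, 0.95)):
    """Predictive summary for y when log(y) ~ Normal(a + b*x, sigma^2).

    a, b and sigma are assumed to be already-fitted model parameters."""
    mu = a + b * x
    log_y = NormalDist(mu, sigma)
    # Quantiles transform directly: the p-quantile of y is exp(p-quantile of log(y))
    quantiles = {p: exp(log_y.inv_cdf(p)) for p in probs}
    # The mean does not: E[y] = exp(mu + sigma^2 / 2), larger than the median exp(mu)
    mean = exp(mu + sigma ** 2 / 2)
    return mean, quantiles

# Hypothetical fit, e.g. log(sales) regressed on temperature
mean, q = lognormal_predictive(a=1.0, b=0.1, sigma=0.2, x=20.0)
print(round(mean, 2), {p: round(v, 2) for p, v in q.items()})
```

Back-transforming only the fitted value exp(a + b*x) therefore gives the predictive median, not the predictive mean.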

R Graphical Systems

August 24, 2015

Just had an article published over at Simple Talk about the three major R graphical systems: https://www.simple-talk.com/content/article.aspx?article=2271

email graphs

August 24, 2015

Communication between individuals working on a group project is commonly carried out over email, and in-person meetings tend to be preceded by an emailed agenda and followed by emailed minutes. Projects organized around GitHub or similar systems also tend to have email updates for issue reports, etc. All of this means that a graph of email timing can be helpful in indicating...

PDQ Version 6.2.0 Released

August 24, 2015

PDQ (Pretty Damn Quick) is a FOSS performance analysis tool based on the paradigm of queueing models that can be programmed natively in R, Python, Perl, C and several other languages. This minor release is now available for download. If you're new to PDQ, here's a simple queueing model...
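
For readers new to queueing models, the simplest case is an open M/M/1 queue: a single server with Poisson arrivals at rate lambda and exponential service time S. Its utilisation is rho = lambda*S, the mean residence time is R = S/(1 - rho), and Little's law gives the mean number in the system as N = lambda*R. The Python sketch below computes these textbook formulas by hand. It does not use PDQ's own API, and the workload numbers are hypothetical:

```python
def mm1_metrics(arrival_rate: float, service_time: float) -> dict:
    """Steady-state metrics of an open M/M/1 queue (Poisson arrivals, exponential service)."""
    rho = arrival_rate * service_time  # server utilisation
    if not 0 <= rho < 1:
        raise ValueError("unstable queue: utilisation must be below 1")
    residence = service_time / (1 - rho)  # mean time in system, R = S / (1 - rho)
    return {
        "utilisation": rho,
        "residence_time": residence,
        "waiting_time": residence - service_time,      # time spent queueing before service
        "number_in_system": arrival_rate * residence,  # Little's law: N = lambda * R
    }

# Hypothetical workload: 0.5 requests per second, 1 second of service each
print(mm1_metrics(arrival_rate=0.5, service_time=1.0))
# {'utilisation': 0.5, 'residence_time': 2.0, 'waiting_time': 1.0, 'number_in_system': 1.0}
```

At 50% utilisation a request already spends as long waiting as being served, and R grows without bound as rho approaches 1.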

How R is used at Zillow to estimate housing values

August 24, 2015

Zillow, the leading real estate and rental marketplace in the USA, uses R to estimate housing values. Zillow's signature product is the Zestimate, their estimated market value for individual homes, and it's calculated using R in a parallel batch job for 100 million homes nationwide. The process is described in this Slideshare presentation Data Science At Zillow by Nicholas...

New Package “docxtractr” – Easily Extract Tables From Microsoft Word Docs

August 24, 2015

This is more of a follow-up to yesterday’s post. The hack and function in said post were fine, but they were limited to uniform tables and made you do more work than you had to. So, there’s now a devtools-installable package on GitHub that makes it way easier to get information about the tables in…

RStudio adds a new Starter Plan, More Active Hours, and a Performance Boost to shinyapps.io

August 24, 2015

Five months ago we launched shinyapps.io. Since then, more than 25,000 accounts have been created and countless Shiny applications have been deployed. It’s incredibly exciting to see! It’s also given us lots of data and feedback on how we can make shinyapps.io better. Today, we’re happy to tell you about some changes to our subscription…

RcppDE 0.1.3

August 24, 2015

A pure maintenance release 0.1.3 of the RcppDE package arrived on CRAN yesterday. RcppDE is a "port" of DEoptim, a popular package for derivative-free optimisation using differential evolution, to C++. By using RcppArmadillo, the code becomes a lot shorter and more legible. This version simply fixes a typo in the vignette metadata noticed by Kurt, and...
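
Differential evolution searches a box-constrained space with a population of candidate vectors: each target vector is challenged by a trial vector built from the scaled difference of other members (mutation), mixed coordinate-wise with the target (crossover), and the better of the two survives (selection). The Python sketch below is a minimal generational DE/rand/1/bin loop for illustration only. It is neither DEoptim's nor RcppDE's code, and the settings (population size 20, F = 0.8, CR = 0.9) are conventional choices assumed here:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=300, seed=1):
    """Minimise f over a box with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_cost = [x[:] for x in pop], cost[:]
        for i in range(pop_size):
            # Three distinct population members, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate always comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # mutant coordinate
                else:
                    v = pop[i][j]                                # inherited from the target
                trial.append(min(max(v, lo), hi))                # clip to the bounds
            trial_cost = f(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                new_pop[i], new_cost[i] = trial, trial_cost
        pop, cost = new_pop, new_cost  # generational update
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]

def sphere(x):
    return sum(v * v for v in x)

x_best, f_best = differential_evolution(sphere, bounds=[(-5.0, 5.0)] * 3)
print(x_best, f_best)
```

Only function values are used, never gradients, which is what makes the method derivative-free.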

Changing the font of R base graphic plots.

August 24, 2015

Want to change the font used in your R plots? I have a fairly simple solution that works on Mac OS. You need the function 'quartzFonts'. With this function, you can define additional font families to use in your R base graphic plots. The default font fami...

Canberra Data Miners: Seminar on Text, Knowledge and Information Extraction, by Dr Lizhen Qu (NICTA), Canberra, 4:30-5:30pm, Tuesday 1 Sept

August 24, 2015

Topic: Text, Knowledge, and Information Extraction
Speaker: Dr. Lizhen Qu, Researcher at NICTA
Organizer: Canberra Data Miners Meetup Group
Date and time: 4:30-5:30pm, Tuesday 1 Sept
Location: Teal Room of Inspire Centre, University of Canberra, Building 25, University of Canberra, …

labels.dendrogram in R 3.2.2 can be ~70 times faster (for trees with 1000 labels)

August 23, 2015

The recent release of R 3.2.2 came with a small (but highly valuable) improvement to the stats:::labels.dendrogram function. When working with dendrograms with (say) 1000 labels, the new function offers a 70 times speed improvement over the version of the function from R 3.2.1. This speedup is even better than the Rcpp version of labels.dendrogram…

Using R To Get Data *Out Of* Word Docs

August 23, 2015

This was asked on Twitter recently: “Is it possible to import data entered in MS Word into R – I have multiple tables in 235 files that need importing #rstats” (Richard Telford, @richardjtelford, August 23, 2015). The answer is a very cautious “yes”. Much depends on how well-formed and un-formatted the table is. Take this…
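
The reason this is possible at all is that a .docx file is a ZIP archive whose body lives in word/document.xml, where each table is a w:tbl element made of w:tr rows and w:tc cells, with the visible text in w:t runs. The Python sketch below uses only the standard library to pull every table out as rows of cell strings. It illustrates the file format, not the post's R code or docxtractr's API, and it assumes simple tables with no nested tables or merged cells (multiple paragraphs in a cell are concatenated):

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx files
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_tables(path):
    """Return every table in a .docx file as a list of rows of cell strings."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.iter(W + "tr"):
            # Concatenate all text runs (w:t) found inside each cell (w:tc)
            rows.append(["".join(t.text or "" for t in tc.iter(W + "t"))
                         for tc in tr.iter(W + "tc")])
        tables.append(rows)
    return tables
```

Usage, with a hypothetical file name: `for table in docx_tables("survey.docx"): print(table)`. Looping this over a directory would cover the 235-file case, provided every table really is that simple.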

Predicting Titanic deaths on Kaggle IV: random forest revisited

August 23, 2015

On July 19th I used randomForest to predict the deaths on Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. This I found somewhat unsatisfactory, hence I am now revisi...

Modern Honey Network Machinations with R, Python, phantomjs, HTML & JavaScript

August 23, 2015

This was (initially) going to be a blog post announcing the new mhn R package (more on what that is in a bit) but somewhere along the way we ended up taking a left turn at Albuquerque (as we often do here at ddsec hq) and had an adventure in a twisty maze of Modern Honey Network...

Examining Email Addresses in R

August 22, 2015

I don’t normally work with personally identifiable information such as emails. However, the recent data dump from Ashley Madison got me thinking about how I’d examine a data set composed of email addresses. What are the characteristics of an email that I’d look to extract? How would I perform that task in R? Here’s some…
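
As a rough illustration of what "characteristics of an email" could mean, the Python sketch below splits an address at its last "@" into a local part and a domain, then derives a few simple features. The feature list (top-level domain, local-part length, presence of digits) is an assumption for illustration, not the set chosen in the post:

```python
def email_features(address: str) -> dict:
    """Split an email address into simple, commonly examined components."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return {
        "local_part": local,
        "domain": domain,
        "tld": domain.rsplit(".", 1)[-1],  # last label of the domain, e.g. "com" or "uk"
        "local_length": len(local),
        "has_digits": any(ch.isdigit() for ch in local),
    }

print(email_features("Jane.Doe42@Example.co.uk"))
```

Applied to a whole column of addresses, these features can then be tabulated (most common domains, TLD counts, and so on).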

Pricing Game

August 22, 2015

In November, with Romuald Elie and Jérémie Jakubowicz, we will organize a session during the 100% Actuaires day, in Paris, based on a “pricing game”. We provide two datasets (motor insurance, third-party claims) with 2 years of experience and 100,000 policies. Each ‘team’ has to submit a premium proposal for 36,000 potential insureds for the third year (third party, material + bodily injury). We will work as a ‘price...

Where Does the S&P 500 Stand?

August 22, 2015

Last week was brutal for pretty much all markets. Surprisingly, it was bad even for the US dollar. The sharp and straight downward move was reminiscent of the descent of 2011. It’s time to review where the major index stands from a technical point of view. Let’s start with a visual inspection. Clearly the 200-day…
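
A standard technical check of this kind compares the closing price with its 200-day simple moving average (SMA), the unweighted mean of the last 200 closes. The Python sketch below computes a rolling SMA with a running sum and flags the days the close sits above it. The toy prices and the 3-day window are hypothetical, chosen to keep the example small:

```python
from collections import deque

def simple_moving_average(prices, window=200):
    """Rolling SMA of `prices`; None until `window` observations are available."""
    out, buf, total = [], deque(), 0.0
    for p in prices:
        buf.append(p)
        total += p
        if len(buf) > window:
            total -= buf.popleft()  # drop the price that just left the window
        out.append(total / window if len(buf) == window else None)
    return out

# Toy closing prices with a 3-day window instead of 200
closes = [10, 11, 12, 13, 9]
sma = simple_moving_average(closes, window=3)
above = [c > s for c, s in zip(closes, sma) if s is not None]
print(sma, above)
```

The running sum makes each update O(1). For very long series, periodically recomputing the sum from the buffer would limit floating-point drift.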

Analysing longitudinal data: Multilevel growth models (I)

August 22, 2015

Last time we discussed the two formats of longitudinal data and visualised the individual growth trajectories using an imaginary randomised controlled trial data set. But could we estimate the overall trajectory of the outcomes over time and see if it’s increasing, decreasing, or stable? Yes, of course: we could estimate that with multilevel growth models…

Partools 1.1.4

August 21, 2015

Partools 1.1.4 is now on GitHub. The main change this time is enhancement of the debugging facilities (which work not only for partools but also the cluster-based portion of R’s parallel package in general). As some of you know, I place huge importance on debugging, so much so that I wrote a book on it…

Comparing World Ocean Atlases 2013 and 2013v2

August 21, 2015

Introduction

The ocedata package provides data that may be of use to oceanographers, either working with their own R code or working with the oce package. One such dataset, called levitus, holds sea surface temperature and salinity (SST and SSS), based on the 2013 version of the World Ocean Atlas. An updated version of this atlas is suggested by the WOA...

Function Argument Lists and missing()

August 21, 2015

This entry is part 17 of 17 in the series Using R.

Sometimes it is useful to write a wrapper function for an existing function. In this short example we demonstrate how to grab the list of arguments passed to a …

A better interactive neuroimage plotter in R

August 21, 2015

In a previous post, I described how you can interactively explore a 3D nifti object in R. I used the manipulate package, but the overall results were sluggish and not really usable. I was introduced to a good neuroimaging viewer called Mango by a friend or two, and I use it somewhat inconsistently. One major…

Free edX course for R beginners

August 21, 2015

If you've thought about learning the R language but didn't know how to start, there's a new, free course on edX that starts you from the R basics and lets you learn R by trying R as you go. Presented by DataCamp and Microsoft, the course starts from the very basics of R (arithmetic on the command line, creating...

RTutor: How Soap Operas Reduced Fertility in Brazil

August 21, 2015

What is the real-world impact of TV series? Did Brazilian women have fewer children because they watched soap operas that portray happy, rich families that have few children? Clara Ulmer has written a very nice RTutor problem set that allows you to interactively explore this question in R. It is based on the paper Soap Operas and Fertility: Evidence...

R courses on basic R, advanced R, statistical machine learning with R, text mining with R, spatial modelling with R and R package building

Wow, our course list for teaching R is getting bigger and bigger. We now have courses on basic R, advanced R, R package building, statistical machine learning with R, text mining with R and spatial analysis with R. All face-to-face courses given in ...

Normality tests for continuous data

August 21, 2015

We use normality tests when we want to understand whether a given sample set of continuous (variable) data could have come from the Gaussian distribution (also called the normal distribution). Normality tests are a form of hypothesis test, which is used to make an inference about the population from which we have collected a sample…
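
To make the hypothesis-test framing concrete, the Python sketch below implements one simple normality test, the Jarque-Bera test. This choice is an assumption: it may not be the test the post covers, and R users would more often reach for shapiro.test (Shapiro-Wilk). The null hypothesis is that the data are normal. The statistic JB = n/6 * (S^2 + (K - 3)^2 / 4) combines sample skewness S and kurtosis K, and under the null it is asymptotically chi-squared with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). The p-value is therefore a large-sample approximation, and the function assumes the sample is not constant:

```python
import math
import random

def jarque_bera(sample):
    """Jarque-Bera normality test: returns (statistic, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # central moments (divide by n)
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                            # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under H0, JB ~ chi-squared(2), whose survival function is exp(-x / 2)
    return jb, math.exp(-jb / 2)

rng = random.Random(42)
normal_data = [rng.gauss(0, 1) for _ in range(2000)]
skewed_data = [rng.expovariate(1.0) for _ in range(2000)]
print(jarque_bera(normal_data))  # usually no evidence against normality
print(jarque_bera(skewed_data))  # strongly skewed: normality is rejected
```

A small p-value is evidence against normality. A large one does not prove the data are normal; it only means the test found no departure it can detect.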

Doh! I Could Have Just Used V8!

August 21, 2015

An R user recently had the need to split a “full, human name” into component parts to retrieve first & last names. The full names could be anything from something simple like “David Regan” to more complex & diverse ones, such as “John Smith Jr.”, “Izaque Iuzuru Nagata” or “Christian Schmit de la Breli”. Despite the…
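
The examples show why this is harder than splitting on spaces: suffixes such as "Jr." are not surnames, and particles such as "de la" belong to the last name. The Python sketch below is a deliberately naive heuristic written for illustration. It is not the JavaScript approach run through V8 in the post, and its suffix and particle lists are hypothetical and incomplete. It drops a trailing suffix, starts the last name at the first particle if there is one, and otherwise takes the final word as the last name:

```python
# Hypothetical, deliberately incomplete word lists
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}
PARTICLES = {"de", "la", "del", "da", "di", "van", "von", "der", "le"}

def split_name(full_name: str) -> dict:
    """Naively split a full name into first, middle, last and suffix parts."""
    parts = full_name.split()
    if not parts:
        raise ValueError("empty name")
    suffix = parts.pop() if len(parts) > 1 and parts[-1].lower() in SUFFIXES else ""
    first, rest = parts[0], parts[1:]
    if not rest:
        return {"first": first, "middle": "", "last": "", "suffix": suffix}
    # The last name starts at the first particle ("de la Breli"), otherwise it is the final word
    start = next((i for i, w in enumerate(rest) if w.lower() in PARTICLES), len(rest) - 1)
    return {"first": first, "middle": " ".join(rest[:start]),
            "last": " ".join(rest[start:]), "suffix": suffix}

for name in ["David Regan", "John Smith Jr.", "Izaque Iuzuru Nagata", "Christian Schmit de la Breli"]:
    print(name, "->", split_name(name))
```

Whether "Schmit" is a middle name or part of a compound surname cannot be decided from the string alone, which is why rule lists like these break down and a dedicated name-parsing library is appealing.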

functional enrichment analysis with NGS data

August 20, 2015

I found that there is a Bioconductor package, seq2pathway, that can apply functional analysis to NGS data. It consists of two components, seq2gene and gene2pathway. seq2gene converts genomic coordinates to genes, while gene2pathway performs functional analysis at the gene level.
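
At its simplest, converting genomic coordinates to genes is an interval-overlap lookup: which annotated genes on the same chromosome contain a given position? The Python sketch below shows only that core idea. It is not seq2pathway's implementation or API, whose seq2gene step may apply its own rules (for example, windows around genes), and the gene table is entirely hypothetical:

```python
from bisect import bisect_right

# Hypothetical annotation: (chromosome, start, end, gene), 1-based inclusive coordinates
GENES = [
    ("chr1", 1000, 5000, "GENE_A"),
    ("chr1", 4000, 9000, "GENE_B"),
    ("chr1", 20000, 25000, "GENE_C"),
    ("chr2", 500, 3000, "GENE_D"),
]

def build_index(genes):
    """Group genes by chromosome, sorted by start, keeping a parallel list of starts."""
    grouped = {}
    for chrom, start, end, name in genes:
        grouped.setdefault(chrom, []).append((start, end, name))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        index[chrom] = ([s for s, _, _ in entries], entries)
    return index

def genes_at(index, chrom, pos):
    """Return the genes whose interval contains position `pos` on `chrom`."""
    starts, entries = index.get(chrom, ([], []))
    # Binary search: only genes starting at or before pos can contain it
    candidates = entries[:bisect_right(starts, pos)]
    return [name for _, end, name in candidates if end >= pos]

idx = build_index(GENES)
print(genes_at(idx, "chr1", 4500))   # ['GENE_A', 'GENE_B']
print(genes_at(idx, "chr1", 10000))  # []
```

The remaining scan over candidates is linear. For genome-scale annotations, an interval tree or a tool built on one (such as Bioconductor's GenomicRanges in R) is the usual choice.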
