RcppGSL 0.2.5

July 13, 2015

A new version of RcppGSL arrived on CRAN a couple of days ago. This package provides an interface from R to the GNU GSL using our Rcpp package. In the course of preparation for the higher-performance R via C++ course I gave in Zuerich last month, I o...

Read more »

[paper published] ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization

July 13, 2015

My R/Bioconductor package, ChIPseeker, has been published in Bioinformatics: "ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization".

Read more »

RcppRedis 0.1.4

July 12, 2015

A small update to RcppRedis arrived on CRAN a few days ago. One of the unit tests failed as we (still) initialized the rredis package (loaded only for a comparison) in a form long-internalized by Bryan Lewis, its author. No actual changes to function...

Read more »

New package "SparkRext" – SparkR extension for closer to dplyr

July 11, 2015

Apache Spark is one of the hottest products in data science. Spark 1.4.0 formally adopted the SparkR package, which makes it possible to handle Spark DataFrames from R. SparkR is very useful and powerful. One of the reasons is that SparkR DataFrames present an API sim...

Read more »

Detecting Undercuts in F1 Races Using R

July 11, 2015

One of the things that’s been on my to-do list for some time is the identification of tactical or strategic events within a race that might be detected automatically. One such event is an undercut, described by F1 journalist James Allen in the following terms (The secret of undercut and offset): An undercut

Read more »

random 0.2.5

July 11, 2015

A few days ago, while we were traveling, an updated release of our random package for truly (hardware-based) random numbers as provided by random.org arrived on CRAN. Brian Ripley had pointed out to us that some of the curl implementations (which ...

Read more »

CDK Literature #8

July 11, 2015

Tool validation: The first paper this week is a QSAR paper. In fact, it does some interesting benchmarking of a few tools with a data set of about 6000 compounds. It includes looking into the applicability domain, and studies the error of prediction for compounds inside and outside the chemical space defined by the training set. The paper indirectly uses...

Read more »

Constructing a Word Cloud for ICML 2015

July 10, 2015

Word clouds have become a bit cliché, but I still think that they have a place in giving a high-level overview of the content of a corpus. Here are the steps I took in putting together the word cloud for the International Conference on Machine Learning (2015). Extract the hyperlinks to the PDFs of

Read more »

R Package to access the Open Movie Database (OMDB) API

July 10, 2015

It’s not on CRAN yet, but there’s a devtools-installable R package for getting data from the OMDB API. It covers all of the public API endpoints:
find_by_id: Retrieve OMDB info by IMDB ID search
find_by_title: Retrieve OMDB info by title search
get_actors: Get actors from an omdb object as a vector
get_countries: Get countries from

Read more »

In case you missed it: June 2015 roundup

July 10, 2015

In case you missed them, here are some articles from June of particular interest to R users. The R Consortium, a trade group dedicated to the support and growth of the R Community, has launched with the R Foundation, Microsoft, RStudio and others as founding members. A detailed FAQ for fitting Generalized Linear Models in R. My presentation on...

Read more »

“Just the text ma’am” – Web Site Content Extraction with XSLT & R

July 9, 2015

Sometimes you just need the salient text from a web site, often as a first step towards natural language processing (NLP) or classification. There are many ways to achieve this, but XSLT (eXtensible Stylesheet Language Transformations) was purpose-built for slicing, dicing and transforming XML (and, hence, HTML), so it can make more sense and even be speedier to use XSLT...

Read more »

Line plots of longitudinal summary data in R using ggplot2

July 9, 2015

I recently had an email from a colleague asking me to make a figure like this in ggplot2 or trellis in R: As I know more about how to do things in ggplot2, I chose to use that package (if it wasn't obvious from the plot or other posts). Starting Point: Cookbook for R has a

Read more »

Faceted “World Population by Income” Choropleths in ggplot

July 9, 2015

Poynter did a nice interactive piece on world population by income (i.e. “How Many Live on How Much, and Where”). I’m always on the lookout for optimized shapefiles and clean data (I’m teaching a data science certificate program starting this Fall) and the speed of the site load and the easy availability of the data

Read more »

The Ecology of Local Subspaces: Mixtures of Parochial Views

July 9, 2015

No matter where you live, your view of the world is biased and limited, which is the beauty of this magazine cover. As a marketer, of course, all my maps depict, not place, but consumption. For example, in an earlier post I asked, "What apps are on your...

Read more »

Two principled approaches to data visualization

July 9, 2015

Yesterday I spoke at Stat Bytes, our student-run statistical computing seminar. My goal was to introduce two principled frameworks for thinking about data visualization: human visual perception and the Grammar of Graphics. (We also covered some relevant R packages: RColorBrewer, …

Read more »

Top 2 Packages for Newly Hired Data Scientists

July 9, 2015

library(NewCo knowledge)
function (X, FUN, ..., ) {
  FUN <- Read the business wires +
         Go to lunch with wide range of people +
         Read the 10-K and maybe 10-Q +
         Find a go-to source for “stupid questions”
  else Ignorant
}
library(credibility)
function (X, FUN, ..., ) {
  FUN <- Double-check all assumptions +
         Underpromise +
         Save counterintuitive findings for last +
         Find a...

Read more »

RTutor: Credit Booms Gone Bust

July 9, 2015

Currently, quite a few students here at Ulm University create RTutor problem sets based on economic articles as part of their Bachelor or Master thesis. RTutor is an R package for developing interactive R problem sets that can be solved in a browser-based or Markdown-based environment. Thomas Clausing has written a very...

Read more »

New Tutorial: Make a Census Explorer!

July 9, 2015

Today I am happy to announce a new tutorial I am running titled Make a Census Explorer with Shiny! It is free and will be held July 28 in San Francisco. In the tutorial we will combine R’s Shiny framework for web development with the census-mapping choroplethr package to create a browser-based census explorer. You can see the final app here.

Read more »

Combining Hadoop, Spark, R, SparkR and Shiny…. and it works :-)

July 9, 2015

A long time ago, in 1991, I had my first programming course (Modula 2) at the Vrije University in Amsterdam. I spent months behind a terminal with a green monochrome display doing the programming exercises using VI. Do you remember Shift …

Read more »

Online Dashboards: Eight Helpful Tips You Should Hear From Visualization Experts

July 8, 2015

“There is no such thing as information overload, only bad design” - Professor Emeritus of Political Science, Statistics, and Computer Science at Yale University Edward Tufte The number of organizations working on data-driven projects increased by 125% in the past year. 44% of companies tackle big data “all the time.” 82% of executives call big data “important...

Read more »

Working with Sessionized Data 1: Evaluating Hazard Models

July 8, 2015

When we teach data science we emphasize the data scientist’s responsibility to transform available data from multiple systems of record into a wide or denormalized form. In such a “ready to analyze” form each individual example gets a row of data and every fact about the example is a column. Usually transforming data into this …

Read more »

Job opening with Morningstar’s behavioral science team

July 8, 2015

Morningstar is looking for an applied behavioral scientist, to help understand and overcome the behavioral obstacles that individuals face to financing their retirement. The post Job opening with Morningstar’s behavioral science team appeared first on Decision Science News.

Read more »

Parallel Computing for Data Science

July 8, 2015

Hot off the press, Norman Matloff's book, Parallel Computing for Data Science: With Examples in R, C++ and CUDA (Chapman and Hall/CRC Press, 2015), should appeal to a lot of the readers of this blog. The book's coverage is clear from the following chapter titles:
1. Introduction to Parallel Processing in R
2. Performance Issues: General
3. Principles of Parallel...

Read more »

The network structure of CRAN

July 8, 2015

by Andrie de Vries

My experience of UseR!2015 drew to an end shortly after I gave a Kaleidoscope presentation discussing "The Network Structure of CRAN". My talk drew heavily on two previous blog posts, Finding the essential R packages using the pagerank algorithm and Finding clusters of CRAN packages using igraph. However, in this talk I went further, attempting...

Read more »

Time series outlier detection (a simple R function)

July 8, 2015

(By Andrea Venturini) Imagine you have a lot of time series, possibly short ones, covering many different measures, and very little time to find outliers. You need something not too sophisticated that sorts out the mess quickly. This is, in short, the typical situation in which you can adopt washer.AV()...

Read more »

The Moon And The Sun

July 8, 2015

Do not swear by the moon, for she changes constantly. Then your love would also change (William Shakespeare, Romeo and Juliet) The sun is a big point and the moon is a cardioid: Here is the code. It is a simple example of how to use ggplot:

Read more »

Sports Data and R – Scope for a Thematic (Rather than Task) View? (Living Post)

July 7, 2015

Via my feeds, I noticed a package announcement today for cricketR!, a new package for analysing cricket performance data. This got me wondering (again!) about what other sports-related packages there might be out there, either in terms of functional thematic packages (to do with sport in general, or one sport in particular), or particular

Read more »

Creating a TRULY Interactive Map of Craft Breweries in VA Using the leafletR Package (Guest Blog Post by Dr. Keegan Hines)

July 7, 2015

It’s a good feeling when a great friend who is smarter than you offers to write a blog post, for your blog, that’s better than anything you’ve written so far. Friends, colleagues, people who’ve not yet realized they are at the wrong site: please allow me to introduce to you the awe-inspiring Dr. Keegan Hines.

Read more »

Bioenergetics in R

July 7, 2015

In the last few months I have received queries about whether a “fish bioenergetics” model has been implemented in R. Here is what I am aware of. Fish Bioenergetics 4.0 (an update of the Fish Bioenergetics program distributed by Wisconsin Sea …

Read more »