## Supervised Classification, discriminant analysis

March 3, 2015

Another popular technique for classification (or at least, one that used to be popular) is linear discriminant analysis, introduced by Ronald Fisher in 1936. Consider the same dataset as in our previous post:

    > clr1 <- c(rgb(1,0,0,1),rgb(0,0,1,1))
    > x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
    > y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
    > z <- c(1,1,1,1,1,0,0,1,0,0)
    > df <- data.frame(x,y,z)
    > plot(x,y,pch=19,cex=2,col=clr1[z+1])

The main interest of...
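The excerpt stops before the method itself, so here is a minimal sketch of two-class linear discriminant analysis on the same ten points, written in base R. It assumes the textbook formulation (a pooled within-class covariance matrix and class proportions used as priors); the variable names `w`, `b`, `score` and `pred` are illustrative and not taken from the post.

```r
# Same toy data as in the excerpt
x <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
z <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
X <- cbind(x, y)

# Split the points by class and compute the class means
X1 <- X[z == 1, , drop = FALSE]
X0 <- X[z == 0, , drop = FALSE]
mu1 <- colMeans(X1)
mu0 <- colMeans(X0)
n1 <- nrow(X1)
n0 <- nrow(X0)

# Pooled within-class covariance (assumed common to both classes)
S <- ((n1 - 1) * cov(X1) + (n0 - 1) * cov(X0)) / (n1 + n0 - 2)

# Discriminant direction and intercept; the log term accounts for the
# unequal class proportions
w <- solve(S, mu1 - mu0)
b <- -0.5 * sum(w * (mu1 + mu0)) + log(n1 / n0)

# A positive score means the point is assigned to class 1
score <- drop(X %*% w) + b
pred <- as.integer(score > 0)
table(observed = z, predicted = pred)
```

The decision boundary is the line where `score` equals zero. Under these assumptions, `MASS::lda(z ~ x + y)` should give an equivalent classification rule.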

## JASP

March 3, 2015

JASP is an interesting project. It is based on R, with additional facilities such as automatic table generation. There appears to be an R package that provides those functions directly to R users. Well worth checking out.

## Update on The Pre-FOMC Announcement Drift

March 3, 2015

In the February 2015 edition of The Journal of Finance, a well-known academic paper, “The Pre-FOMC Announcement Drift”, was finally published, almost 4 years after the working paper was released into the public domain in 2011. Authored by Lucca and Moench, researchers at the US Federal Reserve, it documents the tendency for the S&P500 Index to rise in the...

## A Linear Congruential Generator (LCG) in R

March 3, 2015

In my simulation classes, we talk about how to generate random numbers. One of the techniques we talk about is the Linear Congruential Generator (LCG). Starting with a seed, the LCG produces
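The excerpt is cut off here, so what follows is a minimal sketch of the usual recurrence, x[k+1] = (a * x[k] + inc) mod m. The function name `lcg` and its default constants (the widely quoted Numerical Recipes values) are illustrative assumptions and are not taken from the post.

```r
# Linear congruential generator: x[k+1] = (a * x[k] + inc) %% m.
# Returns the n integer states that follow the seed.
lcg <- function(n, seed, a = 1664525, inc = 1013904223, m = 2^32) {
  states <- numeric(n)
  x <- seed
  for (k in seq_len(n)) {
    x <- (a * x + inc) %% m   # stays exact in double precision for these constants
    states[k] <- x
  }
  states
}

# Tiny example with small constants, easy to check by hand
toy <- lcg(4, seed = 7, a = 5, inc = 3, m = 16)

# Dividing the states by the modulus gives pseudo-uniform numbers in [0, 1)
u <- lcg(5, seed = 42) / 2^32
```

With the small constants the states can be verified by hand: (5 * 7 + 3) mod 16 = 6, then (5 * 6 + 3) mod 16 = 1, and so on. The same seed always reproduces the same sequence.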

## Plotly Graphs with Domino’s New R Notebook

March 3, 2015

By Matt Sundquist, co-founder of Plotly. Domino's new R Notebook and Plotly's R API let you code, make interactive R and ggplot2 graphs, and collaborate entirely online. Here is the Notebook in action: Published R Notebook. To execute this Notebook, or to build your own, head to Domino's Plotly Project. The GIF below shows how to get started: choose...

## Google Summer of Code 2015

March 3, 2015

The R Project has once again been selected as a mentoring organization for this year's Google Summer of Code (GSoC).  If you're not familiar with GSoC, it's a global program that offers students a stipend to write code for open source projects, under the direction of a mentor.  Mentors get code written for their project, but no...

## Mapping Paris bikes stands

March 3, 2015

A Sharp Sight Labs reader (and now student), Jason P., recently started learning data science. He has a background in data analysis (primarily with Excel and related tools in the Microsoft ecosystem), but he wanted to start learning some of the harder skills of data science. He contacted me after he had diligently reviewed past...

## Next Kölner R User Meeting: Friday, 6 March 2015

March 3, 2015

The next Cologne R user group meeting is scheduled for this Friday, 6 March 2015, and we have an exciting agenda with two talks, followed by networking drinks: Using R in Excel via R.NET (Günter Faes and Matthias Spix). MS Office and Excel are the 'de-facto' standards in many industries. Using R with Excel offers an opportunity...

## Supervised Classification, Logistic and Multinomial

March 2, 2015

We will start, in our Data Science course, to discuss classification techniques (in the context of supervised models). Consider the following case, with 10 points and two classes (red and blue):

    > clr1 <- c(rgb(1,0,0,1),rgb(0,0,1,1))
    > clr2 <- c(rgb(1,0,0,.2),rgb(0,0,1,.2))
    > x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
    > y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
    > z <- c(1,1,1,1,1,0,0,1,0,0)
    > df <- data.frame(x,y,z)
    > plot(x,y,pch=19,cex=2,col=clr1[z+1])

To get...
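The excerpt ends before any model is fitted. As a rough sketch of the logistic case named in the title, the same data frame can be passed to base R's `glm` with a binomial family. The 0.5 cut-off and the names `fit`, `prob` and `pred` are assumptions made for illustration, not the post's own code.

```r
# Same toy data as in the excerpt
x <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
z <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
df <- data.frame(x, y, z)

# Logistic regression of the class label on both coordinates
fit <- glm(z ~ x + y, data = df, family = binomial)

# Fitted probabilities of class 1, turned into labels with a 0.5 cut-off
prob <- predict(fit, type = "response")
pred <- as.integer(prob > 0.5)
table(observed = df$z, predicted = pred)
```

The fitted coefficients define a straight decision boundary in the (x, y) plane, the set of points where the predicted probability equals the cut-off.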

March 2, 2015

For newcomers: scheduleR is a framework to deploy and schedule R tasks, reports and Shiny apps. The tool has an integrated logging and notification system to ease the maintenance of scheduled R-related jobs. After a lot of refactoring, jobs have been separated into tasks (e.g. ETL scripts) and reports (rmarkdown). The back-end that handles the execution of R...

## ComputerWorld’s R for Beginners Hands-On Guide

March 2, 2015

Computerworld's Sharon Machlis has done a great service for the R community, and especially for R novices, by creating the online Beginner's Guide to R. You can read our overview of her guide from 2013 here, but it's been regularly updated since then. As an added bonus, the guide is now available as a downloadable PDF for your...

## At the APS Observer: a profile of JASP

March 2, 2015

The APS Observer has just published a profile of JASP, a graphical user interface designed to make statistics easier. It includes Bayesian procedures by means of R and the BayesFactor package. From the article: JASP distinguishes itself from S...

## Experiments in Time Series Clustering

March 2, 2015

Last night I spotted this tweet about the R package TSclust: “Thank you Pablo and Jose for #TSclust - time series clustering package in #rstats ! http://t.co/GBQtQnQ8Lr” (Pasha Roberts, @pasharoberts, March 2, 2015). I should start by saying that I really don’t know what I’m doing, so be warned. I thought it would be interesting to apply TSclust to...

## So What Can Text Analysis Do for You?

March 2, 2015

Despite believing we can treat anything we can represent in digital form as “data”, I’m still pretty flakey on understanding what sorts of analysis we can easily do with different sorts of data. Time series analysis is one area – the pandas Python library has all manner of handy tools for working with that sort

## Electric Power System simulations using R

March 2, 2015

This is a guest post by Ben Ubah. The field of electric power systems engineering relies heavily on computer simulations for analysis because of its nature. These computer simulations aid the planning, operation and management of the system. Computer simulations have been implemented using several scientific computing tools. However, I have not yet seen any implementations using R. This inspired my...

## Silhouettes

March 2, 2015

Romeo, Juliet, balcony in silhouette, makin o’s with her cigarette, it’s juliet (Flapper Girl, The Lumineers). Two weeks ago I published this post, for which I designed two different visualizations. In the end, I decided to place words on the map of the United States. The discarded visualization was this other one, where I place the words over the silhouette …

## R Markdown Tutorial by RStudio and DataCamp

March 1, 2015

In collaboration with Garrett Grolemund, RStudio’s teaching specialist, DataCamp has developed a new interactive course to facilitate reproducible reporting of your R analyses. R Markdown enables you to generate reports straight from your R code, documenting your work as an HTML, PDF or Microsoft Word document. This course is part of DataCamp’s R training path, but can...

## Using Tables for Statistics on Large Vectors

March 1, 2015

This is the first post I’ve written in a while. I have been somewhat radio silent on social media, but I’m jumping back in. Now, I work with brain images, which can have millions of elements (referred to as voxels). Many of these elements are zero (for background). We want to calculate basic statistics on
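The excerpt is cut off, so the following is only a guess at the idea suggested by the title: when a large vector has few distinct values, summary statistics can be computed from its frequency table instead of from every element. The simulated vector `v` and the names `tab_mean` and `tab_var` are hypothetical.

```r
set.seed(1)

# Hypothetical stand-in for an image: mostly zero background plus some signal
v <- c(rep(0, 90000), sample(1:50, 10000, replace = TRUE))

# Frequency table: one entry per distinct value
tab <- table(v)
vals <- as.numeric(names(tab))   # the distinct values
counts <- as.numeric(tab)        # how often each value occurs
n <- sum(counts)

# Mean and sample variance from (value, count) pairs only
tab_mean <- sum(vals * counts) / n
tab_var <- sum(counts * (vals - tab_mean)^2) / (n - 1)
```

Once the table is built, further statistics only touch the distinct values (51 in this sketch) rather than all 100,000 elements.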

## drat 0.0.2: Improved Support for Lightweight R Repositories

March 1, 2015

A few weeks ago we introduced the drat package. Its name stands for drat R Archive Template, and it helps with easy-to-create and easy-to-use repositories for R packages. Two early blog posts describe drat: First Steps Towards Lightweight Repositorie...

## Should I use premium Diesel? Setup

March 1, 2015

Since I drive quite a lot, I have some interest in getting the most km out of every euro spent on fuel. One thing to change is the fuel. The oil companies have a premium fuel, which is supposed to be better for both engine and fuel consumption. On the oth...

## DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis

February 28, 2015

My R/Bioconductor package, DOSE, has been published in Bioinformatics. Summary: Disease ontology (DO) annotates human genes in the context of disease. DO is an important annotation in translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes, which allows biologists to explore the similarities of diseases and of gene functions...

## Book Review: Mastering Scientific Computing with R

February 28, 2015

The Packt marketing team contacted me again to review their new book, Mastering Scientific Computing with R. The 432-page book (including covers) consists of 10 chapters, which start with basic R and end with advanced data management. However, ...

## One weird trick to compile multipartite dynamic documents with Rmarkdown

February 28, 2015

This afternoon I stumbled across this one weird trick: an undocumented part of the YAML headers that gets processed when you click the ‘knit’ button in RStudio. Knitting turns an Rmarkdown document into a specified format, using the rmarkdown package’s render function to call pandoc (a universal document converter written in Haskell). If you...

## Playing around with #rstats twitter data

As a bit of weekend fun, I decided to briefly look into the #rstats twitter data that Stephen Turner collected and made available (thanks!). Essentially, this data set contains some basic information about over 100,000 tweets that contain the hashtag…

## Tools in Tandem – SQL and ggplot. But is it Really R?

February 28, 2015

Increasingly I find that I have fallen into using not-really-R whilst playing around with Formula One stats data. Instead, I seem to be using a hybrid of SQL to get data out of a small SQLite3 database and into an R dataframe, and then ggplot2 to visualise it. So for example, I’ve recently been

## Scalable Machine Learning for Big Data Using R and H2O

February 28, 2015

H2O is an open source parallel processing engine for machine learning on Big Data. This prediction engine is designed by H2O, a Mountain View-based startup that has implemented a number of impressive statistical and machine learning algorithms to run on HDFS, S3, SQL and NoSQL. We were honored to have Tom Kraljevic (Vice President of...

## RcppEigen 0.3.2.4.0

February 28, 2015

A new release of RcppEigen is now on CRAN and in Debian. It synchronizes the Eigen code with the 3.2.4 upstream release, and updates the RcppEigen.package.skeleton() package creation helper to use the kitten() function from pkgKitten for enhanced pac...

## John Snow, and Google Maps

February 27, 2015

In my previous post, I discussed how to use OpenStreetMaps (and standard plotting functions of R) to visualize John Snow’s dataset. But it is also possible to use Google Maps (and ggplot2 types of graphs).

    library(ggmap)
    get_london <- get_map(c(-.137,51.513), zoom=17)
    london <- ggmap(get_london)

Again, the tricky part comes from the fact that the coordinate representation system, here, is not...