tidyr 0.6.0

August 15, 2016

I’m pleased to announce tidyr 0.6.0. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette. Install
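
As a rough illustration of the tidy-data convention described above (variables in columns, observations in rows), here is a minimal sketch using tidyr's gather(). The data frame and its column names (country, y1999, y2000, year, cases) are invented for this example and are not taken from the announcement.

```r
library(tidyr)

# Hypothetical "wide" data: one column per year, so the variable
# "year" is spread across the column names.
wide <- data.frame(
  country = c("A", "B"),
  y1999 = c(10, 20),
  y2000 = c(15, 25),
  stringsAsFactors = FALSE
)

# Gather the year columns into key/value pairs so that each row
# is one observation (one country in one year).
long <- gather(wide, key = "year", value = "cases", y1999, y2000)
print(long)
```

The long form has one row per country and year, which is the shape most modelling and plotting functions expect.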

Probably the most useful R function I’ve ever written

August 15, 2016

The function in question is scriptSearch. I’m not much for superlatives: “most” and “best” imply one dimension, but we live in a multi-dimensional world. I’m making an exception. The statistic I have in mind for this use of “useful” is the waiting time between calls to the function divided by the human time saved.

The inexorable growth of student debt, charted with R

August 15, 2016

Len Kiefer, Deputy Chief Economist at Freddie Mac, recently published a chart on his personal blog showing household debt in the United States (excluding mortgage debt). Student loan debt has steadily increased over the last 13 years and has now eclipsed all other forms of non-mortgage debt. He also created this animated chart showing...

Can you nest parallel operations in R?

August 15, 2016

When we teach parallel programming in R we start with the basic use of parallel (please see here for example). This is, in our opinion, a necessary step before getting into clever notation and wrapping such as doParallel and foreach. Only then do the students have a sufficiently explicit interface to frame important questions about …
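
For readers who have not seen the "basic use of parallel" mentioned above, a minimal sketch might look like the following. It assumes a machine that can start two local worker processes; the task (squaring numbers) is a placeholder and is not taken from the post.

```r
library(parallel)

# Start a small cluster of two local worker processes.
cl <- makeCluster(2)

# Apply a function to each element of the input, spread over the workers.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(squares))
```

The explicit cluster object cl is what wrappers such as doParallel and foreach manage on the user's behalf.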

Dates and Times – Simple and Easy with lubridate exercises (part 1)

August 15, 2016

As in any programming language, handling date and time variables can be quite frustrating, since, for example, there is no one single format for dates, there are different time zones and there are issues such as daylight saving time. Base R provides several packages for handling date and time variables, but they require mastering cumbersome
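
To make the two pain points named above concrete (multiple date formats and time zones), here is a small sketch that uses only base R, not lubridate itself. The specific dates and the time zone are arbitrary choices for illustration.

```r
# The same calendar date written in two formats; base R needs an
# explicit format string for anything other than the ISO layout.
d1 <- as.Date("2016-08-15")
d2 <- as.Date("15/08/2016", format = "%d/%m/%Y")
print(d1 == d2)

# A point in time stored in UTC and displayed in another time zone.
t_utc <- as.POSIXct("2016-08-15 12:00:00", tz = "UTC")
print(format(t_utc, tz = "America/New_York", usetz = TRUE))
```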

colourpicker: A colour picker widget for Shiny apps, RStudio, R-markdown, and ‘htmlwidgets’

August 15, 2016

Have you ever wanted to allow your users to select colours in your Shiny apps? Have you ever wanted to select a few colours to use in your R code, but found it tedious to search for the right colours? If you answered yes to any of those quest...

A Shiny App for Passing Bablok and Deming Regression

Background: Back in 2011 I was not aware of any tool in R for Passing Bablok (PB) regression, a form of robust regression described in a series of three papers in Clinical Chemistry and Laboratory Medicine (then J Clin Chem and Biochem) available here, here and here. For reasons that are not entirely clear to …

Result of the mlr summer workshop in Palermo

August 14, 2016

The mlr developer team is quite international: Germany, USA, Canada. The time difference between these countries sometimes makes it hard to communicate and develop new features. The idea for this workshop or sprint was to have the possibility to talk ...

rfoaas 1.0.0

August 14, 2016

The big 1.0.0 is here! Following the unsurpassed lead of the FOAAS project, we have arrived at a milestone: Release 1.0.0 is now on CRAN. The rfoaas package provides an interface for R to the most excellent FOAAS service, which itself provides a mod...

Chi-Squared Test

August 14, 2016

Before we build stats/machine learning models, it is good practice to understand which predictors are significant and have an impact on the response variable. In this post we deal with the particular case when both your response and predictor are categorical variables. By the end of this you’d have gained an understanding of what
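
For the case described above, where both the predictor and the response are categorical, a minimal sketch with base R's chisq.test() could look like this. The 2 x 2 table of counts is made up for illustration and does not come from the post.

```r
# Hypothetical contingency table: rows are predictor levels,
# columns are response levels, cells are observed counts.
tab <- matrix(
  c(30, 10, 20, 40),
  nrow = 2,
  dimnames = list(predictor = c("a", "b"), response = c("yes", "no"))
)

# Pearson's chi-squared test of independence
# (Yates' continuity correction is applied by default for 2 x 2 tables).
res <- chisq.test(tab)
print(res)
```

A small p-value suggests that the response distribution differs across predictor levels, which is one way to screen categorical predictors before modelling.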

Simulating local community dynamics under ecological drift

August 14, 2016

In 2001 the book by Stephen Hubbell on the neutral theory of biodiversity was a major shift from classical community ecology. Before this book the niche-assembly framework dominated the study of community dynamics. Very briefly, under this framework local species composition is the result of the resources available at a particular site and species

Gaussian predictive process models in Stan

August 14, 2016

Gaussian process (GP) models are computationally demanding for large datasets. Much work has been done to avoid expensive matrix operations that arise in parameter estimation with larger datasets via sparse and/or reduced rank covariance matrices (Datta et al. 2016 provide a nice review). What follows is an implementation of a spatial Gaussian predictive process Poisson GLM in Stan, following...

Labeling Opportunities in Price Series

August 13, 2016

One approach to trading which has been puzzling me lately is to sit and wait for opportunities. 🙂 Sounds simplistic, but it is indeed different from, for instance, asset allocation strategies. In order to be able to even attempt taking advantage of these opportunities, however, we must be able to identify them. Once the

Introduction to LabKey and R Integration

August 12, 2016

How and What to deliver are two main themes of my journey to look for an effective way of developing data products. For the former, decent web technologies encompassing HTML, CSS and JavaScript are important.

Tuning Apache Spark for faster analysis with Microsoft R Server

August 12, 2016

My colleagues Max Kaznady, Jason Zhang, Arijit Tarafdar and Miguel Fierro recently posted a really useful guide with lots of tips to speed up prototyping models with Microsoft R Server on Apache Spark. These tips apply when using Spark on Azure HDInsight, where you can spin up a Spark cluster in the cloud with Microsoft R installed on the head...

Combinations Exercises

August 12, 2016

When doing data analysis it happens often that we have a set of values and want to obtain various possible combinations of them. For example, taking 5 random samples from a dataset of 20. How many possible 5-sample sets are there and how to obtain all of them? R has a bunch of functions that
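
The question posed above (how many 5-element sets can be drawn from 20 items, and how to list them) can be sketched with two base R functions. Which functions the exercises themselves use is not shown in the excerpt, so treat this as one possible approach.

```r
# Number of ways to choose 5 items out of 20, ignoring order.
n_sets <- choose(20, 5)
print(n_sets)

# Enumerate all of them: each column of the result is one 5-element set.
all_sets <- combn(20, 5)
print(dim(all_sets))
print(all_sets[, 1])
```

For large inputs the count grows quickly, so it is worth checking choose() before asking combn() to materialise every combination.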

Project package libraries and reproducibility

August 12, 2016

Gábor Csárdi, Consultant, Mango Solutions. Introduction: If you are an R user, it has probably happened to you that you upgraded some R package in your R installation, and then suddenly your R script or application stopped working. R packages …

Annotating sets of genomic intervals with genomic annotations such as chromHMM

August 12, 2016

Genomation is an R package to summarize, annotate and visualize genomic intervals. It contains a collection of tools for visualizing and analyzing genome-wide data sets, i.e....

Effluent Nutrient Concentrations by Waste Water Treatment Type: A Shiny App

August 12, 2016

In 2014, EPA documented the relative lack of nutrient data from waste water treatment plant effluents, even though development of surface water quality standards for nitrogen and phosphorus has been a stated priority for more than a decade. A new Shiny app lets users explore effluent nutrient concentrations from an existing data...

Elastic net regularization of a model of burned calories

August 12, 2016

Deal with feature selection and collinearity. Recently I’ve been making more use of elastic net regularization as a way of fitting linear models to data when I have more candidate explanatory variables than I know what to do with and some of them are...

Visualizing multiple dimensions of Google Analytics in R

August 12, 2016

Stacked bar chart: used when we wish to visualize a combination of categorical variables. ggplot(data, aes(date, fill = region)) + geom_bar() + labs(title = "Stacked Bar Chart", x = "Date", y = "Count of Regions"), where date is ga:date and region is ga:region. Example: ggplot(train, aes(Outlet_Location_Type, fill = Outlet_Type)) + geom_bar() + labs(title = "Stacked...

Google Analytics makes Demo Account available to all

August 11, 2016

Playing with GA data is much, much easier now. Last week’s biggest news was definitely Google making a Demo Google Analytics Account available to everyone. As the word "demo" says, the main purpose is to demonstrate all the features and reports GA offers and to serve as a learning platform for analysts. But it’s actually real numbers! All the data available come from...

Handling required and missing R packages in Microsoft R Services

August 11, 2016

I have seen several times that executing R code with the procedure sp_execute_external_script did not work because of a missing library or missing library dependencies. The problem is, in general, not solved out of the box, but it can be solved by maintaining a list of installed libraries used by Microsoft R Services or by simply creating

R Packages for Data Access

August 11, 2016

By Joseph Rickert. Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below is a list of 17 R packages that appeared on CRAN between May 1st and August 8th that,...

Shorting at High: Algo Trading Strategy in R

August 11, 2016

By Milind Paradkar. Milind began his career at Gridstone Research, building earnings models and writing earnings notes for NYSE-listed companies, covering the Technology and REITs sectors. Milind has also worked at CRISIL and Deutsche Bank, where he was involved in modeling Structured Finance deals covering Asset Backed Securities (ABS) and Collateralized Debt Obligations (CDOs)...

Plotting background data for groups with ggplot2

August 11, 2016

This tweet by mikefc alerted me to a mind-blowingly simple but amazing trick using the ggplot2 package: to visualise data for different groups in a facetted plot with all of the data plotted in the background. Here’s an example that we’ll learn to make in this post so you know what I’m talking about:

Benchmarking mlr (default) learners on OpenML

August 10, 2016

There are already some benchmarking studies about different classification algorithms out there. Probably the best known and most extensive one is the paper “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?”. They use d...

Exploring Learner Predictions with Partial Dependence and Functional ANOVA

August 10, 2016

Learners use features to make predictions but how those features are used is often not apparent. mlr can estimate the dependence of a learned function on a subset of the feature space using generatePartialDependenceData. Partial dependence plots reduce the potentially high dimensional function estimated by the learner, and display a marginalized version of this function in a lower dimensional space....

Creating Pretty Documents with the prettydoc Package

August 10, 2016

Have you ever tried to find a lightweight yet nice theme for R Markdown documents, like this page? Themes for R Markdown: With the powerful rmarkdown package, we can easily create nice HTML documents by adding some meta information in the he...
