A Million Text Files And A Single Laptop

January 28, 2016
By

More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The problem with data

Read more »
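As a rough illustration of the many-small-files situation described in the teaser above, here is a minimal R sketch that stacks identically formatted files into one data frame. The directory, file names, and columns are made up for the demo, and this is not necessarily the approach the post itself takes.

```r
# Create a few tiny files in a temporary directory (hypothetical stand-ins
# for the sensor-style files described in the teaser).
demo_dir <- file.path(tempdir(), "tiny_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(
    data.frame(sensor = i, value = i * 10),
    file.path(demo_dir, sprintf("reading_%d.csv", i)),
    row.names = FALSE
  )
}

# List every matching file, read each one, and stack the results row-wise.
files <- list.files(demo_dir, pattern = "^reading_.*\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))
```

This works because every file shares the same columns; for millions of files, concatenating them outside R first is likely to be faster.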

Discount R courses at Simplilearn

January 28, 2016
By

Guest post by Simplilearn. Simplilearn is offering access to its R Language courses at reduced prices. The offer is good till 7th Feb, 2016 with the coupon GetAhead. Check out the R courses they offer: Certified Data Scientist with R Language. At the end of the training, you will be technically competent in key R programming language concepts such as data visualization...

Read more »

love-hate Metropolis algorithm

January 27, 2016
By

Hyungsuk Tak, Xiao-Li Meng and David van Dyk just arXived a paper on a multiple choice proposal in Metropolis-Hastings algorithms towards dealing with multimodal targets, called “A repulsive-attractive Metropolis algorithm for multimodality”. The proposal distribution includes a downward

Read more »

In-depth analysis of Twitter activity and sentiment, with R

January 27, 2016
By

Astronomer and budding data scientist Julia Silge has been using R for less than a year but, judging by the posts on her blog, has already become very proficient at using it to analyze some interesting data sets. She has posted detailed analyses of water consumption data and health care indicators from the Utah Open Data Catalog,...

Read more »

Materials for NYU Shortcourse “Data Science and Social Science”

January 27, 2016
By

Pablo Barberá, Dan Cervone, and I prepared a short course at New York University on Data Science and Social Science, sponsored by several institutes at NYU. The course was intended as an introduction to R and basic data science tasks, including data visualization, social network analysis, textual analysis, web scraping, and APIs. The workshop is geared… Continue reading →

Read more »

Intro to Sound Analysis with R

January 27, 2016
By

Guest post by Christopher Johnson from www.codeitmagazine.com. Some of my articles cover getting started with a particular software, and some cover tips and tricks for seasoned users. This article, however, is different. It does demonstrate the usage of an R package, but the main purpose is for fun. In an article in Time, Matt Peckham described how French researchers...

Read more »

How To Import Data Into R – New Course

January 26, 2016
By

Importing your data into R to start your analyses: it should be the easiest step. Unfortunately, this is almost never the case. Data is stored in all sorts of formats, ranging from flat files to other statistical software files to databases and web data. A skilled data scientist knows which techniques to use in order to...

Read more »

R typos

January 26, 2016
By

At MCMskv, Alexander Ly (from Amsterdam) pointed out to me some R programming mistakes I made in the introduction to Metropolis-Hastings algorithms I wrote a few months ago for the Wiley on-line encyclopedia! While the outcome (Monte Carlo posterior) of the corrected version is only moderately changed, this is nonetheless embarrassing! The example (if not the

Read more »

Conditional execution exercises

January 26, 2016
By

In the exercises below we cover the basics of conditional execution. In all previous exercises, the solutions required one or more R statements that were all executed consecutively. In this series of exercises we’re going to use the if, else and ifelse functions, to execute only a subset of the R script, depending on one

Read more »
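For readers who have not met the constructs named in the teaser above, the following minimal sketch shows how if/else differs from ifelse. The values are hypothetical and are not taken from the exercises themselves.

```r
# Hypothetical input value, chosen only for illustration.
x <- 7

# if / else tests a single condition and runs only the matching branch.
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}

# ifelse() is vectorised: it tests every element and returns a vector
# of the same length as its input.
v <- c(1, 4, 9, 16)
size <- ifelse(v > 5, "large", "small")
```

The design point is that if expects a single TRUE or FALSE, whereas ifelse is the tool for element-wise choices over a vector.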

“Introduction to Data Science” video course contest is closed

January 26, 2016
By

Congratulations to all the winners of the Win-Vector “Introduction to Data Science” Video Course giveaway! We’ve emailed all of you your individual subscription coupons. Even though this contest is over, we still encourage those interested to join our mailing list. Our updates to the list will be infrequent, but (we hope) informative. For fun, we … Continue reading...

Read more »

Need any more reason to love R-Shiny? Here: you can even use Shiny to create simple games!

January 26, 2016
By

TL;DR Click here to play a puzzle game written entirely in Shiny (source code). Anyone who reads my blog posts knows by now that I’m very enthusiastic about Shiny (the web app framework for R - if you didn’t know what Shiny is then I suggest reading my previous post about it). One of my reasons for...

Read more »

Need any more reason to love R-Shiny? Here: you can even use Shiny to create simple games!

January 26, 2016
By

Anyone who reads my blog posts knows by now that I’m very enthusiastic about Shiny (the web app framework for R - if you didn’t know what Shiny is then I suggest reading my previous post about it). One of my reasons for liking Shiny so much is that you can do so much more with it than...

Read more »

Pipelining R and Python in Notebooks

January 26, 2016
By

by Micheleen Harris, Microsoft Data Scientist. As a Data Scientist, I refuse to choose between R and Python, the top contenders currently fighting for the title of top Data Science programming language. I am not going to argue about which is better or pit Python and R against each other. Rather, I'm simply going to suggest to play to...

Read more »

Linear regression with random error giving EXACT predefined parameter estimates

January 26, 2016
By

When simulating linear models based on some defined slope/intercept and added Gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models so as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope). It does so by: 1)

Read more »
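The teaser above is cut off before the method is described, so the sketch below is not the author's double transform. It shows one standard way to get the same effect, assuming hypothetical values for a, b and x: make the noise orthogonal to the design by replacing it with its own least-squares residuals, so the fit recovers a and b up to floating-point error.

```r
set.seed(1)

# Hypothetical target parameters and predictor values.
a <- 2      # intercept
b <- 0.5    # slope
x <- 1:20

# Draw Gaussian noise, then keep only the part that is orthogonal to
# both the intercept and x (the residuals of regressing the noise on x).
noise <- rnorm(length(x))
e <- residuals(lm(noise ~ x))

# Because e is orthogonal to the design matrix, least squares on y
# returns the predefined a and b.
y <- a + b * x + e
fit <- lm(y ~ x)
estimates <- unname(coef(fit))
```

The response still looks noisy, but the residuals no longer have any component that could shift the intercept or slope.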

Launching Data Science Africa Blog

January 26, 2016
By

We are glad to announce the launch of datascience-africa.org as a blog that aggregates all the events, news and information impacting the data science community in some of the major cities in Africa. Our community has witnessed the birth and steady growth of several data science meetup groups with a very enthusiastic group of devoted members. We are a community of data...

Read more »

Bayesian regression with STAN Part 2: Beyond normality

January 26, 2016
By

In a previous post we saw how to perform Bayesian regression in R using STAN for normally distributed data. In this post we will look at how to fit non-normal models in STAN using three example distributions commonly found in empirical data: negative-binomial (overdispersed Poisson data), gamma (right-skewed continuous data) and beta-binomial (overdispersed binomial data).

Read more »

Flowing triangles

January 26, 2016
By

I have admired the work of the artist Bridget Riley for a long time. She is now in her eighties but, it seems, still very creative and productive. Some of her recent work combines simple triangles in fascinating compositions. The longer I look at them, the more patterns I recognise. Yet, the actual painting can be...

Read more »

How to create confounders with regression: a lesson from causal inference

January 25, 2016
By

By Ben Ogorek. Introduction. Regression is a tool that can be used to address causal questions in an observational study, though no one said it would be easy. While this article won't close the vexing gap between correlation and causation, it will offer specific advice when you're after a causal truth - keep an eye out for...

Read more »

high dimension Metropolis-Hastings algorithms

January 25, 2016
By

When discussing high dimension models with Ingmar Schuster the other day, we came across the following paradox with Metropolis-Hastings algorithms. If attempting to simulate from a multivariate standard normal distribution in a large dimension, when starting from the mode of the target, i.e., its mean γ, leaving the

Read more »

American Community Survey analyzed with R

January 25, 2016
By

The American Community Survey, conducted by the US Census Bureau, collects data from around 3.5 million households each year in order to estimate various demographic statistics of the US population, including appliances installed in the home, languages spoken, work experience and much more (here's the complete data dictionary). The data science competition platform Kaggle recently introduced a library of...

Read more »

Mapping US Religion Adherence by County in R

January 25, 2016
By

Today’s guest post is by Julia Silge. After reading Julia’s analysis of religions in America (“This is the Place, Apparently”) I invited her to teach my readers how to map information about US Religious Adherence by County in R. Julia can be found blogging here or on Twitter. I took Ari’s free email course for The post

Read more »

11 new R jobs from around the world (2016-01-25)

January 25, 2016
By

This is the bi-monthly R-bloggers post (for 2016-01-25) for new R Jobs. To post your R job on the next post, just visit this link and post a new R job to the R community (it’s free and quick). New R jobs: Freelance R Analytics Consultant, Evergreen Retail – Posted by EvergreenRetail, Anywhere, 24 Jan 2016. Full-Time R Programming Software Engineer III @ Princeton, New Jersey USA sdemaree...

Read more »

Do basic R operations much faster in bash [Slightly off-topic]

January 25, 2016
By

R is great, and you can do a LOT of stuff with it. However, sometimes you want to do really basic stuff with huge files or a lot of files. At work, I have to do that a lot because I am mostly dealing with language data that often needs some pre-processing. Mos...

Read more »

I will survive!

January 25, 2016
By

Here's a very long post, to make up for the recent silence on the blog... Lately, I've been working on a new project involving the use of survival analysis data and results, specifically for health economic evaluation (cue Cake's rendition below). I hav...

Read more »

analyze the pesquisa nacional de saude (pns) with r

January 25, 2016
By

starting in 1988, the brazilian institute of geography and statistics (ibge) quinquennially included a health supplement questionnaire alongside their annual pesquisa nacional de domicilios (pnad) to monitor public health and inform the ongoing debate ...

Read more »

RcppExamples 0.1.7

January 24, 2016
By

After an unusually long hiatus, the RcppExamples package has been updated once more: a new version 0.1.7 is now on CRAN. The RcppExamples package provides a handful of short, concrete working examples showing how to set up basic R data structures ...

Read more »

More Fun with Choropleth Maps

January 24, 2016
By

I have a guest post up today at Ari Lamstein’s blog where I show some more fun things that can be done with the Religious Congregations and Membership Study at the ARDA that I used to look at Utah. I looked in some detail at Iowa ahead of their caucus in a few days, in light of...

Read more »

The ‘rsvg’ Package: High Quality Image Rendering in R

January 24, 2016
By

The new rsvg package renders (vector based) SVG images into high-quality bitmap arrays. The resulting image is an array of 3 dimensions: height * width * 4 (RGBA) and can be written to png, jpeg or webp format:

    # create an svg image
    library(svglite)
    library(ggplot2)
    svglite("plot.svg", width = 10, height = 7)
    qplot(mpg, wt, data =...

Read more »

Using webp in R: A New Format for Lossless and Lossy Image Compression

January 24, 2016
By

A while ago I blogged about brotli, a new general purpose compression algorithm which Google promotes as an alternative to gzip. The same company also happens to be working on a new format for images called webp, which is actually a derivative of the VP8 video format. Google claims...

Read more »
