Pitfall of XML package: issues specific to cp932 locale, Japanese Shift-JIS, on Windows

January 31, 2016

The CRAN package XML has problems parsing HTML pages encoded in cp932 (Shift-JIS). In this report, I will show these issues and also their solutions, which are workable at …

Hillary Clinton’s Biggest 2016 Rival: Herself

January 30, 2016

In a recent post I noted that, despite Bernie Sanders doing better in many important indicators, Obama 2008 received 3x more media coverage than Sanders 2016. Reasonably, a reader of my blog noted that not all coverage was equal, that a presidential hope...

Strategies to Speedup R Code

January 30, 2016

The for-loop in R can be very slow in its raw, unoptimised form, especially when dealing with larger data sets. There are a number of ways to make your logic run faster, and you may be surprised how fast you can actually go. This post shows a number of approaches, including simple tweaks
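As a rough illustration of the kind of tweak the excerpt alludes to (the post's own examples are not shown here), the sketch below compares a for-loop that grows its result, the same loop with a pre-allocated result, and a vectorised call. The data and the function names `loop_flag`, `prealloc_flag` and `vector_flag` are hypothetical, chosen only for this example.

```r
# Hypothetical example data: 100,000 uniform random numbers.
set.seed(42)
x <- runif(1e5)

# Raw for-loop that grows the result one element at a time.
loop_flag <- function(x) {
  out <- character(0)
  for (i in seq_along(x)) {
    out[i] <- if (x[i] > 0.5) "high" else "low"
  }
  out
}

# Same loop, but the output vector is allocated once up front.
prealloc_flag <- function(x) {
  out <- character(length(x))
  for (i in seq_along(x)) {
    out[i] <- if (x[i] > 0.5) "high" else "low"
  }
  out
}

# Vectorised version: no explicit loop at all.
vector_flag <- function(x) ifelse(x > 0.5, "high", "low")

# All three approaches should agree on a small subset.
small <- x[1:1000]
stopifnot(identical(loop_flag(small), vector_flag(small)))
stopifnot(identical(prealloc_flag(small), vector_flag(small)))

# Timings vary by machine and R version, so none are quoted here.
system.time(loop_flag(x))
system.time(prealloc_flag(x))
system.time(vector_flag(x))
```

Running the three `system.time()` calls on your own machine is the simplest way to see which tweak matters most for your data.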


The R-Podcast Episode 16: Interview with Dean Attali

January 30, 2016

Direct from the first-ever Shiny Developer conference, here is episode 16 of the R-Podcast! In this episode I sit down with Dean Attali for an engaging conversation about his journey to using R, his motivation for creating the innovative shinyjs packa...


The correlation between original and replication effect sizes might be spurious

January 29, 2016

In the reproducibility project, original effect sizes correlated r=0.51 with the effect sizes of replications. Some researchers find this hopeful.

Less-popularised findings from the "estimating the reproducibility" paper @Eli_Finkel #SPSP2016 pic.twitter.com/8CFJMbRhi8 (Jessie Sun, @JessieSunPsych, January 28, 2016)

I don’t think we should be interpreting this correlation at all, because it might very well...


New Yorkers, municipal bikes, and the weather

January 29, 2016

Like many modern cities, New York offers a public pick-up/drop-off bicycle service (called Citi Bike). Subscribing Citi Bike members can grab a bike from almost 500 stations scattered around the city, hop on and ride to their destination, and drop the bike at a nearby station. (Visitors to the city can also purchase day passes.) The Citi Bike program...

2016 Prior Exposure Bayesian Data Analysis workshops for social scientists

January 29, 2016

Mark Andrews and I launched our Prior Exposure Bayesian Data Analysis workshop series last year and are pleased to announce that bookings for the 2016 workshops 1 and 2 are now open. This is part of the ESRC Advanced Training Initiative. Further details, including booking links and details of bursaries for UK PhD students, are available here. The dates are 31...

Cricket analytics with cricketr!!!

January 29, 2016

My ebook “Cricket analytics with cricketr” has been published on Leanpub. You can now download the book (hot off the press!) in all formats to your favorite device (mobile, iPad, tablet, Kindle) from the link “Cricket analytics with cricketr”. The book has been published in the following formats, namely PDF (for your computer), EPUB (for

FQDN (Fully Qualified Domain Names) in R

January 29, 2016

By Steph Locke

Get the fully qualified domain name for your machine

This is just a quick post to mention how you can get your computer name with the domain it is registered in, i.e. the fully qualified domain name …

Better prediction intervals for time series forecasts

January 29, 2016

Forecast Combination

I’ve referred several times to this blog post by Rob Hyndman, in which he shows that a simple average of the forecasts from the ets() and auto.arima() functions in his {forecast} R package not only outperforms ets() and auto.arima() individually (in the long run, not every time), but also outperforms nearly every method that was entered in the M3...

Obama 2008 received 3x more media coverage than Sanders 2016

January 28, 2016

Many supporters of presidential hopeful Bernie Sanders have claimed that there is a media blackout in which Bernie Sanders has, for whatever reason, been blocked from communicating his campaign message. Combined with a dramatically cut Democratic debate scheme (from 18 in 2008 with Obama to 4 in 2016 with Sanders) scheduled on days of the week least likely...

R User Groups on GitHub

January 28, 2016

by Joseph Rickert

Quite a few times over the past few years I have highlighted presentations posted by R user groups on their websites and recommended these sites as a source for interesting material, but I have never thought to see what the user groups were doing on GitHub. As you might expect, many people who make presentations at...

A Million Text Files And A Single Laptop

January 28, 2016

More often than I would like, I receive datasets where the data has only been partially cleaned, such as the picture on the right: hundreds, thousands…even millions of tiny files. Usually when this happens, the data all have the same format (such as having been generated by sensors or other memory-constrained devices). The problem with data

Discount R courses at Simplilearn

January 28, 2016

Guest post by Simplilearn

Simplilearn is offering access to its R Language courses at reduced prices. The offer is good till 7th Feb, 2016 with the coupon: GetAhead

Check out the R courses they offer:

Certified Data Scientist with R Language

At the end of the training, you will be technically competent in key R programming language concepts such as data visualization...

love-hate Metropolis algorithm

January 27, 2016

Hyungsuk Tak, Xiao-Li Meng and David van Dyk just arXived a paper on a multiple choice proposal in Metropolis-Hastings algorithms towards dealing with multimodal targets, called “A repulsive-attractive Metropolis algorithm for multimodality”. The proposal distribution includes a downward

In-depth analysis of Twitter activity and sentiment, with R

January 27, 2016

Astronomer and budding data scientist Julia Silge has been using R for less than a year but, based on the posts on her blog, has already become very proficient at using it to analyze some interesting data sets. She has posted detailed analyses of water consumption data and health care indicators from the Utah Open Data Catalog,...

Materials for NYU Shortcourse “Data Science and Social Science”

January 27, 2016

Pablo Barberá, Dan Cervone, and I prepared a short course at New York University on Data Science and Social Science, sponsored by several institutes at NYU. The course was intended as an introduction to R and basic data science tasks, including data visualization, social network analysis, textual analysis, web scraping, and APIs. The workshop is geared…

Intro to Sound Analysis with R

January 27, 2016

Guest post by Christopher Johnson from www.codeitmagazine.com

Some of my articles cover getting started with a particular piece of software, and some cover tips and tricks for seasoned users. This article, however, is different. It does demonstrate the usage of an R package, but the main purpose is for fun. In an article in Time, Matt Peckham described how French researchers...

How To Import Data Into R – New Course

January 26, 2016

Importing your data into R to start your analyses: it should be the easiest step. Unfortunately, this is almost never the case. Data is stored in all sorts of formats, ranging from flat files to other statistical software files to databases and web data. A skilled data scientist knows which techniques to use in order to...

R typos

January 26, 2016

At MCMskv, Alexander Ly (from Amsterdam) pointed out to me some R programming mistakes I made in the introduction to Metropolis-Hastings algorithms I wrote a few months ago for the Wiley on-line encyclopedia! While the outcome (Monte Carlo posterior) of the corrected version is only moderately changed, this is nonetheless embarrassing! The example (if not the

Conditional execution exercises

January 26, 2016

In the exercises below we cover the basics of conditional execution. In all previous exercises, the solutions required one or more R statements that were all executed consecutively. In this series of exercises we’re going to use the if, else and ifelse functions to execute only a subset of the R script, depending on one
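For readers new to these constructs, here is a minimal sketch (not taken from the exercise set itself) of how `if`/`else` handle a single value while `ifelse` works element-wise on a vector. The function names `classify_one` and `classify_all` and the sample values are hypothetical.

```r
# if / else: evaluates ONE condition and runs only the matching branch.
classify_one <- function(n) {
  if (n < 0) {
    "negative"
  } else if (n == 0) {
    "zero"
  } else {
    "positive"
  }
}

classify_one(-3)  # "negative"

# ifelse: vectorised, applies the test to every element of its input.
classify_all <- function(v) {
  ifelse(v < 0, "negative", ifelse(v == 0, "zero", "positive"))
}

values <- c(-2, 0, 5)
classify_all(values)  # "negative" "zero" "positive"
```

The practical difference is that `if` expects a single TRUE or FALSE, so it suits control flow, whereas `ifelse` suits recoding whole vectors.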


“Introduction to Data Science” video course contest is closed

January 26, 2016

Congratulations to all the winners of the Win-Vector “Introduction to Data Science” Video Course giveaway! We’ve emailed all of you your individual subscription coupons. Even though this contest is over, we still encourage those interested to join our mailing list. Our updates to the list will be infrequent, but (we hope) informative. For fun, we …

Need any more reason to love R-Shiny? Here: you can even use Shiny to create simple games!

January 26, 2016

TL;DR Click here to play a puzzle game written entirely in Shiny (source code). Anyone who reads my blog posts knows by now that I’m very enthusiastic about Shiny (the web app framework for R - if you didn’t know what Shiny is then I suggest reading my previous post about it). One of my reasons for...


Need any more reason to love R-Shiny? Here: you can even use Shiny to create simple games!

January 26, 2016

Anyone who reads my blog posts knows by now that I’m very enthusiastic about Shiny (the web app framework for R - if you didn’t know what Shiny is then I suggest reading my previous post about it). One of my reasons for liking Shiny so much is that you can do so much more with it than...


Pipelining R and Python in Notebooks

January 26, 2016

by Micheleen Harris, Microsoft Data Scientist

As a Data Scientist, I refuse to choose between R and Python, the top contenders currently fighting for the title of top Data Science programming language. I am not going to argue about which is better or pit Python and R against each other. Rather, I'm simply going to suggest to play to...

Linear regression with random error giving EXACT predefined parameter estimates

January 26, 2016

When simulating linear models based on some defined slope/intercept and added Gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models so as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope). It does so by: 1)
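The steps of the post's double transform are cut off in this excerpt, so the sketch below is not that method. It shows one simple alternative that reaches the same goal, assuming it is acceptable to adjust the noise itself: the random errors are replaced by their residuals from a regression on x, which makes them orthogonal to the design, so least squares returns a and b exactly (up to floating-point error). All values and variable names are hypothetical.

```r
set.seed(1)

n <- 50
a <- 2     # predefined intercept (hypothetical value)
b <- 0.5   # predefined slope (hypothetical value)
x <- seq(1, 10, length.out = n)

# Ordinary Gaussian noise: fitting y = a + b * x + noise would NOT
# reproduce a and b exactly.
noise <- rnorm(n, sd = 1)

# Swapped-in technique: keep only the part of the noise that is
# orthogonal to the intercept and to x.
noise_orth <- residuals(lm(noise ~ x))

y <- a + b * x + noise_orth
fit <- lm(y ~ x)

# The estimates match the predefined parameters up to numerical tolerance.
coef(fit)
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(a, b))))
```

One consequence of this shortcut is that the adjusted errors are no longer independent draws, since two degrees of freedom have been removed from them.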


Launching Data Science Africa Blog

January 26, 2016

We are glad to announce the launch of datascience-africa.org as a blog that aggregates all the events, news and information impacting the data science community in some of the major cities in Africa. Our community has witnessed the birth and steady growth of several data science meetup groups with a very enthusiastic group of devoted members. We are a community of data...


Bayesian regression with STAN Part 2: Beyond normality

January 26, 2016

In a previous post we saw how to perform Bayesian regression in R using STAN for normally distributed data. In this post we will look at how to fit non-normal models in STAN using three example distributions commonly found in empirical data: negative-binomial (overdispersed Poisson data), gamma (right-skewed continuous data) and beta-binomial (overdispersed binomial data).

Flowing triangles

January 26, 2016

I have admired the work of the artist Bridget Riley for a long time. She is now in her eighties but, it seems, still very creative and productive. Some of her recent work combines simple triangles in fascinating compositions. The longer I look at them, the more patterns I recognise. Yet, the actual painting can be...
