by Doug Ashton (@dougashton), Mango Solutions. Data Science Radar – Nov 2015. 1. Tell us a bit about your background in Data Science. I was a physicist for 10 years, where I used Monte Carlo simulations to solve … Continue reading →

“In this article it is shown that in a fairly general setting, a sample of size approximately exp(D(μ|ν)) is necessary and sufficient for accurate estimation by importance sampling.” Sourav Chatterjee and Persi Diaconis arXived yesterday an exciting paper where they study the proper sample size in an importance sampling setting with no variance. That’s right,

In a previous post, we described how we performed exploratory data analysis (EDA) on real-world log files provided by Skroutz.gr, the leading online price comparison company in Greece, in the context of Athens Datathon 2015. In the present post we will have a look at the same job as performed with Oracle Big Data Discovery (v....

In this follow-up to This R Data Import Tutorial Is Everything You Need-Part One, DataCamp continues its comprehensive yet easy tutorial on quickly importing data into R, going from simple, flat text files to the more advanced SPSS and SAS files. As a lot of our readers correctly noticed from the first post, The post

The quest for income microdata. For a separate project, I've been looking for source data on income and wealth inequality. Not aggregate data like Gini coefficients or the percentage of income earned by the bottom 20% or top 1%, but the sources used to calculate those things. Because it's sensitive personal financial data either from surveys or tax...

We better keep an eye on this one: she is tricky (Michael Banks, talking about Mary Poppins). Professor Bertrand teaches Simulation and one day asks his students: Given a circle, what is the probability that a chord chosen at random is longer than a side of the equilateral triangle inscribed in the circle? Since they must reach the … Continue reading...

Why an R Tutorial on Reading and Importing Excel Files into R is necessary. As most of you know, Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing and storing data in tables, and it is widely used in many different application fields all over the world. The post
