Posts Tagged ‘Data Science’

The R packages in a data scientist’s toolbox

July 17, 2012

John Myles White, self-described "statistics hacker" and co-author of "Machine Learning for Hackers", was interviewed recently by The Setup. In the interview, he describes some of his go-to R packages for data science: Most of my work involves programming, so programming languages and their libraries are the bulk of the software I use. I primarily program in R,...

A new open journal on Data Science

July 4, 2012

Springer has introduced a new open, peer-reviewed journal focused on Data Science: EPJ Data Science. What makes this a Data Science journal is its focus on novel uses of statistics, data analysis, computer techniques and public data sources to research a topic in another domain, rather than on methodological research. Here are a few examples of the papers you'll find in the journal:...

Modeling Trick: Masked Variables

July 1, 2012

A primary problem data scientists face again and again is how to properly adapt or treat variables so they are the best possible components of a regression. Some analysts at this point delegate control to a shape-choosing system like neural nets. I feel such a choice gives up far too much statistical rigor, transparency and...

Integrating R with other systems

June 16, 2012

I just returned from the useR! 2012 conference for developers and users of R. A common theme across many of the presentations was the integration of R-based statistical systems with other systems, be they other programming languages, web systems, or enterprise data systems. Some highlights for me were an update to Rserve that includes

More on birthday probabilities

June 15, 2012

Last week, Joe Rickert used R and four years of US Census data to create an image plot of the relative probabilities of being born on a given day of the year. Chris Mulligan also tackled this problem with R, but this time using 20 years of Census data from 1969 to 1988. Chris extracted the birthday frequencies using...

Selection in R

June 1, 2012

The design of the statistical programming language R sits in a slightly uncomfortable place between the functional programming and object-oriented paradigms. The upside is that you get a lot of the expressive power of both paradigms. A downside is the not always useful variability of the language’s list and object extraction operators.

Computational Journalism Server Version 1.6.5 Released

May 31, 2012

I’ve just released version 1.6.5 of the Computational Journalism Server. This is going to be the last release for a while. Release notes: I removed CoffeeScript and Node.js. I wasn’t using them. I dropped back to Erlang R14B-1.1. Everything tes...

New Data Science Packages Coming To Computational Journalism Server

May 30, 2012

I’ve just received an announcement from Michael Lang that packages BatchJobs and BatchExperiments have been added to the Comprehensive R Archive Network (CRAN). From the announcement: The package BatchJobs implements the basic objects and procedu...

Survey of Data Science / Analytics / Big Data / Applied Stats / Machine Learning etc. Practitioners

May 10, 2012

As I’ve discussed here before, there is a debate raging (ok, maybe not raging) about terms such as “data science”, “analytics”, “data mining”, and “big data”. What do they mean, how do they overlap, and perhaps most importantly, who are the people who work in these fields? Along with two other DC-area Data Scientists, Marck

Data Science Books for Computational Journalists

May 8, 2012

There are quite a few books out now on “data science”. I’ve picked out three that I think are the best place to start for computational journalists. First is Machine Learning for Hackers, by Drew Conway and John Myles White. The autho...
