- “Big Data: The Management Revolution,” by Andrew McAfee and Erik Brynjolfsson, pages 61 – 68;
- “Data Scientist: The Sexiest Job of the 21st Century,” by Thomas H. Davenport and D.J. Patil, pages...
The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full December edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Tell us what you're looking for in R training. 2013 is the International Year...
In case you missed them, here are some articles from November of particular interest to R users. In the webinar "Real-Time Predictive Analytics with Big Data", I showed how R fits into a real-time production system. R package developer Yihui Xie shares his favorite software and hardware in an interview with The Setup. Hadley Wickham created a handy tutorial...
Yesterday was the fourth anniversary of the Revolutions blog. Our first post was way back on December 9, 2008, and in the four years since we've been regularly posting about R, open source, statistics, big data, data science and other random things that happened to catch our eye. In fact, there have been 1488 posts published in the last...
I know “officially” data scientists always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day-to-day work, Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my
I have found that I get data from many different sources. These sources range from simple .csv files to more complex relational databases, to structured XML or JSON files. I have compiled the different approaches that one can use to easily access these datasets. Local Column Delimited Files This is probably the most common and
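The excerpt breaks off before it shows any of its approaches, so what follows is only a minimal sketch of the general idea, written in Python with the standard library rather than in whatever tooling the original post used. The helper names `read_delimited` and `read_json_records` and the sample data are hypothetical, chosen purely for illustration.

```python
import csv
import io
import json


def read_delimited(text, delimiter=","):
    """Parse delimited text that has a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def read_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


# Hypothetical sample data standing in for files on disk.
CSV_SAMPLE = "id,name,score\n1,alice,90\n2,bob,85\n"
JSON_SAMPLE = '[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]'

csv_rows = read_delimited(CSV_SAMPLE)
json_rows = read_json_records(JSON_SAMPLE)
```

Note that the `csv` module returns every field as a string, so numeric columns such as `score` would still need an explicit conversion before analysis.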
Last month's release of Revolution R Enterprise 6.1 added the capability to fit decision and regression trees on large data sets (using a new parallel external memory algorithm included in the RevoScaleR package). It also introduced the possibility of applying this and the other big-data statistical methods of RevoScaleR to data files distributed in Hadoop's HDFS file system*,...
In a recent post (link & link), Revolution Analytics benchmarked its closed-source generalized linear model approach against SAS, Hadoop and open-source R, and seemed to suggest that no 'easy' open-source R solution exists for building a Poisson regression model on large datasets.
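One common way to fit such a model without holding all the data in memory is to accumulate the score vector and information matrix chunk by chunk and take Newton-Raphson steps on the totals. The sketch below illustrates that idea in plain Python for a single predictor with a log link. It is not Revolution's algorithm or the method of the post excerpted here; `fit_poisson`, `chunk_sums`, the `chunks_factory` argument and the sample data are all hypothetical, and the code omits safeguards such as step halving.

```python
import math


def chunk_sums(chunk, b0, b1):
    """Score and information-matrix sums for one chunk of (x, y) pairs."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in chunk:
        mu = math.exp(b0 + b1 * x)  # fitted mean under the log link
        resid = y - mu
        g0 += resid
        g1 += resid * x
        h00 += mu
        h01 += mu * x
        h11 += mu * x * x
    return g0, g1, h00, h01, h11


def fit_poisson(chunks_factory, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson, one pass over the chunks per step.

    chunks_factory is a callable that returns a fresh iterable of chunks,
    so the data can be re-read on every pass.
    """
    # First pass: start from the intercept-only fit, b0 = log(mean of y).
    total_y = 0.0
    n = 0
    for chunk in chunks_factory():
        for _, y in chunk:
            total_y += y
            n += 1
    b0, b1 = math.log(total_y / n), 0.0

    for _ in range(max_iter):
        totals = [0.0] * 5
        for chunk in chunks_factory():
            totals = [t + p for t, p in zip(totals, chunk_sums(chunk, b0, b1))]
        g0, g1, h00, h01, h11 = totals
        # Solve the 2x2 system H * delta = g in closed form.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: counts that double with each unit increase in x.
sample = [(0, 2), (1, 4), (2, 8), (3, 16)]
intercept, slope = fit_poisson(lambda: [sample[:2], sample[2:]])
```

Because only five running sums cross chunk boundaries, each chunk could in principle be read from disk or processed on a separate worker, and the result does not depend on how the rows are split.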