Reflections on Data Science Summit 2011

May 13, 2011

The Data Science Summit held in Las Vegas this week was outstanding – kudos and thanks to EMC/Greenplum for organizing the event. The energy of 150+ data scientists coupled with a well-curated agenda of talks created a real sense of being at the cusp of a revolution in the applications of data analysis. Here are just a few of the highlights that stood out for me:

  • The opening keynote from cultural anthropologist and futurist Thornton May, who created a real sense of excitement around the future of Data Science as a discipline. He noted that 35% of Fortune 500 companies are reformulating their strategies on the basis of Big Data and analytics, and as a consequence he hypothesizes that the "hero of the next age" will be the Data Scientist.
  • Tom Do from 23andMe on how even though the costs of genome sequencing are dropping much faster than Moore's Law would predict, we still need much more data to be able to link many diseases to genetic causes. They're doing some incredible things by linking sequence data from many individuals to survey response data, which could lead to better understanding of Parkinson's Disease, for example.
  • JoAnn Kuchera-Morin's video demonstration of the Allosphere, a fully-immersive 3-D environment for exploring data. I was especially intrigued by the concept of "sonification": converting data into sounds. Combining sonification with visualization allows for the exploration of data in more than 3 dimensions simultaneously.
  • Colin Hill of Via Science, who reminded us that while Big Data has enabled many breakthroughs in data science, it's not just about the bytes – it's about the flops, as well. The availability of Big Computing (massively parallel, high-performance architectures) is also taking Data Science in new directions.
  • The Data Science DNA panel discussion, where we learned that Roger Magoulas from O'Reilly coined the term "Big Data" in 2005 or thereabouts, and UC Berkeley CS prof Joe Hellerstein said "Everyone is desperate to hire Data Scientists".
  • Learning about the Code for America program from Jen Pahlka and Tim O'Reilly, where they are "building a Geek Army for a new American revolution". If you have data skills and can offer your services for a year (for a small stipend and a great sense of civic pride), applications for fellowships are open now.
  • The Major Mashups in Action session – Pete Skomoroch's slides include some useful pointers to data sources and tools.
  • Learning about the winners of the Data Hero Awards.
  • Finally, I was totally blown away by Jonathan Harris's keynote on the theme of bridging data with art and emotion. Jonathan is a programmer, artist and storyteller who has created projects like "We Feel Fine", an "almanac of human emotion" which visualizes feelings taken from blog posts with the phrases "I feel" and "I am feeling". (If you try the app, click on the Montage link in the bottom-left to get a sense of the data – Jonathan showed a slideshow from this section that was truly moving.) All of the talks were outstanding, but this one really opened my eyes to the impact that data science can have on a human level — and the impact that art and design should have on data science in return.

Overall, I think one of the unique things about the Data Scientist Summit was its amalgam of people, service, business, culture, technology and art, which made for a really special conference. On a personal level, I really enjoyed meeting with the many R users at the conference (many of whom I met in person for the first time), and participating in a lively panel discussion on the data science toolkit (where I made the case that R is an essential element in an ecosystem of tools). Thanks again to the organizers for inviting me, and I hope to attend again next year.

