O'Reilly Media has published a new whitepaper, Real-Time Big Data Analytics: Emerging Architecture. This 32-page document describes the processes and components necessary for getting on-demand information from big-data stores such as Hadoop. It answers the questions “How fast is fast?”, “How real is real-time?” and “How big is big?”, and provides practical guidance for implementing real-time analytics systems.
The author (Mike Barlow) interviewed a broad range of experts and practitioners, including Justin Erickson (Cloudera), Matei Zaharia (creator of Spark), Nathan Marz (Storm, Cascalog), Dhiraj Rajaram (Mu Sigma) and yours truly (I describe the role of R in the real-time big data analytics stack). The guide offers a broad range of perspectives and distils them into a set of best practices in a clear and approachable way. It's available for download as a free PDF (with registration) at the link below.