A Rough Guide to Data Science

July 9, 2013

(This article was first published on Pivotal P.O.V. » R, and kindly contributed to R-bloggers)

Visualization by speedoflife via Flickr.

If Big Data was last year’s buzzword, Data Science may reach the same level of hype this year. There’s no shortage of discussion about the high demand for data scientists and the term’s usefulness as a designation, and there are even declarations of its “sexiness” as a career. And as with many terms that reach a critical mass on social media, data science is a concept more widely discussed than understood. What is data science? What distinguishes the practice enough to justify a new term? And how does someone become a data scientist?

The definition of data science varies among practitioners, but it is widely understood as the application of statistical analysis and software engineering to transform vast amounts of data into useful insight. Beyond this, the data scientist iterates on models to further explore questions raised by the data, and then uses techniques such as visualization to communicate the insights and stories the process reveals.

In a useful new document, “A Practical Introduction to Data Science Skills”, Google’s Michael Manoochehri offers a syllabus for those wanting to learn more about data science, its role in organizations and society, and the common skills, platforms, and frameworks used by practitioners. Manoochehri is the author of the forthcoming book Data Just Right, which aims to disambiguate the role of big data within the modern enterprise, and explore how organizations can not only adapt to this paradigm shift, but embrace it.

And while expert data scientists are in command of numerous mathematical and programming skills, Manoochehri offers some entry points and potential projects for the curious. Many of the “short term skills” he identifies are common among reasonably technical users (proficiency in Python and JavaScript, familiarity with UNIX and SQL), and they sit alongside data science-specific learning tasks such as gaining a basic understanding of R and running a Hadoop instance locally. While the long-term skills may be more imposing to neophytes, there are many free tools, tutorials, and datasets to learn from, and even entry-level skills can be useful for non-profits and municipalities that lack such expertise.
