Data Analysis Training

Posted on March 20, 2012 by prasoonsharma in Uncategorized | 0 Comments

[This article was first published on Enterprise Software Doesn't Have to Suck, and kindly contributed to R-bloggers.]

I’m training some of my colleagues on Big’ish data analysis this week. Here’s how I’m running the class. Would love your ideas to make it better.

CLASS OBJECTIVES (LEARNING OUTCOMES)

After completion of the course, you will be able to:

Understand the concepts of data science, its processes, tools and techniques, and the path to building expertise
Use Unix command line tools for file processing (awk, sort, paste, join, gunzip, gzip)
Use Excel to do basic analysis and plots
Write and understand R code (data structures, functions, packages, etc.)
Explore a new dataset with ease (visualize it, summarize it, slice/dice it, answer questions about it)
Plot charts on a dataset using R

CLASS PREREQUISITES

Good knowledge of basic statistics (min, max, avg, sd, variance, factors, quantiles/deciles, etc.)
Familiarity with Unix OS
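
As a quick self-check on the statistics prerequisite, here is a minimal sketch of the summary measures listed above. It uses Python's standard `statistics` module purely for illustration (the class itself uses Excel and R), and the `values` list is made-up sample data, not anything from the course.

```python
import statistics

# Hypothetical sample: made-up numbers used only to illustrate the measures.
values = [12, 15, 11, 19, 15, 22, 18, 15, 14, 19]

summary = {
    "min": min(values),
    "max": max(values),
    "avg": statistics.mean(values),
    "sd": statistics.stdev(values),            # sample standard deviation
    "variance": statistics.variance(values),   # sample variance (n - 1 denominator)
    "quartiles": statistics.quantiles(values, n=4),   # 3 cut points
    "deciles": statistics.quantiles(values, n=10),    # 9 cut points
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

If any of these names are unfamiliar, it is worth reviewing them before the class.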

CLASS TOPICS

A) Intro to data science

Explain data science and its importance. Examples of data-driven business functions: MROI, mix optimization, IPL teams / fantasy teams, predictions
Big data
– Definition: Data sets that no longer fit on a disk, requiring compute clusters and the corresponding software and algorithms (e.g. map/reduce running on Hadoop).
– Real big data problems: parallel computing, distributed computing, cloud, Hadoop, Cassandra
– Most analysis isn’t Big Data. Business apps often deal with datasets that fit in Excel/Access
Products: Desktop tools (Excel (Solver, What-If), Access, SQL, SPSS, Stata, R, SAS), programming languages (Ruby, Python, Java) and the stats libraries in these languages, BI tools, etc.

B) Steps in data science

Acquire data: obtaining the data from databases, log files, exports, surveys, web scraping, etc.
Verify data
Cleanse and transform data: outliers, missing values, dedupe, merge
Explore data: The first step with a new data set should be exploratory: what is actually in it? Summarize and visually inspect the entire data set
– What does the data look like? summaries, cross-tabulation
– What does knowing one thing tell me about another? Relationships between data elements
– What the heck is going on?
Visualize data
Interact with data (not covered here): BI tools, custom dashboards, other tools (ggobi etc.)
Archive data (not covered here)
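
The cleanse and explore steps above can be sketched on a tiny example. This is only an illustration under stated assumptions: the `RAW` export, its column names, and the "10 times the median" outlier threshold are all made up, and Python's standard library stands in for whatever tool you prefer (the class uses R for this).

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical raw export; columns and values are invented for illustration.
RAW = """id,region,channel,sales
1,North,Web,120
2,South,Store,
3,North,Store,95
3,North,Store,95
4,South,Web,110
5,North,Web,9000
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Cleanse: drop rows with a missing sales value and dedupe on id.
seen = set()
clean = []
for row in rows:
    if row["sales"] == "" or row["id"] in seen:
        continue
    seen.add(row["id"])
    clean.append({**row, "sales": float(row["sales"])})

# Flag possible outliers. The 10x-median threshold is an arbitrary assumption;
# choose a rule that fits your data.
median_sales = statistics.median(r["sales"] for r in clean)
outliers = [r for r in clean if r["sales"] > 10 * median_sales]

# Explore: cross-tabulate region by channel (count of rows per combination).
crosstab = Counter((r["region"], r["channel"]) for r in clean)

print("clean rows:", len(clean))
print("possible outliers:", [r["id"] for r in outliers])
for (region, channel), count in sorted(crosstab.items()):
    print(region, channel, count)
```

Flagging outliers rather than deleting them keeps the decision visible: whether an extreme value is an error or a real event is a question for the verify step.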

C) Skills needed for data science

Statistics: Concepts, approach, techniques
Databases: SQL
Scripting language: Ruby, Python
RegEx
Visual design: Storytelling with charts
File handling: Unix preferred. awk, gzip, gunzip, paste, sort etc.
Office tools: Excel (plugins like Solver, What If)
Statistical tools: R, SAS, SPSS, Stata, MATLAB, etc.
BI tools: QlikView, Tableau
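
To show why RegEx and a scripting language are on this list, here is a small sketch that pulls structured fields out of log lines. The log lines and the pattern are assumptions made up for this example (a common web access-log layout); real logs will need their own pattern.

```python
import re

# Hypothetical access-log lines, invented for illustration.
LOG_LINES = [
    '10.0.0.1 - - [20/Mar/2012:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [20/Mar/2012:10:15:40 +0000] "POST /login HTTP/1.1" 302 0',
    "malformed line",
]

# Named groups make the extracted fields self-describing.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

records = []
for line in LOG_LINES:
    match = LOG_PATTERN.match(line)
    if match:  # skip lines that do not fit the expected layout
        records.append(match.groupdict())

for record in records:
    print(record["ip"], record["method"], record["path"], record["status"])
```

Once the fields are in records like these, they can be written out as CSV and loaded into Excel or R for analysis.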

D) Learning R

We will use one tool to learn the concepts of data science: R, a leading open-source stats package. Why I started learning data science and picked R

Curriculum for Intro to R (R has a steep learning curve; the purpose of this discussion is to get you started)

E) Where to go from here?

Learn advanced techniques: sampling, predictions. Books, conferences
Analyse your favorite dataset: e.g. Cricket data analysis
Compete (Kaggle)
Learn other tools (Excel Solver, SAS etc.)

REFERENCE

Tutorials

Books
