Guest post by Vivian Zhang, original post.
Brief: The course (which will meet five Sundays) will start from the basics,
introducing the building blocks used for programming in R and building
intuition for writing clean and robust code. We will move on to cover
data analysis, applications of statistical techniques, and graphing.
Date: Nov 10th, Nov 17th, Nov 24th, Dec 1st, Dec 8th (Five Sundays)
Time: 12:00pm to 4:00pm
Scott Kostyshak (Data Scientist @ Supstat Inc, 5th year Econ PhD at Princeton Univ.)
Vivian Zhang (CTO @ Supstat Inc, Master's degrees in Computer Science and Statistics)
For group (5 or more people) and enterprise pricing, please email [email protected]
(Content may be adjusted depending on how the class progresses.)
Basics 6 hours
Abstract: This unit covers the fundamentals. Students will learn the characteristics of R, where to find resources, and the basics of programming in R.
Case and Exercise: Solve selected Project Euler problems in R.
* How to learn R
* How to get help
* R language resources and books
* Packages
* Customizing startup
* Batch Mode
* Data Objects
* Custom Functions
* Control statements
* Vectorized operations
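As a rough illustration of how custom functions, control statements, and vectorized operations relate, here is a minimal sketch based on the first Project Euler problem (sum the multiples of 3 or 5 below a limit). The function names are made up for this example and are not taken from the course material.

```r
# Project Euler problem 1: sum of all multiples of 3 or 5 below a limit.
# Function names are hypothetical, chosen only for this sketch.

# Version 1: a custom function built from control statements (for + if).
sum_multiples_loop <- function(limit) {
  total <- 0
  for (n in seq_len(limit - 1)) {
    if (n %% 3 == 0 || n %% 5 == 0) {
      total <- total + n
    }
  }
  total
}

# Version 2: the same computation as a vectorized operation.
# The logical test runs on the whole vector at once, with no explicit loop.
sum_multiples_vec <- function(limit) {
  n <- seq_len(limit - 1)
  sum(n[n %% 3 == 0 | n %% 5 == 0])
}

sum_multiples_loop(10)    # 3 + 5 + 6 + 9 = 23
sum_multiples_vec(1000)   # 233168
```

Both versions return the same result. The vectorized form is shorter and is usually the more idiomatic choice in R.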
Data acquisition 2 hours
Abstract: This unit covers the various ways R can read data. Participants will learn enough basic web knowledge to scrape web pages, connect to a database and retrieve data with SQL statements, and read local files such as Excel spreadsheets.
Case studies and exercises: Scrape data from the Douban website and write a custom function.
* Web scraping
* API data source
* Connect to the database
* Local files
* Other data sources
* Data Export
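For the local-file and export topics, a minimal round trip might look like the sketch below. The data frame and its values are invented for illustration, and a temporary file stands in for a real data source.

```r
# A small made-up data frame to export and read back.
scores <- data.frame(
  name  = c("a", "b", "c"),
  score = c(90, 85, 77),
  stringsAsFactors = FALSE
)

# Data export: write the data frame to a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Local files: read the CSV back into a new data frame.
scores_back <- read.csv(path, stringsAsFactors = FALSE)

# Check that the round trip preserved the contents.
same_names  <- identical(scores$name, scores_back$name)
same_scores <- all(scores$score == scores_back$score)
```

Setting `row.names = FALSE` keeps R from writing an extra index column that would otherwise come back as a spurious variable.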
Data collation 3 hours
Abstract: How to use R to manipulate data and perform all kinds of data conversions, with an emphasis on string processing.
Case studies and exercises: Find a QQ group (QQ is the most used instant messaging tool), then discuss research options based on its text features.
* Data sorting
* Merge Data
* Summarizing data
* Reshaping data
* Subsetting data
* String manipulation
* Date operations
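Several of the topics above can be sketched with base R alone. The example below uses two invented data frames. All names and values are illustrative assumptions, and reshaping is left out to keep the sketch short.

```r
# Two small, made-up data frames (all values are illustrative only).
orders <- data.frame(
  id       = c(1, 2, 3, 4),
  customer = c("ann", "bob", "ann", "cy"),
  amount   = c(20, 35, 15, 50),
  date     = as.Date(c("2013-11-10", "2013-11-17", "2013-11-24", "2013-12-01")),
  stringsAsFactors = FALSE
)
customers <- data.frame(
  customer = c("ann", "bob", "cy"),
  city     = c("New York", "Beijing", "Shanghai"),
  stringsAsFactors = FALSE
)

# Sorting: order rows by amount, largest first.
sorted <- orders[order(-orders$amount), ]

# Merging: attach each customer's city to their orders.
merged <- merge(orders, customers, by = "customer")

# Summarizing: total amount per customer.
totals <- aggregate(amount ~ customer, data = orders, FUN = sum)

# Subsetting: keep only orders with an amount above 18.
large <- subset(orders, amount > 18)

# String manipulation: capitalize the first letter of each name.
labels <- paste0(toupper(substring(orders$customer, 1, 1)),
                 substring(orders$customer, 2))

# Date operations: month number and days elapsed since the first order.
month_num  <- format(orders$date, "%m")
days_since <- as.numeric(orders$date - min(orders$date))
```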
Data Visualization 3 hours
Abstract: Covers two advanced graphics packages, lattice and ggplot2, and explores a range of visualization methods.
Case and Exercise: Use graphics to describe the movie, text, and other data collected in the earlier units.
* Box Plot
* Matrix related
Elementary statistical methods 5 hours
Abstract: An introduction to using R for statistical analysis and regression analysis. Students will learn the meaning and role of basic statistical models.
Case and Exercise: Use regression to predict commodity prices; simulate the winner of a casino game.
* Descriptive Statistics
* Statistical Distributions
* Frequency and contingency tables
* t-test
* Non-parametric statistics
* Linear Regression
* Regression Diagnostics
* Robust Regression
* Nonlinear regression
* Principal Component Analysis
* Logistic Regression
* Statistical Simulation
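To give a flavor of how simulation and linear regression fit together, the sketch below simulates data from a known linear model and then recovers the coefficients with `lm`. The model, its coefficients, the sample size, and the seed are all arbitrary assumptions made for this example.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

# Statistical simulation: draw data from a known linear model,
# y = 2 + 3 * x + noise (the coefficients 2 and 3 are made up).
n <- 200
x <- runif(n, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(n, mean = 0, sd = 1)

# Descriptive statistics for the simulated response.
y_summary <- summary(y)

# Linear regression: estimate the intercept and slope from the data.
fit <- lm(y ~ x)
estimates <- coef(fit)

# Confidence intervals show how precisely each coefficient is estimated.
intervals <- confint(fit)

# Prediction for a new observation at x = 5 (true mean is 2 + 3 * 5 = 17).
pred <- predict(fit, newdata = data.frame(x = 5))
```

Because the true coefficients are known, comparing `estimates` against 2 and 3 is a quick sanity check that the fitted model behaves as expected.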
Preliminary data mining (Selected Topics)
Abstract: Explains how to use R's data mining packages and functions. Students will learn two families of mining methods: supervised learning and unsupervised learning.
Case and Exercise: Use R to take part in a Kaggle data mining competition.
* General Mining Process
* The rattle package
* Hierarchical clustering
* K-means clustering
* Decision Trees
* Backpropagation (BP) neural network
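The two clustering topics can be tried with functions that ship with base R. The sketch below uses the built-in `iris` data set, which is an assumption made for illustration (the course may use other data), and asks for three clusters because the data contain three species.

```r
set.seed(1)  # arbitrary seed; k-means starts from random centers

# Use the four numeric measurements and ignore the species label,
# since clustering is unsupervised.
measurements <- iris[, 1:4]

# K-means clustering with 3 clusters and several random restarts.
km <- kmeans(measurements, centers = 3, nstart = 20)

# Hierarchical clustering on Euclidean distances, cut into 3 groups.
hc <- hclust(dist(measurements), method = "complete")
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate each clustering against the known species
# to see how well the groups line up with the labels.
km_table <- table(km$cluster, iris$Species)
hc_table <- table(hc_groups, iris$Species)
```

Cluster numbers are arbitrary labels, so the rows of the two tables need not match up with each other or with the species order.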
What does SupStat offer?
Our services include consulting on statistical methods, software training in statistical computing and data analysis (mainly R), statistical graphics and data visualization, and statistical reports. We have offices in Beijing, Shanghai, and New York. Our team includes Kagglers ranked in the top 0.1%. (www.kaggle.com hosts excellent data mining competitions and gathers more than 100K data scientists.) For business inquiries, please email: