Articles by Ashish

Sold! How do home features add up to its price tag?

September 5, 2016 | Ashish

I begin with a new project from the Kaggle playground, in which the objective is to build a regression model (the response variable, also called the outcome or dependent variable, is continuous) from a given set of predictors, or independent variables. My motivation to work on this ... [Read more...]

Predict Blood Donation -warmup

August 28, 2016 | Ashish

Continuing from my previous post, in this post I will discuss the inferential and predictive analysis. About the dataset and the problem to solve, in brief: the dataset is derived from the UCI Machine Learning Repository and the task is to predict if a donor has donated blood in March 2007 (1 ...
[Read more...]

Learning from data science competitions- baby steps

August 23, 2016 | Ashish

Of late, a considerable number of winning machine learning enthusiasts have used XGBoost as their predictive analytics solution. This algorithm has taken precedence over traditional algorithms such as Random Forests and Neural Networks. The acronym XGBoost stands for eXtreme Gradient Boosting package. The creators of this algorithm ...
[Read more...]

Data Transformations

August 8, 2016 | Ashish

A predictive model can fail for a number of reasons, such as: inadequate data pre-processing; inadequate model validation; unjustified extrapolation; over-fitting (Kuhn, 2013). Before we dive into data preprocessing, let me quickly define a few terms that I will be using often. Predictor/Independent/Attributes/Descriptors – are the ... [Read more...]

Data Splitting

August 7, 2016 | Ashish

A few common steps in model building are: pre-processing the predictor data (predictors are the independent variables); estimating the model parameters; selecting the predictors for the model; evaluating the model performance; fine-tuning the class prediction rules. “One of the first decisions to make when modeling is to decide which ...
[Read more...]

Gini index to compute inequality or impurity in the data

May 18, 2015 | Ashish

"Gini index measures the extent to which the distribution of income or consumption expenditure among individuals or households within an economy deviates from a perfectly equal distribution. Thus a Gini index of 0 represents perfect equality, while an index of 100 implies perfect inequality.
[Read more...]

Assessing Clustering Tendency in R

May 13, 2015 | Ashish

In clustering, a researcher or analyst faces two major questions. First, does the given dataset have any clustering tendency? And second, how does one determine the optimal number of clusters in a dataset and validate the clustering results? In this post, I have attempted to answer these using R. [Read more...]
