Blog Archives

Anomaly Detection in R

December 17, 2015

Introduction: Inspired by this Netflix post, I decided to write a post on this topic using R. There are several nice packages for this goal; the one we're going to review is AnomalyDetection. Download the full (and tiny) R code of this post here. Normal vs. Abnormal: An abnormal element, or outlier, is one which does not follow the behaviour of...

Read more »

Text Mining Analysis: some theory and practice in R

October 21, 2015

Introduction: Big Data helps us analyze unstructured data (aka "text") with many techniques; this post presents one of them: Cosine Similarity. There is also work by other analysts, who scraped data from Twitter and spotted some airplane complai...

Read more »
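
For readers skimming the archive, the Cosine Similarity named in the excerpt above can be stated compactly. What follows is the standard textbook definition, not a formula taken from the post itself, and the two term-count vectors are made up purely for illustration:

```latex
% Standard definition of cosine similarity between two term-count vectors A and B.
\[
\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
           = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
\]
% Hypothetical example: A = (1, 1, 0), B = (1, 0, 1).
\[
A \cdot B = 1, \qquad \lVert A \rVert = \lVert B \rVert = \sqrt{2}, \qquad
\cos(A, B) = \frac{1}{\sqrt{2}\cdot\sqrt{2}} = \frac{1}{2}
\]
```

A value of 1 means the two documents use terms in the same proportions, and 0 means they share no terms.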

Recommendation Systems in R

September 12, 2015

These systems are used in cross-selling industries, and they measure correlated items as well as their user ratings. This last point wasn't included in the apriori algorithm (or association rules), used in market basket analysis. The link: http://blog.yha...

Read more »

{Long Vs. Wide} Data Frames

July 24, 2015

Introduction: This is an excellent resource for understanding two data frame formats: Long and Wide. Just take a look at figure 1 inside the article. 1) Long format: in certain scenarios ggplot2 needs this kind of format to work (generally grouped...

Read more »
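
As a rough illustration of the two formats named in the excerpt above, here is a minimal sketch in base R. It is not code from the post: the data frame `wide`, its columns, and the output column names `variable` and `value` are all hypothetical, chosen only to show the shape change.

```r
# Hypothetical wide data: one row per subject, one column per measurement.
wide <- data.frame(
  id     = 1:3,
  height = c(170, 165, 180),
  weight = c(65, 58, 82)
)

# Base R reshape(): stack the two measurement columns into
# (variable, value) pairs, giving one row per subject and measurement.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("height", "weight"),  # columns to stack
  v.names   = "value",                # name of the stacked value column
  timevar   = "variable",             # name of the column holding the source column name
  times     = c("height", "weight"),  # labels written into `variable`
  idvar     = "id"
)
rownames(long) <- NULL  # drop the composite row names reshape() generates

print(long)
```

The wide table has 3 rows and the long one has 6, one per subject and measurement. A grouping column such as `variable` is the kind of thing a grouped plot can map to colour or facets.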

3-step lesson, going into the life of machine learning

July 2, 2015

Automatic Machine Learning. Introduction: "I want to develop a model that automatically learns over time" is a really challenging objective. In this post we'll develop a procedure that loads data, builds a model, makes predictions and, if something chang...

Read more »

Data Science – Short lesson on cluster analysis

May 13, 2015

Introduction: In clustering you let data be grouped according to their similarity. A cluster model is a group of segments (clusters) containing cases (such as clients, patients, cars, etc.). Once a cluster model is developed, one question arises: How can I describe my model? Here we present a way to approach this question by implementing a Coordinate Plot in R...

Read more »

EU Life Quality Geo Report

May 6, 2015

Living longer, living better? It's as important to measure the quality of life as its length. Analyzing data from Eurostat, which contains the following two variables: 1- Healthy life years: a health expectancy indicator which com...

Read more »

Dynamic analysis on outliers

April 24, 2015

Treating outliers. Introduction: Outliers are the extreme values of a variable. Depending on the model or requirement, it may be necessary to treat them, either by transforming or deleting them. Variable "Income" distribution: This is going to be our main variable in this example; it represents the customer's income in $. We can observe how there are a few cases with very high...

Read more »

Geo Analysis

March 19, 2015

EU - Life Quality Geo Report. Living longer, living better? It's as important to measure the quality of life as its length. Analyzing data from Eurostat, which contains the following two variables: 1- Healthy life years: a healt...

Read more »
