Blog Archives

WoE and IV Variable Screening with {Information} in R

August 27, 2017

A short note on information-theoretic variable screening in R with {Information}. Variable screening is an important step in contemporary EDA for predictive modeling: what can we tell about the nature of the relationships between a set of predictors and the dependent variable before entering the modeling phase? Can we infer something about the predictive power of the independent variables...

Visualising Similarity: Maps vs. Graphs

July 28, 2017

The visualization of complex data sets is of essential importance in communicating your data products. Beyond pie charts, histograms, line graphs and other common forms of visual communication begins the reign of data sets that encompass too much information to be easily captured by these simple data displays. A typical context that abounds with complexity is found in the...

Hybrid content-based and collaborative filtering recommendations with {ordinal} logistic regression (2): Recommendation as discrete choice

April 14, 2017

In this continuation of “Hybrid content-based and collaborative filtering recommendations with {ordinal} logistic regression (1): Feature engineering” I will describe the application of the {ordinal} clm() function to test a new, hybrid content-based, collaborative filtering approach to recommender engines by fitting a class of ordinal logistic (aka ordered logit) models to ratings data from the MovieLens 100K dataset. All...

Hybrid content-based and collaborative filtering recommendations with {ordinal} logistic regression (1): Feature engineering

April 14, 2017

I will use {ordinal} clm() (and other cool R packages such as {text2vec} as well) here to develop a hybrid content-based, collaborative filtering, and (obviously) model-based approach to solve the recommendation problem on the MovieLens 100K dataset in R. All R code used in this project can be obtained from the respective GitHub repository; the chunks of code present...

#AskNASA: What’s the Optimal Time for Aliens to Invade Earth?

February 22, 2017

This post was originally published on SmartCat, 22 Feb 2017. My inaugural blog post as a Data Science Consultant for SmartCat. The code that accompanies the analyses presented here is available at the respective GitHub repository. On how to use R to estimate the optimal time during the day for aliens to invade Earth and a few more interesting things. A few...

R in Open Data: Complaints in The Field of Freedom of Information data set from data.gov.rs

February 12, 2017

The notebooks (R, Rmd, and HTML files are provided in my GitHub repository) focus on an exploratory analysis of the open data set on the complaints in the field of freedom of information, provided at the Open Data Portal of the Republic of Serbia that is currently under development. The data set was kindly provided to the Open Data Portal...

Open Data R Meetup: exploring the Distribution of Traffic Accidents in Belgrade, 2015 in R

January 31, 2017

The R code that accompanies this post is found on GitHub: you will find R, Rmd, and HTML files there that were used during the first Open Data R Meetup held in Belgrade, 31 January 2017, organized by Data Science Serbia in Startit Center, Savska 5, Bel...

Distributional Semantics in R: Part 2 Entity Recognition w. {openNLP}

January 2, 2017

The R code for this tutorial on Methods of Distributional Semantics in R is found in the respective GitHub repository. You will find .R, .Rmd, and .html files corresponding to each part of this tutorial (e.g. DistSemanticsBelgradeR-Part2.R, DistSemant...

Distributional Semantics in R: Part 1 {tm} classes + read/write

December 24, 2016

The R code for this tutorial on Methods of Distributional Semantics in R is found in the respective GitHub repository. Following my Methods of Distributional Semantics in R BelgradeR Meetup with Data Science Serbia, organized in Startit Center, Belgrade, 11/30/2016, several people asked me for the R code used for the analysis of William Shakespeare’s plays that was presented....

Introduction to R for Data Science :: Session 8 [Intro to Text Mining in R, ML Estimation + Binomial Logistic Regression]

June 21, 2016

Welcome to Introduction to R for Data Science, Session 8: Intro to Text Mining in R, ML Estimation + Binomial Logistic Regression [Web-scraping with tm.plugin.webmining. The tm package corpora structures: assessing document metadata and content. Typical corpus transformations and Term-Document Matrix production. A simple binomial regression model with tf-idf scores as features and its shortcomings due to sparse data....
