Blog Archives

Kannada MNIST Prediction Classification using H2O AutoML in R

October 2, 2019

The Kannada MNIST dataset is another MNIST-style digits dataset, this one for the Kannada language of India. All details of the dataset curation are captured in the paper “Kannada-MNIST: A new handwritten digits dataset for the Kannada language” by Vinay Uday Prabhu. The author’s GitHub repo can be found here. The objective of this post is to demonstrate how to use...

Read more »

Handling Missing Values in R using tidyr

September 22, 2019

In this post, we’ll see 3 functions from tidyr that are useful for handling missing values (NAs) in a dataset. Please note: this post isn’t going to be about missing value imputation. tidyr: According to its documentation, the goal of tidyr is to help you create tidy data. Tidy data is data where: + Every column is a variable. + Every row is an...
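
The excerpt doesn’t name the three functions. As a minimal sketch, assuming they are the three tidyr verbs aimed squarely at NAs (drop_na(), fill() and replace_na(); the post itself may cover a different set), here is each one applied to a small hypothetical data frame:

```r
library(tidyr)

# Hypothetical toy data with missing values (NAs)
sales <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr"),
  units = c(10, NA, 7, NA),
  store = c("A", NA, "B", NA),
  stringsAsFactors = FALSE
)

# drop_na(): keep only the rows with no NA in the named column(s)
complete_units <- drop_na(sales, units)

# fill(): carry the last non-missing value down into the NAs below it
filled_store <- fill(sales, store, .direction = "down")

# replace_na(): swap NAs for a fixed value, column by column
zeroed <- replace_na(sales, list(units = 0, store = "unknown"))

print(complete_units)
print(filled_store)
print(zeroed)
```

drop_na() removes rows, while fill() and replace_na() keep every row and only change the NA cells, so the choice depends on whether the incomplete rows are worth keeping.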

Read more »

Functional Programming + Iterative Web Scraping in R

September 18, 2019

Web Scraping in R: Web scraping needs no introduction among data enthusiasts. It’s one of the most viable and essential ways of collecting data when the data itself isn’t readily available. Knowing web scraping comes in very handy when you are short of data, need macroeconomic indicators, or simply have no data available for a particular project like a...
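
The functional-programming idea behind iterative scraping can be sketched as: write one function that scrapes a single page, then map() it over every page. To keep the sketch runnable offline, the pages below are hypothetical inline HTML strings rather than real URLs (with live pages you would pass URLs to read_html() instead); the scrape_page() name and the CSS selectors are likewise assumptions for illustration.

```r
library(rvest)
library(purrr)

# Hypothetical pages: inline HTML strings stand in for pages that a real
# scraper would download from a list of URLs (keeps the sketch offline)
pages <- c(
  "<html><body><h1>Page 1</h1><p class='price'>100</p></body></html>",
  "<html><body><h1>Page 2</h1><p class='price'>250</p></body></html>"
)

# One small function that scrapes a single page into a one-row data frame
scrape_page <- function(html) {
  doc <- read_html(html)
  data.frame(
    title = html_text(html_node(doc, "h1")),
    price = as.numeric(html_text(html_node(doc, ".price"))),
    stringsAsFactors = FALSE
  )
}

# map() applies it to every page; do.call(rbind, ...) stacks the rows
results <- do.call(rbind, map(pages, scrape_page))
print(results)
```

Keeping the per-page logic in one function means adding more pages only changes the input vector, not the scraping code.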

Read more »

Hindi and Other Languages in India based on 2001 census

September 16, 2019

India is the world’s largest democracy and, as it happens, also a highly diverse place. This is my attempt to see how Hindi and other languages are spoken in India. In this post, we’ll see how to collect the data for this puzzle directly from Wikipedia and how to visualize it to highlight the insight. Data: Wikipedia is a...

Read more »

Regex Problem? Here’s an R package that will write Regex for you

September 12, 2019

REGEX is that thing that scares almost everyone, almost all the time. Hence, finding an alternative is always helpful, and a relief too. Here’s a nice R package that helps us do REGEX without knowing REGEX. REGEX: This is the REGEX pattern to test the validity of a URL: ^(http)(s)?(\:\/\/)(www\.)?([^ ]*)$ A typical regular expression contains characters (http) and metacharacters (). The...
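
To check what the URL pattern quoted above accepts, here is a small base-R sketch with grepl(). It does not use the package from the post; it only exercises the pattern. The final group is written as ([^ ]*), meaning "anything except a space", which is an assumption about the part of the pattern that is garbled in the excerpt; the sample strings are hypothetical.

```r
# URL-validity pattern; the last group "[^ ]*" (anything but a space) is an
# assumed completion of the part that is garbled in the excerpt
url_pattern <- "^(http)(s)?(\\:\\/\\/)(www\\.)?([^ ]*)$"

# Hypothetical test strings
candidates <- c(
  "https://www.r-bloggers.com",
  "http://example.com/path",
  "ftp://example.com",
  "https://has space.com"
)

# perl = TRUE so PCRE treats the escaped ":" and "/" as plain literals
is_url <- grepl(url_pattern, candidates, perl = TRUE)
print(setNames(is_url, candidates))
```

The ftp address fails the ^(http) anchor, and the last string fails because [^ ]* cannot cross the space before $.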

Read more »

How to do Tamil Text Analysis & NLP in R

September 3, 2019

udpipe is a beautiful R package for text analytics and NLP, and it helps with topic extraction. While most text analytics resources online cover only English, this post picks up a different language, Tamil, and fortunately udpipe has a Tamil language model. Loading: library(udpipe) Tamil Text: Below is an excerpt from a Tamil movie review text ... %>% ggplot() + geom_bar(aes(reorder(lemma, -n), n), ...
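
The counting-and-plotting step visible at the end of the excerpt can be sketched as follows. As an assumption, a small hypothetical data frame stands in for the udpipe annotation (which includes lemma and upos columns among others), with placeholder lemmas instead of real Tamil words. Note that geom_bar() needs stat = "identity" when the bar heights come from an n column (geom_col() is the shorthand for that).

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for as.data.frame(udpipe_annotate(...)) output:
# only the lemma and upos columns are used; values are placeholders
annotated <- data.frame(
  lemma = c("lemma_a", "lemma_b", "lemma_a", "lemma_c", "lemma_a", "lemma_b"),
  upos  = c("NOUN", "NOUN", "NOUN", "VERB", "NOUN", "NOUN"),
  stringsAsFactors = FALSE
)

# Count nouns by lemma, most frequent first
noun_counts <- annotated %>%
  filter(upos == "NOUN") %>%
  count(lemma, sort = TRUE)

# Bar heights come from n, so stat = "identity" is required with geom_bar()
p <- noun_counts %>%
  ggplot() +
  geom_bar(aes(reorder(lemma, -n), n), stat = "identity") +
  labs(x = "lemma", y = "count")

print(noun_counts)
```

Calling print(p) draws the chart; reorder(lemma, -n) sorts the bars from most to least frequent.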

Read more »

How to scrape Zomato Restaurants Data in R

August 26, 2019

Zomato is a popular restaurant listing website in India (similar to Yelp), and people are always interested in how to download or scrape Zomato restaurant data for data science and visualizations. In this post, we’ll learn how to scrape / download Zomato restaurant (buffet) data using R. I also hope this post will serve as a basic web scraping framework...

Read more »

Combining the power of R and Python with reticulate

August 26, 2019

R + Py: In the world of R vs Python fights, this is a simple (some might say naive) attempt to show how we can combine the power of Python with R and create a new superpower. Like this one, if you have watched The Incredibles! About this Dataset: This dataset contains a bunch of tweets that came with...

Read more »

How to do Topic Extraction from Customer Reviews in R

August 21, 2019

Topic extraction is an integral part of IE (Information Extraction) from a corpus of text, used to understand the key things the corpus is talking about. While this can be achieved naively using unigrams and bigrams, in this post we’ll see a more intelligent way of doing it with an algorithm called RAKE. Udpipe: udpipe is...
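
RAKE’s core scoring is simple enough to sketch by hand: stopwords and punctuation split the text into candidate phrases, each word gets a score of degree / frequency (degree being the total length of the phrases it appears in), and a phrase scores the sum of its word scores. The base-R version below is a from-scratch illustration with a tiny hypothetical stopword list and review sentence, not the post’s code; udpipe ships its own keywords_rake(), which works on annotated tokens instead.

```r
# Minimal RAKE (Rapid Automatic Keyword Extraction) scoring in base R
rake_scores <- function(text, stopwords) {
  # Pad punctuation with spaces so each mark becomes its own token
  tokens <- tolower(unlist(strsplit(gsub("([[:punct:]])", " \\1 ", text), "\\s+")))
  tokens <- tokens[tokens != ""]

  # Stopwords and punctuation act as phrase delimiters
  is_break <- tokens %in% stopwords | grepl("^[[:punct:]]+$", tokens)
  phrase_id <- cumsum(is_break)  # constant within each run of content words
  phrases <- unname(split(tokens[!is_break], phrase_id[!is_break]))

  # Word frequency, and degree = summed length of the phrases a word occurs in
  all_words <- unlist(phrases)
  freq <- table(all_words)
  freq <- setNames(as.numeric(freq), names(freq))
  deg <- tapply(rep(lengths(phrases), lengths(phrases)), all_words, sum)
  deg <- setNames(as.numeric(deg), names(deg))
  word_score <- deg[names(freq)] / freq

  # Phrase score = sum of its word scores; keep each distinct phrase once
  keyword <- vapply(phrases, paste, character(1), collapse = " ")
  rake <- vapply(phrases, function(p) sum(word_score[p]), numeric(1))
  out <- data.frame(keyword, rake, stringsAsFactors = FALSE)
  out <- out[!duplicated(out$keyword), ]
  out[order(-out$rake), ]
}

# Hypothetical review text and stopword list
review <- "Great battery life. Battery life is poor, but the screen is great."
keywords <- rake_scores(review, stopwords = c("is", "but", "the", "a", "and"))
print(keywords)
```

Here "great battery life" ranks first with 2 + 2.5 + 2.5 = 7, because "battery" and "life" keep appearing inside multi-word phrases, which is exactly the bias toward longer phrases that RAKE relies on.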

Read more »

3 tidyverse tricks for most commonly used Excel Features

August 16, 2019

In this post, we’re simply going to see 3 tricks that could help improve your tooling using {tidyverse}. Create a difference variable between the current value and the next value: This is also known as lead and lag; especially in a time series dataset, this variable becomes very important in feature engineering. In Excel, this is simply done by creating...
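
In {dplyr}, the lead/lag difference is a one-liner inside mutate(). A minimal sketch on a hypothetical monthly series:

```r
library(dplyr)

# Hypothetical monthly series
sales <- data.frame(month = 1:5, value = c(100, 120, 90, 130, 150))

sales <- sales %>%
  arrange(month) %>%                   # lead()/lag() follow the row order
  mutate(
    next_value = lead(value),          # value from the following row
    prev_value = lag(value),           # value from the preceding row
    diff_next  = lead(value) - value,  # next minus current
    diff_prev  = value - lag(value)    # current minus previous
  )

print(sales)
```

The last lead() and the first lag() are NA because there is no neighbouring row to borrow from; sorting explicitly with arrange() first matters because both functions simply shift by row position.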

Read more »
