Blog Archives

An accidental side effect of text mining

December 12, 2019

Reprinted from https://towardsdatascience.com/an-accidental-side-effect-of-text-mining-4b43f8ee1273. As I read on my Kindle, I highlight the passages that I like so that I can re-read them later. These annotations are stored on my Kindle and backed up at Amazon. Over time, they accumulated and became a kind of dataset. I came up with an idea to analyze all those text...

Read more »

Speed boosting in R: Writing efficient code & parallel programming

November 14, 2019

Have more things happen at once: Parallel Programming. Parallel processing is about using multiple cores of your computer’s CPU to run multiple tasks simultaneously. This can let you complete the same work several times faster. In R, computations usually run sequentially. When we initiate multiple tasks, they are performed one after the other; a new task starts only after the previous one...

Read more »

Cleaning and visualizing Five-year cancer survival statistics with ggplot2 and animation

November 4, 2019

Where do we stand in the fight against cancer? The five-year survival rate is a good indicator of improvement in cancer medicine. I am using the article by Jemal et al. published in the Journal of the National Cancer Institute. You can find the original publication here: https://academic.oup.com/jnci/article/109/9/djx030/3092246. The final take-home messages of this article were: Cancer death rates continue to decrease in the United States. But...

Read more »

An intuitive real life example of a binomial distribution and how to simulate it in R: Learn it once, use it everyday

October 27, 2019

Last week, I came across a dataset that I thought was a great opportunity to write about binomial probability distributions. What is a binomial distribution, and why do we need to know it? Binomial distributions are formed when we repeat a set of events and each single event in a set has two possible outcomes. Bi- in binomial distributions refers to...

Read more »

#TidyTuesday: Which are the best family cars for your weekend trip?

October 22, 2019

This week, I will analyze the Car Fuel Economy dataset from TidyTuesday. What is TidyTuesday? TidyTuesday is a weekly social data project in R organized by the R for Data Science community. It is a great way to improve your data wrangling and visualization techniques, share your work, and learn from others. You can find more information on their GitHub. Fuel economy data are the result of...

Read more »

Add custom summary statistics in ggplot2

October 15, 2019

It is hard to understand your data by looking at the numbers in a CSV file. You need to plot it. And adding statistics to your plots will make them more informative. To evaluate data, we typically use the mean and median to describe its central tendency, and the range, quartiles, variance, and standard deviation to describe how spread out it is. Mean and...

Read more »

Data Preparation: Web Scraping html tables with rvest

October 9, 2019

Accessing different data sources: Sometimes, the data you need is available on the web. Being able to access it will make your life easier as a data scientist. I want to perform an exploratory data analysis on the 2018/19 season of the English Premier League. Do team performances change over the course of the season? Do some teams cluster? What is the earliest week at which we can predict teams’ final positions? I...

Read more »

What is aesthetics and attributes in ggplot’s world?

October 7, 2019

ggplot2 is a powerful data visualization tool for R. Use it to make quick visualizations to explore your data or share your insights. Learning how aesthetics and attributes are defined in ggplot2 will help you develop your skills quickly. ggplot2 tips: distin...

Read more »

Why not everyone who smokes develop cancer or who eats a lot develop fatty liver disease? Predicting diseases with machine learning

September 30, 2019

We are much better at handling diseases than we were 30 years ago. For example, cancer survival rates are much higher now. A significant portion of this increase can be attributed directly to our ability to detect and diagnose cancer earlier. Also, the use of insulin and other drugs to control blood glucose in diabetic patients has reduced the risk of developing coronary...

Read more »

Data Wrangling for Text mining: Extract individual elements from a Book

September 24, 2019

My ambitious goal is to write a machine learning algorithm that predicts authors. But let’s start with something simpler. An important part of a data science workflow is data preparation: cleaning the data, reformatting it, and making it usable for further analysis. (Figure 1: Photo by Patrick Tomasso on Unsplash.) I will work on a poetry book called “New Poems” from D....

Read more »
