Blog Archives

Teaching to machines: What is learning in machine learning entails?

November 16, 2017

Preamble. Figure 1: The oldest learning institution in the world, the University of Bologna (source: Wikipedia). Machine Learning (ML) is now a de facto skill for every quantitative job, and almost every industry has embraced it, even though the fundamentals of the field are not new at all. However, what does it mean to teach a machine? Unfortunately, even for moderately technical people coming from different backgrounds, the answer to this...

Understanding overfitting: an inaccurate meme in supervised learning

August 16, 2017

Preamble. There is a lot of confusion among practitioners regarding the concept of overfitting. A kind of urban legend or meme, a piece of folklore, seems to be circulating in data science and allied fields with the following statement: Applying cross-validation prevents overfitting, and good out-of-sample performance, i.e., low generalisation error on unseen data, indicates that a model is not overfit. This statement is of course not true:...

Post-statistics: Lies, damned lies and data science patents

August 5, 2017

Figure: US Patent (Wikipedia). Statistics is such an important field in our daily lives nowadays. The emerging field of data science, some 50 years old and now applied to almost every human activity, could be called post-statistics: a kind of post-rock, fusing operations research, data mining, software and performance engineering and, of course, a multitude of fields from statistics to machine learning. Even though the reputation of statistics...

Pitfalls in pseudo-random number sampling at scale with Apache Spark

June 15, 2017

In many data science applications and in academic research, techniques involving Bayesian inference are now commonly used. One of the basic operations in Bayesian inference techniques is drawing instances from a given statistical distribution. This is, of course, the well-known pseudo-random number sampling. The most commonly used methods first generate a uniform random number and then map it into the distribution of interest via the cumulative...
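
The uniform-then-map idea in the teaser above can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes the mapping is done through the inverse of the cumulative distribution function (inverse transform sampling), uses the exponential distribution as the target, and runs in plain Python rather than Apache Spark. The function name `sample_exponential` and its parameters are hypothetical and do not come from the post.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n exponential variates by inverse transform sampling.

    Hypothetical helper for illustration. The exponential CDF is
    F(x) = 1 - exp(-rate * x), so its inverse is
    F^{-1}(u) = -log(1 - u) / rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)  # private generator, reproducible via the seed
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform draw in [0, 1)
        # 1 - u lies in (0, 1], so the logarithm is always defined
        samples.append(-math.log(1.0 - u) / rate)
    return samples


draws = sample_exponential(rate=2.0, n=100_000)
sample_mean = sum(draws) / len(draws)  # theoretical mean is 1 / rate = 0.5
```

The generator is seeded explicitly, so repeated runs give the same draws. How seeds should be handled across distributed workers is not covered by this sketch.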

Practical Kullback-Leibler (KL) Divergence: Discrete Case

January 7, 2017

KL divergence (Kullback-Leibler57), or KL distance, is a non-symmetric measure of the difference between two probability distributions. It is related to mutual information and can be used to measure the association between two random variables. Figure: Distance between two distributions (Wikipedia). In this short tutorial, I show how to compute KL divergence and mutual information for two categorical variables, interpreted as discrete random...
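
As a rough illustration of the two quantities named in the teaser above, here is a minimal sketch of the discrete KL divergence and of mutual information computed from a joint probability table. It uses the standard textbook definitions with natural logarithms. The function names and the list-based inputs are assumptions made for this example, not the code from the tutorial.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def mutual_information(joint):
    """Mutual information (in nats) from a joint probability table.

    joint[i][j] is P(X = i, Y = j). The result equals the KL divergence
    between the joint distribution and the product of its marginals.
    """
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (row_marginal[i] * col_marginal[j]))
    return mi


forward = kl_divergence([0.9, 0.1], [0.5, 0.5])
backward = kl_divergence([0.5, 0.5], [0.9, 0.1])  # differs from forward
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])  # zero
```

Swapping the arguments of `kl_divergence` changes the result, which is the non-symmetry mentioned above. A joint table that factorises into its marginals gives zero mutual information.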

Understanding the empirical law of large numbers and the gambler’s fallacy

August 1, 2016

One of the misconceptions in our understanding of statistics, a counter-intuitive fallacy, appears in the assumption that a law of averages exists. Imagine we toss a fair coin many times: most people would think that the numbers of heads and tails balance out as the number of trials increases, which is wrong. If you...
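
The coin-toss thought experiment above can be tried out with a small simulation. The sketch below is a hypothetical illustration, not the post's own code. It records, at a few checkpoints, both the proportion of heads and the absolute difference between the counts of heads and tails, so the two can be compared directly. The function name, checkpoints, and seed are assumptions made for this example.

```python
import random


def coin_toss_summary(n_tosses, checkpoints, seed=42):
    """Simulate fair coin tosses and report running statistics.

    Returns a list of (n, proportion_of_heads, abs(heads - tails))
    tuples, one for each checkpoint n that is reached.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    wanted = set(checkpoints)
    heads = 0
    summary = []
    for n in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # heads with probability 1/2
            heads += 1
        if n in wanted:
            tails = n - heads
            summary.append((n, heads / n, abs(heads - tails)))
    return summary


for n, proportion, difference in coin_toss_summary(100_000, [100, 10_000, 100_000]):
    print(f"n={n:>7}  proportion of heads={proportion:.4f}  |heads - tails|={difference}")
```

The law of large numbers concerns the proportion, which settles near 0.5. It says nothing that forces the raw counts of heads and tails to even out, and the printed difference need not shrink as n grows.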

Economy and dynamic modelling: Haavelmo’s approach

July 25, 2016

Updated on 25 August 2017. Preamble: Predictions using dynamic modelling. Machine Learning and Neural Networks are not the only ways to do data science or AI. There are other techniques to explore, for example, from quantitative economics. Apart from game theory, dynamic modelling could be suitable for many prediction problems, especially those with temporal datasets. Here is one example technique...

Economy and dynamic modelling: Haavelmo’s approach

July 25, 2016

Econometrics aims at estimating observables in the economy and their inter-dependencies, and at testing the estimates against economic reality. A quantitative approach to expressing these inter-dependencies takes the form of simultaneous equations, i.e., a system of linear equations. This mathematical structure of economic relationships was made possible by the pioneering work of the Nobel Prize-winning economist Trygve Haavelmo...

S-shaped data: Smoothing with quasibinomial distribution

January 16, 2016

Figure 1: Synthetic data and fitted curves. S-shaped data can be found in many applications. Such data can be approximated with the logistic distribution function. The cumulative distribution function of the logistic distribution is a logistic function, i.e., the inverse of the logit. To demonstrate this, in this short example, after generating synthetic data, we will fit a quasibinomial regression model to different observations. ggplot,...
