Articles by msuzen

Teaching to machines: What is learning in machine learning entails?

November 16, 2017 | msuzen

Preamble. Figure 1: The oldest learning institution in the world, the University of Bologna (Source: Wikipedia). Machine Learning (ML) is now a de facto skill for every quantitative job, and almost every industry has embraced it, even though the fundamentals of the field are not new at all. However, what does it mean to teach ... [Read more...]

Understanding overfitting: an inaccurate meme in supervised learning

August 16, 2017 | msuzen

Preamble. There is a lot of confusion among practitioners regarding the concept of overfitting. A kind of urban legend or meme, a piece of folklore, seems to be circulating in data science and allied fields with the following statement: Applying cross-validation prevents overfitting and a good out-of-sample performance, low ...
[Read more...]

Post-statistics: Lies, damned lies and data science patents

August 5, 2017 | msuzen

US Patent (Wikipedia). Statistics is such an important field in our daily lives nowadays. Data science, an emerging field that is already 50 years old, is now applied to almost every human activity; call it post-statistics, a kind of post-rock, fusing operations research, data mining, software and performance engineering and of course a multitude of fields ...
[Read more...]

Pitfalls in pseudo-random number sampling at scale with Apache Spark

June 15, 2017 | msuzen

In many data science applications and in academic research, techniques involving Bayesian inference are now commonly used. One of the basic operations in Bayesian inference is drawing instances from a given statistical distribution. This is, of course, the well-known pseudo-random number sampling. The most commonly used methods first generate a uniform random number ...
[Read more...]

Practical Kullback-Leibler (KL) Divergence: Discrete Case

January 7, 2017 | msuzen

KL divergence (Kullback-Leibler57) or KL distance is a non-symmetric measure of the difference between two probability distributions. It is related to mutual information and can be used to measure the association between two random variables. Figure: Distance between two distributions (Wikipedia). In this short tutorial, I show how to compute KL divergence ...
[Read more...]
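
As a rough illustration of the discrete case mentioned in the teaser above, the following is a minimal sketch in Python. It is not the code from the post itself; the function name `kl_divergence` and the example distributions `p` and `q` are assumptions made only for this example. It uses the standard definition D_KL(P || Q) = sum_i p_i * log(p_i / q_i), with the natural logarithm, for two distributions given on the same support.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats.

    p and q are sequences of probabilities over the same support.
    Hypothetical helper for illustration only.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Terms with p_i = 0 contribute nothing (0 * log 0 is taken as 0).
            continue
        if q_i == 0.0:
            # P puts mass where Q has none: the divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Two made-up distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_pq, d_qp)
```

Swapping the arguments gives a different value, which is the non-symmetry the teaser refers to, and the divergence of a distribution from itself is zero.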

Economy and dynamic modelling: Haavelmo’s approach

July 25, 2016 | msuzen

Updated on 25 August 2017. Preamble: Predictions using dynamic modelling. Machine Learning and Neural Networks are not the only ways to do data science or AI. There are other techniques to explore, for example, from quantitative economics. Apart from Game Theory, dynamic modelling could be suitable for many prediction problems, especially the ones ... [Read more...]

Economy and dynamic modelling: Haavelmo’s approach

July 25, 2016 | msuzen

Econometrics aims at estimating observables in the economy and their inter-dependencies, and at testing the estimates against economic reality. A quantitative approach to expressing these inter-dependencies appears as simultaneous equations, i.e., a system of linear equations. This is a mathematical structure of economic relationships that were made possible with ... [Read more...]

S-shaped data: Smoothing with quasibinomial distribution

January 16, 2016 | msuzen

Figure 1: Synthetic data and fitted curves. S-shaped data can be found in many applications. Such data can be approximated with the logistic distribution function [1]. The cumulative distribution function of the logistic distribution is a logistic function, i.e., the inverse of the logit. To demonstrate this, in this short example, after generating synthetic data, ... [Read more...]