Blog Archives

Probabilistic interpretation of AUC

January 24, 2018

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning. What’s AUC anyway? Consider: A dataset $S$: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)$, where $\mathbf{x}_i$ is a vector of features collected for the $i$th subject, ...
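The probabilistic meaning in question is the standard one: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch in base R follows; the helper name `auc_pairwise` and the simulated scores are illustrative assumptions, not code from the post.

```r
# Hypothetical helper: AUC as the fraction of (positive, negative) pairs
# in which the positive case has the higher score; ties count as 1/2.
auc_pairwise <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  diffs <- outer(pos, neg, "-")          # all pairwise score differences
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

# Simulated data (an assumption for illustration only)
set.seed(1)
labels <- rep(c(0, 1), each = 100)
scores <- rnorm(200, mean = labels)      # positives score higher on average

auc_sim <- auc_pairwise(scores, labels)
auc_sim
```

The same quantity is the Mann-Whitney U statistic divided by the number of positive-negative pairs, so `wilcox.test()` in base R can serve as a cross-check.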

Mining USPTO full text patent data – Analysis of machine learning and AI related patents granted in 2017 so far – Part 1

September 21, 2017

The United States Patent and Trademark Office (USPTO) provides immense amounts of data (the data I used are in the form of XML files). After coming across these datasets, I thought that it would be a good idea to explore where and how my areas of interest fall into the intellectual property space; my areas of interest being machine...

Freedman’s paradox

June 5, 2017

Recently I came across the classical 1983 paper “A note on screening regression equations” by David Freedman. Freedman shows in an impressive way the dangers of data reuse in statistical analyses. The potentially dangerous scenarios include those where t...
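The kind of data reuse at issue can be sketched with a small simulation: regress pure noise on many noise predictors, keep the predictors that pass a loose screening threshold, and refit on the same data. The sample size, number of predictors, and the 0.25 and 0.05 thresholds below are assumptions chosen for illustration, and the code is not taken from the post.

```r
set.seed(1)
n <- 100                                 # observations (assumed)
p <- 50                                  # noise predictors (assumed)
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                            # response is independent of X
dat <- data.frame(y = y, X)              # columns are named X1, ..., X50

# First pass: full regression, then screen predictors at a loose threshold
fit1 <- lm(y ~ ., data = dat)
pvals <- summary(fit1)$coefficients[-1, 4]
keep <- names(pvals)[pvals < 0.25]

# Second pass: refit on the SAME data using only the screened predictors
fit2 <- lm(reformulate(keep, response = "y"), data = dat)
n_signif <- sum(summary(fit2)$coefficients[-1, 4] < 0.05)
n_signif                                 # "significant" coefficients, despite pure noise
```

Because the screening and the refit use the same observations, the second-pass p-values tend to look better than they should.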

5 ways to measure running time of R code

May 27, 2017

A reviewer asked me to report detailed running times for all (so many :scream:) performed computations in one of my papers, and so I spent a Saturday morning figuring out my favorite way to benchmark R code. This is a quick summary of the options I found to be available. A quick online search revealed at least three R packages...
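As a point of reference, base R alone already offers two ways to time code, sketched below. The workload function `slow_sum` is a hypothetical example, and this is not necessarily one of the options the post goes on to cover.

```r
# Hypothetical workload to be timed
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# Option 1: wall-clock difference with Sys.time()
start <- Sys.time()
res <- slow_sum(1e6)
elapsed <- difftime(Sys.time(), start, units = "secs")

# Option 2: system.time() reports user, system, and elapsed seconds
timing <- system.time(slow_sum(1e6))
timing[["elapsed"]]
```

A single run is noisy, so repeating the measurement several times and summarizing usually gives a more trustworthy number.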

Salaries by alma mater – an interactive visualization with R and plotly

April 27, 2017

Based on an interesting dataset from the Wall Street Journal, I made the above visualization of the median starting salary for US college graduates from different undergraduate institutions (I have also looked at the mid-career salaries, and the salar...

Understanding the Tucker decomposition, and compressing tensor-valued data (with R code)

April 4, 2017

In many applications, data naturally form an n-way tensor with $n > 2$, rather than a “tidy” table. As mentioned in the beginning of my last blog post, a tensor is essentially a multi-dimensional array: a tensor of order one is a vector, which simply is a column of numbers, a tensor of order two is a matrix, which is...

Understanding the CANDECOMP/PARAFAC Tensor Decomposition, aka CP; with R code

April 2, 2017

A tensor is essentially a multi-dimensional array: a tensor of order one is a vector, which simply is a column of numbers, a tensor of order two is a matrix, which is basically numbers arranged in a rectangle, a tensor of order three looks like numbers arranged in a rectangular box (or a cube, if all modes have the...
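In base R, this hierarchy maps directly onto vectors, matrices, and `array()`. A minimal sketch with made-up dimensions (the sizes 2, 3, and 4 are arbitrary assumptions):

```r
v <- c(1, 2, 3)                          # order one: a vector
M <- matrix(1:6, nrow = 2, ncol = 3)     # order two: a 2 x 3 matrix
X <- array(1:24, dim = c(2, 3, 4))       # order three: a 2 x 3 x 4 box

length(dim(X))   # number of modes (the order of the tensor): 3
X[1, 2, 3]       # one entry, indexed by one position per mode: 15
```

R fills arrays in column-major order, which is why position (1, 2, 3) holds the 15th value.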

Contours of statistical penalty functions as GIF images

March 17, 2017

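Many statistical modeling problems reduce to a minimization problem of the general form: $$\min_{\boldsymbol{\beta}} \; f(\boldsymbol{\beta}; \mathbf{X}) + \lambda g(\boldsymbol{\beta}) \quad (1)$$ or $$\min_{\boldsymbol{\beta}} \; f(\boldsymbol{\beta}; \mathbf{X}) \quad (2) \qquad \text{subject to} \quad g(\boldsymbol{\beta}) \le t, \quad (3)$$ where $f$ is some type of loss function, $\mathbf{X}$ denotes the data, and $g$ is a penalty, also referred to by other names, such as “regularization term” (problems (1) and (2-3) are often equivalent by the way). Of course both $f$ and $g$ may depend on further...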

2D contours of several penalty functions in statistics as GIF images

March 13, 2017

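Many statistical modeling problems reduce to a minimization problem of the general form: $$\min_{\boldsymbol{\beta}} \; f(\boldsymbol{\beta}; \mathbf{X}) + \lambda g(\boldsymbol{\beta}) \quad (1)$$ or $$\min_{\boldsymbol{\beta}} \; f(\boldsymbol{\beta}; \mathbf{X}) \quad (2) \qquad \text{subject to} \quad g(\boldsymbol{\beta}) \le t, \quad (3)$$ where $f$ is some type of loss function, $\mathbf{X}$ denotes the data, and $g$ is a penalty, also referred to by other names, such as “regularization term” (problems (1) and (2-3) are often equivalent by the way). Of course both $f$ and $g$ may depend on further...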

Tired of doing real math 2 — grad school and coffee consumption

February 15, 2017

Lately I have noticed a sharp increase in my coffee consumption (reading Howard Schultz’s Starbucks book, which is actually quite good by the way, does not help either :grimacing:). Having recently transitioned into a new PhD program I started wondering wh...
