Alexej's blog | R-bloggers

Probabilistic interpretation of AUC

January 25, 2018 | Alexej's blog

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning. What’s AUC anyway? AUC is the area under ... [Read more...]
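Here is a minimal R sketch of that probabilistic meaning. It assumes the standard statement of it: the AUC equals the probability that a randomly drawn positive case gets a higher score than a randomly drawn negative case, with ties counting one half. The functions `auc_pairwise()` and `auc_trapezoid()` and the toy scores are hypothetical. The second function computes the actual area under the empirical ROC curve with the trapezoidal rule, so the two numbers can be compared.

```r
# AUC as a probability: share of (positive, negative) pairs in which the
# positive case has the higher score; ties count as one half
auc_pairwise <- function(scores, labels) {
  d <- outer(scores[labels == 1], scores[labels == 0], "-")
  mean((d > 0) + 0.5 * (d == 0))
}

# AUC as an area: trapezoidal rule under the empirical ROC curve
auc_trapezoid <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- c(0, sapply(thresholds, function(t) mean(scores[labels == 1] >= t)))
  fpr <- c(0, sapply(thresholds, function(t) mean(scores[labels == 0] >= t)))
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Hypothetical toy data: 3 positives (label 1) and 3 negatives (label 0)
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)
labels <- c(1, 1, 0, 1, 0, 0)
auc_pairwise(scores, labels)   # 8/9, i.e. 0.8888889
auc_trapezoid(scores, labels)  # same value
```

Counting by hand, 8 of the 9 positive/negative pairs are ordered correctly; only 0.4 vs 0.7 is not. The pairwise version needs memory proportional to the number of positives times the number of negatives. For large data, a rank-based (Mann-Whitney) computation gives the same number more cheaply.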

Probabilistic interpretation of AUC

January 24, 2018 | Alexej's blog

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning. What’s AUC anyway? Consider a dataset ... [Read more...]

Mining USPTO full text patent data – Analysis of machine learning and AI related patents granted in 2017 so far – Part 1

September 21, 2017 | Alexej's blog

The United States Patent and Trademark Office (USPTO) provides immense amounts of data (the data I used are in the form of XML files). After coming across these datasets, I thought that it would be a good idea to explore where and how my areas of interest fall into the ... [Read more...]

Freedman’s paradox

June 5, 2017 | Alexej's blog

Recently I came across the classic 1983 paper “A Note on Screening Regression Equations” by David Freedman. Freedman impressively demonstrates the dangers of data reuse in statistical analyses. The potentially dangerous scenarios include those where t...

[Read more...]
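Here is a small simulation in the spirit of the paper. It is not a reproduction of Freedman's exact study. The response and all predictors are independent noise. Predictors are screened in a first regression, and the survivors are refitted on the same data. The sizes n = 100 and p = 50, the 0.25 screening cutoff and the seed are illustrative assumptions. Because the second fit reuses the data that did the screening, its coefficient p-values and overall F-test tend to look more impressive than pure noise deserves.

```r
set.seed(1)                          # arbitrary seed
n <- 100; p <- 50                    # illustrative sizes (assumption)
X <- matrix(rnorm(n * p), nrow = n)  # pure-noise predictors
y <- rnorm(n)                        # response unrelated to every predictor
dat <- data.frame(y = y, X)          # columns y, X1, ..., X50

# Pass 1: full regression, keep predictors with p-value below 0.25
fit1 <- lm(y ~ ., data = dat)
pvals <- summary(fit1)$coefficients[-1, "Pr(>|t|)"]
kept <- names(pvals)[pvals < 0.25]
if (length(kept) == 0) stop("no predictor passed the screen; try another seed")

# Pass 2: refit on the *same* data with the screened predictors only
fit2 <- lm(reformulate(kept, response = "y"), data = dat)
s2 <- summary(fit2)
length(kept)                                  # number of screened predictors
sum(s2$coefficients[-1, "Pr(>|t|)"] < 0.05)   # nominally "significant" ones
pf(s2$fstatistic[1], s2$fstatistic[2], s2$fstatistic[3],
   lower.tail = FALSE)                        # p-value of the overall F-test
```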

5 ways to measure running time of R code

May 27, 2017 | Alexej's blog

A reviewer asked me to report detailed running times for all (so many :scream:) performed computations in one of my papers, and so I spent a Saturday morning figuring out my favorite way to benchmark R code. This is a quick summary of the options I found to be available. ...

[Read more...]
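The excerpt doesn't name the five options. As a hedged sketch, here are three base-R ones that certainly exist: `system.time()`, differencing `Sys.time()`, and differencing `proc.time()`. These may or may not be among the five the post covers. The workload `slow_sum()` is a hypothetical stand-in for real code.

```r
slow_sum <- function(n) sum(sort(runif(n)))  # hypothetical workload

# 1) system.time(): user, system and elapsed seconds for one expression
t1 <- system.time(slow_sum(1e6))
t1[["elapsed"]]

# 2) wall-clock time between two Sys.time() calls
start <- Sys.time()
res <- slow_sum(1e6)
t2 <- difftime(Sys.time(), start, units = "secs")
t2

# 3) subtracting two proc.time() snapshots
p0 <- proc.time()
res <- slow_sum(1e6)
t3 <- proc.time() - p0
t3
```

These measure a single run. CRAN packages such as microbenchmark exist for repeating short expressions many times and summarizing the distribution of timings.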

Salaries by alma mater – an interactive visualization with R and plotly

April 27, 2017 | Alexej's blog

Based on an interesting dataset from the Wall Street Journal, I made a visualization of the median starting salary for US college graduates from different undergraduate institutions (I have also looked at the mid-career salaries, and the salary increase, but more on that later). However, I thought that it ... [Read more...]

Understanding the Tucker decomposition, and compressing tensor-valued data (with R code)

April 4, 2017 | Alexej's blog

In many applications, data naturally form an n-way tensor with n > 2, rather than a “tidy” table. As mentioned in the beginning of my last blog post, a tensor is essentially a multi-dimensional array: a tensor of order one is a vector, which simply is a column of numbers, a tensor ... [Read more...]
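Here is a base-R sketch of one common way to compute a Tucker decomposition, the truncated higher-order SVD (HOSVD). Each factor matrix holds the leading left singular vectors of a mode-n unfolding. The core is the tensor multiplied by the transposed factors along every mode. The choice of HOSVD is an assumption; the post itself may use an iterative algorithm or a package such as rTensor. The helpers `unfold()`, `fold()`, `mode_product()`, `hosvd()` and `reconstruct()` are hypothetical names.

```r
# Mode-n unfolding: rows indexed by mode n, columns by all other modes
unfold <- function(X, n) {
  perm <- c(n, setdiff(seq_along(dim(X)), n))
  matrix(aperm(X, perm), nrow = dim(X)[n])
}

# Inverse of unfold(): reshape a matrix back into an array with dims `dims`
fold <- function(M, n, dims) {
  perm <- c(n, setdiff(seq_along(dims), n))
  aperm(array(M, dim = dims[perm]), order(perm))
}

# Mode-n product X x_n U: multiply every mode-n fiber of X by U
mode_product <- function(X, U, n) {
  dims <- dim(X)
  dims[n] <- nrow(U)
  fold(U %*% unfold(X, n), n, dims)
}

# Truncated HOSVD with multilinear ranks `ranks`
hosvd <- function(X, ranks) {
  factors <- lapply(seq_along(dim(X)), function(n)
    svd(unfold(X, n))$u[, seq_len(ranks[n]), drop = FALSE])
  core <- X
  for (n in seq_along(factors)) core <- mode_product(core, t(factors[[n]]), n)
  list(core = core, factors = factors)
}

# Approximation G x_1 A_1 x_2 A_2 x_3 A_3
reconstruct <- function(fit) {
  X <- fit$core
  for (n in seq_along(fit$factors)) X <- mode_product(X, fit$factors[[n]], n)
  X
}

set.seed(42)                                      # arbitrary seed
X <- array(rnorm(3 * 4 * 5), dim = c(3, 4, 5))    # an order-3 tensor
full <- hosvd(X, ranks = c(3, 4, 5))              # full multilinear ranks
max(abs(reconstruct(full) - X))                   # ~0 up to rounding
small <- hosvd(X, ranks = c(2, 2, 2))             # compressed: 2 x 2 x 2 core
sqrt(sum((reconstruct(small) - X)^2)) / sqrt(sum(X^2))  # relative error
```

The compressed version stores 8 core entries plus 6 + 8 + 10 factor entries, 32 numbers instead of the 60 in `X`, at the price of some approximation error.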

Understanding the CANDECOMP/PARAFAC Tensor Decomposition, aka CP; with R code

April 2, 2017 | Alexej's blog

A tensor is essentially a multi-dimensional array: a tensor of order one is a vector, which simply is a column of numbers, a tensor of order two is a matrix, which is basically numbers arranged in a rectangle, a tensor of order three looks like numbers arranged in a rectangular box (... [Read more...]
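The orders described above map directly onto base R objects. A CP model builds an order-three tensor as a sum of rank-one tensors a_r ∘ b_r ∘ c_r, the outer products of matching columns of three factor matrices. This is a minimal sketch: `cp_tensor()` and the small factor matrices are hypothetical, and fitting CP to data (e.g. by alternating least squares) is not shown.

```r
v <- c(1, 2, 3)                      # order 1: a vector (a column of numbers)
M <- matrix(1:6, nrow = 2)           # order 2: a matrix (a rectangle)
X <- array(1:24, dim = c(2, 3, 4))   # order 3: a rectangular box of numbers
length(dim(M)); length(dim(X))       # 2 and 3

# CP model: Y[i, j, k] = sum over r of A[i, r] * B[j, r] * C[k, r]
cp_tensor <- function(A, B, C) {
  Y <- array(0, dim = c(nrow(A), nrow(B), nrow(C)))
  for (r in seq_len(ncol(A))) {
    Y <- Y + outer(outer(A[, r], B[, r]), C[, r])  # rank-one term
  }
  Y
}

A <- matrix(1:4, nrow = 2)   # 2 x R, with R = 2 components
B <- matrix(1:6, nrow = 3)   # 3 x R
C <- matrix(1:8, nrow = 4)   # 4 x R
Y <- cp_tensor(A, B, C)
dim(Y)       # 2 3 4
Y[1, 1, 1]   # 1*1*1 + 3*4*5 = 61
```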

Contours of statistical penalty functions as GIF images

March 17, 2017 | Alexej's blog

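Many statistical modeling problems reduce to a minimization problem of the general form

$$\min_{\beta} \; f(\beta, \mathbf{X}) + \lambda \, g(\beta), \qquad (1)$$

or

$$\min_{\beta} \; f(\beta, \mathbf{X}) \quad (2) \qquad \text{subject to} \quad g(\beta) \le t, \quad (3)$$

where $f$ is some type of loss function, $\mathbf{X}$ denotes the data, and $g$ is a penalty, also referred to by other names, such as “regularization term” (problems (1) and (2-3) are often equivalent, by the way). Of course both, ... [Read more...]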
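For a two-dimensional parameter, contours of a penalty g can be drawn with base R's `outer()` and `contour()`. This is a minimal sketch that assumes the lasso (L1), ridge (squared L2) and an elastic-net mix with weight 0.5 as examples. The post's own selection of penalties and its GIF animation are not reproduced here, and the function names are hypothetical.

```r
# Penalties g(b) for a two-dimensional parameter b = (b1, b2)
lasso   <- function(b1, b2) abs(b1) + abs(b2)   # L1 norm
ridge   <- function(b1, b2) b1^2 + b2^2         # squared L2 norm
elastic <- function(b1, b2, alpha = 0.5) {     # convex mix of the two
  alpha * lasso(b1, b2) + (1 - alpha) * ridge(b1, b2)
}

grid <- seq(-1.5, 1.5, length.out = 201)
penalties <- list("L1 (lasso)" = lasso, "L2 (ridge)" = ridge,
                  "elastic net" = elastic)

op <- par(mfrow = c(1, 3))
for (nm in names(penalties)) {
  z <- outer(grid, grid, penalties[[nm]])      # g evaluated on the grid
  contour(grid, grid, z, levels = c(0.5, 1), main = nm,
          xlab = "b1", ylab = "b2")
}
par(op)
```

The contour g(b) = 1 is a diamond with corners on the axes for the L1 penalty and a circle for the L2 penalty. Those corners are the usual geometric explanation for why the constrained form (2-3) with the lasso tends to set some coefficients exactly to zero.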

2D contours of several penalty functions in statistics as GIF images

March 13, 2017 | Alexej's blog

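Many statistical modeling problems reduce to a minimization problem of the general form

$$\min_{\beta} \; f(\beta, \mathbf{X}) + \lambda \, g(\beta), \qquad (1)$$

or

$$\min_{\beta} \; f(\beta, \mathbf{X}) \quad (2) \qquad \text{subject to} \quad g(\beta) \le t, \quad (3)$$

where $f$ is some type of loss function, $\mathbf{X}$ denotes the data, and $g$ is a penalty, also referred to by other names, such as “regularization term” (problems (1) and (2-3) are often equivalent by the ... [Read more...]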

Tired of doing real math 2 — grad school and coffee consumption

February 15, 2017 | Alexej's blog

Lately I have noticed a sharp increase in my coffee consumption (reading Howard Schultz’s Starbucks book, which is actually quite good by the way, does not help either :grimacing:). Having recently transitioned into a new PhD program, I started wondering whether my increased coffee consumption has something to do with ...

[Read more...]

Visualization of MRI data in R

January 27, 2017 | Alexej's blog

Lately I was getting a little bored with genomic data (and then TCGA2STAT started to give me a segfault on my university’s high performance computing facility too :stuck_out_tongue:). So I decided to analyze some brain imaging data that I had lying a... [Read more...]

