Blog Archives

Probabilistic interpretation of AUC

January 25, 2018

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning. What’s AUC anyway? AUC is the area under the ROC curve. The ROC curve is the receiver operating characteristic curve. AUC is...

Read more »

Probabilistic interpretation of AUC

January 24, 2018

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning. What’s AUC anyway? Consider a dataset where $x_i$ is a vector of features collected for the $i$th subject, ...

Read more »

Mining USPTO full text patent data – Analysis of machine learning and AI related patents granted in 2017 so far – Part 1

September 21, 2017

The United States Patent and Trademark Office (USPTO) provides immense amounts of data (the data I used are in the form of XML files). After coming across these datasets, I thought that it would be a good idea to explore where and how my areas of interest fall into the intellectual property space; my areas of interest being machine...

Read more »

Freedman’s paradox

June 5, 2017

Recently I came across the classic 1983 paper "A Note on Screening Regression Equations" by David Freedman. Freedman shows in an impressive way the dangers of data reuse in statistical analyses. The potentially dangerous scenarios include those where t...

Read more »

5 ways to measure running time of R code

May 27, 2017

A reviewer asked me to report detailed running times for all (so many :scream:) performed computations in one of my papers, and so I spent a Saturday morning figuring out my favorite way to benchmark R code. This is a quick summary of the options I found to be available. A quick online search revealed at least three R packages...

Read more »

Salaries by alma mater – an interactive visualization with R and plotly

April 27, 2017

Based on an interesting dataset from the Wall Street Journal, I made the above visualization of the median starting salary for US college graduates from different undergraduate institutions (I have also looked at the mid-career salaries and the salary increase, but more on that later). However, I thought that it would be a lot more informative if it were...

Read more »
