A little bit about me and this blog - Hello everyone! This is the first (ever) post for a new blog about Finance and R. My name is Marcelo Perlin and my day job is assistant professor of Finance in Federal...

I have split ggtree into two packages, treeio and ggtree. ggtree now focuses mainly on visualization and annotation, while treeio focuses on parsing and exporting tree files. Here is a welcome message from treeio: you can convert ggtree output to a tree object, which can be exported as a newick or nexus file if you want. Thanks to ggplot2, output...

[Reader’s Note. Some of our articles are applied and some of our articles are more theoretical. The following article is more theoretical, and requires fairly formal notation to even work through. However, it should be of interest as it touches on some of the fine points of cross-validation that are quite hard to perceive or ...

This note briefly introduces the tidykml package, which turns basic KML geometries into tidy data frames that can be visualized with ggplot2. Summary: The tidykml package provides a quick way to import data from Google My Maps into R, in a format that makes it easy to manipulate the data and visualize it with ggplot2. Below is...

You've been able to include user-defined charts using R in Power BI dashboards for a while now, but a recent update to Power BI includes seven new custom charts based on R in the custom visuals gallery. You can see the new chart types by visiting the Power BI Custom Visuals Gallery and clicking on the "R-powered visuals" tab....

I was recently trying various outlier detection algorithms. For me, the best way to understand an algorithm is to tinker with it. I built a shiny app that allows you to play around with various outlier algorithms and wanted to share it with everyone. The shiny app is available on my site, but even better,

I’m always looking for ways to spark my kid’s interest in computers, data, etc. This has proven to be more difficult than I thought it would be (kids these days…). I suspect this may have something to do with the ubiquity of electronic devices that “just work”, making them less novel and less interesting to tinker with, but speculation...

This week's Riddler has an extinction riddle, which can be summarised as follows: One observes a population of N individuals, each with a probability of 10⁻⁴ of killing the observer each day. From one day to the next, the population decreases by one individual with probability K√N 10⁻⁴. What is the value of K that

using Accept-Reject method - Shifted Gompertz distribution. The shifted Gompertz distribution is a useful distribution that can describe the time needed for a new innovation to be adopted within a market. Recent studies showed that it outperforms the Bass model of diffusion in some cases¹. Its pdf is given by Below we...

7 Visualizations You Should Learn in R. With ever-increasing volumes of data, it is impossible to tell stories without visualizations. Data visualization is the art of turning numbers into useful knowledge. R lets you learn this art by offering a set of built-in functions and libraries to build visualizations and present...

As mentioned in the post on classification with linear discriminant analysis, LDA assumes the groups in question have equal covariance matrices. Therefore, when the groups do not have equal covariance matrices, observations are frequently assigned to groups with large variances on the diagonal of their corresponding covariance matrices... The post Quadratic Discriminant Analysis of Two Groups...

There are more than 15,000 restaurants in Chicago, but fewer than 40 inspectors tasked with making sure they comply with food-safety standards. To help prioritize the facilities targeted for inspection, the City of Chicago used R to create a model that predicts which restaurants are most likely to fail an inspection. Using this model to deploy inspectors, the City...

If you followed through the Basic Decision Tree exercise, this should be useful for you. It is a continuation of that exercise, but we add much more. We are working with a bigger and badder dataset. We will also be using techniques we learned from model evaluation and working with ROC, accuracy and other metrics. Answers

Hey R fans! A new episode of DataCamp's DataChats video series is out! In this episode, we interview Jo Hardin. Jo is a Professor of Mathematics at Pomona College with many years of R experience. She has a pure passion for education and has been w...

Solution to Euler Problem 5 in the R Language for Statistical Computing: What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20? The post Euler Problem 5: Smallest Multiple appeared first on The Devil is in the Data.
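
The excerpt above does not show the post's own R solution, so the following is only a minimal sketch of one common approach, written in Python rather than R: the smallest number divisible by every integer from 1 to n is their least common multiple, which can be built up pairwise from the greatest common divisor. The function names `lcm` and `smallest_multiple` are illustrative choices, not taken from the original post.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    # Divide before multiplying to keep the intermediate value small.
    return a // gcd(a, b) * b


def smallest_multiple(n: int) -> int:
    """Smallest positive number evenly divisible by every integer in 1..n."""
    # Fold lcm over 1, 2, ..., n, starting from 1.
    return reduce(lcm, range(1, n + 1), 1)


if __name__ == "__main__":
    print(smallest_multiple(20))  # prints 232792560
```

Folding the pairwise LCM avoids brute-force trial division, and the same function covers the smaller version of the problem (numbers from 1 to 10), for which it returns 2520.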

In this post, I will show you how to optimize your Rcpp loops so that they are 2 to 3 times faster than a standard implementation. Context: Real data example. For this post, I will use a big.matrix which represents genotypes for 15,283 individuals, corresponding to the number of mutations (0, 1 or 2) at 287,155 different loci. Here, I will use...

Switzerland is a country with lots of mountains, and several large lakes. While the political subdivisions (called municipalities) cover the high mountains and lakes, nothing much of economic interest happens in these places. (Raclette and sailing are wonderful, but don't count for our purposes.) For this reason, the Swiss Federal Statistical Office publishes the boundaries of the "productive" parts...

Yesterday we published our 100th set of exercises on R-exercises. Kudos and many thanks to Avi, Maria Elisa, Euthymios, Francisco, Imtiaz, John, Karolis, Mary Anne, Matteo, Miodrag, Paritosh, Sammy, Siva, Vasileios, and Walter for contributing so much great material to practice R programming! Even more thanks to Onno, who is working (largely) behind the scenes