CRAN Mirror “Security”

March 3, 2019

The “Changes on CRAN” section of the latest issue of The R Journal (Vol. 10/2, December 2018) had this short blurb entitled “CRAN mirror security”: Currently, there are 100 official CRAN mirrors, 68 of which provide both secure downloads via ‘https’ and use secure mirroring from the CRAN master (via rsync through ssh...

Offline visualization of geolocation data from Statcounter logs with R

March 3, 2019

Statcounter is a nice web traffic analysis tool. It collects ISP and geolocation data of visitors to a tracked site. The data is logged on the Statcounter site and can be downloaded by the tracked site’s owner in XLSX or CSV format. In this article I want to show how I managed to visualize geolocation data from the CSV...

Run Remote R Scripts with Mobile Device using E-mail Triggers

March 2, 2019

Have you ever been on the road and wished you could run an R script from your mobile device and see the results? Maybe you’re a business person who needs a quick update on a project or production schedule. Or, possibly you need an up-to-the-minute report out for a meeting, and you don’t have a “cloud based solution to...

Color palettes inspired by Islamic art

This post is about my new R package IslamicArt, which provides color palettes inspired by Islamic art. Disclaimer: While I accept the Islamic theology, ethnically speaking, I’m not from the Middle East, North Africa, Central Asia, South Asia, or Southeast Asia. However, I do deeply...

Classification of historical newspapers content: a tutorial combining R, bash and Vowpal Wabbit

Can I get enough of historical newspaper data? Seems like I can’t. I already wrote four (1, 2, 3 and 4) blog posts, but there’s still a lot to explore. This blog post uses a new batch of data announced on Twitter: For all who love to analyse text, the BnL released half a million of processed newspaper articles. Historical news from 1841-1878. They directly...

Efficient MCMC with Caching

March 2, 2019

This post is part of a running series on Bayesian MCMC tutorials. For updates, follow @StableMarkets. Metropolis Review: Metropolis-Hastings is an MCMC algorithm for drawing samples from a distribution known up to a constant of proportionality, $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$. Very briefly, the algorithm works by starting with some initial draw $\theta^{(0)}$ then running …
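
As a rough illustration of the idea in this teaser, here is a minimal random-walk Metropolis sketch. It is written in Python rather than R, and it is not the post's code: the function name `metropolis`, the Gaussian proposal, the step size, and the standard-normal example target are all assumptions made for illustration. The only property it relies on is the one stated above, namely that the target density is known up to a constant of proportionality.

```python
import math
import random


def metropolis(log_target, theta0, n_draws, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_target (hypothetical name) only needs to be correct up to an
    additive constant, i.e. the density is known up to proportionality.
    """
    rng = random.Random(seed)
    theta = theta0
    log_p = log_target(theta)  # log density at the current state
    draws = []
    for _ in range(n_draws):
        # Symmetric Gaussian proposal centred on the current draw.
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            theta, log_p = proposal, log_p_new
        draws.append(theta)
    return draws


# Example target (an assumption): standard normal, log density up to a constant.
draws = metropolis(lambda t: -0.5 * t * t, theta0=0.0, n_draws=20000)
posterior_mean = sum(draws) / len(draws)
```

With this example target the sample mean should land near 0 and the sample variance near 1, which gives a quick sanity check on the sampler.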

Using the R Package Profvis on a Linear Model

March 2, 2019

Not all data scientists were computer scientists who discovered their exceptional data literacy skills. They come from all walks of life, and sometimes that can mean optimizing for data structures and performance isn’t the top priority. That’s perfectly fine! There may come a time when you find yourself executing a chunk of code and consciously …

rquery Substitution

March 2, 2019

The rquery R package has several places where the user can ask for what they have typed in to be substituted for a name or value stored in a variable. This becomes important as many of the rquery commands capture column names from un-executed code. So knowing if something is treated as a symbol/name (which …

Creating blazing fast pivot tables from R with data.table – now with subtotals using grouping sets

March 2, 2019

Introduction: Data manipulation and aggregation is one of the classic tasks anyone working with data will come across. We can of course perform data transformation and aggregation with base R, but when speed and memory efficiency come into play, data.table is my package of choice. In this post we will look at some of the fresh and very useful functionality that came...

How the Victorians Mapped London’s Cholera

March 2, 2019

It is, of course, John Snow who is credited with using maps to demonstrate that the clusters of deaths from cholera in London’s Soho during the 1854 outbreak were caused by contaminated water. This marked a major shift in thinking away from the disease being transmitted through dirty air: the more widely accepted theory at...

Visualizing Bike Share Data (NiceRide)

March 1, 2019

This tutorial will cover exploring and visualizing data through 2018 for the Minneapolis, MN bike sharing service NiceRide. Part of what makes R incredible is...

When principal component is not unique

This quarter, I’m TAing my adviser’s class on computational biology. Though I took this class a year ago and got an A, TAing has really deepened my understanding of the course material, much of which I have long been using routinely without thinking, such as...

R blogs I follow

This page is about R resources. I also have a list of resources about dialogues between science and religion. One of my favorite aspects of R is the vibrant R community. A way to learn from the community - new tools, cool efficient tricks, something to...

Tractatus Logico (Phylo)sophicus

March 1, 2019

Over the Christmas holidays, I read "Maths Meets Myths: Quantitative Approaches to Ancient Narratives," from the Springer Understanding Complex Systems collection. The authors present their application of "hard" science techniques to datasets coming from the humanities -- mostly large corpora of texts, legends and myths. One paper in particular uses bioinformatics and phylogenetics to study the spread of a popular folk...

My Shiny Dashboard, Milwaukee Beer

March 1, 2019

Milwaukee Beer - Inspired by my Job Hunt. I’m excited to launch my latest Shiny app - “Milwaukee Beer” - which I made to learn shinydashboard. Due to my decision to return to the USA and hunt for a career in data, I decided to add another project to my portfolio. Milwaukee Beer is a metric tracking dashboard that provides...

An architecture for real-time scoring with R

March 1, 2019

Let's say you've developed a predictive model in R, and you want to embed predictions (scores) from that model into another application (like a mobile or Web app, or some automated service). If you expect a heavy load of requests, R running on a single server isn't going to cut it: you'll need some kind of distributed architecture with...

The delta method and its implementation in R

March 1, 2019

Suppose that you have a sample of a variable of interest, e.g. the heights of men in a certain population, and for some obscure reason you are interested not in the mean height μ but in its square μ². How would you make inferences about μ², e.g. test a hypothesis or calculate a confidence interval? The delta …
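
To make the question concrete, here is a minimal sketch of the standard first-order delta method for g(μ) = μ², written in Python rather than R. It uses the approximation SE(g(x̄)) ≈ |g′(x̄)| · s/√n with g′(μ) = 2μ, and a normal-approximation interval. The function name and the height values are made up for illustration, and this is not necessarily how the post itself implements it.

```python
import math
import statistics


def delta_method_ci_for_squared_mean(sample, level=0.95):
    """Estimate mu^2 with a delta-method standard error and normal CI.

    Hypothetical helper: the name and interface are illustrative only.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the sample mean, using the sample standard deviation.
    se_mean = statistics.stdev(sample) / math.sqrt(n)
    estimate = mean ** 2
    # g(mu) = mu^2 has derivative g'(mu) = 2 * mu, evaluated at the sample mean.
    se = abs(2.0 * mean) * se_mean
    # Two-sided normal quantile for the requested confidence level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Made-up heights in centimetres, purely for illustration.
heights_cm = [172.0, 168.5, 181.2, 175.4, 169.9, 178.3, 174.1, 170.6]
estimate, se, ci = delta_method_ci_for_squared_mean(heights_cm)
```

The same interval can be used for a rough test: a hypothesised value of μ² that falls outside `ci` would be rejected at the corresponding level under this approximation.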

Powerball demystified

March 1, 2019

The US Powerball lottery hysteria took another step when no one won the big jackpot in the last draw that took place on October 20, 2018. So, the total jackpot is now 2.22 billion dollars. I am sure that you want to win this jackpot. I myself want to win it. Actually, there are two different …

R Journal publication

March 1, 2019

The R Journal is the open access, refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that should be of interest to users or developers of R. Christoph Weiss, Gernot Roetzer and I have joined forces to write an R package and the accompanying paper: Forecast...

A brief history of clinical trials

March 1, 2019

The earliest report of a clinical trial is probably provided in the Book of Daniel. Daniel and a group of other Jewish people who stayed at the palace of the king of Babylon did not want to eat the king’s non-Kosher food and preferred a vegetarian diet. To show that a vegetarian and Kosher diet is healthier, …

Bayesian state space modelling of the Australian 2019 election by @ellis2013nz

So I’ve been back in Australia for five months now. While things have been very busy in my new role at Nous Group, it’s not so busy that I’ve failed to notice there’s a Federal election due some time by November this year. I’m keen to apply some of the techniques I used in New Zealand in the richer...

What is logistic in the logistic regression?

March 1, 2019

Suppose that you are being interviewed for a data scientist role. You are asked about logistic regression, and you answer all sorts of questions: how to run it in Python, how you would perform feature selection, and how you would use it for prediction. For the last question you answer that if you have the estimates of the regression …

Some comments on AB testing implementation

March 1, 2019

Many job postings in the field of technology (mainly for Data Scientist jobs, but not only) require knowledge and/or experience in “AB testing”. What is AB testing? A brief look at Wikipedia reveals that this is a method for assessing the impact of a certain change when it is carried out. For example, one may seek …

Binning Data in a Database

February 28, 2019

Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. He compares a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. He shares code and timings. Best of all: rquery gets some attention and turns out to be the dominant …
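
The two approaches being compared can be sketched with an in-memory SQLite database, here driven from Python rather than R. This is only an illustration of the general idea, not the article's code: the table names `d` and `bins`, the cut points, and the use of a range join against a lookup table are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE d (id INTEGER PRIMARY KEY, x REAL);
    INSERT INTO d (id, x) VALUES (1, 0.5), (2, 3.2), (3, 7.9), (4, 5.0);

    -- Lookup table holding the bin divisions as data (hypothetical cut points).
    CREATE TABLE bins (lo REAL, hi REAL, bin TEXT);
    INSERT INTO bins VALUES (0, 2.5, 'low'), (2.5, 5.0, 'mid'), (5.0, 10.0, 'high');
""")

# Case-based approach: the bin divisions are written into the query itself.
case_rows = conn.execute("""
    SELECT id,
           CASE WHEN x < 2.5 THEN 'low'
                WHEN x < 5.0 THEN 'mid'
                ELSE 'high' END AS bin
    FROM d
    ORDER BY id
""").fetchall()

# Join-based approach: the bin divisions live in a table and are joined in.
join_rows = conn.execute("""
    SELECT d.id, bins.bin
    FROM d
    JOIN bins ON d.x >= bins.lo AND d.x < bins.hi
    ORDER BY d.id
""").fetchall()

conn.close()
```

Both queries assign the same labels here. The practical difference is where the cut points live: in the join version they can be changed by updating the `bins` table, without touching the query text.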

Binning Columns in Remote Tables with dplyr and rquery

February 28, 2019

We’ll benchmark the performance of three methods for binning columns in remote database tables in R. The CASE WHEN (dplyr::case_when) statement and a natural join from dplyr will be compared to using a natural join with the rquery package.

Creating a Favicon with R

February 28, 2019

I use the Hugo Coder theme for this website, but I don’t like the default favicon, so I decided to make a new one using ggplot2. For those of you who don’t know, a favicon is the little icon that shows up on your browser tab next to the website name (in most browsers). How hard could it be, right?...

Some R Packages for ROC Curves

February 28, 2019

In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages. Although I began with a few ideas about packages...

htmlunitjars Updated to 2.34.0

February 28, 2019

The in-dev htmlunit package for javascript-“enabled” web-scraping without the need for Selenium, Splash or headless Chrome relies on the HtmlUnit library, and said library just released version 2.34.0 with a wide array of changes that should make it possible to scrape more gnarly javascript-“enabled” sites. The Chrome emulation is now also on par with Chrome 72...
