Recreating ‘Unknown Pleasures’ graphic

July 14, 2019

For some time I’ve wanted to recreate the cover art from Joy Division’s Unknown Pleasures album. The visualisation depicts successive pulses from the pulsar PSR B1919+21, discovered by Jocelyn Bell in 1967. Data: The first obstacle was acquiring the data. I found a D3 visualisation by Mike Bostock. This in turn pointed me to a CSV file in a gist...

Read more »

Distribution of Headline Sentiment

July 14, 2019

My web scraping project explored the distribution of headline sentiment by news source. To do this, I scraped the Nasdaq latest market headlines page and applied sentiment analysis to the retrieved text. It should be noted that I only scraped one web page, but this page aggregates headlines from multiple sources. I wanted to see

Read more »

rOpenSci Announces $678K Award from the Sloan Foundation to Expand Software Peer Review

We’re delighted to announce that we have received new funding from the Alfred P. Sloan Foundation. The $678K grant, awarded through the Foundation’s Data & Computational Research program, will be used to expand our efforts in software peer review. Software peer review has become a core part of rOpenSci, helping improve scientific software quality, drive best engineering practices into scientific...

Read more »

Experimenting with Hierarchical Clustering in a galaxy far far away…

July 14, 2019

Introduction: This post takes a bit of an unexpected diversion. As I was experimenting with hierarchical clustering, I ran into the issue of how many clusters to assume. From that point I went deep into the rabbit hole and found out some really useful stuff that I wish I had known when I wrote my previous post. I’ve discovered...

Read more »

rstudio::conf(2020) is open for registration!

July 14, 2019

rstudio::conf, the conference for all things R and RStudio, will take place January 29 and 30, 2020 in San Francisco, California. It will be preceded by Training Days on January 27 and 28. Early Bird registration is now open! Conference: Wednesday-Thursday, Jan 29-30. Join me, your host and Chief Scientist of RStudio, for our keynote speakers: Hilary Parker (Stitch Fix) and Roger Peng...

Read more »

Yet Another R Package for General Regression Neural Network

July 14, 2019

Compared with other types of neural networks, the General Regression Neural Network (Specht, 1991) is advantageous in several respects. Being a universal approximation function, GRNN has only one tuning parameter to control the overall generalization. The network structure of GRNN is surprisingly simple, with only one hidden layer and the number of neurons equal to the

Read more »

Find the best predictive model using R/caret package/modelgrid

It's tough to make predictions, especially about the future (Yogi Berra), but I think the way to get there shouldn't be. I have built a new Shiny application, BMuCaret, to fit and evaluate multiple classifiers and select the best one, which achieves...

Read more »

Forecast Combination in R – slides

July 14, 2019

useR! 2019, held in Toulouse, ended a couple of days ago. I spoke about the recent R Journal publication on forecast combinations (joint work with Christoph Weiss and Gernot Roetzer). Slides for the talk can be found here.

Read more »

Some Details on Running xgboost

July 14, 2019

While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation). In doing that I ran into one more avoidable but strange issue in using xgboost: when …

Read more »

Writing Functions in R: Example One

July 13, 2019

A. Background: In previous posts, I covered a number of useful functions and packages for writing reusable code. I wanted to expand on that information by providing a working example of how to put together a function. In particular, I will walk through the process of generating a function that executes evaluation of a time …

Read more »

Back from useR! 2019

July 13, 2019

I’m back from useR! 2019 in Toulouse, where I gave one talk and a workshop. Here are the links to the materials. 2019-07-08: Contributing to the R ecosystem, useR! newbie session. A short talk about things you can do as a beginner to contribute to...

Read more »

Simulating Data in R: Examples in Writing Modular Code

July 13, 2019

Simulating data is an invaluable tool. I use simulations to conduct power analyses, probe how robust methods are to violating assumptions, and examine how different methods handle different types of data. If I’m learning something new or writing a model from scratch, I’ll simulate data so that I know the correct answer—and make sure my model gives me that...

Read more »

Quick Hit: {waffle} 1.0 Font Awesome 5 Pictograms and More

July 12, 2019

The {waffle} package got some 💙 this week and now has a substantially improved geom_waffle() along with a brand new sibling function geom_pictogram() which has all the powerful new features of geom_waffle() but lets you use Font Awesome 5 brand and solid glyphs to make isotype pictograms. A major new feature is that stat_waffle() (which...

Read more »

useR! 2019 Slides on Futures

July 12, 2019

Below are the slides for my talk, Future: Simple Parallel and Distributed Processing in R, which I presented at the useR! 2019 conference, held in Toulouse, France, on July 9-12, 2019. My talk (25 slides; ~15+3 minutes): Title: Future: Simple Parallel and Dist...

Read more »

Testing the Collatz Conjecture with R

July 12, 2019

Background: The Collatz Conjecture is a famous unsolved problem in number theory. If you’re not familiar with it, the conjecture is very simple to understand, yet no one has been able to prove mathematically that it is true (though it has been shown to hold for an enormous number of cases).

Read more »

Cricketr learns new tricks : Performs fine-grained analysis of players

July 12, 2019

“He felt that his whole life was some kind of dream and he sometimes wondered whose it was and whether they were enjoying it.” “The ships hung in the sky in much the same way that bricks don’t.” “We demand rigidly defined areas of doubt and uncertainty!” “For a moment, nothing happened. Then, after a …

Read more »

Numerical integration over an infinite interval in Rcpp (part 2)

July 11, 2019

In a previous post I showed that, without intervention, RcppNumerical does not handle integration over infinite ranges. In this post I want to generalize the method to integrals where only one of the limits is infinite. In addition, I want to make it more user friendly, since I would like to write something like...

Read more »

a non-riddle

July 11, 2019

Unless I missed a point in the last riddle from the Riddler, there is very little to say about it: Given N ochre balls, N aquamarine balls, and two urns, what is the optimal way to allocate the balls to the urns towards drawing an ochre ball, with no urn being empty? Both my reasoning

Read more »

Common Ensemble Models can be Biased

July 11, 2019

In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions on individuals, a biased model may be preferable; biased models may …

Read more »

useR!2019 in Toulouse, France

July 11, 2019

Salut mes amis! Today I presented my package at the useR!2019 conference in Toulouse, France. This is a nice conference, focused on specific solutions to specific problems. Here, people tend to present functions from their packages (not underlying models, as they would, for example, at ISF). On the one hand, this has its own limitations, but on

Read more »

A Twitter network of members of the 19th German Bundestag – part II

July 11, 2019

This is the second part about my project that deals with the Twitter network of members of the Bundestag. After …

Read more »

A Twitter network of members of the 19th German Bundestag – part I

July 11, 2019

For the R tutorial that I gave at the WZB in the previous semester, I gave an introduction on how …

Read more »

Community Call – Reproducible Research with R

Our 1-hour Call on Reproducible Research with R will include three speakers and 20 minutes for Q & A. Ben Marwick will introduce you to a research compendium, which accompanies, enhances, or is a scientific publication providin...

Read more »

Pairwise Bayesian Comparisons – even faster

This post builds upon two earlier posts: “Comparing Frequentist, Bayesian and Simulation methods and conclusions” and “More Bayes and multiple comparisons”. Background: This all started with a nice post from Anindya Mozumdar on the R Bloggers feed. The topic material was fun for me (analyzing the performance of male 100m sprinters and the fastest man on earth), as was exploring Bayesian methods. In the last post in this series I made use...

Read more »

My 2 cents on the “R vs Python” squabble

July 10, 2019

Intro: In this post I’ll make an exception and, instead of sharing my research, chime in on the never-ending “R vs Python” squabble. The tl;dr version is: R is superior to Python for doing data science. If you’re a newcomer to data ...

Read more »

Rmd first: When development starts with documentation

July 10, 2019

Documentation matters! Think about future you and others. Whatever the aim of your script and analyses, you should think about documentation. The way I see it, the R package structure is made for that. Let me try to convince you. At useR! 2019 in Toulouse, I gave a presentation entitled: ‘The “Rmd first” method: when projects start with...

Read more »

EARLy bird ticket sales end 31 July

July 10, 2019

EARL London 2019 is getting closer! We’ve got a great line-up of speakers from a huge range of industries and three fantastic keynote speakers: Helen Hunter (Sainsbury’s), Julia Silge (Stack Overflow) and Tim Paulden (ATASS Sports). Our early bird ticket offer ends on 31 July, so this is your last chance...

Read more »

Bang Bang – How to program with dplyr

July 10, 2019

Never heard of non-standard evaluation? Then our colleague Markus has the perfect answer for you: Bang Bang! In this blog post, Markus introduces meta-programming when using dplyr. The post Bang Bang – How to program with dplyr first appeared on STATWORX.

Read more »

R 3.6.1 is now available

July 10, 2019

On July 5, the R Core Group released the source code for the latest update to R, R 3.6.1, and binaries are now available to download for Windows, Linux and Mac from your local CRAN mirror. R 3.6.1 is a minor update to R that fixes a few bugs. As usual with a minor release, this version is backwards-compatible...

Read more »
