With our powers combined! xgboost and pipelearner

February 6, 2017

@drsimonj here to show you how to use xgboost (extreme gradient boosting) models in pipelearner. Why a post on xgboost and pipelearner? xgboost is one of the most powerful machine-learning libraries, so there’s a good reason to use it. pipelearner helps to create machine-learning pipelines that make it easy to do cross-fold validation, hyperparameter grid searching, and more...

sjPlot-update: b&w-Figures for Print Journals and Package Vignettes #rstats #dataviz

February 6, 2017

My sjPlot-package was just updated on CRAN with some – as I think – useful new features. First, I have added some vignettes to the package (based on the existing online-documentation) that cover some core features and principles of the sjPlot-package, so you have direct access to these manuals within R. The vignettes are also

How successful can an R meetup be? meet(R) in Tricity! – RSelenium and Big Data processing

February 6, 2017

On Thursday (12.01.2017) we had a chance to attend the first TriCity R Users Group (Pomerania, Poland) meeting. The meetup was unexpectedly very successful! The success can be measured in the time attendees spent on ardent comments and questions aft...

Share your knowledge at EARL 2017 – call for abstracts, San Francisco and London

February 6, 2017

We invite users and developers of R to submit an abstract for one or more of this year’s EARL Conferences. If you have a real-world business case use of R and you’re proud to share your experience, we want to …

From a million nested `ifelse`s to the plater package

February 6, 2017

As a lab scientist, I do almost all of my experiments in microtiter plates. These tools are an efficient means of organizing many parallel experimental conditions. It's not always easy, however, to translate between the physical plate and a useful data structure for analysis. My first attempts to solve this problem--nesting one ifelse call inside of the next...

My Book is out!

February 6, 2017

I am happy to announce that my book about R and Finance (in Portuguese) is finally available! The idea of writing a book about R started back at the end of 2015, when I decided to try something different than...

Generate correlation matrices with complex survey data in R

February 5, 2017

The survey package is one of R’s best tools for those working in the social sciences. For many, it saves you from needing to use commercial software for research that uses survey data. However, it lacks one function that many academic researchers often need to report in publications: correlations. The svycor function in jtools (more info) helps to fill that gap. An initial...

Strung Out On String Ops – A Brief Comparison of stringi and stringr

February 5, 2017

I made a promise to someone that my next blog would be about stringi vs stringr and I intend to keep said promise. stringr and stringi do “string operations”: find, replace, match, extract, convert, transform, etc. The stringr package is now part of the tidyverse and is 100% focused on string processing and is pretty...

Evolving R Tools and Practices

February 5, 2017

One of the distinctive features of the R platform is how explicit and user controllable everything is. This allows the style of use of R to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion into your view of the evolving “current best …

Male mortality in Russia and Japan

February 5, 2017

Russia is sadly notorious for its ridiculously high adult male mortality. According to Human Mortality Database data (2010), the probability for a Russian man to survive from 20 to 60 was just 0.64. For women the probability is 0.87. This huge gender disproportion in mortality results in a peculiar sex ratio profile (see my old DemoTrends post and...

Introduction to ggraph: Nodes

February 5, 2017

This is the second post in my series of ggraph introductions. The first post introduced the concept of layouts, which is simply a specification of how nodes should be placed on a plane. This post will dive into how the nodes are drawn once a layout ...

Introduction to ggraph: Layouts

February 5, 2017

I will soon submit ggraph to CRAN - I swear! But in the meantime I’ve decided to build up anticipation for the great event by publishing a range of blog posts describing the central parts of ggraph: Layouts, Nodes, Edges, and Connections. All of th...

Scratching the Surface of Gender Biases

February 5, 2017

Today, I want to share my analysis of the World Gender Statistics dataset. Last week I already introduced my Shiny app, where you can explore 160 measurements for 164 countries over 56 years. This week I’ve included a statistical analysis of these c...

random 0.2.6

February 5, 2017

A pure maintenance release of the random package for truly (hardware-based) random numbers as provided by random.org is now on CRAN. As requested by CRAN, we made running tests optional. Not running tests is clearly one way of not getting (spurious, ...

Naming Uncertainty by the Bootstrap

February 5, 2017

Abstract: Data on the names of all newborn babies in Berlin 2016 are used to illustrate how a scientific treatment of chance could enhance rank statements in, e.g., onomastics investigations. For this purpose, we first identify different stages of the naming-your-baby process, which are influenced by chance. Second, we compute confidence intervals for the ranks based on a bootstrap procedure...

Data Science for Doctors – Part 2 : Descriptive Statistics

February 5, 2017

Data science enhances people’s decision making. Doctors and researchers are making critical decisions every day. Therefore, it is absolutely necessary for those people to have some basic knowledge of data science. This series aims to help people who work in or around the medical field to enhance their data science skills. We will work with a health related...

Hubway Bike Share: Ridership Patterns

February 5, 2017

Contributed by Thomas Kassel. He is currently enrolled in the NYC Data Science Academy 17-week remote bootcamp program taking place from January-April 2017. This post is based on his first class project, Exploratory... The post Hubway Bike Share: Ridership Patterns appeared first on NYC Data Science Academy Blog.

Using R to study the evolution of Tennis

February 5, 2017

An analysis of point by point data - I’m a big fan of Tennis. When I’m not working in the university, you can probably find me in my favourite tennis club, Sogipa. What is so great about Tennis? It is a sport...

Sex ratios in all countries from Human Mortality Database

February 4, 2017

Sex ratios reflect the two basic regularities of human demographics: 1) there are always more boys being born; 2) males experience higher mortality throughout their life-course. The sex ratio at birth does not vary dramatically and is more or less constant at the level of 105-106 boys per 100 girls. Hence, differences in the sex ratio profiles of countries...

The animals of #actuallivingscientists

February 4, 2017

Over the last few days, a trending Twitter hashtag was “#actuallivingscientist”, whose origin can be found in this convo and whose original goal was to allow scientists to present themselves to everyone, a sort of #scicomm action. A great initiative, becaus...

Starry Night Plots

February 4, 2017

I think good data science reads like a good story. In that it flows. Has an arc. And is compelling. But data science has a dirty secret. For every piece that works, there are about nine others that didn’t. Nine other stories that look like they were...

RcppCCTZ 0.2.1

February 4, 2017

A new minor version, 0.2.1, of RcppCCTZ is now on CRAN. It corrects a possible shortcoming and rounding in the conversion from internal representation (in C++11 using int64_t) to the two double values for seconds and nanoseconds handed to R. Two other...

nanotime 0.1.1

February 4, 2017

A new version of the nanotime package for working with nanosecond timestamps is now on CRAN. nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting, and the bit64 package for the actual integer64 arithmetic...

Science of The Super Bowl

February 4, 2017

A couple of days ago, I participated in a Science of the Super Bowl Panel discussion organized by Newswise. I was asked to give a 5-minute overview (which turned into about 10), so I focused on answering 3 questions. What is data science? How is data science used in the NFL? How might data...

Nice graphic? Are they taking the p…

February 4, 2017

Yes, it started with a tweet: “Nice graphic on urine components via https://t.co/sfuXNB02sF pic.twitter.com/vhVLahQ8su” (Metabolomics, @metabolomics, January 31, 2017). By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D …

Quant Screening Backtesting: Turnaround Stocks

February 3, 2017

Introduction: What are turnaround stocks? Turnaround investing is the process of looking for investment opportunities in down-and-out companies that are poised to experience a financial recovery. The post Quant Screening Backtesting: Turnaround Stocks appeared first on NYC Data Science Academy Blog.

RevoScaleR package for Microsoft R

February 3, 2017

The RevoScaleR package for the R language is a package for scalable, distributed and parallel computation, available along with Microsoft R Server (and in-database R Services). It solves many of the limitations that the R language faces when run from a client machine. The RevoScaleR package addresses several of these issues: memory based data access model -> dataset can be …

Superheat: supercharged heatmaps for R

February 3, 2017

The heatmap is a useful graphical tool in any data scientist's arsenal. It's a useful way of representing data that naturally aligns to numeric data in a 2-dimensional grid, where the value of each cell in the grid is represented by a color. It's a natural fit for data that's in a grid already (say, a correlation matrix). But...

Reproducible Finance with R: Sector Correlations Shiny App

February 3, 2017

by Jonathan Regenstein In a previous post, we built an R Notebook that pulled in data on sector ETFs and allowed us to calculate the rolling correlation between a sector ETF and the S&P 500 ETF, whose ticker is SPY. Today, we’ll wrap that into a Shiny app that allows the user to choose a
