52 Vis Week 1 Winners!

April 13, 2016

The response to 52Vis has exceeded expectations and there have been great entries for both weeks. It’s time to award some prizes! Week 1 – Send in the Drones I’ll take this week in comment submission order (remember, the rules changed to submission via PR in Week 2). NOTE: WordPress seems to have “eaten” the... Continue reading →

How to sort a list of dataframes

April 13, 2016

A method to gather data from different sources, sort them, and keep a reference to the origin of each subset, plus some efficiency considerations. The post How to sort a list of dataframes appeared first on MilanoR.
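A minimal sketch of the idea in base R, assuming a hypothetical named list `sources` of data frames that share a `value` column. The names `sources`, `origin`, and `value` are illustrative and are not taken from the MilanoR post; the post's own approach may differ.

```r
# Hypothetical input: data frames gathered from different sources.
sources <- list(
  lab_a = data.frame(id = 1:3, value = c(2.5, 0.7, 1.9)),
  lab_b = data.frame(id = 1:2, value = c(1.1, 3.4))
)

# Tag every row with the name of the list element it came from.
tagged <- Map(function(df, name) {
  df$origin <- name
  df
}, sources, names(sources))

# Bind once at the end (usually cheaper than growing a data frame
# inside a loop), then sort the combined rows by value.
combined <- do.call(rbind, tagged)
sorted <- combined[order(combined$value), ]
rownames(sorted) <- NULL

print(sorted)
```

Because the `origin` column travels with each row, the sorted result can still be traced back to, or split by, its original source.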

Microsoft Data Science VM now available as a Linux instance

April 13, 2016

Microsoft's Linux Data Science Virtual Machine is now available for use on the Azure Marketplace. Like the Windows-based instance of the Data Science VM, this pre-built system based on Linux CentOS 7.2 includes all the tools you'll need to analyze data, including Microsoft R Open, Anaconda Python, Jupyter Notebooks and a PostgreSQL database instance. It also includes a suite...

eRum 2016 (european R users meeting) – invited speakers

April 13, 2016

eRum 2016 will take place in the beautiful city of Poznań, Poland, between October 12th and 14th, and we already have confirmed invited speakers such as:

- Rasmus Bååth (Lund University Cognitive Science) http://www.sumsar.net https://twitter.com/rabaath
- Romain Francois (r-enthusiasts) https://github.com/romainfrancois https://twitter.com/romain_francois
- Ulrike Grömping (Beuth University of Applied Sciences Berlin) https://prof.beuth-hochschule.de/groemping/
- Matthias Templ (Vienna University of Technology) http://www.statistik.tuwien.ac.at/public/templ/mtempl/?page_id=2 http://www.data-analysis.at/de_DE/
- Heather Turner (University of Warwick) https://twitter.com/HeathrTurnr http://www.heatherturner.net

as well...

Learn R By Intensive Practice – Part 2

April 13, 2016

This is a continuation of part 1 of the Learn R By Intensive Practice Series. In this part, we step up the game and learn a number of key concepts such as lists, sampling, data frames etc. At the end of each video, you will be solving a practice challenge based on what you learnt. 11. Get specific...

KEGG Module Enrichment Analysis

April 13, 2016

KEGG MODULE is a collection of manually defined functional units, called KEGG modules and identified by the M numbers, used for annotation and biological interpretation of sequenced genomes. There are four types of KEGG modules:

- pathway modules – representing tight functional units in KEGG metabolic pathway maps, such as M00002 (Glycolysis, core module involving three-carbon compounds)
- structural complexes – often forming...

Desktop DeployR

April 13, 2016

I'm going to be giving a talk this Thursday at my local R/Data Science Meetup about my method for deploying self-contained desktop R applications. Since my original post on the subject (over 2 years ago!) I've made many improvements thanks to the many useful comments I received and my own "dog-fooding". So many in fact that the framework...

Beating lollipops into dumbbells

April 12, 2016

Shortly after I added lollipop charts to ggalt I had a few requests for a dumbbell geom. It wasn’t difficult to modify the underlying lollipop Geoms to make a geom_dumbbell(). Here it is in action:

    library(ggplot2)
    library(ggalt) # devtools::install_github("hrbrmstr/ggalt")
    library(dplyr)

    # from: https://plot.ly/r/dumbbell-plots/
    URL <- "https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv"
    fil <- basename(URL)
    if (!file.exists(fil)) download.file(URL, fil)
    ...

Continue reading →

Determining the Number of Factors with Parallel Analysis in R

April 12, 2016

Tom Schmitt, April 12, 2016. As discussed on page 308 and illustrated on page 312 of Schmitt (2011), a first essential step in Factor Analysis is to determine the appropriate number of factors with Parallel Analysis in R. The data consists of 26 psychological tests administered by Holzinger and Swineford (1939) to 145 students and...

Using Travis? Make sure you use a Github PAT

April 12, 2016

We’re in the fantastic situation where lots of people are using Travis-CI to test their R packages, or to test and deploy their analytics, documentation, or anything really. Its popularity has been having a negative side effect recently, though! GitHub rate-limits API access to 5000 requests per hour, so sometimes there are more...

Accessing a Neo4j Graph Database Server from RStudio and Jupyter R Notebooks Using Docker Containers

April 12, 2016

In Getting Started With the Neo4j Graph Database – Linking Neo4j and Jupyter SciPy Docker Containers Using Docker Compose I posted a recipe demonstrating how to link a Jupyter notebook container with a neo4j container to provide a quick way to get up and running with neo4j from a Python environment. It struck me that...

New Course: “Shapefiles for R Programmers”

April 12, 2016

Today my new course, Shapefiles for R Programmers, is available for preorder! This course is designed to open up new doors of data analysis for R programmers by teaching them how to work with shapefiles, using both GIS programs and R. Shapefiles are the most common method of storing maps, and learning how to work...

Introducing cricket package yorkr:Part 4-In the block hole!

April 11, 2016

Introduction “The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our apple pies were made in the interiors of collapsing stars. We are made of starstuff.” “If you wish to make an apple pie from scratch, you must first invent the universe.” “We are like butterflies

yorkr pads up for the Twenty20s: Part 2-Head to head confrontation between teams

April 11, 2016

Alice: “How long is forever?” White Rabbit: “Sometimes, just one second.” Alice: “Where should I go?” The Cheshire Cat: “That depends on where you want to end up.” “I’m not strange, weird, off, nor crazy, my reality is just different from yours.” Alice through the looking glass - Lewis Carroll Introduction In this post, my R package...

Adding motion to choropleths

April 11, 2016

Some time ago @hrbrmstr showed how to replicate a visualization made by the New York Times with R. The @nytgraphics folks are spiffy. so is #rstats https://t.co/zc1gIx6cyE https://t.co/XAmVPDLfC7 #tigrisviridisggplot pic.twitter.com/9ZK6wvYDnh — boB Rudis (@hrbrmstr) March 25, 2016 The result we hope for looks like this: I really like small multiples and this is a good...

Does weather cause accidents – part 2

April 11, 2016

In part 1 I showed how to grab data from forecast.io. Now that we have all of that, I want to use it to investigate the effects of weather on accidents. First, I realised after playing around a little that one possible way of doing this was as follows: In part 1 we grabbed weather data associated with...

The FBI’s aerial surveillance program, visualized with R

April 11, 2016

Buzzfeed's Peter Aldhous and Charles Seife broke a major news story last week: the US Federal Bureau of Investigation and Department of Homeland Security operate more than 200 small aircraft (mainly Cessnas and some helicopters) which routinely circle various sites near US cities, presumably to gather data with onboard cameras and electronic equipment. The data behind the story weren't...

Predicting Wine Quality with Azure ML and R

April 11, 2016

by Shaheen Gauher, PhD, Data Scientist at Microsoft In machine learning, the problem of classification entails correctly identifying to which class or group a new observation belongs, by learning from observations whose classes are already known. In what follows, I will build a classification experiment in Azure ML Studio to predict wine quality based on physicochemical data. Several classification...

Registration for R/Finance 2016 is open!

April 11, 2016

You can find registration information and agenda details on the conference website. Or you can go directly to the Cvent registration page. Note that registration fees will increase by 50% at the end of early regi...

yorkr pads up for the Twenty20s:Part 3:Overall team performance against all oppositions!

April 11, 2016

Introduction “So in war, the way is to avoid what is strong, and strike at what is weak.” “Thus the expert in battle moves the enemy, and is not moved by him.” “Appear weak when you are strong, and strong when you are weak.” The Art of War - Sun Tzu This post is a

R in Finance and other events

April 11, 2016

Highlighted R in Finance 2016 May 20-21, Chicago. 2 days, limited space, 50 speakers, including: Pat Burns on “Some Linguistics of Quantitative Finance” Abstract: How can the abstract be written for a talk with an ambiguous and possibly misleading title without itself being vague and misleading? I don’t know, but perhaps: A quest to discover how markets work … Continue reading...

Simulating queueing systems with simmer

April 11, 2016

We are very pleased to announce that a new release of simmer, the Discrete-Event Simulator for R, is on CRAN. There are quite a few changes and fixes, with the support of preemption as a star new feature. Check out the complete set of release notes here. Let’s simmer for a bit and see how this package can be...

The opinionated estimator

April 11, 2016

You have been lied to. By me. I once taught a programming class and introduced my students to the notion of an unbiased estimator of the variance of a population. The problem can be stated as follows: given a set of observations $(x_1, x_2, \ldots, x_n)$, what can you say about the variance of the...
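For reference, the two textbook variance estimators that the excerpt alludes to can be compared in a few lines of R. This is only a sketch with made-up observations, not the post's own code, and it does not anticipate the argument the post goes on to make.

```r
# Illustrative observations (made up for this sketch).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations from the sample mean.
ss <- sum((x - mean(x))^2)

biased   <- ss / n        # divides by n
unbiased <- ss / (n - 1)  # Bessel's correction: divides by n - 1

# R's built-in var() uses the n - 1 divisor.
print(c(biased = biased, unbiased = unbiased, var = var(x)))
```

The two estimates differ by the factor n / (n - 1), so the gap shrinks as the sample grows.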

Clandestine DNS lookups with gdns

April 10, 2016

Google recently announced their DNS-over-HTTPS API, which “enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups”. The REST API they provided was pretty simple to wrap into a package and I tossed in some SPF functions that I had lying around to bulk it... Continue reading →

Supervised Machine Learning with R Workshop on April 30th

April 10, 2016

Data Community DC and District Data Labs are hosting a Supervised Machine Learning with R workshop on Saturday April 30th. Come out and learn about R's capabilities for regression and classification, how to perform inference with these models, and how to use out-of-sample evaluation methods for your models!

Le Monde puzzle [#958]

April 10, 2016

A knapsack Le Monde mathematical puzzle: Given n packages weighing at most 5.8kg each, for a total weight of 300kg, is it always possible to allocate these packages to 12 separate boxes weighing at most 30kg each? Weighing at most 29kg each? This can be checked by brute force using the following R code and...

An awesome list of network analysis resources

Inspired by the awesome R list that I mentioned a few months ago, I have started the awesome-network-analysis list, which features a large section on R packages. Building a list specifically dedicated to network analysis presents the opportunity to ci...

Free data science video lecture: debugging in R

April 9, 2016

We are pleased to release a new free data science video lecture: Debugging R code using R, RStudio and wrapper functions. In this 8-minute video we demonstrate the incredible power of R using wrapper functions to catch errors for later reproduction and debugging. If you haven’t tried these techniques this will really improve your … Continue reading...
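One way such a wrapper could look, sketched in base R with `tryCatch()`. The helper name `catch_for_debug` and the `captured_failure` class are hypothetical and are not taken from the video; the lecture's own technique may differ.

```r
# Hypothetical wrapper: run f(...) and, on error, keep the message and
# the arguments so the failing call can be replayed later.
catch_for_debug <- function(f) {
  function(...) {
    args <- list(...)
    tryCatch(
      f(...),
      error = function(e) {
        structure(
          list(message = conditionMessage(e), args = args),
          class = "captured_failure"
        )
      }
    )
  }
}

# Example function that fails on negative input.
safe_log <- catch_for_debug(function(x) {
  if (x < 0) stop("negative input")
  log(x)
})

ok  <- safe_log(10)   # normal result is passed through
bad <- safe_log(-1)   # error is captured instead of aborting

print(bad$message)
print(bad$args)
```

With the arguments saved, the failure can be reproduced later by calling the unwrapped function on `bad$args`, for example under `debug()` or `browser()`.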

Try’in to 3D network: Quest (shiny + plotly)

April 8, 2016

I have an unnatural obsession with 4-dimensional networks. It might have started with a dream, but VR might make it a reality one day. For now I will settle for 3D networks in Plotly. Presentation: R users group (more) More: networkly
