Launch into space

March 22, 2015

I saw a link to a list with all rocket launches into space the other day. This post contains some plots concerning day of launch made from that. Data: Data is a fixed-format file with eleven columns. Reading fixed format is not very difficult; however, it...

Parsing Dates and Times

March 21, 2015

Motivation: R has excellent support for dates and times via the built-in Date and POSIXt classes. Their usage, however, is not always as straightforward as one would want. Certain conversions are more cumbersome than we would like: while as.Date("2015-03-22") works, would it not be nice if as.Date("20150322") (a format often used in logfiles) also worked, or for that matter as.Date(20150322L) using an integer variable, or...
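As a rough illustration of the conversions mentioned in this teaser, here is a minimal base R sketch. It does not use the package the post goes on to describe; it only shows the explicit format string that base R needs for the compact form, and the variable names are made up for this example.

```r
# ISO 8601 strings parse without any extra arguments.
d_iso <- as.Date("2015-03-22")

# The compact logfile form needs an explicit format string in base R.
d_compact <- as.Date("20150322", format = "%Y%m%d")

# An integer has to be turned into a character string first.
d_int <- as.Date(as.character(20150322L), format = "%Y%m%d")

# All three routes should agree on the same calendar day.
stopifnot(d_iso == d_compact, d_compact == d_int)
format(d_int)
```

Without the format argument, base R does not read "20150322" as 22 March 2015, which is the inconvenience the post is about.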

scientific notation for R/latex

March 21, 2015

Motivation: Using R within a latex document can be a component of reproducible research, offering (a) some assurance against typographical errors in transcribing results to the latex file and (b) the ability for others to reproduce the results. For example, one might like to explain how close the computed integral of the Witch of Agnesi function woa <- function(x, a=1) 8 * a^3 / (x^2 +...
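The idea in this teaser can be sketched in a few lines of base R. The excerpt's definition of woa is cut off, so the sketch below assumes the standard form of the Witch of Agnesi curve, 8a^3 / (x^2 + 4a^2), whose integral over the real line is 4πa^2. The helper sci_latex is hypothetical and written only for this example; it is not the function from the post.

```r
# Standard Witch of Agnesi curve (assumed here; the excerpt is truncated).
woa <- function(x, a = 1) 8 * a^3 / (x^2 + 4 * a^2)

# Numerical integral over the whole real line; the exact value is 4 * pi * a^2.
res <- integrate(woa, -Inf, Inf)
err <- abs(res$value - 4 * pi)

# Hypothetical helper: write a nonzero number as LaTeX scientific notation.
# It does not handle zero, or mantissas that round up to 10.
sci_latex <- function(x, digits = 2) {
  expo <- as.integer(floor(log10(abs(x))))
  mant <- x / 10^expo
  sprintf(paste0("%.", digits, "f \\times 10^{%d}"), mant, expo)
}

sci_latex(1234.5)
```

In a Sweave or knitr document, a string like this could be dropped into math mode with an inline R expression, so the number in the text always matches the computation.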

I’m all about that bootstrap (’bout that bootstrap)

As some of my regular readers may know, I'm in the middle of writing a book on introductory data analysis with R. I'm at the point in the writing of the book now where I have to make some hard…

Ensemble Learning with Cubist Model

March 20, 2015

The tree-based Cubist model can be easily used to develop an ensemble classifier with a scheme called “committees”. The concept of “committees” is similar to the one of “boosting” by developing a series of trees sequentially with adjusted weights. However, the final prediction is the simple average of predictions from all “committee” members, an idea

Fixing Colors & Proportions in Jerusalem Post Election Graphics

March 20, 2015

Vis expert Naomi Robbins did an excellent critique of the graphics that went along with an article on the Israeli election in the Jerusalem Post. Non-uniform and color-blind-unfriendly categorical colors and disproportionate arc sizes are definitely three substantial issues in that series of visualizations. We can rectify all of them with two new packages of mine:

NYC is a city that does sleep, a bit

March 20, 2015

The On Broadway project collected more than 600,000 photographs taken near Broadway in New York City during a six-month period in 2014. If you're in New York, you can explore the images in an interactive installation at the New York Public Library through the end of this year. You can also explore them in your browser using this online...

What Consumers Learn Before Deciding to Buy: Representation Learning

March 20, 2015

Features form the basis for much of our preference modeling. When asked to explain one's preferences, features are typically accepted as appropriate reasons: this job paid more, that candidate supports tax reform, or it was closer to home. We believe t...

Rolling Sharpe Ratios

March 20, 2015

Similar to the rolling cumulative returns from my last post, in this post I will present a way to compute and …

Digital Data Collection course

March 20, 2015

Another year, another web scraping course. Taught through SSRMC at the University of Cambridge. Below are slides from all three sessions. In the course I tried to achieve the following:
- Show how to connect R to resources online
- Use loops and functions...

New Online Tool for Seasonal Adjustment

March 20, 2015

A new website is showcasing the use of seasonal and allows for online seasonal adjustment of time series.

It’s Analytics Survey Time!

March 20, 2015

Every other year Rexer Analytics surveys Data Analysts, Predictive Modelers, Data Scientists, Data Miners, and all other types of analytic professionals, students, and academics regarding the software they use. I then update the main results in The Popularity of Data Analysis …

Tips & Tricks 7: Plotting PCA with TPS grids

March 19, 2015

Geomorph users, our function plotTangentSpace() performs a Principal Components Analysis (PCA) of shape variation and plots two dimensions of tangent space for a set of Procrustes-aligned specimens and also returns the shape cha...

Just how many retracted articles are there in PubMed anyway?

March 19, 2015

I am forever returning to PubMed data, downloaded as XML, trying to extract information from it and becoming deeply confused in the process. Take the seemingly-simple question “how many retracted articles are there in PubMed?” Well, one way is to search for records with the publication type “Retracted Article”. As of right now, that returns

Solar eclipse

March 19, 2015

Introduction: Today there was a solar eclipse that was not visible on my side of the Atlantic, but was seen on the European side, either as a partial eclipse, towards the south, or a total one, towards the north. Eclipses being rare and solar power being a new thing, this event caused an unprecedented reduction of solar power. A good spot for viewing the...

The synoptic problem and statistics [book review]

March 19, 2015

A book that came to me for review in CHANCE and that came completely unannounced is Andris Abakuks’ The Synoptic Problem and Statistics. “Unannounced” in that I had not heard so far of the synoptic problem. This problem is one of ordering and connecting the gospels in the New Testament, more precisely the “synoptic” gospels

broom: a package for tidying statistical models into data frames

March 19, 2015

The concept of “tidy data”, as introduced by Hadley Wickham, offers a powerful framework for data manipulation, analysis, and visualization. Popular packages like dplyr, tidyr and ggplot2 take great advantage of this framework, as explored in several recent posts by others. But there’s an important step in a tidy data workflow that so...

A first look at rxBTrees

March 19, 2015

by Joseph Rickert. The gradient boosting machine, as developed by Friedman, Hastie, Tibshirani and others, has become an extremely successful algorithm for dealing with both classification and regression problems and is now an essential feature of any machine learning toolbox. R’s gbm() function (gbm package) is a particularly well-crafted implementation of the gradient boosting machine that served as...

Model Segmentation with Cubist

March 18, 2015

Cubist is a tree-based model with an OLS regression attached to each terminal node and is somewhat similar to the mob() function in the party package (https://statcompute.wordpress.com/2014/10/26/model-segmentation-with-recursive-partitioning). Below is a demonstration of the cubist() model with the classic Boston housing data.

Forecast, Automatic Routines vs. Experience

March 18, 2015

This morning, in our Time Series course, we’ve been playing with some data I got from google.ca/trends/. Actually, we’ve been playing on some old version, downloaded 18 months ago (discussed in a previous post, in French).

    > urls = "http://freakonometrics.free.fr/report-headphones-2015.csv"
    > report=read.table(
    + urls,skip=4,header=TRUE,sep=",",nrows=585)
    > tail(report)
                        Semaine headphones
    580 2015-02-08 - 2015-02-14         53
    581 2015-02-15 - 2015-02-21         52
    582...

Seven Ways You Can Use A Linear, Polynomial, Gaussian, & Exponential Line Of Best Fit

March 18, 2015

A line of best fit lets you model, predict, forecast, and explain data. This post shows how you can use a line of best fit to explain college tuition, rats, turkeys, burritos, and the NHL draft. Read on or see our tutorials for more. Contact us if you’re interested in a trial of plotly on-premise....

shinyData – GUI for data analysis and reporting

March 18, 2015

Some people find it very hard to start using R because it has no GUI. There exist some GUIs which offer some of the functionality of R. In this post I would like to focus on one such GUI, a very new shiny application called shinyData. I hope the app will make it easier for some to get into the R environment. Also...

Making waffle charts in R (with the new ‘waffle’ package)

March 18, 2015

My disdain for pie charts is fairly well-known, but I do concede that there are times one needs to communicate parts of a whole graphically versus using just words or a table. When that need arises, I’m partial to “waffle charts” or “square pie charts”. @eagereyes did a great post a while ago on them

Growing some Trees

March 18, 2015

Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),

    > MYOCARDE=read.table(
    + "http://freakonometrics.free.fr/saporta.csv",
    + header=TRUE,sep=";")

The default classification tree is

    > arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
    > rpart.plot(arbre,type=4,extra=6)

We can change the options here, such as the minimum number of observations, per node

    > arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
    + control=rpart.control(minsplit=10))
    > rpart.plot(arbre,type=4,extra=6)

or...
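For readers who want to try the minsplit option without downloading the CSV file, here is a small self-contained sketch. It swaps in the kyphosis data set bundled with rpart instead of MYOCARDE, and it skips the rpart.plot call, so it only approximates what the post does. The helper n_leaves is made up for this example.

```r
library(rpart)  # recommended package, included with standard R installations

# Default tree: minsplit = 20 observations are needed to attempt a split.
fit_default <- rpart(Kyphosis ~ ., data = kyphosis)

# Same model, but nodes with as few as 10 observations may be split.
fit_small <- rpart(Kyphosis ~ ., data = kyphosis,
                   control = rpart.control(minsplit = 10))

# Hypothetical helper: count the terminal nodes (leaves) of a fitted tree.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

c(default = n_leaves(fit_default), minsplit10 = n_leaves(fit_small))
```

Comparing the two leaf counts gives a quick feel for how much the minsplit setting changes the size of the tree on a given data set.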

Analyze LinkedIn with R

March 18, 2015

If you have any questions about this tutorial or find any problems, please feel free to create a topic in the forum: http://thinktostart.com/forums/forum/questions-tutorials/analyze-linkedin-with-r/ Some time ago I saw an interesting post in a R... The post Analyze LinkedIn with R appeared first on ThinkToStart.

Updated checkpoint package: faster reproducibility with more feedback

March 18, 2015

A new version of the checkpoint package for R has just been released on CRAN. With the checkpoint package, you can easily: Write R scripts or projects using CRAN package versions from a specific point in time; Share R scripts with others that will automatically install the appropriate package versions (no need to manually install CRAN packages); Write R...

the vim cheat sheet

March 18, 2015

Filed under: Kids, Linux, R, University life, Wines Tagged: An Evil Guest, editor, unix, vim

Dark themes for writing

March 17, 2015

I spend much of my day sitting in front of a screen, coding or writing. To limit the strain on my eyes, I use a dark theme as much as possible. That is, I write with light colored text on a dark background. I don’t know why this is not the default in more software

Finding Similar European Soccer Clubs (with R & Shiny)

March 17, 2015

Are you a die-hard supporter of one European soccer (football) team (club)? Having a rough season, or just want to watch more matches with passion? This European Team Finder analyzed 126 attributes of the top-flight teams in the marquee n...
