10 R packages every data scientist should know about

February 18, 2013

The yhat blog lists 10 R packages they wish they'd known about earlier. Drew Conway calls them "10 reasons to always start your analysis in R". They're all very useful R packages that every data scientist should be aware of. They are: sqldf (for selecting from data frames using SQL), forecast (for easy forecasting of time series), plyr (data...


Predictors, responses and residuals: What really needs to be normally distributed?

February 18, 2013

Introduction: Many scientists are concerned about normality or non-normality of variables in statistical analyses. The following and similar sentiments are often expressed, published or taught: "If you want to do statistics, then everything needs to be normally distributed." "We normalized…


Saving R Objects in Oracle Database using Oracle R Enterprise 1.3 Datastore

February 18, 2013

#15 Alkali Silica Template

February 18, 2013

Does what it says on the tin.

#------------------------------
#-------- INFORMATION ---------
#------------------------------
# Plotting points from Hugh
# Rollinson's "Using Geochemical
# Data" book. Code compiled by
# Darren J. Wilkinson,
# Grant Inst. Earth Science
# The University of Edinburgh
#------------------------------
# -------- CONTROLS ----------
y.max = 16
x.min


RQuantLib 0.3.10

February 18, 2013

A new minor release RQuantLib 0.3.10 is now on CRAN and in Debian. RQuantLib combines (some of) the quantitative analytics of QuantLib with the R statistical computing environment and language. The discount curve building code in QuantLib has s...


Simple tests of predicted returns

February 18, 2013

Some ways to explore how good a method of predicting returns is. Data and model: The universe is 443 large cap US stocks that have data back to the beginning of 2004. The daily (adjusted) close was used. The model that is used as an example is the default signal from the MACD function of …


Reshaping Horse Import/Export Data to Fit a Sankey Diagram

February 18, 2013

As the food labeling and substituted horsemeat saga rolls on, I’ve been surprised at how little use has been made of “data” to put the structure of the food chain into some sort of context* (or maybe I’ve just missed those stories?). One place that can almost always be guaranteed to post a few related


Improving the graph gallery

February 18, 2013

I'm trying to improve the R Graph Gallery, and I'm looking for suggestions from users of the website. I've started a question on the website's Facebook page. Please take a few seconds to vote on the existing improvement possibilities...


dbetabinom versions

February 17, 2013

I got this email from a student: (1) I used the following R function in package “emdbook”, more precisely I did (2) instead I use the following R function in package “VGAM”, more precisely I did and I get two different curves! Sad! to which I replied only the following as the beta-binomial density is


Displaying Isotopic Abundance Percentages with Bar Charts and Pie Charts


The Structure of an Atom: An atom consists of a nucleus at the centre and electrons moving around it. The nucleus contains a mixture of protons and neutrons. For most purposes in chemistry, the two most important properties of these 3 types of particles are their masses and charges. In terms of charge, protons are


Change fonts in ggplot2, and create xkcd style graphs

February 17, 2013

Installing and changing fonts in your plots is now easy with the extrafont package. There is an excellent tutorial on the extrafont GitHub site; still, I will briefly demonstrate how it worked for me. First, install the package and load it. You can now install the desired system fonts (at the moment only TrueType fonts): The


Temporal network model – Barabási-Albert model with the library igraph

February 17, 2013

I found a golden website: the blog of Esteban Moro. He uses R to work on networks. In particular, he has written some really nice code to make great videos of networks. This post is purely a copy of his code; I just changed a few arguments to change the colors and build my own network. To...


Run production, one team at a time

February 17, 2013

In a previous post, I used R to process data from the Lahman database to calculate index values that compare a team's run production to the league average for that year. For the purpose of that exercise, I started the sequence at 1947, but for what follows I re-ran the code with the time period...


Automatic spatial interpolation with R: the automap package

February 17, 2013

For continuously collected data, e.g. observations from a monitoring network, spatial interpolation cannot be done manually. Instead, the interpolation should be done automatically. To achieve this goal, I developed the automap package. automap builds on…


A look at strucchange and segmented

February 17, 2013

After last week's post it was commented that strucchange and segmented would be more suitable for my purpose. I had a look at both. Strucchange can find a jump in a time series, which was what I was looking for. In contrast segmented is more suitable f...


Contribute to The R Journal with LyX/knitr

February 17, 2013

(This paragraph is pure rant; feel free to skip it) I have been looking forward to the one-column LaTeX style of The R Journal, and it has arrived eventually. Last time I mentioned "it does not make sense to sell the cooked shrimps"; actually there is ...


Gist for previous posts

February 17, 2013

The more I use it, the more I understand the benefits and value of GitHub as a code-sharing resource. The gist found here is the R code for my posts on run scoring trends by league (found here, here, and here). I will continue to use GitHub for t...


Interactive stage-structured population model

February 16, 2013

This is an example of interfacing R and shiny to allow users to explore a biological model often encountered in an introductory ecology class. We are interested in the growth of a population that is composed of multiple, discrete stages or age classes. Patrick H. Leslie provides an in-depth derivation of the model in his 1945 paper “On the...


Finding outliers in numerical data


One of the topics emphasized in Exploring Data in Engineering, the Sciences and Medicine is the damage outliers can do to traditional data characterizations. Consequently, one of the procedures to be included in the ExploringData package is FindOutliers, described in this post. Given a vector of numeric values, this procedure supports four different methods for identifying possible outliers. Before...


Some of Excel’s Finance Functions in R

February 16, 2013

Last year I took a free online class on finance by Gautam Kaul. I recommend it, although I have no other classes to compare it to. The instructor made a great effort to motivate the concepts, structure the material, and enable critical thinking and intuition. I believe this is an advantage of video...


digest 0.6.3

February 16, 2013

digest version 0.6.3 is now on CRAN, and I'll upload the Debian package in a minute. This is a minor bug release regarding just the recently-added sha512 support. Turns out the wrong initial buffer size was used on the R side. Hannes fixed that with...


Google Statistician uses R and other programming tools

February 16, 2013

A great interview on the Simply Statistics blog with Google's Nick Chamandy, who holds a PhD in statistics. He explains that he mainly uses R, among other tools, in his work at Google. Also of note is the active data science community within Google ...


More Haskell: a bootstrap

February 16, 2013

So my playing around with Haskell goes on. You can follow the progress of the little bootstrap exercise on github. Now it’s gotten to the point where it actually does a bootstrap interval for the mean of a sample. Consider the following R script: 10.31 2.5% 97.5% 9.72475 10.85200 So, that was a simple


Market Filter Back Test Shiny web application

February 15, 2013

Today, I want to share the Market Filter Back Test application (code at GitHub). This is the fourth application in the series of examples (I plan to share 5 examples) that will demonstrate the amazing Shiny framework and Systematic Investor Toolbox to analyze stocks, run back-tests, and create summary reports. The motivation for this series


Video: Data Mining with R

February 15, 2013

Yesterday's Introduction to R for Data Mining webinar was a record setter, with more than 2000 registrants and more than 700 attending the live session presented by Joe Rickert. If you missed it, I've embedded the video replay below, and Joe's slides (with links to many useful resources) are also available. During the webinar, Joe demoed several examples of...


Incorporating Preference Construction into the Choice Modeling Process

February 15, 2013

Statistical modeling often begins with the response generation process because data analysis is a combination of mathematics and substantive theory. It is a theory of how things work that determines how we ought to collect and analyze...


Clustering Loss Development Factors

February 15, 2013

Anytime I get a new hammer, I waste no time in trying to find something to bash with it. Prior to last year, I wouldn’t have known what a cluster was, other than the first half of a slang term used to describe a poor decision-making process. Now I’ve seen it in action a


Zurich, Feb 2013 – Basic R Course

February 15, 2013

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers)
