Hard drive occupation prediction with R – part 2 – Getting the probability distribution

In the first article, we saw a quick-and-dirty method to predict disk space exhaustion when the usage pattern is rigorously linear. We did that by importing our data into R and fitting a linear regression. In this article we will see the problems with that method, and deploy a more robust solution. Besides robustness, we will also see how we can generate...

Volcanic Solar Dimming, ENSO and Temperature Anomalies

January 21, 2011

In previous posts I have shown plots of global temperature anomaly, volcano and Nino34 trends (here, here). In this post, I want to further explore the role of volcanic eruptions and Nino34 phases (El Nino, La Nina) on …

Learning R through baseball: sab-R-metrics

January 21, 2011

The words "statistics" and "baseball" are often found near each other, but there's a lot more to statistics than dividing the number of hits by the number of swings to get a batting average. And there's a lot more to sabermetrics -- the statistical analysis of baseball -- than averages, too. Many baseball fans are also stats geeks (and...

Embedding a time series with time delay in R

January 21, 2011

I’ve recently been looking at Martin Trauth’s book MATLAB® Recipes for Earth Sciences to try to understand what some of my palaeoceanography colleagues are doing with their data analyses (lots of frequency domain time series techniques and a preponderance of …

Relationship Between SAT & College Retention

January 21, 2011

Here is a quick analysis of the relationship between SAT score and student retention. The data is from the Integrated Postsecondary Education Data System (IPEDS) and was analyzed using R. This was a quick analysis, and I would be careful about drawing any strong conclusions. The source for running this analysis along with some additional graphics that

Interesting volatility measurement, part 2

January 21, 2011

A few weeks ago I mentioned an interesting volatility prediction. It is based on two periods of historical volatility (standard deviation). The remaining question was – does it really work? I could not give the answer, because I didn’t have VIX futures data at that time. Later on, I was contacted by Brian

Model for nothing – and the bootstrap for free

January 21, 2011

Reconstructing phylogenies is an interesting task, sadly one that often requires navigating between a multitude of software packages. To add an unnecessary layer of complexity to the whole thing, most of these programs speak different languages, and require the user to do endless conversions from fasta to phylip to nexus to whatever new format they

Disable auto-update from R (Windows)

January 21, 2011

There are two major threats to complex MCMC estimations: wrong energy settings (hibernate after 2 hours of inactivity) and Automatic Updates (install updates at 3 a.m.). I thought about the latter threat. At times, you may hand some R code to other co-workers,...

Ultraedit to R

January 21, 2011

My favorite text editor on Windows is Ultraedit, but it does not have a nice interface to R in the same vein as Emacs/ESS, Tinn-R, or Eclipse. (I have never used Eclipse.) Ultraedit is powerful enough to submit whole R programs...

Stop your figures jumping about in odfWeave

January 21, 2011

If you use odfWeave to produce figures, you will probably find they jump about when scrolling through the document – because the figures and figure frames are anchored in openoffice to the paragraph and not “as character”. The only way to fix this in a finished document is to right-click on the figures and select

How do you explain reproducible research to clients?

January 21, 2011

Most of the statistics work I do now is reproducible research – this can offer a big advantage for clients but of course that doesn’t necessarily mean they realise it … Below is a text we have been pasting in at the bottom of the source documents (and which therefore appears in the pdf’s) to

Inconsistencies in Bayesian Models of Decision-Making

January 20, 2011

But modeling devices that make sense for an unbiased decisionmaker may not make sense for a biased one. For example, why would individuals have priors and posteriors if they are destined to apply Bayes’ law incorrectly? A question I often ask myself. Wolfgang Pesendorfer: Behavioral Economics Comes of Age: A Review Essay on Advances

Trip to Lyon

January 20, 2011

This was my first trip to Lyon in about… 35 years, I think, but I did not have much time to tour the city! My original plan was to go climbing with Ivan near La Meije right after the talk, but our respective knees had been hurting for the past week at least (since Utah in

Will I ever be a bayesian statistician ? (part 1)

January 20, 2011

Last week, during the workshop on Statistical Methods for Meteorology and Climate Change (here), I discovered how powerful Bayesian techniques could be, and that there were more and more Bayesian statisticians. So, if I was to fully understand app...

Bad kitty!

January 20, 2011

The cat function bugs me a little. There are two quirks in particular that I find irritating on the occasions that I use it. Firstly, almost everything that I want displayed onscreen, I want on its own line.

> cat("cat messes up my command prompt position")
cat messes up my command prompt position>

So it would

sab-R-metrics: Intermediate Boxplots and Histograms

January 20, 2011

Last week, I began talking about using the base graphics in R. Those graphics were pretty bland, and my hope for the next two posts is to introduce some interesting additions to the basic graphics that come from R: color, legends, lines, shapes, multiple graphs side-by-side, text, point types, and custom axes. If you have missed...

Call for proposals for writing a book about R (via Chapman & Hall/CRC)

January 20, 2011

Rob Calver wrote an interesting invitation on the R mailing list today, inviting potential authors to submit their vision of the next great book about R. The announcement originated from the Chapman & Hall/CRC publishing houses, backed up by an impressive team of R celebrities, chosen as the editors of this new R books series,

40 Fascinating Blogs for the Ultimate Statistics Geek!

January 20, 2011

I am happy to report that ByteMining is listed on “40 Fascinating Blogs for the Ultimate Statistics Geek”! Some of the ones that I frequently read, or are written by Twitter friends/followers (in no particular order): R-bloggers, an aggregate site containing blog posts tagged as posts about R. High quality content. Statistical modeling, causal inference and social science. This one is...

Submit your talks for the R user conference

January 19, 2011

useR! 2011, the annual R user conference supported by the R Foundation for Statistical Computing, will be held this year in the United Kingdom at the University of Warwick (which is located, oddly enough, in Coventry). Last year's conference featured dozens of presentations on R's use in pretty much every domain of analysis, science and research. There were talks...

The ultimate exam excuse…

January 19, 2011

A few days after my R exam (available in French here), I received this email from the department secretary: For your information, this student came to see me on Thursday evening, distraught because he had just noticed that his “PILOT Frixion” pen, a pen apparently recommended by some teachers, did not merely erase the page on his

pgfSweave 1.1.3 and Beyond

January 19, 2011

pgfSweave 1.1.3 is now on CRAN. This release adds a few new features and fixes some bugs: the pesky Rplots.pdf file is not generated anymore; brand new vignette by Hans Ekbrand on the use of pgfSweave with large data sets based on this site; new example on using caching by Yihui Xie; reusing code chunks

RcppBDT 0.1.0

January 18, 2011

The family of Rcpp packages just grew by one: the first 0.1.0 release of RcppBDT is now on CRAN. RcppBDT stands for Rcpp Boost Date_Time. It employs what we call Rcpp modules: a mechanism which provides easier ways to expose C++ functions and classe...

A simple test to predict coronary artery disease

January 18, 2011

Coronary artery disease (CAD) results in blockages to the blood vessels that supply the heart and, if left untreated, can lead to heart attacks and even death. In fact, CAD is the leading cause of death in North America and many other countries. It's important to detect CAD as soon as possible, to improve the chances of a successful...
