[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The National Hurricane Center (NHC) collects datasets with all  storms in North Atlantic, the North Atlantic Hurricane Database (HURDAT). For all sorms, we have the location of the storm, every six jours (at midnight, six a.m., noon and six p.m.). Note that we have also the date, the maximal wind speed – on a 6 hour window – and the pressure in the eye of the storm.

It is possible to run the following function

library(XML)
extract.track=function(year=2012,p=TRUE){

First, we need to extract the list of all storms

loc <- paste("http://weather.unisys.com/hurricane/atlantic/",year,"/index.php",sep="")
storms <- unlist(strsplit(as.character(
tabs[[1]]$Name),split=" ")) index <- storms%in%c("Tropical","Storm",paste("Hurricane-",1:6,sep=""),"Depression","Subtropical","Extratropical","Low",paste("Storm-",1:6,sep=""), "Xxx") nstorms <- storms[!index] and then, for all storms, we go to the appropriate page, and we extract the dataset, TRACK=NULL for(i in length(nstorms):1){ loc=paste("http://weather.unisys.com/hurricane/atlantic/",year,"/",nstorms[i],"/track.dat",sep="") track=read.fwf(loc,skip=3,widths = c(4,6,8,12,4,6,20)) names(track)=c("ADV","LAT","LON","TIME","WIND","PR","STAT") track$LAT=as.numeric(as.character(track$LAT)) track$LON=as.numeric(as.character(track$LON)) track$WIND=as.numeric(as.character(track$WIND)) track$PR=as.numeric(as.character(track$PR)) track$year=year
track$name=nstorms[i] TRACK=rbind(TRACK,track) if(p==TRUE){ cat(year,i,nstorms[i],nrow(track),"n")}} return(TRACK)} For instance, for 2012, we have > m=extract.track(2012) 2012 19 TONY 20 2012 18 SANDY 45 2012 17 RAFAEL 56 2012 16 PATTY 11 2012 15 OSCAR 13 2012 14 NADINE 96 2012 13 MICHAEL 42 2012 12 LESLIE 60 2012 11 KIRK 22 2012 10 JOYCE 12 2012 9 ISAAC 51 2012 8 GORDON 26 2012 7 HELENE 38 2012 6 FLORENCE 21 2012 5 ERNESTO 39 2012 4 DEBBY 18 2012 3 CHRIS 31 2012 2 BERYL 33 2012 1 ALBERTO 20 > head(m) ADV LAT LON TIME WIND PR STAT year name 1 1 20.1 -50.8 10/21/18Z 25 1011 LOW 2012 TONY 2 2 20.4 -51.2 10/22/00Z 25 1011 LOW 2012 TONY 3 3 20.8 -51.5 10/22/06Z 25 1010 LOW 2012 TONY 4 4 21.3 -51.7 10/22/12Z 30 1009 LOW 2012 TONY With a simple loop we can extract all the years (note some manual corrections are necessary, since there are about ten typos in the lists, on the hurricane names). TOTTRACK=NULL for(y in 2012:1851){ TOTTRACK=rbind(TOTTRACK,extract.track(y)) } We can visualize all those trajectories, library(maps) map("world",xlim=c(-80,-40),ylim=c(10,50),col="light yellow",fill=TRUE) for(n in unique(TOTTRACK$name)){
lines(TOTTRACK$LON[TOTTRACK$name==n],TOTTRACK$LAT[TOTTRACK$name==n],lwd=.5,col="red")}

If we look at the distribution of those points, we have

library(ks)
U=TOTTRACK[,c("LON","LAT")]
U=U[!is.na(U$LON),] H=diag(c(.2,.2)) fat=kde(U,H,xmin=c(min(U[,1]),min(U[,2])),xmax=c(max(U[,1]),max(U[,2]))) z=fat$estimate
image(z)
map("world",add=TRUE)

Almost a year ago, Christophe Denuse-Baillon – in his actuarial thesis – suggested to use a Markov space process.  I do not want to fit a parametric model, but for all locations, it is possible to use a nonparametric approach. For all locations, on a given grid, it is possible to get all past movements (if any). To be more specific, given a location (a truncated latitude and a
truncated longitude), let us extract two informations,

• the proportion of trajectories which ended at this location
• for all trajectories that did not end at this location, we can extract location 6 hours lated, after being at this location.

The code will be

mtransition=function(i,j){
MOVEMENT=NA
sumstop=0
p=1
idx=which( (TOTTRACK$LON>=gridx[i])&(TOTTRACK$LON=gridy[j])&(TOTTRACK$LAT0){ MOVEMENT=data.frame(LON=rep(NA,length(idx)),LAT=rep(NA,length(idx)),D=rep(NA,length(idx))) for(s in 1:length(idx)){ stops=TRUE if((is.na(TOTTRACK$name[idx[s]+1])==FALSE)&(is.na(TOTTRACK$name[idx[s]])==FALSE)){ stops=TOTTRACK$name[idx[s]+1]!=TOTTRACK$name[idx[s]]} if(stops==TRUE ){sumstop=sumstop+1} if(stops==FALSE){ d=(TOTTRACK$LON[idx[s]+1]-TOTTRACK$LON[idx[s]])^2+(TOTTRACK$LAT[idx[s]+1]-TOTTRACK$LAT[idx[s]])^2 locx=locy=NA if((d<90)& TOTTRACK$LON[idx[s]+1]<0){
locx=floor(TOTTRACK$LON[idx[s]+1]/pasgrid) locy=floor(TOTTRACK$LAT[idx[s]+1]/pasgrid)
}
MOVEMENT[s,]=c(locx,locy,d)
}}
p=sumstop/length(idx)}
return(list(listemouv=MOVEMENT,probstop=p))}

For convenience, for all values on a grid, let us store all those values in a big list

ListTransMat=list()
ListStopMat =list()
for(i in 1:(length(gridx)-1)){
for(j in 1:(length(gridy)-1)){
nom=abs(gridx[i])*1000+abs(gridy[j])
M=matricetransition(i,j)
ListTransMat[[nom]] = M$listemouv ListStopMat[[nom]] = M$probstop
}}

Now we can start some code to generate a trajectory. First, we need a starting point, from one we did observe.

n=nrow(TOTTRACK)
idx=which(TOTTRACK$name[1:(n-1)]!=TOTTRACK$name[2:n])
STARTINGVALUE=TOTTRACK[c language="(1,idx+1),c("LON","LAT")"][/c]

If we plot them on the map above, we have

abline(v=gridx,col="white",lwd=.5)
abline(h=gridy,col="white",lwd=.5)
points(STARTINGVALUE$LON,STARTINGVALUE$LAT,
pch=19,col="blue")

From that location we look at possible trajectories, and then, we generate two random variables. One to see if we move, or if the trajectory ended where we were. And in the case we move, we draw
randomly among past trajectories. The idea is the following. Start at some specific location

and  then, using the previous function, look were you might be, 6 hours later,

Draw one possible location, from those observe previously (note that we might reamin in the same square)

sim.mouv=function(LOC){
lon=LOC[1]
lat=LOC[2]
nom=as.numeric(abs(lon)*1000+abs(lat))
M=ListTransMat[[nom]]
M=M[!is.na(M\$LON),]
stop=ListStopMat[[nom]]
u=runif(1)
if(u<=stop) LOC2=NA
if(u>stop) LOC2=M[sample(1:nrow(M),size=1),c("LON","LAT")]
return(LOC2)
}

And then, from that new location

we move again (according to some previous trajectory, observed in the past)

Actually, keep in mind that it might also be possible to stop here. The code is – more or less – the following

sim.traj=function(){
i=sample(1:nrow(STARTINGVALUE),size=1)
LOC=round(STARTINGVALUE[i,])
MLOC=LOC
stop=FALSE
while(stop==FALSE){
LOC2=sim.mouv(LOC)
if(is.na(LOC2))  stop=TRUE
if(!is.na(LOC2)) LOC=LOC2; MLOC=rbind(MLOC,LOC)
}
return(MLOC)
}

Below, we can visualize two possible trajectories, for hurricanes that we did generate,

Nice, isn’t it? Now we need to add additional information, about the strength of the hurricaine, again using past experience… But let’s keep that for another post!