# Wildfires Comparison with ggplot2 dual Y-axis and Forecasting with KNN

**DataGeeek**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In recent days, Turkey has been escalated wildfires. The government and the people were In recent days, Turkey has been escalated wildfires. The government and the people were on the alert and mobilized to put out the fires but, they couldn’t for days. A significant part of people thought that the simultaneous fires had no make sense, and there was probably some kind of terrorist actions beyond these fires. But there had also been wildfires in the other parts of the world at the same time. Hence, I have just decided to build a function to compare the numbers.

The countries that we will compare are Greece, Italy, and Turkey, which are all in the same Mediterranean climate zone and had brutal damage of fires. We will compare the hectares(ha) of damaged areas and the number of fires. The data set has annual observations between the 2009-2021 periods. Of course, the data for this year had to be taken until 20 august based on the EFFIS database; but it doesn’t change the underlying concept of this article.

library(tidyverse) library(tsfknn) library(forecast) df <- read_excel("fires_statistics.xlsx")

For comparison purposes, we will use the second y-axis in our plot. We have to use transformation rate to create a second axis because the ** sec.axis** function, which we are going to use for building the second y-axis, can not build an independent y-axis; it creates related to the main y-axis via some mathematical transformations. The transformation rate could be taken as approximate

*.*

**max(y1)/max(y2)**fun <- function(country="Greece"){ df %>% .[.$country==country,] %>% ggplot(aes(x=year))+ geom_bar(aes(y=burnt_areas_ha),stat = "identity",fill="orange")+ #Bar labels(burnt_areas_ha) geom_text(aes(y=burnt_areas_ha,label=burnt_areas_ha),vjust=-0.2,color="orange")+ #multiplied by transformation rate to tune in with the main y-axis(Burnt Areas) geom_line(aes(y=fires_number*100),size=2,color="lightblue")+ #Line labels(fires_number) geom_text(aes(y=fires_number,label=fires_number),vjust=1,color="lightblue")+ scale_x_continuous(breaks = seq(2009,2021,1))+ scale_y_continuous( #for the main y-axis labels = scales::label_number_si(), breaks = scales::pretty_breaks(n=7), #for the second y-axis sec.axis = sec_axis(~./100,#divided by transformation rate, in order to be represented based on the first y-axis name = "Number of Fires", breaks = scales::pretty_breaks(7)), )+ xlab("")+ylab("Burnt Areas(ha)")+ ggtitle(paste("The comparison of the volume(ha) and numbers of Forest Fires in", country))+ theme_minimal()+ theme( #Main y-axis axis.title.y = element_text(color = "orange", #puts distance with the main axis labels margin = margin(r=0.5,unit = "cm"), hjust = 1),#sets the main axis title top left #Second y-axis axis.title.y.right = element_text(color = "lightblue", #puts distance with the second axis labels margin = margin(l=0.5,unit = "cm"), hjust = 0.01),#sets the second axis title bottom right #adjusts the date labels to avoid overlap line numbers axis.text.x = element_text(angle = 30, vjust = 1, hjust = 1), panel.grid.minor = element_blank(),#removes the minor grid of panel plot.title=element_text(face = "bold.italic",hjust = 0.5) ) } fun("Turkey")

fun("Italy")

When we take a look above plots, the damaged areas have been quite increased compared to the past years for all countries. Probably the underlying reason, that confused Turkish people was the high ratio of the damaged areas to the number of fires. The other two countries also seem to have the same problem, compared to the past years. Undoubtedly, the reason beyond that is global warming. Of course, the insufficient treats of the governments to the fires had also played a part in the issue.

Now let’s forecast the number of fires this year for determining if there is an anomaly in Turkey. Because our data set is too small, it would be convenient to choose the k-nearest neighbor(KNN) algorithm. We will set ** k **to 3 and 4 for optimization because the squared root of the number of observations is approximately 3,6.

Since the PACF order of the pattern are zero, we set the ** lags** parameter as 1 to 5. In order to determine the order of PACF of the model, we first check the stationary of the data.

#Checking the pattern if it is stationary df %>% filter(country=="Turkey") %>% ggplot(aes(x=year,y=fires_number))+ geom_line(size=1)+ xlab("")+ylab("")+ theme_minimal()

There seems to be exponential growth in the above graph. Hence, we will take the first difference of the data to stabilize the mean.

#Making the time series stationary and determining the order of ACF and PACF df %>% filter(country=="Turkey") %>% select(fires_number) %>% pull() %>% #converts the dataframe column to a vector ts(start = 2009,frequency = 1) %>% #converts vector to a time series diff() %>% #gets first difference of the time series ggtsdisplay()# gets ACF/PACF plots

It seems the mean is stabilized, so we don’t have to take a second difference. To the PACF plot, there is no significant spike beyond the limits; that was why we took the order of PACF as zero.

#Forecasting with KNN df %>% filter(country=="Turkey") %>% select(fires_number) %>% pull() %>% #converts the dataframe column to a vector head(-1) %>% #removes the last observation for prediction ts(start = 2009,frequency = 1) %>% #converts the vector to a time series knn_forecasting(h=1,lags = 1:5, k=c(3,4)) %>% #.$pred (gets prediction value for 2021, which is 424.575) plot() #actual value(the last observation) points(x = 2021, y = last(df$fires_number[df$country=="Turkey"]), pch = 16, col="blue") legend("topleft", legend = c("Predicted Value","Actual Value"), col = c("red","blue"), pch = c(16,16) )

When we analyze the plot, we see that the predicted value for this year is quite higher than the current value, which shows the speculation about the number of fires in Turkey seems to be baseless.

**leave a comment**for the author, please follow the link and comment on their blog:

**DataGeeek**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.