In this post we’re going to use the dataset obtained in our previous post to achieve the following goals:
- Try to answer some common questions about the nature of this data by writing custom R functions, and
- Make a nice animated GIF with it using R and the command line in Linux.
The whole script, if you want to reproduce this post completely, is here.
If you’re reading this post for the first time and don’t want to go back to the previous one to understand what data we are talking about, here is a brief summary:
The Pesos-Dollars dataset is a dataset containing the exchange rate ARS/USD from 2002-01-11 to 2018-06-05.
In our previous post we could spot some spikes in the plot of Exchange rate vs. Date.
These spikes mean that sudden changes in the price of the reference currency occurred. But how can we measure these changes?
Analyzing our data with a custom function in R
The first idea that comes to mind is to measure the percentage change in the price of the USD from one day to the next. This Daily Percentage Change (DPC) gives us an idea of how sudden a change in the price is: the larger its absolute value, the larger the daily variation.
So we need to compute the following for every pair of consecutive values in our dataset:
$$\mathrm{DPC}_i = \frac{P_{i+1} - P_i}{P_i} \times 100$$
where $P_i$ is the price on day $i$ and $P_{i+1}$ is the price on the day after.
We wrote the following R code that gives us the function to calculate the DPC using our dataset.
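The original function is not reproduced in this text, so here is a minimal sketch of what DPC() could look like. The argument names follow the call used in this post (input_data, index); the body is our assumption, not the original code. The toy data uses the prices for 2002-03-21 and 2002-03-22 as they appear in the top-ten listing in this post.

```r
# Hypothetical reconstruction of DPC(): the argument names match the call
# used in this post, but the body is an assumption.
DPC <- function(input_data, index = 2) {
  price <- input_data[[index]]   # column holding the exchange rate
  n <- length(price)
  # Percentage change from day i to day i + 1, relative to day i
  variation <- (price[-1] - price[-n]) / price[-n] * 100
  # The last row has no "day after", so it is dropped
  output <- input_data[-n, , drop = FALSE]
  output$variation <- variation
  output
}

# Toy check with two consecutive prices taken from this post
toy <- data.frame(
  fecha = as.Date(c("2002-03-21", "2002-03-22")),
  divisa_venta = c(2.38, 2.95)
)
toyDPC <- DPC(input_data = toy, index = 2)
print(toyDPC)  # one row, with a variation close to 23.95
```

Each row keeps the price of day i, and the variation column tells how much the price moved by the following day.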
The next thing to do is to run our function and plot the results:
# Run the function
dailyPercentageChange <- DPC(input_data = pesoDollarDataSet, index = 2)
# Choose the colors to plot
colores <- c("orange3", "steelblue4", "red", "yellowgreen", "gray20")
# Set the colors
palette(colores)
# Scatterplot
plot(x = dailyPercentageChange$fecha,
     y = dailyPercentageChange$variation,
     ylab = "Percentage Change (%)",
     xlab = "Date",
     main = "Daily percentage change (DPC) for the exchange rate ARS/USD \n in the period 2002-01-11 to 2018-06-05",
     cex.main = 1.0,
     type = "o",
     lty = 3,
     col = dailyPercentageChange$Presidencia,
     cex = 0.7)
# Add legend
legend("topleft",
       legend = levels(dailyPercentageChange$Presidencia),
       col = 1:5,
       lwd = c(3,3,3,3),
       cex = 0.6,
       box.lty = 0)
Now we can see that the sudden variations in the price of the USD are more evident in this plot. The next thing we can do is to ask ourselves the following questions:
- Which were the values of the top ten DPC increases during this period?
- During which presidencies did they occur?
The first question can be easily answered using the base R function order() to sort our dataframe and then indexing the result from rows 1 to 10:
topTenIncrease <- dailyPercentageChange[order(dailyPercentageChange$variation, decreasing = TRUE), ][1:10, ]
# Print
print(topTenIncrease)
          fecha divisa_venta Presidencia variation
3561 2015-12-16        9.826          MM 41.970283
45   2002-03-21        2.380          ED 23.949580
3    2002-01-15        1.700          ED 11.764706
3100 2014-01-22        7.140        CFK2  8.543417
4421 2018-05-02       21.200          MM  8.490566
20   2002-02-14        1.800          ED  8.333333
60   2002-04-17        2.870          ED  8.013937
4432 2018-05-13       23.260          MM  7.437661
46   2002-03-22        2.950          ED  6.779661
2    2002-01-14        1.600          ED  6.250000
Now, if we want to know during which presidencies these top ten DPCs occurred, we can simply use another base R function: table() gives us the answer.
> table(topTenIncrease$Presidencia)
CFK1 CFK2   ED   MM   NK
   0    1    6    3    0
Up to this point we know that, during this period, 6 of the top ten DPCs correspond to Duhalde’s presidency, 3 to Macri’s current presidency and 1 to Fernandez’s second term. Moreover, in the fourth column, named ‘variation’, we can see that these DPC values are widely spread, ranging from approximately 6% to 42%. So the next thing we can do to clarify these results is to count the values using a histogram.
hist(x = topTenIncrease$variation,
     ylab = "Frequency",
     xlab = "% DPC increase",
     col = "gray70",
     main = "")
From this histogram we can see that 7 of the top ten DPCs fall in the 0%-10% interval, one each in the 10%-20%, 20%-30% and 40%-50% intervals, and none in the 30%-40% interval. But again, we don’t know during which presidencies these highest and lowest variations occurred.
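The bin counts behind the histogram can also be checked without drawing anything. This is a small sketch, assuming the breaks are 0, 10, ..., 50 (the intervals used in the table of counts in this post); the variation values are copied from the top-ten listing above, and the variable names are ours.

```r
# Variation values of the top ten DPCs, copied from the listing in this post
variation <- c(41.970283, 23.949580, 11.764706, 8.543417, 8.490566,
               8.333333, 8.013937, 7.437661, 6.779661, 6.250000)

# Assumed breaks: 0, 10, 20, 30, 40, 50
# plot = FALSE returns the histogram object instead of drawing it
binCounts <- hist(variation, breaks = seq(0, 50, by = 10), plot = FALSE)$counts
print(binCounts)  # 7 1 1 0 1
```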
Now we need a new function that, given our dataset and the desired interval breaks, returns a new dataset with the counts for every president in every interval. We named this function countNumberInBreaks() and the script is the following:
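The original script is not reproduced in this text. The post calls countNumberInBreaks() without arguments, so the signature below (values, groups, breaks) is hypothetical, and the body is only a sketch of one way to get such counts with cut() and table(). The toy input is copied from the top-ten listing above.

```r
# Hypothetical reconstruction of countNumberInBreaks(); the argument names
# and the default breaks are assumptions.
countNumberInBreaks <- function(values, groups, breaks = seq(0, 50, by = 10)) {
  # Label each interval "lower-upper", e.g. "0-10"
  labels <- paste(breaks[-length(breaks)], breaks[-1], sep = "-")
  # Assign every value to its interval; intervals are closed on the right,
  # and the lowest break is included, as in hist()
  intervals <- cut(values, breaks = breaks, labels = labels,
                   include.lowest = TRUE)
  # Cross-tabulate intervals (rows) against groups (columns)
  as.data.frame.matrix(table(intervals, groups))
}

# Top ten DPCs and their presidencies, copied from this post
variation <- c(41.970283, 23.949580, 11.764706, 8.543417, 8.490566,
               8.333333, 8.013937, 7.437661, 6.779661, 6.250000)
presidencia <- factor(c("MM", "ED", "ED", "CFK2", "MM",
                        "ED", "ED", "MM", "ED", "ED"),
                      levels = c("CFK1", "CFK2", "ED", "MM", "NK"))

counts <- countNumberInBreaks(variation, presidencia)
print(counts)
```

Keeping all five factor levels means that presidencies without any top-ten DPC still get a column of zeros, which keeps the colors of the barplot aligned with the levels.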
The idea here is to plot something similar to the histogram, but as a stacked barplot of counts by president over the same intervals the histogram used. Let’s see how it looks.
# Run countNumberInBreaks()
topTenIncreaseCountedByPresidents <- countNumberInBreaks()
# Print the results
print(topTenIncreaseCountedByPresidents)
      CFK1 CFK2 ED MM NK
0-10     0    1  4  2  0
10-20    0    0  1  0  0
20-30    0    0  1  0  0
30-40    0    0  0  0  0
40-50    0    0  0  1  0
# Choose the colors
colores <- c("orange3", "steelblue4", "gray20", "red", "yellowgreen")
# Set the colors
palette(colores)
# Barplot
barplot(height = t(as.matrix(topTenIncreaseCountedByPresidents)),
        col = 1:5,
        ylab = "Frequency",
        xlab = "% DPC increase",
        border = NA)
# Add legend
legend("topright",
       legend = c("Macri", "Duhalde", "Fernandez 2"),
       # legend = levels(dailyPercentageChange$Presidencia),
       col = c(4,3,2),
       cex = 1,
       pch = 15,
       box.lty = 0)
From this plot we can now see how the top ten DPCs of the 2002-2018 period were distributed:
- During Duhalde’s government (from 2002-Jan-02 to 2003-May-25) we had 6 of the 10 highest DPCs in the period:
  - 4 of these DPC increases were in the 0%-10% range
  - 1 in the 10%-20% range
  - 1 in the 20%-30% range
- During Fernandez’s second term (from 2011-Dec-10 to 2015-Dec-10):
  - 1 DPC increase in the 0%-10% range
- During Macri’s government (from 2015-Dec-10 to 2018-Jun-05):
  - 2 DPC increases in the 0%-10% range
  - 1 DPC increase in the 40%-50% range
Well, that’s enough for now. We are not going to draw any conclusions or offer any interpretation; that can be your task, if you want. We also invite you to post other questions here, and maybe we can answer them.
Making an animated plot
Now, to finish this post, we are going to make an animated .GIF file with ImageMagick from the terminal in Linux.
The idea behind the animation is to generate a set of .png or .jpeg files using a loop in R, save these individual files (think of them as film frames) and then, from the terminal, use ImageMagick’s convert to assemble the animated plot.
So to start we write a function in R called plotPNGs() that is basically a loop that saves multiple plots locally; these will be used later with convert to create the .GIF file.
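The original plotPNGs() is not reproduced in this text, so the following is only a sketch of how such a loop could look. The signature follows the call used in this post (input data, steps, outputPath); the helper frameEnds(), the file-name pattern and the plot settings are our assumptions.

```r
# Hypothetical helper: index of the last row shown in each frame.
# Frames end at steps, 2 * steps, ... and the final frame always ends at n.
frameEnds <- function(n, steps) {
  stopifnot(n > 0, steps > 0)
  unique(c(seq_len(n %/% steps) * steps, n))
}

# Hypothetical reconstruction of plotPNGs(): one PNG per frame, each frame
# showing the series from the first row up to the frame's last row.
plotPNGs <- function(input_data, steps = 23, outputPath = "plot1/") {
  ends <- frameEnds(nrow(input_data), steps)
  for (k in seq_along(ends)) {
    rows <- seq_len(ends[k])
    # Zero-padded names keep the frames in order when the shell expands *.png
    png(file.path(outputPath, sprintf("frame_%04d.png", k)),
        width = 800, height = 600)
    plot(x = input_data$fecha[rows],
         y = input_data$variation[rows],
         xlim = range(input_data$fecha),      # fixed axes so frames line up
         ylim = range(input_data$variation),
         type = "o", lty = 3, cex = 0.7,
         xlab = "Date", ylab = "Percentage Change (%)")
    dev.off()
  }
  invisible(length(ends))  # number of frames written
}

# With 100 rows and steps = 23 the frames end at rows 23, 46, 69, 92 and 100
print(frameEnds(100, 23))
```

The number of frames is the number of rows divided by steps, rounded up, so a larger steps value gives a shorter animation.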
Then, we create a directory “plot1” to save the files and run the function:
> plotPNGs(dailyPercentageChange, steps = 23, outputPath = "plot1/")
We chose steps = 23 because we decided to use approximately 200 frames in our animation, so 23 comes from dividing the number of rows in our dataset by 200 and rounding up. This gives us 194 .PNG files. Finally, we go to the directory where the files were created and use convert to make the .GIF file.
convert *.png gifPlot1.gif
We repeat this process with another set of data and we get:
And now we have two animated plots that show the evolution of the ARS/USD exchange rate over the period considered, where we can clearly see the sudden changes in its value.