To do this, I will predict the opening values of Bitcoin for the next 3 days.
The process I follow is based on the CRISP-DM methodology: https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome
1.- Planning the activities.
To plan the activities I use a spreadsheet. A sample is shown below; if you would like the document, please follow this link:
| Activity | Activity Description | Due Date | Activity Owner | Status | Comments |
|---|---|---|---|---|---|
| Functional Requirement Specification | A text document explaining the objectives of this project. | 4/19/2018 | Carlos Kassab | Done | |
| Get Data For Analysis | Get initial data to create the feasibility analysis. | 4/19/2018 | Carlos Kassab | Done | |
| ETL Development | ETL to get the final data for the next analysis. | 4/20/2018 | Carlos Kassab | Done | 2018/04/19: In this case no ETL is needed; the dataset was downloaded from Kaggle: https://www.kaggle.com/vivekchamp/bitcoin/data |
| Exploratory Data Analysis | Dataset summary and histogram to assess the normality of the data. | 4/20/2018 | Carlos Kassab | In progress | |
| Variables Frequency | Frequency of variable occurrence (frequency of value changes, etc.). | 4/20/2018 | Carlos Kassab | Done | 2018/04/19: We have already seen that our values change every day. |
| Outliers Analysis | Analysis of variability in numeric variables, shown in charts and grids. | 4/20/2018 | Carlos Kassab | In progress | |
| Time Series Decomposition | Getting metric charts, raw data, seasonality, trend and remainder. | 4/20/2018 | Carlos Kassab | In progress | |
| Modelling | Create the analytics model. | 4/25/2018 | Carlos Kassab | Not Started | |
| SQL View Development | For training, validation and testing. | NA | Carlos Kassab | Not Started | 2018/04/19: No SQL view needed; everything is done inside the R script. |
| Model Selection | Use a random parameter search algorithm to find the right model for this data. | 4/25/2018 | Carlos Kassab | Not Started | |
| Model Fine-Tuning | After finding the right algorithm, find the right model parameters to use. | 4/25/2018 | Carlos Kassab | Not Started | |
| Chart Development | Final data chart development. | 4/25/2018 | Carlos Kassab | Not Started | |
| Data Validation | Run the analytics model daily for at least 2 weeks in order to see its behavior. | NA | Carlos Kassab | Not Started | |
| Deployment | Schedule the automatic execution of the R code. | NA | Carlos Kassab | Not Started | |
The first activity is the functional specification, which is similar to the business understanding phase of the CRISP-DM methodology.
I use a text document for this. For this analysis, you can get the document from this link:
The next step is to get the data for analysis and create the ETL script. In this case we simply downloaded the data from kaggle.com, as mentioned in the documentation, so no ETL script was needed.
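As a rough illustration of what "no ETL" means in practice, loading a downloaded CSV in R can be as short as the sketch below. The column names (`Date`, `Open`, `High`, `Low`, `Close`), the ISO date format and the values are assumptions made up for this example, so check them against the actual Kaggle file before reusing it.

```r
# Write a tiny made-up CSV so the sketch is self-contained. With the real
# download you would point read.csv() at the file from Kaggle instead.
# Column names, date format and values are assumptions, not real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Open,High,Low,Close",
  "2018-04-19,102,111,99,108",
  "2018-04-17,100,110,95,105",
  "2018-04-18,105,112,101,102"
), csv_path)

# Read the file and convert the date column to a proper Date type.
btc <- read.csv(csv_path, stringsAsFactors = FALSE)
btc$Date <- as.Date(btc$Date, format = "%Y-%m-%d")

# Sort chronologically so later time series steps see the rows in order.
btc <- btc[order(btc$Date), ]

# Basic sanity checks before any analysis.
stopifnot(!anyNA(btc$Date), is.numeric(btc$Open))
str(btc)
```

Sorting by date up front matters because the time series functions used later assume the observations are in chronological order.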
Now that we have our data, we are going to analyze it by carrying out all the activities marked in yellow in the grid above. I did this in an R Markdown document; here you can see the HTML output:
Note: In the end, to keep everything together, I included the time series algorithms in the same Rmd file that creates the BitcoinDataAnalysis.html file.
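To give an idea of what such a file can contain, here is a minimal sketch in base R covering the planned analysis activities (summary, histogram, outliers, decomposition) and a 3-day forecast. It is not the code from the Rmd file: the data is synthetic, the weekly `frequency = 7` is an assumption, and the ARIMA(1,1,0) model is just a simple stand-in for whichever time series algorithms the final analysis uses.

```r
# Synthetic stand-in for the Kaggle data: these are NOT real Bitcoin prices.
set.seed(42)
n <- 224                                   # 32 full weeks of daily values
btc <- data.frame(
  Date = seq(as.Date("2017-09-01"), by = "day", length.out = n),
  Open = 5000 + cumsum(rnorm(n, mean = 0, sd = 50)) +
         100 * sin(2 * pi * seq_len(n) / 7)
)

# Exploratory data analysis: summary statistics and histogram counts.
print(summary(btc$Open))
open_hist <- hist(btc$Open, breaks = 20, plot = FALSE)  # plot = TRUE draws it

# Outlier analysis: values beyond the boxplot whiskers (1.5 * IQR rule).
outliers <- boxplot.stats(btc$Open)$out

# Time series decomposition into seasonal, trend and remainder components.
# frequency = 7 (a weekly cycle) is an assumption made for this sketch.
open_ts <- ts(btc$Open, frequency = 7)
components <- stl(open_ts, s.window = "periodic")$time.series

# Forecast the next 3 opening values with a simple ARIMA(1,1,0) model.
# The model choice is illustrative, not the one selected in the analysis.
fit <- arima(btc$Open, order = c(1, 1, 0))
fc <- predict(fit, n.ahead = 3)

# Collect point forecasts with approximate 95% intervals.
forecast_tbl <- data.frame(
  Date = max(btc$Date) + 1:3,
  PredictedOpen = as.numeric(fc$pred),
  Lower95 = as.numeric(fc$pred - 1.96 * fc$se),
  Upper95 = as.numeric(fc$pred + 1.96 * fc$se)
)
print(forecast_tbl)
```

The forecast intervals widen with each additional day, which is one reason to keep the prediction horizon as short as 3 days.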
You can get all the sources from:
More information about R: