Basic Tree 1 Exercises

December 9, 2016

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

Using the knowledge you acquired in the previous exercises on sampling and selecting (here), we will now go through an entire data analysis process. You will lean on what you already know to solve the problems. Don’t worry. It might look intimidating, but follow the sequence and you will see that modeling a decision tree is the best decision you made today. We will take you through all stages of the data pipeline: data loading, feature selection, sampling, plotting, modeling, and evaluating a decision tree.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Use the read.csv() command to load the lenses.csv data and store it in lens. Use the str() command to inspect lens. Download the dataset from here.
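As a rough sketch of the loading step, the snippet below reads a small stand-in file instead of the real lenses.csv. The file contents and the name toy are made up for illustration. It passes header = FALSE on the assumption that the first line of the file is data; whether lenses.csv needs that argument depends on how the file is laid out.

```r
# Write a tiny header-less CSV to a temporary file (made-up stand-in data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,1,1,1,1", "2,1,1,1,2", "3,2,2,2,1"), tmp)

# header = FALSE tells read.csv() that the first line is data, not column names,
# so R generates the default names V1, V2, ...
toy <- read.csv(tmp, header = FALSE)

# str() prints the number of rows and columns and the type of each column.
str(toy)

# Remove the temporary file again.
unlink(tmp)
```

With the default header = TRUE, read.csv() would instead use the first line of the file as the column names.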

Exercise 2
Notice that there are no column names. The column names are as follows:
index, age, spec_pres, astigmatic, tpr. Use one line of code to change the column names to these names.
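One common way to rename every column at once is to assign a character vector to names(). The sketch below shows the idea on a made-up three-column data frame called toy, not on the lens data itself.

```r
# Made-up data frame with R's default column names.
toy <- data.frame(V1 = 1:3, V2 = c(1, 2, 3), V3 = c(2, 1, 2))

# Assigning a character vector to names() renames all columns in one line.
# The vector must have one entry per column.
names(toy) <- c("index", "age", "spec_pres")

names(toy)
```

colnames() can be used the same way on a data frame.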

Exercise 3
Given the metadata:

age: (1) young, (2) pre-presbyopic, (3) presbyopic
spec_pres: (1) myope, (2) hypermetrope
astigmatic: (1) no, (2) yes
tpr: (1) reduced, (2) normal
class: (1) patient needs hard contact lens, (2) patient needs soft contact lens, (3) patient does not need contact lens

Type the code lens$age[lens$age == "1"]="young"
Use the same format to replace every numeric code in the age and spec_pres variables with its name.

Exercise 4
Use the str() command to see the changes. Also notice that the astigmatic column is a factor whose levels are numbers stored as characters. To get all the columns in the same format, let’s convert it to character. Use the as.character() function to convert this column to the character data type.
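To illustrate the conversion, here is a small sketch on made-up factors (f and f2 are hypothetical names, not columns of lens). It also shows why as.character() is the safe choice: as.integer() on a factor returns the internal level codes, not the labels. Note that since R 4.0.0 read.csv() defaults to stringsAsFactors = FALSE, so on a recent R version the column may already be character rather than factor.

```r
# A factor whose labels happen to look like numbers.
f <- factor(c("1", "2", "2", "1"))
is.factor(f)

# as.character() returns the labels as plain strings.
ch <- as.character(f)
ch

# Pitfall: as.integer() gives the level codes, not the original labels.
f2 <- factor(c("10", "20", "10"))
codes <- as.integer(f2)      # level codes: 1 2 1
labels <- as.character(f2)   # labels: "10" "20" "10"
```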

Exercise 5
Now change the values in the astigmatic column to the right names.

Exercise 6
Use the following code to replace the 1 with “reduced” in the tpr column:

lens$tpr[lens$tpr==1]="reduced"

Now type str(lens) to see the data frame. Notice that the tpr column’s data type changed from integer to character. Any time you assign something that is not a number into a numeric column, the whole column becomes character.
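The type change can be seen in isolation on a plain vector. This is a minimal sketch with a made-up integer vector named codes; the same coercion happens to a data frame column.

```r
# A made-up integer vector.
codes <- c(1L, 2L, 1L)
before <- class(codes)   # "integer"

# Assigning a string into some positions forces the whole vector to character,
# because an atomic vector can hold only one type.
codes[codes == 1] <- "reduced"
after <- class(codes)    # "character"

# The untouched value 2 is now the string "2".
codes
```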

Exercise 7
Go ahead and replace 2 in the tpr column with “normal”.

Exercise 8
Use the table() command to see the counts of each value.
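As a quick sketch of what table() reports, the example below counts the values in a made-up column that contains one stray entry. The data frame toy and its values are invented for illustration.

```r
# Made-up column with one suspicious value ("n0" instead of "no").
toy <- data.frame(answer = c("no", "yes", "no", "n0", "yes"))

# table() counts how often each distinct value occurs.
counts <- table(toy$answer)
counts
```

A value that shows up only once among otherwise repeated categories is often worth a second look.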

Exercise 9
Notice that there is a g in the counts. That could possibly be a typo. We can go ahead and remove that row, since there is only one row with that typo. Hint: You can select all rows that do not have that typo and store the result back in the lens data frame.
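The hint describes logical row indexing. Here is a minimal sketch of that pattern on a made-up data frame, where toy, answer, and the stray value "n0" are all hypothetical.

```r
# Made-up data frame with one bad row.
toy <- data.frame(id = 1:4, answer = c("no", "yes", "n0", "yes"))

# Keep only the rows where the condition is TRUE.
# The comma after the condition means "all columns".
toy <- toy[toy$answer != "n0", ]

toy
```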

Exercise 10
Great work. The index column is not necessary for our modeling purposes, so let’s remove it.
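There are several ways to drop a column in base R. The sketch below shows one of them on a made-up data frame named toy; it is not necessarily the approach used on the solutions page.

```r
# Made-up data frame with an index column we do not need.
toy <- data.frame(index = 1:3, age = c("young", "young", "presbyopic"))

# Assigning NULL to a column removes it from the data frame.
toy$index <- NULL

# An equivalent alternative that selects the columns to keep by name:
# toy <- toy[, names(toy) != "index", drop = FALSE]

names(toy)
```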
