Data Hacking with RDSTK 3

February 16, 2017
(This article was first published on R-exercises, and kindly contributed to R-bloggers)

RDSTK is a very versatile package. It includes functions that convert IP addresses to geolocations and derive statistics from those locations. It also lets you input a body of text and analyze its sentiment.

This is a continuation of the last exercise, RDSTK 2.
We are going to use the function that we created in our last exercise to derive statistics programmatically with the coordinates2statistics() function. Last week we talked about local and global variables; it is important to understand them before proceeding. Also refresh your memory of the ip2coordinates() function.

This package provides an R interface to Pete Warden’s Data Science Toolkit. For more information, click here.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
This week we will give you a bigger and badder list to work with. It's a list of more than a dozen proxy IP addresses from the internet. Run the code below.


list=c("97.77.104.22","104.199.228.65","50.93.204.169","107.189.46.5","104.154.142.10","104.131.255.12","209.212.253.44","70.248.28.23","52.119.20.75","192.169.168.15","47.88.31.75","107.178.4.109","152.160.35.171","104.236.54.196","50.93.197.102","159.203.117.1","206.125.41.132","50.93.201.28","8.21.67.248","104.28.16.199")

Exercise 2

Remember how we used iterators to run through each location and derive the stats with the ip2coordinates() function in the first RDSTK exercise? Let's do the same here. Store the results in df.
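One possible shape for that loop is sketched below. To keep the sketch runnable without the web service, the lookup function is passed in as an argument. The names collect_coordinates and stub_lookup are hypothetical, and the stub returns placeholder columns and values, not real RDSTK output. With the package loaded and the service reachable, you would pass ip2coordinates as the lookup instead.

```r
# Hypothetical helper: apply a lookup function to every IP and stack the rows.
collect_coordinates <- function(ips, lookup) {
  rows <- lapply(ips, function(ip) {
    # Return NULL for addresses the lookup fails on instead of aborting the run.
    tryCatch(lookup(ip), error = function(e) NULL)
  })
  # Drop the failed lookups, then bind the remaining one-row data frames.
  rows <- Filter(Negate(is.null), rows)
  do.call(rbind, rows)
}

# Stand-in for ip2coordinates(); placeholder values only (assumption).
stub_lookup <- function(ip) {
  if (!grepl("^[0-9.]+$", ip)) stop("malformed address: ", ip)
  data.frame(ip.address = ip, latitude = NA_real_, longitude = NA_real_,
             stringsAsFactors = FALSE)
}

demo_df <- collect_coordinates(c("97.77.104.22", "not-an-ip"), stub_lookup)
print(demo_df)
```

Wrapping each lookup in tryCatch() is a design choice: one bad address then costs a single row rather than the whole run.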

Exercise 3

If you came this far, great. Let's recall the function that we created in exercise 2. If you do not remember the function, here is the code for it. Run the code below and then run stat_maker("population_density"). You should see a new column called pop.

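library(RDSTK)  # provides coordinates2statistics()

# Look up statistic s2 for every row of the global df and store it in df$pop.
stat_maker <- function(s2) {
  s1 <- "statistics"
  s3 <- "value"
  s2 <- as.character(s2)
  for (i in seq_len(nrow(df))) {
    # Columns 3 and 6 of df are passed as the coordinates; the result
    # column is named "statistics.<s2>.value".
    df$pop[i] <<- coordinates2statistics(df[i, 3], df[i, 6], s2)[paste(s1, s2, s3, sep = ".")]
    # Example of creating a global variable from inside a function.
    assign("test2", 50, envir = .GlobalEnv)
  }
}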

Inside the function, paste() builds the column name in the format "statistics.<name>.value"; for the input "hello" it would be "statistics.hello.value".
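The name construction can be checked on its own, without calling the web service. This small sketch repeats the paste() call from stat_maker() for one statistic; the variable column_name is introduced here only for illustration.

```r
# Build the column name the same way stat_maker() does.
s2 <- "population_density"
column_name <- paste("statistics", s2, "value", sep = ".")
print(column_name)
# [1] "statistics.population_density.value"
```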

Exercise 4

Modify the function so that it accepts a string and creates a global variable that holds the values of that statistic. For example, if you input elevation, the function will create a global variable called elevation that stores the results from the for loop.
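The key ingredient is creating a variable whose name comes from a string. A minimal sketch using base R's assign() follows; store_global is a hypothetical helper name, and the numbers are placeholders rather than real elevation data.

```r
# Hypothetical helper: store `values` in the global environment under `name`.
store_global <- function(name, values) {
  assign(name, values, envir = .GlobalEnv)
  invisible(values)
}

store_global("elevation", c(10, 20, 30))  # placeholder numbers, not real data
print(get("elevation", envir = .GlobalEnv))
```

In the modified stat_maker(), the vector collected inside the for loop would take the place of the placeholder values.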

Exercise 5

Test out the function.


stat_maker("elevation")

Exercise 6

Test the function stat_maker by running stat_maker("population_density"). Notice that it did not explicitly change df; it just returned the result when you called the function. This is because we did not define df as a global variable. That's okay; we will learn how to do that later.

Exercise 7

Great. Now, before we modify our function, let's learn how to make a global variable inside a function. Use the same code from exercise 5, but this time, instead of defining df$pop2 as a local variable, define it as a global variable. Run the function and test it again.
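The difference between local and global assignment can be seen in a toy example that does not touch df at all. The names counter, local_update and global_update are made up for this sketch.

```r
counter <- 0

local_update <- function() {
  counter <- 1   # creates a new local variable; the outer one is untouched
  invisible(NULL)
}

global_update <- function() {
  counter <<- 2  # assigns to the existing variable in the enclosing environment
  invisible(NULL)
}

local_update()
after_local <- counter    # still 0
global_update()
after_global <- counter   # now 2
print(c(after_local, after_global))
```

The same applies to a data frame column: assigning to df$pop2 with <- inside a function changes only a local copy of df, while <<- changes the df defined outside the function.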

Exercise 8

Run the code

stat_maker("us_population_poverty")

Notice that our function does not work for this case. This is because any statistic with the prefix us_population returns a data frame whose value column is named statistics.us_population.value, regardless of the suffix.
So you need to modify the function a little to accommodate this.
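One way to handle the special case is to compute the column name in a small helper. The sketch below relies only on the naming rule described in this exercise, which has not been verified against the service here; stat_column is a hypothetical name.

```r
# Hypothetical helper: return the name of the value column for a statistic.
stat_column <- function(stat) {
  stat <- trimws(as.character(stat))  # guard against stray whitespace
  if (startsWith(stat, "us_population")) {
    # Assumption from the exercise text: all us_population_* statistics
    # share this column name.
    "statistics.us_population.value"
  } else {
    paste("statistics", stat, "value", sep = ".")
  }
}

print(stat_column("us_population_poverty"))
print(stat_column("elevation"))
```

Inside stat_maker(), the result of stat_column(s2) would replace the paste(s1, s2, s3, sep = ".") expression used to index the returned data frame.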

Exercise 9

Run the following commands. You can also use any other string starting with us_population with this function. The goal is to make global variables that hold this data. You can refer to the whole list of statistic functions at www.datasciencetoolkit.org

stat_maker("us_population")
stat_maker("us_population_poverty")
stat_maker("us_population_asian")
stat_maker("us_population_bachelors_degree")
stat_maker("us_population_black_or_african_american")
stat_maker("us_population_black_or_african_american_not_hispanic")
stat_maker("us_population_eighteen_to_twenty_four_years_old")
stat_maker("us_population_five_to_seventeen_years_old")
stat_maker("us_population_foreign_born")
stat_maker("us_population_hispanic_or_latino")

Exercise 10

Use the cbind command to bind all the global variables into df. Print the resulting df.

Note: You can choose to build this df in other ways, but this method was used to guide you through modifying functions, global/local variables and working with strings.
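A minimal sketch of the binding step is shown below, with placeholder vectors standing in for the global variables created by stat_maker(). The names demo_df, stat_names and combined are assumptions made for this illustration.

```r
# Placeholder data standing in for df and for the globals from stat_maker().
demo_df <- data.frame(ip = c("a", "b"), stringsAsFactors = FALSE)
elevation <- c(1, 2)
population_density <- c(3, 4)

stat_names <- c("elevation", "population_density")
# mget() looks the variables up by name and returns them as a named list.
stats <- mget(stat_names, envir = environment())
combined <- cbind(demo_df, as.data.frame(stats))
print(combined)
```

Looking the variables up by name with mget() avoids typing each global variable into the cbind() call by hand.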
