R Weekly Bulletin Vol – IV

April 15, 2017
By

(This article was first published on R programming, and kindly contributed to R-bloggers)

This week’s R bulletin will cover topics like removing duplicate rows, finding row number, sorting a data frame in the same order, sorting a data frame in different order, and creating two tabs in an excel workbook. We will also cover functions like Sys.time, julian, and the second & minute function from the lubridate package. Hope you like this R weekly bulletin. Enjoy reading!

Shortcut Keys

1. To open a file in R – Ctrl+O
2. To open a new R script – Ctrl+Shift+N
3. To save an R script – Ctrl+S

Problem Solving Ideas

Remove duplicate rows in R

Consider the following data frame “df” comprising of 3 rows of a 1-minute OHC of a stock.

Example:

open = c(221, 221.25, 221.25) 
high = c(221.75, 221.45, 221.45) 
close = c(221.35, 221.2, 221.2) 
df = data.frame(open, high, close) 
print(df)

As can be seen from the output, the 2nd and the 3rd rows are duplicates. If we want to retain only the unique rows, we can use the duplicated function in the following manner:

df[!duplicated(df), ]

This will remove the 3rd duplicate row, and retain the first 2 rows. In case we want the duplicate row, we can do so in the following way:

df[duplicated(df), ]

Return row number for a particular value in a column

To find the row number for a particular value in a column in a vector/data frame we can use the “which” function.

Example 1:

day = c("Mon", "Tue", "Wed", "Thurs", "Fri", "Sat", "Sun") 
row_number = which(day == "Sat") 
print(row_number)

[1] 6

Example 2: Consider a 2 column data frame “df” comprising of the stock symbols and their respective closing price for the day. To find the row number corresponding to the HCL stock we call the “which” function on the Ticker column with its value selected as HCL.

Ticker = c("INFY", "TCS", "HCL") 
ClosePrice = c(2021, 2294, 910) 
data = data.frame(Ticker, ClosePrice) 

row_number = which(data$Ticker == "HCL") 
print(row_number)

[1] 3

Sorting a data frame by two columns in the same order

To sort a data frame by two columns in the same order, we can use the “order” function and the “with” function. Consider a data frame comprising of stock symbols, their categorization, and the percentage change in price. We first sort the data frame based on category and then sort based on the percentage change in price.

The order function by default sorts in an ascending manner. Hence to sort both the columns in descending order we keep the decreasing argument as TRUE.

Example – Sorting a data frame by two columns

# Create a data frame 
Ticker = c("INFY", "TCS", "HCLTECH", "SBIN") 
Category = c(1, 1, 1, 2) 
Percent_Change = c(2.3, -0.25, 0.5, 0.25) 

df = data.frame(Ticker, Category, Percent_Change) 
print(df)

# Sorting by Category column first and then the Percent_Change column: 
df_sort = df[with(df, order(Category, Percent_Change, decreasing = TRUE)), ] 
print(df_sort)

Sorting a data frame by two columns in different order

To sort a data frame by two columns in different order, we can use the “order” function along with the “with” function.

Consider a data frame comprising of stock symbols, their categorization, and the percentage change in price. Assume that we want to sort first in an ascending order by column “Category”, and then by column “Percent_Change” in a descending order.

The order function by default sorts in an ascending manner. Hence, to sort the “Category” column we mention it as the first variable in the order function without prepending it with any sign. To sort the “Percent_Change” column in a descending order we prepend it with a negative sign.

Example – Sorting a data frame by two columns

# Create a data frame 
Ticker = c("INFY", "TCS", "HCLTECH", "SBIN") 
Category = c(1, 1, 1, 2) 
Percent_Change = c(2.3, -0.25, 0.5, 0.25) 

df = data.frame(Ticker, Category, Percent_Change) 
print(df)

# Sort by Category column first and then the Percent_Change column: 
df_sort = df[with(df, order(Category, -Percent_Change)), ] 
print(df_sort)

Creating two tabs in the output excel workbook

At times we want to write & save the multiple results generated after running our R script in an excel workbook on separate worksheets. To do so, we can make use of the “append” argument in the write.xlsx function.To write to an excel file, one must first install and load the xlsx package in R.

Example: In this example, we are creating two worksheets in the “Stocks.xlsx” workbook. In the worksheet named, “Top Gainers”, we save the table_1 output, while in the “Top Losers” worksheet we save the table_2 output. To create a second worksheet in the same workbook, we keep the append argument as TRUE in the second line.

write.xlsx(table_1, "Stocks.xlsx", sheetName = "Top Gainers", append = FALSE) 
write.xlsx(table_2, " Stocks.xlsx", sheetName = "Top Losers", append = TRUE)

Functions Demystified

Sys.time function

The Sys.time function gives the current date and time.

Example:

date_time = Sys.time() 
print(date_time)

[1] “2017-04-15 16:25:38 IST”

The function can be used to find the time required to run a code by placing this function at the start and the end of the code. The difference between the start and the end will give the time taken to execute the code.

julian function

The julian function is used to extract the Julian date, which is the number of days since January 1, 1970. Given a date, the syntax of the function is given as:

julian(date)

Example:

date = as.Date("2010-03-15") 
julian(date)

[1] 14683
attr(,”origin”)
[1] “1970-01-01”

Alternatively one can use the as.integer function to get the same result

second and minute functions

These functions are part of the lubridate package. The second function retrieves/sets the second component of a date-time object, while the minute function retrieves/sets the minute component. It allows Date-time objects like those belonging to the POSIXct, POSIXlt, Date, zoo, xts, and the timeSeries objects.

Example:

library(lubridate) 
# Retrieving the seconds We have used the ymd_hms function to parse the given object. 
x = ymd_hms("2016-06-01 12:23:45") 
second(x)

[1] 45

minute function:

library(lubridate) 
# Retrieving the minute from a date-time object 
x = ymd_hms("2016-06-01 12:23:45") 
minute(x)

[1] 23

# Retrieving the minute from a time object. We have used the hms function to parse the given object.
x = hms("15:29:06") 
minute(x)

[1] 29

Next Step

We hope you liked this bulletin. In the next weekly bulletin, we will list more interesting ways and methods plus R functions for our readers.

 

 

 

The post R Weekly Bulletin Vol – IV appeared first on .

To leave a comment for the author, please follow the link and comment on their blog: R programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)