FII and DII turnover with effect on Nifty Downloader

August 8, 2011
By

(This article was first published on My Paper Trades, and kindly contributed to R-bloggers)

My thirst for statistics has been increasing. IV had another requirement, which would eventually be useful to me as well. He currently downloads FII and DII buy and sell values and its impact on Nifty manually in Excel. He suggested me to try and automate this process in R. Wow! Some more learning of R, which would eventually help me in building my strategies!

Unlike in NSE EOD Bhavcopy and BSE EOD Bhavcopy downloaders, I tried a different approach to download FII + DII stats from NSEIndia website (as if I had a choice), along with the index Nifty values and change over previous day. The National Stock Exchange of India has different structures for different pages. To download index values, you have to refer to the link http://www.nseindia.com/content/indices/ind_histvalues.htm whereas for FII and DII stats, you need to visit http://www.nseindia.com/content/equities/eq_fiidii_archives.htm.

Both of these pages are HTML forms, you have to enter parameters like index name, start and end dates. Another challenge is that each of these pages displays a maximum of 100 rows in table and if you need all rows (more than 100), you have to download a csv (dynamically generated in temp location). I faced problem in accessing the csv, as for older dates, the link generated was not valid. Hence, I decided to read the page itself, parse the table to be consumed as data frame. For instances, where the parameters generate more than 100 rows, I decided to use while loop.

OK, now here is the solution for it.

I am using the packages RHTMLForms, RCurl, and XML. While RCurl and XML packages are available at R repositories, RHTMLForms package is maintained by omegahat, and can be installed by using the following command

install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")

Another peculiar problem faced is that NSE India website has enforced check for useragent, which if not specified explicitly in R, would not allow access to the desired data.

This can be achieved by defining RCurl settings before loading the library. The command is

options(RCurlOptions = list(useragent = "R"))
library(RCurl)

NOTE: 9-Aug-2011, I have modified the code a bit to enable this script with optional input parameters, like Start Date as 16-April-2007, from which both FII and DII stats are available, along with End Date as current date. Another change that has been made is to check, if file already exist; if it does not exist, the script creates the file, if it exists, it reads the last entry in the file, and starts downloading records beyond that to current date.


outputDir = "D:\\FII Stats"
filename = "FII-DII-Nifty.csv"


startDate = as.Date("2007-04-16", order="ymd")
endDate =Sys.Date()


The code can be downloaded from here.

#install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")

 
library(RHTMLForms)
options(RCurlOptions = list(useragent = "R"))
library(RCurl)
library(XML)
library(timeDate)
 
blnfileExist=FALSE
 
######################################################################
# User Input
outputDir = "D:\\FII Stats"
filename = "FII-DII-Nifty.csv"
######################################################################
 
 
######################################################################
# Optional User Input
startDate = as.Date("2007-04-16", order="ymd")
endDate =Sys.Date()
 
######################################################################
 
 
 
# If file exists, read the file to retrieve dates for
# which information already exists
 
if (file.exists(file.path(outputDir, filename)))
{
existingStats <- read.csv(file.path(outputDir, filename), header=TRUE, sep=",")
 
# Read Last date in the csv and add 1 to begin with next date range
startDate = end(as.timeDate(existingStats$Date))
startDate = as.Date(startDate)+1
blnfileExist = TRUE
}else
{
# User defined startDate
existingStats <- NULL
blnfileExist = FALSE
}
 
 
# Read the HTML page since we cannot use htmlParse() directly
# as it does not specify the user agent or an
# Accept:*.*
 
urlNifty <- "http://www.nseindia.com/content/indices/ind_histvalues.htm";
urlFIIDIIEq <- "http://www.nseindia.com/content/equities/eq_fiidii_archives.htm"
 
contentNifty = getURLContent(urlNifty)
contentFIIDIIEq = getURLContent(urlFIIDIIEq)
 
# Now that we have the page, parse it and use the RHTMLForms
# package to create an R function that will act as an interface
# to the form.
 
docNifty = htmlParse(contentNifty, asText = TRUE)
docFIIDIIEq = htmlParse(contentFIIDIIEq, asText = TRUE)
 
# need to set the URL for this document since we read it from
# text, rather than from the URL directly
 
docName(docNifty) = urlNifty
docName(docFIIDIIEq) = urlFIIDIIEq
 
# Create the form description and generate the R
# function "call" the
 
formNifty = getHTMLFormDescription(docNifty)[[1]]
funNifty = createFunction(formNifty)
 
formFIIDIIEq = getHTMLFormDescription(docFIIDIIEq)[[1]]
funFIIDIIEq = createFunction(formFIIDIIEq)
 
# now we can invoke the form from R. We only need 2
# inputs - FromDate and ToDate
 
#Since the NSE URLs only display a maximum of 100 records,
# we shall attempt the range in loop
 
myStDt = startDate
 
while (myStDt <= endDate){
 
if (endDate<= myStDt + 50){
myEnDt = endDate
}else
{
myEnDt = myStDt + 50
}
 
print(paste("Downloading from",myStDt,"to", myEnDt ))
 
Nifty = funNifty(fromDate = as.character(myStDt-5, "%d-%m-%Y"),
toDate = as.character(myEnDt, "%d-%m-%Y"), indexType="S&P CNX NIFTY")
FIIDIIEq = funFIIDIIEq (fromDate = as.character(myStDt, "%d-%m-%Y"),
toDate = as.character(myEnDt, "%d-%m-%Y"), category="all")
 
# Having looked at the tables, I think we want the the 4th one.
tableNifty = readHTMLTable(htmlParse(Nifty, asText = TRUE),
which = 4,
skip.rows = 3,
trim=TRUE,
as.data.frame = TRUE,
header = TRUE,
stringsAsFactors = FALSE)
 
# Having looked at the tables, I think we want the the 4th one.
tableFIIDIIEq = readHTMLTable(htmlParse(FIIDIIEq, asText = TRUE),
which = 4,
trim=TRUE,
as.data.frame = TRUE,
header = TRUE,
stringsAsFactors = FALSE)
 
#Select only FII stats
 
#Format Date Column from string to Date type
tableFIIDIIEq$Date <- as.Date(tableFIIDIIEq$Date, format="%d-%b-%Y")
 
#Order by Date in Ascending Order
tableFIIDIIEq<-tableFIIDIIEq[order(tableFIIDIIEq$Date,decreasing = TRUE),]
 
dfFIIEq <-subset(tableFIIDIIEq, Category=="FII")
colnames(dfFIIEq)[3] <- "FIIBuyValue"
colnames(dfFIIEq)[4] <- "FIISellValue"
colnames(dfFIIEq)[5] <- "FIINetValue"
 
#Select only DII stats
dfDIIEq <-subset(tableFIIDIIEq, Category=="DII")
colnames(dfDIIEq)[3] <- "DIIBuyValue"
colnames(dfDIIEq)[4] <- "DIISellValue"
colnames(dfDIIEq)[5] <- "DIINetValue"
 
#Merge FII and DII Stats
dfFIIDIIEq<-merge(dfFIIEq,dfDIIEq, by.x="Date", by.y="Date")
 
#Convert into numeric for addition
dfFIIDIIEq$DIINetValue <- as.numeric(dfFIIDIIEq$DIINetValue)
dfFIIDIIEq$FIINetValue <- as.numeric(dfFIIDIIEq$FIINetValue)
 
# Find Effective Inflow
dfFIIDIIEq$EffectiveInflow <- dfFIIDIIEq$FIINetValue + dfFIIDIIEq$DIINetValue
 
#Process Nifty
dfNifty <- tableNifty
colnames(dfNifty)[6] <- "Volume"
colnames(dfNifty)[7] <- "TurnoverInCr"
 
#Format Date Column from string to Date type
dfNifty$Date <- as.Date(dfNifty$Date, format="%d-%b-%Y")
 
#Format CLose Column from string to Numeric
dfNifty$Close <- as.numeric(dfNifty$Close)
 
#Order by Date in Ascending Order
dfNifty <-dfNifty[order(dfNifty$Date,decreasing = TRUE),]
 
#Create a function to find out Change in Nifty Closing over previous day
FUN=function(a) c(diff(a), NA)
 
dfNifty$Change<- FUN(dfNifty$Close)*-1
 
#Store the merged dataframe in temp and then join it with earlier results
temp <- merge(dfNifty,dfFIIDIIEq, by.x="Date", by.y="Date")
 
if (startDate == myStDt){
dfNiftyFIIDIIEq = temp
}else
{
dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,temp)
}
 
myStDt = myStDt + 51
}
 
closeAllConnections()
 
######################################################################
#Get Rid of unwanted columns
 
dfNiftyFIIDIIEq$Category.x <- NULL
dfNiftyFIIDIIEq$Category.y <- NULL
# dfNiftyFIIDIIEq$Open <- NULL
# dfNiftyFIIDIIEq$High <- NULL
# dfNiftyFIIDIIEq$Low <- NULL
# dfNiftyFIIDIIEq$Volume <- NULL
# dfNiftyFIIDIIEq$TurnoverInCr<- NULL
 
# If file exists, merge the retrieved records
 
if (blnfileExist)
{
print("Appending to existing file")
#existingStats$Date <- as.Date(existingStats$Date, format = "%d-%m-%Y")
dfNiftyFIIDIIEq$Date <- as.Date(dfNiftyFIIDIIEq$Date, format = "%d-%m-%Y")
dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,existingStats)
dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),]
}else
{
print("Writing to a new file")
dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),]
}
#Finally write the csv file
write.csv(dfNiftyFIIDIIEq ,file=file.path(outputDir, filename),row.names = FALSE)
Created by Pretty R at inside-R.org
Next steps, add more indices in this list and try to figure out any co-relation.

To leave a comment for the author, please follow the link and comment on his blog: My Paper Trades.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: ,

Comments are closed.