FII and DII turnover with effect on Nifty Downloader

[This article was first published on My Paper Trades, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

My thirst for statistics has been increasing. IV had another requirement, which would eventually be useful to me as well. He currently downloads FII and DII buy and sell values and its impact on Nifty manually in Excel. He suggested me to try and automate this process in R. Wow! Some more learning of R, which would eventually help me in building my strategies!
Unlike in NSE EOD Bhavcopy and BSE EOD Bhavcopy downloaders, I tried a different approach to download FII + DII stats from NSEIndia website (as if I had a choice), along with the index Nifty values and change over previous day. The National Stock Exchange of India has different structures for different pages. To download index values, you have to refer to the link http://www.nseindia.com/content/indices/ind_histvalues.htm whereas for FII and DII stats, you need to visit http://www.nseindia.com/content/equities/eq_fiidii_archives.htm.
Both of these pages are HTML forms, you have to enter parameters like index name, start and end dates. Another challenge is that each of these pages displays a maximum of 100 rows in table and if you need all rows (more than 100), you have to download a csv (dynamically generated in temp location). I faced problem in accessing the csv, as for older dates, the link generated was not valid. Hence, I decided to read the page itself, parse the table to be consumed as data frame. For instances, where the parameters generate more than 100 rows, I decided to use while loop.
OK, now here is the solution for it.

I am using the packages RHTMLForms, RCurl, and XML. While RCurl and XML packages are available at R repositories, RHTMLForms package is maintained by omegahat, and can be installed by using the following command
install.packages(‘RHTMLForms’, repos = “http://www.omegahat.org/R”)

Another peculiar problem faced is that NSE India website has enforced check for useragent, which if not specified explicitly in R, would not allow access to the desired data.
This can be achieved by defining RCurl settings before loading the library. The command is

options(RCurlOptions = list(useragent = “R”))
library(RCurl)

NOTE: 9-Aug-2011, I have modified the code a bit to enable this script with optional input parameters, like Start Date as 16-April-2007, from which both FII and DII stats are available, along with End Date as current date. Another change that has been made is to check, if file already exist; if it does not exist, the script creates the file, if it exists, it reads the last entry in the file, and starts downloading records beyond that to current date.


outputDir = “D:\\FII Stats”
filename = “FII-DII-Nifty.csv”


startDate = as.Date(“2007-04-16″, order=”ymd”)
endDate =Sys.Date()

The code can be downloaded from here.


#install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")

 

library(RHTMLForms)

options(RCurlOptions = list(useragent = "R"))

library(RCurl) 

library(XML)  

library(timeDate)

 

blnfileExist=FALSE

 

######################################################################

# User Input

outputDir = "D:\\FII Stats"

filename = "FII-DII-Nifty.csv"

######################################################################

 

 

######################################################################

# Optional User Input

startDate = as.Date("2007-04-16", order="ymd")

endDate =Sys.Date()

 

######################################################################

 

 

 

# If file exists, read the file to retrieve dates for 

# which information already exists

 

	if (file.exists(file.path(outputDir, filename)))

	{

		existingStats <- read.csv(file.path(outputDir, filename), header=TRUE, sep=",")

 

		# Read Last date in the csv and add 1 to begin with next date range

		startDate = end(as.timeDate(existingStats$Date))

		startDate = as.Date(startDate)+1

		blnfileExist = TRUE

	}else

	{

		# User defined startDate 

		existingStats <- NULL

		blnfileExist = FALSE

	}

 

 

 # Read the HTML page since we cannot use htmlParse() directly

 # as it does not specify the user agent or an

 # Accept:*.*

 

urlNifty <- "http://www.nseindia.com/content/indices/ind_histvalues.htm";

urlFIIDIIEq <- "http://www.nseindia.com/content/equities/eq_fiidii_archives.htm"

 

contentNifty = getURLContent(urlNifty)

contentFIIDIIEq = getURLContent(urlFIIDIIEq)

 

 # Now that we have the page, parse it and use the RHTMLForms

 # package to create an R function that will act as an interface

 # to the form.

 

docNifty = htmlParse(contentNifty, asText = TRUE)

docFIIDIIEq = htmlParse(contentFIIDIIEq, asText = TRUE)

 

  # need to set the URL for this document since we read it from

  # text, rather than from the URL directly

 

docName(docNifty) = urlNifty

docName(docFIIDIIEq) = urlFIIDIIEq

 

  # Create the form description and generate the R

  # function "call" the

 

formNifty = getHTMLFormDescription(docNifty)[[1]]

funNifty = createFunction(formNifty)

 

formFIIDIIEq = getHTMLFormDescription(docFIIDIIEq)[[1]]

funFIIDIIEq = createFunction(formFIIDIIEq)

 

  # now we can invoke the form from R. We only need 2

  # inputs  - FromDate and ToDate

 

#Since the NSE URLs only display a maximum of 100 records, 

#	we shall attempt the range in loop 

 

myStDt = startDate

 

while (myStDt <= endDate){

 

	if (endDate<= myStDt + 50){ 

		myEnDt = endDate

	}else

	{

		myEnDt = myStDt + 50

	}

 

	print(paste("Downloading from",myStDt,"to", myEnDt ))

 

	Nifty = funNifty(fromDate = as.character(myStDt-5, "%d-%m-%Y"), 

		toDate = as.character(myEnDt, "%d-%m-%Y"), indexType="S&P CNX NIFTY")

	FIIDIIEq = funFIIDIIEq (fromDate = as.character(myStDt, "%d-%m-%Y"), 

		toDate = as.character(myEnDt, "%d-%m-%Y"), category="all")

 

  # Having looked at the tables, I think we want the the 4th one.

	tableNifty = readHTMLTable(htmlParse(Nifty, asText = TRUE),

                        which = 4,

				skip.rows = 3, 

				trim=TRUE,

				as.data.frame = TRUE,

                        header = TRUE,

				stringsAsFactors = FALSE)

 

  # Having looked at the tables, I think we want the the 4th one.

	tableFIIDIIEq = readHTMLTable(htmlParse(FIIDIIEq, asText = TRUE),

                        which = 4,

				trim=TRUE,

				as.data.frame = TRUE,

                        header = TRUE,

				stringsAsFactors = FALSE)

 

	#Select only FII stats

 

	#Format Date Column from string to Date type

	tableFIIDIIEq$Date <- as.Date(tableFIIDIIEq$Date, format="%d-%b-%Y")

 

	#Order by Date in Ascending Order

	tableFIIDIIEq<-tableFIIDIIEq[order(tableFIIDIIEq$Date,decreasing = TRUE),]

 

	dfFIIEq <-subset(tableFIIDIIEq, Category=="FII")

	colnames(dfFIIEq)[3] <- "FIIBuyValue"

	colnames(dfFIIEq)[4] <- "FIISellValue"

	colnames(dfFIIEq)[5] <- "FIINetValue"

 

	#Select only DII stats

	dfDIIEq <-subset(tableFIIDIIEq, Category=="DII")

	colnames(dfDIIEq)[3] <- "DIIBuyValue"

	colnames(dfDIIEq)[4] <- "DIISellValue"

	colnames(dfDIIEq)[5] <- "DIINetValue"

 

	#Merge FII and DII Stats

	dfFIIDIIEq<-merge(dfFIIEq,dfDIIEq, by.x="Date", by.y="Date")

 

	#Convert into numeric for addition

	dfFIIDIIEq$DIINetValue <- as.numeric(dfFIIDIIEq$DIINetValue)

	dfFIIDIIEq$FIINetValue <- as.numeric(dfFIIDIIEq$FIINetValue)

 

	# Find Effective Inflow

	dfFIIDIIEq$EffectiveInflow <- dfFIIDIIEq$FIINetValue + dfFIIDIIEq$DIINetValue

 

	#Process Nifty

	dfNifty <- tableNifty 

	colnames(dfNifty)[6] <- "Volume"

	colnames(dfNifty)[7] <- "TurnoverInCr"

 

	#Format Date Column from string to Date type

	dfNifty$Date <- as.Date(dfNifty$Date, format="%d-%b-%Y")

 

	#Format CLose Column from string to Numeric

	dfNifty$Close <- as.numeric(dfNifty$Close)

 

	#Order by Date in Ascending Order

	dfNifty <-dfNifty[order(dfNifty$Date,decreasing = TRUE),]

 

	#Create a function to find out Change in Nifty Closing over previous day

	FUN=function(a) c(diff(a), NA)

 

	dfNifty$Change<- FUN(dfNifty$Close)*-1

 

	#Store the merged dataframe in temp and then join it with earlier results

	temp <- merge(dfNifty,dfFIIDIIEq, by.x="Date", by.y="Date")

 

	if (startDate == myStDt){ 

		dfNiftyFIIDIIEq = temp

	}else

	{

		dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,temp)

	}

 

	myStDt = myStDt + 51

}

 

closeAllConnections() 

 

######################################################################

#Get Rid of unwanted columns

 

	dfNiftyFIIDIIEq$Category.x <- NULL

	dfNiftyFIIDIIEq$Category.y <- NULL

#	dfNiftyFIIDIIEq$Open <- NULL

#	dfNiftyFIIDIIEq$High <- NULL

#	dfNiftyFIIDIIEq$Low <- NULL

#	dfNiftyFIIDIIEq$Volume <- NULL

#	dfNiftyFIIDIIEq$TurnoverInCr<- NULL

 

# If file exists, merge the retrieved records

 

	if (blnfileExist)

	{

		print("Appending to existing file")

		#existingStats$Date <- as.Date(existingStats$Date, format = "%d-%m-%Y")

		dfNiftyFIIDIIEq$Date <- as.Date(dfNiftyFIIDIIEq$Date, format = "%d-%m-%Y")

		dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,existingStats)

		dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),]

	}else

	{

		print("Writing to a new file")

		dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),]

	} 

#Finally write the csv file

	write.csv(dfNiftyFIIDIIEq ,file=file.path(outputDir, filename),row.names = FALSE)

Next steps, add more indices in this list and try to figure out any co-relation.

To leave a comment for the author, please follow the link and comment on their blog: My Paper Trades.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)