Exploratory data analysis on P/E ratio of Indian Stocks

Posted on March 31, 2014 by saptarsi goswami in R bloggers

[This article was first published on meet Saptarsi, and kindly contributed to R-bloggers].

The Price Earnings ratio (P/E) is one of the most popular ratios reported for stocks. Put simply, it is the Current Market Price divided by the Earnings per Share. An operational definition of Earnings per Share is total profit divided by the number of shares. I refer interested readers to the following page for further reading:

www.investopedia.com/terms/p/price-earningsratio.asp
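As a quick illustration of the definition above, here is a minimal sketch of the arithmetic in R. The company and all three figures are hypothetical and chosen only to keep the numbers round.

```r
# Hypothetical figures for an imaginary company (not taken from any real stock).
total_profit <- 50e6   # total profit, in rupees
shares       <- 10e6   # number of shares
market_price <- 120    # current market price per share, in rupees

eps      <- total_profit / shares   # Earnings per Share
pe_ratio <- market_price / eps      # Price / Earnings

print(c(EPS = eps, PE = pe_ratio))
```

A P/E of 24 here simply means the market price is 24 times the profit attributable to one share.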

In this post, I would just like to show how we can grab P/E data from the web and create some visualizations from it. My focus right now is Indian stocks, and I intend to use the website below.

http://www.indiainfoline.com/MarketStatistics/PE-Ratios/

My first step is gearing up for the data extraction, which is really the least trivial task. As shown in the figure below, there are separate pages for each sector, and we need to click on the individual links to go to each page and get the P/E ratios.

Here is something I did outside R: I created a CSV file with the sector names, using delimiters while importing the text and Paste Special with Transpose. Here is how my CSV file looks. I would never discourage using multiple tools, as this is often required to solve real-world problems.

So now I can import this into a dataset, read one row at a time and go to the necessary URLs. But God has different plans :) and it is not that straightforward.

Case 1: Single-word sector names

We have the sector ‘Banks’ and the sector link is as below

http://www.indiainfoline.com/MarketStatistics/PE-Ratios/Banks-Sector

This is a no-brainer: we pick up the base URL, append the sector name after a forward slash and then append the string ‘-Sector’. This holds for most single-word sector names like ‘FMCG’, ‘Tyres’, ‘Healthcare’ etc.

Case 2: Multiple words without ‘-’, ‘&’ and ‘/’

We have the sector ‘Tobacco Products’ and the sector link is as below

http://www.indiainfoline.com/MarketStatistics/PE-Ratios/Tobacco-Products-Sector

This is also not that difficult: apart from adding ‘-Sector’, we need to replace the spaces with a ‘-’.

Case 3: Multiple words with a ‘-’

We have the sector name ‘IT-Software’, where we have to remove any other spaces if they exist. There can be several other cases, but for the sake of discussion I will limit myself to these.

Case 4: Multiple words with a ‘/’

We have the sector name ‘Stock/ Commodity Brokers’, so the ‘/’ needs to be removed.
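The four cases can also be folded into one small helper. This is only a sketch: the function name sector_slug is my own, the rule that runs of spaces and hyphens collapse into a single hyphen is an assumption based on the examples above, and the resulting links are not guaranteed to match what the site actually serves.

```r
# Hypothetical helper: build the last part of the sector URL from a sector name.
sector_slug <- function(sector) {
  slug <- trimws(sector)
  slug <- gsub("&", "and", slug, fixed = TRUE)  # '&' is spelled out
  slug <- gsub("/", "", slug, fixed = TRUE)     # '/' is dropped
  slug <- gsub(",", "", slug, fixed = TRUE)     # ',' is dropped
  slug <- gsub("[ -]+", "-", slug)              # runs of spaces/hyphens -> one hyphen
  paste0(slug, "-Sector")
}

baseurl <- "http://www.indiainfoline.com/MarketStatistics/PE-Ratios/"
sectors <- c("Banks", "Tobacco Products", "IT - Software", "Stock/ Commodity Brokers")
urls    <- paste0(baseurl, sapply(sectors, sector_slug, USE.NAMES = FALSE))
print(urls)
```

Keeping the rules in one function makes it easier to add a new case later without touching the scraping loop.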

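# Packages used below: XML provides readHTMLTable, RCurl provides url.exists
library(XML)
library(RCurl)

# Reading in dataset
sectorsv1 <- read.csv("C:/Users/user/Desktop/Datasets/sectorsv1.csv")

# Converting to a matrix; this is a practice I generally follow.
# We can access individual sectors with sectorvm[rowno, colno]
sectorvm <- as.matrix(sectorsv1)

pe <- c()      # P/E values collected across all sectors
cname <- c()   # matching company names
cnt <- 0       # number of sector pages read successfully
baseurl <- 'http://www.indiainfoline.com/MarketStatistics/PE-Ratios/'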

for (i in 1:nrow(sectorvm))
{
  securl <- sectorvm[i, 1]

  # fixed = TRUE indicates the string is to be matched as is, not as a regular expression.
  # The substitutions below follow the cases explained above. Note that we use gsub
  # instead of sub, else only the first instance would be replaced.
  if (length(grep(' ', securl, fixed = TRUE)) != 1)
  {
    # Case 1: no spaces, just append the suffix
    securl <- paste(securl, '-Sector', sep = "")
  }
  else
  {
    securl <- gsub(' ', '-', securl, fixed = TRUE)
    # A name with spaces around a hyphen now contains '---'; collapse it to one hyphen
    if (length(grep('---', securl, fixed = TRUE)) == 1)
    {
      securl <- gsub('---', '-', securl, fixed = TRUE)
    }
    if (length(grep('&', securl, fixed = TRUE)) == 1)
    {
      securl <- gsub('&', 'and', securl, fixed = TRUE)
    }
    if (length(grep('/', securl, fixed = TRUE)) == 1)
    {
      securl <- gsub('/', '', securl, fixed = TRUE)
    }
    if (length(grep(',', securl, fixed = TRUE)) == 1)
    {
      securl <- gsub(',', '', securl, fixed = TRUE)
    }
    securl <- paste(securl, '-Sector', sep = "")
  }

  fullurl <- paste(baseurl, securl, sep = "")
  print(fullurl)

  if (url.exists(fullurl))
  {
    petbls <- readHTMLTable(fullurl)
    # Exploring the tables, we found the relevant information in table 2.
    # The data is stored as a factor, so a plain as.numeric will not suffice;
    # we need an as.character first and then an as.numeric.
    pe <- c(pe, as.numeric(as.character(petbls[[2]]$PE)))
    cname <- c(cname, as.character(petbls[[2]]$Company))
    cnt <- cnt + 1
  }
}

The functions we have used are explained below.

readHTMLTable -> Given a URL, this function retrieves the contents of the <table> tags from an HTML page. We need to pick the appropriate table number; in this case we have used table no. 2.

grep, paste and gsub are ordinary string functions: grep finds occurrences of one string in another, paste concatenates, and gsub does the replacing.

as.numeric(as.character()) left a lasting impression on my mind, as an innocuous and intuitive as.numeric on its own would have left me with only the ranks, that is, the internal integer codes of the factor levels.

url.exists -> It is a good idea to check the existence of the URL, given that we are forming the URLs dynamically.
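The factor pitfall mentioned above is easy to reproduce on a tiny example. The values below are made up for illustration; only the behaviour of as.numeric on a factor matters here.

```r
# Hypothetical P/E values that arrived as text and were stored as a factor.
pe_factor <- factor(c("20.09", "587.5", "3.2"))

codes  <- as.numeric(pe_factor)                 # integer level codes, NOT the values
values <- as.numeric(as.character(pe_factor))   # the actual numbers

print(codes)
print(values)
```

The first result only tells us the position of each value among the sorted levels, which is why the two-step conversion is needed.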

Now playing with summary statistics:

We use the describe function from the psych package on the pe vector:

n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
1797	59.71	76.92	20.09	46.64	29.79	0	587.5	587.5	2.15	7.25	1.81
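For readers without the psych package, a few of the columns above can be approximated with base R. This is a rough sketch on a made-up vector (pe_demo is a stand-in for pe), and it does not try to reproduce every column that describe reports.

```r
# Hypothetical P/E values standing in for the scraped vector `pe`.
pe_demo <- c(5.2, 8.1, 12.4, 20.1, 35.0, 60.3, 150.7, 480.0)

basic_stats <- c(
  n      = length(pe_demo),
  mean   = mean(pe_demo),
  sd     = sd(pe_demo),
  median = median(pe_demo),
  mad    = mad(pe_demo),
  min    = min(pe_demo),
  max    = max(pe_demo),
  range  = max(pe_demo) - min(pe_demo),
  se     = sd(pe_demo) / sqrt(length(pe_demo))   # standard error of the mean
)
print(round(basic_stats, 2))
```

A mean far above the median, as in the real output above (59.71 against 20.09), is already a hint of a long right tail.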

hist(pe,col='blue',main='P/E Distribution')

We get the histogram below for the P/E ratio, which shows that it is nowhere near a normal distribution, with its peakedness and skew, as confirmed by the summary statistics as well.

We will nevertheless do a normality test.

shapiro.test(pe)
 
        Shapiro-Wilk normality test
 
data:  pe 
W = 0.7496, p-value < 2.2e-16

Basically, the null hypothesis is that the values come from a normal distribution. The p-value is extremely small (less than 2.2e-16), so the result is highly significant and we can easily reject the null.
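To see how this test reads on data that is skewed by construction, here is a small sketch on a simulated sample. The lognormal parameters and the 0.05 cut-off are my own choices for illustration, not something taken from the P/E data.

```r
set.seed(42)                                    # make the simulation reproducible
skewed <- rlnorm(500, meanlog = 3, sdlog = 1)   # hypothetical right-skewed sample

result <- shapiro.test(skewed)
print(result)

# Reject the null of normality when the p-value falls below the chosen level.
reject_normality <- result$p.value < 0.05
print(reject_normality)
```

A small p-value is evidence against normality; a large one would only mean we failed to find such evidence.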

Drawing a box plot on the P/E ratios

boxplot(pe,col='blue')

Finding the outliers

boxplot.stats(pe)$out
 
 
484.33 327.91 587.50
 
cname[which(pe %in% boxplot.stats(pe)$out)]

[1] "Bajaj Electrical" "BF Utilities"     "Ruchi Infrastr."

Of course, no prizes for guessing that we should stay away from these stocks.
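The same lookup can be tried on a small made-up sample. The company names and values below are hypothetical, and boxplot.stats is used with its default coef of 1.5, which flags points lying more than 1.5 times the hinge spread beyond the hinges.

```r
# Hypothetical data: nine ordinary P/E values and one extreme one.
pe_demo    <- c(8, 10, 12, 15, 18, 20, 22, 25, 30, 400)
cname_demo <- paste0("Company", seq_along(pe_demo))

out_values <- boxplot.stats(pe_demo)$out          # values outside the whiskers
out_names  <- cname_demo[pe_demo %in% out_values] # companies they belong to

print(out_values)
print(out_names)
```

Matching with %in% keeps the names aligned with the values, as long as both vectors are in the same order.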

To summarize, this was a basic exploratory data analysis of the P/E ratio of Indian stocks:

· We saw that we can get content out of URLs and HTML tables

· We collected the company names and P/E values into R vectors

· We looked at summary statistics, a histogram and did a normality test

· We plotted a box plot and found the outliers
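If a single table is more convenient than two parallel vectors, the names and values can be combined into a data frame. This is a sketch with made-up entries; cname_demo and pe_demo stand in for cname and pe.

```r
# Hypothetical stand-ins for the scraped vectors `cname` and `pe`.
cname_demo <- c("Company A", "Company B", "Company C")
pe_demo    <- c(18.4, 327.9, 7.2)

stocks <- data.frame(company = cname_demo, pe = pe_demo, stringsAsFactors = FALSE)
stocks <- stocks[order(-stocks$pe), ]   # highest P/E first
print(stocks)
```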
