Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. You can consult the official documentation for reference, but this post provides a few more in-depth examples.

All of the functions in yahoo_fin are contained within a single module called stock_info.

You can import all the functions at once like this:

from yahoo_fin.stock_info import *



One of the core functions available is called get_data, which retrieves historical price data for an individual stock. To call this function, just pass whatever ticker you want:

get_data("nflx") # gets Netflix's data

get_data("aapl") # gets Apple's data

get_data("amzn") # gets Amazon's data



You can also pull data for a specific date range, like below:

get_data("amzn", start_date = "01/01/2017", end_date = "01/31/2017")
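
The date strings in the call above appear to be month/day/year (the end date "01/31/2017" can only be read that way). As a minimal sketch, assuming that format, a range can be checked before any request is sent. The helper name parse_range is hypothetical and is not part of yahoo_fin.

```python
from datetime import datetime

# Assumed format, inferred from the "01/31/2017" example in this post
DATE_FORMAT = "%m/%d/%Y"

def parse_range(start_date, end_date):
    """Parse two MM/DD/YYYY strings and check that they are in order."""
    start = datetime.strptime(start_date, DATE_FORMAT).date()
    end = datetime.strptime(end_date, DATE_FORMAT).date()
    if start > end:
        raise ValueError(f"start_date {start_date} is after end_date {end_date}")
    return start, end

start, end = parse_range("01/01/2017", "01/31/2017")
span_days = (end - start).days  # number of calendar days between the two dates
```

Catching a reversed or mistyped range locally is cheaper than finding out from an empty or failed scrape.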



Now, suppose you want to pull the price data for all the stocks in the S&P 500. This might take a few minutes, depending on your internet connection, but it can be done like this:

# get list of S&P 500 tickers
sp = tickers_sp500()

# pull data for each S&P stock
price_data = {ticker : get_data(ticker) for ticker in sp}



The above code will create a dictionary where the keys are the S&P tickers, while the values are the corresponding price datasets. If you want to combine the various datasets into a single data frame, you could use functools:

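import pandas as pd
from functools import reduce

# DataFrame.append was removed in pandas 2.0, so concatenate pairwise with pd.concat
combined = reduce(lambda x, y: pd.concat([x, y]), price_data.values())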



This uses the reduce function from functools to collapse the collection of stock price data frames into a single data frame.
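One caveat when pulling hundreds of tickers: in a dictionary comprehension, a single failed request aborts the whole run. The following is a minimal sketch of a more defensive loop, not something yahoo_fin provides. The names fetch_all and fake_fetch are hypothetical, and the stand-in fetcher exists only so the example runs offline.

```python
def fetch_all(tickers, fetch):
    """Call fetch(ticker) for each ticker, collecting errors instead of stopping."""
    results, failures = {}, {}
    for ticker in tickers:
        try:
            results[ticker] = fetch(ticker)
        except Exception as error:  # scraping can fail in many different ways
            failures[ticker] = error
    return results, failures

# Hypothetical stand-in for get_data, used only for this offline demonstration
def fake_fetch(ticker):
    if ticker == "bad":
        raise ValueError("no data for " + ticker)
    return [ticker.upper()]

results, failures = fetch_all(["aapl", "bad", "amzn"], fake_fetch)
```

With yahoo_fin imported, the same helper could be called as fetch_all(tickers_sp500(), get_data), after which failures lists the tickers worth retrying.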

## Scraping Financials

Financials, such as income statements, balance sheets, and cash flows, can be scraped using yahoo_fin.

Let’s scrape this information for Amazon.

income_statement = get_income_statement("amzn")

balance_sheet = get_balance_sheet("amzn")

cash_flow = get_cash_flow("amzn")



Now, the income_statement variable contains a data frame scraped from Amazon’s income statement page on Yahoo Finance. If we wanted to use this to see how much Amazon’s revenue changed from year-end 2014 to year-end 2016, we could do this:

# get total revenue by year for last three years
revenues = income_statement[income_statement.Revenue == "Total Revenue"].\
iloc[0][1:].map(int).tolist()

# see change in revenue over two-year period
100 * (revenues[0] / revenues[2] - 1)



Let’s look at the balance sheet result. If you print the balance_sheet variable, you’ll see it contains a data frame scraped from Amazon’s balance sheet page on Yahoo Finance.

Suppose you want to see how Amazon’s inventory has changed over the last three year-ends. This information is available on its balance sheet, so you could figure that out using the code below:

inventory = balance_sheet[balance_sheet["Period Ending"] == "Inventory"].\
iloc[0][1:].map(int).tolist()

100 * (inventory[0] / inventory[2] - 1)
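
Comparing only the first and last values hides what happened in between. As a small sketch, assuming the values are ordered most recent first (as the index-0 versus index-2 comparison above implies), the change for each consecutive period can be computed like this. The function name period_changes and the sample figures are illustrative and are not Amazon's actual numbers.

```python
def period_changes(values_latest_first):
    """Return the percent change between each pair of consecutive periods, oldest first."""
    chronological = list(reversed(values_latest_first))
    return [
        100 * (later / earlier - 1)
        for earlier, later in zip(chronological, chronological[1:])
    ]

# Made-up figures: 50 -> 100 -> 200 in chronological order
changes = period_changes([200, 100, 50])
```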



Parsing information from the cash flow result is similar to the above examples.
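Since the same row-lookup pattern repeats for each statement, it can be wrapped in a helper. This is a minimal sketch, assuming the label column is the first column and the remaining cells convert cleanly to integers, as in the examples above. The name row_values is hypothetical, and the small data frame is a made-up stand-in rather than real scraped data.

```python
import pandas as pd

def row_values(table, label_column, label):
    """Return the numeric cells of the first row whose label matches."""
    matches = table[table[label_column] == label]
    if matches.empty:
        raise KeyError(f"{label!r} not found in column {label_column!r}")
    # Skip the first cell (the label itself) and convert the rest to int
    return matches.iloc[0, 1:].map(int).tolist()

# Hypothetical stand-in for a scraped statement; the numbers are made up
statement = pd.DataFrame({
    "Revenue": ["Total Revenue", "Cost of Revenue"],
    "12/31/2016": ["300", "180"],
    "12/31/2015": ["200", "130"],
    "12/31/2014": ["100", "70"],
})

revenues = row_values(statement, "Revenue", "Total Revenue")
change = 100 * (revenues[0] / revenues[2] - 1)
```

Raising a KeyError for a missing label makes it easier to notice when the scraped page uses a different row name than expected.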

## Getting major stock holders

Getting the major holders of a stock can be done with the get_holders function.

holders = get_holders("amzn")



If you run the above line of code, you’ll see it returns a dictionary. The keys of the dictionary correspond to the headers on the Holders page, e.g. “Major Holders”, “Top Mutual Fund Holders”, and “Top Institutional Holders”. The values are the tables shown beneath those headers on the Holders page.

For instance, if you want to get the largest institutional holder, you could write this:

info = holders["Top Institutional Holders"]

print(info.Holder[0])



The institutional holders are sorted in decreasing order by number of shares owned, so the 0th-indexed record contains the holder with the most shares. If you want to get the average number of shares owned by the top ten institutional holders, or the average value invested, you could run the code below:

# get average number of shares owned by top 10
# institutional investors
info.Shares.mean()

# similarly, get average value invested
info.Value.mean()



## Pulling analysts information

Data from the Analysts page can be scraped using the get_analysts_info function.

analysts_data = get_analysts_info("amzn")



As with holder information, the get_analysts_info function returns a dictionary. The keys are again the headers of the page being scraped, in this case the Analysts page for the particular stock, so they include “Earnings Estimate”, “Revenue Estimate”, “EPS Trend”, etc. Once again, the values are the corresponding tables beneath each of these headers.
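Because the dictionary keys mirror page headers, a lookup can fail if a header is spelled differently than expected. The following is a small sketch of a lookup that reports which keys are actually present. The helper get_table is hypothetical, and the demo dictionary uses placeholder strings where the real result would hold data frames.

```python
def get_table(tables, name):
    """Return tables[name], or raise a KeyError that lists the available keys."""
    try:
        return tables[name]
    except KeyError:
        available = ", ".join(sorted(tables))
        raise KeyError(f"{name!r} not found; available tables: {available}") from None

# Placeholder values standing in for the scraped tables
demo_tables = {
    "Earnings Estimate": "earnings table",
    "Revenue Estimate": "revenue table",
}

revenue_table = get_table(demo_tables, "Revenue Estimate")
```

The same helper works for the dictionary returned by get_holders, since it only relies on ordinary dictionary behavior.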

## Ticker lists

Yahoo_fin also provides functions to pull ticker lists. An example earlier in this post showed how to get the tickers in the S&P 500, but you can also pull the tickers in the Dow Jones or those listed on the NASDAQ.

# get list of Dow stocks
dow_list = tickers_dow()

# get list of NASDAQ stocks
nasdaq_list = tickers_nasdaq()
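
Once two ticker lists are in hand, ordinary set operations show how they overlap. This is a minimal sketch using short made-up sample lists in place of the real results. The helper shared_tickers is hypothetical, and it normalizes case because tickers may be written in either case.

```python
def shared_tickers(first, second):
    """Return the tickers present in both lists, upper-cased and sorted."""
    first_set = {ticker.strip().upper() for ticker in first}
    second_set = {ticker.strip().upper() for ticker in second}
    return sorted(first_set & second_set)

# Made-up samples standing in for the results of the ticker functions
dow_sample = ["AAPL", "MSFT", "JPM"]
nasdaq_sample = ["aapl", "msft", "NFLX"]

overlap = shared_tickers(dow_sample, nasdaq_sample)
```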




The post Coding with the Yahoo_fin Package appeared first on Open Source Automation.
