How to Download Stock Data Using Python¶
M. Fawcett - 06/02/2021
This post shows how to download stock and bond data from Yahoo! Finance using Python and the yfinance package.
The motivation behind this was wanting to have better information about the investments offered in my employer's 403b program. Specifically, I wanted to be able to compare day by day price changes with other indexes such as the Standard and Poor's 500. Doing this manually using the 403b's Web site involved many mouse clicks, waiting for screens to update and then copying and pasting into Excel, so I looked to find an easier method.
Outline of this project¶
- Project setup
- yfinance package basics
- Download data for one ticker symbol
- Download data for a portfolio
- Plot the data
- Compare portfolio with Standard and Poor 500
Project setup¶
You can do an inline install in a Jupyter Notebook of the yfinance package with the following code:
### Load packages
import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
import matplotlib.pyplot as plt
from datetime import timedelta
### Magic command to allow inline display of plots in a jupyter notebook.
%matplotlib inline
### Settings to allow the lists of results to not be truncated.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 140)
pd.set_option('display.max_columns', None)
### Make exponentiated values appear as 2 decimal place float.
pd.options.display.float_format = '{:.2f}'.format
yfinance package basics¶
Complete documentation for the yfinance package can be found here: https://pypi.org/project/yfinance/. I'll cover the basics, which was really all I needed to accomplish my goals.
yfinance is pretty good and it's free, but it's not something you should expect to build a commercial app around. The author himself admits that it can be fragile and changes that Yahoo! makes to their Website can break it. Also, you could find yourself throttled or even blacklisted by Yahoo! if you used it to make a massive volume of data requests. That being said, a daily data pull of your portfolio of ticker symbols should not cause a problem.
To get your data, the main thing you need to provide is the ticker symbol. I'll demonstrate using "SPY", the ticker symbol for the Standard and Poor 500 ETF.
The first step is to instantiate a yfinance "Ticker" object by passing a symbol, like this.
tkr = yf.Ticker("SPY")
#tkr is now a yfinance Ticker object for SPY.
Once you have a Ticker object you can start making requests for data. Simply doing the following will return the last 30 days of price activity for the symbol.
spy_data_df = tkr.history()
spy_data_df.head()
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
Date | |||||||
2021-05-10 | 422.50 | 422.74 | 417.81 | 417.94 | 81852400 | 0 | 0 |
2021-05-11 | 413.10 | 415.27 | 410.06 | 414.21 | 116888000 | 0 | 0 |
2021-05-12 | 411.23 | 412.59 | 404.00 | 405.41 | 134811000 | 0 | 0 |
2021-05-13 | 407.07 | 412.35 | 407.02 | 410.28 | 106394000 | 0 | 0 |
2021-05-14 | 413.21 | 417.49 | 413.18 | 416.58 | 82123100 | 0 | 0 |
Other features of yfinance¶
You can obtain a lot more data for the ticker symbol besides prices. For my purposes the ticker "history" provided all I needed, but you can use other commands like "info" and "dividends" to get useful information.
For example, the "info" attribute returns a dictionary containing dozens of data points about the ticker.
tkr.info
{'previousClose': 421.65, 'regularMarketOpen': 422.96, 'twoHundredDayAverage': 392.55035, 'trailingAnnualDividendYield': None, 'payoutRatio': None, 'volume24Hr': None, 'regularMarketDayHigh': 424.62, 'navPrice': 421.53, 'averageDailyVolume10Day': 52061657, 'totalAssets': 361937436672, 'regularMarketPreviousClose': 421.65, 'fiftyDayAverage': 417.1997, 'trailingAnnualDividendRate': None, 'open': 422.96, 'toCurrency': None, 'averageVolume10days': 52061657, 'expireDate': None, 'yield': 0.013300001, 'algorithm': None, 'dividendRate': None, 'exDividendDate': None, 'beta': None, 'circulatingSupply': None, 'startDate': None, 'regularMarketDayLow': 421.56, 'priceHint': 2, 'currency': 'USD', 'regularMarketVolume': 18413511, 'lastMarket': None, 'maxSupply': None, 'openInterest': None, 'marketCap': None, 'volumeAllCurrencies': None, 'strikePrice': None, 'averageVolume': 76217779, 'priceToSalesTrailing12Months': None, 'dayLow': 421.56, 'ask': 422.17, 'ytdReturn': None, 'askSize': 900, 'volume': 18413511, 'fiftyTwoWeekHigh': 424.62, 'forwardPE': None, 'maxAge': 1, 'fromCurrency': None, 'fiveYearAvgDividendYield': None, 'fiftyTwoWeekLow': 296.74, 'bid': 422.37, 'tradeable': False, 'dividendYield': None, 'bidSize': 900, 'dayHigh': 424.62, 'exchange': 'PCX', 'shortName': 'SPDR S&P 500', 'longName': 'SPDR S&P 500 ETF Trust', 'exchangeTimezoneName': 'America/New_York', 'exchangeTimezoneShortName': 'EDT', 'isEsgPopulated': False, 'gmtOffSetMilliseconds': '-14400000', 'quoteType': 'ETF', 'symbol': 'SPY', 'messageBoardId': 'finmb_6160262', 'market': 'us_market', 'annualHoldingsTurnover': None, 'enterpriseToRevenue': None, 'beta3Year': 1, 'profitMargins': None, 'enterpriseToEbitda': None, '52WeekChange': None, 'morningStarRiskRating': None, 'forwardEps': None, 'revenueQuarterlyGrowth': None, 'fundInceptionDate': 727660800, 'annualReportExpenseRatio': None, 'bookValue': None, 'fundFamily': 'SPDR State Street Global Advisors', 'lastFiscalYearEnd': None, 'netIncomeToCommon': None, 'trailingEps': None, 'lastDividendValue': None, 'SandP52WeekChange': None, 'priceToBook': None, 'nextFiscalYearEnd': None, 'mostRecentQuarter': None, 'enterpriseValue': None, 'threeYearAverageReturn': 0.1696, 'lastSplitDate': None, 'lastSplitFactor': None, 'legalType': 'Exchange Traded Fund', 'lastDividendDate': None, 'morningStarOverallRating': None, 'earningsQuarterlyGrowth': None, 'pegRatio': None, 'lastCapGain': None, 'category': 'Large Blend', 'fiveYearAverageReturn': 0.1691, 'phone': '866-732-8673', 'longBusinessSummary': 'The investment seeks to provide investment results that, before expenses, correspond generally to the price and yield performance of the S&P 500® Index.\n The Trust seeks to achieve its investment objective by holding a portfolio of the common stocks that are included in the index (the â\x80\x9cPortfolioâ\x80\x9d), with the weight of each stock in the Portfolio substantially corresponding to the weight of such stock in the index.', 'companyOfficers': [], 'regularMarketPrice': 422.96, 'logo_url': ''}
# To see a list of all the dividends that have been paid.
tkr.dividends
Series([], Name: Dividends, dtype: int64)
You can also use an "interval" setting to set the granularity of the results and get a minute by minute price history, although you will be limited to 60 days worth of data if you do this. See the yfinance documentation for more information for this and other features.
Download data for one ticker symbol¶
To complete this demonstration of getting price data for one ticker I'll use period = "max" to get the entire history of daily closing prices up to the most recent close.
spy_data_df = tkr.history(period = "max", auto_adjust = False)
spy_data_df.info()
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 7143 entries, 1993-01-29 to 2021-06-10 Data columns (total 8 columns): Open 7143 non-null float64 High 7143 non-null float64 Low 7143 non-null float64 Close 7143 non-null float64 Adj Close 7143 non-null float64 Volume 7143 non-null int64 Dividends 7143 non-null float64 Stock Splits 7143 non-null int64 dtypes: float64(6), int64(2) memory usage: 502.2 KB
Note that the daily date values come back in the form of a datetime index on the dataframe.
The ticker symbol itself does not come back as part of the results and you will need to add it into your dataframe yourself.
# Insert a Symbol column
spy_data_df.insert(0, "Symbol", value = "SPY")
spy_data_df.head()
Symbol | Open | High | Low | Close | Adj Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|---|---|
Date | |||||||||
1993-01-29 | SPY | 43.97 | 43.97 | 43.75 | 43.94 | 25.88 | 1003200 | 0.00 | 0 |
1993-02-01 | SPY | 43.97 | 44.25 | 43.97 | 44.25 | 26.07 | 480500 | 0.00 | 0 |
1993-02-02 | SPY | 44.22 | 44.38 | 44.12 | 44.34 | 26.12 | 201300 | 0.00 | 0 |
1993-02-03 | SPY | 44.41 | 44.84 | 44.38 | 44.81 | 26.40 | 529400 | 0.00 | 0 |
1993-02-04 | SPY | 44.97 | 45.09 | 44.47 | 45.00 | 26.51 | 531500 | 0.00 | 0 |
"Adjusted" prices are important¶
Using auto_adjust = False causes the adjusted closing price to be returned along with the actual daily closing price. Adjusted prices account for stock splits and other business actions. This is important to know when evaluating historical stock prices.
When a stock splits 2 for 1 for example, it's actual price gets cut in half. If you only considered "actual" prices in an analysis, it would look like the stock took a terrible one day nosedive. To make historical comparisons valid, the prior adjusted prices are also cut in half. For more clarity on this, check out the following: https://www.investopedia.com/terms/a/adjusted_closing_price.asp
You will probably want to collect both the adjusted and the actual prices. The adjusted price will be needed for the above mentioned reasons. The actual price will be needed if you want to calculate the percent return of dividends, or to calculate how many shares of stock those dividends could have purchased in a DRIP program. Dividends are not adjusted for stock splits like the the stock price is, so you would need to divide the dividend by the actual price of the stock at the time the dividend was paid. Using the adjusted price in that calculation would overstate the percent return if later splits had occured.
Download data for a portfolio¶
Now that the basics of downloading stock prices are understood, I can write some code that will download price data for a portfolio containing multiple assets.
Rather than hard-code the symbols into my program, I created a text file to hold them and named it "Symbols.txt"
Each line in the file consists of a symbol followed by a comma followed by the name, like this:
AMDVX,Am Cntry Midcp Value
FLCNX,Fdlty Cntrfnd
MDIZX,Mfs Intl Dvrsfctn
Here is my complete list of symbols:
### Load the list of stock symbols into a dataframe.
symbols_df = pd.read_csv("Symbols.txt", header = None, names = ["Symbol", "Name"])
symbols_df
Symbol | Name | |
---|---|---|
0 | AMDVX | Am Cntry Midcp Value |
1 | FLCNX | Fdlty Cntrfnd |
2 | MDIZX | Mfs Intl Dvrsfctn |
3 | PMEGX | T Rowe Midcp Eqty Grwth |
4 | SPY | S & P 500 ETF |
5 | TIREX | Tiaa Crf Real Estat Secrt |
6 | VEVRX | Victory Sycmre Estbls Value |
7 | VEIRX | Vngrd Eqty Incm |
8 | VIEIX | Vngrd Extnd Mrkt Indx |
9 | VWEAX | Vngrd High Yld Corp |
10 | VIIIX | Vngrd Indx |
11 | VIPIX | Vngrd Infl Prtct Secrt |
12 | VSCIX | Vngrd Smcp Indx |
13 | VUSXX | Vngrd Trsy Mny Mrkt |
14 | VBTIX | Vngrd Ttl Bond Mrkt Indx |
15 | VTSNX | Vngrd Ttl Intl Stck Indx |
16 | WACSX | Wstrn Asst Core Bond |
Yahoo! Finance will not return data for every ticker symbol you throw at it. A good example of this is the symbol VUSXX, a treasury money market fund. Using the debug = False setting like I do below, is a simple way to avoid errors caused by missing data.
# Initialize dataframe variable that will hold the results.
all_data_df = None
# Get the price data for each stock in the symbols list and append results to the dataframe
for index, row in symbols_df.iterrows():
STOCK_FUND_SYMBOL = row["Symbol"]
STOCK_FUND_NAME = row["Name"]
tkr = yf.Ticker(STOCK_FUND_SYMBOL)
stock_price_df = tkr.history(period = "max", auto_adjust = False, debug = False)
# debug = False, prevents error messages from being streamed to the console.
# Insert a Symbol column
stock_price_df.insert(0, "Symbol", value = STOCK_FUND_SYMBOL)
# Append the symbol results to a single dataframe so all data is in one dataset.
if all_data_df is None:
all_data_df = stock_price_df
else:
all_data_df = all_data_df.append(stock_price_df, sort = True)
all_data_df.head()
Adj Close | Close | Dividends | High | Low | Open | Stock Splits | Symbol | Volume | |
---|---|---|---|---|---|---|---|---|---|
Date | |||||||||
2013-07-26 | 13.68 | 15.66 | 0.00 | 15.66 | 15.66 | 15.66 | 0.00 | AMDVX | 0.00 |
2013-07-29 | 13.61 | 15.58 | 0.00 | 15.58 | 15.58 | 15.58 | 0.00 | AMDVX | 0.00 |
2013-07-31 | 13.62 | 15.60 | 0.00 | 15.60 | 15.60 | 15.60 | 0.00 | AMDVX | 0.00 |
2013-08-01 | 13.82 | 15.82 | 0.00 | 15.82 | 15.82 | 15.82 | 0.00 | AMDVX | 0.00 |
2013-08-02 | 13.77 | 15.77 | 0.00 | 15.77 | 15.77 | 15.77 | 0.00 | AMDVX | 0.00 |
# See which Symbols do not have data
dataframe1 = pd.DataFrame(symbols_df["Symbol"])
dataframe2 = pd.DataFrame(pd.unique(all_data_df['Symbol']))
dataframe2.columns = ["Symbol"]
common = dataframe1.merge(dataframe2, on=["Symbol"])
nodata = dataframe1[~dataframe1.Symbol.isin(common.Symbol)]
nodata
Symbol | |
---|---|
13 | VUSXX |
Plot the adjusted prices¶
If the following code generates a "FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import...", you can either ignore it or use this code to explicitly register a datetime converter.
import pandas.plotting._converter as pandacnv
pandacnv.register()
# Create a list of symbols
symbols_lst = pd.unique(all_data_df['Symbol'])
# Get the names of the stocks to be plotted using the symbols_df dataframe
names_lst = list(symbols_df[symbols_df.Symbol.isin(symbols_lst)]["Name"])
# Loop though the data and plot each symbol
numrows = 8 # Arrange the plots in this number of rows
numcols = 2 # Arrange the plots in this number of columns
sym = 0 # symbol counter
fig, axes = plt.subplots(numrows, numcols, figsize=(30,60))
for ro in range(numrows):
for co in range(numcols):
price_df = pd.DataFrame(all_data_df.loc[all_data_df["Symbol"] == symbols_lst[sym]]['Adj Close'])
# Highlight the SPY chart using red.
if symbols_lst[sym] == 'SPY':
axcolor = 'red'
else:
axcolor = 'black'
axes[ro, co].plot(price_df,
color = axcolor)
axes[ro, co].set_title(symbols_lst[sym] + ' - ' + names_lst[sym],
fontsize = 28,
color = axcolor)
axes[ro, co].set_ylabel('Adjusted Close', fontsize = 24,
color = axcolor)
axes[ro, co].tick_params(direction = 'out',
length = 6,
width =2 ,
labelsize = 20,
colors = axcolor)
sym += 1
if sym >= len(symbols_lst):
break
/Users/mitchellfawcett/anaconda3/lib/python3.7/site-packages/pandas/plotting/_converter.py:129: FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import. Future versions of pandas will require you to explicitly register matplotlib converters. To register the converters: >>> from pandas.plotting import register_matplotlib_converters >>> register_matplotlib_converters() warnings.warn(msg, FutureWarning)
Comparing returns¶
The charts of all the investment alternatives have tended to go up over time, so you would have made money with any of them if you held them long enough. But which ones were best? Examining the charts alone make this hard to determine because they have different time periods and different price ranges.
One way of making comparisons between investments is with their compound annual growth rate. Basically what the CAGR tells you is, what annual percentage rate of growth it would take to get from a particular starting price to an ending price. However, it ignores the degree of "ups and downs" (volatility) the price may have gone through, which is important in assessing its risk.
CAGR is not the only measure of comparison you should be using, but it does provide a useful baseline.
Compound Annual Growth Rate formula¶
CAGR = [(Ending Value / Beginning Value) raised to the power of (1 / Number of Years)] minus 1
Example:
Ending value = 15,000
Beginning value = 10,000
Number of years = 7
[(15000 / 10000) ^ (1/7)] - 1 = 0.059, or 5.9%.
# Get the starting and ending dates (which is the row index) of the price data for each symbol.
# Cite: jezrael
# At https://stackoverflow.com/questions/47115448/pandas-get-index-of-first-and-last-element-by-group
fund_startend_date_df = all_data_df.index.to_series().groupby(all_data_df['Symbol']).agg(['first','last']) # .reset_index()
# Examine results
fund_startend_date_df
first | last | |
---|---|---|
Symbol | ||
AMDVX | 2013-07-26 | 2021-06-09 |
FLCNX | 2017-05-25 | 2021-06-09 |
MDIZX | 2017-10-05 | 2021-06-09 |
PMEGX | 1996-07-30 | 2021-06-09 |
SPY | 1993-01-29 | 2021-06-10 |
TIREX | 2002-09-30 | 2021-06-09 |
VBTIX | 1995-09-18 | 2021-06-09 |
VEIRX | 2001-08-13 | 2021-06-09 |
VEVRX | 2014-03-10 | 2021-06-09 |
VIEIX | 1997-07-07 | 2021-06-09 |
VIIIX | 1997-07-07 | 2021-06-09 |
VIPIX | 2003-12-15 | 2021-06-09 |
VSCIX | 1997-07-07 | 2021-06-09 |
VTSNX | 2010-11-29 | 2021-06-09 |
VWEAX | 2001-11-12 | 2021-06-09 |
WACSX | 2008-08-29 | 2021-06-09 |
# Get the starting and ending price for each symbol
fund_startend_price_df = all_data_df.groupby("Symbol")['Adj Close'].agg(['first', 'last'])
# Examine results
fund_startend_price_df
first | last | |
---|---|---|
Symbol | ||
AMDVX | 13.68 | 20.09 |
FLCNX | 9.90 | 20.75 |
MDIZX | 17.97 | 25.99 |
PMEGX | 9.83 | 76.94 |
SPY | 25.88 | 422.51 |
TIREX | 6.11 | 21.63 |
VBTIX | 3.33 | 11.30 |
VEIRX | 29.01 | 92.72 |
VEVRX | 32.29 | 51.29 |
VIEIX | 21.48 | 140.92 |
VIIIX | 57.99 | 369.57 |
VIPIX | 5.85 | 11.65 |
VSCIX | 16.11 | 108.45 |
VTSNX | 74.28 | 144.06 |
VWEAX | 1.91 | 5.97 |
WACSX | 6.84 | 13.24 |
### Calculate the CAGR
# Ratio of change in price from beginning of data to end.
price_ratio = (fund_startend_price_df['last'] / fund_startend_price_df['first'])
# Number of years and fraction of year each symbol's data covers.
num_years = ((fund_startend_date_df['last'] - fund_startend_date_df['first']) / timedelta(days=365))
# CAGR
cagr_df = pd.DataFrame((price_ratio ** (1 / num_years) - 1.0) * 100, columns = ['CAGR %'])
# (fund_startend_price_df['last'] / fund_startend_price_df['first']) ** (1 / ((fund_startend_date_df['last'] - fund_startend_date_df['first']) / timedelta(days=365))) - 1.0
cagr_df
CAGR % | |
---|---|
Symbol | |
AMDVX | 5.00 |
FLCNX | 20.09 |
MDIZX | 10.55 |
PMEGX | 8.62 |
SPY | 10.34 |
TIREX | 6.99 |
VBTIX | 4.86 |
VEIRX | 6.03 |
VEVRX | 6.58 |
VIEIX | 8.18 |
VIIIX | 8.04 |
VIPIX | 4.02 |
VSCIX | 8.29 |
VTSNX | 6.49 |
VWEAX | 5.98 |
WACSX | 5.30 |
Conclusion¶
The CAGR provides a nice compact comparison of the average annual return each investment had during it's lifetime, with the SPY ETF performing well at 10.34% per year average growth rate.
The FLCNX seems to be a real superstar though, with 20% annual growth. However it has only been around for five years and a look at its top holdings show that it is heavy in high-flying tech stocks like Amazon, Microsoft, Facebook, Google and so on. It's an example of what I would consider a "high risk" investment due to a lack of diversity, but that's as far as I will go with investment opinion.
I must point out that these calculations do not take into account the dividends paid by these investments which, over time, can be a significant factor in their total return.
I hope this has been helpful, so feel free to use whatever code you find here in your own projects.