In this notebook we provide code to estimate factor exposures (\(\beta\)’s) and risk premia (\(\lambda\)’s) in the Chen, Roll, and Ross (1986) model.
The Chen, Roll, and Ross (1986) model includes the following five macroeconomic risk factors:
DEI: changes in expected inflation
MP: monthly growth in industrial production
UI: unexpected inflation
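UPR: unanticipated change in the risk premium (low-grade corporate bonds over long-term government bonds)
UTS: unanticipated change in the term structure (long-term government bonds over Treasury bills)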
The Fama-MacBeth two-step regression approach tests how well these risk factors explain asset returns. Its aim is to estimate the risk premium associated with exposure to each risk factor.
The first step is a time-series regression of each asset’s return on the five risk factors. The slope coefficients are the asset’s exposures to the factors, called the “betas”.
The second step is a cross-sectional regression, run each period, of the returns of all assets on the betas obtained in Step 1. The slope coefficients are the risk premium estimates for each factor in that period.
Lastly, Fama and MacBeth estimate the expected premium for a unit exposure to each risk factor by averaging these cross-sectional coefficients over time, once for each factor.
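The three steps above can be sketched on simulated data. This is a simplified illustration, not the notebook's implementation: it uses full-sample betas rather than rolling betas, and the sample sizes, factor count, and simulated premia are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 240, 20, 2  # months, test assets, factors (all hypothetical)

# Simulate factors, true betas, and excess returns that obey a linear factor model.
factors = rng.normal(0.0, 0.02, size=(T, K))
true_betas = rng.uniform(0.5, 1.5, size=(N, K))
true_premia = np.array([0.005, 0.002])
returns = true_betas @ true_premia  # expected excess return of each asset
returns = returns + factors @ true_betas.T + rng.normal(0.0, 0.01, size=(T, N))

# Step 1: time-series regression of each asset on the factors -> betas (N x K).
X = np.column_stack([np.ones(T), factors])
coefs, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (K + 1, N)
betas = coefs[1:].T

# Step 2: one cross-sectional regression per month -> lambdas (T x (K + 1)).
Z = np.column_stack([np.ones(N), betas])
lambdas = np.array([np.linalg.lstsq(Z, returns[t], rcond=None)[0] for t in range(T)])

# Step 3: average the monthly coefficients and compute simple t-statistics.
premia = lambdas.mean(axis=0)
t_stats = premia / (lambdas.std(axis=0, ddof=1) / np.sqrt(T))
print("average premia (intercept first):", premia.round(4))
print("t-statistics:", t_stats.round(2))
```

The first entry of each printed array belongs to the cross-sectional intercept, and the remaining entries to the factors.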
We load stock returns from WRDS, form size portfolios from them, and use 20 size portfolios as test assets to estimate the risk premiums in the Chen, Roll, and Ross (1986) model.
We first check whether the required packages are installed.
# run this to check if required packages are installed
import subprocess
import sys

# pip distribution names that differ from the import name
PIP_NAMES = {"sklearn.linear_model": "scikit-learn"}


def install_packages(packages):
    """
    Ensure that all specified packages are installed.
    If a package is not installed, it will be installed automatically.

    Args:
        packages (list): A list of package names to ensure are installed.

    Returns:
        None
    """
    for package in packages:
        try:
            __import__(package)
        except ImportError:
            print(f"Installing {package}...")
            pip_name = PIP_NAMES.get(package, package)
            subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])
        else:
            print(f"{package} is already installed.")


required_packages = [
    "numpy",
    "scipy",
    "pandas",
    "wrds",
    "datetime",
    "statsmodels",
    "matplotlib",
    "pathlib",
    "pandas_datareader",
    "sklearn.linear_model",
    "warnings",
    "itertools",
    "joblib",
    "certifi",
]
install_packages(required_packages)
numpy is already installed.
scipy is already installed.
pandas is already installed.
wrds is already installed.
datetime is already installed.
statsmodels is already installed.
matplotlib is already installed.
pathlib is already installed.
pandas_datareader is already installed.
sklearn.linear_model is already installed.
warnings is already installed.
itertools is already installed.
joblib is already installed.
certifi is already installed.
We then import required packages:
### The following packages are for user-defined packages ###
# load user-defined packages
import sys
from pathlib import Path

current_dir = Path().resolve()
sys.path.append(str(current_dir))

# load user-defined packages: these packages are for Fama-MacBeth regressions
from tools.main import clean_test_asset_returns
from tools.main import fama_macbeth_timeseries_estimate_beta
from tools.main import fama_macbeth_crosssection_estimate_premium
from tools.main import fama_macbeth_crosssection_premium_stat
from tools.main import fama_macbeth_regression

# run this to make sure that the modules used are the most updated version
import importlib
import tools.main
importlib.reload(tools.main)

### The following packages are generally used ###
# load package WRDS
import wrds

# load package for dataframe operation
import pandas as pd
import numpy as np
from datetime import datetime

# !pip install --upgrade certifi
# note: the next line disables HTTPS certificate verification
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# handle warnings in wrds
import warnings
warnings.filterwarnings("ignore")
Then, we enter credentials and connect to WRDS. In the following code, update your user name to connect to WRDS:
# WRDS connection
# update this line: wrds_username = 'your_username'
wrds_username = 'eileenbc'

try:
    print("Establishing connection to WRDS database...")
    params = {
        'wrds_hostname': wrds.sql.WRDS_POSTGRES_HOST,
        'wrds_port': wrds.sql.WRDS_POSTGRES_PORT,
        'wrds_dbname': wrds.sql.WRDS_POSTGRES_DB,
        'wrds_username': wrds_username,
        'wrds_connect_args': wrds.sql.WRDS_CONNECT_ARGS,
    }
    conn = wrds.Connection(autoconnect=True, **params)
    print("Successfully connected to WRDS database.")
except Exception as e:
    print(f"Failed to connect to WRDS database: {e}")
Establishing connection to WRDS database...
Loading library list...
Done
Successfully connected to WRDS database.
Now that we are connected to WRDS, we can query its data tables. CRSP provides stock returns, and the Fama-French factors table provides the risk-free rate and the market, size, and value factors.
To load data from CRSP, we first set the start date and end date of the sample. These dates define the sample period of the test assets in the Fama-MacBeth regression.
# define start date and end date for Fama-MacBeth regression
# the most recent date for Fama-French risk factors is: 11/01/2024
start_date = "01/01/1958"
end_date = "12/01/1984"
The following code loads stock returns from CRSP. CRSP contains stock returns on a monthly basis. The stock returns are the total returns including dividends.
# connect to crsp
crsp_monthly = conn.raw_sql(
    f"""SELECT permno, siccd, mthcaldt, mthret, mthretx, mthcap, ticker
        FROM crsp.msf_v2 as msf
        WHERE msf.mthcaldt BETWEEN '{start_date}' AND '{end_date}'""",
    date_cols=['mthcaldt'],
)

# add year and month to crsp monthly
crsp_monthly = (
    crsp_monthly
    .assign(year=lambda x: pd.DatetimeIndex(x["mthcaldt"]).year)
    .assign(month=lambda x: pd.DatetimeIndex(x["mthcaldt"]).month)
)
crsp_monthly.head(3)
   permno  siccd    mthcaldt    mthret   mthretx    mthcap ticker  year  month
0   10006   3743  1958-01-31  0.137584  0.137584  60045.38   None  1958      1
1   10014   3714  1958-01-31  0.117647  0.117647   3531.63   None  1958      1
2   10022   3420  1958-01-31  0.071429  0.071429  11362.50   None  1958      1
Then we load Fama-French risk factors and merge CRSP with Fama-French risk factors.
# define start date and end date of fama-french factors
# the most recent date for Fama-French factors is '11/01/2024'
start_date = "01/01/1958"
end_date = "12/01/1984"

# load ff 3 factors from wrds
ff3_factors_monthly = conn.raw_sql(
    f"""SELECT date, mktrf, smb, hml, rf
        FROM ff.factors_monthly
        WHERE date BETWEEN '{start_date}' AND '{end_date}'
    """,
    date_cols=['date'],
)

# add year, month, and rename mktrf to mkt_excess
ff3_factors_monthly = (
    ff3_factors_monthly
    .assign(year=lambda x: pd.DatetimeIndex(x["date"]).year)
    .assign(month=lambda x: pd.DatetimeIndex(x["date"]).month)
    .rename(columns={"mktrf": "mkt_excess"})
)

# merge crsp monthly with fama-french factors
crsp_ff3_monthly = crsp_monthly.merge(ff3_factors_monthly, how="left", on=['year', 'month'])

# calculate excess return for each stock
crsp_ff3_monthly['ret_excess'] = crsp_ff3_monthly['mthret'] - crsp_ff3_monthly['rf']

# rename permno to be 'test_id' for Fama-MacBeth regression
crsp_ff3_monthly.rename(columns={'permno': 'test_id'}, inplace=True)

# take a look at the data after merge
crsp_ff3_monthly.head(3)
   test_id  siccd    mthcaldt    mthret   mthretx    mthcap ticker  year  month        date  mkt_excess     smb     hml      rf  ret_excess
0    10006   3743  1958-01-31  0.137584  0.137584  60045.38   None  1958      1  1958-01-01      0.0466  0.0439  0.0419  0.0028    0.134784
1    10014   3714  1958-01-31  0.117647  0.117647   3531.63   None  1958      1  1958-01-01      0.0466  0.0439  0.0419  0.0028    0.114847
2    10022   3420  1958-01-31  0.071429  0.071429  11362.50   None  1958      1  1958-01-01      0.0466  0.0439  0.0419  0.0028    0.068629
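The merge-and-subtract pattern used above can be checked on a toy example. The sketch below is illustrative only: the data values are hypothetical, and the `validate="m:1"` argument is an optional safeguard that the notebook's own code does not use.

```python
import pandas as pd

# Toy stock returns (two stocks, two months); values are hypothetical.
stocks = pd.DataFrame({
    "permno": [1, 2, 1, 2],
    "year": [1958, 1958, 1958, 1958],
    "month": [1, 1, 2, 2],
    "mthret": [0.05, 0.02, -0.01, 0.03],
})

# Toy monthly factor table with one row per (year, month).
factors = pd.DataFrame({
    "year": [1958, 1958],
    "month": [1, 2],
    "rf": [0.0028, 0.0012],
})

# validate="m:1" raises an error if the factor table has duplicate months.
merged = stocks.merge(factors, how="left", on=["year", "month"], validate="m:1")
merged["ret_excess"] = merged["mthret"] - merged["rf"]

# A left merge leaves rf as NaN for months missing from the factor table.
print(merged["rf"].isna().sum(), "rows without a risk-free rate")
print(merged)
```

Counting missing `rf` values after a left merge is a quick way to confirm that every return month found a matching factor month.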
Now we have merged the stock returns with the Fama-French risk factors. Next we apply the Fama-MacBeth method to the Chen, Roll, and Ross model and test the risk premiums.
Fama-MacBeth regression
This part applies the Fama-MacBeth regression method to the Chen, Roll, and Ross (1986) model. We define Fama-MacBeth regression functions to estimate the magnitude and significance of the risk premiums. The test assets are 20 size portfolios.
First, we build a function that creates size portfolios from the individual stock returns loaded from WRDS.
# this function loads stock returns data and creates size portfolios
# size portfolios are defined per market cap
def create_size_portfolios(returns_data, N):
    """
    returns_data include:
    - stock id: test_id
    - market cap: mthcap
    - return: mthret
    - date: mthcaldt or date
    N: the number of portfolios to be created
    function output:
    - portfolio returns per date, portfolio (1 - N: from the smallest size to the largest size)
    """
    returns_data = returns_data.sort_values(by=['test_id', 'date'])

    # create a lag variable of market value
    returns_data['mthcap_lag'] = returns_data.groupby('test_id')['mthcap'].shift(1)
    returns_data = returns_data.dropna(subset=['mthcap_lag'])  # drop if mthcap_lag is NaN

    # assign portfolios based on market cap of previous period (month)
    num_portfolios = N
    size_labels = range(1, N + 1)  # create portfolio labels

    # assign size portfolios based on lag market cap for each date
    returns_data['portfolio'] = returns_data.groupby('date')['mthcap_lag'].transform(
        lambda x: pd.qcut(x, q=num_portfolios, labels=size_labels)
    )

    # Calculate equally weighted returns for each portfolio, date
    returns_data['e_w_port_size_ret'] = (
        returns_data.groupby(['date', 'portfolio'])['mthret'].transform('mean')
    )

    # Aggregate portfolio returns per date, portfolio
    portfolio_returns = (
        returns_data[['date', 'portfolio', 'e_w_port_size_ret']]
        .sort_values(by=['date', 'portfolio'])  # Sort by date and portfolio
        .drop_duplicates()  # Drop duplicate rows
        .assign(year=lambda x: pd.DatetimeIndex(x["date"]).year)
        .assign(month=lambda x: pd.DatetimeIndex(x["date"]).month)
        .reset_index(drop=True)  # Reset index for clean output
    )
    return portfolio_returns
For example, with N = 3, the function creates three size portfolios (small, medium, large).
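A minimal sketch of the sorting step for N = 3, on hypothetical toy data with one month and six stocks. It is a small variation on the function above: it passes `labels=False` to `pd.qcut` and adds 1, so the portfolio labels are plain integers.

```python
import pandas as pd

# Six hypothetical stocks observed in one month, with lagged market caps.
toy = pd.DataFrame({
    "date": pd.to_datetime(["1958-02-01"] * 6),
    "test_id": [101, 102, 103, 104, 105, 106],
    "mthcap_lag": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "mthret": [0.06, 0.04, 0.03, 0.01, 0.02, 0.00],
})

N = 3  # number of size portfolios

# Within each date, split stocks into N equal-count bins by lagged market cap.
toy["portfolio"] = toy.groupby("date")["mthcap_lag"].transform(
    lambda x: pd.qcut(x, q=N, labels=False) + 1
)

# Equal-weighted return of each size portfolio in each month.
port_ret = (
    toy.groupby(["date", "portfolio"])["mthret"]
    .mean()
    .reset_index(name="e_w_port_size_ret")
)
print(port_ret)
```

Portfolio 1 holds the two smallest stocks and portfolio 3 the two largest, matching the small-to-large ordering described in the function's docstring.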
We first define the test assets and risk factors. The test assets are 20 size portfolios, and the risk factors are the Chen, Roll, and Ross (1986) risk factors.
# define test assets - test assets are size portfolios
returns_data = crsp_ff3_monthly
N = 20  # you can choose the number of size portfolios here by setting N

# define test assets as 20 size portfolios
test_assets = create_size_portfolios(returns_data, N)

# merge test assets with Fama-French risk factors and calculate the excess returns for each portfolio
test_assets = test_assets.merge(
    ff3_factors_monthly.drop(columns=['date']), how="left", on=['year', 'month']
)
test_assets['ret_excess'] = test_assets['e_w_port_size_ret'] - test_assets['rf']

# format the column names for test_assets to prepare for Fama-MacBeth regression
test_assets = test_assets.rename(columns={'portfolio': 'test_id'})  # rename portfolio as test_id

# take a look at the test assets (size portfolios)
test_assets.head(21)
          date  test_id  e_w_port_size_ret  year  month  mkt_excess     smb      hml      rf  ret_excess
0   1958-02-01        1          -0.018120  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.019320
1   1958-02-01        2          -0.020987  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.022187
2   1958-02-01        3          -0.028752  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.029952
3   1958-02-01        4          -0.019040  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.020240
4   1958-02-01        5          -0.021064  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.022264
5   1958-02-01        6          -0.002595  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.003795
6   1958-02-01        7           0.001753  1958      2     -0.0152  0.0065   0.0033  0.0012    0.000553
7   1958-02-01        8          -0.011303  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.012503
8   1958-02-01        9          -0.005352  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.006552
9   1958-02-01       10          -0.020944  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.022144
10  1958-02-01       11          -0.006328  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.007528
11  1958-02-01       12          -0.007115  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.008315
12  1958-02-01       13          -0.013130  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.014330
13  1958-02-01       14          -0.006343  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.007543
14  1958-02-01       15          -0.015301  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.016501
15  1958-02-01       16          -0.004638  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.005838
16  1958-02-01       17          -0.013799  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.014999
17  1958-02-01       18           0.004682  1958      2     -0.0152  0.0065   0.0033  0.0012    0.003482
18  1958-02-01       19          -0.011304  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.012504
19  1958-02-01       20          -0.019989  1958      2     -0.0152  0.0065   0.0033  0.0012   -0.021189
20  1958-03-01        1           0.035002  1958      3      0.0327  0.0065  -0.0097  0.0009    0.034102
We load CRR1986 risk factors.
# load CRR1986 risk factors
csv_usrec = 'https://raw.githubusercontent.com/lorenzogarlappi/COMM475/refs/heads/main/Data/replicated_UI_DEI_MP_UPR_UTS.csv'
crr_data = pd.read_csv(csv_usrec)

# format risk factors
risk_factors = crr_data[['date', 'UI', 'DEI', 'MP', 'UPR', 'UTS']]
risk_factors = risk_factors.dropna()
risk_factors = risk_factors.reset_index(drop=True)

# take a look at the risk factors
risk_factors.head()
         date        UI       DEI        MP     UPR     UTS
0  1953-04-01  0.002071  0.000537  0.004143  0.0042  0.0072
1  1953-05-01 -0.000342 -0.000008  0.005492  0.0044  0.0070
2  1953-06-01  0.001910 -0.000006 -0.004120  0.0046  0.0076
3  1953-07-01  0.000044  0.000290  0.012305  0.0058  0.0074
4  1953-08-01  0.001244  0.000111 -0.005447  0.0061  0.0082
Step 0. Clean test asset monthly returns
Before running the Fama-MacBeth regression, we call the function ‘clean_test_asset_returns(test_assets,risk_factors)’. This function cleans the test asset returns and merges them with the risk factors.
For more information about the function, see its definition in the ‘tools’ folder.
# clean_test_asset_returns loads test assets and risk factors, and returns cleaned test asset returns.
returns_monthly = clean_test_asset_returns(test_assets, risk_factors)
returns_monthly
      test_id  date_yyyymm  ret_excess  first_date   last_date        date        UI       DEI        MP     UPR     UTS
0           1       195802   -0.019320  1958-02-01  1984-11-01  1958-02-01  0.000338 -0.000196 -0.021560  0.0107  0.0132
1           1       195803    0.034102  1958-02-01  1984-11-01  1958-03-01  0.004348  0.000852 -0.012338  0.0105  0.0145
2           1       195804    0.063931  1958-02-01  1984-11-01  1958-04-01  0.000012  0.000247 -0.016693  0.0107  0.0172
3           1       195805    0.051332  1958-02-01  1984-11-01  1958-05-01 -0.002657 -0.001394  0.009769  0.0105  0.0180
4           1       195806    0.060711  1958-02-01  1984-11-01  1958-06-01 -0.002301 -0.000234  0.026049  0.0098  0.0200
...       ...          ...         ...         ...         ...         ...       ...       ...       ...     ...     ...
6435       20       198407   -0.025846  1958-02-01  1984-11-01  1984-07-01 -0.000376 -0.000884  0.002778  0.0171  0.0133
6436       20       198408    0.105259  1958-02-01  1984-11-01  1984-08-01 -0.000464  0.000219  0.001204  0.0176  0.0089
6437       20       198409   -0.011635  1958-02-01  1984-11-01  1984-09-01 -0.000691  0.001311 -0.002801  0.0169  0.0084
6438       20       198410   -0.000777  1958-02-01  1984-11-01  1984-10-01 -0.001059 -0.002675 -0.000454  0.0131  0.0114
6439       20       198411   -0.013594  1958-02-01  1984-11-01  1984-11-01 -0.000296 -0.000485  0.003336  0.0119  0.0184

6440 rows × 11 columns
Step 1. Fama-MacBeth time-series analysis of returns to estimate beta
For each test asset \(i\) (identified by test_id), we run rolling multiple regressions of the portfolio’s excess return on the five risk factors, each over a window of 60 monthly observations. This gives estimates of \(\beta_{ji}\) for \(j = 1, \dots, 5\) and \(i = 1, \dots, 20\) for each month.
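Written out, the regression estimated in each 60-month window has the following form. The symbols \(r_{i,t}\) (excess return of test asset \(i\) in month \(t\)), \(f_{j,t}\) (value of risk factor \(j\)), \(a_i\) (intercept), and \(\varepsilon_{i,t}\) (residual) are notation assumed here for illustration:

\[
r_{i,t} = a_i + \sum_{j=1}^{5} \beta_{ji}\, f_{j,t} + \varepsilon_{i,t}, \qquad i = 1, \dots, 20 .
\]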
Rolling ordinary least squares (RollOLS) estimates the betas by applying OLS to the past 60 months of data.
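A rolling regression of this kind can be sketched with plain NumPy, as below. This is an illustrative stand-in on simulated data, not the code in the ‘tools’ folder; the helper name `rolling_betas` and the simulated coefficients are hypothetical. The statsmodels package also offers a `RollingOLS` class for the same purpose.

```python
import numpy as np


def rolling_betas(y, X, window=60):
    """Rolling OLS slopes of y on X (with intercept) over trailing windows.

    y: array of shape (T,); X: array of shape (T, K).
    Returns an array of shape (T, K); rows before the first full window are NaN.
    """
    T, K = X.shape
    betas = np.full((T, K), np.nan)
    for end in range(window, T + 1):
        Xw = np.column_stack([np.ones(window), X[end - window:end]])
        coef, *_ = np.linalg.lstsq(Xw, y[end - window:end], rcond=None)
        betas[end - 1] = coef[1:]  # drop the intercept, keep the factor slopes
    return betas


# Simulated example: one asset, five factors, 120 months (all hypothetical).
rng = np.random.default_rng(1)
T, K = 120, 5
X = rng.normal(size=(T, K))
true_beta = np.array([1.0, -0.5, 0.3, 0.0, 2.0])
y = 0.01 + X @ true_beta + rng.normal(scale=0.1, size=T)

betas = rolling_betas(y, X, window=60)
print(betas[59].round(2))  # first month with a full 60-month window
```

Each row of `betas` uses only data up to and including that month, so the estimates for a given month never look ahead.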
We call the function ‘fama_macbeth_timeseries_estimate_beta(returns_monthly,risk_factors)’ to estimate the monthly beta on each risk factor. For more details, see the function’s definition in the ‘tools’ folder.
Step 2. Fama-MacBeth cross-sectional analysis of returns to estimate risk premiums
The cross-sectional regressions give estimates of the market price of risk for each month: \(\lambda_{j}\), \(j = 1, \dots, 5\).
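For a given month \(t\), the cross-sectional regression can be written as follows. The intercept \(\lambda_{0,t}\), the residual \(\eta_{i,t}\), and the time subscripts are notation assumed here for illustration; whether the betas are dated \(t\) or \(t-1\) depends on the implementation in the ‘tools’ folder:

\[
r_{i,t} = \lambda_{0,t} + \sum_{j=1}^{5} \lambda_{j,t}\, \hat{\beta}_{ji,t} + \eta_{i,t}, \qquad i = 1, \dots, 20 .
\]

Each month contributes one estimate \(\hat{\lambda}_{j,t}\) per factor, so the output is a time series of risk premiums for each factor.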
Next we call the function ‘fama_macbeth_crosssection_estimate_premium(test_assets,risk_factors,beta_monthly)’ to estimate the monthly risk premium for each risk factor.
Step 3. Aggregate risk premium timeseries and calculate t-statistics for each risk factor
For each risk factor, we calculate two t-statistics for the average risk premium: a simple t-statistic and a Newey-West t-statistic.
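In the standard Fama-MacBeth setup, the simple t-statistic treats the monthly estimates as a sample of \(T\) observations. With \(s(\hat{\lambda}_j)\) denoting the sample standard deviation of the monthly estimates (notation assumed here):

\[
\bar{\lambda}_j = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{j,t}, \qquad
t(\bar{\lambda}_j) = \frac{\bar{\lambda}_j}{s(\hat{\lambda}_j) / \sqrt{T}} .
\]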
The Newey-West t-statistic is computed with Newey and West (1987) standard errors, which adjust for heteroskedasticity and autocorrelation in time-series regression models. They are commonly used in asset pricing when residuals are serially correlated.
The t-statistics indicate whether the risk premium on each factor is statistically different from zero, that is, whether exposure to the factor is priced in the returns of the test assets.
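The Newey-West adjustment for the mean of a time series can be sketched as follows. This is an illustration on simulated data, not the code in the ‘tools’ folder; the helper name `newey_west_tstat`, the lag length of 4, and the simulated premium series are assumptions.

```python
import numpy as np


def newey_west_tstat(x, lags):
    """t-statistic of the mean of x using a Bartlett-kernel (Newey-West) variance."""
    x = np.asarray(x, dtype=float)
    T = x.size
    e = x - x.mean()
    long_run_var = e @ e / T  # lag-0 autocovariance
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett weight, declining in the lag
        gamma = e[lag:] @ e[:-lag] / T  # autocovariance at this lag
        long_run_var += 2.0 * weight * gamma
    return x.mean() / np.sqrt(long_run_var / T)


# Hypothetical monthly risk premium estimates for one factor.
rng = np.random.default_rng(2)
premium = 0.004 + rng.normal(scale=0.03, size=300)

simple_t = premium.mean() / (premium.std(ddof=0) / np.sqrt(premium.size))
print("simple t:", round(simple_t, 2))
print("Newey-West t (4 lags):", round(newey_west_tstat(premium, lags=4), 2))
```

With `lags=0` the function reduces to the simple t-statistic (using the population standard deviation), which is a convenient sanity check.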
Next we call the function ‘fama_macbeth_crosssection_premium_stat(risk_premiums)’ to test the significance of the risk premiums.
We can see that, with 20 size portfolios as test assets, the risk premiums on the Chen, Roll, and Ross (1986) factors are not statistically significant. In other words, the Chen, Roll, and Ross (1986) model has very limited power to explain stock returns in this sample.
The difference from the results in the paper could also be driven by the fact that we did not fully replicate the Chen, Roll, and Ross (1986) risk factor data.
Fama-MacBeth regression results are sensitive to the choice of test assets and sample period. We can change either one to re-examine the risk premiums of the Chen, Roll, and Ross model. For example, we can use Fama-French industry portfolios or individual stock returns as test assets.
In the following section, we consolidate the steps of the Fama-MacBeth regression into one function. The function takes the risk factors and test assets as inputs and outputs the magnitude and significance of the price of risk (risk premium) of each factor.
We can call the function ‘fama_macbeth_regression(test_assets,risk_factors)’ and report the significance of the price of risk.