The Chen, Roll, and Ross (1986) model

In this notebook we provide code to estimate factor exposures (\(\beta\)’s) and risk premia (\(\lambda\)’s) in the Chen, Roll and Ross (1986) model.

The Chen, Roll, and Ross (1986) model includes the following five macroeconomic risk factors:

DEI: changes in expected inflation
MP: monthly growth in industrial production
UI: unexpected inflation
UPR: default risk premium
UTS: term structure risk premium

The Fama-MacBeth two-step regression approach tests how well exposures to these risk factors explain the cross-section of asset returns. Its aim is to estimate the risk premium associated with exposure to each factor.

The first step is to regress the return of every asset on the five risk factors in a time-series regression. This gives each asset's exposure to each factor, called the “betas”.
The second step is to regress the returns of all assets on the betas obtained in Step 1 in a cross-sectional regression, one for each period. This gives the risk premium for each factor in each period.
Lastly, Fama and MacBeth estimate the expected premium for a unit exposure to each risk factor by averaging these period-by-period coefficients over time, once for each factor.
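
The three steps can be sketched on simulated data. This is a minimal illustration, not the implementation in ‘tools’: the factor realizations, betas, and premia are made up, there are two factors instead of five, and the betas are estimated once over the full sample rather than over rolling windows.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 240, 20, 2  # months, test assets, factors (toy sizes, assumed)

# Simulated inputs (all hypothetical)
factors = rng.normal(0.0, 0.02, size=(T, K))     # factor realizations
true_betas = rng.uniform(0.5, 1.5, size=(N, K))  # factor exposures
true_lambda = np.array([0.005, 0.002])           # risk premia
noise = rng.normal(0.0, 0.01, size=(T, N))
returns = (factors + true_lambda) @ true_betas.T + noise  # excess returns, shape (T, N)

# Step 1: time-series regression of each asset's returns on the factors
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (K + 1, N)
betas = coef[1:].T                                  # shape (N, K)

# Step 2: one cross-sectional regression of returns on betas per month
Z = np.column_stack([np.ones(N), betas])
lambdas, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)  # shape (K + 1, T)
lambdas = lambdas.T                                      # shape (T, K + 1)

# Step 3: average the monthly coefficients and compute ordinary t-statistics
premia = lambdas.mean(axis=0)
t_stats = premia / (lambdas.std(axis=0, ddof=1) / np.sqrt(T))

print("estimated premia (intercept first):", premia.round(4))
print("t-statistics:", t_stats.round(2))
```

With noise this small the estimated betas come out close to the simulated ones. The averaged premia in `premia[1:]` are roughly the simulated premia plus the sample mean of each factor, so they are noisier than the betas.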

We load stock returns from WRDS, form size portfolios from them, and use 20 size portfolios as test assets to estimate the risk premiums in the Chen, Roll and Ross (1986) model.

We first check whether the required packages are installed.

# run this to check if required packages are installed
import subprocess
import sys

def install_packages(packages):
    """
    Ensure that all specified packages are installed. If a package is not installed,
    it will be installed automatically.

    Args:
        packages (list): A list of package names to ensure are installed.

    Returns:
        None
    """
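    # pip distribution names that differ from the import name
    pip_names = {"sklearn.linear_model": "scikit-learn"}
    for package in packages:
        try:
            __import__(package)
        except ImportError:
            print(f"Installing {package}...")
            pip_name = pip_names.get(package, package)
            subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])
        else:
            print(f"{package} is already installed.")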

required_packages = [
    "numpy",
    "scipy",
    "pandas",
    "wrds",
    "datetime",
    "statsmodels",
    "matplotlib",
    "pathlib",
    "pandas_datareader",
    "sklearn.linear_model",
    "warnings",
    "itertools",
    "joblib",
    "certifi",
    
]
install_packages(required_packages)

numpy is already installed.
scipy is already installed.
pandas is already installed.
wrds is already installed.
datetime is already installed.
statsmodels is already installed.
matplotlib is already installed.
pathlib is already installed.
pandas_datareader is already installed.
sklearn.linear_model is already installed.
warnings is already installed.
itertools is already installed.
joblib is already installed.
certifi is already installed.

We then import required packages:

### User-defined packages ###
# add the current directory to the path so that the 'tools' package can be imported

import sys
from pathlib import Path
current_dir = Path().resolve()
sys.path.append(str(current_dir))

# reload tools.main first so that the names imported below come from the latest version
import importlib
import tools.main
importlib.reload(tools.main)

# load user-defined functions: these functions are for Fama-MacBeth regressions
from tools.main import clean_test_asset_returns
from tools.main import fama_macbeth_timeseries_estimate_beta
from tools.main import fama_macbeth_crosssection_estimate_premium
from tools.main import fama_macbeth_crosssection_premium_stat
from tools.main import fama_macbeth_regression

### The following packages are generally used ###

# load package WRDS
import wrds

# load package for dataframe operation
import pandas as pd
import numpy as np
from datetime import datetime

# !pip install --upgrade certifi
# workaround for certificate errors: this disables SSL certificate verification
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# suppress all warnings (including those raised by wrds)
import warnings
warnings.filterwarnings("ignore")

Then, we enter credentials and connect to WRDS. In the following code, update your user name to connect to WRDS:

# WRDS connection
# replace the user name below with your own WRDS user name
wrds_username = 'eileenbc'

try:
    print("Establishing connection to WRDS database...")
    params = {
        'wrds_hostname': wrds.sql.WRDS_POSTGRES_HOST,
        'wrds_port': wrds.sql.WRDS_POSTGRES_PORT,
        'wrds_dbname': wrds.sql.WRDS_POSTGRES_DB,
        'wrds_username': wrds_username,
        'wrds_connect_args': wrds.sql.WRDS_CONNECT_ARGS,
    }

    conn = wrds.Connection(autoconnect=True, **params)
    print("Successfully connected to WRDS database.")
except Exception as e:
    print(f"Failed to connect to WRDS database: {e}")

Establishing connection to WRDS database...
Loading library list...
Done
Successfully connected to WRDS database.

Now that we are connected to WRDS, we can query its data tables. CRSP provides stock returns, and the Fama-French factors table provides the risk-free rate and the market, size, and value factors.

To load data from CRSP, we first set the start date and end date of the sample. These dates define the sample period of the test assets in the Fama-MacBeth regression.

# define start date and end date for Fama-MacBeth regression
# the most recent date for Fama-French risk factors is: 11/01/2024

start_date = "01/01/1958"
end_date = "12/01/1984"

The following code loads monthly stock returns from CRSP. The returns are total returns, including dividends.

# connect to crsp
crsp_monthly = conn.raw_sql( f"""SELECT permno, siccd, mthcaldt, mthret, mthretx, mthcap,ticker
       FROM crsp.msf_v2 as msf
       WHERE msf.mthcaldt BETWEEN '{start_date}' AND '{end_date}'""",date_cols=['mthcaldt'])

# add year and month to crsp monthly
crsp_monthly = (crsp_monthly.assign(year=lambda x: pd.DatetimeIndex(x["mthcaldt"]).year)
                            .assign(month=lambda x: pd.DatetimeIndex(x["mthcaldt"]).month)
                            )

crsp_monthly.head(3)

	permno	siccd	mthcaldt	mthret	mthretx	mthcap	ticker	year	month
0	10006	3743	1958-01-31	0.137584	0.137584	60045.38	None	1958	1
1	10014	3714	1958-01-31	0.117647	0.117647	3531.63	None	1958	1
2	10022	3420	1958-01-31	0.071429	0.071429	11362.50	None	1958	1

Then we load Fama-French risk factors and merge CRSP with Fama-French risk factors.

# define start date and end date of fama-french factors

# the most recent date for Fama-French factors is '11/01/2024'
start_date = "01/01/1958"
end_date = "12/01/1984"


# load ff 3 factors from wrds
ff3_factors_monthly = conn.raw_sql(
    f"""SELECT date, mktrf, smb, hml, rf
    FROM ff.factors_monthly WHERE date BETWEEN '{start_date}' AND '{end_date}'
    """,
    date_cols=['date']
)

# add year, month, and rename mktrf to mkt_excess
ff3_factors_monthly = (ff3_factors_monthly.assign(year=lambda x: pd.DatetimeIndex(x["date"]).year)
                                          .assign(month=lambda x: pd.DatetimeIndex(x["date"]).month)
                                          .rename(columns={"mktrf": "mkt_excess"}))


# merge crsp monthly with fama-french factors
crsp_ff3_monthly = (
    crsp_monthly.merge(ff3_factors_monthly, how="left", on=['year', 'month']))

# calculate excess return for each stock
crsp_ff3_monthly['ret_excess'] = crsp_ff3_monthly['mthret'] - crsp_ff3_monthly['rf']

# rename permno to be 'test_id' for Fama-MacBeth regression
crsp_ff3_monthly.rename(columns={'permno':'test_id'}, inplace = True)

# take a look at the data after merge
crsp_ff3_monthly.head(3)

	test_id	siccd	mthcaldt	mthret	mthretx	mthcap	ticker	year	month	date	mkt_excess	smb	hml	rf	ret_excess
0	10006	3743	1958-01-31	0.137584	0.137584	60045.38	None	1958	1	1958-01-01	0.0466	0.0439	0.0419	0.0028	0.134784
1	10014	3714	1958-01-31	0.117647	0.117647	3531.63	None	1958	1	1958-01-01	0.0466	0.0439	0.0419	0.0028	0.114847
2	10022	3420	1958-01-31	0.071429	0.071429	11362.50	None	1958	1	1958-01-01	0.0466	0.0439	0.0419	0.0028	0.068629
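
A left merge silently leaves missing values when a month has no matching factor row, and it duplicates rows if the factor table has repeated months. The sketch below shows one way to guard against both on a toy example; the two small DataFrames are made up for illustration and only mimic the column names used above.

```python
import pandas as pd

# Toy stand-ins for crsp_monthly and ff3_factors_monthly (hypothetical values)
stocks = pd.DataFrame({
    "permno": [1, 1, 2],
    "year": [1958, 1958, 1958],
    "month": [1, 2, 1],
    "mthret": [0.10, -0.02, 0.05],
})
factors = pd.DataFrame({
    "year": [1958, 1958],
    "month": [1, 2],
    "rf": [0.0028, 0.0012],
})

# validate= raises if a (year, month) pair appears more than once in the factor table;
# indicator= adds a '_merge' column that flags stock rows without a matching factor row
merged = stocks.merge(
    factors, how="left", on=["year", "month"],
    validate="many_to_one", indicator=True,
)
n_unmatched = int((merged["_merge"] == "left_only").sum())
print("rows without factor data:", n_unmatched)

# excess return = total return minus the risk-free rate
merged["ret_excess"] = merged["mthret"] - merged["rf"]
print(merged[["permno", "year", "month", "ret_excess"]])
```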

Now we have merged the stock returns with the Fama-French risk factors. Next we apply the Fama-MacBeth method to the Chen, Roll, and Ross model and test the risk premiums.

Fama-MacBeth regression

This part applies the Fama-MacBeth regression method to the Chen, Roll, and Ross (1986) model. We define a Fama-MacBeth regression function to estimate the magnitude and significance of the risk premiums. The test assets are 20 size portfolios.

First we build a function that creates size portfolios from the individual stock returns in WRDS.

# this function takes stock returns data and creates size portfolios
# size portfolios are defined per market cap

def create_size_portfolios(returns_data, N):

    """
    returns_data include:
    - stock id: test_id
    - market cap: mthcap
    - return: mthret
    - date: date (the column must be named 'date')

    N: the number of portfolios to be created

    function output:
    - portfolio returns per date, portfolio
    (1 - N: from the smallest size to the largest size)
    """

    returns_data = returns_data.sort_values(by=['test_id', 'date'])

    # create a lag variable of market value
    returns_data['mthcap_lag'] = returns_data.groupby('test_id')['mthcap'].shift(1)
    returns_data = returns_data.dropna(subset=['mthcap_lag']) # drop if mthcap_lag is NaN

    # assign portfolios based on market cap of previous period (month)
    num_portfolios = N
    size_labels = range(1, N+1) # create portfolio labels

    # assign size portfolios based on lag market cap for each date
    returns_data['portfolio'] = returns_data.groupby('date')['mthcap_lag'].transform(
      lambda x: pd.qcut(x, q=num_portfolios, labels=size_labels)
    )

    # Calculate equally weighted returns for each portfolio, date
    returns_data['e_w_port_size_ret'] = returns_data.groupby(['date','portfolio'])['mthret'].transform('mean')

    # Aggregate portfolio returns per date, portfolio
    portfolio_returns = (
    returns_data[['date', 'portfolio', 'e_w_port_size_ret']]
    .sort_values(by=['date', 'portfolio'])  # Sort by date and portfolio
    .drop_duplicates()  # Drop duplicate rows
    .assign(year=lambda x: pd.DatetimeIndex(x["date"]).year)
    .assign(month=lambda x: pd.DatetimeIndex(x["date"]).month)
    .reset_index(drop=True)  # Reset index for clean output
    )

    return portfolio_returns

For example, with N = 3 we create three size portfolios (small, medium, large) by calling the function.

returns_data = crsp_ff3_monthly
N = 3
portfolio_returns = create_size_portfolios(returns_data, N)

portfolio_returns

	date	portfolio	e_w_port_size_ret	year	month
0	1958-02-01	1	-0.016964	1958	2
1	1958-02-01	2	-0.008845	1958	2
2	1958-02-01	3	-0.010172	1958	2
3	1958-03-01	1	0.033966	1958	3
4	1958-03-01	2	0.035722	1958	3
...	...	...	...	...	...
961	1984-10-01	2	-0.022795	1984	10
962	1984-10-01	3	-0.007181	1984	10
963	1984-11-01	1	-0.048714	1984	11
964	1984-11-01	2	-0.032826	1984	11
965	1984-11-01	3	-0.012504	1984	11

966 rows × 5 columns
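
To see what the sorting step does within a single month, here is a toy example with six hypothetical stocks and three portfolios. It is a sketch of the same idea rather than a copy of the function above: it uses `labels=False` in `pd.qcut`, which returns integer bucket codes starting at 0, and adds 1 to get portfolio numbers.

```python
import pandas as pd

# Six made-up stocks in one month, ordered by lagged market cap
toy = pd.DataFrame({
    "date": ["1958-02-01"] * 6,
    "test_id": [1, 2, 3, 4, 5, 6],
    "mthcap_lag": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "mthret": [0.06, 0.04, 0.03, 0.01, 0.02, 0.00],
})

# Within each date, split stocks into 3 equally populated buckets by lagged market cap
toy["portfolio"] = toy.groupby("date")["mthcap_lag"].transform(
    lambda x: pd.qcut(x, q=3, labels=False) + 1
)

# Equal-weighted portfolio return = simple mean of member returns
ew = (
    toy.groupby(["date", "portfolio"])["mthret"]
    .mean()
    .reset_index(name="e_w_port_size_ret")
)
print(toy[["test_id", "mthcap_lag", "portfolio"]])
print(ew)
```

The two smallest stocks land in portfolio 1 and the two largest in portfolio 3, and each portfolio return is the average of its two members.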

Next we apply the Fama-MacBeth regression method to the Chen, Roll, and Ross (1986) model, using 20 size portfolios as test assets.

We first define the test assets and risk factors. The test assets are 20 size portfolios, and the risk factors are those of Chen, Roll, and Ross (1986).

# define test assets - test assets are size portfolios
returns_data = crsp_ff3_monthly
N = 20 # set the number of size portfolios here

# define test assets as 20 size portfolios
test_assets = create_size_portfolios(returns_data, N)

# merge test assets with Fama-French risk factors and calculate the excess returns for each portfolio
test_assets = test_assets.merge(ff3_factors_monthly.drop(columns=['date']), how="left", on=['year','month'])
test_assets['ret_excess'] = test_assets['e_w_port_size_ret'] - test_assets['rf']

# format the columns names for test_assets to prepare for Fama-MacBeth regression
test_assets = test_assets.rename(columns={'portfolio': 'test_id'}) # rename portfolio as test_id

# take a look at the test assets (size portfolios)
test_assets.head(21)

	date	test_id	e_w_port_size_ret	year	month	mkt_excess	smb	hml	rf	ret_excess
0	1958-02-01	1	-0.018120	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.019320
1	1958-02-01	2	-0.020987	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.022187
2	1958-02-01	3	-0.028752	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.029952
3	1958-02-01	4	-0.019040	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.020240
4	1958-02-01	5	-0.021064	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.022264
5	1958-02-01	6	-0.002595	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.003795
6	1958-02-01	7	0.001753	1958	2	-0.0152	0.0065	0.0033	0.0012	0.000553
7	1958-02-01	8	-0.011303	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.012503
8	1958-02-01	9	-0.005352	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.006552
9	1958-02-01	10	-0.020944	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.022144
10	1958-02-01	11	-0.006328	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.007528
11	1958-02-01	12	-0.007115	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.008315
12	1958-02-01	13	-0.013130	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.014330
13	1958-02-01	14	-0.006343	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.007543
14	1958-02-01	15	-0.015301	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.016501
15	1958-02-01	16	-0.004638	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.005838
16	1958-02-01	17	-0.013799	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.014999
17	1958-02-01	18	0.004682	1958	2	-0.0152	0.0065	0.0033	0.0012	0.003482
18	1958-02-01	19	-0.011304	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.012504
19	1958-02-01	20	-0.019989	1958	2	-0.0152	0.0065	0.0033	0.0012	-0.021189
20	1958-03-01	1	0.035002	1958	3	0.0327	0.0065	-0.0097	0.0009	0.034102

We load CRR1986 risk factors.

# load CRR1986 risk factors

csv_usrec = 'https://raw.githubusercontent.com/lorenzogarlappi/COMM475/refs/heads/main/Data/replicated_UI_DEI_MP_UPR_UTS.csv'
crr_data = pd.read_csv(csv_usrec)

# format risk factors
risk_factors = crr_data[['date','UI','DEI','MP','UPR','UTS']]
risk_factors = risk_factors.dropna()
risk_factors = risk_factors.reset_index(drop = True)

# take a look at the risk factors
risk_factors.head()

	date	UI	DEI	MP	UPR	UTS
0	1953-04-01	0.002071	0.000537	0.004143	0.0042	0.0072
1	1953-05-01	-0.000342	-0.000008	0.005492	0.0044	0.0070
2	1953-06-01	0.001910	-0.000006	-0.004120	0.0046	0.0076
3	1953-07-01	0.000044	0.000290	0.012305	0.0058	0.0074
4	1953-08-01	0.001244	0.000111	-0.005447	0.0061	0.0082

Step 0. Clean test asset monthly returns

Before running the Fama-MacBeth regression, we call the function ‘clean_test_asset_returns(test_assets,risk_factors)’. This function cleans the test asset returns and merges them with the risk factors.

For more information about the function, please see its definition in the ‘tools’ folder.

# clean_test_asset_returns takes the test assets and risk factors and returns the cleaned test asset returns

returns_monthly = clean_test_asset_returns(test_assets,risk_factors)
returns_monthly

	test_id	date_yyyymm	ret_excess	first_date	last_date	date	UI	DEI	MP	UPR	UTS
0	1	195802	-0.019320	1958-02-01	1984-11-01	1958-02-01	0.000338	-0.000196	-0.021560	0.0107	0.0132
1	1	195803	0.034102	1958-02-01	1984-11-01	1958-03-01	0.004348	0.000852	-0.012338	0.0105	0.0145
2	1	195804	0.063931	1958-02-01	1984-11-01	1958-04-01	0.000012	0.000247	-0.016693	0.0107	0.0172
3	1	195805	0.051332	1958-02-01	1984-11-01	1958-05-01	-0.002657	-0.001394	0.009769	0.0105	0.0180
4	1	195806	0.060711	1958-02-01	1984-11-01	1958-06-01	-0.002301	-0.000234	0.026049	0.0098	0.0200
...	...	...	...	...	...	...	...	...	...	...	...
6435	20	198407	-0.025846	1958-02-01	1984-11-01	1984-07-01	-0.000376	-0.000884	0.002778	0.0171	0.0133
6436	20	198408	0.105259	1958-02-01	1984-11-01	1984-08-01	-0.000464	0.000219	0.001204	0.0176	0.0089
6437	20	198409	-0.011635	1958-02-01	1984-11-01	1984-09-01	-0.000691	0.001311	-0.002801	0.0169	0.0084
6438	20	198410	-0.000777	1958-02-01	1984-11-01	1984-10-01	-0.001059	-0.002675	-0.000454	0.0131	0.0114
6439	20	198411	-0.013594	1958-02-01	1984-11-01	1984-11-01	-0.000296	-0.000485	0.003336	0.0119	0.0184

6440 rows × 11 columns

Step 1. Fama-MacBeth time-series analysis of returns to estimate beta

\[ r_{i,t}=\alpha_{i}+I_{1,t} \beta_{1 i}+I_{2,t} \beta_{2 i}+I_{3,t} \beta_{3 i}+I_{4,t} \beta_{4 i}+I_{5,t} \beta_{5 i} + e_{i,t} \]

For each test asset portfolio \(i\) (test id), we run rolling multiple regressions of the portfolio's excess return on the five factors, each using 60 monthly observations (the window size). This gives estimates of \(\beta_{j i}\) for \(j=1,\dots,5\) and \(i=1,\dots,20\) in each month.

Rolling ordinary least squares (rolling OLS) is used to estimate the betas by applying OLS to the most recent 60 months.
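
The rolling estimation can be sketched with plain NumPy. This is an illustration under stated assumptions, not the code in ‘tools’: `rolling_betas` is a hypothetical helper, the window is assumed to include the current month, and the data are simulated without noise so that OLS recovers the coefficients exactly.

```python
import numpy as np

def rolling_betas(y, X, window=60):
    """Rolling OLS coefficients, shape (T, K + 1).

    Row t holds the intercept and slopes estimated from observations
    t - window + 1 through t. Rows before the first full window are NaN.
    """
    T, K = X.shape
    X1 = np.column_stack([np.ones(T), X])  # add an intercept column
    out = np.full((T, K + 1), np.nan)
    for t in range(window - 1, T):
        sl = slice(t - window + 1, t + 1)  # the most recent `window` rows
        coef, *_ = np.linalg.lstsq(X1[sl], y[sl], rcond=None)
        out[t] = coef
    return out

# Simulated example: 120 months, five factors, no noise (all values made up)
rng = np.random.default_rng(1)
T = 120
X = rng.normal(size=(T, 5))
true_beta = np.array([1.0, -0.5, 0.3, 2.0, 0.0])
y = 0.01 + X @ true_beta

betas = rolling_betas(y, X, window=60)
print("first row with estimates:", int(np.argmax(~np.isnan(betas[:, 0]))))
print("slopes in the last window:", betas[-1, 1:].round(6))
```

With a 60-month window the first estimate appears at row 59, the 60th observation, which matches the first index shown in the beta table of this notebook.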

We call the function ‘fama_macbeth_timeseries_estimate_beta(returns_monthly,risk_factors)’ to estimate the monthly betas on each risk factor. For more details, see the function definition in the ‘tools’ folder.

beta_monthly = fama_macbeth_timeseries_estimate_beta(returns_monthly,risk_factors)
beta_monthly

	Intercept	beta_UI	beta_DEI	beta_MP	beta_UPR	beta_UTS	date	test_id	date_yyyymm
59	-0.075261	1.797498	-6.539273	0.992254	9.761025	1.372954	1963-01-01	1	196301
60	-0.098082	1.843343	-6.525231	0.897680	13.078850	1.200961	1963-02-01	1	196302
61	-0.111297	2.743554	-7.638911	0.892061	14.945628	1.085239	1963-03-01	1	196303
62	-0.112661	3.216630	-7.638812	0.916302	15.120569	0.962282	1963-04-01	1	196304
63	-0.123833	3.603865	-9.394386	0.935116	16.654864	1.070936	1963-05-01	1	196305
...	...	...	...	...	...	...	...	...	...
6435	-0.030460	-2.099438	2.992127	-0.434015	1.738623	0.826883	1984-07-01	20	198407
6436	-0.041946	-2.505150	3.416563	-0.222954	2.379998	0.909552	1984-08-01	20	198408
6437	-0.047187	-2.316854	2.945187	-0.134198	2.622746	0.918623	1984-09-01	20	198409
6438	-0.041781	-2.217842	2.707105	-0.114398	2.382481	0.833663	1984-10-01	20	198410
6439	-0.056195	-2.272755	2.782742	0.035737	3.069251	0.952451	1984-11-01	20	198411

5260 rows × 9 columns

Step 2. Fama-MacBeth cross-sectional analysis of returns

For each month \(t\), run a cross-sectional regression across all test ids:

\[ r_{i,t}=\mu_{t}+\lambda_{1,t} \beta_{1 i}+\lambda_{2,t} \beta_{2 i}+\lambda_{3,t} \beta_{3 i}+\lambda_{4,t} \beta_{4 i}+\lambda_{5,t} \beta_{5 i}+e_{i,t} \]

The cross-sectional regressions give estimates of the market prices of risk for each month: \(\lambda_{j,t}\), \(j=1,\dots,5\).
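
A single month's cross-sectional regression can be sketched as follows. The helper `cross_section_premia` and all numbers are hypothetical. The returns are generated without noise so that the regression recovers the monthly intercept and premia exactly; with real data each month's estimates are noisy, which is why they are averaged over time in Step 3.

```python
import numpy as np

def cross_section_premia(r_t, betas_t):
    """Regress one month's returns (length N) on the assets' betas (N x K).

    Returns an array of length K + 1: the intercept followed by the
    estimated price of risk for each factor in that month.
    """
    N = betas_t.shape[0]
    Z = np.column_stack([np.ones(N), betas_t])  # add an intercept column
    est, *_ = np.linalg.lstsq(Z, r_t, rcond=None)
    return est

# Made-up betas for 20 test assets and 5 factors in one month
rng = np.random.default_rng(2)
N, K = 20, 5
betas_t = rng.normal(size=(N, K))
mu_t = 0.002
lambda_t = np.array([0.001, -0.002, 0.003, 0.0005, 0.004])
r_t = mu_t + betas_t @ lambda_t  # noiseless returns for this month

est = cross_section_premia(r_t, betas_t)
print("intercept:", round(float(est[0]), 6))
print("prices of risk:", est[1:].round(6))
```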

Next we call the function ‘fama_macbeth_crosssection_estimate_premium(test_assets,risk_factors,beta_monthly)’ to estimate the monthly risk premium for each risk factor.

risk_premiums = fama_macbeth_crosssection_estimate_premium(test_assets,risk_factors,beta_monthly)
risk_premiums

	date_yyyymm	Intercept	UI	DEI	MP	UPR	UTS
0	196301	-0.033272	-0.002771	-0.001644	-0.013846	0.002604	0.006900
1	196302	0.034729	0.004816	0.000931	-0.057264	-0.000017	-0.003181
2	196303	0.086848	0.008846	0.003620	-0.032144	-0.004417	-0.015324
3	196304	0.018301	-0.003413	0.000265	0.002815	0.001272	0.005437
4	196305	-0.003543	-0.001203	-0.000365	0.045375	-0.002298	-0.012044
...	...	...	...	...	...	...	...
257	198406	0.029005	-0.000091	-0.009527	0.008326	-0.009091	-0.021303
258	198407	0.139352	-0.003935	-0.010575	-0.021792	0.000448	-0.022557
259	198408	0.008082	-0.001393	0.000114	0.012712	-0.007661	0.008807
260	198409	0.047249	0.005477	-0.002527	-0.022399	-0.006090	-0.018988
261	198410	0.023859	-0.004460	-0.009379	0.008274	-0.006360	-0.004589

262 rows × 7 columns

Step 3. Aggregate the risk premium time series and calculate t-statistics for each risk factor

For each risk factor, we calculate the ordinary t-statistic and the Newey-West t-statistic of the average risk premium.

The Newey-West t-statistic is calculated with Newey and West (1987) standard errors, which adjust for heteroskedasticity and autocorrelation in time-series regression models. They are commonly used in asset pricing when residuals are serially correlated.

The t-statistics indicate whether the risk premium on each factor is statistically different from zero, that is, whether exposure to the factor is priced in the returns of the test assets.
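
Both statistics can be computed from a monthly premium series with a few lines of NumPy. The helper `mean_t_stat` below is a hypothetical sketch, not the function in ‘tools’: it uses Bartlett-kernel weights for the Newey-West long-run variance, divides by T rather than T - 1, and leaves the lag length to the caller. The lag choice used in ‘tools’ may differ.

```python
import numpy as np

def mean_t_stat(x, lags=0):
    """t-statistic of the sample mean of x.

    lags=0 gives the ordinary t-statistic (variance normalized by T).
    lags>0 adds Bartlett-weighted autocovariances (Newey-West).
    """
    x = np.asarray(x, dtype=float)
    T = x.size
    u = x - x.mean()                      # demeaned series
    lrv = u @ u / T                       # variance term (lag 0)
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)        # Bartlett kernel weight
        lrv += 2.0 * w * (u[j:] @ u[:-j]) / T
    return x.mean() / np.sqrt(lrv / T)

# Tiny made-up series that can be checked by hand
x = [1.0, 2.0, 3.0, 4.0]
t_plain = mean_t_stat(x, lags=0)
t_nw = mean_t_stat(x, lags=1)
print("ordinary t-statistic:", round(float(t_plain), 4))
print("Newey-West t-statistic (1 lag):", round(float(t_nw), 4))
```

In this toy series the first-order autocovariance is positive, so the Newey-West standard error is larger and the t-statistic smaller than the ordinary one. The same pattern appears for most factors in the table produced by this notebook.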

Next we call the function ‘fama_macbeth_crosssection_premium_stat(risk_premiums)’ to test the significance of the risk premiums.

price_of_risk = fama_macbeth_crosssection_premium_stat(risk_premiums)
price_of_risk

	factor	risk_premium	t_statistic	t_statistic_newey_west
0	Intercept	0.094	0.229	0.199
1	DEI	-0.020	-1.158	-0.928
2	MP	-0.028	-0.150	-0.166
3	UI	0.058	1.603	1.347
4	UPR	0.084	1.553	1.194
5	UTS	0.262	1.603	1.310

We can see that with 20 size portfolios as test assets, none of the risk factors in the Chen, Roll, and Ross (1986) model is statistically significant: all absolute t-statistics are below 1.96. In other words, the model has very limited power to explain the returns of these test assets.

The differences from the results reported in the paper could also arise because we did not fully replicate the Chen, Roll, and Ross (1986) risk factor data.

Fama-MacBeth regression results are sensitive to the choice of test assets and sample period. We can change either one to re-examine the risk premiums in the Chen, Roll, and Ross model. For example, we can use Fama-French industry portfolios or individual stock returns as test assets.

In the following section, we consolidate the steps of the Fama-MacBeth regression into one function. The function takes the risk factors and test assets as inputs and outputs the magnitude and significance of the price of risk (or risk premium) for each factor.

We can call the function ‘fama_macbeth_regression(test_assets,risk_factors)’ and report the significance of the price of risk.

price_of_risk = fama_macbeth_regression(test_assets,risk_factors)
price_of_risk

	factor	risk_premium	t_statistic	t_statistic_newey_west
0	Intercept	0.094	0.229	0.199
1	DEI	-0.020	-1.158	-0.928
2	MP	-0.028	-0.150	-0.166
3	UI	0.058	1.603	1.347
4	UPR	0.084	1.553	1.194
5	UTS	0.262	1.603	1.310