The Chen, Roll, and Ross (1986) model

Instructor: Lorenzo Garlappi © 2025

In this notebook we provide the code to estimate factor exposures (\(\beta\)’s) and risk premia (\(\lambda\)’s) in the Chen, Roll, and Ross (1986) model.

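The Chen, Roll, and Ross (1986) model includes five macroeconomic risk factors, labeled in the data below as MP (monthly growth in industrial production), DEI (change in expected inflation), UI (unanticipated inflation), UPR (unanticipated change in the risk premium, measured by the return spread between low-grade corporate bonds and long-term government bonds), and UTS (unanticipated change in the term structure, measured by the return spread between long-term government bonds and Treasury bills).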

The Fama-MacBeth two-step regression approach tests how well exposures to these risk factors explain the cross-section of asset returns. Its aim is to estimate the risk premium associated with exposure to each risk factor.

We load stock returns from WRDS, sort the stocks into size portfolios, and use 20 size portfolios as test assets to estimate the risk premiums in the Chen, Roll, and Ross (1986) model.

We first check that the required packages are installed.

# run this to check if required packages are installed
import subprocess
import sys

def install_packages(packages):
    """
    Ensure that all specified packages are installed. If a package is not installed,
    it will be installed automatically.

    Args:
        packages (list): A list of package names to ensure are installed.

    Returns:
        None
    """
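    # pip distribution names that differ from the import name
    pip_names = {"sklearn.linear_model": "scikit-learn"}

    for package in packages:
        try:
            __import__(package)
        except ImportError:
            print(f"Installing {package}...")
            # install under the pip name, which can differ from the import name
            subprocess.check_call([sys.executable, "-m", "pip", "install", pip_names.get(package, package)])
        else:
            print(f"{package} is already installed.")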

required_packages = [
    "numpy",
    "scipy",
    "pandas",
    "wrds",
    "datetime",
    "statsmodels",
    "matplotlib",
    "pathlib",
    "pandas_datareader",
    "sklearn.linear_model",
    "warnings",
    "itertools",
    "joblib",
    "certifi",
    
]
install_packages(required_packages)
numpy is already installed.
scipy is already installed.
pandas is already installed.
wrds is already installed.
datetime is already installed.
statsmodels is already installed.
matplotlib is already installed.
pathlib is already installed.
pandas_datareader is already installed.
sklearn.linear_model is already installed.
warnings is already installed.
itertools is already installed.
joblib is already installed.
certifi is already installed.
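
A lighter-weight way to check availability, without importing each package, is the standard library's `importlib.util.find_spec`. The sketch below is only an illustration and is not part of the course code; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level module names that cannot be found on this system."""
    # find_spec returns None when a top-level module is not importable
    return [name for name in names if importlib.util.find_spec(name) is None]

# "json" ships with Python; the second name is a deliberately made-up module
print(missing_packages(["json", "surely_not_a_real_module_475"]))
```

This variant only reports what is missing and leaves the decision to install to the user.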

We then import required packages:

### The following packages are for user-defined packages ###
# load user-defined packages

import sys
from pathlib import Path
current_dir = Path().resolve()
sys.path.append(str(current_dir))

# load user-defined packages: these packages are for Fama-MacBeth regressions
# reload tools.main first so that the functions imported below are the most updated version
import importlib
import tools.main
importlib.reload(tools.main)

from tools.main import clean_test_asset_returns
from tools.main import fama_macbeth_timeseries_estimate_beta
from tools.main import fama_macbeth_crosssection_estimate_premium
from tools.main import fama_macbeth_crosssection_premium_stat
from tools.main import fama_macbeth_regression

### The following packages are generally used ###

# load package WRDS
import wrds

# load package for dataframe operation
import pandas as pd
import numpy as np
from datetime import datetime

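# !pip install --upgrade certifi
# note: the next two lines disable HTTPS certificate verification for this session
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# suppress all warnings (including those raised by wrds)
import warnings
warnings.filterwarnings("ignore")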

Next, we connect to WRDS. In the following code, replace the user name with your own WRDS user name:

# WRDS connection
# update this line wrds_username = 'your_username'
wrds_username = 'eileenbc'

try:
    print("Establishing connection to WRDS database...")
    params = {
        'wrds_hostname': wrds.sql.WRDS_POSTGRES_HOST,
        'wrds_port': wrds.sql.WRDS_POSTGRES_PORT,
        'wrds_dbname': wrds.sql.WRDS_POSTGRES_DB,
        'wrds_username': wrds_username,
        'wrds_connect_args': wrds.sql.WRDS_CONNECT_ARGS,
    }

    conn = wrds.Connection(autoconnect=True, **params)
    print("Successfully connected to WRDS database.")
except Exception as e:
    print(f"Failed to connect to WRDS database: {e}")
Establishing connection to WRDS database...
Loading library list...
Done
Successfully connected to WRDS database.

Now that we are connected to WRDS, we can query its data tables. CRSP provides stock returns, and the Fama-French factors table provides the risk-free rate and the market, size, and value factors.

To load data from CRSP, we first set the start date and end date of the sample. These dates define the sample period of the test assets in the Fama-MacBeth regression.

# define start date and end date for Fama-MacBeth regression
# the most recent date for Fama-French risk factors is: 11/01/2024

start_date = "01/01/1958"
end_date = "12/01/1984"

The following code loads stock returns from CRSP. CRSP contains stock returns on a monthly basis. The stock returns are the total returns including dividends.

# connect to crsp
crsp_monthly = conn.raw_sql( f"""SELECT permno, siccd, mthcaldt, mthret, mthretx, mthcap,ticker
       FROM crsp.msf_v2 as msf
       WHERE msf.mthcaldt BETWEEN '{start_date}' AND '{end_date}'""",date_cols=['mthcaldt'])

# add year and month to crsp monthly
crsp_monthly = (crsp_monthly.assign(year=lambda x: pd.DatetimeIndex(x["mthcaldt"]).year)
                            .assign(month=lambda x: pd.DatetimeIndex(x["mthcaldt"]).month)
                            )

crsp_monthly.head(3)
permno siccd mthcaldt mthret mthretx mthcap ticker year month
0 10006 3743 1958-01-31 0.137584 0.137584 60045.38 None 1958 1
1 10014 3714 1958-01-31 0.117647 0.117647 3531.63 None 1958 1
2 10022 3420 1958-01-31 0.071429 0.071429 11362.50 None 1958 1

Then we load Fama-French risk factors and merge CRSP with Fama-French risk factors.

# define start date and end date of fama-french factors

# the most recent date for Fama-French factors is '11/01/2024'
start_date = "01/01/1958"
end_date = "12/01/1984"


# load ff 3 factors from wrds
ff3_factors_monthly = conn.raw_sql(
    f"""SELECT date, mktrf, smb, hml, rf
    FROM ff.factors_monthly WHERE date BETWEEN '{start_date}' AND '{end_date}'
    """,
    date_cols=['date']
)

# add year, month, and rename mktrf to mkt_excess
ff3_factors_monthly = (ff3_factors_monthly.assign(year=lambda x: pd.DatetimeIndex(x["date"]).year)
                                          .assign(month=lambda x: pd.DatetimeIndex(x["date"]).month)
                                          .rename(columns={"mktrf": "mkt_excess"}))


# merge crsp monthly with fama-french factors
crsp_ff3_monthly = (
    crsp_monthly.merge(ff3_factors_monthly, how="left", on=['year', 'month']))

# calculate excess return for each stock
crsp_ff3_monthly['ret_excess'] = crsp_ff3_monthly['mthret'] - crsp_ff3_monthly['rf']

# rename permno to be 'test_id' for Fama-MacBeth regression
crsp_ff3_monthly.rename(columns={'permno':'test_id'}, inplace = True)

# take a look at the data after merge
crsp_ff3_monthly.head(3)
test_id siccd mthcaldt mthret mthretx mthcap ticker year month date mkt_excess smb hml rf ret_excess
0 10006 3743 1958-01-31 0.137584 0.137584 60045.38 None 1958 1 1958-01-01 0.0466 0.0439 0.0419 0.0028 0.134784
1 10014 3714 1958-01-31 0.117647 0.117647 3531.63 None 1958 1 1958-01-01 0.0466 0.0439 0.0419 0.0028 0.114847
2 10022 3420 1958-01-31 0.071429 0.071429 11362.50 None 1958 1 1958-01-01 0.0466 0.0439 0.0419 0.0028 0.068629
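
The merge above keys on `year` and `month` because the CRSP dates fall on month ends while the factor dates fall on month starts. Here is a minimal sketch of the same pattern on made-up numbers (the two small frames are illustrative, not WRDS data). It uses the `validate` argument of pandas' `merge` to confirm that each month has a single factor row:

```python
import pandas as pd

# toy stock returns dated at month end (made-up values)
stocks = pd.DataFrame({
    "permno": [1, 2, 1],
    "mthcaldt": pd.to_datetime(["1958-01-31", "1958-01-31", "1958-02-28"]),
    "mthret": [0.05, -0.02, 0.01],
})
# toy factor data dated at month start (made-up values)
factors = pd.DataFrame({
    "date": pd.to_datetime(["1958-01-01", "1958-02-01"]),
    "rf": [0.003, 0.001],
})

# build the common (year, month) key on both sides
stocks = stocks.assign(year=stocks["mthcaldt"].dt.year, month=stocks["mthcaldt"].dt.month)
factors = factors.assign(year=factors["date"].dt.year, month=factors["date"].dt.month)

# validate="many_to_one" raises if a month appears more than once in factors
merged = stocks.merge(factors, how="left", on=["year", "month"], validate="many_to_one")
merged["ret_excess"] = merged["mthret"] - merged["rf"]
print(merged[["permno", "year", "month", "ret_excess"]])
```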

Now we have merged the stock returns with the Fama-French risk factors. Next we apply the Fama-MacBeth method to the Chen, Roll, and Ross model and test the risk premiums.

Fama-MacBeth regression

This part applies the Fama-MacBeth regression method to the Chen, Roll, and Ross (1986) model. We define Fama-MacBeth regression functions to estimate the magnitude and significance of the risk premium on each risk factor. The test assets for the Fama-MacBeth regressions are 20 size portfolios.

First, we build a function that creates size portfolios from the individual stock returns in WRDS.

# this function takes stock returns data and creates size portfolios
# size portfolios are defined by market cap

def create_size_portfolios(returns_data, N):

    """
    returns_data include:
    - stock id: test_id
    - market cap: mthcap
    - return: mthret
    - date: mthcaldt or date

    N: the number of portfolios to be created

    function output:
    - portfolio returns per date, portfolio
    (1 - N: from the smallest size to the largest size)
    """

    returns_data = returns_data.sort_values(by=['test_id', 'date'])

    # create a lag variable of market value
    returns_data['mthcap_lag'] = returns_data.groupby('test_id')['mthcap'].shift(1)
    # drop if mthcap_lag is NaN; copy to avoid assigning into a slice below
    returns_data = returns_data.dropna(subset=['mthcap_lag']).copy()

    # assign portfolios based on market cap of previous period (month)
    num_portfolios = N
    size_labels = range(1, N+1) # create portfolio labels

    # assign size portfolios based on lag market cap for each date
    returns_data['portfolio'] = returns_data.groupby('date')['mthcap_lag'].transform(
      lambda x: pd.qcut(x, q=num_portfolios, labels=size_labels)
    )

    # Calculate equally weighted returns for each portfolio, date
    returns_data['e_w_port_size_ret'] = returns_data.groupby(['date','portfolio'])['mthret'].transform('mean')

    # Aggregate portfolio returns per date, portfolio
    portfolio_returns = (
    returns_data[['date', 'portfolio', 'e_w_port_size_ret']]
    .sort_values(by=['date', 'portfolio'])  # Sort by date and portfolio
    .drop_duplicates()  # Drop duplicate rows
    .assign(year=lambda x: pd.DatetimeIndex(x["date"]).year)
    .assign(month=lambda x: pd.DatetimeIndex(x["date"]).month)
    .reset_index(drop=True)  # Reset index for clean output
    )

    return portfolio_returns

For example, with N = 3, calling the function creates three size portfolios (small, medium, large).

returns_data = crsp_ff3_monthly
N = 3
portfolio_returns = create_size_portfolios(returns_data, N)

portfolio_returns
date portfolio e_w_port_size_ret year month
0 1958-02-01 1 -0.016964 1958 2
1 1958-02-01 2 -0.008845 1958 2
2 1958-02-01 3 -0.010172 1958 2
3 1958-03-01 1 0.033966 1958 3
4 1958-03-01 2 0.035722 1958 3
... ... ... ... ... ...
961 1984-10-01 2 -0.022795 1984 10
962 1984-10-01 3 -0.007181 1984 10
963 1984-11-01 1 -0.048714 1984 11
964 1984-11-01 2 -0.032826 1984 11
965 1984-11-01 3 -0.012504 1984 11

966 rows × 5 columns
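
To see the bucket assignment in isolation, here is a small sketch on made-up data (six hypothetical stocks on one date, not CRSP values). It applies a similar `pd.qcut` call to the lagged market cap, using integer bucket codes instead of labels, and averages returns within each bucket:

```python
import pandas as pd

# six hypothetical stocks on one date, with made-up lagged market caps and returns
toy = pd.DataFrame({
    "date": pd.to_datetime(["1958-02-01"] * 6),
    "test_id": [1, 2, 3, 4, 5, 6],
    "mthcap_lag": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "mthret": [0.06, 0.04, 0.03, 0.01, 0.02, 0.00],
})

# split each date's cross-section into 3 equal-count buckets (1 = smallest)
toy["portfolio"] = toy.groupby("date")["mthcap_lag"].transform(
    lambda x: pd.qcut(x, q=3, labels=False) + 1
)

# equal-weighted return of each size bucket
ew_returns = toy.groupby(["date", "portfolio"])["mthret"].mean()
print(ew_returns)
```

Sorting on the lagged rather than the current market cap keeps the assignment based on information available before the return is realized.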

Next we define the test assets and risk factors. The test assets are 20 size portfolios, and the risk factors are those of Chen, Roll, and Ross (1986).

# define test assets - test assets are size portfolios
returns_data = crsp_ff3_monthly
N = 20 # set the number of size portfolios here

# define test assets as 20 size portfolio
test_assets = create_size_portfolios(returns_data, N)

# merge test assets with Fama-French risk factors and calculate the excess returns for each portfolio
test_assets = test_assets.merge(ff3_factors_monthly.drop(columns=['date']), how="left", on=['year','month'])
test_assets['ret_excess'] = test_assets['e_w_port_size_ret'] - test_assets['rf']

# format the columns names for test_assets to prepare for Fama-MacBeth regression
test_assets = test_assets.rename(columns={'portfolio': 'test_id'}) # rename portfolio as test_id

# take a look at the test assets (size portfolios)
test_assets.head(21)
date test_id e_w_port_size_ret year month mkt_excess smb hml rf ret_excess
0 1958-02-01 1 -0.018120 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.019320
1 1958-02-01 2 -0.020987 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.022187
2 1958-02-01 3 -0.028752 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.029952
3 1958-02-01 4 -0.019040 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.020240
4 1958-02-01 5 -0.021064 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.022264
5 1958-02-01 6 -0.002595 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.003795
6 1958-02-01 7 0.001753 1958 2 -0.0152 0.0065 0.0033 0.0012 0.000553
7 1958-02-01 8 -0.011303 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.012503
8 1958-02-01 9 -0.005352 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.006552
9 1958-02-01 10 -0.020944 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.022144
10 1958-02-01 11 -0.006328 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.007528
11 1958-02-01 12 -0.007115 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.008315
12 1958-02-01 13 -0.013130 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.014330
13 1958-02-01 14 -0.006343 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.007543
14 1958-02-01 15 -0.015301 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.016501
15 1958-02-01 16 -0.004638 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.005838
16 1958-02-01 17 -0.013799 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.014999
17 1958-02-01 18 0.004682 1958 2 -0.0152 0.0065 0.0033 0.0012 0.003482
18 1958-02-01 19 -0.011304 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.012504
19 1958-02-01 20 -0.019989 1958 2 -0.0152 0.0065 0.0033 0.0012 -0.021189
20 1958-03-01 1 0.035002 1958 3 0.0327 0.0065 -0.0097 0.0009 0.034102

We load the Chen, Roll, and Ross (1986) risk factors.

# load CRR1986 risk factors

csv_usrec = 'https://raw.githubusercontent.com/lorenzogarlappi/COMM475/refs/heads/main/Data/replicated_UI_DEI_MP_UPR_UTS.csv'
crr_data = pd.read_csv(csv_usrec)

# format risk factors
risk_factors = crr_data[['date','UI','DEI','MP','UPR','UTS']]
risk_factors = risk_factors.dropna()
risk_factors = risk_factors.reset_index(drop = True)

# take a look at the risk factors
risk_factors.head()
date UI DEI MP UPR UTS
0 1953-04-01 0.002071 0.000537 0.004143 0.0042 0.0072
1 1953-05-01 -0.000342 -0.000008 0.005492 0.0044 0.0070
2 1953-06-01 0.001910 -0.000006 -0.004120 0.0046 0.0076
3 1953-07-01 0.000044 0.000290 0.012305 0.0058 0.0074
4 1953-08-01 0.001244 0.000111 -0.005447 0.0061 0.0082

Step 0. Clean test asset monthly returns

Before running the Fama-MacBeth regression, we call the function ‘clean_test_asset_returns(test_assets,risk_factors)’. This function cleans the test asset returns and merges them with the risk factors.

For more information about the function, see its definition in the ‘tools’ folder.

# clean_test_asset_returns takes test assets and risk factors and returns cleaned test asset returns

returns_monthly = clean_test_asset_returns(test_assets,risk_factors)
returns_monthly
test_id date_yyyymm ret_excess first_date last_date date UI DEI MP UPR UTS
0 1 195802 -0.019320 1958-02-01 1984-11-01 1958-02-01 0.000338 -0.000196 -0.021560 0.0107 0.0132
1 1 195803 0.034102 1958-02-01 1984-11-01 1958-03-01 0.004348 0.000852 -0.012338 0.0105 0.0145
2 1 195804 0.063931 1958-02-01 1984-11-01 1958-04-01 0.000012 0.000247 -0.016693 0.0107 0.0172
3 1 195805 0.051332 1958-02-01 1984-11-01 1958-05-01 -0.002657 -0.001394 0.009769 0.0105 0.0180
4 1 195806 0.060711 1958-02-01 1984-11-01 1958-06-01 -0.002301 -0.000234 0.026049 0.0098 0.0200
... ... ... ... ... ... ... ... ... ... ... ...
6435 20 198407 -0.025846 1958-02-01 1984-11-01 1984-07-01 -0.000376 -0.000884 0.002778 0.0171 0.0133
6436 20 198408 0.105259 1958-02-01 1984-11-01 1984-08-01 -0.000464 0.000219 0.001204 0.0176 0.0089
6437 20 198409 -0.011635 1958-02-01 1984-11-01 1984-09-01 -0.000691 0.001311 -0.002801 0.0169 0.0084
6438 20 198410 -0.000777 1958-02-01 1984-11-01 1984-10-01 -0.001059 -0.002675 -0.000454 0.0131 0.0114
6439 20 198411 -0.013594 1958-02-01 1984-11-01 1984-11-01 -0.000296 -0.000485 0.003336 0.0119 0.0184

6440 rows × 11 columns

Step 1. Fama-MacBeth time-series analysis of returns to estimate beta

\[ r_{i,t}=\alpha_{i}+I_{1,t} \beta_{1 i}+I_{2,t} \beta_{2 i}+I_{3,t} \beta_{3 i}+I_{4,t} \beta_{4 i}+I_{5,t} \beta_{5 i} + e_{i,t} \]

Here \(I_{j,t}\) is the value of risk factor \(j\) in month \(t\). For each test asset \(i\) (test id), we run rolling multiple regressions, each using 60 monthly observations (the window size). This gives estimates of \(\beta_{j i}\) for \(j=1,\dots,5\) and \(i=1,\dots,20\) for each month.

The Rolling Ordinary Least Squares (RollOLS) regression is used to estimate beta by applying OLS over the past 60 months.
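
As a rough illustration of the rolling estimation, the sketch below runs on simulated data and is not the implementation in ‘tools’; the function name `rolling_betas` and the simulated inputs are assumptions. Each window of 60 months yields one intercept and one beta per factor via least squares:

```python
import numpy as np

def rolling_betas(returns, factors, window=60):
    """Rolling OLS of one asset's returns on K factors.

    returns: array of shape (T,); factors: array of shape (T, K).
    Row t of the result holds [alpha, beta_1, ..., beta_K] estimated from the
    `window` observations ending at t; earlier rows are NaN.
    """
    T, K = factors.shape
    X = np.column_stack([np.ones(T), factors])  # add the intercept column
    out = np.full((T, K + 1), np.nan)
    for t in range(window - 1, T):
        rows = slice(t - window + 1, t + 1)
        coef, *_ = np.linalg.lstsq(X[rows], returns[rows], rcond=None)
        out[t] = coef
    return out

# simulated example: 120 months, 5 factors, known betas plus small noise
rng = np.random.default_rng(0)
sim_factors = rng.normal(size=(120, 5))
true_betas = np.array([1.0, -0.5, 0.3, 2.0, 0.0])
sim_returns = 0.01 + sim_factors @ true_betas + 0.01 * rng.normal(size=120)

betas = rolling_betas(sim_returns, sim_factors)
print(betas[59])  # first available estimate (uses months 0..59)
```

With a 60-month window, the first 59 rows have no estimate, so the usable beta series starts 59 months after the first return.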

We call the function ‘fama_macbeth_timeseries_estimate_beta(returns_monthly,risk_factors)’ to estimate the monthly beta for each risk factor. For more details, see the function's definition in the ‘tools’ folder.

beta_monthly = fama_macbeth_timeseries_estimate_beta(returns_monthly,risk_factors)
beta_monthly
Intercept beta_UI beta_DEI beta_MP beta_UPR beta_UTS date test_id date_yyyymm
59 -0.075261 1.797498 -6.539273 0.992254 9.761025 1.372954 1963-01-01 1 196301
60 -0.098082 1.843343 -6.525231 0.897680 13.078850 1.200961 1963-02-01 1 196302
61 -0.111297 2.743554 -7.638911 0.892061 14.945628 1.085239 1963-03-01 1 196303
62 -0.112661 3.216630 -7.638812 0.916302 15.120569 0.962282 1963-04-01 1 196304
63 -0.123833 3.603865 -9.394386 0.935116 16.654864 1.070936 1963-05-01 1 196305
... ... ... ... ... ... ... ... ... ...
6435 -0.030460 -2.099438 2.992127 -0.434015 1.738623 0.826883 1984-07-01 20 198407
6436 -0.041946 -2.505150 3.416563 -0.222954 2.379998 0.909552 1984-08-01 20 198408
6437 -0.047187 -2.316854 2.945187 -0.134198 2.622746 0.918623 1984-09-01 20 198409
6438 -0.041781 -2.217842 2.707105 -0.114398 2.382481 0.833663 1984-10-01 20 198410
6439 -0.056195 -2.272755 2.782742 0.035737 3.069251 0.952451 1984-11-01 20 198411

5260 rows × 9 columns

Step 2. Fama-MacBeth cross-sectional analysis of returns

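For each month \(t\), run a cross-sectional regression of returns on the estimated betas across all test ids:

\[ r_{i,t}=\mu_{t}+\lambda_{1,t} \beta_{1 i}+\lambda_{2,t} \beta_{2 i}+\lambda_{3,t} \beta_{3 i}+\lambda_{4,t} \beta_{4 i}+\lambda_{5,t} \beta_{5 i}+\eta_{i,t} \]

The cross-sectional regressions give us estimates of the market prices of risk for each month: \(\lambda_{j,t}\), \(j=1,\dots,5\). Averaging them over time gives estimates of the \(\lambda_{j}\) in the expected-return relation \(\operatorname{E}\left[r_{i}\right]=\mu+\sum_{j=1}^{5} \lambda_{j} \beta_{j i}\).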
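
A minimal sketch of one such pass follows, on simulated betas rather than the notebook's data. The function name `cross_sectional_premia` is hypothetical, and the implementation in ‘tools’ may differ, for example in how the betas are lagged relative to returns.

```python
import numpy as np

def cross_sectional_premia(returns, betas):
    """Month-by-month cross-sectional OLS.

    returns: (T, N) excess returns; betas: (T, N, K) factor exposures.
    Returns a (T, K + 1) array whose row t is [mu_t, lambda_1t, ..., lambda_Kt].
    """
    T, N, K = betas.shape
    premia = np.empty((T, K + 1))
    for t in range(T):
        X = np.column_stack([np.ones(N), betas[t]])  # intercept + betas of month t
        coef, *_ = np.linalg.lstsq(X, returns[t], rcond=None)
        premia[t] = coef
    return premia

# simulated example: 24 months, 20 test assets, 5 factors, no pricing errors
rng = np.random.default_rng(0)
T, N, K = 24, 20, 5
sim_betas = rng.normal(size=(T, N, K))
sim_lambdas = 0.01 * rng.normal(size=(T, K))
sim_returns = 0.002 + np.einsum("tnk,tk->tn", sim_betas, sim_lambdas)

premia = cross_sectional_premia(sim_returns, sim_betas)
print(premia.mean(axis=0))  # time-series averages of mu_t and the lambdas
```

Because the simulated returns contain no noise, the regression recovers the simulated premia exactly; with real data each month's estimate is noisy, which is why the series is averaged over time.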

Next we call the function ‘fama_macbeth_crosssection_estimate_premium(test_assets,risk_factors,beta_monthly)’ to estimate the monthly risk premium for each risk factor.

risk_premiums = fama_macbeth_crosssection_estimate_premium(test_assets,risk_factors,beta_monthly)
risk_premiums
date_yyyymm Intercept UI DEI MP UPR UTS
0 196301 -0.033272 -0.002771 -0.001644 -0.013846 0.002604 0.006900
1 196302 0.034729 0.004816 0.000931 -0.057264 -0.000017 -0.003181
2 196303 0.086848 0.008846 0.003620 -0.032144 -0.004417 -0.015324
3 196304 0.018301 -0.003413 0.000265 0.002815 0.001272 0.005437
4 196305 -0.003543 -0.001203 -0.000365 0.045375 -0.002298 -0.012044
... ... ... ... ... ... ... ...
257 198406 0.029005 -0.000091 -0.009527 0.008326 -0.009091 -0.021303
258 198407 0.139352 -0.003935 -0.010575 -0.021792 0.000448 -0.022557
259 198408 0.008082 -0.001393 0.000114 0.012712 -0.007661 0.008807
260 198409 0.047249 0.005477 -0.002527 -0.022399 -0.006090 -0.018988
261 198410 0.023859 -0.004460 -0.009379 0.008274 -0.006360 -0.004589

262 rows × 7 columns

Step 3. Aggregate risk premium timeseries and calculate t-statistics for each risk factor

For each risk factor, we calculate two t-statistics for the risk premium: the ordinary t-statistic and the Newey-West t-statistic.

The Newey-West t-statistic is calculated with Newey and West (1987) standard errors, which adjust for heteroskedasticity and autocorrelation in time-series regression models. They are commonly used in asset pricing when residuals are serially correlated.

The t-statistics indicate whether the risk premium on each factor is statistically different from zero, that is, whether exposure to the factor is priced in the returns of the test assets.
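
For reference, the two statistics can be sketched for a single premium series as follows. This illustration uses a Bartlett-kernel Newey-West estimator with a hypothetical lag length of 4. The function names are assumptions, and the lag choice and small-sample conventions used in ‘tools’ may differ.

```python
import numpy as np

def t_stat(x):
    """Ordinary t-statistic for H0: mean(x) = 0."""
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

def newey_west_t_stat(x, lags=4):
    """t-statistic for H0: mean(x) = 0 with a Newey-West (Bartlett) long-run variance."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    long_run_var = d @ d / T                     # lag-0 autocovariance
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)        # Bartlett weight
        gamma = d[lag:] @ d[:-lag] / T           # autocovariance at this lag
        long_run_var += 2.0 * weight * gamma
    return x.mean() / np.sqrt(long_run_var / T)

# simulated monthly premia with a positive mean (made-up numbers)
rng = np.random.default_rng(1)
lam = 0.01 + 0.05 * rng.normal(size=262)
print(round(t_stat(lam), 3), round(newey_west_t_stat(lam), 3))
```

With `lags=0` the Newey-West statistic reduces to the ordinary one up to the degrees-of-freedom factor; positive autocorrelation in the series widens the standard error and lowers the statistic.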

Next we call the function ‘fama_macbeth_crosssection_premium_stat(risk_premiums)’ to test the significance of the risk premiums.

price_of_risk = fama_macbeth_crosssection_premium_stat(risk_premiums)
price_of_risk
factor risk_premium t_statistic t_statistic_newey_west
0 Intercept 0.094 0.229 0.199
1 DEI -0.020 -1.158 -0.928
2 MP -0.028 -0.150 -0.166
3 UI 0.058 1.603 1.347
4 UPR 0.084 1.553 1.194
5 UTS 0.262 1.603 1.310

We can see that, with 20 size portfolios as test assets, none of the risk premiums in the Chen, Roll, and Ross (1986) model is statistically significant: every t-statistic is below 1.96 in absolute value. In other words, the model has very limited power to explain the returns of these test assets.

The difference from the results in the paper could also be driven by the fact that we did not fully replicate the Chen, Roll, and Ross (1986) risk factor data.

Fama-MacBeth regression results are sensitive to the choice of test assets and sample period. We can change either one to re-examine the risk premiums of the Chen, Roll, and Ross model. For example, we can use Fama-French industry portfolios or individual stock returns as test assets.
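
One way to vary the sample period is to filter the test assets by date before running the regression. Here is a minimal sketch on a made-up frame; the helper name `restrict_period` is hypothetical and not part of ‘tools’.

```python
import pandas as pd

def restrict_period(df, start, end, date_col="date"):
    """Keep rows whose date falls in [start, end], both inclusive."""
    dates = pd.to_datetime(df[date_col])
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return df.loc[mask].reset_index(drop=True)

# made-up monthly panel standing in for `test_assets`
toy_assets = pd.DataFrame({
    "date": pd.date_range("1958-02-01", periods=6, freq="MS"),
    "test_id": 1,
    "ret_excess": [0.01, 0.02, -0.01, 0.00, 0.03, -0.02],
})
subsample = restrict_period(toy_assets, "1958-03-01", "1958-05-01")
print(subsample)
```

The filtered frame could then be passed to the regression functions in place of `test_assets`, keeping in mind that the 60-month rolling window shortens the usable sample.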

In the following section, we consolidate the steps of the Fama-MacBeth regression into one function. The function takes risk factors and test assets as inputs and returns the magnitude and significance of the price of risk (risk premium) for each factor.

We call the function ‘fama_macbeth_regression(test_assets,risk_factors)’ and report the significance of the price of risk.

price_of_risk = fama_macbeth_regression(test_assets,risk_factors)
price_of_risk
factor risk_premium t_statistic t_statistic_newey_west
0 Intercept 0.094 0.229 0.199
1 DEI -0.020 -1.158 -0.928
2 MP -0.028 -0.150 -0.166
3 UI 0.058 1.603 1.347
4 UPR 0.084 1.553 1.194
5 UTS 0.262 1.603 1.310