import {choleskyDecomposition} from "@sw1227/cholesky-decomposition"
import {boxMuller} from "@sw1227/box-muller-transform"
math = require("mathjs")
// Mean of an array
mean = array => (array.reduce((a, b) => a + b) / array.length);
// Round to specified digits
g = (x, digits = 3) => x.toFixed(digits);
// Sum positive values in an array
sumPositives = (arr = []) => {
const isPositive = num => typeof num === 'number' && num > 0;
const res = arr.reduce((acc, val) => {
if(isPositive(val)){
acc += 1;
};
return acc;
}, 0);
return res;
};
// Linear regression
linearRegression = (y,x) => {
var lr = {};
var n = y.length;
var sum_x = 0;
var sum_y = 0;
var sum_xy = 0;
var sum_xx = 0;
var sum_yy = 0;
for (var i = 0; i < y.length; i++) {
sum_x += x[i];
sum_y += y[i];
sum_xy += (x[i]*y[i]);
sum_xx += (x[i]*x[i]);
sum_yy += (y[i]*y[i]);
}
lr['slope'] = (n * sum_xy - sum_x * sum_y) / (n*sum_xx - sum_x * sum_x);
lr['intercept'] = (sum_y - lr.slope * sum_x) / n;
lr['r2'] = Math.pow((n*sum_xy - sum_x*sum_y) / Math.sqrt((n*sum_xx-sum_x*sum_x)*(n*sum_yy-sum_y*sum_y)),2);
return lr;
}
// Multivariate normal distribution with the given mean vector and
// covariance matrix (plain nested array), providing pdf, differential
// entropy, and a Cholesky-based sampler.
multivariateNormal = (mean, covArray) => {
  const n = mean.length;
  const cov = math.matrix(covArray);
  // Cholesky factor (chol * chol^T = cov), computed once up front.
  // BUG FIX: the previous version recomputed the decomposition for
  // every sample and then multiplied z by `cov` instead of the factor,
  // which produces samples with covariance cov^2 rather than cov.
  const chol = choleskyDecomposition(cov);
  return {
    // Probability Density Function evaluated at point x
    pdf: x => {
      const c = 1 / (math.sqrt(2*math.PI)**n * math.sqrt(math.det(cov)));
      return c * math.exp(
        -(1/2) * math.multiply(
          math.subtract(math.matrix(x), math.matrix(mean)),
          math.inv(cov),
          math.subtract(math.matrix(x), math.matrix(mean))
        )
      );
    },
    // Differential entropy
    entropy: 0.5*math.log(math.det(cov)) + 0.5*n*(1 + math.log(2*math.PI)),
    // Draw n_samples vectors via x = mean + chol * z, z ~ N(0, I)
    sample: n_samples => Array(n_samples).fill().map(() => {
      const z = boxMuller(n);
      return math.add(
        math.matrix(mean),
        math.multiply(chol, math.matrix(z))
      ).toArray();
    }),
  };
}
pcorr = (x, y) => {
let sumX = 0,
sumY = 0,
sumXY = 0,
sumX2 = 0,
sumY2 = 0;
const minLength = x.length = y.length = Math.min(x.length, y.length),
reduce = (xi, idx) => {
const yi = y[idx];
sumX += xi;
sumY += yi;
sumXY += xi * yi;
sumX2 += xi * xi;
sumY2 += yi * yi;
}
x.forEach(reduce);
return (minLength * sumXY - sumX * sumY) / Math.sqrt((minLength * sumX2 - sumX * sumX) * (minLength * sumY2 - sumY * sumY));
};
// Monte Carlo simulation linking the IC, decile spreads, hit rate, and
// linear-regression output for a simulated factor.
//   trials   - number of simulation runs
//   assets   - length of each simulated returns/exposures series
//   rho      - correlation between returns and exposures (the "true" IC)
//   mu1, mu2 - means of the returns and exposures series
//   s1, s2   - std deviations of the returns and exposures series
// Returns an object of averaged, formatted summary statistics.
icsim = (trials, assets, rho, mu1, mu2, s1, s2) => {
  const buckets = 10; // number of rank buckets (deciles)
  const Sigma = math.matrix([[s1 * s1, s1 * s2 * rho], [s1 * s2 * rho, s2 * s2]]);
  // Per-trial result arrays, one entry per run. BUG FIX: these were
  // previously sized Array(assets) while being indexed by i < trials,
  // which skewed every average whenever trials !== assets.
  const spread = Array(trials).fill(0);
  const longSpread = Array(trials).fill(0);
  const shortSpread = Array(trials).fill(0);
  const icPearson = Array(trials).fill(0);
  const rsq = Array(trials).fill(0);
  const r = Array(trials).fill(0);
  const coef = Array(trials).fill(0);
  const coefTest = Array(trials).fill(0);
  for (let i = 0; i < trials; i++) {
    const norm = multivariateNormal([mu1, mu2], Sigma);
    const sims = norm.sample(assets);
    // Convert array rows to objects with keys 'returns' and 'exposures'
    const simsData = sims.map(row => ({ returns: row[0], exposures: row[1] }));
    // Rank exposures into deciles (1 = lowest exposure, 10 = highest)
    const rankedSims = _.orderBy(simsData, ['exposures'], ['asc']);
    rankedSims.forEach((item, index) => item.rank = Math.ceil((index + 1) / assets * buckets));
    // Average return per rank bucket
    const decile = _.chain(rankedSims)
      .groupBy('rank')
      .map((value, key) => ({
        rank: parseInt(key, 10),
        avgReturn: _.meanBy(value, 'returns')
      }))
      .orderBy('rank', 'asc')
      .value();
    const univ = _.meanBy(decile, 'avgReturn');
    const decileMap = _.keyBy(decile, 'rank');
    spread[i] = decileMap[buckets].avgReturn - decileMap[1].avgReturn;
    longSpread[i] = decileMap[buckets].avgReturn - univ;
    shortSpread[i] = univ - decileMap[1].avgReturn;
    icPearson[i] = pcorr(rankedSims.map(item => item.returns), rankedSims.map(item => item.exposures));
    // Linear regression of returns on exposures
    const tc = toColumns(rankedSims);
    const lr = linearRegression(tc.returns, tc.exposures);
    rsq[i] = lr.r2;
    r[i] = Math.sqrt(lr.r2);
    coef[i] = lr.slope;
    // Scaled IC: (sigma_returns / sigma_exposures) * IC, an estimate
    // of the regression coefficient.
    coefTest[i] = math.std(rankedSims.map(item => item.returns)) / math.std(rankedSims.map(item => item.exposures)) * icPearson[i];
  }
  const out = {};
  out['Trials'] = trials;
  out['Assets'] = assets;
  out['Ret. Vol.'] = s1;
  out['Exp. Vol'] = s2;
  out['Correlation'] = rho.toFixed(2);
  out['Spread'] = g(mean(spread)*10000, 1);
  // Optional outputs, currently disabled. (The previous HTML-style
  // `<!-- -->` comments are a syntax error inside an ES module.)
  // out['Long_Spread'] = g(mean(longSpread)*10000);
  // out['Short_Spread'] = g(mean(shortSpread)*10000);
  out['Pct_Positive'] = g((sumPositives(spread) / trials), 2);
  out['IC'] = g(mean(icPearson));
  out['R_sq'] = g(mean(rsq));
  out['R'] = g(mean(r));
  out['Coef'] = g(mean(coef));
  out['Scaled_IC'] = g(mean(coefTest));
  return out;
}
toColumns = rawdata => {
// Initialize columns
const columns = {};
// Get keys from first row (assumes all rows have the same keys)
const keys = Object.keys(rawdata[0]);
// Initialize empty arrays for each key
keys.forEach(key => {
columns[key] = [];
});
// Populate columns
rawdata.forEach(row => {
keys.forEach(key => {
columns[key].push(row[key]);
});
});
return columns;
}
Information Coefficients & Linear Regression
Motivation
In this paper we use Monte Carlo simulation to show the relationship between the Information Coefficient (IC), correlation, decile returns, and linear regression.1 We can also gain insights into investment related questions, such as
- What level of IC is considered good?
- What effect does volatility have on the spread?
- What effect does universe size have on the hit rate?
We will start off by defining some terms to make sure that we are all on the same page.
- The Spread is the difference between the average return of the top decile and the average return of the bottom decile.
- The IC is the correlation between two series, here the return series and the exposure series. The Pearson correlation is used in this JavaScript implementation, but most times Spearman’s Rank correlation is used because it is less affected by outliers.
- A linear model is the regression of one series versus another, resulting in an intercept and a coefficient that describe the relationship between the two variables.
- R-squared measures the goodness of fit of the linear model. It is also referred to as the coefficient of determination.
Simulation Process
We proceed by assuming asset returns are normally distributed and we generate two random series from a bivariate normal distribution with given mean and correlation. We will call the first random series the returns and call the second random series the exposures. The exposures represent the factor, which could be momentum, book to price, or any other factor.
As defined above, the correlation between the two random series is the IC. Here is an outline of how each simulation is performed:
- Divide into deciles based on exposures
- Calculate spread between top decile and bottom decile
- Calculate the IC
- Run a linear regression
Base case
To make this presentation interactive we use a JavaScript implementation of a multivariate random number generator to produce two series for each simulation, one we will call the returns and the other the exposures. Both series have zero mean. The only thing they have in common is a correlation, which we vary from zero to twenty percent. Before going further into the details, let’s have a look at some simulations so we can describe how everything relates.
You can see from the table above that we run a number of simulations, each with a given number of assets. By assets we mean how long each random series is and by simulations we mean how many random samples we draw. The values shown in the table are averages of all the simulations. The first column shows the correlation, which represents the information. The next two columns show the volatility of the returns series and the exposures series, which we have set to 8% for now, but will vary later.
The Spread shows one measure of the performance of the factor. The exposures series is our factor and we create deciles based on that series. Once we have the deciles, we average the returns for each decile. The spread is the difference between the top decile average return and the bottom decile average return. The percent positive column shows what proportion of the spreads are positive. Each simulation produces one spread, so we end up with as many spreads as simulations. The average of these spreads is the spread column and the proportion positive is the percent positive column. The IC is the average correlation between the exposures and returns series and is another standard measure of the performance of a factor.
Measuring the correlation between two factors is a quick and easy way to see how closely they are related, and how powerful the exposures may be in predicting returns. Another way to do so is to run a regression of the exposures on the returns. The regression function provides three main outputs, the y-intercept, the coefficient, and the R-squared measure. The R-squared is a measure of the goodness of fit of the regression equation. The square root of the R-squared statistic returns the IC (to a close approximation). In this simple one-variable linear regression framework the coefficient is equivalent to the correlation that was introduced to the two random series. Lastly, the coefficient can be approximated by the scaled IC as defined in the formula below:
\[ Scaled.IC = \frac{\sigma_{returns}}{\sigma_{exposures}} * IC \]
In this table both the returns and exposures have the same standard deviation, so the volatility ratio is one, so it doesn’t seem very informative, but in the next table we will run simulations with a higher returns volatility and you will see the formula holds.
Remember that these are random series with zero mean, so the only information content is the correlation. By running enough simulations we can reliably approximate the true correlation by both the IC and the R (the square root of the regression R-squared statistic). We can also approximate the coefficient by calculating the scaled IC.
Correlation, IC, and hit rate
As you can see from the base case simulation run, the correlation and the IC are closely linked. In these simulations the correlation is the given relationship between returns and exposures and the IC is a measurement of how well our signal (or rank in this case) works.2
The percent positive (Pct.Positive) column shows the proportion of simulation runs (out of all trials) for which the spread was positive. This number is often called the hit rate. If the factor has a zero correlation we would expect a hit rate of 0.5 (or 50%). A correlation of one percent bumps the hit rate up to almost 60% and two percent gets us to 68%. Usually ICs in the range of 5% to 10% are considered very good. The hit rate in that case would be between 89% and 99%, which is very good indeed. Note that this is on a large universe of 1000 assets. We will see later that the hit rate declines rapidly as the number of assets falls.
Higher Return Volatility
In the next set of simulations we increase the return volatility from 8% to 16%.
Doubling the returns volatility basically doubles the spread. The IC and the regression R (the square root of R-squared) still match the actual correlation between the two random series, but the coefficient is twice as large as in the previous table. This is because the returns volatility is now twice as high as the exposures volatility. The regression coefficient is accurately estimated by the scaled IC measure.
Lower Volatility
In the next table we simulate both series with only 2% volatility (for both returns and exposures). For a given correlation, the percent positive, IC, and coefficients are all comparable to the base case, but the spread is much lower. The returns volatility is what creates the opportunity to profit and if the correlation is high enough the investor can capitalize on the hit rate (percent positive).
All the usual relationships hold. The only difference is that the spread is much lower due to the lower volatility of the returns series.
Higher Exposure Volatility
When the exposures volatility is twice as high as the returns volatility the spread remains comparable to the base case, but the scaled IC drops in half, as does the regression coefficient. Clearly, we want the volatility to be on the returns and not on the exposures.
Non-zero Mean
While generally a higher return is better, in this case we are measuring the spread, which is the difference between the average return in the top decile minus the average return in the bottom decile. So a higher return just increases the average return, but does not (necessarily) benefit the top decile more than the bottom decile, leaving the spread pretty much the same. As you can see, all the other metrics are comparable as well. The correlation and volatility are the two driving forces of factor performance.
Small Number of Assets
As mentioned earlier, when we lower the number of assets, the hit rate (Pct.Positive) falls substantially. The rest of the metrics are similar to the base case. Quantitative investing is a numbers game, where it pays to have as much breadth as possible. The hit rates decline with a smaller number of assets. Although the hit rates are still okay, you have to make numerous “bets” to get those advertised numbers. If you are only investing in a small number of assets, you could get a resulting hit rate that is much worse (or better) than the advertised hit rate.
Conclusions
This interactive presentation explained the relationship between the Information Coefficient (IC) and linear regression model output. These simulations should help you understand the importance of correlation in factor investing and what level of IC and/or correlation will yield acceptable results. We have shown that the larger the universe the better the expected hit rate (for a given correlation). We have also shown that returns volatility is good and the exposures volatility is (relatively) bad. Finally, average returns are not as important as the spread between the top and bottom deciles.
Footnotes
This paper was motivated by a short paper and slides by Oliver Buckley at Invesco which can be found here.↩︎
Since this is a simple one-factor (exposure) model, the IC matches the correlation very closely (we are able to capture the full signal). In a multi-factor context the IC will usually be lower than the correlation because the factors will not be perfectly uncorrelated.↩︎