“Figures don’t lie” but do regulators figure? Simpson’s Paradox says so

In emails over recent months we have warned bankers about the extremely aggressive enforcement of anti-redlining policies. That enforcement becomes more obvious every day as more banks are threatened with referral to the Department of Justice (DOJ). Word is that the DOJ is now approaching a record backlog of potential redlining cases.

What we find particularly troublesome is the abuse of statistical analysis as the basis for threatening redlining referrals to the DOJ. The technique takes a two-step approach. First, the regulator insists that a bank's REMA (Reasonably Expected Market Area) is larger than its CRA Assessment Area (AA). Then, having established a reasonably expected market that extends beyond what the bank considers to be its market, the regulator employs statistical analysis to accuse the bank of severely underperforming in majority-minority (MM) tracts that are in the expanded REMA but outside the bank's traditional market. The accusation and threat of referral rest on a lending shortfall in the majority-minority tracts that is "statistically significant".

The use of "statistical analysis" gives the appearance of an unbiased and objective measurement of bank performance. In reality, the analysis can be seriously biased because it assumes that the submarkets within the REMA are statistically equivalent, which is frequently not the case.

The following example shows how redlining analysis based on a REMA can mislead.

LOANS IN MM TRACTS AS A SHARE OF TOTAL LOANS, BY BANK AND MARKET (MM TRACT ORIGINATIONS / TOTAL ORIGINATIONS)

MARKET	BANK 1	BANK 2	BANK 3	MKT TOTALS
AA (No MM Tracts)	0/900=0%	0/10=0%	0/90=0%	0/1000=0%
Additional Mkt	60/100=60%	50/90=55.5%	40/90=44.4%	150/280=53.6%
REMA (TOTALS)	60/1000=6%	50/100=50%	40/180=22.2%	150/1280=11.7%

In the example, three banks compete in a market (for this illustration, they make up the entire market). Bank 1 has no MM tracts in its Assessment Area and makes 90% of its loans in the AA. The other banks compete in both the AA and the expanded REMA tracts, which is where all the MM tracts are located. Bank 2 extends only 10% of its loans in Bank 1's AA, while Bank 3 extends 50% of its loans there.
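For readers who want to check the arithmetic, the table can be reproduced with a short Python sketch. The loan counts come straight from the example; the names `LOANS` and `summarize` are ours, chosen only for illustration. Note that 50/90 rounds to 55.6% when computed, which the table shows as 55.5%.

```python
# (loans in MM tracts, total loans) per bank and submarket, from the example table.
# All names here are illustrative, not part of any regulatory tool.
LOANS = {
    "Bank 1": {"AA": (0, 900), "Additional": (60, 100)},
    "Bank 2": {"AA": (0, 10), "Additional": (50, 90)},
    "Bank 3": {"AA": (0, 90), "Additional": (40, 90)},
}


def summarize(loans):
    """Per bank: share of lending done in the AA, MM share in the
    additional market, and MM share across the combined REMA."""
    rows = {}
    for bank, markets in loans.items():
        mm_total = sum(mm for mm, _ in markets.values())
        all_total = sum(total for _, total in markets.values())
        add_mm, add_total = markets["Additional"]
        rows[bank] = {
            "aa_mix": markets["AA"][1] / all_total,
            "additional_mm_share": add_mm / add_total,
            "rema_mm_share": mm_total / all_total,
        }
    return rows


if __name__ == "__main__":
    for bank, r in summarize(LOANS).items():
        print(
            f"{bank}: {r['aa_mix']:.1%} of loans in AA, "
            f"{r['additional_mm_share']:.1%} MM share in additional market, "
            f"{r['rema_mm_share']:.1%} MM share in REMA"
        )
```

Running the sketch prints one line per bank, so each bank's lending mix can be read next to its MM share in the additional market and in the combined REMA.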

In the expanded part of the REMA, Bank 1 outperforms the other two banks: 60% of its loans there are in MM tracts, compared with 55.5% and 44.4% for the other two banks. But when the two markets are combined into the REMA insisted upon by the regulators, Bank 1 substantially underperforms: only 6% of its loans are in MM tracts, against 50% for Bank 2, 22.2% for Bank 3 and 11.7% for the market as a whole. In fact, Bank 1 underperforms so badly that the shortfall is "statistically significant".
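The example does not say which test a regulator would run. As a rough sketch, assuming a simple pooled two-proportion z-test (our assumption for illustration, not a description of any agency's actual method), Bank 1 can be compared with Banks 2 and 3 combined, first across the whole REMA and then in the additional market alone. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Bank 1 versus Banks 2 and 3 combined, using the counts in the example table.
z_rema, p_rema = two_proportion_z(60, 1000, 90, 280)  # combined REMA
z_add, p_add = two_proportion_z(60, 100, 90, 180)     # additional market only

print(f"REMA: z = {z_rema:.2f}, p = {p_rema:.3g}")
print(f"Additional market: z = {z_add:.2f}, p = {p_add:.3g}")
```

Under these assumptions the REMA comparison produces a large negative z-score for Bank 1, while the same test restricted to the additional market, the only place where MM lending can occur, produces a positive one. The same loans yield opposite conclusions depending on how the markets are grouped.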

How can Bank 1 outperform the entire market for lending in the expanded area, where the only MM tracts are located, but underperform when the AA and the expanded area are combined into the REMA?

The answer is "Simpson's Paradox" (no, not Homer's), which can appear when datasets are unbalanced. In this case, Bank 1 does the overwhelming share of its lending in the AA, where lending in MM tracts is not even possible because the AA contains no MM tracts. The other banks do more of their lending in the added market, where the MM tracts are. When penetration rates are computed across the entire expanded REMA, the other banks' rates are diluted far less by their activity in Bank 1's Assessment Area than Bank 1's rate, which is weighed down by the large percentage of its loans in its traditional AA. This exposes the fallacy that statistical significance is always a fair and objective indicator of redlining.
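The dilution can be made explicit. Each bank's REMA-wide MM share is a weighted average of its MM share in each submarket, with the weights being the fraction of its lending done in that submarket. The sketch below recomputes every bank's REMA share using one common set of weights (the market-wide lending mix), a technique usually called direct standardization. This is our illustration, not a method taken from any regulator, and all function names are hypothetical.

```python
# (loans in MM tracts, total loans) per bank and submarket, from the example table.
LOANS = {
    "Bank 1": {"AA": (0, 900), "Additional": (60, 100)},
    "Bank 2": {"AA": (0, 10), "Additional": (50, 90)},
    "Bank 3": {"AA": (0, 90), "Additional": (40, 90)},
}
MARKETS = ("AA", "Additional")


def market_mix(by_market):
    """Fraction of one bank's loans made in each submarket."""
    total = sum(by_market[m][1] for m in MARKETS)
    return {m: by_market[m][1] / total for m in MARKETS}


def market_rates(by_market):
    """One bank's MM share within each submarket."""
    return {m: by_market[m][0] / by_market[m][1] for m in MARKETS}


def common_mix(loans):
    """Market-wide lending mix, used as the shared set of weights."""
    totals = {m: sum(bank[m][1] for bank in loans.values()) for m in MARKETS}
    grand = sum(totals.values())
    return {m: totals[m] / grand for m in MARKETS}


def weighted_share(rates, mix):
    """REMA-wide MM share as a weighted average of submarket shares."""
    return sum(rates[m] * mix[m] for m in MARKETS)


def compare(loans):
    """Observed REMA share versus the share under the common lending mix."""
    shared = common_mix(loans)
    result = {}
    for bank, by_market in loans.items():
        rates = market_rates(by_market)
        result[bank] = {
            "observed": weighted_share(rates, market_mix(by_market)),
            "standardized": weighted_share(rates, shared),
        }
    return result


if __name__ == "__main__":
    for bank, r in compare(LOANS).items():
        print(f"{bank}: observed {r['observed']:.1%}, standardized {r['standardized']:.1%}")
```

With each bank's own lending mix, Bank 1 has the lowest REMA share. Once all three banks are given the same mix, Bank 1 has the highest, which matches its ranking in the one submarket that actually contains MM tracts.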

View GeoDataVision documents on JD Supra