Inside Apple Card’s use of the ‘black box’

Apple Card’s credit scoring practices seem fairly mainstream. The case illustrates that the problem isn’t ‘black box’ algorithms taking over. The problem is that the traditional credit scoring methods used by the vast majority of US banks, Goldman Sachs among them, are routinely discriminatory.

Over the weekend, furor erupted when it was revealed that a husband and wife, both of comparable financial means, were offered drastically different credit line terms. This comes on the heels of Apple Card’s impressive news of $10B in credit lines opened in its first month.

New York’s Department of Financial Services announced it would open a probe to investigate whether ‘any algorithm’ resulted in discrimination on the basis of sex. Presidential hopefuls like Elizabeth Warren jumped in and warned of the dangers of ‘black box’ algorithms.

How much blame do ‘black box’ algorithms deserve for this disparity? Is Apple Card by Goldman Sachs truly discriminating against women?

There’s more here than meets the eye.

Machine learning and its ‘black box’ algorithms have become ubiquitous in many consumer-facing technologies (think Facebook newsfeed, Uber mapping, and Spotify music recommendations). It’s true that these ‘black box’ algorithms make consequential decisions without full visibility into how they arrive at their conclusions.

However, the usage of machine learning for underwriting — and credit scoring and credit decisioning specifically — by US banks remains in its infancy.

Based on the current state of machine learning use in underwriting by US banks, and on public statements by Apple Card, it appears that Goldman Sachs is among the vast majority of US banks that rely on traditional credit scores to make decisions.

Cutting through the marketing noise: Many fintech providers and credit scoring companies claim they are using ‘machine learning’, but it remains an early adopter market. Few lenders actively operate ML models for credit scoring.

Only a limited number of US banks, mainly those connected with US AI/ML fintechs like Upstart and ZestAI, have fully implemented (or are about to implement) machine learning algorithms for underwriting. The usage of machine learning by these lenders, such as Discover and Freddie Mac, has attracted its own regulatory scrutiny.

If Apple Card uses ‘machine learning’ in underwriting, it’s limited in use

Currently among lenders, it is considered a best practice to use machine learning in exploratory data analysis. Through this upfront analysis, the lender uses machine learning insights to inform the final design of a traditional credit scoring model.

Unlike machine learning algorithms, however, these credit scoring models are fairly conventional, in the mold of widely recognized ones like FICO or VantageScore. For regulatory purposes, these models are generally considered to be explainable.

Though Apple Card is very upfront about its use of machine learning for personalized tagging and spending recommendations, it does not appear to use machine learning in underwriting.

This limited usage of machine learning is a far cry from ‘black box’ algorithms running rampant. Indeed, the credit decisions being made come from the same types of mathematical models banks have used for decades.

If Goldman is indeed using machine learning in its underwriting, it is most likely of this ‘use-ML-to-inform-traditional-models’ variety, which means the traditional credit score is what is really at the center of this controversy.

The problem is that traditional credit scores are discriminatory

Goldman Sachs’s consumer business hasn’t reported any use of machine learning in underwriting, which suggests that traditional logistic regression models (the same technique that produces the FICO score) are responsible for any disparity.
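To see why a logistic regression score is generally regarded as explainable, consider the minimal sketch below. The feature names, coefficients, and applicant values are hypothetical and chosen purely for illustration; they are not FICO’s, Goldman Sachs’s, or any lender’s actual model. The point is only that each input makes a fixed, additive contribution to the score, so the factors pulling a score down can be read off directly.

```python
import math

# Hypothetical coefficients for illustration only (not any real lender's model).
INTERCEPT = 3.0
WEIGHTS = {
    "utilization": -2.0,       # share of available credit in use (0 to 1)
    "late_payments": -0.8,     # number of recent late payments
    "years_of_history": 0.15,  # length of credit history in years
}

def contributions(applicant):
    """Return each feature's additive contribution to the log-odds of repayment."""
    return {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}

def repayment_probability(applicant):
    """Apply the logistic function to the intercept plus the summed contributions."""
    log_odds = INTERCEPT + sum(contributions(applicant).values())
    return 1.0 / (1.0 + math.exp(-log_odds))

def top_adverse_reasons(applicant, n=2):
    """List the features that lower the score the most, most negative first."""
    negative = [(name, c) for name, c in contributions(applicant).items() if c < 0]
    return [name for name, _ in sorted(negative, key=lambda item: item[1])[:n]]

# Hypothetical applicant
applicant = {"utilization": 0.9, "late_payments": 2, "years_of_history": 4}
print(contributions(applicant))
print(round(repayment_probability(applicant), 3))
print(top_adverse_reasons(applicant))
```

Being able to trace a score back to its inputs says nothing about whether those inputs are fair: a model this transparent can still produce disparate outcomes if the underlying data differ across groups.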

Machine learning algorithms are considered a ‘black box’ because their results are difficult to interpret. For lending, the principal rules ML must comply with in the US are the Equal Credit Opportunity Act (ECOA) and the Fair Credit Reporting Act (FCRA), as well as the Federal Reserve’s SR 11-7 guidance on model risk management.

Though robust quantitative research on discrimination by sex is rarer, numerous studies have shown that traditional credit scores are discriminatory on the basis of other legally protected characteristics, like race and ethnicity:

A 2012 study by the Consumer Financial Protection Bureau (CFPB) that examined the credit scores of 200,000 people found the median FICO score in majority-minority zip codes was in the 34th percentile, while it was in the 52nd percentile for zip codes with low minority populations.

Similarly, the FTC studied the use of credit scores for auto insurance and found substantial racial disparities, with African Americans and Hispanics strongly over-represented in the lowest scoring categories.

Additionally, many lenders have settled discrimination investigations with the Department of Justice and/or the CFPB in the last few years, including American Express, Wells Fargo, and Fifth Third Bank.

Policymakers should hold lenders to higher standards of fairness in lending

The solution here is not to hinder innovation, but to set higher expectations for banks and other lenders to provide credit more fairly.

Experimenting with new underwriting techniques and data is one way in which greater fairness can be responsibly achieved.

Consider, for example, the recently released results of Upstart’s No Action Letter with the CFPB. The study found the inclusion of new data types and algorithms resulted in acceptance rates increasing by 23–29% and average APRs decreasing by 15–17%. The results were even more significant for low- and middle-income consumers and young consumers.


To be sure, ‘black box’ algorithms will one day carry (and in some cases already do carry) significant consequences for credit scoring and fairness. Today, however, the focus should remain on the limits of existing credit scoring approaches in building a truly fair and inclusive financial system.

#machinelearning #AI #fintech #underwriting #AppleCard #fairlending


©2019 by Fintalk.