Beware of biases in machine learning: One CTO explains why it happens
Computers are only as good, or as bad, as the people who program them. And it turns out that many of the people who create machine learning algorithms are, presumably unintentionally, building in race and gender bias. In part one of a two-part interview, Richard Sharp, CTO of predictive marketing company Yieldify, explains how it happens.
The Enterprisers Project (TEP): Machines are genderless, have no race, and are in and of themselves free of bias. How does bias creep in?
Sharp: To understand how bias creeps in, you first need to understand the difference between programming in the traditional sense and machine learning. With programming in the traditional sense, a programmer analyzes a problem and comes up with an algorithm to solve it (basically an explicit sequence of rules and steps). The algorithm is then coded up, and the computer executes the programmer's defined rules accordingly.
With machine learning, it's a bit different. Programmers don't solve a problem directly by analyzing it and coming up with their rules. Instead, they just give the computer access to an extensive real-world dataset related to the problem they want to solve. The computer then figures out how best to solve the problem by itself.
For example, say you want to train a computer to recognize faces in photographs. To do this using machine learning, you might first build a large dataset of photographs and manually mark up the locations of faces within them. You would then feed this dataset into a machine learning system. Using examples in the dataset you've given it, the machine learning system would figure out for itself how best to recognize faces in images.
So machine learning is essentially programming with data, or programming by example. In our face recognition example, at no point did the programmer sit down and write a set of rules about how to recognize faces. In fact, programmers have no idea how the computer is doing the face recognition in practice. They can't say anything other than, "It is doing it based on the patterns in the data I gave it."
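To make the contrast concrete, here is a minimal sketch in Python. It is a toy illustration rather than a face recognizer: the numbers, the labels, and the function names (`rule_based`, `learn_threshold`, `learned`) are all hypothetical. The first function is a rule the programmer wrote by hand; the second derives its rule purely from labeled examples, which stand in for the marked-up photographs.

```python
# Toy contrast between a hand-written rule and a rule learned from examples.
# All data and names here are hypothetical, chosen only for illustration.

def rule_based(x):
    """Traditional programming: the programmer chooses the cutoff."""
    return x >= 50


def learn_threshold(examples):
    """Programming by example: choose the cutoff that fits the labeled data best.

    `examples` is a list of (value, label) pairs. Every observed value is
    tried as a cutoff, and the one that classifies the most examples
    correctly is kept (the first such cutoff wins ties).
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(value for value, _ in examples):
        correct = sum((value >= cutoff) == label for value, label in examples)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


# Hand-labeled examples play the role of the marked-up photographs.
examples = [(10, False), (25, False), (40, False),
            (60, True), (75, True), (90, True)]

cutoff = learn_threshold(examples)


def learned(x):
    """The learned classifier: its behavior comes entirely from the data."""
    return x >= cutoff


print(cutoff, learned(70), learned(30))
```

With this data the learned cutoff is 60, a number nobody typed in as a rule. Change the examples and the behavior changes with them, which is exactly why whatever patterns the data contains end up in the system.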
And therein lies the problem. The biases creep in because they are embedded in the real-world data. Our world is full of discriminatory biases such as gender pay gaps, racial wealth gaps, etc. So if you feed a machine learning system real-world data and ask it to solve a particular problem (say, maximizing revenue by showing specific adverts to specific people), it may well pick up on these biases and exploit them. The programmer has no idea that this is happening, since the computer figured it out for itself. But when you step back and take a look, you find you have a machine learning system that is discriminatory, for instance one that decides not to show adverts for high-paying jobs to women.
TEP: Can you give some real-world examples of bias in machine learning?
Sharp: Yes. There are plenty. Some relate specifically to the advertising placement scenario I just presented. For example, a study from CMU showed that Google was showing significantly more adverts for jobs paying over $200,000 to men than it was to women. Another study from the FTC revealed that searches for black-identifying names yielded a higher incidence of ads associated with arrest than searches for white-identifying names.
These issues were almost certainly down to machine learning systems exploiting biases in real-world data. At no point did a sexist Google programmer ever deliberately write a program that encoded the particular rule: "If this is a job advert and salary is more than $200K, then target women less frequently." Instead, the dataset the model was trained on probably had fewer women clicking on high-paying job adverts (due to existing social biases). The model learned that this was a pattern that it could exploit to get closer to its goal.
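The mechanism described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Google's or any real ad system works: the click counts are invented, and `fit_click_rates` and `pick_audience` are hypothetical names. Note that no line of the code says "target women less"; the skew comes entirely from the data.

```python
from collections import defaultdict

# Hypothetical click log for one high-paying job advert: (group, clicked).
# The counts are invented for illustration and do not come from any study.
click_log = (
    [("men", True)] * 30 + [("men", False)] * 70
    + [("women", True)] * 20 + [("women", False)] * 80
)


def fit_click_rates(log):
    """'Train' the simplest possible model: observed click rate per group."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for group, did_click in log:
        shown[group] += 1
        clicked[group] += int(did_click)
    return {group: clicked[group] / shown[group] for group in shown}


def pick_audience(click_rates):
    """Optimize for clicks: show the advert to the group most likely to click."""
    return max(click_rates, key=click_rates.get)


click_rates = fit_click_rates(click_log)
audience = pick_audience(click_rates)
print(click_rates, audience)
```

Because the historical log contains fewer clicks from women, a system that only maximizes clicks ends up steering the advert toward men, even though gender never appears in any hand-written rule.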
There are plenty of other examples too. Amazon recently rolled out same-day delivery to new city areas, while conspicuously excluding predominantly black zip codes. And a machine learning system actively in use in US courtrooms significantly overestimated the likelihood of black defendants reoffending.
TEP: What are some of the consequences we face down the road if machines are inherently biased?
Sharp: We are training these systems based on how the world is currently, rather than how we want the world to be. If you ask a machine learning system to optimize advert placement to increase click-through rate, it will do exactly that. If you give the learning system a dataset in which women click on high-paying ads less frequently, the machine learning system will learn to exploit that fact to get closer to its goal. The consequence is that machine learning systems will perpetuate, or even exacerbate, existing biases.
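One way a system could exacerbate a bias, rather than merely reproduce it, is through a feedback loop. The sketch below assumes a hypothetical policy in which each round's impressions are reallocated in proportion to the clicks each group produced in the previous round. The click rates, the policy, and the names (`TRUE_CLICK_RATE`, `next_shares`, `simulate`) are assumptions made for illustration only.

```python
# Assumed click rates per group, held fixed across rounds (hypothetical numbers).
TRUE_CLICK_RATE = {"men": 0.30, "women": 0.20}


def next_shares(shares):
    """Reallocate impressions in proportion to the clicks each group produced."""
    clicks = {group: shares[group] * TRUE_CLICK_RATE[group] for group in shares}
    total = sum(clicks.values())
    return {group: clicks[group] / total for group in clicks}


def simulate(rounds):
    """Return the share of impressions shown to women after each round."""
    shares = {"men": 0.5, "women": 0.5}  # start from an even split
    history = [shares["women"]]
    for _ in range(rounds):
        shares = next_shares(shares)
        history.append(shares["women"])
    return history


history = simulate(5)
print([round(share, 3) for share in history])
```

Under these assumptions, the share of impressions shown to women starts at an even split and falls in every round, so a modest difference in the historical data grows into a much larger difference in who gets to see the advert at all.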
[In part two of this interview, Sharp will explain what we can do to avoid bias in machine learning.]