To reduce biases in machine learning, start by openly discussing the problem
Though machines are inherently objective, programmers are human and often have unconscious prejudices. Surprisingly, those biases can inadvertently show up in machine learning. For instance, a Google algorithm for optimizing job ads was found to show fewer ads for high-paying jobs to women than to men.
In the first part of this two-part interview, Richard Sharp, CTO of the predictive marketing company Yieldify, explained why this happens, whether or not programmers intend for it to happen. In part two, he tells us what we can do to reduce or eliminate bias in machine learning.
The Enterprisers Project (TEP): How can we identify and remove bias from today's machine learning algorithms?
Sharp: This is a difficult question, and I don't think we have all the answers yet. The broad application of machine learning at scale is a relatively new phenomenon, and we're just at the point now of starting to understand that perpetuating discriminatory biases is a real problem.
There are many ways we can potentially address the problem. One approach might be to pre-process training data to explicitly remove known biases that you don't want the machine learning system to exploit. Another might be to explicitly test the output of machine learning systems against known biases and then post-process the results to remove them. But, in summary, more research is needed, both into the extent of the problem and into the best solutions.
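To make the pre-processing idea concrete, here is a minimal sketch of one published technique, often called reweighing. It is not something Sharp describes in the interview; it is offered only as an illustration. It assigns each training example a weight so that, in the weighted data, the outcome label is statistically independent of a protected attribute. The function name `reweighing_weights` and the toy data are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Return one weight per example: P(group) * P(label) / P(group, label).

    Examples from (group, label) combinations that are under-represented
    relative to independence get weights above 1; over-represented ones
    get weights below 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Hypothetical training set: label 1 = "was shown the high-paying job ad".
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)

def weighted_positive_rate(group):
    """Weighted share of positive labels within one group."""
    rows = [(w, y) for w, g, y in zip(weights, groups, labels) if g == group]
    return sum(w * y for w, y in rows) / sum(w for w, _ in rows)
```

In the raw toy data, three of four men but only one of four women carry the positive label. After weighting, both groups have the same weighted positive rate. Many training libraries accept per-example weights, so the list could be passed along at training time. Whether this is the right fix for a given system is a judgment call.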
TEP: There is the ongoing issue of bias in Silicon Valley and the resulting lack of female and non-white software engineers. Do you think that is making the problem worse?
Sharp: Yes, this is an issue. As I mentioned earlier, it's not that programmers are explicitly encoding their biases into programs directly. But they are involved in the selection of datasets that are used to train the models. So a lack of diversity of employees, if not managed carefully, can lead to a corresponding selection bias in training data. And that's a problem.
For example, when the machine learning-based object recognition algorithm embedded in Google Photos miscategorized a black man as a gorilla, many questioned whether the dataset used to train the algorithm was heavily biased towards photos of white people.
TEP: Any advice you'd give to CIOs and CTOs on what they should do about machine learning bias?
Sharp: The main advice right now is to be very aware of the problem. If you're applying machine learning to an area where you think perpetuation of discriminatory biases is a significant risk, then talk about the potential problem with your teams and how you might mitigate it. In my experience, the risk of inadvertent discriminatory biases is not top-of-mind for most machine learning and data professionals. So just raising the issue and making sure it is being considered and discussed openly is a good first step.
If possible, test the output of your machine learning systems against variables like race and gender to see if you are inadvertently discriminating against your users. If you are, consider revising your training data or post-processing your model's results.
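As a rough illustration of what testing output against a variable like gender could look like, the sketch below compares how often each group receives a positive prediction. The function names, the toy data, and the 0.8 cutoff are assumptions made for this example. The cutoff echoes the "four-fifths" rule of thumb from US employment guidelines and is not a universal standard.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return the fraction of positive predictions (1s) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group receives any positive outcome
    return min(rates.values()) / highest

# Hypothetical model output: 1 = user was shown the high-paying job ad.
predictions = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
groups = ["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"]

rates = selection_rates(predictions, groups)
ratio = disparate_impact_ratio(rates)

THRESHOLD = 0.8  # assumed cutoff for this sketch, not a universal rule
needs_review = ratio < THRESHOLD
```

With this toy data, four of five men but only one of five women are shown the ad, so the ratio falls well under the assumed cutoff and the check flags the model for review. A single ratio like this is only a starting point; it says nothing about why the gap exists.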