How to tell if AI is working the way we want it to

Credit: Pixabay / Public Domain CC0

About a decade ago, deep learning models began achieving superhuman results on all kinds of tasks, from beating world-champion board game players to outperforming doctors at diagnosing breast cancer.

These powerful deep learning models are generally based on artificial neural networks, which were first proposed in the 1940s and have become a popular type of machine learning in which a computer learns to process data using layers of interconnected nodes, or neurons, that mimic the human brain.

As the field of machine learning has grown, artificial neural networks have grown with it.

Deep learning models now often consist of millions or billions of interconnected nodes in many layers that are trained to perform discovery or classification tasks using large amounts of data. But because the models are so complex, even the researchers who designed them don’t fully understand how they work. This makes it hard to know if they are working properly.

For example, perhaps a model designed to help doctors diagnose patients correctly predicted that a skin lesion was cancerous, but it did so by focusing on an unrelated artifact that often appears in photos when cancerous tissue is present, rather than on the cancerous tissue itself. This is known as a spurious correlation. The model makes the right prediction, but it does so for the wrong reason. In a real clinical setting where that artifact does not appear in images of cancerous lesions, the model could cause misdiagnoses.

With so much uncertainty surrounding these so-called “black box” models, how can one uncover what’s going on inside the box?

This conundrum has led to a rapidly growing new area of study where researchers develop and test explanatory methods (also called interpretability methods) that seek to shed light on how black-box machine learning models make predictions.

What are explanatory methods?

At the most basic level, explanatory methods are either global or local. A local explanation method focuses on explaining how the model made one particular prediction, while a global explanation attempts to describe the overall behavior of an entire model. This is often done by developing a separate, simpler (and hopefully understandable) model that mimics the larger black-box model.
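To make the surrogate idea concrete, here is a minimal sketch in Python with scikit-learn (both assumptions, since the article names no tools) that trains a small decision tree to mimic a more opaque random forest and measures how faithfully it does so. Every model and dataset choice here is illustrative.

```python
# Minimal global-surrogate sketch (illustrative only): a shallow decision tree
# is trained to imitate the predictions of a more opaque "black box" model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; any tabular dataset would do.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The "black box" whose behavior we want to summarize.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate learns from the black box's *outputs*, not the true labels,
# so it approximates the model's behavior rather than the underlying task.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# "Fidelity" = how often the simple model agrees with the complex one;
# a low value means the surrogate's explanation cannot be trusted.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"Surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(10)]))
```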

But because deep learning models operate in fundamentally complex and nonlinear ways, developing an effective global explanation model is particularly challenging. This has led researchers to focus much of their recent attention on local explanation methods instead, explains Yilun Zhou, a graduate student in the Interactive Robotics Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL) who studies models, algorithms, and evaluations in interpretable machine learning.

The most popular types of local explanation methods fall into three broad categories.

The first, and most common, type of explanation method is known as feature attribution. Feature attribution methods show which features were most important when the model made a particular decision.

Features are the input variables fed to a machine learning model and used in its prediction. For tabular data, features are drawn from the columns of a dataset (and transformed using a variety of techniques so the model can process the raw data). For image-processing tasks, on the other hand, every pixel in an image is a feature. If a model predicts that an X-ray image shows cancer, for example, a feature attribution method would highlight the pixels in that specific X-ray that were most important to the model’s prediction.

In essence, feature attribution methods show what the model pays the most attention to when it makes a prediction.

“Using this feature attribution explanation, you can check whether a spurious correlation is a concern. For instance, it will show whether the pixels in a watermark are highlighted or whether the pixels in an actual tumor are highlighted,” says Zhou.
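As a rough illustration of the flavor of feature attribution (not the specific techniques the researchers study), the sketch below perturbs one feature at a time of a single tabular input and records how much the predicted probability moves; for images, the same idea would occlude patches of pixels. Python, scikit-learn, and the particular dataset are assumptions, and dedicated libraries such as SHAP or Captum implement far more principled attribution methods.

```python
# Occlusion-style feature attribution sketch (illustrative only): replace each
# feature of one input with its dataset average and see how much the
# predicted probability changes.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer()
X, y = data.data, data.target
model = GradientBoostingClassifier(random_state=0).fit(X, y)

x = X[0:1]                              # the single prediction we want to explain
base = model.predict_proba(x)[0, 1]     # probability of the positive class
means = X.mean(axis=0)

attributions = []
for j in range(X.shape[1]):
    x_masked = x.copy()
    x_masked[0, j] = means[j]           # "occlude" feature j
    attributions.append(base - model.predict_proba(x_masked)[0, 1])

# The features whose occlusion changes the prediction most were most
# important to this particular decision.
top = np.argsort(np.abs(attributions))[::-1][:5]
for j in top:
    print(f"{data.feature_names[j]:>25s}: {attributions[j]:+.3f}")
```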

A second type of explanation method is known as a counterfactual explanation. Given an input and a model’s prediction, these methods show how to modify that input so it falls into another class. For example, if a machine learning model predicts that a borrower will be denied a loan, the counterfactual explanation shows what factors would need to change for the application to be accepted. Perhaps the borrower’s credit score or income, both features used in the model’s prediction, would need to be higher for approval.

“The good thing about this explanation method is that it tells you exactly how you need to change the input to reverse the decision, which could have practical use. For someone whose home loan application was denied, this explanation would tell them what they need to do to achieve their desired outcome,” he says.
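To mirror the loan example, here is a toy counterfactual search under invented assumptions: a hypothetical approval model trained on synthetic credit-score and income data, and a greedy loop that nudges one feature of a denied application until the decision flips. Real counterfactual methods (libraries such as DiCE, for instance) additionally look for changes that are minimal and plausible.

```python
# Toy counterfactual search (all data, features, and thresholds are invented).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical applicants: [credit_score, income_in_thousands] -> approved?
X = np.column_stack([rng.normal(650, 50, 1000), rng.normal(60, 15, 1000)])
y = ((X[:, 0] > 660) & (X[:, 1] > 55)).astype(int)
model = LogisticRegression(max_iter=2000).fit(X, y)

applicant = np.array([[620.0, 50.0]])
print("Original decision:", model.predict(applicant)[0])   # expected: 0 (denied)

def counterfactual(x, feature, step, max_steps=200):
    """Greedily increase one feature until the model's decision flips, if it ever does."""
    cf = x.copy()
    for _ in range(max_steps):
        cf[0, feature] += step
        if model.predict(cf)[0] == 1:
            return cf
    return None

cf = counterfactual(applicant, feature=0, step=5.0)
if cf is not None:
    print(f"Raising the credit score from {applicant[0, 0]:.0f} to "
          f"{cf[0, 0]:.0f} would flip the decision to approved.")
```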

The third category of explanation methods is known as sample importance explanations. Unlike the others, this method requires access to the data that was used to train the model.

A sample importance explanation shows which training sample the model relied on most when it made a particular prediction; typically, this is the sample most similar to the input data. This type of explanation is especially useful when you are confronted with a seemingly irrational prediction. There may have been a data-entry error that corrupted a particular sample used to train the model. With this knowledge, one could fix that sample and retrain the model to improve its accuracy.
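The sketch below illustrates the general idea with a crude proxy, ranking training examples by how often they fall into the same random-forest leaves as a query input. This nearest-neighbor-style shortcut is an assumption made for illustration; genuine sample importance methods, such as influence functions or TracIn, trace how individual training points actually shaped the trained model.

```python
# Crude sample-importance proxy (illustrative only): rank training examples
# by how often they land in the same forest leaves as the query input.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, y_train, query = X[:-1], y[:-1], X[-1:]   # hold out one sample as the query

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# model.apply returns, for every sample, the leaf index it reaches in each tree.
train_leaves = model.apply(X_train)   # shape: (n_train, n_trees)
query_leaves = model.apply(query)     # shape: (1, n_trees)
similarity = (train_leaves == query_leaves).mean(axis=1)

# Training samples sharing the most leaves with the query are, under this
# proxy, the ones the model "relied on" most for its prediction.
most_influential = np.argsort(similarity)[::-1][:3]
print("Most influential training indices:", most_influential)
print("Leaf-sharing similarity:", similarity[most_influential].round(2))
```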

How are explanatory methods used?

One motivation for developing these explanations is quality assurance and model debugging. With a better understanding of how features affect a model’s decisions, for example, you can recognize that a model is working incorrectly and either take steps to fix the problem or throw the model out and start over.

Another, more recent area of research is exploring the use of machine learning models to discover scientific patterns that humans have not uncovered before. For example, a cancer-diagnosis model that outperforms clinicians may be flawed, or it may actually be picking up on subtle patterns in an X-ray image that represent an early pathological pathway to cancer and that were either unknown to human doctors or thought to be irrelevant.

However, it is still very early in this area of research.

Words of caution

While explanatory methods can sometimes be useful for machine learning practitioners who are trying to catch bugs in their models or understand the inner workings of a system, end users should proceed with caution when trying to use them in practice, says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group at CSAIL.

As machine learning has been adopted in many industries, from healthcare to education, explanatory methods are used to help decision makers better understand a model’s predictions so they know when to trust the model and use its guidance in practice. But Ghassemi cautions against using these methods in this way.

“We found that explanations make people, both experts and novices, overconfident about the capabilities or advice of a particular recommender system. I think it is really important for people not to switch off that internal circuitry asking, ‘let me question the advice that I am given,’” she says.

Scientists also know from other recent work that explanations make people overconfident, she adds, citing recent studies by Microsoft researchers.

Far from being a silver bullet, explanation methods have their share of problems. For one, Ghassemi’s recent research has shown that explanation methods can perpetuate bias and lead to worse outcomes for people from disadvantaged groups.

Another pitfall of explanation methods is that it is often impossible to tell whether an explanation is correct in the first place. One would need to compare the explanations with the actual model, but since the user does not know how the model works, this is circular logic, Zhou says.

He and other researchers are working to refine the explanation methods so they are more faithful to the actual model’s predictions, but Zhou cautions that even the best explanation should be taken with a grain of salt.

“Furthermore, people often perceive these models to be human-like decision makers, and we are prone to overgeneralization. We have to calm people down and hold them back to really make sure that the generalized understanding of the model they build from these local explanations is balanced,” he adds.

Zhou’s latest research attempts to do just that.

What are the prospects for machine learning explanation methods?

Rather than focusing on providing explanations, Ghassemi argues that more effort needs to be made by the research community to study how information is presented to decision makers so that they understand it, and that more regulation needs to be put in place to ensure that the machine learning models are used responsibly in practice. Better methods of explanation alone are not the answer.

“I was excited to see that there’s a lot more recognition, even in the industry, that we can’t just take this information and create a nice dashboard and assume people will perform better with it. You have to have measurable improvements in action, and I hope this leads to real guidelines for improving how we display information in these deeply technical fields like medicine,” she says.

And in addition to new work focused on improving explanations, Zhou expects to see more research on explanation methods for specific use cases, such as model debugging, scientific discovery, fairness auditing, and safety assurance. By identifying the detailed characteristics of explanation methods and the requirements of different use cases, researchers could establish a theory that matches explanations to specific scenarios, which could help overcome some of the pitfalls that come from using them in real-world settings.


This story was republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular website covering MIT research, innovation and teaching news.
