Finale Doshi-Velez is the Herchel Smith Professor of Computer Science. This interview has been edited for length and clarity.
FM: Your undergrad was in aerospace engineering and physics at MIT. I am curious how you transitioned from that background to AI and decision making.
FDV: I think it’s easy to look back and find a common theme of what excites you. As an undergrad, the classes that really excited me were about probability and machine learning and AI. There were already classes that were related to what I’m doing now; it’s just different sets of applications, whether it was more focused on aerospace or more focused on robotics. At one point, I wanted to work for NASA and do information fusion from satellites. If you think about the math, there’s a lot of similarity between, like, “How do you point satellites and fuse information between them?” and “How do you collect sensory data from patients and fuse those together?” I feel like there’s been a throughline about probabilistic reasoning and probabilistic methods and decision making under uncertainty.
FM: Your niche in AI is its application to health, making it useful for physicians in interpreting different disease symptoms and phenotypes. How’d you carve out that niche?
FDV: My Ph.D. was not applied to any particular problem. I started out as a roboticist, and then really shifted into probabilistic methods and reinforcement learning.
And then I reached a point toward the end of my Ph.D. where I was like, I love what I’m doing. But if I’m retiring at some later point, and I’m looking back on my life, would I be happy if I just did what I am currently doing? I was like, “I need something that touches real life in a more concrete way.” At the time, I talked to 40 different labs across areas that I cared about: health, education, energy and climate, developing world and social good, and these sorts of things. And it just happened to be that a health-related postdoc just fit in terms of methods and stuff. So that’s the area that I’ve been working on for the last decade or so.
FM: The field must have changed so much since you started. Now, AI and large language models are embedded in our lives through ChatGPT and generative AI. Did you expect that sort of change? And how has that shaped your work over time?
FDV: I think it’s important that we pay close attention to those models, because they have capabilities that are just so close to human life — but not. There’s a lot that can go wrong in that space. I also worry about all these “boring AIs,” like the ones that are making your credit decisions, the ones making different types of criminal justice decisions, determining housing benefits, deciding whether something is fraud or not. Those are spaces that are becoming increasingly automated with these “boring” AI tools. There’s such an impact that can have, for better or worse, on humanity if you don’t do those right. Done right, these things can be really helpful. They can triage easy situations and help get consistency. But done poorly, they can lock out marginalized groups from access.
FM: Moving forward, what sorts of regulations would you want to see to address these limitations in these AI models that might not be perfect, but can be used as the basis for critical decisions, especially for diverse populations?
FDV: A lot of industries already have rules about what’s okay and what’s not okay. The big challenge currently is, how do you link either AI-specific checklists and regulations — which are pretty good — or sector-specific regulations to particular use cases? Because the regulation will still say something like “It has to be fair or unbiased,” or “Data must be representative of the population it is going to be applied to.”
How do you measure that? How do you know that something is “fair” and “unbiased” or that the data are “representative”? I think that’s really the big gap currently.
FM: You direct the Data to Actionable Knowledge group in the Computer Science department, and one of the goals I’ve read is to untangle these complex, heterogeneous datasets using AI models. Could you give an example of what that looks like in your group?
FDV: I’ll give you one concrete example that I’m really excited about at the moment. When you look at health data, one common way of analyzing it is that you get a batch of data from a hospital.
We first focused on finding places where doctors disagree. So, try to identify patients who are in very similar conditions but ended up with different treatments. That simplifies things in two ways. First of all, data are super messy, but if you really focus on identifying those areas of disagreement, you end up with relatively few. With the ICU critical care database, once we did this analysis, we found that there were only on the order of 15 types of situations in which there was disagreement.
That was great to present to the doctors and be like, “Here are places where you and your colleagues don’t do the same thing.” First, that was very interpretable. It was something that they could engage with. And secondly, now we can do some optimization and say, “What happened to the people who took branch one versus the people who took branch two?”
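In code, a toy version of that disagreement analysis might look something like the sketch below — hypothetical data and features, not the lab’s actual pipeline — grouping patients by similarity and flagging the neighborhoods where the recorded treatments differ.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy sketch (hypothetical data, not the DtAK pipeline): flag "places where
# doctors disagree" -- patients whose states look similar but who received
# different treatments.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                        # patient state features (toy)
# Treatment roughly follows one feature, with some noise, so disagreement
# concentrates near the decision boundary.
treatment = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X)                            # each patient's 6 nearest neighbors

# A patient sits in a disagreement region if their neighbors got a mix of treatments.
disagreement = np.array([len(set(treatment[i])) > 1 for i in idx])
print(f"{disagreement.mean():.0%} of patients lie in regions of disagreement")
```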
FM: A common issue that comes up with AI models is the “black box” problem — not necessarily knowing what’s happening inside. How important is it to ensure we know exactly what the model is doing?
FDV: This raises the question of, “What should the system even be doing?” One thing that I’ve realized over the years is that, when I started, a lot of the effort was around, “Which treatment for which patient?” But as I’ve done more work, I’m realizing there’s other things that AI can do.
Rather than presenting one option, you present the expected pros and the expected cons of three options or five options. With the AI tools, if you can make one prediction, you can make three predictions. It’s all the same math. Then someone can look at that. You’re in a very different brain state when you’re thinking about choosing between options versus “I’m telling you what to do.” Because if you get told what to do, you might just do it because the AI seems very authoritative. You might also just do it because of liability, because if you don’t do it, you’re going to have to explain to your employer if something goes wrong. But if you get a couple of options and some pros and cons, then it invites you to engage with the system.
Human doctors talk to each other about their decisions and their rationale. If we can figure out what information we need to present such that the other person can feel confident in trusting us or not trusting us, I feel like you could do the same thing with AI.
FM: I’m curious how you communicate that with doctors. How do you communicate these best practices to the people who will be using them, especially since they’re busy?
FDV: I don’t think you can just tell people to behave. You need to design the system in a way that invites engagement. For example, we built a prototype of a system that allows you to toggle conditions on and off. You start out with, populated from the health record, “Here are the conditions I think you have.” But because it’s toggleable, you can change it.
So let’s suppose you have some sleep drama, but it’s never been diagnosed as insomnia. You can toggle on the insomnia because you don’t want a medication that’s going to mess with your sleep.
If you make the system interactive, then people are more likely to engage with it.
FM: There’s this concept you examine in your work called “counterfactual explanations” — how a model’s output would change if we change the inputs even slightly. If these recommendations are so dependent on these inputs and they would be so sensitive to change, how do we make sure that whatever we’re putting into the system is as accurate as possible, especially with messy data like patient records?
FDV: When it comes to noise in the data, there’s a number of regularization methods and probabilistic methods that can handle that.
One clean, mathematical way is that you assume there’s a true value, and then you assume that the measurement is some noisy version of the true value, and then you assume that the true value affects some outcome. I think that element is relatively easy to handle.
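As a worked toy example — my own sketch of the setup she describes, with made-up numbers — assume a latent true value, a noisy measurement of it, and an outcome driven by the true value:

```python
import numpy as np

# Toy sketch of the noise model described above (hypothetical numbers):
# z is the true value, x is a noisy measurement of z, and y is an outcome
# that depends on the true value.
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(0.0, 1.0, size=n)             # true, unobserved value
x = z + rng.normal(0.0, 0.5, size=n)         # noisy measurement of z
y = 2.0 * z + rng.normal(0.0, 0.3, size=n)   # outcome driven by z (true effect = 2.0)

# Naively regressing y on the noisy x attenuates the effect; probabilistic
# methods that model z explicitly can correct for this kind of noise.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"slope on noisy x: {slope:.2f}  (true effect of z: 2.00)")
```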
The things that are harder to handle are systematic elements. For example, one hospital might measure certain things for everyone who comes into the emergency room. Another hospital might measure only for people that they believe have a certain condition or are at risk for a certain thing. Now, suddenly the missingness of the measurement becomes informative. If you didn’t measure something about someone, it means that the doctor wasn’t concerned. Those sorts of things are actually much harder to deal with. Many times there’s meaning in the data that’s not explicit and needs a human to help interpret.
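A small illustration of that informative-missingness point, with hypothetical data: one common mitigation is to keep an explicit “was this measured?” flag alongside any imputed value, rather than letting the imputation hide the signal.

```python
import numpy as np
import pandas as pd

# Hypothetical example: a lab value measured only when clinicians are worried,
# so the fact of measurement is itself informative.
df = pd.DataFrame({"lactate": [2.1, np.nan, 4.5, np.nan, 3.0]})

df["lactate_measured"] = df["lactate"].notna().astype(int)            # missingness flag
df["lactate_imputed"] = df["lactate"].fillna(df["lactate"].median())  # simple fill-in

print(df)
```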
FM: That reminds me of another paper that your group put out on the importance of context in AI models, arguing that we cannot use a generalizable model for everything because context is so important. How do we think about the implementation of AI models in a way that still incorporates context into it?
FDV: I think we just really have to get comfortable with the fact that AI systems need to be constantly monitored. It’s not like a device that you just buy once and then do an annual inspection. You need to monitor the system, and you also need to keep tuning the system to the new data.
There’s great work on trying to figure out how to fine-tune or adapt models. What I see as a major bottleneck is just that hospitals are not designed for this type of object.
FM: How can hospitals prepare for the increased use of AI?
FDV: Hospitals are really complicated systems. They’re also businesses, and they need to not only worry about their bottom line, but also about compliance and implications of noncompliance because that can be super costly, both in terms of fines and reputational loss. I think that there’s just a culture that does not encourage experimentation and change in a safe way in the U.S.
The U.S. regulatory environment and payment structure just makes it really hard to make these changes.
FM: On the other hand, for the people making these models and wanting to implement them in clinics, do you feel like they’re making enough progress and are aware of these limitations in AI?
FDV: I’m still working on these retrospective data analyses, but I’m actually pivoting the lab also into mobile health and wearables. It’s more direct-to-person because there, you don’t have all of these administrative structures that are going to keep you from getting to a user. I’m trying to find that balance of having some stuff that’s very clinical, analyzing angiograms and stuff like that, but also having stuff that is more in the wellness space, personal development space.
There’s an interesting uncertainty and reinforcement learning problem to try to help you figure out what works best for your own nerves or whatever. We’re really in the early days, but I am trying to diversify my portfolio of research to include some of those things that are more direct-to-user.
FM: What are some examples of these wearables? Is it like a healthwatch or something?
FDV: We have a couple of Garmins that we got donated to us, and we’re working with LabFront, which helps do a lot of the low-level data management. And we can get pretty good data off of those devices. We’re in really early stages of just doing experiments, but we found interesting things.
For example, people who self-report as introverts versus extroverts have very different physiologic signals during social interactions.
I think it’s just really fascinating that you see some of these differences.
FM: I noticed that your logo for Data to Actionable Knowledge is a unicorn with rainbows, and you call yourself a unicorn aficionado. What’s so special about unicorns?
FDV: To be clear, a grad student in my lab wrote that I’m a unicorn aficionado, not me. I feel like it’s just taken on a life of its own.
FM: How did it become a mascot?
FDV: My email inbox is a disaster. So I had a note on my website: “If you read this, put ‘purple unicorn’ in the subject line.” That was just picked really off the cuff, like I asked the kids, “Name a random word.” There wasn’t a ton of intentionality behind it. Being a unicorn aficionado might not have been the reason why the unicorn became a logo. But now it’s a logo, and I’m fully for it.
FM: I also saw that you were writing your first novel, “Lines in the Sand,” and you minored in creative writing in college. Could you talk a little bit about the process of drafting a novel?
FDV: I always really enjoyed writing. Even as a kid, I would put books together.
This is the third or fourth novel — the first one that I think is publishable. I play and Game Master role-playing games, and I find that that gives me a lot of ideas, because you spend so much time being in a character’s head, planning things out and seeing stuff happen. So this particular one initially did get its inspiration from that sort of thing.
FM: You also make music videos of your work on your YouTube channel Bayesonce. And I just noticed you have an overall very fun and very playful approach to science. Why do you feel like these creative forms of expressing science are important?
FDV: I’ve been a Weird Al fan for forever, and I just like the art of parody.
There’s people who like to do their math, and that’s great. But for me, I’m much broader. I like reading literature and talking about it; I enjoy listening to different types of music or going to theater performances. There’s so many aspects of myself, and I feel like it’s fun to be able to bring those together. Our society encourages us to fit into buckets. You’re a STEM person, or you’re a theater kid, or you’re an analyst, right? I think most of us probably don’t fit neatly into just one bucket.
When it comes to the music videos, I’ve always been a huge fan, and we had a grad student visit day, and the lab actually was like, “Why don’t we do a music video?” Someone was like, “Oh, I can do the filming,” and “I can do the dance.” And I said I would write the lyrics.
— Associate Magazine Editor John Lin can be reached at john.lin@thecrimson.com.