It often feels like machine learning experts are running around with a hammer, looking at everything as a potential nail - they have a system that does cool things and is fun to work on, and they go in search of things to use it for. But what if we flip that around and start by working with people in various fields - education, health, or economics, for example - to clearly define societal problems, and then design algorithms providing useful steps to solve them?
Rediet Abebe, a researcher and professor of computer science at UC Berkeley, spends a lot of time thinking about how machine learning functions in the real world, and working to make the results of machine learning processes more actionable and more equitable.
Abebe joins EFF's Cindy Cohn and Danny O’Brien to discuss how we redefine the machine learning pipeline - from creating a more diverse pool of computer scientists to rethinking how we apply this tech for the betterment of society’s most marginalized and vulnerable - to make real, positive change in people’s lives.
This episode is also available on the Internet Archive.
In this episode you’ll learn about:
- The historical problems with the official U.S. poverty measurement
- How machine learning can (and can’t) lead to more just verdicts in our criminal courts
- How equitable data sharing practices could help nations and cultures around the world
- Reconsidering machine learning’s variables to maximize for goals other than commercial profit
Rediet Abebe is an Assistant Professor of Computer Science at the University of California, Berkeley, a Junior Fellow at the Harvard Society of Fellows, and a 2022 Andrew Carnegie Fellow who conducts research in the fields of algorithms and artificial intelligence with a focus on inequality and distributive justice concerns. She serves on the Executive Committee for the ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization and was a Program Co-Chair for the inaugural conference. She also co-founded and co-organized the related MD4SG (Mechanism Design for Social Good) research initiative as well as the non-profit organization Black in AI, where she sits on the Board of Directors and co-leads the Academic Program. She earned a Bachelor’s degree in Mathematics and a Master’s degree in Computer Science from Harvard University, a Master of Advanced Studies in Mathematics from the University of Cambridge, and a Ph.D. in Computer Science from Cornell University.
Music for How to Fix the Internet was created for us by Reed Mathis and Nat Keefe of BeatMower.
This podcast is licensed Creative Commons Attribution 4.0 International, and includes the following music licensed Creative Commons Attribution 3.0 Unported by their creators:
- Probably Shouldn't by J.Lang
- Klaus by Skill_Borrower
- commonGround by airtone
- Smokey Eyes by Stefan Kartenberg
- Chrome Cactus by Martijn de Boer (NiGiD)
Machine Learning and AI:
- Artificial Intelligence & Machine Learning
- Bias in a Feedback Loop: Fueling Algorithmic Injustice, CCCB
- Machine Bias, ProPublica
- Math Can’t Solve Everything: Questions We Need To Be Asking Before Deciding an Algorithm is the Answer
- Victories and Disappointments in AI and Algorithmic Decision-Making: 2019 Year in Review
Transparency and Inclusivity in AI:
- Computer says no: why making AIs fair, accountable and transparent is crucial, The Guardian
- Help EFF Track the Progress of AI and Machine Learning
Probabilistic Genome Cases:
- How Your DNA-or Someone Else’s-Can Send You to Jail
- EFF and ACLU Tell Federal Court that Forensic Software Source Code Must Be Disclosed
- Shining a Light on Black Box Technology Used to Send People to Jail: 2021 Year in Review
Personal Ownership of Your Data:
- Privacy Without Monopoly: Data Protection and Interoperability
- New Attack on the Illinois Biometric Privacy Act
- How to Improve the California Consumer Privacy Act of 2018
- EFF’s Recommendations for Consumer Data Privacy Laws
Rediet: It's about how we build these tools, it's about how we conceptualize our role within society and within these different types of spaces, like the criminal legal space, and saying, what do we think is the thing that we would like to see? What should our society look like? What would a just system look like? What would a more equitable community look like, in anything, in academic spaces and research spaces and just broadly, and say, what will we need to do to get there? And sometimes what we need to do is to step back and say, I think that I'm not going to be helpful, and so I'm going to not try to center myself. Sometimes it's building things that will challenge things that we've already built, or that will challenge other communities. And other times it's just being good community members and just showing up. It could literally save lives.
Cindy: That's our guest Rediet Abebe. Rediet is a researcher and professor of computer science at UC Berkeley, where she's working on how to make the results of machine learning processes more actionable and more equitable.
Danny: She's also the co-founder of Black in AI and MD4SG, that's Mechanism Design for Social Good. She's going to shine a light on the way forward.
Cindy: I'm Cindy Cohn, EFF's Executive Director.
Danny: And I'm Danny O'Brien. Welcome to How to Fix the Internet. A podcast of the Electronic Frontier Foundation.
Cindy: Welcome Rediet.
Rediet: Thank you for having me.
Cindy: Rediet, you've spent a lot of time thinking about how machine learning works in the real world and how we can make sure that we're using it for good purposes and not bad. Could you give us some examples of the good uses of machine learning that you've identified, and then we can unpack what goes into that?
Rediet: Decisions that you make, as let's say, a researcher in how you understand social problems, and how you choose to formulate them as machine learning problems or algorithm problems, and how you then choose to work on it, and whatever outcome you have, how you choose to deploy that.
And so, the same set of techniques could be used to do a lot of good or a lot of harm. A lot of my research these days is now really focused on taking the machine learning pipeline that we traditionally have within this community and expanding it, also taking responsibility for stages of that pipeline that are considered often outside of our scope of responsibility, which include translating social problems into the types of research problems that we would typically work with, machine learning problems. But then also taking the output, and thinking critically about how that gets translated into practice, into some intervention, what impact it might have on communities. And so, a lot of the work that I'm doing is critically examining that from both a technical perspective, but just also from a practice perspective, and saying that we should be expanding the pipeline.
Cindy: We're fixing the internet now, we can certainly talk about the problems, but I'd like to start with the places where things go right. What would it look like if we did it right, or where have we done it right?
Rediet: We have this large scale project that involves examining the use of machine learning in education. So, rather than diving into building tools, which of course, we're also working on, we also wanted to do a survey, a holistic survey and analysis of how machine learning is used in education. And we identified education experts, and we sat with them and we talked through the papers with them. We were like, "This paper's goal is to, let's say, predict student dropout. And the goal there would be not just to predict it, but also to hopefully do something about it so that students are not dropping out of schools."
And so, that's an admirable goal, that's a goal that all of us can get behind, but that problem has to be concretely formulated into some machine learning problem, and then you have to find the right data set and all that, and then now you have this prediction that you're doing around which students are going to drop out, and hopefully you get to translate that into some real world intervention.
So, just taking this student dropout risk example, we had this interview with someone at a large state university that also struggles a lot with student dropout. One of the things that they mentioned was, "Okay, this paper is predicting which students are likely to drop out. What can we do with that? You tell me that some student is at high risk of dropping out, that's terrible. But in some sense, the deed is done." At that point you tell a student, "Hey, you're at high risk of dropping out." That's not necessarily useful to the student.
And you tell the schools the student's dropping out, but that doesn't necessarily tell them what they can do about it. And so, what he said was something subtle, but I really appreciated it. He said, "Instead of predicting which students are going to drop out, why don't you predict, for instance, which students are likely to miss class? They have already missed some number of classes, and maybe they're about to miss a third class or something." We know that if students are missing several classes, that's a sign that they might be at risk of dropping out. But missing classes is a more actionable thing. We can tell students, "Hey, I've noticed that you've been missing a lot of classes. I'm worried that you're going to miss more classes. What can we do here to support you to attend class?"
The point here is more subtle. It's saying you have your target variable, and that target variable could be dropout, or it could be something actionable, like missing classes. The second is something that they can more easily do something about; with the former, it's not as clear.
Danny: The giant machine comes up with a bleep and says, this person is more likely to drop out, but it doesn't give you strong clues about how it came to that decision. And in many ways, I mean, this is an obvious example, I guess, but if kids are missing school, and that's something that the system has learned is an early indicator of that, it's better to know what it's learned than to have it as this opaque machine that just claims that these people are doomed.
Rediet: If I tell you, for instance, that students' race is a partial explainer for dropout, there's nothing you can do about that. That's a fixed thing. Whereas with students missing classes being an issue, maybe there is something you can do about it. It showed me in some sense that if we were working on these problems in a more bottom up way, you go to a university that has a lot of dropout issues, you talk to educators there and other staff supporting students, and you get a sense of what it is that they notice in the students. In some sense, they can just give you the problem, or you can collaboratively, in a participatory way, formulate the problem, rather than us, as machine learning people, deciding what we want to solve and then, after we've solved it, hoping that's what they wanted us to solve.
Danny: To what degree do you think that this is a problem that's located in machine learning and the nature of, I know, computer science or this part of computer science, and how much is it just related to the societal fact that people doing research or building these systems are often quite distant and quite uninvolved with people
Rediet: A lot of research communities have this gap. For instance, you can take health. This is one space where we've made a lot of advances in improving health outcomes for people, but not for everyone. So, we've made very few advances for health issues that impact black people, trans individuals, you name it, like marginalized communities. And so, this gap definitely exists in other spaces.
But there is also something special to the machine learning and AI space as well. It is a powerful set of technologies that are being built. There are a lot of resources that we have in machine learning and AI that are in some ways really unprecedented. There's a lot of organizations that are invested in it. And the other thing is also, the field is one of the least diverse fields out there.
I mean, that's just the fact. You can look at graduation patterns in undergraduate degrees, in graduate degrees. You can look at the faculty composition. We have, I think in computer science, in the U.S., among the PhD granting institutions, there's something like over 5,000 faculty, and of those fewer than a hundred of them are black, and something like fewer than 20 of them are black women. That's just one example. You look at Native American faculty, it's like in the single digits.
And so, it is a very homogenous community. As it stands right now, things are slightly improving. There's a really, really long way to go. And so, as long as our field, our research field continues to not be representative of the society that we're trying to impact, you're going to continue to have these gaps and these gaps are going to show up in our papers. And sometimes they show up in more subtle ways and other times they show up in not so subtle ways. And so, I think these issues around thinking about the community culture and who gets to be part of the community is really tightly integrated and intertwined with our research itself.
Cindy: I love this example, I’m wondering is there another one? Are there other places where you see how we could do it better in ways that are actually either happening or within the pipeline?
Rediet: Yes. Right now we're actually focused a lot on understanding policy, public policy, and where things might be going wrong. So, just to give you one concrete example right now, a lot of decisions around which services are going to be allocated to what individuals are made based off of measurements like the official poverty measurement in the U.S. And this official poverty measurement was something that was originally proposed by an economist, Mollie Orshansky, back in the sixties. So, this was over 50 years ago. It was really meant to be almost a proof of concept, a placeholder, not meant to be something that generally applied. And she even explicitly says, we should not be generally applying it. And we're using it almost unchanged, outside of maybe inflation 50 years later.
Rediet: As time has gone on things are getting harder and harder for a lot of communities. And so, there are many people who are by any reasonable definition, struggling, but this official poverty measurement might not necessarily pick up. So, you end up in situations where someone is really in need of services and you use this very coarse, extremely outdated measurement, and you can't necessarily detect it in that way. One of the things that we are looking at is that, there's a lot of evidence that predicting bad life outcomes, like let's say poverty, is hard. And we're seeing that, okay, part of the reason might be that the way that we measure poverty itself is very ill defined. It's extremely outdated.
If you have an official poverty measurement that is so low that it can't even reliably detect a lot of people who are struggling, then does it actually matter what you're predicting here? And so, in some ways we're using these machine learning techniques and these kinds of results, like predicting life outcomes is hard, to really challenge public policy, and to say, hey, the way that we're measuring this thing is actually not good, we think.
Danny: In situations like that, how do you know that you are doing better? Do you anticipate a way that people could learn to change that and challenge that in the way you are challenging these earlier measurements?
Rediet: We all are feeding in value into these systems, there's no neutral value. And so, in some sense what we're doing here is to say, the knee jerk machine learning researcher reaction here may have been, you get some data, you're excited about the data, you're like, what can I predict here? And one of the things that you can predict is poverty. You are like, great. I'm going to try to predict poverty. And of course that assumes a lot of things. Like as I mentioned, that the way we're measuring poverty is accurate or reasonable or useful. And that's a huge, huge assumption that you're making there. And so, what we did here is to challenge that in many ways. So, the first thing is rather than just taking things as given, we were like, okay, why is this a measure of poverty? Let's go back to the literature in the public policy and sociology and economics space and understand what conversations are happening there.
And there, you notice that there is actually very robust conversation happening around how we should be measuring poverty. There's alternatives that have been proposed, like the Supplemental Poverty Measure, which captures a whole host of other things, like material hardship that you might be facing: are you able to pay your utilities? Are you able to pay your rent? That kind of stuff. This is not something that you can necessarily always capture using people's income.
And so, these are conversations that if you stayed within the more traditional machine learning space, you might not necessarily be exposed to unless you happen to be dealing with it as a person yourself. And so, we're challenging that by saying, hey, listen, we don't have to take things as given. We can go back and see what the debates are that are happening out there, out in other fields, out in communities, out in policy spaces, and see how we can potentially contribute to that.
Cindy: It does feel sometimes like people who love machine learning are running around with this hammer, trying to make everything into a nail. "I've got this cool system, it's great, it can do stuff that feels like magic. Plus it's fun for me to work on. So, let me start looking around for things that I can use it for." And what I hear you saying is we really need to flip that around. We need to start with the people who are on the ground, what their problems are, and then make sure that the things that we're doing are actually giving them actionable steps.
Cindy: In addition to making sure that the things we're using machine learning and algorithmic training techniques on are the good things, I think that there's also a strain of concern that there are things that we shouldn't be using these systems on at all. I know you've thought about that as well.
Rediet: We have this large scale project we're working on, focused on statistical software used in the criminal legal system. It's used at pretrial, investigation, prosecution, post-trial. And we've been especially interested in machine learning and statistical software used as evidence. So, these are cases where you could have something like a Probabilistic Genotyping Software that could be used on samples that are found from crime scenes and used to convict people. We really need to have a serious conversation on the machine learning side, about what tools should even be used as evidence, because that's a really high bar.
Danny: How does that happen? What's the process by which a machine learning approach or a system is put in place, and that leads to the wrongful conviction of a person?
Rediet: It could happen in several ways. So, one is just, even before you're brought to trial, you're not brought to trial just randomly. You don't pick up a random person off the street and you're like, "You're on trial today." There's a lot of things that happen. And a lot of statistical and machine learning tools that are used up until that point to "identify the person", I'm putting that in quotations, to make all sorts of decisions. There's also just the fact that in the U.S., we have incredible inequality and discrimination that surfaces in many different ways, including in what we criminalize. So, I think that's also important context to keep in mind. But what we're focused on in this project is specifically software used as evidence.
So, this is someone who's on trial for something, and there's just a set of evidence that I now get to present as a prosecution team and say this is why I think this person may have committed this crime, and that's used in discussions, and to make a decision around whether you think that person committed the crime or not. And sometimes that can be a person. I could say, "Oh, I know I totally saw Danny walking down the street at this hour" or something, and you get to question me, cross examine me. And say, "Is your eyesight good? Have you updated your prescription?" Any number of things.
And so, what happens is that I, as a prosecutor, get to just use some software that has not been properly validated, that is now spitting out some number, and we have to use that as evidence. And the defense might not necessarily have the ability to cross examine that. And I get to bring witnesses, like the CEO of the organization, to say, "Oh, the software is actually awesome. Let me tell you a bunch of things."
Danny: It has the force of some scientific fact. So, someone will stand up and say, this shows that there's a one in a million chance that this isn't the defendant, and people believe that, but they can't challenge it. And the defense doesn't have the technical expertise to say, "Yes, but if his sister also touched it, wouldn't that change the thing?" I see.
Danny: “How to Fix the Internet” is supported by The Alfred P. Sloan Foundation’s Program in Public Understanding of Science. Enriching people’s lives through a keener appreciation of our increasingly technological world and portraying the complex humanity of scientists, engineers, and mathematicians.
Cindy: EFF has done a bunch of these probabilistic genome cases and has been working with people on this. And so, there's a threshold level problem, that often the company will claim a trade secret in how the technology works, which means that the defense doesn't have access to it at all. The second thing is that they will often point to papers about how great their system is, that were written just by them. We've managed in a couple of cases to get access to the actual systems and have found horrific problems in them, that they're not actually very good, and that there's a finger on the scale on things. And sometimes it's just that it's poorly designed, not that anybody has evil intent. There's supposed to be a standard in the law called the Daubert Standard that makes sure that technologies that are introduced in cases have been vetted. And honestly, it's not being followed at the level of rigor that is needed right now.
Rediet: This is precisely what we're working on. And this is a joint project with a lot of amazing people, including Angela Jin and John Miller who are graduate students here at Berkeley, and Rebecca Wexler, who was an intern at EFF, and is a close collaborator of mine, and Ludwig Schmidt, and Moritz Hardt also. So, what we're working on here is precisely what you mentioned. You have this Daubert Standard that would be followed in a lot of different cases, but in the use of statistical software as evidence, I think right now, it's just not happening. And in the same way that, if someone was a witness and goes up to the stand and says a bunch of things, and you're able to cross examine them, you should also have that ability with the statistical software. And currently, we don't actually have a framework for doing that.
The legal system in the U.S. is meant to have these two sides, one of which is meant to prove innocence and the other is meant to prove guilt. And the idea is that, in a situation where these two sides are equally resourced and all that stuff, then maybe you're able to go about this truth seeking process in a more reliable way. But that's not what we have right now. There's a massive, massive imbalance. You have defense counsel who don't have time, who don't have resources, who don't have the energy to be able to challenge whatever the prosecution puts on the table. And then what next? This is completely unjust. And you end up having a lot of people wrongfully convicted for crimes they didn't commit, and we're not doing anything about it.
Cindy: I think it's tremendously important. And I think it's also helpful to judges. I think judges often feel like-
Cindy: ... they are skeptical, or at least they want to be sure that they're relying on good evidence, but they don't have the tools to know what they don't know.
Cindy: So, I think that this is great for the defense, but it's also, I think, going to be helpful to the courts in general-
Cindy: ... to be able to have a way to think about, "I'm being presented this information, how much should I rely on it, how should I-"
Rediet: There's just a lot of different ways we could be involved. So, one of the things that is happening is this organization called PD Query. It's by Dana Yow, who's a law student here at Berkeley. I think she's graduated now. So, PD Query matches public defenders with graduate students with technical training to do work that's maybe even more straightforward for us, but could be really, really useful.
But you could really imagine scaling this up and doing something even bigger. So, one of the things that we could do now is to write cross examination questions that help the defense side. You could write affidavits. You could write maybe even a more generic way to file affidavits that could be signed by experts in their respective areas. So, for the Probabilistic Genotyping Software, you could have a computer science professor and someone in population genetics saying, "Hey, this is what we understand about the tools, and these are the concerns that we have about them. Please proceed with caution if you use it in these types of situations." I think there's just so much the machine learning community could do, including just as community members saying we have expertise, so we can try to help people. These are things that will literally save lives. I mean, people are put on death row for this. They will literally save communities and families and avoid wrongful convictions and challenge our unjust system as it exists.
Cindy: It hurts a lot of people, it hurts the defendants, but it also hurts the other people who might be hurt by the person who really needed to go to jail, who didn't go to jail. So, the whole endeavor of criminal justice is aided by truth, and that's what these interventions are.
Rediet: And may I add one more thing? Can I just add one more thing? I think one other common misconception, I'm just assuming I have the ears of some people and I'm like really trying to pitch something. But one more thing I want to mention is, I think an unspoken, sometimes I guess it is spoken assumption, is that when you're working in this area, that it's technically less challenging. And let me tell you, I have never been more challenged than I have been in the past two years technically, honestly. When I started graduate school and I was doing stuff that was a lot more standard because I was like, "I want to get an academic job. So, I'm just not going to rock the boat." So, when I was doing stuff that was more standard, it was just, I mean, it was fun, but it was a lot more simple. This is a space that is challenging in so many different ways.
Danny: I can see how you're getting involved in connecting these dots is through creating your own institutions and organizations. And so, let's talk a little bit about Black in AI and also Mechanism Design for Social Good.
Rediet: Mechanism Design for Social Good was started as a graduate student reading group in 2016. It was eight of us. There was a very small group of people. And we were just trying to read stuff and understand where we could be useful with the set of techniques that we had. And now, fast forward, I won't tell you the whole story, but fast forward to now, it's this much broader community. It's thousands of people in hundreds of different countries and institutions, and the scope is also broader.
Danny: Has the organization or your perceptions of the problems changed as it's become more global?
Rediet: There’s so many instances of this. So, one of the things that came out of this community is this larger project we have on data practices, primarily in Africa. Here in the U.S., there's obvious issues in data sharing, but we've gotten a lot of things under control.
If you're thinking in the, let's say Ethiopian context where I'm from, I've seen situations where data has been shared, that really should not be shared. There was no data sharing agreement. It included people's personally identifying information and it was just passed around. A lot of data generated in the continent ends up to the benefit of those based outside of the continent, and especially those with no connections with the continent. And so, you have situations where literally those who collected the data and whose data is being collected don't have ownership rights over the data, and are not even able to buy it like everyone else would. You have situations like right now we're focused on open data, which of course, open software, but data, this is stuff that we get excited about, but there's some irony here. When you make something open, that's not necessarily equitable. It favors people like myself, who have the ability to go and work with that data, to work with the complexities of the data, who have a lot of compute power. And so, that's not actually equitable either. So you could still end up in a situation where open data of communities that are not given the resources could continue to just shut them out. So, it's not necessarily open in the sense that we would like to think that it is. So, lots of things that you could miss here that really shape how I understand the data economy in various parts of the world.
Danny: This is something that I spent some time tracking at EFF, because it was a huge concern. What I felt was that essentially there were some countries and some regions that were being used as guinea pigs for both data collection and data application.
Rediet: It's very disappointing and super common. I mean, it's incredibly common. And the common pattern that you observe here is that, it's, again, this gap that I mentioned to you, between those who are building something, doing research, whatever it is, and the people actually affected by it. Because if you just talk to people, and you talk to people in such a way that they feel empowered, they'll let you know what's going to work and what's not going to work. And there's just so many things that you could get wrong. It's not just even the data collection. Even if we were to agree on the data set, there's even concepts like privacy, for instance, that mean different things in different parts of the world.
Cindy: Give us a picture: what if we got this all right?
Rediet: Shut off the internet for a bit, let's all take a break and just come back in a year.
Cindy: We need to reboot it.
Danny: Just close it down for repairs.
Cindy: Let's say that. I mean, because I think that you've identified so many important pieces of a better world. Like this looking at the whole pipeline of things, looking, talking, the people who are impacted, making sure the people are impacted just get told what's happening. And so, let's say we got all those things right, what values would we be embracing? What would be different about the world?
Rediet: So, I don't have the answers, which is good. I don't think anyone has the answers, but I have some things that I would start with. So, one of the things is that right now, a lot of what happens on the internet is profit driven. In many ways you can see how this primary goal being profit, maximization, being at the root of a lot of things that are going wrong. And so, that's one thing that's going wrong. And so, that of course brings up questions around like monopolies, it brings up questions around what kinds of regulations should we be putting in place?
These are not technical questions, but these are incredibly important questions that if you don't resolve, there's nothing that we can do here that will create a long lasting impact. So, that's one thing. And then instead, you have to decide, okay, what are the other values that we should be putting in? What are the things that we should be potentially maximizing for? And there, again, I don't think there's going to be a universal answer. I think that we have to embrace that different communities will need different things. And so, you have to figure out a way to move to a more decentralized framework, where one single entity does not get to impact billions of people in the way that it does now.
Unless we are able to figure out a way in which people are empowered, everyone is empowered, and especially those who are marginalized, because when you're marginalized, it's not like we're all starting from the same place. It's that, for the marginalized person, more things have gone wrong for them. That is what that means. And so, we really have to focus on those communities and ways to empower those communities. And so, we really need to think about how we can create that first. And then from there, good things will happen.
Cindy: It's important to note when things are getting better, but I don't think you have to say that things are getting better in order to try to envision a place where they are better. Too many people who I talk to really think that there's no better vision at all. And so, we want to give that better vision, because I don't think you can build a better world unless you can envision a better world. And we can be frank and brutal that we're not even headed toward that in some ways. We're headed against it in some places. And I think that's absolutely true: while we celebrate our victories, we have to recognize that not everything is headed in the right direction.
Rediet: And I appreciate what you're saying here about just being able to articulate our vision for what a better world might be to ourselves, I think. And also to do so precisely, to be as precise as we can. And then also to one another, so we can have a conversation about it as well. So, I appreciate that.
Cindy: Thank you so much for coming. This was such a rich conversation about how we really rethink how we do machine learning and algorithmic decision making. And I thank you so much for taking the time to talk to us.
Rediet: Thank you so much, Cindy and Danny.
Cindy: Well, that was a fascinating conversation. I really appreciate how she thinks about making sure machine learning gives actionable information. That we go beyond just the, can we predict something? And I think she's right that sometimes we actually can't predict something when we think we can, but more importantly, that our predictions need to move into actionable information. And the difference between just telling an educator that a student's at risk for dropping out versus telling the educator you need to watch for how many times they miss the class. Those are two different inquiries. And recognizing the difference between the two of them is really important.
Danny: These are challenges about applying new academic or computer science techniques into the real world that have already existed. I love the story about how, as she was doing the investigation into how people measure poverty, she realized that she had found the embers of a 1960s research project that have been taken far too seriously. And even the researchers then were trying to signal that this was tentative and not perfect. And again, it spreads out. I think that sometimes it's not the sin of machine learning, but the things that it reifies, or the data it sucks in. The stories of the hunger for data, meaning that people stop considering the privacy of data or the communities that they're drawing from, are an important lesson.
Cindy: She's deep into developing tools to help us get it right. Using machine learning in some instances to try to help alleviate the problems that machine learning is causing. And I think of the paper that they're writing about how to evaluate DNA evidence, developing the systems to help the courts and advocates figure out whether that thing is working well. So, in some ways we've got machine learning interrogating other machine learning. And I think that's fabulous, that's how we end up at the balance where we think we're getting things more right.
Danny: Rediet is, I think, a key figure making this pitch to the machine learning community themselves to do the work to make things better. It was flattering that she thought that we are one of the podcasts they listen to, but also nice to know that this isn't the boring work.
Cindy: It's not just the coding. It's not the time when you're training up the model or the model is spitting out the results. It's actually looking all the way from what problems are you trying to solve? Are you defining the problems in a way that is actionable? And then what happens on the other side? What data are you feeding in? But then what happens on the other side of the actionable things being spit out?
Are they able to be implemented? And then how does it fit into the whole story? I think that one of the things that she's helping do is breaking machine learning out of this silo that it's just about the tech, and also breaking the pattern where the people doing machine learning don't look like all of the people who are impacted by what they're doing. Black in AI and the other systems that she's building are really trying to make sure that we grow the number of people who are using these systems and developing these systems to better match the way that they're being used in the rest of the world.
Danny: Well, thanks again to Rediet Abebe. Thank you for joining us on How to Fix the Internet. If you want to let us know what you think about this or any other episode, do please write to us at email@example.com. We do read every email. Music for How to Fix the Internet was created for us by Reed Mathis and Nat Keefe of Beat Mower.
This podcast is licensed Creative Commons Attribution 4.0 International, and includes music licensed under the Creative Commons Attribution 3.0 Unported license by their creators. You can find those creators' names and links to their music in our episode notes or on our website at eff.org/podcast.
How to Fix the Internet is supported by the Alfred P. Sloan Foundation's Program in Public Understanding of Science and Technology. I'm Danny O'Brien.
Cindy: And I'm Cindy Cohn.