
Kavita Ganesan - Ethical AI

Kavita Ganesan is the founder of Opinosis Analytics and the author of The Business Case for AI, with over 15 years of experience in the field of artificial intelligence.
This is the second conversation we've had on the podcast with Kavita; in the first one, we talked more generally about artificial intelligence and how businesses can start leveraging it. This time, we focus on the question of ethics in AI, discussing topics such as bias propagation, plagiarism, responsible AI implementation & government regulation, and more.
Links & mentions:
Transcript
“Can we explain the predictions of the AI system? So, AI systems are kind of like a black box right now, so it comes up with a prediction, but you don't know why. But in some cases, this is very risky.”
Intro:
Welcome to the Agile Digital Transformation podcast, where we explore different aspects of digital transformation and digital experience with your host, Tim Butara, content and community manager at Agiledrop.
Tim Butara: Hello, everyone. Thank you for tuning in. Joining us today is our second returning guest, Kavita Ganesan, founder of Opinosis Analytics and author of The Business Case for AI. In our previous conversation, we briefly discussed ethics in the context of AI with Kavita, and we both agreed that this was definitely a topic which deserved an entire episode dedicated exclusively to it.
And so this is what we'll be covering today – ethical AI and how companies and businesses can enable more responsible implementations of AI. Welcome, Kavita. It's really great having you back on the show again.
Kavita Ganesan: Hey, Tim, nice to be back on your show. And thank you for inviting me back.
Tim Butara: Yeah, as we said in the first episode, we definitely needed to record at least one more episode, because ethical AI is such an important topic that I think nowadays you almost can’t discuss AI without at least mentioning ethical AI. But to kind of ground our discussion, the first question I need to ask you today is, what even is ethics in the context of AI?
Kavita Ganesan: Yeah, so that's a good question. So ethics really means: are you doing AI responsibly? Which means, are you using the right data to train your models? Is your data representative of the groups of things or people that you want your model to learn from? Are you considering the short-term and the long-term risks of deploying specific AI systems?
So, for example, if you have an AI system that's trying to predict if somebody should get a loan or not, is your AI model being fair in those instances? Or let's say you release a self-driving car that uses AI behind the scenes, and it's an out-of-whack car with a poor-quality AI system behind the scenes; the risks are so high that it can kill pedestrians, it can kill the drivers. So are you considering those types of risks before deploying models?
And another thing is, can we explain the predictions of the AI system? So, AI systems are kind of like a black box right now, so it comes up with a prediction, but you don't know why. But in some cases, this is very risky. Like in the case of healthcare, it's predicting that this patient has a high risk of having lung cancer, but the doctors need to know why. Like, what's the evidence that this patient actually has a high risk of lung cancer?
So that's another aspect of AI ethics. And the last thing, which I think is not being discussed as frequently, is whether AI is being used in the right context, because you can use AI as a sole decision maker or you can use AI as a second opinion. So how are you using AI? In the case of healthcare, let's say you have cancer. Not you in particular, but a person has cancer, and the AI system is coming up with a treatment plan for you.
So can you really trust this treatment plan based just on the sole prediction of the AI system? What if the treatment plan is not effective? Who is to blame? Is it the AI system? Is it the doctor, or is it the hospital that decided to use the AI system? So I think the context in which AI is being used is a big aspect of AI ethics, which is not being discussed as often.
Tim Butara: That was an awesome intro. I think that we're definitely off to a strong start. And to add to this point about context, it's something we underscore a lot during conversations on our podcast: the key to unlocking the biggest potential of AI is to leverage it in combination with human capabilities. So I would add to that: context, yes, and also empowering humans to make better decisions through the use of AI.
Kavita Ganesan: Yeah, and I think that's the best use of AI at this point in time. So having humans in the loop, working with the AI behind the scenes.
Tim Butara: But do you think that can or should change, especially in the context of ethics?
Kavita Ganesan: In some applications, the mistakes from AI systems are fairly harmless, like in your Gmail spam classifier. So what if it doesn't classify a lot of the emails as spam? The end effect is not as harmful. But I think what's going to dictate how to use AI will be the application and the risks involved with those applications.
Tim Butara: So what would be some of the most frequent ethical issues when it comes to artificial intelligence?
Kavita Ganesan: Yeah. The most common one, as you may have heard, is bias propagation. And this is largely because of the underlying data. And there can be different reasons why the data causes bias to perpetuate. So, for example, if historically certain groups of people have been discriminated against, then this discrimination is sometimes recorded in the data.
And this has happened with a recruiting tool where the tool learned to dismiss women candidates. And that's simply because the data showed that male candidates were preferred in the workforce as opposed to women. So it just learned what the data was telling it. Yeah. So that's one area which is very common, bias propagation.
Another area that I think is not discussed as much is plagiarism. The newer AI systems today learn from massive amounts of web data, and they have the capability to generate language: sentences, paragraphs, even whole articles. You may have heard of tools like Jarvis.ai, which helps you with copywriting. Because of that, such a system may spit out content that is identical to a source on the web, but you don't really know if the AI system has plagiarized this content or if it's really its own prediction of words. So it's hard to claim plagiarism, and it's hard for the AI system to even produce any type of attribution, because it has just learned from this big bag of data.
Then another common ethical issue when it comes to AI is the misuse of the AI system itself. A few months ago, the Ukrainian state leader was portrayed as saying something he did not say, using a tool called DeepFakes, an AI tool that can portray people as saying different things, depending on how you train it.
So this is a case of a bad actor using the AI tool in an inappropriate way. It's a basic misuse of technology, and it can happen in any context. Like, you can use AI to hyper-target objects in a war; you can use an AI system to do really unethical things, and it is already happening, and it's happening quite often.
Tim Butara: Yeah, I think that we've definitely seen a lot of that. But one thing that really piqued my interest in this answer was the part about plagiarism. And you focused mostly on copywriting. But what we've seen in the past few months, I think, is a real resurgence of AI-generated visual art through platforms like DALL-E, or however it's pronounced.
And what I've already encountered is artists complaining on Twitter about how they would post some of their work and then somebody would comment and be like, oh, this is cool, which AI generator are you using for this? And it's like, this is my original art. And people are like, oh no, you're lying, this looks too much like the ones generated by DALL-E. Because DALL-E was literally trained on art that was produced by real people.
Kavita Ganesan: Exactly. And I was also thinking that the rice bowl of these artists is now going to be affected by these AI systems, because they're using their art to produce a combination of different art, and these people are not being paid for it.
Tim Butara: And also, this was, like, when we started talking about concerns about AI taking over jobs, pretty much the only consensus was that, okay, at least artists are safe. At least people in creative positions are safe. But what we're seeing now seems to really contradict that.
Kavita Ganesan: Yes, people in creative positions are not safe anymore. Definitely.
Tim Butara: So in this light, how can we avoid these biases and prejudices that are propagated by AI systems?
Kavita Ganesan: So, when it comes to training AI models, you need to be very careful about the source of your data. Let's say you're coming up with a model and you know that the data comes from a source that has inherent biases in it. For example, you're trying to develop a recruiting tool, but you're using data from a tech company which you know has predominantly male workers. If you train on such data, you know that there could be some prejudice against women in that data.
So you want to be aware of the types of data you're using, the attributes in that data, and the potential biases from those sources. And you can also proactively eliminate certain attributes when training models. Things like race, age, gender, and skin color are often very sensitive attributes, so maybe you want to eliminate the use of those attributes when training models, so it learns from other things, like your skills, basically anything else but these sensitive attributes.
So you can proactively do things, but the best way to find biases is to rigorously evaluate your AI models. Once you have a version of your AI model that's ready for deployment, before it goes anywhere into production, you want to subject it to different types of data that it may have never encountered before and see how it's failing. So is it, in this case, dismissing younger candidates, or older candidates, if it's a recruiting tool? Trying to really understand its behavior will help surface a lot of those biases as well, and then you can come up with ways to address those biases.
So let's say it's discriminating against older people. That means it makes a lot of mistakes on that group of people. So maybe you can enhance your data set to have more older people in the mix. So there are a lot of things you can do once you understand the model's behavior.
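To make that evaluation step a bit more concrete, here is a minimal sketch in Python of what it could look like, assuming a hypothetical recruiting dataset; the file name and the column names (hired, gender, age_group, ethnicity) are made up for illustration. Sensitive attributes are excluded from the features, and the held-out error rate is then compared across demographic groups to surface potential bias.

```python
# Minimal sketch, not a production recipe: drop sensitive attributes before
# training, then compare held-out error rates across demographic slices.
# The file name and column names below are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("candidates.csv")                      # hypothetical dataset
SENSITIVE = ["gender", "age_group", "ethnicity"]        # excluded from training

y = df["hired"]
X = pd.get_dummies(df.drop(columns=SENSITIVE + ["hired"]))

# Keep the sensitive columns alongside the split so we can slice results later.
X_tr, X_te, y_tr, y_te, meta_tr, meta_te = train_test_split(
    X, y, df[SENSITIVE], test_size=0.3, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

# Error rate per group: a large gap between groups is a bias signal worth
# investigating, e.g. by adding more data for the underrepresented group.
errors = pd.Series(model.predict(X_te) != y_te.to_numpy())
for attr in SENSITIVE:
    print(attr)
    print(errors.groupby(meta_te[attr].to_numpy()).mean())
```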
Tim Butara: But it wouldn't be too much of a stretch to say that I guess all data sets are biased in one way or another. How does bias even get ingrained in data so much?
Kavita Ganesan: Yeah, so I think every data set has its own biases. For example, some data sets are so small that they may only represent certain groups of people. We see this commonly in facial recognition systems, where the data is predominantly white, male, young subjects, and there are fewer kids, fewer women, fewer older people, and fewer people of color. So the AI system tends to make a lot of mistakes on these minority groups. You think that the AI system is being biased, but really the data is just not representative of all the groups, and this might simply be because the data set is too small. So that's one way bias happens.
Another way bias happens is through historical and societal biases, which get recorded in data over time. If specific groups of people have been denied loans for some reason or the other, maybe because of our own prejudices, this is actually recorded in the data. And if you try to use the data right off the bat, without any analysis, and train a model for loan prediction, like who should get a loan and who should not, then you're going to perpetuate the same biases, because they're already ingrained in the data.
So datasets can get biased in a lot of different ways. And also, if your data stores operate in silos in your company, you may use data from, say, branch A, but not from branch B. So your dataset itself is not representative, and you have automatically introduced bias in your data right there. Bias seeps in in a lot of different ways, and it's just perpetuated through these algorithms.
Tim Butara: Maybe one of the most complicated things about it, maybe not one of the worst things but one of the most complicated, is that, from what I understand about all this, more often than not bias is not intentional, or it's not done with negative intentions. As you said, it can just be the result of a small sample size, or the result of the specifics of a particular field or context.
But then it does have negative connotations and a negative impact throughout history if it gets ingrained in culture, in society. So it's really interesting; I guess we can revisit the popular quote about the road to hell being paved with good intentions, and how even if something isn't intended as bad, or maybe even has good intentions, it can lead to negative consequences for a lot of people.
Kavita Ganesan: Yeah, that's absolutely right. So we are not trying to be biased, but because the bias is present in the data, you are inherently perpetuating that bias. But there are some cases where, when we are developing algorithms, our own biases can seep into the algorithm.
So maybe we think that certain attributes are good predictors of something, like maybe skin color, and you may inadvertently use that because of your own beliefs. But that is rare compared to what's present in data. A lot of times we are just using data as-is and not paying attention to where it's coming from and what the biases within the data are. So that's the main cause.
Tim Butara: I think data collection is actually also a very important aspect of ethical AI. And it's kind of twofold. It's not just what kind of data gets collected, so as to eliminate bias, but also how that data gets collected, right? Because of all the privacy regulations and all the talk around privacy. Data can be used in nefarious ways, but before that, it can also be collected in illegal or shady ways.
Kavita Ganesan: Correct. Yeah, that's right. If you collect data without the consent of the users, and then you use attributes that the user may not want you to use, there are problems around that.
Tim Butara: But what about– now, we discussed what you need to do if you want to deploy AI super irresponsibly. But what should businesses do to deploy more responsible AI?
Kavita Ganesan: So the first thing is to start with your data sources. Whenever you have an AI system that you're going to develop, you want to know the sources of your data. If it's coming from within your company, you need to ensure that it's representative of all groups, all branches, all countries where your company is situated. So make sure it's representative, that there's a holistic view of your data, and that it covers all conditions.
And then also understand what types of biases are present in your data. You can do this through data analysis itself; you can look for things like class imbalance issues. There are a lot of things you can do just through plain data analysis. Also think about, in terms of the application, what kinds of biases it can potentially create. Think about the end effect. If you're developing a tool that predicts the risk of lung cancer, what are the ways it can go wrong or treat people unfairly? Thinking about that will give you ideas on how to avoid the situation and how to handle it. So that's the first step: your data.
The second step is to have an independent committee actually review your model's behavior. This committee should be trained to be very fair and should just look for problems in the model and segment the problems in different ways, maybe by age group or by skin color, and see how the model is behaving under different conditions. So rigorous evaluation will also surface the problems in the model, and that will give you cues on how you can improve it. I think these are the two main ways to get started.
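As a rough sketch of the plain data analysis mentioned above, the snippet below checks class imbalance and group representation in a hypothetical loan dataset; the file name and the columns approved, age_group and branch are made up for illustration.

```python
# Rough sketch of plain data analysis to surface potential bias before training.
# The file name and column names are hypothetical.
import pandas as pd

df = pd.read_csv("loan_applications.csv")   # hypothetical dataset

# 1. Class imbalance: does one label dominate the outcomes?
print(df["approved"].value_counts(normalize=True))

# 2. Representation: are all groups and branches actually present in the data?
for col in ["age_group", "branch"]:
    print(df[col].value_counts(normalize=True))

# 3. Outcome rate per group: large gaps can reflect historical bias that a
#    model trained on this data would learn and perpetuate.
print(df.groupby("age_group")["approved"].mean())
```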
Tim Butara: I think this was mostly focused on businesses that haven't yet deployed AI and need to do so responsibly. But what about instances where a business has already rolled out their AI implementation, they're already kind of doing it, but they then realize that it's maybe super biased and very unethical? What can they do about that?
Kavita Ganesan: Correct. Yeah, that's a great point. So what you can do is peel back the layers. Bring your model back in house, form a data set to test the models that are already in production, and do the same thing: basically, your committee needs to fairly assess this model and find problems with it. Once you find the problems, you can iterate and improve your model and then redeploy this newer, non-biased model.
And the same thing goes for the newer AI tools like GPT-3. These tools don't require you to do a lot of training; you can use them right out of the box. But they may be biased because they're trained on a lot of web data. So you still want to evaluate the application the same way you would an AI model that you developed from scratch.
Tim Butara: That makes sense. Yeah. So I think that we've covered all the most important things. Just one question left for this great conversation, Kavita, and that is about government agencies and then how they regulate AI, because we covered the business aspect. So what can government agencies do to regulate AI in a way that reinforces more ethical AI?
Kavita Ganesan: Yeah, so I think the first approach is to regulate how AI is being used, so the applications of AI. For example, you may not want AI to be used in law enforcement to find criminals, simply because if the AI makes mistakes, then an innocent person can go to jail and their whole life is ruined because of this. They may not be able to get jobs after that, they'll be ostracized by society. So the stakes are really high, and then they have to go into the system and be supported by the government. The end effect is bad for everyone. By regulating specific applications and how they are used, we'll prevent bad actors, and we'll prevent discrimination and unfair side effects.
The second way, I think, is to regulate how data is being collected and used by these companies. So let's say you upload images of yourself to social media. Can these companies, without your permission, use this data to improve their facial recognition systems? Are they allowed to do that, or should they ask for permission? Because you may not want your data to be used in any way to train models.
Facebook, for example, a few years ago got in trouble because they ran their facial recognition system on images uploaded by different users without their permission. So their faces got tagged, and people may not want their face to be tagged; they may not want people to know that they are in a particular image. But Facebook did it anyway, so they were fined, because the government found that this was an unethical use of AI.
So questions like data collection and how the data can be used all need to be somehow regulated. The best way is to seek permission from the user: hey, can I use your data to train my model, or can I tag your face using this AI application? Ask for permission instead of doing it first and getting fined later on.
Tim Butara: Yeah, I just wrote down in my notes that they're basically following the model of don't ask for permission, ask for forgiveness. And, you know, even though they got fined, I'm sure that they were actually fine.
Kavita Ganesan: Yes, they were actually fine after being fined.
Tim Butara: Yeah, exactly.
Kavita Ganesan: But it's just unethical for your face to be used without your permission.
Tim Butara: I think using AI for facial recognition is one of the most ethically inconvenient uses of AI. I think it ties back to what you said before, how facial recognition tends to be super biased based on the ethnic origin of people. And tying this to what you said about not using AI in law enforcement because of the potential it has for misuse, I immediately thought of, yeah, facial recognition, and somebody being convicted of a crime because the AI algorithm accused them based on facial recognition.
So there are just so many kinds of trouble that you run into if you don't regulate it well. And as you said, you just need to refrain from using it in certain contexts, because the risk is too great.
Kavita Ganesan: Yeah, the stakes are too high; better to just not use it at all.
Tim Butara: Kavita, this was an amazing conversation. I have to say that it was even better than the first one. And the first one was awesome already. So yeah, I'm really glad that we had you back, and I had a feeling that this would be another excellent one. I can't wait for everybody else to hear it as well. But just before we wrap it up, if people listening right now would like to reach out to you or learn more about you, where can they do that?
Kavita Ganesan: Yes. So the best way is to go to my website, kavita-ganesan.com, and you'll have it up in your notes, right?
Tim Butara: Of course.
Kavita Ganesan: And you can connect with me on LinkedIn. But the best place to start is to get to my website, and I have three free chapters of my book that you can download.
Tim Butara: Awesome.
Kavita Ganesan: And thank you for having me back.
Tim Butara: Yeah, it was great. And I guess we'll do another one next year.
Kavita Ganesan: Sure, I look forward to that, yeah.
Tim Butara: Awesome. Thanks again for joining us and have a great day, Kavita.
Kavita Ganesan: Thank you, Tim.
Tim Butara: And to our listeners, that's all for this episode. Have a great day everyone, and stay safe.
Outro:
Thanks for tuning in. If you'd like to check out our other episodes, you can find all of them at agiledrop.com/podcast, as well as on all the most popular podcasting platforms. Make sure to subscribe so you don't miss any new episodes, and don't forget to share the podcast with your friends and colleagues.