Kaggle has recently released a new course about AI Ethics: https://www.kaggle.com/learn/intro-to-ai-ethics
I find these Kaggle courses very well suited to anyone who wants a quick overview of a field. I took the NLP course a few months ago and it was really worth it: that is how I discovered the spaCy library.
In this post, I’ll give a quick overview of what I learned in the AI Ethics course. I think this topic is becoming more and more vital as AI systems are deployed everywhere in our daily lives. If we want to trust these systems, we must take the time to challenge them and be transparent about their strengths and weaknesses.
An important point is that this course has no prerequisites and does not assume any programming background.
This post is divided into 4 sections:
- HCD: Human-Centered Design for AI
- Identifying Bias in AI
- AI Fairness
- Model Cards
By the end of the post, you’ll be better prepared to challenge AI systems and anticipate ethical issues when running an AI solution.
Ready? Let’s get started.
Ethics is a conversation.
This post can help you start that conversation, but not end it.
1. HCD: Human-Centered Design for AI
The first point is to stay focused on humans, and more precisely on the user. This approach reminds me of UX (User eXperience) design, which is widespread in web development.
There are 6 main steps when starting an AI project:
1. Understand people’s needs to define the problem
Observe how people interact, and the details about their processes. Record and note everything.
2. Ask if AI adds value to any potential solution
Sometimes a rule-based solution is easier to develop and maintain. Has AI proven to perform better in this field? Are there past successes you can rely on?
3. Consider the potential harms that the AI system could cause
Your team can help uncover hidden privacy issues and determine whether privacy-preserving techniques like differential privacy or federated learning may be appropriate. More information: https://developers.googleblog.com/2019/09/enabling-developers-and-organizations.html and https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
4. Prototype, starting with non-AI solutions
Build something quick and easy for your users to understand, even if it does not include AI at that stage.
5. Provide ways for people to challenge the system
Your system should be able to receive requests from your users, and you should be able to turn off features if the environment changes from the one the system was designed for.
6. Build safety measures
You are responsible for protecting users against harm. Be aware of what could go wrong and monitor it.
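Step 3 mentions differential privacy. As a rough illustration of the idea (not something taken from the course), here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names, the `ages` list and the `epsilon` value are all hypothetical, chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values matching `predicate`."""
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical user data, for illustration only.
ages = [23, 35, 41, 29, 52, 67, 38]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
print(f"Noisy number of users aged 40 or more: {noisy:.1f}")
```

A smaller `epsilon` adds more noise: stronger privacy, but a less accurate answer. That trade-off is the kind of decision the team should discuss at this step.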
2. Identifying Bias in AI
Bias is very complex. Every good data scientist would tell you:
Garbage in, garbage out.
Some biases are easy to spot. Others are more subtle.
There are six types of bias:
- Historical bias (from the state of the world in which the data is generated)
- Representation bias (when the dataset poorly represents the people the model will serve)
- Measurement bias (when the accuracy of the data varies across groups)
- Aggregation bias (when groups are inappropriately combined)
- Evaluation bias (when the benchmark data does not represent the population the model will serve)
- Deployment bias (when the problem the model is intended to solve is different from the way it is actually used)
These types of bias are represented below along the lifecycle of an AI project:
The associated exercise is very interesting. It’s inspired by a famous problem: Jigsaw Unintended Bias in Toxicity Classification.
By running simple examples, we see that our dataset is biased. When we evaluate these two sentences, one is classified as toxic and the other as non-toxic:
“I have a Christian friend” => not toxic
“I have a Muslim friend” => toxic
We get the same result with these two sentences:
“I have a white friend” => not toxic
“I have a black friend” => toxic
This is what we call historical bias: the model reproduces associations that already exist in the comments it was trained on.
If we take some comments in another language and translate them into English, we introduce measurement bias.
If we take comments from the UK to train our system, then deploy it in Australia, we introduce deployment bias and representation bias.
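To make the historical bias example more concrete, here is a minimal sketch of how a very naive word-based toxicity score can pick up this kind of association. It is not the model used in the course exercise. The toy comments, the placeholder identity terms `group_a` and `group_b`, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: (comment, label), with 1 = toxic and 0 = not toxic.
# "group_a" and "group_b" are placeholder identity terms, not real data.
TOY_COMMENTS = [
    ("i have a friend", 0),
    ("my group_a friend is kind", 0),
    ("group_a people are welcome here", 0),
    ("you are an idiot", 1),
    ("group_b people are idiots", 1),
    ("i hate group_b people", 1),
    ("go away group_b", 1),
]


def word_toxicity_rates(comments):
    """Estimate, for each word, the share of comments containing it that are toxic."""
    toxic = defaultdict(int)
    total = defaultdict(int)
    for text, label in comments:
        for word in set(text.split()):
            total[word] += 1
            toxic[word] += label
    # Add-one smoothing keeps rare words away from exactly 0 or 1.
    return {word: (toxic[word] + 1) / (total[word] + 2) for word in total}


def toxicity_score(sentence, rates):
    """Average the per-word rates; unseen words count as neutral (0.5)."""
    words = sentence.split()
    return sum(rates.get(word, 0.5) for word in words) / len(words)


rates = word_toxicity_rates(TOY_COMMENTS)
score_a = toxicity_score("i have a group_a friend", rates)
score_b = toxicity_score("i have a group_b friend", rates)
print(f"group_a sentence: {score_a:.2f}")
print(f"group_b sentence: {score_b:.2f}")
```

The two test sentences differ by a single word, yet the second one gets a higher score, simply because `group_b` only appears in toxic comments in the toy corpus. The scorer has learned the harassment present in its data, not anything about the sentence itself.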
3. AI Fairness
The third part of this course focuses on AI Fairness.
Imagine we are designing a credit application. When can I say my application is fair?
- if the approval rate is equal across genders?
- if gender is removed from the dataset?
Hmm, tough one, isn’t it?
We distinguish 4 fairness criteria:
- Demographic parity / statistical parity (the people selected by the model reflect the group proportions among the applicants)
- Equal opportunity (same True Positive Rate, i.e. sensitivity, for each group)
- Equal accuracy (same accuracy for each group)
- Group unaware / “Fairness through unawareness” (remove all group membership information from the dataset)
To measure these criteria, we can rely on our beloved confusion matrices. They are widely used to measure model performance in machine learning. When assessing fairness, you should definitely build one for each group you are worried about.
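As a rough sketch of what "one confusion matrix per group" can look like, the snippet below counts true/false positives and negatives for each group and derives three of the criteria from them. The function names and the toy labels are hypothetical. The selection rate is used here as a simple proxy for demographic parity.

```python
from collections import Counter


def confusion_by_group(y_true, y_pred, groups):
    """Build one confusion matrix (as a Counter of tp/fp/fn/tn) per group."""
    matrices = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred:
            cell = "tp" if truth else "fp"
        else:
            cell = "fn" if truth else "tn"
        matrices.setdefault(group, Counter())[cell] += 1
    return matrices


def fairness_report(cm):
    """Derive the rates behind three fairness criteria from one matrix."""
    n = sum(cm.values())
    positives = cm["tp"] + cm["fn"]
    return {
        # Demographic parity compares this rate across groups.
        "selection_rate": (cm["tp"] + cm["fp"]) / n,
        # Equal opportunity compares the true positive rate.
        "true_positive_rate": cm["tp"] / positives if positives else float("nan"),
        # Equal accuracy compares the accuracy.
        "accuracy": (cm["tp"] + cm["tn"]) / n,
    }


# Hypothetical credit decisions: 1 = should be / was approved, 0 = not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

reports = {
    group: fairness_report(cm)
    for group, cm in confusion_by_group(y_true, y_pred, groups).items()
}
for group, report in reports.items():
    print(group, report)
```

In this toy data both groups have the same accuracy, but group A is approved more often and has a higher true positive rate. The same model can therefore look fair under one criterion and unfair under the others.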
In practice, it is generally not possible to optimize a model for more than one type of fairness at once. So which fairness criterion should you select, if you can only satisfy one? As with most ethical questions, the correct answer is usually not straightforward, and picking a criterion should be a long conversation involving everyone on your team.
4. Model Cards
The last part is about a tool: the model card.
A model card is a short document that provides key information about a machine learning model. You can find model cards for GPT-3 and for a Google Cloud model below:
A model card must be easy to understand and it must communicate important technical information.
There are 9 main sections:
- Model details
- Intended Use
- Factors (what factors impact the model?)
- Metrics
- Evaluation data
- Training data
- Quantitative Analyses
- Ethical Considerations
- Caveats and Recommendations
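A model card is ultimately just structured text, so a small template helper is enough to get started. The sketch below is one possible layout, not an official schema. The section names mirror the nine sections of the original model card proposal, and `render_model_card` plus the example content are hypothetical.

```python
# Section names follow the original model card proposal; the layout is illustrative.
SECTIONS = (
    "Model details",
    "Intended use",
    "Factors",
    "Metrics",
    "Evaluation data",
    "Training data",
    "Quantitative analyses",
    "Ethical considerations",
    "Caveats and recommendations",
)


def render_model_card(name, content):
    """Return the card as plain text, plus the list of sections still empty."""
    missing = [section for section in SECTIONS if not content.get(section)]
    lines = [f"Model card: {name}"]
    for section in SECTIONS:
        lines.append("")
        lines.append(section)
        lines.append(content.get(section) or "TODO: not documented yet")
    return "\n".join(lines), missing


# Hypothetical, partially filled card.
card_text, missing_sections = render_model_card(
    "toy-toxicity-classifier",
    {
        "Model details": "Word-based toxicity scorer, version 0.1.",
        "Intended use": "Flag comments for human review, not automatic removal.",
    },
)
print(card_text)
print("Sections still to write:", missing_sections)
```

Returning the missing sections makes it easy to refuse publishing a model until every part of the card has been filled in.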
With all this content, we are now better prepared to tackle problems with ethics in mind. It’s not an easy subject and, as is often the case, it is a story of trade-offs. But throughout your AI projects, you should always keep these issues in mind.
With great power comes great responsibility.
If you want to go further, Kaggle has another course, Machine Learning Explainability: https://www.kaggle.com/learn/machine-learning-explainability
Here is my “certificate”. I hope you learned something too. If you have anything to tell me, find me here or on Twitter. I’ll be glad to talk about it with you.
Have a good day folks,
From AmateK Solutions