Ethical AI Techniques to Minimize Bias Throughout the Model Build Process
Artificial intelligence ethics is often thought of in the context of the model itself: the focus falls on the ethical quality of a model's predictions and outcomes, and on the impact they have. But even though most companies work to reduce bias in their models, that alone isn't enough to claim that ethical AI techniques are in place. Responsible AI requires considering the ecosystem as a whole.
It's critical to think about the entire journey when implementing ethical AI: what is the AI model built on, what does it do, and what will it impact? These are just some of the big questions companies need to ask. In reality, the answers to many of these questions aren't limited to AI; they're simply the responsible practices of good corporate citizens.
Despite the tough questions, there's good news: real-world AI that's built responsibly is often more successful.
Four Key AI Ethics Considerations
Companies committed to building ethical AI need to explore ethical questions around every facet of their AI journey, from model bias to data security to explainability. They must also consider what they intend their system to do and what impact the model's actions are likely to have, not only on their business but on society, too.
“Debiasing humans is harder than debiasing AI systems.” – Olga Russakovsky, Princeton.
Bias is a significant challenge in the AI space, and companies would do well to approach bias mitigation from the very start of their AI journey. Without vigilant effort, bias can be introduced at various stages of AI development and production, undermining ethical intentions from the outset.
The data annotation phase is especially crucial: the data you choose, and whom you choose to annotate it, directly shape the types of bias introduced into the dataset. Data annotated by a team of white American men will look different from data annotated by a team sourced across races, genders, and geographies. Diversity is much needed across the board.
Many organizations rely on a global Crowd of data annotators, although the nature of crowdsourcing can still introduce bias if not done correctly. Crowdsourcing is typically a short-term commitment between job requesters and contributors. Analysis shows that contributors spend about two-thirds of their time just looking for work and that the work can often be dull and low-paying. Competition in the marketplace leads to a small number of contributors submitting a large fraction of the available work, creating less diverse perspectives within the data.
These issues can benefit from rehumanizing crowdsourcing, which is where Appen's framework for minimizing bias comes in. The framework optimizes job launching and availability using machine learning (ML) and statistical models, aiming in each use case to increase judgment accuracy, decrease job completion times, maximize contributor profit and satisfaction, and mitigate dataset bias.
The framework provides a smart Crowd labeling process that is real-time and on-demand. Implementing it can assemble a more balanced set of contributors for each job, best suited to the work's specific requirements.
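To illustrate the general idea of a balanced contributor pool (this is not Appen's actual framework, whose internals aren't public), here is a minimal sketch: it caps how many annotators any one group can contribute to a job, using a hypothetical `region` attribute on each contributor record.

```python
import random
from collections import defaultdict

def balanced_sample(contributors, attribute, per_group, seed=0):
    """Select up to `per_group` contributors from each value of `attribute`,
    so no single group dominates a labeling job."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for c in contributors:
        groups[c[attribute]].append(c)
    selected = []
    for value, members in sorted(groups.items()):  # sorted for determinism
        rng.shuffle(members)                       # random pick within a group
        selected.extend(members[:per_group])
    return selected

# Hypothetical contributor pool, dominated by one region.
pool = [
    {"id": 1, "region": "NA"}, {"id": 2, "region": "NA"},
    {"id": 3, "region": "NA"}, {"id": 4, "region": "EU"},
    {"id": 5, "region": "APAC"}, {"id": 6, "region": "APAC"},
]
team = balanced_sample(pool, "region", per_group=2)
```

A production system would balance on many attributes at once and weigh contributor skill and availability, but the core move is the same: select per group rather than first-come, first-served.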
Remember that this bias minimizer framework is only one component of responsible crowdsourcing. Check out our Crowd Code of Ethics to learn more.
Data security and privacy present another challenge. A common mistake is starting an AI project without a data strategy or governance plan in place. And there's more to data than privacy concerns alone.
For example, data collected as part of financial services operations frequently contains sensitive and confidential information that requires additional security measures. The right data partner will offer a variety of security options to suit your specific needs and maintain strong security standards so that your customer data is handled appropriately. Look for data partners who comply with industry- or region-specific regulations and standards, such as SOC 2 Type II, HIPAA, GDPR, and CCPA, and who offer options such as secure data access (critical for PII and PHI), secure annotation, onsite service, private cloud deployment, on-premises deployment, and SAML-based single sign-on.
Creating an AI model that provides accurate predictions will only be successful if that model can be explained to, understood by, and trusted by customers. Because models are commonly developed from customer information, customers will want assurance that their personal data is collected responsibly and handled and stored securely, and some will even want to understand the basics of how it's being used.
While the most advanced AI applications are harder to explain, you can always go back to the training data used to develop the model and extract some explainability from the data structure, inputs, and outputs. The validation and retraining processes can shed further light on how your models make predictions, which in turn helps build customer trust.
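As one concrete way to extract explainability from inputs and outputs alone, the sketch below implements a simple permutation-importance estimate: shuffle one feature at a time and measure how much the model's output moves. The model, its weights, and the data here are all hypothetical stand-ins.

```python
import random

def model(x):
    # Toy stand-in for a trained model: a linear score with invented weights.
    return 0.8 * x[0] + 0.1 * x[1] + 0.1 * x[2]

def permutation_importance(model, rows, n_repeats=10, seed=0):
    """Estimate each feature's importance as the mean absolute change in the
    model's output when that feature's column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = [model(r) for r in rows]
    importances = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [r[j] for r in rows]
            rng.shuffle(col)  # break the link between feature j and the output
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
            total += sum(abs(b - model(s))
                         for b, s in zip(baseline, shuffled)) / len(rows)
        importances.append(total / n_repeats)
    return importances

data = [[1.0, 5.0, 2.0], [3.0, 1.0, 4.0], [2.0, 2.0, 1.0], [4.0, 3.0, 3.0]]
scores = permutation_importance(model, data)  # feature 0 dominates, as expected
```

Because the technique only needs the model's inputs and outputs, it works even when the model itself is a black box.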
Before pursuing any AI endeavor, teams should ask critical AI ethics questions around impact: What is my model intended to do? What impact will building it have on my business, the people who build it, my end users, and society? What happens when my model makes the wrong decision?
In the best case, these questions will drive you to develop a model with a net positive impact on all relevant stakeholders. But if you avoid them, or answer them inaccurately, you could face unintended consequences. A poorly performing model may make discriminatory decisions; well-known examples include AI-powered recruiting tools that showed bias against women and facial recognition software that struggles to recognize darker-skinned faces.
These results are problematic not only for the company that produced the model, which may face reputational damage and lost revenue, but also for end users and society at large. Reviewing model impact both before and after deployment helps ensure your AI models are serving their purpose successfully.
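One lightweight way to review impact after deployment is a disparate-impact check on the model's decisions across groups. The sketch below uses the "four-fifths" rule of thumb from employment-law practice; the group labels and decision data are invented for the example.

```python
def selection_rates(decisions):
    """Positive-outcome rate per group, from (group, approved) pairs."""
    totals, positives = {}, {}
    for group, approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(approved)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact(decisions):
    """Ratio of the lowest group's selection rate to the highest.
    Values below 0.8 commonly flag a model for closer review
    (the 'four-fifths' rule of thumb)."""
    rates = selection_rates(decisions)
    return min(rates.values()) / max(rates.values())

# Hypothetical audit log: group A approved 8/10, group B approved 5/10.
audit = ([("A", True)] * 8 + [("A", False)] * 2 +
         [("B", True)] * 5 + [("B", False)] * 5)
ratio = disparate_impact(audit)  # 0.5 / 0.8 = 0.625 -> below 0.8, flag for review
```

A check like this doesn't prove a model is fair, but it gives teams a concrete, repeatable number to monitor as part of the before-and-after impact review.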
AI Ethics Starts with the Data
Answering the call for responsible AI requires applying an AI ethics lens from start to finish. Most importantly, AI models need high-quality training data sourced responsibly from a diverse crowd of contributors to work effectively. Minimizing bias should be top of mind throughout the model build process, and even after deployment, when model drift can occur. Retraining the model regularly with new data helps catch or prevent bias and maintains the accuracy of the model over time. It ensures the model continues to work as intended, avoiding the introduction of unwanted impacts to the business or consumer.
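To make the drift-monitoring step concrete, here is a minimal sketch of the Population Stability Index (PSI), a common heuristic for deciding when a feature's distribution has shifted enough to warrant retraining. The 0.2 threshold and the sample data are illustrative assumptions, not universal constants.

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between two numeric samples.
    Larger values mean the 'actual' (production) distribution has
    drifted away from the 'expected' (training-time) distribution."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        return [(c + 1e-6) / len(sample) for c in counts]  # smooth empty bins

    return sum((a - e) * math.log(a / e)
               for e, a in zip(proportions(expected), proportions(actual)))

training = [0.1 * i for i in range(100)]          # feature values at train time
production = [0.1 * i + 4.0 for i in range(100)]  # same feature, shifted in production
drift = psi(training, production)  # large PSI: distribution has shifted noticeably
```

Teams often wire a check like `drift > 0.2` into a monitoring job so that retraining is triggered by evidence of drift rather than on a fixed calendar.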
Responsible AI isn’t just a philosophical concept but an approach that every company in the AI space must adopt. When built responsibly, AI is more successful and works in a way that benefits everyone, no matter their race, gender, geography, or background. At Appen, we’re committed to responsible AI, partner with the World Economic Forum, and have a long history of working with a diverse Crowd. Learn more about our expertise here.