We are excited to announce that Test Questions are now available in Quality Flow on ADAP. Test Questions, long a signature feature of jobs and workflows setups in Appen’s AI Data Platform (ADAP), are the cornerstone of your human-in-the-loop process, unlocking high quality for your data operations.
Test questions are pre-labeled queries, or golden data, that serve as benchmarks for assessing the performance of human contributors. Using test questions enables you to continuously evaluate your contributors’ performance, to identify potential issues in your instructions or in your data (if contributors consistently fail the same questions), and to calculate and track important industry metrics such as Inter-Annotator Agreement (IAA).
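To make these metrics concrete, here is a minimal Python sketch, using hypothetical question IDs and contributor names, of scoring contributors against a golden set and computing a simple pairwise agreement rate. It is an illustration only, not ADAP's implementation or its exact IAA formula (production IAA is often chance-corrected).

```python
# Minimal sketch (not ADAP's implementation): scoring contributors against
# a golden set and computing a simple pairwise-agreement stand-in for IAA.
from itertools import combinations

golden = {"q1": "A", "q2": "B", "q3": "A"}          # hypothetical test questions
answers = {                                          # hypothetical contributor judgments
    "worker_1": {"q1": "A", "q2": "B", "q3": "A"},
    "worker_2": {"q1": "A", "q2": "C", "q3": "A"},
}

def accuracy(worker_answers, golden):
    """Share of test questions the contributor answered correctly."""
    scored = [q for q in golden if q in worker_answers]
    correct = sum(worker_answers[q] == golden[q] for q in scored)
    return correct / len(scored) if scored else 0.0

def pairwise_agreement(answers):
    """Fraction of shared questions on which each pair of contributors agrees
    (a simple proxy; chance-corrected metrics are common in practice)."""
    matches, total = 0, 0
    for a, b in combinations(answers.values(), 2):
        shared = set(a) & set(b)
        matches += sum(a[q] == b[q] for q in shared)
        total += len(shared)
    return matches / total if total else 0.0

for worker, wa in answers.items():
    print(worker, "accuracy:", round(accuracy(wa, golden), 2))
print("pairwise agreement:", round(pairwise_agreement(answers), 2))
```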
Continuous improvement made easy
These questions can be used both before and during annotation and evaluation tasks. Before the task, quiz mode lets you set a predefined accuracy threshold so that only contributors who pass may start the task. During the task, work mode requires contributors to maintain a certain accuracy threshold in order to continue working on the task.
If a contributor falls below this threshold, the job manager is notified and can choose whether to allow the contributor to keep working or to remove them and their work from the job.
There are many reasons why a worker may fall below the accuracy threshold, such as:
- The worker is unfocused or inconsistent; in this case, you may want to remove them from the task.
- Your test question turned out to be incorrect; in this case, you might want to remove it from the pool.
- Your instructions were not clear enough for your workers to pass the test question; in this case, you might want to improve your guidelines.
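As a rough illustration of the threshold mechanics described above, the sketch below uses assumed function names and a hypothetical threshold value; it is not ADAP's API, just the quiz-mode and work-mode gating logic in miniature.

```python
# Minimal sketch (assumed names, not ADAP's API): gating contributors on a
# predefined accuracy threshold in quiz mode and flagging them in work mode.
ACCURACY_THRESHOLD = 0.8  # hypothetical threshold set by the job manager

def passes_quiz(quiz_results, threshold=ACCURACY_THRESHOLD):
    """Quiz mode: only contributors at or above the threshold may start the task."""
    score = sum(quiz_results) / len(quiz_results)
    return score >= threshold

def review_in_work_mode(running_results, threshold=ACCURACY_THRESHOLD):
    """Work mode: hidden test questions are scored as the contributor works;
    falling below the threshold notifies the job manager for a decision."""
    score = sum(running_results) / len(running_results)
    if score < threshold:
        return "notify_job_manager"   # keep, or remove the contributor and their work
    return "continue"

print(passes_quiz([True, True, False, True, True]))     # 4/5 = 0.8 -> True
print(review_in_work_mode([True, False, False, True]))  # 2/4 = 0.5 -> notify
```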
This process helps maintain high-quality data and streamlines the workflow by continuously improving your task while identifying underperforming contributors for removal. Check out our demo video here.
Featured Use Cases:
In generative AI model evaluation, it is common to ask contributors to compare different outputs to a single prompt to test the model’s alignment with real-world scenarios. For this use case, you could pre-label a set of questions, your “golden set,” with the correct and expected “best answers” to be chosen by your workers.
This set could then be blended in with the rest of the data to be labeled and randomly presented to your workers like a regular task prompt. Their consistency in choosing the correct “best answer” on this golden set demonstrates their trustworthiness as individual contributors and lets you inform their IAA with their true-positive answer rate.
To increase the effectiveness of your test questions, we suggest that your golden set be balanced to reflect your dataset’s composition and that the possible answers be evenly distributed across the ground truth.
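Putting this use case together, here is a small sketch, with entirely hypothetical rows and field names, of blending a golden set into the regular task stream and checking that its ground-truth answers are evenly distributed before launch.

```python
# Minimal sketch (hypothetical data, not ADAP's internals): blending a golden
# set into the task stream and checking its ground-truth answer distribution.
import random
from collections import Counter

regular_rows = [{"prompt": f"prompt_{i}", "is_test": False} for i in range(20)]
golden_rows = [
    {"prompt": "prompt_g1", "best_answer": "Response A", "is_test": True},
    {"prompt": "prompt_g2", "best_answer": "Response B", "is_test": True},
    {"prompt": "prompt_g3", "best_answer": "Response A", "is_test": True},
    {"prompt": "prompt_g4", "best_answer": "Response B", "is_test": True},
]

# Check the ground-truth answer distribution before launching the job.
distribution = Counter(row["best_answer"] for row in golden_rows)
print("golden-set answer distribution:", dict(distribution))

# Shuffle test questions in with regular rows so they look like ordinary tasks.
task_stream = regular_rows + golden_rows
random.shuffle(task_stream)
print("first five rows:", [row["prompt"] for row in task_stream[:5]])
```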
Higher-Quality AI Data:
Quality Flow Test Questions improve AI model training and evaluation by ensuring accurate and reliable AI data. They offer flexible, automated quality control for both objective and subjective tasks. Quality Flow Test Questions can serve as an effective teaching mechanism, offering continuous feedback to contributors as they work. This feedback helps contributors align with the guidelines and improve their performance over time.
Users can easily monitor the responses to test questions in the ADAP interface. If test questions are contested or missed, users can quickly edit or disable these questions, ensuring the ongoing accuracy and fairness of the quality control process.
Get Started
Quality Flow Test Questions improve the accuracy and reliability of your AI models. They offer automated quality control, flexibility for objective and subjective tasks, and continuous feedback to contributors, helping data teams optimize their AI models for peak performance.
If you're a current customer, please contact your Appen representative to get started. Interested in a demo? Contact our team today.