
Enhancing an AI Video Description Generator with Human Validation

April 2, 2025

Introduction

A leading creative software company partnered with Appen to refine their AI video description generator, ensuring a high standard of readability, coherence, and accuracy. The project used human validation to evaluate and improve the AI-generated video descriptions, reducing errors and ensuring that each description accurately captured key visual details. Over the course of the project, Appen validated and refined 40,000 video descriptions, achieving a 95% accuracy rate. This structured approach significantly improved the generative AI model’s ability to produce high-quality descriptions at scale.

Goal

The client required reliable, AI-generated video descriptions to enhance the performance of key features in their software, such as video editing and design. This goal necessitated a scalable process to systematically refine model outputs, eliminating errors and inconsistencies across diverse video content. Integrating human expertise was essential to ensure their AI produced high-quality, informative descriptions that accurately captured the nuance of video data.

Challenge

While the AI model could generate initial text descriptions of videos, the outputs often contained inaccuracies or unclear language, or lacked key visual details. Prior to this project, the AI-generated descriptions exhibited multiple deficiencies, including:

  • Factual inaccuracies: Descriptions often contained errors or omitted key details about visual content.
  • Grammatical inconsistencies: AI-generated text lacked fluency, making descriptions unclear and difficult to understand.
  • Contextual misalignment: Descriptions did not always match the intended subject matter, leading to unclear or misleading outputs.
  • Scalability concerns: The client required a solution capable of handling a high volume of data while maintaining quality standards.

Solution

The client needed a structured validation process to correct these deficiencies and ensure consistent, high-quality descriptions across diverse video content.

To address these challenges, Appen implemented a two-phase approach:

  1. Phase One: Expert Human Review
    Expert annotators evaluated AI-generated descriptions based on key criteria such as accuracy, completeness, grammar, and contextual alignment. They made targeted corrections to improve the quality of descriptions – ensuring accuracy, linguistic fluency, and consistency.
  2. Phase Two: Automated Quality Enhancement
    Appen incorporated automated spelling, grammar, and similarity checks within the AI Data Platform (ADAP) to enhance efficiency and maintain high annotation quality. This iterative process enabled efficient, continuous improvement of the AI-generated descriptions, resulting in consistent, high-quality outputs at scale.
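
To make the idea of an automated similarity check concrete, here is a minimal Python sketch. It is an illustration only: the case study does not describe how ADAP implements its checks, so the function names, the 0.9 threshold, the sample descriptions, and the use of the standard library's `difflib` are all assumptions. The sketch flags pairs of descriptions that are nearly identical, which a reviewer might then inspect for copy-paste errors or low-effort edits.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of how alike two descriptions are (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_near_duplicates(descriptions: dict[str, str], threshold: float = 0.9):
    """Return (id_a, id_b, score) for every pair scoring at or above the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(descriptions.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            flagged.append((id_a, id_b, round(score, 3)))
    return flagged


# Hypothetical sample data, not taken from the project.
sample = {
    "vid_001": "A woman pours coffee into a white mug on a kitchen counter.",
    "vid_002": "A woman pours coffee into a white mug on the kitchen counter.",
    "vid_003": "Two cyclists ride along a coastal road at sunset.",
}
flagged_pairs = flag_near_duplicates(sample)
for id_a, id_b, score in flagged_pairs:
    print(f"{id_a} and {id_b} look nearly identical (score {score})")
```

Comparing every pair is quadratic in the number of descriptions, so at the scale of tens of thousands of items a production check would more plausibly use hashing or embedding-based lookup; the pairwise loop is kept here only for clarity.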

By combining expert human validation with automation, Appen delivered a scalable and efficient solution that significantly improved the AI model’s output.

Results

Over the course of the project, Appen delivered 40,000 high-quality video descriptions, achieving over 95% accuracy. The rigorous validation process significantly reduced factual errors and ensured descriptions were both precise and contextually relevant. This refined training data helped the client enhance their AI model’s performance, ensuring better accuracy in future descriptions.

By leveraging expert human validation, the client successfully improved the readability, coherence, and factual correctness of their AI video description generator while maintaining scalability and efficiency.