Training an LLM Image Generator for Graphic Design in 20+ Languages
Introduction
A leading graphic design software company built a multimodal AI model that generates original images from text prompts in 20+ languages. To confirm that the generated images met high standards of visual quality and cultural relevance and aligned with user expectations, the company partnered with Appen to evaluate the model’s output. Appen supported the company in expanding the model’s capabilities to 20+ languages and in producing high-quality images across diverse cultural contexts.
Goal
This project focused on refining the LLM image generator’s ability to create culturally relevant, high-quality images from text prompts localized into 20+ languages, so that each design met user expectations and resonated across varied cultural backgrounds while upholding the brand’s standards.
Challenge
Evaluating AI-generated designs across 20+ languages posed a unique challenge because each language and region carries its own cultural elements, such as symbols, stylistic preferences, and norms of cultural appropriateness. Evaluating the model output effectively therefore required Appen’s reviewers to combine in-depth knowledge of regional cultural subtleties with a thorough understanding of graphic design principles. Rating prompts from 25 distinct cultural perspectives added further complexity, as did the need for consistency and clarity in the model’s visual output.
Solution
Appen executed a two-step approach to address this challenge with the LLM image generator:
1. Prompt Localization
Appen’s network of native-language translators localized the prompts from English into 24 languages, applying cultural expertise to ensure accurate adaptation. Beyond direct translation, this phase also required transcreation in cases where certain cultural events or visual elements needed adaptation to better resonate with the local audience. This process was essential in capturing culturally specific celebrations, symbols, and habits to ensure the prompts accurately reflected each target language’s unique characteristics.
2. Design Evaluation
In the second phase, Appen’s expert reviewers evaluated each LLM-generated image against a set of detailed criteria including cultural relevance, adherence to prompt instructions, design style, and format. The English version of each prompt also underwent evaluation, serving as a benchmark to ensure consistency across 20+ languages. By providing clear and consistent feedback on each image, Appen enabled the client to refine their model and improve the quality of the AI-generated designs.
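The evaluation step described above, in which each image is scored against several criteria and compared with the English benchmark, could be tabulated roughly as follows. This is a minimal sketch and not Appen’s actual tooling: the four criterion names are taken from the text, while the `Rating` record, the 1 to 5 scale, the language codes, and the sample scores are all hypothetical and used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Criteria named in the case study; the 1-5 scoring scale is an assumption.
CRITERIA = ("cultural_relevance", "prompt_adherence", "design_style", "format")


@dataclass
class Rating:
    """One reviewer rating of one generated image (hypothetical record)."""
    language: str   # e.g. "en" or "ja"; codes are illustrative
    prompt_id: str
    scores: dict    # criterion name -> score


def mean_scores_by_language(ratings):
    """Average each criterion over all ratings collected for a language."""
    buckets = defaultdict(lambda: defaultdict(list))
    for r in ratings:
        for criterion in CRITERIA:
            buckets[r.language][criterion].append(r.scores[criterion])
    return {
        lang: {c: mean(vals) for c, vals in per_criterion.items()}
        for lang, per_criterion in buckets.items()
    }


def gap_to_baseline(means, baseline="en"):
    """Difference between each language's mean score and the English mean."""
    base = means[baseline]
    return {
        lang: {c: round(scores[c] - base[c], 2) for c in CRITERIA}
        for lang, scores in means.items()
        if lang != baseline
    }


# Made-up sample data: two prompts rated in English and in one localized language.
ratings = [
    Rating("en", "p1", {"cultural_relevance": 5, "prompt_adherence": 4, "design_style": 4, "format": 5}),
    Rating("en", "p2", {"cultural_relevance": 4, "prompt_adherence": 4, "design_style": 4, "format": 5}),
    Rating("ja", "p1", {"cultural_relevance": 3, "prompt_adherence": 4, "design_style": 4, "format": 5}),
    Rating("ja", "p2", {"cultural_relevance": 4, "prompt_adherence": 3, "design_style": 4, "format": 5}),
]

gaps = gap_to_baseline(mean_scores_by_language(ratings))
print(gaps["ja"])
```

A negative gap on a criterion suggests that the localized output trails the English benchmark there, which is the kind of per-language signal a model team could use to prioritize refinement.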
Results
Appen delivered localized text prompts for a wide variety of scenarios, resulting in hundreds of evaluations per language. Each design output was reviewed against specific criteria, with the output for the original English prompt serving as a comparative baseline, which allowed the client to analyze model performance across a comprehensive multilingual framework.
Through this detailed, culturally sensitive evaluation approach, Appen helped the client achieve high-quality, culturally relevant graphic outputs that met user expectations. By refining the AI model’s ability to produce contextually appropriate designs, the collaboration improved the user experience of this global design software application, enhancing user satisfaction and product engagement across a diverse, international audience.