Highlights:
- OpenAI claims that competing AI image generators often struggle with prompts requiring them to depict multiple objects.
- Another key feature of the upgraded image generator is its ability to create objects with transparent backgrounds.
OpenAI has upgraded ChatGPT’s built-in image generation tool. Previously, the feature relied on DALL-E 3, a text-to-image model introduced in 2023 that built on earlier versions dating back to 2021. The original iteration was a modified version of GPT-3 adapted to generate images from text descriptions.
With the latest update, OpenAI is replacing DALL-E 3 with GPT-4o, a multimodal large language model launched in May 2024. According to OpenAI, the upgrade will significantly improve ChatGPT’s graphic design capabilities.
The enhanced image generator can now handle more complex requests. In an internal test, OpenAI asked ChatGPT to depict an early physics experiment by Isaac Newton, and the chatbot produced a detailed illustration complete with explanatory text.
ChatGPT can tailor its generated images based on user instructions. After illustrating Newton’s experiment, OpenAI engineers asked the chatbot to overlay the drawing onto a notebook. The chatbot successfully adjusted both the angle of the illustration and the complexity of the background to complete the task.
OpenAI claims that competing AI image generators often struggle with prompts requiring them to depict multiple objects. In contrast, GPT-4o can accurately render up to 20 distinct elements specified by the user, including text, which it generates more reliably than DALL-E 3.
Users also have the option to provide reference images. For example, an interface designer could upload a dropdown menu template and request improvements from ChatGPT.
Another key feature of the upgraded image generator is its ability to create objects with transparent backgrounds. This makes it easier to integrate newly generated visuals into other designs, such as seamlessly adding a logo to an existing application interface.
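To see why a transparent background simplifies this kind of integration, consider how an image with an alpha channel is blended onto an existing design. The Python sketch below applies the standard "over" compositing rule pixel by pixel. It is an illustration only, not part of ChatGPT or any OpenAI API; the function names and the tiny sample images are hypothetical.

```python
def composite_over(fg, bg):
    """Blend one RGBA foreground pixel over an opaque RGB background pixel.

    fg is (r, g, b, a) and bg is (r, g, b), all channels in 0-255.
    """
    r, g, b, a = fg
    alpha = a / 255.0
    return tuple(
        round(alpha * f + (1.0 - alpha) * k) for f, k in zip((r, g, b), bg)
    )


def paste(logo, canvas, top, left):
    """Return a copy of canvas with logo composited at row top, column left."""
    out = [row[:] for row in canvas]  # copy rows so the input is not modified
    for y, logo_row in enumerate(logo):
        for x, pixel in enumerate(logo_row):
            out[top + y][left + x] = composite_over(pixel, out[top + y][left + x])
    return out


# Hypothetical data: a 2x3 white interface and a 1x2 "logo" whose second
# pixel is fully transparent.
canvas = [[(255, 255, 255)] * 3 for _ in range(2)]
logo = [[(255, 0, 0, 255), (255, 0, 0, 0)]]
result = paste(logo, canvas, top=0, left=1)
```

Where the logo is opaque, its color replaces the interface pixel; where it is transparent, the interface shows through unchanged, so no manual background removal is needed.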
OpenAI trained GPT-4o on publicly available assets and data licensed from partners such as Shutterstock Inc. “We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other,” OpenAI staffers wrote.
Following the initial training phase, OpenAI refined ChatGPT’s output quality using a technique called Reinforcement Learning from Human Feedback (RLHF). This method, a variation of traditional reinforcement learning, is widely used in AI development.
In standard reinforcement learning, a model is trained to maximize a reward signal that scores its outputs. RLHF derives that signal from people: human reviewers rate or rank the model’s outputs, and their judgments are used to train a second neural network, often called a reward model, which then guides further training. These human-driven refinements help boost the overall quality of the AI.
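As a rough illustration of the general RLHF idea, the Python sketch below fits a tiny reward model to pairwise human preferences and then uses it to rank candidate outputs. It is not OpenAI's implementation. The two-number feature vectors, the sample preferences, and the function names are all hypothetical, and picking the top-scoring candidate stands in for the reinforcement learning step that a real system would run against the reward model.

```python
import math


def reward(weights, features):
    """Score an output with a linear reward model over its features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(preferences, n_features, lr=0.1, epochs=200):
    """Fit weights so that human-preferred outputs receive higher scores.

    preferences is a list of (preferred_features, rejected_features) pairs.
    Minimizes the pairwise loss -log(sigmoid(r_preferred - r_rejected)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in preferences:
            margin = reward(weights, preferred) - reward(weights, rejected)
            prob = 1.0 / (1.0 + math.exp(-margin))  # P(preferred beats rejected)
            scale = 1.0 - prob  # size of the gradient step for this pair
            for i in range(n_features):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights


# Hypothetical features per image: [sharpness, text legibility].
# In every pair, reviewers preferred the image with more legible text.
preferences = [
    ([0.5, 0.9], [0.5, 0.2]),
    ([0.3, 0.8], [0.7, 0.1]),
    ([0.6, 0.7], [0.6, 0.3]),
]
weights = train_reward_model(preferences, n_features=2)

# Use the learned reward model to choose between two new candidates.
candidates = [[0.8, 0.1], [0.4, 0.9]]
best = max(candidates, key=lambda f: reward(weights, f))
```

The reward model never sees an explicit rule such as "legible text is better"; it infers that pattern from which outputs the reviewers chose, which is the part of the process that human feedback contributes.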
At launch, ChatGPT’s upgraded image generator is available in the Free, Plus, Pro, and Team plans. OpenAI plans to expand access to Enterprise and Edu plans in the near future.