Highlights:
- A new “Asset Orchestrator” at the heart of the release will enable developers to fine-tune their models using assets like Low-Rank Adaptations, or LoRAs.
- The new service, OctoML claims, can generate images in an average of 2.8 seconds, significantly speeding up image production with the photo-realistic model Stable Diffusion XL.
Recently, OctoML Inc., a startup specializing in artificial intelligence optimization, announced the release of OctoAI Image Gen. This architecture enables developers to customize image generation on well-known models like Stable Diffusion and simultaneously apply changes to thousands of assets.
Luis Ceze, Chief Executive of OctoML, said, “Image generation applications have quickly gone from fad to real business, with many e-commerce, entertainment and creative organizations looking to differentiate their service with AI. But building these custom experiences with Stable Diffusion today is an extensive engineering effort that simply doesn’t scale.”
In June, OctoAI was introduced to assist developers in creating and scaling their artificial intelligence models. With this new release, the platform now provides an API endpoint for image generation and supports fine-tuning at scale.
The new “Asset Orchestrator,” the centerpiece of the release, will enable developers to enhance their models with assets such as Low-Rank Adaptations, or LoRAs. A LoRA is a fine-tuning asset: using one, users can quickly train Stable Diffusion on a particular concept, such as a specific character or style.
Unlike full image-generation models, which can be cumbersome due to their size, LoRA training produces small, portable weight files. Because they require far less processing power, LoRAs are also much faster and simpler to train.
Once trained, a LoRA can be applied to a Stable Diffusion model to produce images featuring that particular character or style. For fine-tuning, LoRAs therefore represent a reasonable trade-off in terms of size, training time, and computing power.
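The size advantage comes from the low-rank math itself: rather than updating a full weight matrix, LoRA training learns two small low-rank factors whose product is added to the frozen weights, so only those factors need to be stored and shipped. A minimal NumPy sketch of the idea (the dimensions, rank, and scaling factor here are illustrative assumptions, not OctoML's implementation):

```python
import numpy as np

d, r = 1024, 8          # layer width and LoRA rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen pretrained weight matrix
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # zero-initialized, learned in training

alpha = 16                               # LoRA scaling hyperparameter
W_adapted = W + (alpha / r) * (B @ A)    # effective weight at inference time

full_params = W.size                     # parameters in the full matrix
lora_params = A.size + B.size            # parameters the LoRA actually stores
print(full_params, lora_params)          # 1048576 16384
```

Here the LoRA stores 64 times fewer parameters than the full matrix, which is why the resulting assets are small enough to swap in and out per request.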
Users can prompt a Stable Diffusion model with text to create an image of, say, a video game or comic book character. This typically yields finicky, inconsistent results and requires many rounds of rewording to get the model to manifest the intended image, a practice known as prompt engineering.
A LoRA trained on pictures of that particular character, or on a desired style such as the art of a certain video game or a certain era of comic books, would align the model far more closely with the intended results. Much less prompt engineering would be needed for the model to produce a reasonably good customized image.
Using the photo-realistic model Stable Diffusion XL, the new service generates images in an average of 2.8 seconds, according to OctoML. As part of the asset management feature, users can manage and pull models and data from popular sources such as CivitAI, a community platform where users share Stable Diffusion models and AI artwork, and Hugging Face Inc.’s open-source AI model repository.
Numerous clients have already implemented the OctoAI image generation solution in their business applications, including Storytime AI, which makes an app that uses AI to create children’s stories, and NightCafe Studio Pty Ltd., which operates an AI art generator website and community.
Brian Carlson, CEO of Storytime AI, said, “Our top priority is to deliver kid-safe, consistent, engaging images for our custom children’s stories. Previously, this process relied on heavy-handed prompt engineering. But OctoAI helped us stand up a whole new image gen architecture utilizing assets like LoRAs to create consistent visuals without the added complexity of prompt engineering.”