Highlights:

  • The Gemini model is available in three sizes: Pro; a smaller Nano, tailored for Pixel phones and other mobile devices; and an exceptionally potent Ultra, built for enterprise services.
  • MusicFX harnesses Google’s MusicLM AI model, adept at generating high-fidelity musical tracks from user text prompts or interpreting a melody that the user hums.

Google LLC recently announced a full upgrade of its Bard artificial intelligence chatbot, which now integrates Gemini Pro, its most potent large language model (LLM). Bard also gains image generation capabilities, powered by Google's Imagen 2 model.

The company has also unveiled ImageFX, a novel image generation tool, alongside an enhancement to MusicFX, an experimental AI model that converts text into music.

Gemini Pro has been accessible in Bard since December, albeit only to a select subset of English-speaking users. With this update, Gemini Pro rolls out globally, supporting more than 40 languages in over 230 countries and territories.

Gemini is Google’s most robust LLM, with advanced text generation, question answering, document summarization, conversational logic, and coding capabilities. The model is available in three sizes: Pro; a smaller Nano, tailored for Pixel phones and other mobile devices; and an exceptionally potent Ultra, built for enterprise services.

In addition to the upgrade, Google Bard will gain the ability to generate images from text prompts, courtesy of the Imagen 2 text-to-image model. Imagen 2 is the second iteration of the Imagen model, which Google first introduced in May 2022.

Incorporating image-generating capabilities into Bard enables it to produce vivid, imaginative, and photorealistic images based on user text descriptions. This enhancement aligns Bard with Microsoft Corp.’s Bing Chat, which utilizes OpenAI’s DALL-E 3 to generate pictures from user conversations.

“Just type in a description — like ‘create an image of a dog riding a surfboard’ — and Bard will generate custom, wide-ranging visuals to help bring your idea to life,” Bard’s Product Lead, Jack Krawczyk, stated in the release.

To facilitate the secure sharing of artwork generated by Bard, all graphics will be watermarked using SynthID, a tool devised by Google DeepMind researchers for identifying AI-generated images. SynthID watermarks are invisible to the human eye but can be easily detected by computer-assisted tools.

Google’s latest standalone ImageFX tool, fueled by Imagen 2, has been integrated into the company’s AI Test Kitchen. This platform grants public access to experimental AI tools developed by Google. Google has also refreshed MusicFX, an AI model that transforms text into music, enabling users to create songs.

ImageFX functions similarly to other generative AI artwork creation tools, enabling users to input simple text prompts to generate images. Users can then continue to modify these images by providing additional prompts.

Kristin Yim, Product Manager at Google Labs, stated, “People often discover new ideas through testing a range of prompts and concepts as they iterate. To spur further creativity, ImageFX includes a prompt interface featuring ‘expressive chips’ that let you quickly experiment with adjacent dimensions of your creation and ideas.”

MusicFX harnesses Google’s MusicLM AI model, adept at generating high-fidelity musical tracks from user text prompts or interpreting a melody that the user hums. Google debuted the text-to-music experiment last year, and since its introduction, users have generated over 10 million tracks. The text-to-music feature has been enhanced to enable the creation of 70-second music loops. Moreover, users can now utilize “expressive chips” for exploratory prompts, facilitating the iteration of generated music.

Yim said, “With feedback and improvements to our underlying MusicLM model, we’re enabling new capabilities like higher-quality audio and faster music generation.”

Google stated that both ImageFX and MusicFX use SynthID to watermark their outputs, so that artwork and songs produced by these tools can be identified as AI-generated.