Highlights:
- The company demonstrated numerous generative AI applications that run entirely on mobile devices, including a new image generation model, a large language model-based fitness coach, and a 3D reconstruction tool for extended reality.
- Qualcomm said ControlNet is powered by a broad portfolio of AI optimizations spanning its model architecture, specialized AI software such as the Qualcomm AI Stack and AI Engine, and the neural hardware accelerators on the device.
Recently, Qualcomm Inc. took the stage at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition to announce its latest advancements in edge-based generative artificial intelligence.
The company demonstrated several new generative AI applications running entirely on mobile devices, including an image generation model, a large language model-based fitness coach and a 3D reconstruction tool for extended reality.
Qualcomm's crowning achievement was ControlNet, a 1.5-billion-parameter image-to-image model that runs on a typical midrange smartphone. The company explained that ControlNet belongs to a class of generative AI models known as language-vision models, which enable precise control over image generation by conditioning the output on both an input image and an input text description.
In an on-stage demonstration, Qualcomm showed how ControlNet can generate new images in under 12 seconds: the user simply uploads a photo and describes the desired edit in plain English. In one instance, it uploaded a simple sketch of a kitten with the prompt "yellow kitten, photorealistic, 4k." Within seconds, the ControlNet-powered mobile device rendered a far more striking, photorealistic version of the sketch.
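Qualcomm has not released its on-device implementation, but the underlying technique is open source. The minimal sketch below, using Hugging Face's diffusers library (the model IDs are illustrative examples, and it targets a desktop GPU rather than a phone), shows how a ControlNet conditions generation on both a source image and a text prompt:

```python
# Minimal sketch of ControlNet-style conditioned image generation with the
# open-source diffusers library; an illustration of the technique, not
# Qualcomm's on-device implementation.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# An edge-conditioned ControlNet checkpoint paired with Stable Diffusion 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The sketch constrains the layout (a line drawing approximates the edge map
# this checkpoint expects); the text prompt steers the content.
sketch = load_image("kitten_sketch.png")
result = pipe(
    prompt="yellow kitten, photorealistic, 4k",
    image=sketch,
    num_inference_steps=20,
).images[0]
result.save("kitten_photorealistic.png")
```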
Qualcomm explained that ControlNet is powered by a full suite of AI optimizations across its model architecture, specialized AI software such as the Qualcomm AI Stack and AI Engine, and the neural hardware accelerators on the device itself.
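Qualcomm did not break down the individual optimizations, but stacks like this typically lean heavily on quantization. As a generic illustration only, using stock PyTorch rather than the Qualcomm AI Stack, dynamic post-training quantization converts a model's weights to 8-bit integers:

```python
# Generic illustration of one common full-stack optimization, post-training
# quantization, in stock PyTorch. This is NOT Qualcomm's toolchain, and the
# model here is a toy stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Store Linear weights as int8; activations are quantized on the fly at
# runtime, trading a little accuracy for memory and speed.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```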
Qualcomm also demonstrated how it utilized an LLM similar to OpenAI LP's ChatGPT to develop a digital fitness coach capable of natural, context-aware interactions in real time. The user merely films themselves while exercising, and an action recognition model processes the video directly on the device.
Then, based on the recognized actions, a stateful orchestrator translates them into prompts fed to the LLM, allowing the digital fitness coach to give the user feedback as the workout progresses. Qualcomm noted that this was made possible by three innovations: a vision model trained to recognize fitness activities, a language model trained to generate text grounded in visual concepts, and an orchestrator that coordinates the interplay between the two modalities to deliver live feedback.
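Qualcomm has not published this orchestrator, but the pattern it described maps onto a simple loop. In the minimal sketch below, `recognize_action` and `llm_generate` are hypothetical stand-ins for the on-device vision model and the LLM:

```python
# Minimal sketch of the vision -> orchestrator -> LLM pattern Qualcomm
# described. recognize_action() and llm_generate() are hypothetical
# callables standing in for the two on-device models.
from collections import Counter

def coach_loop(video_frames, recognize_action, llm_generate):
    """Track recognized exercises and turn them into LLM coaching prompts."""
    state = Counter()  # stateful: rep counts per exercise across the session
    feedback = []
    for frame in video_frames:
        action = recognize_action(frame)  # e.g. "squat", "push_up", or None
        if action is None:
            continue
        state[action] += 1
        # The orchestrator converts visual events into text for the LLM.
        prompt = (
            f"The user just completed {action} number {state[action]}. "
            "Give one short, encouraging coaching tip."
        )
        feedback.append(llm_generate(prompt))
    return feedback
```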
The 3D reconstruction tool for XR, an umbrella term covering augmented, virtual and mixed reality, allows developers to create highly detailed 3D models of virtually any environment entirely on a mobile device. According to Qualcomm, depth maps are generated from individual images and fused to build a 3D representation of the scene.
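Qualcomm did not detail its fusion pipeline. The sketch below illustrates the general technique of integrating per-frame depth maps into a single scene using the open-source Open3D library; the file names and camera poses are placeholders:

```python
# Sketch of depth-map fusion into one 3D scene with open-source Open3D,
# illustrative of the technique Qualcomm described, not its pipeline.
import numpy as np
import open3d as o3d

# Placeholder camera-to-world poses; a real pipeline would obtain these
# from on-device tracking (e.g. SLAM).
camera_poses = [np.eye(4) for _ in range(10)]

intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault
)
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,
    sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)

# Each frame contributes one color image and one depth map (placeholder names).
for i, pose in enumerate(camera_poses):
    color = o3d.io.read_image(f"frame_{i}.png")
    depth = o3d.io.read_image(f"depth_{i}.png")
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, convert_rgb_to_intensity=False
    )
    # integrate() expects the world-to-camera extrinsic, hence the inverse.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))

mesh = volume.extract_triangle_mesh()  # the fused 3D scene representation
o3d.io.write_triangle_mesh("scene.ply", mesh)
```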
Qualcomm stated that the resulting high-precision 3D maps can be used in a variety of AR and VR applications. As an example, it built an augmented reality demo that lets users fire virtual spheres at real objects, such as walls and furniture, and watch them rebound realistically based on accurate physics calculations.
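The bounce itself is standard rigid-body reflection: the sphere's velocity is mirrored about the surface normal of the reconstructed geometry and damped by a restitution factor. A minimal sketch of that calculation (not Qualcomm's physics engine):

```python
# Minimal sketch of the rebound physics behind an AR bounce demo: reflect
# velocity about the surface normal, damped by a restitution factor.
import numpy as np

def bounce(velocity: np.ndarray, normal: np.ndarray, restitution: float = 0.8):
    """v' = v - (1 + e) * (v . n) * n, with n the unit surface normal."""
    n = normal / np.linalg.norm(normal)
    return velocity - (1.0 + restitution) * np.dot(velocity, n) * n

# A sphere falling onto a floor (normal pointing up) rebounds upward, slower.
print(bounce(np.array([2.0, -5.0, 0.0]), np.array([0.0, 1.0, 0.0])))
# -> [2. 4. 0.]
```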
Qualcomm also applied generative AI to the creation of facial avatars for XR environments. It demonstrated a model that can take one or more 2D photographs of a person's face, apply a customized mesh and texture, and turn the image into a 3D face avatar.
The avatars can even mirror the user's expressions in real time: headset cameras track the user's eye and facial movements, and the system recreates them on the avatar. Qualcomm explained that the purpose of this model is to enable users to construct digital human avatars for use in the metaverse and in human-machine interfaces on its Snapdragon XR platform.
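Qualcomm's avatar pipeline is proprietary, but the tracking half can be sketched with the open-source MediaPipe Face Mesh, which extracts dense facial and iris landmarks from each frame; `update_avatar` below is a hypothetical hook into an avatar renderer:

```python
# Sketch of the tracking half of an avatar pipeline using open-source
# MediaPipe Face Mesh; an illustration, not Qualcomm's implementation.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,   # video mode: track landmarks across frames
    refine_landmarks=True,     # adds iris landmarks for eye movement
)

cap = cv2.VideoCapture(0)  # webcam as a stand-in for headset cameras
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        # 478 3D landmarks covering facial and eye movement.
        landmarks = results.multi_face_landmarks[0].landmark
        # update_avatar(landmarks)  # hypothetical: retarget onto the avatar
cap.release()
```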
Finally, Qualcomm demonstrated how it incorporates AI into its driver monitoring technology. In this case, it combined a computer vision model that detects unsafe driving behavior with active infrared cameras that monitor the driver's state in real time, watching for signs of distraction or fatigue. Qualcomm stated that the system, which runs on the Snapdragon Ride Flex system-on-chip, can alert the driver whenever it detects dangerous driving.
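Qualcomm did not describe the model's internals. One classic fatigue cue such systems use is the eye aspect ratio (EAR), which drops toward zero as the eyes close; the sketch below, an illustration rather than Qualcomm's system, flags a sustained run of closed-eye frames:

```python
# Illustrative sketch of the eye aspect ratio (EAR) fatigue heuristic, not
# Qualcomm's production model. `eye` holds six 2D eye-contour landmarks in
# the standard EAR ordering (corners at indices 0 and 3).
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """Ratio of eye height to width; collapses toward 0 as the eye closes."""
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)

EAR_THRESHOLD = 0.2   # typical closed-eye cutoff
CLOSED_FRAMES = 48    # about 1.6 s at 30 fps before raising an alert

def check_fatigue(ear_history: list[float]) -> bool:
    """Alert when the eyes stay closed for a sustained run of frames."""
    recent = ear_history[-CLOSED_FRAMES:]
    return len(recent) == CLOSED_FRAMES and all(e < EAR_THRESHOLD for e in recent)
```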