Highlights:

  • NIMs allow developers to fine-tune models with proprietary data and quickly launch AI applications without requiring significant customization.
  • Nvidia unveiled advances in generative physical AI, including the Metropolis reference workflow for creating interactive visual AI agents.

Nvidia Corp. extended its Nvidia Inference Microservices (NIM) library at the recent Siggraph conference in Denver. The additions cover advanced visual modeling, physical environments, and a range of vertical applications.

Among the highlights are improved support for three-dimensional training and inferencing and the availability of Hugging Face Inc.’s inference-as-a-service on Nvidia’s cloud.

Nvidia AI Enterprise houses a collection of containerized microservices called NIM that make deploying AI models simpler and quicker. Each inference engine is tuned for specific hardware configurations to minimize latency and operating expenses while improving performance and scalability, and it can be accessed through application programming interfaces. NIMs allow developers to fine-tune models with proprietary data and quickly launch AI applications without requiring significant customization.
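As a sketch of that API-driven access pattern, the example below queries an Nvidia-hosted NIM through its OpenAI-compatible endpoint. The model name is one example from Nvidia's hosted catalog, and the API key is a placeholder; this is a minimal illustration, not Nvidia's reference code.

```python
# Minimal sketch: querying a hosted NIM over its OpenAI-compatible API.
# The base URL follows Nvidia's hosted-catalog convention; the model name
# is one example, and the API key is a placeholder credential.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # Nvidia-hosted NIM gateway
    api_key="NVIDIA_API_KEY",  # placeholder; substitute a real key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example catalog model
    messages=[{"role": "user", "content": "Summarize what a NIM is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

The same containers can be pulled and served on-premises, in which case only the base URL changes.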

Nvidia’s announcement that Hugging Face will offer inferencing-as-a-service on Nvidia’s DGX Cloud will benefit Hugging Face’s 4 million developers by delivering faster performance and easier access to serverless inferencing. Hugging Face offers a library of pre-trained models for natural language processing (NLP) tasks such as text classification, translation, and question answering, along with a platform built explicitly for developing and deploying NLP and machine learning applications. It also provides a sizable dataset repository tailored for use with Transformers, an open-source Python library that supplies tools for working with natural language processing models.
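A minimal sketch of serverless inferencing through Hugging Face's client library follows. The model ID is an example, the token is a placeholder, and whether a given request is served by the DGX Cloud backend depends on the deployment option selected on the Hugging Face side.

```python
# Sketch of serverless inference via Hugging Face's huggingface_hub client.
# Model ID is an example; the token is a placeholder. Backend selection
# (including the DGX Cloud option from the announcement) happens on the
# Hugging Face side, not in this code.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    token="HF_TOKEN",  # placeholder credential
)

result = client.chat_completion(
    messages=[{"role": "user", "content": "Translate 'good morning' to French."}],
    max_tokens=64,
)
print(result.choices[0].message.content)
```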

Nvidia also unveiled advances in generative physical AI, such as the Metropolis reference workflow for creating interactive visual AI agents. Metropolis is a set of tools and developer workflows for building, deploying, and scaling generative AI applications across hardware platforms. The company additionally revealed new NIM microservices that help developers teach real-world machines to perform intricate tasks.

Among the announcements are three new fVDB (Fast Voxel Database) NIM microservices that enable novel deep learning frameworks for three-dimensional environments. fVDB is a new deep-learning framework designed to produce virtual worlds prepared for artificial intelligence. It is built on OpenVDB, an industry-standard library of data structures and programs for modeling and visualizing sparse volumetric data, such as clouds, fire, water, and smoke.

Compared with previous frameworks, fVDB offers four times the spatial scale, 3.5 times the performance, and access to a vast library of real-world datasets. It streamlines workflows by combining capabilities that previously required several separate deep learning libraries.
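fVDB's own Python API is not detailed in the announcement, but the sparse volumetric representation it inherits can be illustrated with pyopenvdb, OpenVDB's Python binding (built as an optional part of OpenVDB). The grid name and voxel block below are arbitrary examples.

```python
# Sketch using pyopenvdb (OpenVDB's Python binding) to build the kind of
# sparse volumetric grid that fVDB extends to deep learning workloads.
# This shows the underlying data structure, not fVDB's own API.
import pyopenvdb as vdb

grid = vdb.FloatGrid()
grid.name = "density"

# Activate a small block of voxels; everything outside it stays implicitly
# empty, which is what makes the representation sparse and scalable.
accessor = grid.getAccessor()
for i in range(8):
    for j in range(8):
        for k in range(8):
            accessor.setValueOn((i, j, k), 1.0)

print(grid.activeVoxelCount())        # 512 active voxels in an unbounded index space
vdb.write("smoke.vdb", grids=[grid])  # standard .vdb file, readable by DCC tools
```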

Nvidia additionally introduced three more microservices: USD Code, USD Search, and USD Validate, all of which utilize the Universal Scene Description (USD) open-source interchange format to create diverse 3D scenes.

USD Code can answer OpenUSD knowledge questions and generate Python code. USD Search provides natural language access to extensive libraries of OpenUSD 3D and image data. USD Validate checks uploaded files for compatibility with OpenUSD release versions and produces a fully rendered, path-traced image using Omniverse cloud APIs.
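These services all operate over OpenUSD content. For readers unfamiliar with the format, the short sketch below builds a minimal OpenUSD scene with the open-source pxr Python API (available via the usd-core package); the file path and prim names are arbitrary examples.

```python
# Minimal OpenUSD scene using the open-source pxr Python API.
# File path and prim names are arbitrary examples.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("scene.usda")            # human-readable .usda layer
xform = UsdGeom.Xform.Define(stage, "/World")        # root transform prim
sphere = UsdGeom.Sphere.Define(stage, "/World/Ball") # a simple geometric prim
sphere.GetRadiusAttr().Set(2.0)
stage.GetRootLayer().Save()
```

Scenes like this are what USD Search indexes, USD Validate checks, and USD Code helps generate.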

Nvidia’s Vice President of Omniverse and Simulation Technology, Rev Lebaredian, said, “We built the world’s first generative AI models that can understand OpenUSD-based language, geometry, materials, physics, and spaces.”

According to Nvidia, its NIMs built explicitly for physical AI support speech and translation, realistic animation and behavior, and vision. Visual AI agents use computer vision to perceive, interact with, and reason about the physical world.

They are driven by a novel class of generative AI models known as vision language models, which improve performance, accuracy, decision-making, and interaction. Nvidia’s Omniverse and OVX supercomputers can be used to hone skills in a digital twin, while its AI and DGX supercomputers can be utilized to train physical AI models.

Applications include robotics: Nvidia said it will offer a range of services, models, and computing platforms to the world’s top robot builders, AI model developers, and software developers, enabling them to develop, train, and build the next generation of humanoid robotics.

Offerings include the OSMO orchestration service for managing multistage robotics workloads, NIM microservices and frameworks for robot simulation and learning, and an AI- and simulation-enabled teleoperation workflow that drastically reduces the amount of human demonstration data required to train robots.

The visual output of generative AI is generally “random and inaccurate, and the artist can’t edit fine details exactly how they want. With Omniverse and NIM microservices, the designer or artist builds a ground-truth 3D scene that conditions the generative AI. They assemble their scene in Omniverse, which lets them aggregate brand-approved assets like a Coke bottle and various models for props and the environment into one scene,” said Lebaredian.

For image generation using text or image prompts, Nvidia NIMs will be available for Shutterstock Inc.’s 3D asset production and Getty Images Holdings Inc.’s 4K image generation API. Both are built on Nvidia Edify, a multimodal architecture for visual generative AI.

Lebaredian said, “We’ve been investing in OpenUSD since 2016, making it, and therefore Omniverse, easier and faster for industrial enterprises and physical AI developers to develop performant models.” Nvidia has also been working with Apple Inc., which co-founded the Alliance for OpenUSD, to build a hybrid rendering pipeline that streams from its Graphics Delivery Network to the Apple Vision Pro. Software development kits and APIs that enable this on Omniverse are now available through an early access program.

To overcome the lack of real-world data that frequently restricts model training, developers can leverage NIM microservices and Omniverse Replicator to create generative AI-enabled synthetic data pipelines.
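A sketch of such a pipeline using Omniverse Replicator's scripting pattern appears below. It runs inside Omniverse (omni.replicator.core is not a standalone pip package), and the scene contents, randomization ranges, and output settings are illustrative assumptions.

```python
# Sketch of a synthetic-data pipeline with Omniverse Replicator.
# Runs inside an Omniverse Kit environment; scene contents, ranges,
# and output directory are illustrative assumptions.
import omni.replicator.core as rep

with rep.new_layer():
    # A simple prop whose pose is randomized every frame to diversify the data.
    cone = rep.create.cone(semantics=[("class", "cone")])

    with rep.trigger.on_frame(num_frames=100):
        with cone:
            rep.modify.pose(
                position=rep.distribution.uniform((-2, 0, -2), (2, 0, 2)),
                rotation=rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
            )

    # Render frames and write RGB images plus 2D bounding-box labels,
    # producing labeled training data without real-world capture.
    render_product = rep.create.render_product("/OmniverseKit_Persp", (1024, 1024))
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_synthetic_out", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])
```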

Soon to come are the fVDB Mesh Generation, USD Layout, and USD SmartMaterial NIM microservices; fVDB Mesh Generation creates an OpenUSD-based mesh rendered using Omniverse APIs.