Highlights:
- Nvidia made its open-source Cosmos tokenizer generally available; it is 12 times faster than existing tokenizers and gives developers high-quality tokens at exceptionally high compression rates.
- Nvidia announced Isaac Lab, an open-source robot learning framework based on Omniverse, a digital twin simulation platform that enables developers to test and operate robots in virtual worlds.
Nvidia Corp. released new tools for developers working on AI-powered robots, including humanoids. The tools facilitate quicker development cycles through blueprints, simulation, and modeling.
The company unveiled the new tools at this week’s annual Conference on Robot Learning, held in Munich, Germany, which focuses on the intersection of robotics and machine learning.
Among the resources is the Nvidia Isaac Lab robot learning framework, which was made publicly available, along with new video processing developer tools and six new humanoid robot learning workflows for Project GR00T that will facilitate building AI robot brains.
Robotics development depends on seeing and understanding the surrounding environment, which means camera footage must be broken down into a form AI models can process. Nvidia made the open-source Cosmos tokenizer generally available; it is 12 times faster than existing tokenizers and gives developers high-quality tokens at exceptionally high compression rates. It works alongside NeMo Curator to optimize and understand inputs.
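To picture what a visual tokenizer does, here is a minimal toy sketch: a frame is split into patches, and each patch is mapped to the nearest entry in a small codebook, turning thousands of pixel values into a short sequence of integer tokens. The codebook, patch size, and random frame below are illustrative stand-ins, not the actual Cosmos tokenizer.

```python
import numpy as np

# Toy patch tokenizer: each 8x8 patch of a frame is replaced by the
# index of its nearest prototype in a small "codebook". A real
# tokenizer learns the codebook; this one is random for illustration.

rng = np.random.default_rng(0)

frame = rng.random((64, 64))                 # one 64x64 grayscale frame
patch = 8                                    # 8x8 patches -> 64 patches
codebook = rng.random((256, patch * patch))  # 256 illustrative prototypes

def tokenize(frame: np.ndarray) -> np.ndarray:
    """Return one integer token per patch (nearest codebook entry)."""
    h, w = frame.shape
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            p = frame[i:i + patch, j:j + patch].reshape(-1)
            dists = np.linalg.norm(codebook - p, axis=1)
            tokens.append(int(np.argmin(dists)))
    return np.array(tokens)

tokens = tokenize(frame)
print(len(tokens))                  # 64 tokens stand in for 4,096 pixels
```

A real tokenizer learns its codebook and operates on video clips rather than single frames, but the compression arithmetic is the same: many pixel values collapse into a short token sequence that downstream models can consume.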
This also enables better “world models”: AI representations of the world that can forecast how environments and objects will react when a robot performs actions.
What happens, for instance, if a robot gripper closes on a banana? Because ripe bananas are soft, the gripper cannot close too quickly or forcefully without smashing or deforming the fruit and making a mess. A piece of paper? It needs to be grasped in an entirely different way. Each of these scenarios requires high-quality encoding and decoding of video data.
The Cosmos tokenizer enables his company to achieve significant data reduction while maintaining exceptionally high visual quality, according to Eric Jang, vice president of AI at 1X Technologies, a startup that develops humanoid robots. “This allows us to train world models with long horizon video generation in an even more compute-efficient manner,” he added.
Because not every robot AI brain can be trained in the real world, Nvidia announced Isaac Lab, an open-source robot learning framework based on Omniverse, a digital twin simulation platform that enables developers to test and operate robots in virtual worlds.
Omniverse, a real-time 3D graphics collaboration and simulation platform, lets artists, developers, and businesses create realistic 3D models and scenes of factories, cities, and other locations with fully realized physics. That makes it an effective tool for training robots by simulating virtual environments.
Developers can use Isaac Lab to train robots and tune their policies at scale for safety and performance. The framework applies to any robot embodiment, including arms, swarms, quadrupeds, and humanoids.
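What training a policy at scale looks like can be sketched under toy assumptions: a batch of observations stands in for many parallel simulated robots, and an accept-if-better random search nudges a linear policy's weights toward higher reward. The task, reward, and update rule are illustrative, not the Isaac Lab API.

```python
import numpy as np

# Toy policy training: reward is higher the closer the policy's
# actions land to a zero target, and candidate weight perturbations
# are kept only when they improve the batch-averaged reward.

rng = np.random.default_rng(1)
n_envs, obs_dim, act_dim = 32, 4, 2

def mean_reward(weights: np.ndarray, obs: np.ndarray) -> float:
    """Mean reward for a linear policy: actions = obs @ weights."""
    actions = obs @ weights
    return float(-np.linalg.norm(actions, axis=1).mean())

weights = rng.normal(size=(obs_dim, act_dim))
obs = rng.normal(size=(n_envs, obs_dim))     # one observation per sim env

init_reward = mean_reward(weights, obs)
for _ in range(200):                         # accept-if-better search
    candidate = weights + rng.normal(scale=0.1, size=weights.shape)
    if mean_reward(candidate, obs) > mean_reward(weights, obs):
        weights = candidate
final_reward = mean_reward(weights, obs)

print(init_reward, final_reward)             # reward never decreases
```

Real frameworks run thousands of physics-simulated environments and use gradient-based reinforcement or imitation learning rather than random search, but the shape of the loop, which involves batched rollouts, a scalar reward, and iterative policy updates, is the same.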
According to Nvidia, a number of commercial robot makers and research organizations worldwide, including Berkeley Humanoid, Boston Dynamics, 1X, Galbot, Fourier, Mentee Robotics, and Agility Robotics, have incorporated Isaac Lab into their workflows.
Project GR00T and humanoid robot workflows
Creating sophisticated humanoid robots is a difficult endeavor, because enabling a robot to perform even relatively simple activities takes a great deal of hardware engineering, AI training, and AI computation. Walking, seeing, and acting come naturally to humans; for robots, each must be engineered.
Nvidia’s Project GR00T gives developers access to software libraries, data pipelines, and AI foundation models for general-purpose humanoid robots, enabling them to prototype and create more quickly.
Nvidia unveiled six new Project GR00T workflow blueprints to give developers a head start on creating sophisticated humanoids. These blueprints will assist developers in giving their robots new skills.
To train robots to walk around, manipulate objects, and carry out other activities, engineers can generate realistic simulated environments with GR00T-Gen. It creates visually varied scenes and randomized scenarios, using large language models and 3D generative AI models to help build robust training environments.
GR00T-Mimic lets robots learn from human teachers. The workflow enables human demonstrators to teleoperate robots and carry out tasks such as moving boxes from shelves onto carts, or navigating a warehouse, with the goal of having the robot imitate the same behaviors in the same setting. According to Nvidia, the method uses extended reality hardware such as the Apple Vision Pro to scale up the motion data from a small number of real-world human demonstrations, helping the robots produce more natural motion on their own.
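The idea of scaling up a handful of demonstrations can be sketched as follows, with a linear least-squares fit standing in for a real imitation learner; the demonstration data and the jitter-based augmentation scheme are hypothetical, not the GR00T-Mimic implementation.

```python
import numpy as np

# Toy imitation learning: a few recorded (state, action) pairs are
# replicated with small perturbations to scale up the dataset, then a
# policy is fit to the augmented data by least squares.

rng = np.random.default_rng(2)

# Five "real" demonstrations: state -> action from a teleoperator,
# generated here from a known linear demonstrator policy.
demo_states = rng.normal(size=(5, 3))
true_policy = np.array([[1.0], [-0.5], [0.25]])
demo_actions = demo_states @ true_policy

# Scale the data: jitter each demonstration many times.
reps = 100
aug_states = np.repeat(demo_states, reps, axis=0)
aug_states += rng.normal(scale=0.01, size=aug_states.shape)
aug_actions = np.repeat(demo_actions, reps, axis=0)

# Fit the imitation policy on the augmented set.
learned, *_ = np.linalg.lstsq(aug_states, aug_actions, rcond=None)
print(np.round(learned.ravel(), 2))   # close to the demonstrator's policy
```

Production systems learn far richer nonlinear policies from motion-captured trajectories, but the principle of a few real demonstrations expanded into a much larger training set is what makes teleoperated teaching practical.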
For humanoid robots, GR00T-Dexterity and GR00T-Control offer sets of models and policies for fine-grained dexterous manipulation and whole-body control, respectively. Dexterity will benefit developers working with robots that have highly dexterous hands with actuators and knuckles, helping them handle missed grasps, grip force, and other grasping actions. Control will help coordinate the whole body’s movements when walking, moving limbs, or carrying out tasks.
GR00T-Mobility makes available to developers a collection of models for guiding humanoid robots as they walk and negotiate obstacles. Mobility enables a learning-based approach that can swiftly adapt to different settings.
Lastly, GR00T-Perception incorporates sophisticated software libraries and foundation models for human-robot interaction, helping robots recall long histories of events. Nvidia accomplished this by adding the aptly named ReMEmbR to Perception, which gives the robot a memory that personalizes human interactions while providing context and spatial awareness to improve perception, cognition, and adaptability.
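A memory of this kind can be sketched as a store of event descriptions queried by similarity; the word-overlap scoring below is an illustrative stand-in for the embedding-based retrieval a system like ReMEmbR would actually use.

```python
# Toy event memory: the robot records event descriptions and answers a
# query by returning the stored events sharing the most words with it.
# A real system would embed events and retrieve by vector similarity.

memory: list[str] = []

def remember(event: str) -> None:
    """Append an event description to the robot's long-term memory."""
    memory.append(event)

def recall(query: str, k: int = 1) -> list[str]:
    """Return the k stored events sharing the most words with the query."""
    q_words = set(query.lower().split())

    def score(event: str) -> int:
        return len(q_words & set(event.lower().split()))

    return sorted(memory, key=score, reverse=True)[:k]

remember("saw a red cart near the loading dock")
remember("charged battery at station two")
remember("picked up a box from shelf four")
print(recall("where was the cart"))   # recalls the cart sighting
```

Swapping the scoring function for learned embeddings and adding timestamps and locations to each entry is what turns this toy into the kind of spatially aware, long-horizon memory the workflow describes.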