The AI behavior models that control how robots interact with the physical world have not been advancing as quickly as GPT-style language models. However, new multiverse ‘world simulators’ from Nvidia and Google have the potential to rapidly change this situation.
There is a chicken-and-egg issue slowing down AI robotics progress. Large language model (LLM) AIs have benefited from massive amounts of data to train on, thanks to the vast wealth of text, image, video, and audio data available on the Internet.
On the other hand, there is much less data available for large behavior model (LBM) AIs to train on. Gathering data on 3D representations of real-world physical situations for robots and autonomous vehicles is time-consuming and expensive.
Recent announcements from Nvidia and Google DeepMind suggest that this data bottleneck will soon be cleared, significantly accelerating physical AI development.
Multiversal AI acceleration through real-world data simulation
The concept involves generating vast amounts of reliable training data using multiverse-style world simulators. These simulators can take a single real-world scenario or even just a text prompt, create a virtual model of it, and then generate an infinite number of slightly different situations.
For example, if you have data from six cameras on an autonomous car driving down a street on a summer day, you can virtualize this data to create a 3D world representation. This representation can then be used to generate numerous slightly different scenarios, such as the same situation at different times of the day or under various weather conditions.
By creating these different virtual worlds, each with variations in the behavior of other vehicles, pedestrians, etc., you can provide a wide range of training scenarios for the autonomous car to react to. This approach allows for simulating unlikely edge cases that are rare in the real world.
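To make the idea concrete, here is a minimal Python sketch of that kind of scenario augmentation. The Scenario fields, the parameter ranges, and the generate_variants helper are illustrative assumptions for this article, not Nvidia's or Google's actual interfaces; a production world simulator would vary far more than four parameters.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Scenario:
    """Illustrative stand-in for one captured real-world driving scene."""
    time_of_day: str           # e.g. "noon", "dusk", "night"
    weather: str               # e.g. "clear", "rain", "fog", "snow"
    pedestrian_count: int
    lead_vehicle_brakes: bool  # a rare edge case worth oversampling

TIMES = ["dawn", "noon", "dusk", "night"]
WEATHER = ["clear", "rain", "fog", "snow"]

def generate_variants(base: Scenario, n: int, edge_case_rate: float = 0.2):
    """Spin one captured scene into n 'multiverse' variants.

    The edge case (a lead vehicle braking hard) is deliberately sampled
    far more often here than it occurs on real roads.
    """
    rng = random.Random(42)  # fixed seed so the dataset can be regenerated
    for _ in range(n):
        yield replace(
            base,
            time_of_day=rng.choice(TIMES),
            weather=rng.choice(WEATHER),
            pedestrian_count=rng.randint(0, 12),
            lead_vehicle_brakes=rng.random() < edge_case_rate,
        )

base = Scenario("noon", "clear", pedestrian_count=3, lead_vehicle_brakes=False)
for variant in generate_variants(base, n=5):
    print(variant)
```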
Furthermore, you can generate different scenarios in which the autonomous car reacts differently in response to each situation. This simulated 3D world representation can then be used to create simulated video feeds for the car’s cameras and other sensor data.
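Continuing the sketch above, each variant can then be turned into per-camera frames plus matching ground-truth labels, yielding supervised training pairs. The six camera names are assumptions mirroring the earlier example, and render_frame is a toy placeholder that ignores its inputs; a real pipeline would rasterize or neurally render the 3D world representation.

```python
import numpy as np

# Assumed six-camera rig; the names are illustrative.
CAMERAS = ["front", "front_left", "front_right",
           "rear", "rear_left", "rear_right"]

def render_frame(scenario, camera: str, height: int = 720, width: int = 1280):
    """Toy stand-in for rendering one camera view of the simulated scene.

    This ignores the scenario and camera and just returns a correctly
    shaped random image; a real system would use a graphics engine or
    a neural renderer here.
    """
    rng = np.random.default_rng()
    return rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

def to_training_sample(scenario):
    """Turn one scenario variant into a (sensor feeds, labels) pair."""
    feeds = {cam: render_frame(scenario, cam) for cam in CAMERAS}
    labels = {"lead_vehicle_brakes": scenario.lead_vehicle_brakes}
    return feeds, labels

# e.g. feeds, labels = to_training_sample(next(generate_variants(base, n=1)))
```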
Nvidia founder and CEO Jensen Huang announced the company’s Cosmos world simulation model during his CES keynote, pitching it as a way to democratize physical AI development by putting these tools within reach of ordinary developers.
The Cosmos model can run in real time, giving AI models foresight through multiverse simulation: it spins up multiple possible futures so a model can evaluate its options before acting.
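That kind of foresight amounts, roughly, to model-predictive control: simulate several candidate futures per possible action and commit to the action whose futures score best. Here is a hedged sketch of that loop; simulate_future and its risk numbers are invented stand-ins for illustration, not the Cosmos API.

```python
import random

ACTIONS = ["brake", "maintain_speed", "change_lane_left", "change_lane_right"]

def simulate_future(state, action, rng):
    """Stand-in for one world-model rollout; returns a predicted risk score.

    A real world foundation model would roll the scene forward frame by
    frame; here we fake it with noise around a per-action base risk.
    """
    base_risk = {"brake": 0.1, "maintain_speed": 0.5,
                 "change_lane_left": 0.3, "change_lane_right": 0.4}[action]
    return base_risk + rng.gauss(0, 0.1)

def choose_action(state, rollouts_per_action: int = 16, seed: int = 0):
    """Pick the action whose simulated futures carry the lowest average risk."""
    rng = random.Random(seed)
    def avg_risk(action):
        return sum(simulate_future(state, action, rng)
                   for _ in range(rollouts_per_action)) / rollouts_per_action
    return min(ACTIONS, key=avg_risk)

print(choose_action(state={"scene": "wet road, slow lead vehicle"}))
```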
Video: NVIDIA Cosmos: A World Foundation Model Platform for Physical AI
The computational and data requirements for this type of simulation are immense. To tame them, Nvidia introduced the Cosmos Tokenizer, which compresses images and video into far fewer tokens and thereby substantially speeds up processing.
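Some back-of-the-envelope arithmetic shows why tokenization matters. The compression factors below are assumptions chosen for illustration (a temporal factor of 8 and a spatial factor of 16 per axis are plausible for modern video tokenizers), not Nvidia's published figures.

```python
def token_count(frames, height, width, t_factor, s_factor):
    """Tokens produced by a video tokenizer that compresses by t_factor
    temporally and by s_factor along each spatial axis."""
    return (frames // t_factor) * (height // s_factor) * (width // s_factor)

raw_pixels = 121 * 720 * 1280          # ~5 s of 24 fps video at 720p
tokens = token_count(121, 720, 1280, t_factor=8, s_factor=16)
print(f"raw pixels: {raw_pixels:,}")   # 111,513,600
print(f"tokens:     {tokens:,}")       # 54,000
print(f"reduction:  {raw_pixels / tokens:,.0f}x")
```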
Nvidia has already garnered support from various companies in the robotics industry for the Cosmos initiative, including those working on humanoid robots and autonomous vehicles.
Google DeepMind is pursuing a similar direction, developing world simulation models of its own to accelerate physical AI across a range of applications.
Overall, advances in robot-embodied LBMs hold enormous potential to reshape industries that interact with the physical world, and the sector's rapid progress is expected to bring transformative changes in the coming years and decades.
Source: Nvidia / Google DeepMind