Nvidia open-sources advanced AI agents with Llama Nemotron models

US chipmaker Nvidia has launched Llama Nemotron, a new family of open models whose reasoning capabilities are designed to give developers and companies a foundation for building advanced AI agents that can work independently or in teams to solve complex tasks with high accuracy. According to the company, the models were refined during post-training to improve their performance on mathematics, coding, reasoning, and complex decision-making tasks.

As explained in a post on the company's blog, this post-training process has increased the models' accuracy by up to 20% over the base model and improved inference speed by up to five times compared with other leading open reasoning models.

The models thus offer on-demand AI reasoning capabilities and, thanks to the improvements in inference performance, can handle more complex reasoning tasks while reducing operational costs for companies. Nvidia has noted that some of the leading companies developing AI agents, such as Accenture, CrowdStrike, Deloitte, Microsoft, and ServiceNow, are already collaborating with the chipmaker on their reasoning models and software.

Nano, Super and Ultra AI model sizes

As detailed by Nvidia’s founder and CEO, Jensen Huang, the Llama Nemotron model family is available as Nvidia NIM microservices in Nano, Super, and Ultra sizes, each optimized for different deployment needs. The Nano model offers high accuracy on PCs and edge devices; the Super model, in turn, provides “the best accuracy and highest performance on a single GPU.” Finally, the Ultra model delivers “the maximum agent accuracy” on multi-GPU servers.

The tech company has indicated that developers can now deploy the Llama Nemotron reasoning models with Nvidia’s new agentic AI tools and software to streamline the adoption of advanced reasoning in collaborative AI systems, all through the Nvidia AI Enterprise platform. Specifically, the Nano and Super models, along with their NIM microservices, are available as an application programming interface (API) on the build.nvidia.com service and on Hugging Face. Meanwhile, companies can run the Llama Nemotron NIM microservices with Nvidia AI Enterprise in accelerated data centres and cloud infrastructures. The new models will also help Nvidia pursue its stated goal of “superhuman productivity through AI.”
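As a rough sketch of how a developer might call one of these models, the endpoint on build.nvidia.com follows an OpenAI-compatible chat-completions convention. Note that the endpoint URL, the model identifier, and the “detailed thinking” system-prompt toggle for on-demand reasoning below are assumptions based on Nvidia’s published conventions, not details from this article:

```python
# Hypothetical sketch: calling a Llama Nemotron NIM endpoint through an
# OpenAI-compatible chat-completions API. URL, model name, and the
# reasoning toggle are assumptions, not confirmed by the article.
import json
import os
import urllib.request

API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed endpoint
MODEL = "nvidia/llama-3.1-nemotron-nano-8b-v1"  # assumed model identifier


def build_request(prompt: str, reasoning: bool = True) -> dict:
    """Build a chat-completion payload, toggling reasoning on demand."""
    # The system prompt is assumed to switch reasoning on or off.
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.6,
        "max_tokens": 1024,
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint and return the parsed JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_request("Solve: what is 17 * 24?", reasoning=True)
print(payload["messages"][0]["content"])  # "detailed thinking on"

key = os.environ.get("NVIDIA_API_KEY")
if key:  # only hit the network when a key is actually configured
    reply = send(payload, key)
    print(reply["choices"][0]["message"]["content"])
```

Because the request shape is plain OpenAI-style JSON, the same payload should work with any OpenAI-compatible client library by pointing it at the NIM base URL.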

Unveiling Nvidia Dynamo

Nvidia has also announced Nvidia Dynamo, free open-source inference software designed to accelerate and scale AI reasoning models in AI factories with maximum efficiency at lower cost. The company highlights the importance of efficiently coordinating AI inference requests across a large fleet of GPUs so that AI factories operate at the lowest possible cost. With Nvidia Dynamo, the successor to the Nvidia Triton Inference Server, the company offers a way to maximise token revenue generation for AI factories deploying reasoning models.

Specifically, Dynamo accelerates inference communication between GPUs and uses disaggregated serving to separate the processing and generation phases of large language models (LLMs) onto different GPUs. This allows each phase to be optimised independently, ensuring maximum utilisation of GPU resources. As a result, Nvidia claims that, with the same number of GPUs, Dynamo “doubles the performance and revenue of AI factories” serving Llama models on the current Nvidia Hopper platform. Nvidia Dynamo is open-source and compatible with PyTorch, SGLang, Nvidia TensorRT-LLM, and vLLM, allowing companies, startups, and researchers to develop and optimise AI model deployments.
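To make the disaggregated-serving idea concrete, here is a toy Python sketch, not Dynamo's actual API, in which a prompt-processing (prefill) pool and a token-generation (decode) pool run as separate worker types, with the KV cache handed off between them:

```python
# Toy illustration (not Dynamo's real interface): disaggregated serving
# splits an LLM request into a compute-heavy prefill phase and a
# memory-bound decode phase so each can run on separately tuned GPU pools.
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    output_tokens: int
    kv_cache: int = 0   # KV-cache size produced by prefill
    generated: int = 0  # tokens produced by decode


class PrefillWorker:
    """Processes the whole prompt once, producing the KV cache."""

    def run(self, req: Request) -> Request:
        req.kv_cache = req.prompt_tokens  # cache grows with prompt length
        return req


class DecodeWorker:
    """Generates output tokens from the transferred KV cache."""

    def run(self, req: Request) -> Request:
        assert req.kv_cache > 0, "decode needs a transferred KV cache"
        req.generated = req.output_tokens
        return req


# In a real disaggregated deployment the KV cache is transferred between
# GPU pools over fast interconnect; here it travels inside the Request.
prefill_pool = [PrefillWorker(), PrefillWorker()]  # compute-optimised pool
decode_pool = [DecodeWorker()]                     # memory-optimised pool


def serve(req: Request) -> Request:
    req = prefill_pool[0].run(req)   # phase 1: prompt processing
    return decode_pool[0].run(req)   # phase 2: token generation


done = serve(Request(prompt_tokens=512, output_tokens=64))
print(done.generated)  # 64
```

Because the two pools can be sized and scheduled independently, a deployment can add decode capacity for long generations without over-provisioning prefill compute, which is the efficiency gain the disaggregated design targets.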

Marc Cervera is a freelance journalist based in Barcelona, Spain, with over four years of experience contributing to leading Spanish and international media outlets. He holds a double degree in Journalism and Political Science from Universitat Abat Oliba and an MA in Political Science from the University of Essex. Marc has lived in the US, UK, Spain, and the Netherlands, and his work primarily explores economics, innovation, and politics.