NVIDIA has introduced the Llama-Nemotron series, an open family of reasoning models designed to combine state-of-the-art reasoning capability with high inference efficiency. Available in three sizes (Nano at 8B, Super at 49B, and Ultra at 253B parameters), these models build on the Llama 3 architecture and leverage NVIDIA's Puzzle framework for hardware-aware neural architecture search.
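To make "hardware-aware neural architecture search" concrete, here is a toy sketch of the kind of block-selection problem a Puzzle-style search solves. It is an illustration under invented numbers, not NVIDIA's actual algorithm or scores: each layer offers cheaper candidate blocks (for example with attention removed or the FFN shrunk), and one block per layer is chosen to maximize estimated quality under a latency budget.

```python
# Toy block selection for hardware-aware NAS (illustrative only; the
# candidate blocks, quality scores, and latencies are invented, and the
# real search uses a solver rather than brute force).
from itertools import product

# (name, estimated_quality, latency_ms) per candidate block.
CANDIDATES = [
    ("full_attention_ffn", 1.00, 9.0),
    ("no_attention_ffn", 0.97, 5.5),
    ("full_attention_small_ffn", 0.95, 6.0),
]
NUM_LAYERS = 4
LATENCY_BUDGET_MS = 28.0

best_quality, best_config = -1.0, None
for config in product(CANDIDATES, repeat=NUM_LAYERS):
    latency = sum(block[2] for block in config)
    quality = sum(block[1] for block in config)
    if latency <= LATENCY_BUDGET_MS and quality > best_quality:
        best_quality, best_config = quality, config

print("chosen blocks:", [block[0] for block in best_config])
print(f"estimated quality: {best_quality:.2f}")
```

At real scale the exhaustive loop above is replaced by a proper solver, but the trade it expresses, giving up a little per-layer quality to fit a hardware budget, is the same one.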
The flagship LN-Ultra outperforms rivals such as DeepSeek-R1 on demanding scientific and mathematical benchmarks, including GPQA-Diamond and MATH500, while running efficiently on a single 8×H100 node. A standout feature is the "detailed thinking on/off" switch: a system-prompt toggle that lets users move between standard chat and deep reasoning modes at inference time, with no change to the deployed model.
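Since the mode switch is the most developer-facing feature, a minimal usage sketch helps. The following assumes an OpenAI-compatible endpoint (for example one served by vLLM) and uses a placeholder model name; the exact model ID, endpoint, and prompt wording should be checked against NVIDIA's model cards.

```python
# Minimal sketch of the reasoning toggle: the mode is selected purely
# through the system prompt. Assumes an OpenAI-compatible server is
# running locally; the model ID below is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def ask(question: str, reasoning: bool) -> str:
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    response = client.chat.completions.create(
        model="llama-nemotron",  # placeholder; use the actual model ID
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Same deployment, two behaviors: a concise chat answer versus a long
# step-by-step reasoning trace.
print(ask("How many primes are there below 30?", reasoning=False))
print(ask("How many primes are there below 30?", reasoning=True))
```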
Under the NVIDIA Open Model License, all models, their post-training dataset, and training toolchains (NeMo, NeMo-Aligner, Megatron-LM) are freely available for commercial and research use.
With FFN Fusion, which collapses runs of consecutive FFN layers into wider blocks that execute in parallel, plus FP8 inference optimizations, LN-Ultra achieves significant throughput and latency gains, demonstrating that high reasoning performance can coexist with modest hardware cost. The training recipe also combines large-scale supervised fine-tuning with a reinforcement learning stage, allowing the models to surpass their teachers and push open-source reasoning to a new level.
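The intuition behind FFN Fusion is easy to show in code. Below is a toy PyTorch sketch, not NVIDIA's implementation: once attention has been pruned from a run of blocks, the remaining residual FFN sublayers all read nearly the same input, so evaluating them in parallel and summing their outputs approximates the sequential computation while cutting sequential depth. The dimensions and the choice of blocks to fuse are assumptions for illustration.

```python
# Toy sketch of the FFN Fusion idea (illustrative; dimensions and the
# fused block pair are assumptions, not the published configuration).
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff = 64, 256

class FFN(nn.Module):
    def __init__(self):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

ffn1, ffn2 = FFN(), FFN()

def sequential(x):
    # Original computation: two residual FFN sublayers, one after another.
    x = x + ffn1(x)
    return x + ffn2(x)

def fused(x):
    # Fused computation: both FFNs read the same input, so they can run
    # in parallel. The approximation is good when the second block only
    # weakly depends on the first, which guides which blocks get fused.
    return x + ffn1(x) + ffn2(x)

x = torch.randn(1, d_model)
print("max approximation error:",
      (sequential(x) - fused(x)).abs().max().item())
```

In practice the two parallel FFNs would be merged into a single layer with concatenated weight matrices, one wide matmul instead of two narrow sequential ones, which is where the throughput gain on wide GPUs comes from.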
This release lets enterprises and researchers deploy cutting-edge reasoning pipelines while retaining full control and customization, a significant step forward for open AI development.