
Nvidia launched Nemotron 3 Super, a 120-billion-parameter open-weight model designed for large-scale agentic AI systems. The company announced the release on Wednesday, positioning the model for speed and efficiency.
The model targets enterprise deployment for complex tasks requiring long-context processing. It is available immediately on several major platforms, expanding access for developers and researchers.
Nemotron 3 Super is available on build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. Enterprises can access it via Google Cloud’s Vertex AI, Oracle Cloud Infrastructure, and soon on Amazon Bedrock and Microsoft Azure. The model is also accessible on Coreweave, Crusoe, Nebius, and Together AI.
The model uses a hybrid latent mixture-of-experts and Mamba-Transformer architecture. This structure lets it activate four times as many expert specialists during inference as previous models at the same compute cost.
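The efficiency idea behind mixture-of-experts routing can be illustrated with a minimal sketch. This is a generic top-k MoE router, not Nvidia's actual implementation: every expert is scored, but only the top_k highest-scoring experts run for a given input, so compute cost scales with the number of active experts rather than the total expert count.

```python
import math
import random

def moe_forward(x, experts, router, top_k=4):
    # Hypothetical top-k mixture-of-experts sketch, not Nvidia's
    # implementation. The router scores all experts, but only the
    # top_k best-scoring ones are actually evaluated.
    scores = [sum(xi * wi for xi, wi in zip(x, col)) for col in router]
    top = sorted(range(len(experts)), key=lambda i: scores[i])[-top_k:]
    exps = [math.exp(scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]  # softmax over selected experts only
    outs = [experts[i](x) for i in top]      # cost scales with top_k, not len(experts)
    return [sum(w * o[d] for w, o in zip(weights, outs)) for d in range(len(x))]

random.seed(0)
dim, n_experts = 8, 16

def make_expert():
    # Each toy "expert" is just a random linear map.
    W = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [sum(xi * wij for xi, wij in zip(x, row)) for row in W]

experts = [make_expert() for _ in range(n_experts)]
router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [random.gauss(0, 1) for _ in range(dim)]
y = moe_forward(x, experts, router, top_k=4)
```

Here only 4 of the 16 experts are evaluated per input, which is how an MoE model can grow its total parameter count without a matching increase in inference cost.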
Nvidia trained the model on synthetic data from other frontier reasoning models. The company is publishing over 10 trillion tokens of training data and 15 training environments with this release.
According to Artificial Analysis benchmarks, the model scores 36 for overall intelligence. This places it above gpt-oss-120B at 33 points but behind Gemini 3.1 Pro and GPT-5.4, which both score 57.
To find the best AI model for your needs, check out our AI Model Leaderboard!
The model generates 478 output tokens per second, the fastest of any model currently available. Nvidia states it delivers 7.5x higher inference throughput than Qwen3.5-122B.
Nvidia has not announced a release date for Nemotron 3 Ultra, the family’s largest 500-billion-parameter model. The company teased the Ultra variant in its original announcement last year.