Nvidia Nemotron 3 Ultra sets a new open-US AI benchmark, but China still leads

Nvidia’s Nemotron 3 Ultra is a real milestone for the US open-model camp: according to Artificial Analysis, it is the strongest open model from the United States to date. It has roughly 550 billion total parameters, of which about 55 billion are active at any given time, and it scores 48 on the Artificial Analysis Intelligence Index (AAII).

That matters, but it does not rewrite the leaderboard. The same index still puts China’s Kimi K2.6 ahead at 54, while the strongest closed model, Opus 4.8, sits higher still at 61. In other words, Nemotron 3 Ultra raises the ceiling for open US models without closing the gap to the best systems overall: it trails the top open model by 6 points and the top closed model by 13.

The release timing adds to the significance. Nvidia says Nemotron 3 Ultra will be available on June 4 through Hugging Face, OpenRouter, and other platforms. For teams that care about where a model can be run, integrated, audited, or swapped out, that distribution footprint is almost as important as the benchmark score itself.

A new open-US apex, but not a new global apex

The most useful way to read Nemotron 3 Ultra is not as a victory lap, but as a recalibration of what “strong open US model” now means. On the Artificial Analysis leaderboard, it lands ahead of other open US entries such as Gemma 4 31B, Nemotron 3 Super, and gpt-oss-120b. That makes it the reference point among open US models for deployment-oriented work.

Yet the chart still tells a more complicated story. Kimi K2.6 remains the top open model in the comparison set at 54, and Opus 4.8 still leads overall at 61. So while Nemotron 3 Ultra is the best open US option in this snapshot, it is not the best model across the board, and it is not the strongest open model globally.

For technical teams, that distinction matters. The decision is no longer framed as open versus closed in the abstract. It is open-US versus open-China versus closed frontier systems, with different trade-offs in quality, access, and operational control.

Why the throughput number may matter more than the score for some teams

Nemotron 3 Ultra’s other headline figure is its speed. On DeepInfra, Artificial Analysis reports throughput of more than 300 tokens per second. That places it in what the benchmark source describes as a particularly attractive quadrant: strong intelligence paired with fast output.

That combination changes the model’s practical profile. A high score with sluggish generation can still be hard to use in production. A fast model with middling capability may be good for orchestration but weak for reasoning-heavy tasks. Nemotron 3 Ultra appears to sit in a more usable middle ground for latency-sensitive applications where response time matters as much as answer quality.
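To make the latency point concrete, here is a minimal sketch that converts a throughput figure into rough generation time. It uses 300 tokens per second as a floor, since the reported figure is "more than 300"; the response lengths and the optional first-token delay are illustrative assumptions, not measurements from the benchmark.

```python
# Rough generation-time estimate from a throughput figure.
# REPORTED_TPS is the lower bound cited in the article ("more than 300").
# Response lengths and first-token delay are hypothetical illustration values.

REPORTED_TPS = 300.0


def generation_seconds(output_tokens: int,
                       tokens_per_second: float,
                       first_token_delay_s: float = 0.0) -> float:
    """Estimate wall-clock seconds to stream `output_tokens` tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return first_token_delay_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Short reply, medium answer, long report (hypothetical sizes).
    for tokens in (150, 600, 3000):
        seconds = generation_seconds(tokens, REPORTED_TPS)
        print(f"{tokens:>5} tokens -> about {seconds:.1f} s")
```

At that floor, a 600-token answer streams in about two seconds. Real latency also depends on queueing, prompt length, and provider load, none of which this estimate models.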

The 550B total-parameter figure, with around 55B active, also points to the deployment calculus behind the model. Sparse activation can make a large model more tractable than a dense one of similar total size, but it still demands careful memory and compute planning. In production, that affects everything from GPU selection to batching strategy to whether the model is realistic for self-hosting versus managed inference.
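A back-of-the-envelope calculation shows why the total and active counts pull in different directions. The sketch below assumes that all weights must be resident in memory while only the active parameters contribute to per-token compute, and it uses the common rule of thumb of about 2 FLOPs per active parameter per generated token. The numeric precisions are assumptions: the article does not say which formats the model ships in, and the estimate ignores KV cache and activation memory.

```python
# Back-of-the-envelope sizing for a sparsely activated model.
# Parameter counts come from the article; precisions are assumptions.

TOTAL_PARAMS = 550e9   # roughly 550 billion total parameters
ACTIVE_PARAMS = 55e9   # roughly 55 billion active per token

# Assumed storage formats (bytes per parameter), for illustration only.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used for any single token."""
    return active / total


def approx_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter."""
    return 2.0 * active_params


if __name__ == "__main__":
    for label, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{label:>6} weights: about {gb:,.0f} GB")
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
    print(f"approx FLOPs per token: {approx_flops_per_token(ACTIVE_PARAMS):.2e}")
```

Under these assumptions the weights alone need somewhere between a few hundred gigabytes and more than a terabyte, while per-token compute looks closer to that of a 55B dense model. That asymmetry is why the self-host versus managed-inference question stays open even for a fast model.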

For product teams, the immediate question is not whether Nemotron 3 Ultra is the absolute best model available. It is whether its mix of speed, accessibility, and sufficient capability is a better operational fit for a given workload than a slower closed model or a higher-scoring but less accessible alternative.

Openness changes the integration game

The strongest argument for an open model like Nemotron 3 Ultra is not that openness magically improves accuracy. It is that openness changes the surrounding system.

Open distribution through Hugging Face and OpenRouter lowers friction for evaluation, routing, fine-tuning experiments, and vendor comparison. It gives infrastructure teams more freedom to test the model in their own stack, to benchmark it against internal traffic, and to build safeguards, adapters, and fallback paths around it. That makes the model more attractive to organizations that want leverage over deployment details rather than a single black-box API relationship.
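One way to picture the "fallback paths" idea is a small routing wrapper that tries model backends in order and moves on when one fails. This is a sketch only: the backend functions below are hypothetical stand-ins, not real Hugging Face or OpenRouter client calls, and a production router would also need timeouts, retries, and logging.

```python
# Minimal fallback router: try each backend in order, return the first success.
# The backends here are hypothetical stubs standing in for real model clients.
from typing import Callable, List, Sequence, Tuple

Backend = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str,
                           backends: Sequence[Backend]) -> Tuple[str, str]:
    """Return (backend_name, completion) from the first backend that succeeds."""
    errors: List[str] = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure triggers the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Hypothetical primary backend that simulates an outage."""
    raise TimeoutError("simulated timeout")


def stable_secondary(prompt: str) -> str:
    """Hypothetical secondary backend that always answers."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    used, text = complete_with_fallback(
        "hello",
        [("primary", flaky_primary), ("secondary", stable_secondary)],
    )
    print(used, "->", text)
```

Because each backend is just a callable, the same wrapper can sit in front of a self-hosted open model, a managed endpoint, or a closed API without changing the calling code.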

But the benchmark gap remains a constraint on how far that advantage can go by itself. If your workload depends on the very top end of reasoning or instruction-following performance, the best Chinese open models and the best closed models still define the upper bound. Open access may accelerate integration, reproducibility, and ecosystem tooling, but it does not erase the value of raw model capability.

That is the central tension Nemotron 3 Ultra exposes. The model strengthens the US open ecosystem, but it does so inside a market structure where the performance leaders are still elsewhere.

What teams should do with this release

For builders and procurement teams, the practical response is diversification rather than ideological commitment.

If you are selecting a model for a real product, it makes sense to keep a Nemotron-based option in the candidate set, especially for workloads where throughput, controllability, and platform availability matter. But it is just as important to benchmark against leading Chinese open models and top closed systems before locking in a default.
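A shortlist along these lines can be sketched in a few lines of code. The scores below are the index figures cited in this article; the only throughput figure reported here is Nemotron 3 Ultra's, so the others are left as unknown rather than guessed. The thresholds and field names are hypothetical and should be replaced by a team's own requirements and internal benchmarks.

```python
# Shortlist candidate models by score and openness, and flag missing data.
# Scores are those cited in the article; None marks figures it does not report.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    index_score: int
    is_open: bool
    tokens_per_second: Optional[float]  # None = not reported in the article


CANDIDATES = [
    Candidate("Nemotron 3 Ultra", 48, True, 300.0),  # 300 is a reported floor
    Candidate("Kimi K2.6", 54, True, None),
    Candidate("Opus 4.8", 61, False, None),
]


def shortlist(candidates: List[Candidate],
              min_score: int,
              require_open: bool = False) -> List[Candidate]:
    """Keep candidates meeting the score floor, highest score first."""
    kept = [
        c for c in candidates
        if c.index_score >= min_score and (c.is_open or not require_open)
    ]
    return sorted(kept, key=lambda c: c.index_score, reverse=True)


def needs_throughput_measurement(candidates: List[Candidate]) -> List[str]:
    """Names of candidates whose throughput still has to be measured."""
    return [c.name for c in candidates if c.tokens_per_second is None]


if __name__ == "__main__":
    for c in shortlist(CANDIDATES, min_score=45, require_open=True):
        print(c.name, c.index_score)
    print("measure throughput for:", needs_throughput_measurement(CANDIDATES))
```

Keeping unknown throughput as an explicit gap, rather than dropping those models, reflects the point above: the comparison is only complete once every candidate has been measured on the team's own traffic.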

Teams should also look closely at deployment constraints. Nemotron 3 Ultra’s speed profile may reduce user-visible latency, but only if your serving stack is tuned for it. Memory footprint, batching, routing logic, and hardware provisioning will all shape whether the model’s headline throughput translates into actual product performance.

The June 4 release window is also worth watching for more than one reason. It will tell teams how broadly the model is exposed across platforms, how quickly it gets adopted into inference providers, and whether the surrounding tooling ecosystem matures fast enough to make the benchmark gains operationally useful.

The bottom line

Nemotron 3 Ultra is a meaningful achievement for the US open-model ecosystem. At 550B total parameters, around 55B active, an AAII score of 48, and more than 300 tokens per second, it sets a new bar for open US systems and gives technical teams a fast, accessible option worth testing.

But the broader race still belongs to China at the open end and to closed models at the absolute top. The lesson for product teams is not that openness wins outright. It is that openness now buys speed of integration, deployment flexibility, and ecosystem leverage, while the highest benchmark scores continue to come from elsewhere.

AI News Desk
