NVIDIA unveils H200 GPU featuring 141GB HBM3e memory and 2x inference performance improvement over previous generation.
NVIDIA has officially announced the H200 GPU, which the company says delivers up to 2x the large language model inference performance of the previous generation. The new chip features 141GB of HBM3e memory, the largest capacity NVIDIA has shipped in a GPU to date.
The H200 is designed for the demanding requirements of running large generative AI models, with memory bandwidth and capacity improvements that allow models in the 175-billion-parameter class to run inference on a single GPU when quantized: at 4-bit precision, 175 billion parameters occupy roughly 88GB of weights, comfortably within 141GB, whereas the same model needs about 350GB at FP16.
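To make that capacity claim concrete, here is a minimal back-of-envelope sketch of the weight footprint at common precisions. It assumes weights dominate memory and ignores KV cache and activation overhead, so it is an upper bound on what fits, not a deployment guide:

```python
# Back-of-envelope: weight-memory footprint of a 175B-parameter model
# at common precisions, compared against the H200's 141GB of HBM3e.
# Assumption: weights only; KV cache and activations are ignored.

GB = 1e9  # decimal gigabytes, matching marketing figures

def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param / GB

N_PARAMS = 175e9
for label, bytes_per_param in [("FP16", 2.0), ("FP8/INT8", 1.0), ("INT4", 0.5)]:
    gb = weight_footprint_gb(N_PARAMS, bytes_per_param)
    verdict = "fits" if gb <= 141 else "does not fit"
    print(f"{label:>8}: {gb:6.1f} GB -> {verdict} in 141 GB")
```

Running this prints 350GB for FP16 and 175GB for FP8/INT8 (neither fits), versus 87.5GB at INT4, which does.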
"H200 enables AI inference at scale with unprecedented efficiency," said NVIDIA's CEO Jensen Huang. "Customers can now run the largest models with fewer GPUs, dramatically reducing deployment costs."
Key specifications include:

- 141GB HBM3e memory (up from 80GB in the H100)
- 4.8TB/s memory bandwidth
- 2x inference performance vs. the H100
- 1.8x training performance vs. the H100
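The bandwidth figure is the one that matters most for inference: single-stream autoregressive decoding is typically memory-bandwidth bound, since generating each token requires streaming all the weights through the GPU once. A rough sketch of the resulting throughput ceiling follows; the H100 bandwidth figure (3.35TB/s, SXM) and the 70B FP16 workload are assumptions for illustration, not from the announcement:

```python
# Rough upper bound on single-stream decode throughput, assuming generation
# is memory-bandwidth bound: each new token reads every weight from HBM once.

H200_BANDWIDTH_TBS = 4.8   # from the spec list above
H100_BANDWIDTH_TBS = 3.35  # H100 SXM figure (assumption, not stated above)

def max_tokens_per_second(bandwidth_tbs: float,
                          n_params: float,
                          bytes_per_param: float) -> float:
    """Theoretical decode ceiling: bytes/s of bandwidth over bytes of weights."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_tbs * 1e12 / model_bytes

# Hypothetical workload: a 70B-parameter model held at FP16
for name, bw in [("H100", H100_BANDWIDTH_TBS), ("H200", H200_BANDWIDTH_TBS)]:
    print(f"{name}: ~{max_tokens_per_second(bw, 70e9, 2.0):.0f} tokens/s ceiling")
```

On this simple model, the bandwidth bump alone accounts for roughly a 1.4x speedup (about 24 vs. 34 tokens/s in the example); the remainder of the claimed 2x would have to come from the larger memory capacity enabling bigger batches, plus software improvements.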
Cloud pricing is expected to range from $3.72 to $10.60 per GPU-hour depending on the provider, with on-demand rates at some providers expected to undercut comparable previous-generation pricing. Major cloud providers including AWS, Google Cloud, and Microsoft Azure have announced plans to offer H200 instances.
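For readers evaluating those rates, a quick sketch of how an hourly price translates into a per-token serving cost may help. The quoted price range comes from the article; the aggregate throughput figure is an assumed placeholder, not a benchmark result:

```python
# Hypothetical cost sketch: converting $/GPU-hour into $/1M generated tokens.
# ASSUMED_THROUGHPUT is a placeholder for batched serving throughput,
# not a measured H200 number.

def cost_per_million_tokens(price_per_hour: float,
                            tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1e6

ASSUMED_THROUGHPUT = 1000.0  # tokens/s across all batched requests (assumption)
for price in (3.72, 10.60):  # the article's quoted cloud price range
    cost = cost_per_million_tokens(price, ASSUMED_THROUGHPUT)
    print(f"${price:.2f}/hr -> ${cost:.2f} per 1M tokens")
```

Under that assumed throughput, the quoted range works out to roughly $1.03 to $2.94 per million tokens; real costs will scale inversely with whatever throughput a given deployment achieves.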