Deep Dive into NVIDIA GB200 Liquid Cooling Plate Design: Advanced Liquid Cooling for AI Chips

Next-generation AI chips like NVIDIA’s GB200 are pushing the boundaries of performance. But this immense power comes at a cost: staggering heat generation. A single GB200 Superchip package (two Blackwell GPUs plus a Grace CPU) consumes up to 2700 W. With that much power in such a compact space, traditional air-cooling systems simply can’t keep up. The only practical way to keep these chips within their temperature limits is advanced liquid cooling.


The Numbers Behind the GB200 Cooling Challenge

To understand the solution, we first need to look at the problem. The heat produced by the GB200 is not just high; it is extremely concentrated. This concentration is measured as heat flux density, the heat flowing through each unit of surface area.

The chip generates a heat flux exceeding 50 W/cm², equivalent to pushing 50 W of heat out through an area about the size of a small fingernail. At certain hotspots the heat flux can surge to 150 W/cm². Left uncontrolled, this heat can quickly damage the chip. The chip’s temperature rise (Tc) must therefore typically be kept below 40 °C, and the GPU itself has an even stricter limit of 30 °C.

Meeting these figures requires a cooling system with two key features:

  1. Extremely low thermal resistance (below 0.03 °C/W), so the cold plate can carry heat away from the chip with only a small temperature difference.
  2. Low flow resistance (a pressure drop of no more than 20 kPa), so the coolant passes through the plate easily without powerful, energy-hungry pumps.
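The two targets are linked by the basic relation ΔT = R_th × Q. The short Python sketch below checks them against each other. It assumes (our assumption, not a statement from the source) that each limit applies to one Blackwell GPU at its 1200 W TDP from the specification table, and that R_th is measured from the coolant inlet to the chip.

```python
# Sketch: relate cold-plate thermal resistance to chip temperature rise.
# Assumption: the whole heat load Q passes through one thermal resistance R_th
# (coolant inlet to chip). Q = 1200 W is the per-GPU TDP from the spec table.

GPU_POWER_W = 1200.0


def temperature_rise_c(r_th_c_per_w: float, power_w: float) -> float:
    """Temperature rise (degC) for thermal resistance R_th (degC/W) and load Q (W)."""
    return r_th_c_per_w * power_w


def max_thermal_resistance(dt_limit_c: float, power_w: float) -> float:
    """Largest R_th (degC/W) that keeps the rise at or below dt_limit_c."""
    return dt_limit_c / power_w


print(f"Rise at 0.03 degC/W: {temperature_rise_c(0.03, GPU_POWER_W):.1f} degC")
for limit_c in (40.0, 30.0):
    r_max = max_thermal_resistance(limit_c, GPU_POWER_W)
    print(f"R_th needed for a {limit_c:.0f} degC limit: {r_max:.4f} degC/W")
```

Under these assumptions, 0.03 °C/W gives a 36 °C rise at 1200 W, inside the 40 °C chip limit, while the stricter 30 °C GPU limit works out to about 0.025 °C/W.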

Inside the Cold Plate: Structure and Flow Physics

To reach these numbers, engineers use a microchannel cold plate, typically manufactured with a skived-fin (skiving) process.

Microchannel Structure Using Skived Fins

The cold plate is usually made of copper, an excellent thermal conductor (thermal conductivity of about 385 W/(m·K)). Special cutting tools then skive thin fins directly from the copper base, forming a series of tiny parallel channels.

  • Fin thickness (t) is generally ≤0.5mm.
  • Spacing between fins (P) is also ≤0.5mm.
  • Fin height (L) is typically ≥3mm.

These minuscule dimensions create a massive internal surface area, which is crucial for absorbing heat into the liquid coolant flowing through the channels.
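How much area the fins actually add can be estimated per fin pitch: two fin faces plus the channel floor, divided by the base width they occupy. Because heat must also conduct up each copper fin, the fin faces are discounted by the classic straight-fin efficiency tanh(mL)/(mL). This is a minimal sketch using the dimensions listed above; the convective coefficient `H_ASSUMED` is a hypothetical round number, and fin tips are assumed to touch the cover rather than the coolant.

```python
import math

# Sketch: wetted-area gain and fin efficiency of a skived-fin array.
# Geometry follows the text (t, P, L). H_ASSUMED is a hypothetical value;
# fin tips are assumed to be adiabatic (touching the cover, not wetted).

K_COPPER = 385.0    # W/(m*K), copper conductivity from the text
H_ASSUMED = 4000.0  # W/(m^2*K), hypothetical laminar-channel coefficient


def area_ratio(t: float, p: float, fin_h: float) -> float:
    """Wetted area per unit base area: two fin faces plus channel floor per pitch."""
    return (2.0 * fin_h + p) / (t + p)


def fin_efficiency(t: float, fin_h: float, h: float = H_ASSUMED, k: float = K_COPPER) -> float:
    """Straight rectangular fin, adiabatic tip: tanh(mL)/(mL), m = sqrt(2h/(k*t))."""
    m = math.sqrt(2.0 * h / (k * t))
    ml = m * fin_h
    return math.tanh(ml) / ml


def effective_area_ratio(t: float, p: float, fin_h: float) -> float:
    """Area ratio with the fin faces discounted by fin efficiency."""
    eta = fin_efficiency(t, fin_h)
    return (eta * 2.0 * fin_h + p) / (t + p)


t, p, fin_h = 0.5e-3, 0.5e-3, 3.0e-3  # metres: t = 0.5 mm, P = 0.5 mm, L = 3 mm
print(f"Raw area ratio:       {area_ratio(t, p, fin_h):.2f}")
print(f"Fin efficiency:       {fin_efficiency(t, fin_h):.3f}")
print(f"Effective area ratio: {effective_area_ratio(t, p, fin_h):.2f}")
```

With these numbers the fins multiply the wetted area by about 6.5, and the copper fins stay close to 90% efficient. Taller or thinner fins, or a less conductive metal, raise mL and pull that efficiency down.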

The structure of the microchannels is critical. The relationship between fin height (L) and spacing (P) determines how the liquid flows.

Understanding the Internal Flow Channels

The way liquid flows in these tiny channels is vital. Engineers use some key formulas to analyze this.

First, calculate the channel’s hydraulic diameter (Dh), which represents the effective size for fluid flow in the channel. The formula is:

$$D_h = \frac{2PL}{P + L}$$

Since the fin height (L) is much greater than the spacing (P), and the L/P ratio is usually over 15, the formula simplifies to: Dh ≈ 2P. This tells us that the effective channel size is directly related to the tiny spacing between fins.
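How good the Dh ≈ 2P shortcut is depends on L/P. The sketch below compares the exact rectangular-channel formula with the shortcut for a few ratios; the 0.5 mm spacing is taken from the fin dimensions listed earlier, and the ratios are illustrative.

```python
# Sketch: exact hydraulic diameter of a P-wide, L-tall rectangular channel
# versus the Dh ~ 2P shortcut. All dimensions in metres.


def hydraulic_diameter(p: float, fin_h: float) -> float:
    """Dh = 4A / perimeter = 2PL / (P + L)."""
    return 2.0 * p * fin_h / (p + fin_h)


P = 0.5e-3  # fin spacing, 0.5 mm
for ratio in (6, 15, 30):
    fin_h = ratio * P
    exact = hydraulic_diameter(P, fin_h)
    error_pct = (2.0 * P - exact) / exact * 100.0  # equals 100 / (L/P)
    print(f"L/P = {ratio:2d}: Dh = {exact * 1e3:.3f} mm, 2P overstates it by {error_pct:.1f}%")
```

The shortcut overstates Dh by exactly P/L: about 7% at L/P = 15, but about 17% at the minimum dimensions listed above (L = 3 mm, P = 0.5 mm, so L/P = 6).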

Next, engineers use the Reynolds number (Re) to determine the flow type:

$$Re = \frac{\rho V D_h}{\mu}$$

Here ρ is the fluid density, V the flow velocity, and μ the dynamic viscosity. In a typical cold plate the flow velocity (V) between the fins is below 0.1 m/s, and the resulting Reynolds number (Re) is well under 2000. This confirms that the flow is smooth and orderly, known as laminar flow.
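A quick numerical check of that claim is sketched below. The coolant density and viscosity are assumed round values for a PG25 mix near the 45 °C inlet temperature, not vendor data; the channel size uses Dh ≈ 2P with P = 0.5 mm.

```python
# Sketch: Reynolds number check for the fin channels.
# Coolant properties are ASSUMED round values for PG25 near 45 degC;
# check them against the coolant vendor's data before relying on them.

RHO_PG25 = 1015.0  # kg/m^3 (assumed)
MU_PG25 = 1.2e-3   # Pa*s (assumed)
RE_LAMINAR_LIMIT = 2000.0


def reynolds(rho: float, v: float, dh: float, mu: float) -> float:
    """Re = rho * V * Dh / mu."""
    return rho * v * dh / mu


def max_laminar_velocity(rho: float, dh: float, mu: float,
                         re_limit: float = RE_LAMINAR_LIMIT) -> float:
    """Velocity at which Re reaches re_limit."""
    return re_limit * mu / (rho * dh)


dh = 2 * 0.5e-3  # Dh ~ 2P with P = 0.5 mm
v = 0.1          # m/s, upper end of the typical range in the text
re = reynolds(RHO_PG25, v, dh, MU_PG25)
print(f"Re = {re:.0f} -> {'laminar' if re < RE_LAMINAR_LIMIT else 'not laminar'}")
print(f"Re reaches 2000 at about {max_laminar_velocity(RHO_PG25, dh, MU_PG25):.1f} m/s")
```

With these assumed properties Re stays below 100, more than an order of magnitude under the laminar threshold, and the velocity would need to exceed roughly 2 m/s before that changed.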


How to Improve Cold Plate Performance

This analysis tells engineers which adjustments will improve cold plate performance. There are two main goals:

  1. Reduce Flow Resistance

To make the liquid flow more easily, engineers can:

  • Reduce the flow velocity (V) between the fins.
  • Shorten the length of the fins along the flow direction.
  2. Increase Convective Heat Transfer

To make the plate cool more effectively, engineers need to increase the convective heat transfer coefficient (h). An important property of laminar flow is that, once the flow is fully developed, the Nusselt number (Nu = h·Dh/k) is roughly constant, so h ≈ Nu·k/Dh and velocity (V) has little impact on it. That leaves two other options:

  • Increase the coolant’s thermal conductivity (k) by selecting a coolant that conducts heat better.
  • Reduce the equivalent hydraulic diameter (Dh), which essentially means making the channels smaller and tighter.
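Both goals follow from standard fully developed laminar-flow results. The sketch below idealizes each fin gap as flow between parallel plates (reasonable when L ≫ P, and then Dh = 2P exactly), where the Darcy friction factor is f = 96/Re and the Nusselt number for uniformly heated walls is about 8.235. Under those assumptions Δp = 48·μ·V·L_flow/Dh² and h = Nu·k/Dh. The coolant properties and the 50 mm flow length are hypothetical values chosen only for illustration.

```python
# Sketch: how the levers in the text move pressure drop and h, using the
# parallel-plate idealization of fully developed laminar flow.
# All numeric inputs are assumptions for illustration.

F_RE = 96.0        # Darcy friction factor x Re, parallel plates
NU = 140.0 / 17.0  # ~8.235, both walls at uniform heat flux
MU = 1.2e-3        # Pa*s, assumed PG25 viscosity near 45 degC
K_FLUID = 0.50     # W/(m*K), assumed PG25 thermal conductivity


def pressure_drop_pa(v: float, flow_len: float, dh: float, mu: float = MU) -> float:
    """Laminar pressure drop: (f*Re) * mu * V * L_flow / (2 * Dh^2)."""
    return F_RE * mu * v * flow_len / (2.0 * dh ** 2)


def heat_transfer_coeff(dh: float, k: float = K_FLUID) -> float:
    """h = Nu * k / Dh; velocity does not appear."""
    return NU * k / dh


cases = {
    "baseline (P = 0.5 mm)": (0.10, 0.050, 1.0e-3),
    "half velocity": (0.05, 0.050, 1.0e-3),
    "half flow length": (0.10, 0.025, 1.0e-3),
    "half spacing (P = 0.25 mm)": (0.10, 0.050, 0.5e-3),
}
for name, (v, flow_len, dh) in cases.items():
    dp = pressure_drop_pa(v, flow_len, dh)
    h = heat_transfer_coeff(dh)
    print(f"{name:28s} dp = {dp:7.0f} Pa   h = {h:6.0f} W/(m^2*K)")
```

The pattern matches the two goals: halving V or the flow length halves Δp without changing h, while halving the spacing doubles h but quadruples Δp at the same velocity. In a real plate the per-channel velocity also shifts when the channel count changes, and inlet and outlet manifolds add losses this idealization leaves out.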

By applying these principles and calculations, engineers can design and fine-tune efficient liquid cooling plates. This work is crucial for the future of AI, because it lets powerful chips like the GB200 reach their full potential without overheating.


GB200 Liquid Cooling Plate Specifications

| Item | Content |
|---|---|
| Platform | GB200 |
| Type | Grace Blackwell Superchip |
| Architecture | 2× Blackwell GPU + Grace CPU |
| TDP | 2700 W (2 × 1200 W GPU + 300 W CPU) |
| Liquid cooling component size | 189.5 × 270.4 × 28.5 mm (W × H × D) |
| Cold plate component weight | 2.9 kg |
| Coolant | PG25 (25% propylene glycol in water) |
| Quick connector | CEJN or CPC UQD04 |
| Liquid cooling tube material | Stainless steel or EPDM |
| Inlet temperature | 45 °C |
| Flow rate | 1.8 LPM (L/min) |
| Fluid resistance (pressure drop) | < 20 kPa |
| Thermal resistance | Extremely low, < 0.03 °C/W |
| Heat flux density | > 50 W/cm², up to 150 W/cm² in some areas |
| Temperature rise limit | Chip temperature rise < 40 °C, GPU temperature rise < 30 °C |
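The inlet temperature, flow rate and TDP in the specification table can be combined through an energy balance, ΔT = Q / (ṁ·c_p). The sketch below assumes, hypothetically, that the full 2700 W is carried by a single 1.8 LPM stream (the table does not say how flow is divided among cold plates), and it uses assumed PG25 properties. Treat the result as an order-of-magnitude check, not a datasheet value.

```python
# Sketch: coolant temperature rise from an energy balance, dT = Q / (m_dot * cp).
# Assumptions: all 2700 W goes into one 1.8 L/min stream; the PG25 properties
# below are approximate round values, not vendor data.

RHO_PG25 = 1015.0  # kg/m^3 (assumed)
CP_PG25 = 3900.0   # J/(kg*K) (assumed)


def mass_flow_kg_s(flow_lpm: float, rho: float = RHO_PG25) -> float:
    """Convert a volumetric flow in L/min to a mass flow in kg/s."""
    return flow_lpm / 1000.0 / 60.0 * rho


def coolant_rise_c(power_w: float, flow_lpm: float, cp: float = CP_PG25) -> float:
    """Bulk coolant temperature rise (degC) between inlet and outlet."""
    return power_w / (mass_flow_kg_s(flow_lpm) * cp)


inlet_c = 45.0
rise = coolant_rise_c(2700.0, 1.8)
print(f"Coolant rise: {rise:.1f} degC, outlet about {inlet_c + rise:.0f} degC")
```

Under these assumptions the coolant warms by a little over 20 °C; doubling the flow for the same heat load would halve that rise.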

FiberMall is a specialist provider of optical-communication products and solutions, committed to delivering cost-effective offerings to global data centers, cloud computing environments, enterprise networks, access networks, and wireless systems. Renowned for its leadership in AI-enabled communication networks, FiberMall is an ideal partner if you’re seeking high-quality, value-driven optical-communication solutions.
