Hot Chips 2025 Day 0 Tutorials: Essential Insights on AI Workloads, Rack Architectures, and Custom GB200 Solutions

In the ever-evolving world of AI and data center technologies, Hot Chips 2025 kicked off with an enriching Day 0 tutorials lineup. A staple event in the industry, this year's edition opened with an appetizing prelude: data center racks in the morning and kernel programming in the afternoon. Our coverage dives deep into the hardware-centric morning sessions, with a spotlight on Meta's innovative customization of the Nvidia GB200. For professionals in AI-enabled communication networks, these advancements underscore the need for robust optical-communication solutions. Enter FiberMall, a specialist provider committed to delivering cost-effective offerings for global data centers, cloud computing, enterprise networks, access networks, and wireless systems. Renowned for leadership in AI-driven networks, FiberMall is your ideal partner for high-quality, value-driven solutions. Visit their official website or contact customer support for more details.

Whether you’re optimizing AI workloads or scaling fabric technologies, understanding these trends can transform your infrastructure. Let’s break down the key sessions.

1. How AI Workloads Shape Rack System Architecture

This session from AMD offered a comprehensive retrospective on the past decade-plus of AI evolution. Starting from the iconic GTX 580 and AlexNet era (when a GTX 580 cost a modest $499), it subtly contrasted that with today's skyrocketing Nvidia GPU prices.


The talk recapped various parallelism strategies and the accompanying interconnect structural changes. Essentially, it was an educational bridge for chip engineers to grasp what infrastructure teams have been up to in recent years.

[Figure: AI model building blocks]

Key highlights included the progression of numerical formats, from FP32 down to FP4, reflecting efficiency gains. Chip package sizes are ballooning, and scale-up domains are expanding accordingly. For those dealing with AI rack architectures, this evolution highlights the importance of scalable optical interconnects; FiberMall's expertise in AI-enabled communication networks ensures seamless integration for such demands.
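
To make that format progression concrete, here is a minimal sketch of weights-only memory for a hypothetical 70B-parameter model at each precision (the model size is an assumption for illustration; activations, optimizer state, and overhead are ignored):

```python
# Weights-only memory footprint of a hypothetical 70B-parameter model
# at different numerical formats (activations, optimizer state, and
# framework overhead are ignored; model size is an illustrative assumption).
PARAMS = 70e9
FORMAT_BITS = {"FP32": 32, "BF16": 16, "FP8": 8, "FP4": 4}

for fmt, bits in FORMAT_BITS.items():
    gib = PARAMS * bits / 8 / 2**30
    print(f"{fmt:>4}: {gib:7.1f} GiB of weights")
```

Each halving of the bit width roughly halves weight memory and raises achievable math throughput per package, which is a big part of why packages, and the scale-up domains around them, keep growing.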

2. Scaling Fabric Technologies for AI Clusters

Another AMD-led session, this one delved into scale-up essentials. It listed common scale-up technologies but notably omitted Huawei's UB, deducting points for that oversight!


The presentation clarified the distinctions between scale-up and scale-out approaches. It emphasized how switch radix and data paths critically influence the number of scale-up GPUs and bandwidth.
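
As a minimal sketch of that relationship (radix, port count, and port speed below are illustrative assumptions, not any vendor's figures), the size of a one-hop scale-up domain falls directly out of the switch radix:

```python
# Illustrative sizing of a single-layer (one-hop) scale-up domain.
# Radix, port count, and port speed are assumptions, not vendor specs.
switch_radix = 72      # GPU-facing ports per scale-up switch
ports_per_gpu = 18     # scale-up ports on each GPU
port_speed_gbps = 400  # assumed speed of each scale-up port

# Spread each GPU's ports across parallel switch planes, one port per plane;
# every plane then sees one port from every GPU, so the domain size is
# capped by the switch radix.
switch_planes = ports_per_gpu
max_gpus = switch_radix
per_gpu_bw_tbps = ports_per_gpu * port_speed_gbps / 1000

print(f"switch planes:        {switch_planes}")
print(f"max GPUs, one layer:  {max_gpus}")
print(f"scale-up BW per GPU:  {per_gpu_bw_tbps:.1f} Tbps")
```

Adding a second switch layer (L2) multiplies the domain size at the cost of more hops and ports, which is the trade-off the L1.5 mesh discussed next tries to soften.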

[Figure: Scale-up vs. scale-out]

A typical single-layer scale-up network was showcased, followed by explorations of L2 scale-up and building an L1.5 mesh on top of L1 switches. Overall, the first two sessions leaned heavily toward educational overviews, making them perfect for newcomers to AI cluster scaling. If you’re implementing these in data centers, FiberMall’s cost-effective optical solutions can optimize your fabric technologies for peak performance.

[Figure: Alternate topology]

3. Liquid Cooling with Google Characteristics

Google shared insights from their TPU experiences, focusing on modular liquid cooling pump designs and more.

[Figure: Design and deployment]
[Figure: Project Deschutes CDU]

Notably, their fifth-generation systems are engineered for 1MW racks, pushing the boundaries of thermal management in AI infrastructures. For high-density setups like these, reliable cooling is non-negotiable; FiberMall's optical-communication products support the underlying networks that make such innovations feasible.

4. Rearchitected Power Systems

Presented by Microsoft, this session introduced 800VDC power delivery. As scale-up within a single cabinet demands more GPUs, power converters are being externalized, leading to Rack Power Disaggregation (RPD).

[Figure: Optimizing the infrastructure]

The power chain evolves from today’s multi-stage AC transformers to 800V Medium-Voltage DC (MVDC) systems, primarily to cut losses from AC-to-DC conversions at the endpoint.

[Figure: AI system power]

Future adoption of 800V Solid-State Transformers (SST) promises significant efficiency gains. Behind the scenes, large-scale training causes synchronized GPU start-stops, stressing the grid with harmonics and reactive power issues. Current mitigations include cabinet-level batteries (BBU) and capacitors (CBU). MVDC allows in-row or SST-level compensation for these.

In summary, 800V MVDC can halve data center power losses, a game-changer. For AI power systems, integrating with optical networks from providers like FiberMall ensures holistic efficiency.
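
That "halved losses" figure is easy to sanity-check with a back-of-envelope chain-efficiency calculation; the per-stage efficiencies below are round-number assumptions for illustration, not values from the talk:

```python
# Back-of-envelope comparison of end-to-end power-delivery efficiency.
# All stage efficiencies are round-number assumptions, not measured values.
def chain_efficiency(stages):
    eff = 1.0
    for stage_eff in stages:
        eff *= stage_eff
    return eff

# Today: several AC conversion stages before power reaches the rack PSU.
legacy_ac = [0.98, 0.98, 0.97, 0.97]   # e.g. MV transformer, UPS, PDU, rack PSU
# 800V MVDC: rectify once near the source, then one DC/DC stage at the rack.
mvdc_800v = [0.98, 0.97]               # e.g. SST/rectifier, rack DC/DC

for name, stages in [("legacy AC chain", legacy_ac), ("800V MVDC chain", mvdc_800v)]:
    eff = chain_efficiency(stages)
    print(f"{name:16s}: efficiency {eff:.3f}, losses {100 * (1 - eff):.1f}%")
```

With assumptions like these, losses fall from roughly 10% to roughly 5% of delivered power, in line with the session's claim.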

5. Case Study: Nvidia GB200 NVL72

Nvidia recycled a slide deck from last year's OCP Summit; major thumbs down. Skipping the details here, as it offered little new value for AI rack enthusiasts.

6. Case Study: Metaโ€™s Catalina (NVL72)

Hands down, the standout session of the day. Meta’s heavy customization of the GB200 rack stole the show. For context, compare this to AWS’s July GB200 instance release to spot customization differences.

Meta’s NVL72 spans six cabinets: two liquid cooling units on each end (mirroring AWS’s near-side pump with direct air cooling for minimal data center retrofits). The middle uses paired NVL36 cabinets.


Officially, Nvidia’s GB200 pairs one Grace with two B200s, totaling 18 trays. Meta opts for 1:1 Grace-to-B200, yielding 36 compute trays for expanded memory.
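
The tray counts and the memory argument follow directly from those ratios; a quick sketch, taking NVL72's 72 GPUs as given and treating 480GB of LPDDR per Grace (and a 2-Grace/2-B200 Meta tray) as illustrative assumptions:

```python
# Tray-count and CPU-memory arithmetic for an NVL72-class rack.
# 480 GB of LPDDR per Grace and a 2-Grace/2-B200 Meta tray are
# illustrative assumptions, not confirmed specs.
GPUS = 72
LPDDR_PER_GRACE_GB = 480

# Nvidia's standard layout: 1 Grace per 2 B200 (2 Grace + 4 B200 per tray).
nv_trays = GPUS // 4
nv_graces = GPUS // 2

# Meta's layout: 1 Grace per B200 (2 Grace + 2 B200 per tray).
meta_trays = GPUS // 2
meta_graces = GPUS

for name, trays, graces in [("Nvidia", nv_trays, nv_graces),
                            ("Meta  ", meta_trays, meta_graces)]:
    lpddr_tb = graces * LPDDR_PER_GRACE_GB / 1024
    print(f"{name}: {trays} compute trays, {graces} Grace, {lpddr_tb:.1f} TB LPDDR")
```

Doubling the Grace count roughly doubles the rack's LPDDR pool, which is the expanded-memory benefit cited above.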

Unofficially, deeper motivations exist. Nvidia’s standard CX7-based topology has each Grace linking two B200s via PCIe x1, with two CX7s per Grace (400Gbps scale-out per B200). Dual Graces connect via 6x Clink.

For GPU-Direct-RDMA (GDR), the direct PCIe path is limited to Gen4 x1 bandwidth, so memory has to be allocated on Grace and accessed over NVLink C2C.

Simulations showed the B200 needs 800Gbps of scale-out bandwidth. AWS uses external PCIe switches (Gen5, but currently capped at 400Gbps). Nvidia's CX8 embeds a PCIe switch but retains the x1 Grace link for management, routing data via PCIe to Grace and then over NVLink C2C, which requires special handling in NCCL 2.27.

Meta's fix: the 1:1 ratio lets each Grace attach two Gen5 x16 CX7 NICs, delivering 800Gbps per B200. The B200 still uses a Gen4 x1 PCIe link to Grace, so true GDR isn't direct; DMA routes through Grace's PCIe root complex (RC) and NVLink C2C.
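
The lane arithmetic behind that fix is worth spelling out; a quick sketch using nominal per-lane rates (encoding and protocol overhead ignored):

```python
# Why the Gen4 x1 GPU-to-Grace link cannot carry scale-out traffic itself,
# and why two Gen5 x16 CX7 NICs per Grace hit the 800Gbps target.
# Per-lane rates are nominal; encoding and protocol overhead are ignored.
GBPS_PER_LANE = {"gen4": 16, "gen5": 32}

def link_gbps(gen, lanes):
    return GBPS_PER_LANE[gen] * lanes

gpu_to_grace = link_gbps("gen4", 1)    # B200's PCIe link to Grace
nic_host_link = link_gbps("gen5", 16)  # host interface of one CX7
cx7_wire_rate = 400                    # Gbps per CX7
target = 800                           # Gbps scale-out per B200 (from simulation)

print(f"GPU->Grace Gen4 x1:      {gpu_to_grace} Gbps (management/DMA only)")
print(f"CX7 host link Gen5 x16:  {nic_host_link} Gbps, enough for {cx7_wire_rate} Gbps wire rate")
print(f"Two CX7s per Grace:      {2 * cx7_wire_rate} Gbps, meeting the {target} Gbps target")
```

So the bulk GDR data path has to ride NVLink C2C rather than the x1 PCIe link, which is exactly the Grace-hosted memory arrangement described above.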

With the 1:1 ratio, putting the NICs on the CPU side becomes advantageous: there is no bottleneck from sharing CPU memory, and GPUs handle RDMA via CPU memory, saving roughly 200GB/s of HBM bandwidth.

The Clink between the two Graces doubles to 12 lanes for higher inter-system bandwidth. Without CX8 being fully ready, Meta's approach boosts scale-out to 800Gbps while also expanding CPU memory.

Speculatively, Grace + CX7 mimics a supersized BF3: a massive DPU with scale-out on one side, memory-semantic scale-up on the other, plus ample memory. This echoes 2021’s NetDAM, benefiting KVCache and enabling INCA or offloading comm ops to Grace via NVLink C2C.


Meta's compute cabinet includes a built-in BBU for redundancy, and scale-out fiber is routed through patch panels with spares. Two Wedge400 switches handle the frontend network (200Gbps per Grace via CX7, plus a DC-SCM security module; no BF3 here).


Scale-out uses a Disaggregated Scheduled Fabric (possibly Cisco Silicon One-inspired, addressing multi-path hash conflicts).
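
A scheduled fabric matters because static ECMP hashing collides surprisingly often when traffic consists of a few elephant flows; a small birthday-style estimate (flow and path counts are illustrative, not from the talk):

```python
# Probability that hashing k flows onto n equal-cost paths leaves at least
# two flows on the same path (birthday-problem estimate).
# Flow and path counts are illustrative, not taken from the talk.
from math import prod

def collision_probability(flows, paths):
    if flows > paths:
        return 1.0
    no_collision = prod((paths - i) / paths for i in range(flows))
    return 1.0 - no_collision

for flows in (4, 8, 16):
    p = collision_probability(flows, paths=16)
    print(f"{flows:2d} flows on 16 paths -> collision probability {p:.2f}")
```

Even a handful of large flows will frequently double up on one uplink while others sit idle, which is the imbalance a per-packet scheduled fabric removes.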


Also added: leak detection on each tray's PDB, which interfaces with the Rack Management Controller (RMC) via RJ45 carrying GPIO/I2C, plus external sensors.


Finally, an OCP-spec BMC plus TPM provides remote management (again, no BF3).


For custom AI racks like Meta’s, FiberMall’s AI-enabled optical networks provide the backbone for reliable, high-bandwidth interconnects.

7. TPU Rack Overview

Google wrapped up with TPU rack details. Last year’s analysis covered ICI interconnect routing, protection, elasticity, and scheduling.


This time, the Ironwood rack was presented as a 4x4x4 block, connecting to OCS optical switches via fiber bundles with redundancy and patch panels.
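
As a rough sketch of what a 4x4x4 block implies, assuming a 3D-torus-style layout whose face links are taken out to the OCS as in earlier TPU generations (an assumption; Ironwood specifics were not detailed):

```python
# Chip and OCS-facing link counts for a 4x4x4 TPU block, assuming a
# 3D-torus-style layout whose face links go to the optical switches
# (as in earlier TPU generations; an illustrative assumption for Ironwood).
dims = (4, 4, 4)
chips = dims[0] * dims[1] * dims[2]

# Each of the 3 axes exposes 2 faces, and each face carries 4 x 4 = 16 links.
links_per_face = 4 * 4
ocs_links = 3 * 2 * links_per_face

print(f"chips per block:   {chips}")
print(f"OCS-facing links:  {ocs_links} (carried as fiber bundles, plus spares)")
```

The fiber bundles and patch panels mentioned above then carry those face links to the OCS layer, with spares covering failures and reconfiguration.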

Liquid cooling and UPS are row-level.

[Figure: Rack manifolds]

In conclusion, Hot Chips 2025's Day 0 set the stage for cutting-edge AI data center innovations. If these insights spark ideas for your setups, consider FiberMall for top-tier optical-communication solutions tailored to AI workloads.
