Dual-Plane and Multi-Plane Networking in AI Computing Centers

In the previous article, we discussed the differences between Scale-Out and Scale-Up. Scale-Up refers to vertical scaling by increasing the number of GPU/NPU cards within a single node to enhance individual node performance. Scale-Out, on the other hand, involves horizontal scaling by adding more nodes to expand the overall network scale, enabling support for large-model training tasks that a single node cannot handle alone. This article focuses on introducing Scale-Out networking architectures and their development trends in AI computing centers.

Common Architectures for AI Computing Center Networking

AI computing center networking comes in various forms, such as CLOS, Dragonfly, Slim Fly, Torus, and others. Additionally, several variant networking modes have evolved, including Rail-only, Rail-optimized, MPFT, ZCube, and more. Among these, the Fat-Tree CLOS architecture is widely adopted in large-model training scenarios due to its efficient routing design, excellent scalability, and ease of management. Typically, a two-layer Spine-Leaf CLOS architecture is used. When the two-layer structure cannot meet scaling needs, an additional Super-Spine layer can be added for expansion.
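To make the scaling math concrete, here is a minimal sketch (port counts are illustrative) of how switch radix determines the endpoint capacity of a non-blocking two-layer Spine-Leaf fabric:

```python
# Minimal sketch: endpoint capacity of a non-blocking two-layer Spine-Leaf
# (Fat-Tree) fabric as a function of switch radix (total ports per switch).

def two_layer_capacity(radix: int) -> int:
    """Each Leaf splits its ports 50/50 between endpoint downlinks and
    Spine uplinks; each Spine port serves one Leaf, capping the Leaf count."""
    down_per_leaf = radix // 2      # ports facing endpoints
    max_leaves = radix              # one Spine port per Leaf
    return down_per_leaf * max_leaves

print(two_layer_capacity(128))  # 8192 endpoints with 128-port switches
print(two_layer_capacity(64))   # 2048 endpoints with 64-port switches
```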

Two-Layer CLOS Architecture

Three-Layer CLOS Architecture

Rail-only Architecture: Proposed by MIT in 2023, the Rail-only network architecture retains the high-bandwidth (HB) domain and rail switches while removing the spine layer entirely, significantly reducing network cost and power consumption.

Rail-only Architecture

For example, with 51.2T switches (128 x 400G ports each), just 8 rail switches can form a thousand-card (1,024-GPU) training cluster.
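The arithmetic behind that sizing, as a quick sketch (one rail switch per local GPU index, and the typical 8 GPUs per node assumed):

```python
# Sketch of the Rail-only sizing above: a 51.2T switch exposes 128 x 400G
# ports; with one rail switch per local GPU index, 8 switches suffice.

PORTS_PER_SWITCH = 51_200 // 400      # = 128 x 400G ports
GPUS_PER_NODE = 8                     # one rail per local GPU index

rail_switches = GPUS_PER_NODE         # 8 rail switches in total
nodes = PORTS_PER_SWITCH              # each node takes one port per rail
total_gpus = nodes * GPUS_PER_NODE
print(rail_switches, nodes, total_gpus)   # 8 128 1024
```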

Rail-Optimized Fat-Tree Architecture (ROFT): As shown in the figure below, in a multi-rail network architecture, AI training communication can be accelerated through parallel transmission across multiple rails. Most traffic is aggregated and transmitted within a single rail (traversing only one switch tier), while a small portion crosses rails (traversing two or more tiers), thereby relieving network communication pressure.

Rail-Optimized Fat-Tree Architecture
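To illustrate what "within the same rail" means, here is a hypothetical sketch (8 GPUs per node assumed; `rail_of` and `hops` are illustrative names) in which GPU i of every node attaches to rail switch i:

```python
# Hypothetical sketch: classify GPU-to-GPU traffic in a rail-optimized
# Fat-Tree. GPU i of every node connects to rail (Leaf) switch i, so
# pairs sharing a local index communicate through a single rail switch.

GPUS_PER_NODE = 8

def rail_of(global_gpu_id: int) -> int:
    """Rail index = local GPU index within its node."""
    return global_gpu_id % GPUS_PER_NODE

def hops(src: int, dst: int) -> str:
    if src // GPUS_PER_NODE == dst // GPUS_PER_NODE:
        return "intra-node (HB domain, no rail switch)"
    if rail_of(src) == rail_of(dst):
        return "same rail: one switch tier"
    return "cross rail: two or more switch tiers (via Spine)"

print(hops(0, 8))    # GPU 0 of node 0 -> GPU 0 of node 1: same rail
print(hops(0, 9))    # GPU 0 of node 0 -> GPU 1 of node 1: cross rail
```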

Dual-Plane Network Architecture

In 2024, Alibaba Cloud proposed the dual-port, dual-plane networking architecture, applied in HPN 7.0. The primary goals of this architecture are to improve performance, enhance reliability, and avoid hash polarization. This multi-rail, dual-plane design builds on the ROFT architecture: instead of a single 400G port, each NIC exposes 2 x 200G ports that connect to two different Leaf (ToR) switches, and each 400G downlink port on a Leaf switch is correspondingly split into two 200G links connecting to ports on different NICs.

HPN dual-plane design
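A hypothetical sketch of the resulting dual-homing (the ToR naming is an illustrative assumption): NIC port p is always wired to a ToR in plane p, so a single ToR failure never strands a GPU:

```python
# Hypothetical sketch of HPN-style dual-plane dual-homing (names are
# illustrative): NIC port p is wired to a ToR in plane p, so when one
# ToR fails the GPU still reaches the fabric through the other plane.

PLANES = 2

def uplinks(local_gpu: int) -> list[str]:
    """One ToR per plane for each rail (local GPU index)."""
    return [f"plane{p}-rail{local_gpu}-tor" for p in range(PLANES)]

def usable(local_gpu: int, failed: set[str]) -> list[str]:
    return [t for t in uplinks(local_gpu) if t not in failed]

print(usable(3, failed=set()))                 # both planes available
print(usable(3, failed={"plane0-rail3-tor"}))  # plane 1 still carries traffic
```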

The HPN dual-plane design features the following key advantages:

  • Elimination of Hash Polarization: In traditional networks, the low-entropy, bursty traffic of large-model training easily causes hash polarization and thus uneven traffic distribution. The dual-plane design divides ToR switches into two independent groups and fixes the path a flow takes once it enters an uplink, avoiding hash polarization at the aggregation layer; traffic is distributed evenly, queue lengths drop significantly, and network performance improves (see the sketch after this list).
  • Enhanced Scalability and Cost Control: A two-layer network can accommodate over 15K GPUs, reducing one layer compared to traditional three-layer CLOS architectures and lowering deployment costs.
  • Improved Reliability and Fault Tolerance: Each GPU connects uplink to two independent ToR switches, eliminating single points of failure. During faults, only local ECMP groups need updating without global controller intervention, improving recovery efficiency. These features enhance network fault tolerance and ensure stability for large-model training.
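A minimal sketch of the polarization effect (MD5 stands in for a switch's ECMP hash, and the salt models per-switch hash seeding): when two tiers apply the identical hash, every flow that chose uplink 0 at the first tier chooses uplink 0 again at the next, whereas a path fixed per plane, as in HPN, removes the second ECMP decision altogether:

```python
# Sketch: hash polarization across two switch tiers. MD5 stands in for a
# switch ECMP hash; "salt" models per-switch hash seeding.

import hashlib

def ecmp(flow: str, n_links: int, salt: str = "") -> int:
    digest = hashlib.md5((salt + flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links

flows = [f"10.0.0.{i}->10.0.1.{i}:4791" for i in range(256)]

# Tier 1 (Leaf): the subset of flows that hash to uplink 0.
via_link0 = [f for f in flows if ecmp(f, 4) == 0]

# Tier 2 with the SAME hash: they all pick link 0 again -> polarization.
print({ecmp(f, 4) for f in via_link0})                  # {0}

# Decorrelated (per-switch seeded) hashing restores the spread.
print({ecmp(f, 4, salt="spine-7") for f in via_link0})  # {0, 1, 2, 3}
```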

Multi-Plane Network Architecture

In May 2025, the DeepSeek team published a paper titled Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures, introducing the multi-plane networking concept. As LLM (Large Language Model) parameter scales grow exponentially, traditional three-layer Fat-Tree CLOS topologies increasingly reveal limitations in cost, scalability, and robustness.

DeepSeek-V3 adopts an InfiniBand-based Multi-Plane Fat-Tree (MPFT) network in place of the traditional three-layer Fat-Tree architecture. Each node is equipped with 8 GPUs and 8 400Gbps IB NICs, with each GPU paired with its own IB NIC belonging to a different “network plane.” The 8 GPUs per node thus connect to 8 different planes (i.e., 8 two-layer Fat-Tree planes). Using 64-port 400G IB switches, this two-layer Fat-Tree can support up to 16,384 GPUs (each plane comprises 32 Spine and 64 Leaf switches and accommodates 64 x 32 = 2,048 GPUs; with 8 planes, 16,384 GPUs in total). Cross-plane traffic exchange requires intra-node forwarding.

intra-node forwarding
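The capacity arithmetic, as a sketch reproducing the figures above:

```python
# Sketch of the MPFT sizing above: 64-port 400G IB switches, one
# two-layer Fat-Tree per plane, 8 planes (one per GPU/NIC in a node).

RADIX = 64                      # ports per IB switch
PLANES = 8                      # one plane per GPU

leaves_per_plane = 64           # each Leaf: 32 down to NICs, 32 up to Spines
spines_per_plane = 32           # 64 leaves * 32 uplinks / 64 ports
gpus_per_plane = leaves_per_plane * (RADIX // 2)    # 64 * 32 = 2048
total_gpus = gpus_per_plane * PLANES                # 16384
total_switches = (leaves_per_plane + spines_per_plane) * PLANES  # 768
print(gpus_per_plane, total_gpus, total_switches)   # 2048 16384 768
```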

This multi-plane networking mode offers advantages similar to dual-plane networking; the key difference is that each GPU has only a single uplink into one plane and therefore lacks per-card dual-uplink fault tolerance:

  • Lower Cost: Compared to three-layer Fat-Tree, MPFT can save up to 40% in network costs.
  • Higher Scalability: Theoretically supports up to 16,384 GPUs.
  • Traffic Isolation: Each plane operates independently, preventing cross-plane congestion.

The paper compares several networking modes (FT2: two-layer Fat-Tree, MPFT: multi-plane Fat-Tree, FT3: three-layer Fat-Tree, SF: Slim Fly, DF: Dragonfly):

Comparison table of networking modes

As shown, MPFT demonstrates clear advantages in per-node cost, scalability, and other aspects.

However, the MPFT described above is not the optimal implementation. A more ideal multi-plane networking mode is illustrated below:

Ideal multi-plane deployment diagram

Each NIC is equipped with multiple physical ports (here, 4 x 200G interfaces), with each port connecting to an independent network plane (similar to Alibaba Cloud’s HPN 7.0 dual-plane mode, but with 4 interfaces per NIC instead of 2). A single QP (Queue Pair) can utilize all available ports for packet transmission and reception.

Zooming in on a section of this multi-plane deployment for detail:

Detailed zoom of multi-plane setup

Take 102.4T switches as an example: each provides 128 x 800G ports, or 512 x 200G links via Shuffle (Shuffle will be covered in detail in a future topic; a switch can provide 512 x 200G links directly with built-in Shuffle, or via an external Shuffle Box or Breakout Shuffle that allocates and maps the optical-fiber links). Each GPU connects to 4 different planes through 4 x 200G ports, and a single QP drives per-packet load-balanced routing across those ports. This mode is particularly friendly to MoE all-to-all traffic.

Detailed networking diagram:

Detailed multi-plane networking diagram

In a two-layer 4-plane setup, the network can likewise accommodate 16,384 GPUs (note: since each NIC occupies 4 x 200G ports, the switch count increases, requiring 1,024 Spine and 2,048 Leaf switches, 4 times the 768 switches of single-port MPFT).

two-layer 4-plane setup
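Applying the same sizing logic reproduces those switch counts (assuming radix-64 switching per plane, so the comparison with single-port MPFT is apples-to-apples):

```python
# Sketch of the 4-plane sizing above. Each GPU now has one 200G port in
# every plane, so each plane must terminate all 16,384 endpoints.

RADIX = 64                                   # ports per switch, per plane (assumed)
PLANES = 4
TOTAL_GPUS = 16_384

endpoints_per_plane = TOTAL_GPUS             # every GPU appears in every plane
leaves_per_plane = endpoints_per_plane // (RADIX // 2)        # 512
spines_per_plane = leaves_per_plane * (RADIX // 2) // RADIX   # 256
print(leaves_per_plane * PLANES, spines_per_plane * PLANES)   # 2048 1024
```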

Additionally, these features place new requirements on NICs: they must support multi-plane communication, load-balancing a QP's packets across multiple planes. And because packets traveling over different planes arrive out of order, NICs must natively support out-of-order handling.
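A minimal sketch of what that NIC behavior entails (not a real NIC API, just plain Python simulating one QP): packets are sprayed round-robin across planes, and the receiver places them by sequence number before releasing them in order:

```python
# Minimal sketch (not a real NIC API): per-packet spraying of one QP's
# stream across 4 planes, with receiver-side reordering by sequence number.

import random
from heapq import heappush, heappop

PLANES = 4

def spray(packets):
    """Sender: round-robin each packet of a single QP onto a plane."""
    return [(seq, seq % PLANES, payload) for seq, payload in enumerate(packets)]

def deliver_out_of_order(in_flight):
    """Different planes have different latencies, so arrival order differs."""
    return sorted(in_flight, key=lambda p: random.random())

def reassemble(arrivals):
    """Receiver NIC: place packets by sequence number, release in order."""
    heap, next_seq, released = [], 0, []
    for seq, _plane, payload in arrivals:
        heappush(heap, (seq, payload))
        while heap and heap[0][0] == next_seq:
            released.append(heappop(heap)[1])
            next_seq += 1
    return released

data = [f"pkt{i}" for i in range(8)]
assert reassemble(deliver_out_of_order(spray(data))) == data
```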

NVIDIA’s latest ConnectX-8 (CX-8) already supports 4 network planes (4-Plane), enabling multi-path packet spraying on a single QP with hardware-level out-of-order packet handling to ensure data consistency.

In summary, for Scale-Out networking in AI computing centers, the near-term trends likely include shifting from three-layer to two-layer topologies, reaching ten-thousand to hundred-thousand card clusters with just two layers, and adopting multi-port, multi-plane networking.

Dual-plane and multi-plane networking architectures play a central role in optimizing networks for large-scale AI training, addressing key challenges in scalability, cost efficiency, and reliability for next-generation computing centers.
