800G/400G AI Data Center Product Architecture

AI Drives the Rise of 400G/800G Optical Module Market

On November 30, 2022, OpenAI, a US artificial intelligence (AI) research company, released ChatGPT, a chatbot model, which attracted more than one million users in five days and more than 100 million users in two months, becoming the fastest-growing consumer application in history. On April 28, 2023, OpenAI completed a $10.3 billion financing, with a post-investment valuation of $27 billion to $29 billion, becoming the world’s fastest-growing unicorn.

With the development of AI technology, large models, big data, and computing power are increasingly the core constraints on AIGC applications. Large models and datasets are the software foundation for AIGC development, while computing power is the most important infrastructure. AI workloads rely mainly on parallel computing, and the core processor is the GPU. Beyond GPU performance, however, communication also becomes a bottleneck for supercomputing: congestion on any single link adds latency to data transfer. AI servers therefore place very demanding requirements on the data rate and latency of the underlying network, which calls for matching high-speed optical modules and creates strong demand for 800G optical modules.

In order to solve the AI network bandwidth bottleneck, the data center network architecture needs to be changed.

Figure: Data center network architecture

As large AI model training spreads across fields, traditional networks suited to HPC can no longer meet the bandwidth and latency requirements of large-model cluster training. Distributed training of large models requires communication between GPUs, which increases east-west traffic in AI/ML data centers, and the traffic pattern differs from that of traditional cloud computing. AI traffic arrives in short, high-volume bursts, which under a traditional cloud computing network architecture causes network latency and slows training. In a traditional tree topology, bandwidth converges layer by layer, so the bandwidth at the root of the tree is much smaller than the sum of the bandwidths at the leaves. A fat tree is more like a real tree: the closer to the root, the thicker the branches. From leaf to root the network bandwidth does not converge, which improves network efficiency and speeds up training, and this is what lets the fat tree architecture support non-blocking networks. Because there is no convergence, more optical ports are needed to keep uplink and downlink rates equal, which increases the number of optical modules.
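The difference between a converging tree and a non-blocking fat tree can be put in numbers. The sketch below is illustrative only: it assumes the textbook three-tier k-ary fat tree built from identical k-port switches, and the function names and port counts are hypothetical rather than taken from any specific AI cluster.

```python
def oversubscription(down_ports: int, down_gbps: float,
                     up_ports: int, up_gbps: float) -> float:
    """Ratio of server-facing to uplink bandwidth on one switch (1.0 = no convergence)."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def fat_tree_size(k: int) -> dict:
    """Size of a three-tier k-ary fat tree built from identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    hosts = k * half * half  # k pods x (k/2 edge switches) x (k/2 hosts each)
    return {
        "hosts": hosts,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        # Edge-to-aggregation plus aggregation-to-core links. Each tier carries
        # as many links as there are hosts, so bandwidth never converges.
        "switch_to_switch_links": 2 * hosts,
    }


# Hypothetical converging leaf: 48 x 100G down, 8 x 400G up
print(oversubscription(48, 100, 8, 400))   # 1.5
# Hypothetical fat-tree leaf: 32 x 400G down, 32 x 400G up
print(oversubscription(32, 400, 32, 400))  # 1.0
print(fat_tree_size(4)["switch_to_switch_links"])  # 32
```

Since every optical link needs a module at each end, the switch-to-switch link count is what drives the higher module count in a fat tree.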

Upgrading switch chips to 112G electrical interfaces opens up the corresponding 400G/800G module applications. Because topologies differ widely between AI data centers, the AI-driven demand for optical modules is estimated here for a typical case. When a GPT-type application reaches 1 billion monthly active users, an estimated 694,000 A100 GPUs are needed. Assuming each A100 requires three optical modules, this corresponds to demand for about 2 million 800G optical modules. In practice, an 800G port on the switch side is often split in two toward the server side, so the lower layer runs at 400G. Upgrading to 800G will therefore also drive demand for 400G.
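The demand estimate above comes down to one multiplication, sketched here. The ratio of three modules per A100 is from the text. The GPU count of about 694,000 is treated as an assumption: it is the count that matches the stated total of about 2 million modules. The function name is hypothetical.

```python
def module_demand(gpu_count: int, modules_per_gpu: int = 3) -> int:
    """800G optical modules implied by a GPU count and a modules-per-GPU ratio."""
    if gpu_count < 0 or modules_per_gpu < 0:
        raise ValueError("counts must be non-negative")
    return gpu_count * modules_per_gpu


a100_count = 694_000  # assumption: count consistent with "about 2 million" modules
print(f"{module_demand(a100_count):,} modules")  # 2,082,000 modules
```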

In the North American optical module market, demand over the next few years falls into two parts: traditional data center demand, and new demand created by the rise of AI. The new AI demand may exceed traditional data center demand in 2024 to 2025.

In the domestic optical module market, 200G/400G deployment will remain the mainstay for some time. Domestic demand for 400G and 800G has not yet grown rapidly. On the one hand, traditional data center demand is relatively flat, so growth is not pronounced; on the other hand, demand in the telecommunications field is developing slowly and steadily. The growth of 400G and 800G in 2024 will therefore come from demand driven by supercomputing, and CPO and pluggable modules will coexist for a long time.

Figure: Port shipments. Source: Dell’Oro, October 2022.

Figure: Shipments. Source: LightCounting, May 2022.

Data Center Internet Rate Upgrade Evolution

Evolution paths diverge, and several options coexist. Users can choose according to business requirements, network architecture, and deployment time.

Figure: Data center internet rate evolution

Typical Applications for 400G/800G Products

400G/800G DAC/ACC

Case 1: Quantum-2 InfiniBand switch-to-switch connection, or Quantum-2 InfiniBand switch to DGX H100

Case 2: Quantum-2 InfiniBand Switch to Branch Application

400G SR4/800G SR8 Optical Transceiver

Case 3: Quantum-2 InfiniBand Switch to 2 ConnectX-7 400G NICs

The 800G OSFP SR8 optical transceiver module is designed for 400G InfiniBand NDR links over multimode fiber at an 850nm wavelength. The module has two ports, each carrying four channels of 100G PAM4 optical modulation and each using an MTP/MPO-12 connector. The video below shows how to connect it to another device using breakout fiber cables and how to configure the switch protocol for InfiniBand or Ethernet. It also covers the key features and benefits of the 800G OSFP SR8 module, such as high bandwidth, low power consumption, and hot pluggability.
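Because each 800G OSFP SR8 module exposes two 4-channel ports, one switch-side module can serve two 400G NICs, as in Case 3. The following is a minimal counting sketch under that assumption; the function and key names are hypothetical, and it assumes one MPO-12 fiber cable per 400G port.

```python
import math


def breakout_plan(nic_count: int, ports_per_800g_module: int = 2) -> dict:
    """Count the parts needed to attach 400G NICs to twin-port 800G switch modules."""
    if nic_count < 0:
        raise ValueError("nic_count must be non-negative")
    return {
        # One twin-port 800G SR8 module feeds up to two 400G NICs
        "switch_800g_sr8_modules": math.ceil(nic_count / ports_per_800g_module),
        # Each NIC needs its own 400G SR4 module
        "nic_400g_sr4_modules": nic_count,
        # Assumption: one MPO-12 cable per 400G port
        "mpo12_fiber_cables": nic_count,
    }


print(breakout_plan(8))
# {'switch_800g_sr8_modules': 4, 'nic_400g_sr4_modules': 8, 'mpo12_fiber_cables': 8}
```

With an odd NIC count, one port of the last 800G module is left unused.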

FiberMall 400G/800G New Product Release

FiberMall has launched 800G QSFP-DD SR8, 800G OSFP SR8, 400G QSFP112 SR4, and 400G OSFP-RHS SR4 optical transceivers and AOCs. The product line uses high-performance 112Gbps VCSEL lasers and 7nm DSPs, with a host electrical interface of 112Gbps PAM4 per channel and support for CMIS 4.0.

Eye diagram and sensitivity metrics

TDECQ is below 3dB per channel, and receiver sensitivity (OMA) meets -5.2dBm at a pre-FEC BER of 2.4E-4 at 53.125GBd.
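The 53.125GBd symbol rate quoted above connects to the 100G-per-lane and 800G module rates through simple arithmetic. The sketch below assumes the standard Ethernet accounting for 100G PAM4 lanes, namely RS(544,514) "KP4" FEC plus 256b/257b transcoding. The source does not state which overheads these modules use, so treat the constants and function names as assumptions.

```python
from fractions import Fraction

PAM4_BITS_PER_SYMBOL = 2                  # PAM4 encodes 2 bits in each symbol
KP4_FEC = Fraction(514, 544)              # RS(544,514): 514 data symbols per 544-symbol codeword
TRANSCODE_256B_257B = Fraction(256, 257)  # 256 payload bits per 257-bit block


def lane_line_rate_gbps(baud_gbd: Fraction) -> Fraction:
    """Raw bit rate of one PAM4 lane, including FEC and transcoding overhead."""
    return baud_gbd * PAM4_BITS_PER_SYMBOL


def lane_payload_gbps(baud_gbd: Fraction) -> Fraction:
    """Payload rate of one lane after removing FEC parity and transcoding overhead."""
    return lane_line_rate_gbps(baud_gbd) * KP4_FEC * TRANSCODE_256B_257B


baud = Fraction("53.125")
print(float(lane_line_rate_gbps(baud)))    # 106.25
print(float(lane_payload_gbps(baud)))      # 100.0
print(float(8 * lane_payload_gbps(baud)))  # 800.0 for an 8-lane module
```

Exact fractions are used instead of floats so that the overheads cancel to exactly 100G per lane.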

Transmission Distance

400G OSFP SR4 supports 30 meters (OM3 MMF) and 50 meters (OM4 MMF).

800G OSFP SR8 supports 60 meters (OM3 MMF) and 100 meters (OM4 MMF).
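The reach figures above can be kept in a small lookup table for cabling checks. This is a sketch only: the distances are the ones stated in this section, while the table layout and function name are hypothetical.

```python
# Maximum reach in meters, as stated in this section
REACH_M = {
    ("400G OSFP SR4", "OM3"): 30,
    ("400G OSFP SR4", "OM4"): 50,
    ("800G OSFP SR8", "OM3"): 60,
    ("800G OSFP SR8", "OM4"): 100,
}


def link_fits(module: str, fiber: str, length_m: float) -> bool:
    """True if a multimode link of length_m is within the module's stated reach."""
    try:
        return 0 < length_m <= REACH_M[(module, fiber)]
    except KeyError:
        raise ValueError(f"no reach data for {module} on {fiber}") from None


print(link_fits("400G OSFP SR4", "OM3", 40))  # False: OM3 reach is 30 m
print(link_fits("400G OSFP SR4", "OM4", 40))  # True: OM4 reach is 50 m
```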

Package Type

400G/800G optical modules support both QSFP-DD and OSFP.

Power Consumption Design

Under three-temperature testing, the power consumption of the 800G optical module/AOC is less than 14W, and that of the 400G optical module/AOC is less than 8W.

The complete product portfolio of this series is as follows:

800G (8×112G) multimode product line

  • 800G OSFP SR8 (Dual MPO12/APC or MPO16/APC)
  • 800G QSFP-DD AOC
  • 800G QSFP-DD SR8 (MPO16/APC)

400G (4×112G) multimode product line

FiberMall’s First 800G Active Copper Cable, the 800G OSFP ACC, Powers High-Speed Data Centers and AI High-Computing Applications

FiberMall’s 800G OSFP DAC/ACC complies with the OSFP MSA and IEEE 802.3ck specifications. It uses 16 pairs of copper conductors to support 8-channel bidirectional transmission at 112Gb/s per channel and is backward compatible with lower rates. The 800G OSFP DAC supports a maximum of 2m, while the ACC reaches 4m to 5m, which covers general short-distance interconnect cabling needs. The product features are as follows:

800G OSFP ACC

  • Excellent SI performance and good consistency at 44GHz.
  • Tested on an 800G network tester with KP4 FEC enabled, the cable meets IEEE auto-negotiation and link training requirements, with post-FEC BER below 1E-15, FEC margin better than 27%, and an FEC frame loss ratio of 0 throughout the test.
  • The product adopts a redriver solution, with typical power consumption of about 2.5W and delay below 20ns. Balancing equalization against signal-to-noise ratio is crucial in this design, which is far better than a retimer solution in power consumption and delay.
  • Thanks to an innovative production process, reliability is good, and the 800G OSFP ACC 26AWG 4m cable weighs only about 600g.
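The DAC and ACC reach figures given for this product suggest a simple selection rule by link length. The sketch below is one hedged way to write it: the 2 m DAC limit is from the text, the 4 m ACC limit is an assumption that takes the conservative end of the stated 4m to 5m range, and the function name is hypothetical.

```python
DAC_MAX_M = 2.0  # maximum 800G OSFP DAC length stated in the text
ACC_MAX_M = 4.0  # assumption: conservative end of the stated 4 m to 5 m ACC range


def pick_copper_cable(length_m: float) -> str:
    """Return "DAC", "ACC", or "none" for a required copper link length."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    if length_m <= DAC_MAX_M:
        return "DAC"
    if length_m <= ACC_MAX_M:
        return "ACC"
    return "none"  # beyond copper reach; consider an AOC or optical modules


for length in (1.5, 3.0, 7.0):
    print(length, pick_copper_cable(length))
# 1.5 DAC
# 3.0 ACC
# 7.0 none
```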

FiberMall has released a variety of 400G/800G DAC, ACC, AOC, and optical transceiver modules. Feel free to inquire!
