What is MRC and why supercomputers need it
Training today's large language models (LLMs) such as GPT-5.5 or Codex requires synchronizing tens to hundreds of thousands of graphics processing units (GPUs). At every step of computation, thousands of chips must exchange gigabytes of data. If a single transfer slows down or fails, the entire training run waits for the slowest link in the chain. At a scale of 100,000 GPUs or more, minor network outages become a daily reality that can cost millions of dollars in wasted compute time.
The MRC (Multipath Reliable Connection) protocol, presented in a research paper on arXiv in May 2026, addresses this problem with three key innovations: packet spraying, congestion-aware adaptive load balancing, and static SRv6 source routing, which together let the network route around failures without human intervention.
How MRC works in practice
MRC builds on an extension of the RoCEv2 protocol (RDMA over Converged Ethernet), which is commonly used for fast communication between servers in data centers. Unlike traditional RoCE, where a single data flow follows a single path, MRC spreads each Queue Pair (QP), the basic connection between two GPUs, across 128 to 256 different paths simultaneously.
Every packet carries a so-called entropy value (EV), a 32-bit number that determines which path the packet takes through the network. The receiving network interface card (NIC) writes data to memory immediately upon receipt, regardless of the order in which packets arrive. This avoids head-of-line blocking, where one delayed packet holds up all the others.
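To make the mechanism concrete, here is a minimal Python sketch of packet spraying and out-of-order placement. The packet layout, the EV assignment policy, and the fixed pool of 128 paths are illustrative assumptions for this sketch, not MRC's actual wire format.

```python
import random
from dataclasses import dataclass

NUM_PATHS = 128  # MRC spreads each QP over 128-256 paths; pool size assumed here

@dataclass
class Packet:
    ev: int        # entropy value: selects which network path the packet takes
    offset: int    # destination memory offset (RDMA write semantics)
    payload: bytes

def spray(message: bytes, mtu: int = 4096) -> list[Packet]:
    """Split one RDMA write into packets, each tagged with its own entropy value."""
    return [
        Packet(ev=random.getrandbits(32) % NUM_PATHS, offset=off,
               payload=message[off:off + mtu])
        for off in range(0, len(message), mtu)
    ]

def receive(packets: list[Packet], buffer: bytearray) -> None:
    """The receiving NIC places each packet at its offset as it arrives, in any order."""
    random.shuffle(packets)  # simulate out-of-order arrival over many paths
    for p in packets:
        buffer[p.offset:p.offset + len(p.payload)] = p.payload
```

A real NIC would additionally track which offsets have already arrived so it can acknowledge them selectively; that bookkeeping is omitted here.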
MRC also disables the Priority Flow Control (PFC) mechanism, which in classic lossless networks often spreads congestion between unrelated training jobs. Instead, it operates in ordinary best-effort Ethernet mode and handles packet loss with selective acknowledgements (SACK) and so-called packet trimming. When trimming, instead of dropping a whole packet, a congested switch cuts off its payload and forwards the bare header to the destination at high priority, which immediately triggers a fast retransmission.
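A rough sketch of the trimming idea, under the assumption of a single congested output queue and a separate high-priority queue for trimmed headers; the queue limit and the dictionary-based packet representation are invented for illustration:

```python
from collections import deque

QUEUE_LIMIT = 8  # illustrative buffer depth, not a real MRC or switch parameter

def switch_forward(data_queue: deque, ctrl_queue: deque, packet: dict) -> None:
    """On congestion, strip the payload and forward the header alone at high priority."""
    if len(data_queue) < QUEUE_LIMIT:
        data_queue.append(packet)
    else:
        header = {**packet, "payload": b"", "trimmed": True}
        ctrl_queue.append(header)  # trimmed header bypasses the congested data queue

def receiver_handle(packet: dict, sacked: set[int]) -> int | None:
    """A trimmed header tells the receiver exactly which sequence number to re-request."""
    if packet.get("trimmed"):
        return packet["seq"]       # trigger an immediate fast retransmission
    sacked.add(packet["seq"])      # selectively acknowledge what actually arrived
    return None
```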
SRv6: static routing as the foundation of resilience
For networks with tens of thousands of nodes, dynamic routing protocols like BGP would be too slow and complex to manage. The MRC authors therefore chose a seemingly paradoxical solution: turn dynamic routing off completely. Instead, each packet carries in its IPv6 address an exact list of switches it should pass through, encoded in the SRv6 micro-segment (uSID) format. Each switch simply shifts the segment list in the address and consults a static table to decide which port to forward the packet to.
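The shift-and-forward step can be pictured with a small Python sketch. The 32-bit block prefix and 16-bit uSID layout follow a common SRv6 uSID convention and are an assumption here; the paper's exact encoding may differ.

```python
def usid_next_hop(dst_addr: int, table: dict[int, int],
                  block_bits: int = 32, usid_bits: int = 16) -> tuple[int, int]:
    """Pop the active micro-segment from a 128-bit IPv6 destination address.

    Returns the output port (looked up in a static table) and the rewritten
    address with the remaining uSID list shifted left and padded with zeros.
    """
    seg_bits = 128 - block_bits
    block = dst_addr >> seg_bits                      # fixed uSID block prefix
    segments = dst_addr & ((1 << seg_bits) - 1)       # packed list of 16-bit uSIDs
    active = segments >> (seg_bits - usid_bits)       # uSID addressed to this switch
    shifted = (segments << usid_bits) & ((1 << seg_bits) - 1)
    return table[active], (block << seg_bits) | shifted
```

Because the whole path is fixed by the sender, this lookup keeps working even with every dynamic routing protocol on the switch disabled.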
This approach has two key advantages. First, the sender knows exactly which path each packet took, so when a packet is lost it immediately knows which path is faulty. Second, it removes the dependency on the switches' control plane: even if a switch stops functioning properly but remains "up", MRC simply stops using the paths through it instead of waiting for a routing protocol to notice.
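A simplified sketch of how a sender might exploit that knowledge, with a per-path strike counter standing in for whatever heuristic MRC actually uses; the threshold and eviction policy below are assumptions:

```python
import random

class PathSelector:
    """Track per-path health and stop spraying over paths that keep losing packets."""

    def __init__(self, num_paths: int = 128, max_strikes: int = 3):
        self.healthy = set(range(num_paths))
        self.strikes = [0] * num_paths
        self.max_strikes = max_strikes

    def on_loss(self, path: int) -> None:
        """Every loss is attributable to a known path, so a faulty one can be retired."""
        self.strikes[path] += 1
        if self.strikes[path] >= self.max_strikes:
            self.healthy.discard(path)   # bypass the path without any routing protocol

    def pick_path(self) -> int:
        return random.choice(sorted(self.healthy))
```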
Multi-plane topology: more paths = lower costs
MRC was designed together with a new network topology. Instead of a conventional Clos architecture, where connecting 100,000 GPUs requires three or even four tiers of switches, MRC uses a multi-plane design. An 800Gb/s network card is, for example, split into eight independent 100Gb/s ports, which feed eight parallel two-tier networks. Each switch then effectively exposes 512 ports instead of 64, which allows 131,072 GPUs to be served in only two tiers.
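The two-tier scaling claim is easy to check with the standard folded-Clos formula (radix squared divided by two); this is generic arithmetic, not a calculation taken from the paper:

```python
def two_tier_capacity(radix: int) -> int:
    """Max endpoints of a non-blocking two-tier folded Clos built from radix-port switches."""
    hosts_per_t0 = radix // 2   # half the ports face GPUs, half face the T1 tier
    num_t0 = radix              # each T1 switch dedicates one port to every T0
    return hosts_per_t0 * num_t0

print(two_tier_capacity(512))   # 131072, matching the figure quoted above
print(two_tier_capacity(64))    # 2048: why small-radix switches force extra tiers
```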
In practice this means:
- Lower latency — the longest path passes through only three switches instead of five or seven.
- Lower costs and power consumption — for full bisection bandwidth it needs 33% fewer optical transceivers and 40% fewer switches than a traditional three-tier network.
- Higher resilience — losing one 100Gb/s link reduces node capacity by only 0.4%, whereas with an 800Gb/s single-plane it would be 3%.
Results from production operation
According to the research report, MRC is currently running in several massive training clusters of OpenAI and Microsoft and was used to train the latest frontier models for ChatGPT and Codex. The authors provide specific data from operations:
- In a cluster with 75,000 GPUs, training experienced on average several link flaps per minute between first-tier and second-tier switches. Previously, these outages could interrupt training; with MRC, operators practically no longer need to respond to them.
- During an optical transceiver outage on a T0 switch, which caused four links to flap simultaneously, synchronous training throughput dropped by about 25% for roughly one minute, but the job did not crash and fully recovered without human intervention.
- During an experimental failure of a T1 switch during a 75K GPU training run, approximately 580,000 packets were dropped, but the job throughput quickly stabilized because MRC immediately removed the affected paths.
In a test environment, MRC on NVIDIA ConnectX-8 NICs achieved a latency of 5.09 μs for intra-rack communication and 6.54 μs when crossing switch tiers. Transfer speed with 32 kB messages reached 770 Gb/s, which corresponds to 96% of the theoretical maximum of an 800Gb/s link. In NCCL collective operations at a scale of 42,000 GPUs, MRC achieved up to 92 GB/s per network card.
Open collaboration across the industry
One of the remarkable aspects of the project is the breadth of industry collaboration. MRC was implemented not only in NVIDIA ConnectX-8 but also in AMD Pollara and Vulcano NICs and in Broadcom Thor Ultra. SRv6 support was added to NVIDIA Spectrum-4 and Spectrum-5 switches (Cumulus, SONiC), to Broadcom Tomahawk 5, and, in cooperation with Arista, to EOS.
The authors published the MRC specification under an open license through the Open Compute Project (OCP). This means the technology is not locked into a single vendor's ecosystem and can be adopted by other cloud operators or research institutions.
What this means for Czech companies and Europe
For Czech users and companies, MRC has no direct impact on the daily use of ChatGPT, but it shapes the infrastructure behind the service. Lower costs of operating supercomputers may, in the long term, contribute to more affordable API or subscription prices. For Czech research institutions and universities building their own AI compute clusters, the open MRC specification is an interesting alternative to proprietary solutions.
The European context also matters from the perspective of energy consumption. Training the largest models consumes enormous amounts of electricity, and reducing the number of switches and optical modules by tens of percent translates directly into lower data-center power draw. At a time when the EU is tightening sustainability requirements for digital infrastructure, this is a relevant technical step forward.
It is also worth mentioning that Microsoft, one of the key MRC partners, operates several cloud regions in Europe, including Western Europe. Optimization of its training infrastructure may therefore also be reflected in the capacity available to European customers of the Azure OpenAI Service.
Conclusion
MRC is not a product that a regular user could download, but it is precisely the kind of infrastructure innovation that enables the current pace of generative AI development. While the public watches new models and ChatGPT features, a protocol in the data centers of OpenAI and Microsoft quietly routes trillions of packets per second across hundreds of thousands of GPUs. That it can survive switch failures, flapping cables, and overloaded links without human intervention is a technical achievement on which every next update of artificial intelligence depends.
Is MRC available as open-source software?
MRC is primarily implemented in the hardware of network interface cards (NICs) and switch firmware. OpenAI and partners, however, have published the MRC protocol specification under an open license through the Open Compute Project (OCP). This means hardware manufacturers can implement it in their devices, but it is not software that could be installed on a regular server.
Can MRC speed up my home internet or LAN?
No. MRC is designed specifically for RDMA networks in data centers with tens of thousands of servers and requires specialized network cards (e.g., NVIDIA ConnectX-8, AMD Pollara). Its techniques, such as packet spraying, would bring no benefit in a typical home network: home routers and switches do not support them, and a home network is not constrained by the same bottlenecks as a supercomputer cluster.
What is the difference between MRC and Ultra Ethernet Transport (UET)?
MRC draws inspiration from UET and shares some concepts with it, such as packet spraying and adaptive load balancing. The main difference is that MRC is a minimal extension of the existing RoCEv2 protocol, whereas UET is a more ambitious, entirely new transport. MRC supports only basic RDMA write operations, which is sufficient for AI training and allows simpler deployment in existing RoCE-based infrastructures.