Memory Channel Controller

Far, disaggregated memory is transforming data centers, but it comes with performance overheads. The old idea of near-data processing (NDP) is being revived to address this challenge. While hardware vendors are rushing to design NDP hardware, a critical piece of the puzzle is missing: the operating system abstraction. We propose Memory Channel Controllers – a modern take on mainframe I/O channels – to make NDP portable, virtualizable, and capable of fine-grained cooperation with the host CPU.

New interconnects, new hardware

Driven by protocols like CXL, data centers are moving towards disaggregated “far” memory to increase capacity and efficiency. However, this comes at a cost. Accessing this “far memory” incurs a significant latency tax compared to local DRAM, often reducing bandwidth and stalling the CPU.

When moving data is expensive, the logical solution is to move the computation. The industry knows this:

  • Marvell is integrating ARM Neoverse cores into their memory extension cards.
  • Samsung is demonstrating “Processing-Near-Memory” (PNM) engines within CXL devices.

But here is the problem: to the programmer, these systems look like two separate computers. We currently lack a unified model to access, program, and multiplex these remote processors. Without a proper OS abstraction, these powerful accelerators risk becoming isolated, hard-to-manage islands of compute.

Our take: an OS-centric abstraction

If we look back to the days of mainframe computers, there was a class of storage with similar characteristics, larger and cheaper than memory but slower: the spinning disk. On IBM mainframes, the solution to the I/O bottleneck was the channel controller: a powerful, programmable unit that offloaded I/O tasks from the CPU.

Inspired by that, we propose the Memory Channel Controller (MCC) as the standard abstraction for near-data processing on far, disaggregated memory.

The MCC acts as a programmable unit that sits between the CPU and far memory. Instead of the CPU fetching data byte-by-byte (and stalling on latency), the application sends Channel Programs to the MCC. By executing logic directly next to the data, the MCC eliminates costly round-trips over the CXL link, hiding the high latency of far memory while freeing up the host CPU for other tasks.
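To make the round-trip argument concrete, here is a minimal sketch in Python. It is a toy simulation, not the MCC interface: `FarMemory`, `MCC`, `submit`, and `sum_list_program` are hypothetical names invented for illustration, and the "link" is only a counter. It compares a host CPU walking a linked list in far memory (one link round trip per node) with the same traversal shipped to the controller as a channel program (one round trip in total).

```python
from dataclasses import dataclass


@dataclass
class FarMemory:
    """Toy far memory: maps an address to a (value, next_address) node."""
    nodes: dict
    link_round_trips: int = 0  # round trips over the simulated link

    def load_over_link(self, addr):
        # A host-side load pays one link round trip per access.
        self.link_round_trips += 1
        return self.nodes[addr]

    def load_local(self, addr):
        # An access made on the MCC side stays next to the data.
        return self.nodes[addr]


def host_sum_list(mem, head):
    """Host CPU walks the list itself: one round trip per node."""
    total, addr = 0, head
    while addr is not None:
        value, addr = mem.load_over_link(addr)
        total += value
    return total


def sum_list_program(load, head):
    """Hypothetical channel program: the same traversal, run beside the data."""
    total, addr = 0, head
    while addr is not None:
        value, addr = load(addr)
        total += value
    return total


class MCC:
    """Hypothetical controller that runs channel programs next to far memory."""

    def __init__(self, mem):
        self.mem = mem

    def submit(self, program, *args):
        # One round trip: ship program and arguments, receive the result.
        self.mem.link_round_trips += 1
        return program(self.mem.load_local, *args)


def build_list(values):
    """Place node i at address i and link it to address i + 1."""
    values = list(values)
    last = len(values) - 1
    return FarMemory({i: (v, i + 1 if i < last else None)
                      for i, v in enumerate(values)})


if __name__ == "__main__":
    mem = build_list(range(1000))
    print(host_sum_list(mem, 0), mem.link_round_trips)   # 499500 1000
    mem.link_round_trips = 0
    print(MCC(mem).submit(sum_list_program, 0), mem.link_round_trips)  # 499500 1
```

The counter only models the number of link crossings; real latencies, bandwidth, and the cost of the MCC's own memory accesses are deliberately left out.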

We prioritize virtualization and portability as core design choices for this abstraction. Rather than exposing raw hardware constraints to the user, the OS manages MCCs as virtual resources, allocating and multiplexing them. This allows developers to write portable Channel Programs (e.g., in a high-level DSL) that are decoupled from the specific underlying hardware implementation.
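One way to picture this virtualization is the following Python sketch. It assumes a simple round-robin policy and uses made-up names throughout (`MCCManager`, `VirtualMCC`, `PhysicalEngine`, and the engine names `pnm0` and `pnm1`); it does not describe the actual OS design. Applications hold virtual handles only, and a toy OS component maps their requests onto a smaller pool of physical engines.

```python
from collections import deque


class PhysicalEngine:
    """Stand-in for one hardware NDP engine (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def run(self, program, data):
        return program(data)


class VirtualMCC:
    """Handle held by an application; it never names a physical engine."""

    def __init__(self, owner, manager):
        self.owner = owner
        self._manager = manager

    def submit(self, program, data):
        self._manager.enqueue(self, program, data)


class MCCManager:
    """Toy OS component: allocates virtual MCCs and multiplexes them
    onto a fixed pool of physical engines, round-robin."""

    def __init__(self, engines):
        self.engines = list(engines)
        self.pending = deque()

    def allocate(self, owner):
        return VirtualMCC(owner, self)

    def enqueue(self, vmcc, program, data):
        self.pending.append((vmcc, program, data))

    def schedule(self):
        """Drain the queue; return (owner, engine name, result) per request."""
        log = []
        slot = 0
        while self.pending:
            vmcc, program, data = self.pending.popleft()
            engine = self.engines[slot % len(self.engines)]
            log.append((vmcc.owner, engine.name, engine.run(program, data)))
            slot += 1
        return log


if __name__ == "__main__":
    manager = MCCManager([PhysicalEngine("pnm0"), PhysicalEngine("pnm1")])
    db, graph, kv = (manager.allocate(n) for n in ("db", "graph", "kv"))
    db.submit(sum, [1, 2, 3])
    graph.submit(max, [4, 9, 2])
    kv.submit(len, "abcd")
    for entry in manager.schedule():
        print(entry)
```

Three applications share two engines here, and the same program would run unchanged on either engine, which is the portability property in miniature.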

For more details, check out our APSys’25 paper!

Beyond “dump and wait”

Most existing accelerators (like GPUs) follow a “bulk offload” model: the CPU sends a massive chunk of data, waits, and gets a result back.

Between doing all the work on the CPU and offloading it in bulk, MCCs enable a third way: fine-grained interaction.

As modern interconnects (like CXL or Enzian’s ECI) expose memory transactions, it is possible to construct hardware-based message-passing channels for efficient fine-grained data movement. This enables chatty channel programs. For example, the MCC can handle the latency-sensitive part of a workload (e.g., pointer chasing) while the CPU handles complex business logic, with the two parts communicating interactively in real time.
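The shape of such a chatty channel program can be sketched in Python with two threads and two software queues. This is an illustration under loud assumptions: the queues stand in for the hardware message-passing channels, and the function names and the "budget" rule are invented for the example. The MCC side chases pointers and sends one record per message; the CPU side applies its logic and replies after every record whether to continue.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def mcc_pointer_chase(nodes, head, mcc_to_cpu, cpu_to_mcc):
    """Hypothetical MCC side: walk a linked list in far memory, send each
    record to the CPU, and wait for a continue/stop reply per message."""
    addr = head
    while addr is not None:
        record, addr = nodes[addr]
        mcc_to_cpu.put(record)
        if not cpu_to_mcc.get():  # the CPU asked to stop early
            break
    mcc_to_cpu.put(DONE)


def cpu_business_logic(mcc_to_cpu, cpu_to_mcc, budget):
    """Host side: accept records until their running total reaches budget."""
    accepted, total = [], 0
    while True:
        record = mcc_to_cpu.get()
        if record is DONE:
            return accepted
        accepted.append(record)
        total += record
        cpu_to_mcc.put(total < budget)  # reply: keep going or stop


def run(nodes, head, budget):
    mcc_to_cpu, cpu_to_mcc = queue.Queue(), queue.Queue()
    mcc = threading.Thread(
        target=mcc_pointer_chase,
        args=(nodes, head, mcc_to_cpu, cpu_to_mcc),
    )
    mcc.start()
    accepted = cpu_business_logic(mcc_to_cpu, cpu_to_mcc, budget)
    mcc.join()
    return accepted


if __name__ == "__main__":
    # address -> (record, next address)
    nodes = {0: (5, 1), 1: (7, 2), 2: (11, 3), 3: (13, None)}
    print(run(nodes, 0, budget=20))   # [5, 7, 11]
    print(run(nodes, 0, budget=100))  # [5, 7, 11, 13]
```

Unlike a bulk offload, the traversal can stop as soon as the CPU's decision changes, so no work is done on nodes past the stopping point.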

Real hardware, not just simulation

We are not just simulating this. We are prototyping the MCC architecture on Enzian (check out other articles on this website!).

Using Enzian’s coherent interconnect (ECI), we emulate future CXL hardware today, allowing us to build and test the actual compiler, OS support, and hardware logic before commercial CXL processors hit the market.

Join the Discussion

This is an active project at the NetOS Group @ ETH Zurich. We are currently exploring applications in database processing, graph analytics and more.

If you are interested in how we can rethink near-data processing for the modern era, please reach out!

Publications

  1. Mainframe-Style Channel Controllers for Modern Disaggregated Memory Systems
     Zikai Liu, Jasmin Schult, Pengcheng Xu, Timothy Roscoe
     APSys 2025: Proceedings of the 16th ACM SIGOPS Asia-Pacific Workshop on Systems, October 2025
  2. Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects
     Anastasiia Ruzhanskaia, Pengcheng Xu, David Cock, Timothy Roscoe
     arXiv, April 2025