8th Computing Systems Research Day - 7 January 2025

Schedule

  • 12:00-12:15 | Welcome

  • Abstract

    Modern cloud applications need microsecond-level responsiveness, yet current virtualization approaches often cause millisecond-scale delays. This talk presents two complementary solutions that bring virtualized environments closer to bare-metal performance. First, Rorke is a microsecond-scale VM scheduler for oversubscribed cloud environments. By approximating processor sharing at the host and dynamically adapting time slices, Rorke cuts tail latency by over 10× for popular low-latency workloads—without harming throughput in non-oversubscribed scenarios. Second, Machnet is a userspace network stack designed for public clouds. Rather than relying on specialized NIC features unavailable in virtual NICs, Machnet uses a “Least Common Denominator” approach and a microkernel design to support flexible execution models. It achieves substantial latency and CPU efficiency gains, demonstrating 80% lower latency and 75% lower CPU utilization for a key-value store compared to today’s best solutions. Together, Rorke and Machnet bring virtualized infrastructure closer than ever to bare-metal levels of performance, setting a new standard for cloud computing efficiency.

    Bio

    Kostis Kaffes joined the Department of Computer Science at Columbia University as an assistant professor in June 2023. Kostis obtained an MSc and PhD from Stanford University in 2018 and 2022, respectively, and an undergraduate degree from the National Technical University of Athens in Greece in 2015. He is broadly interested in computer systems, cloud computing, and scheduling. He has worked on end-host, rack-scale, and cluster-scale scheduling for microsecond-scale tail latency. He has also been seeking ways to accelerate machine learning systems and use machine learning to improve operating systems management. Prior to Columbia, he spent a year at Google SRG.

  • 13:15-14:00 | Lunch Break

  • Abstract

    The shared and distributed memory capabilities of the emerging Compute Express Link (CXL) interconnect call for rethinking traditional system software interfaces. In this talk, we will discuss the challenges of CXL-connected distributed systems and explore one such interface: remote fork over CXL fabrics for cluster-wide process cloning. Specifically, we will introduce CXLfork, which realizes zero-serialization, zero-copy process cloning across nodes. CXLfork uses globally shared CXL memory for cluster-wide deduplication of process state and enables fine-grained control of state tiering between local and CXL memory. We will show how it can be integrated into serverless runtimes to achieve fearless concurrency. To that end, we will introduce CXLporter, an efficient horizontal autoscaler for serverless functions deployed over CXL fabrics. Overall, CXLfork attains a remote fork latency close to that of a local fork, outperforming the state of practice by 2.26x on average and reducing local memory consumption by 87% on average. Integrated into CXLporter, it can achieve high throughput with 3x less local memory.

    Bio

    Chloe Alverti is a postdoctoral researcher at the University of Illinois Urbana-Champaign (UIUC), hosted by Professor Josep Torrellas. Her ongoing research is part of the ACE Center for Evolvable Computing. She received her PhD in 2022 from the School of Electrical and Computer Engineering at the National Technical University of Athens (NTUA), where she was a member of the Computing Systems Laboratory (CSLAB), supervised by Professor Georgios Goumas. During her studies, she spent three months as a visiting scholar at the University of Wisconsin-Madison, working with Professor Michael Swift. Before her PhD, she worked for two years as a research assistant at Chalmers University of Technology, advised by Professor Per Stenstrom. Her research interests center on system software and hardware co-design for efficient memory access and virtualization, with a recent focus on distributed systems.

  • Abstract

    This talk introduces Adrias, an interference-aware memory orchestration framework that enables effective, optimized data placement decisions on memory-disaggregated cloud infrastructures. The key features of Adrias can be summarized as: i) its ability to forecast the future trend of system-wide metrics, thus driving proactive memory orchestration decisions; ii) its accurate performance predictions for deployed applications with respect to memory heterogeneity (local/fast vs. remote/slow DRAM) and interference; and iii) its ability to leverage disaggregated memory with minimal impact on the performance of deployed applications, without employing dynamic memory management mechanisms. Adrias exploits system-level performance monitoring information and leverages deep learning approaches to place incoming applications on the pool of available memory resources.

    Bio

    Dr. Dimosthenis Masouros received his Diploma and Ph.D. degrees from the Department of Electrical and Computer Engineering at the National Technical University of Athens, Greece, in 2016 and 2023, respectively. His research focuses on systems optimization, with an emphasis on leveraging machine learning techniques to address challenges in resource allocation, application scheduling and systems performance prediction. His current research interests include optimizing performance and energy efficiency in emerging paradigms such as serverless computing, Large Language Models, Federated Learning, and other related technologies. He has been actively involved in five European research projects and has authored over 40 peer-reviewed papers in leading international conferences and journals.

  • Abstract

    Large pages have been the de facto mitigation technique to address the translation overheads of virtual memory, with prior work mostly focusing on the large page sizes supported by the x86 architecture, i.e., 2MiB and 1GiB. ARMv8-A and RISC-V support additional intermediate translation sizes, i.e., 64KiB and 32MiB, via OS-assisted TLB coalescing, but their performance potential has largely flown under the radar due to limited system software support. In this paper, we propose Elastic Translations (ET), a holistic memory management solution, to fully explore and exploit the aforementioned translation sizes for both native and virtualized execution. ET implements mechanisms that make the OS memory manager coalescing-aware, enabling the transparent and efficient use of intermediate-sized translations. ET also employs policies to guide translation size selection at runtime using lightweight HW-assisted TLB miss sampling. We design and implement ET for ARMv8-A in Linux and KVM. Our real-system evaluation shows that ET improves the performance of memory-intensive workloads by up to 39% in native execution and by 30% on average in virtualized execution.

    Bio

    Stratos Psomadakis is a final-year PhD student at the National Technical University of Athens under the supervision of Prof. Georgios Goumas. His research interests lie at the intersection of Operating Systems and Hardware, with a focus on virtual memory and emerging ISA

  • 15:30-16:00 | Coffee Break

  • Abstract

    Language-agnostic composition environments (e.g., OSes, shells, microservices, serverless) have always held the promise of significant benefits, including in developer effort, financial costs, and component specialization. Unfortunately, these environments hinder the performance optimizations, strong correctness, and security guarantees that are typical of language-aware, semantics-first environments. In this talk, I will discuss how recent developments across fields allow overcoming these challenges, offer several benefits, and enable new opportunities for exciting research that has the potential for widespread impact.

    Bio

    Nikos Vasilakis is on the faculty of Computer Science at Brown University. His research encompasses software systems, programming languages, and security—with a current focus on automatically transforming systems to add new capabilities such as parallelism, distribution, isolation, and correctness. Prof. Vasilakis is also the chair of the Technical Steering Committee behind PaSh, a shell-script optimization system hosted by the Linux Foundation. More: https://nikos.vasilak.is and https://atlas.cs.brown.edu

  • 17:00-17:15 | Closing Remarks

Venue

Central Library of the NTUA, Multimedia Amphitheater (Zografou Campus)

---