Session

Research Manuscript: Right on Time, Built to Last: New Frontiers in Critical System Design
Description
As systems grow more complex, ensuring both time-critical performance and fault tolerance becomes essential for reliability and efficiency. This session showcases cutting-edge research tackling fault resilience, real-time processing, and efficient memory utilization. Topics include advanced ECC for CXL memory, statistical fault tolerance in LLM inference, and automated resource configuration for serverless workflows. Additional highlights cover time-aware traffic shaping (Megabits to Kilobits), DAG modeling for autonomous systems, predictive memory failure management, fault injection for GPGPU graph processing, and flexible error detection in real-time multi-core systems. Join us to explore the latest breakthroughs in resilient, real-time system design.
Event Type
Research Manuscript
Time
Wednesday, June 25, 3:30pm - 5:30pm PDT
Location
3008, Level 3
Topics
Systems
Tracks
SYS6: Time-Critical and Fault-Tolerant System Design
Presentations
3:30pm - 3:45pm PDT MemSeer: Leveraging Memory Failure Distinctions and Multi-Grained Prediction in Ultra-Scale Heterogeneous X86/ARM Clusters
3:45pm - 4:00pm PDT CXL-ECC: an Efficient LRC-based on-CXL-Memory-eXpander-Controller ECC to Enhance Reliability and Performance of DRAM Error Correction
4:00pm - 4:15pm PDT Megabits Down to Kilobits: Memory-Efficient Time-Aware Shaping for TSN
4:15pm - 4:30pm PDT AARC: Automated Affinity-aware Resource Configuration for Serverless Workflows
4:30pm - 4:45pm PDT FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems
4:45pm - 5:00pm PDT GraphFI: An Efficient Fault Injection Framework for Graph Processing on GPGPUs
5:00pm - 5:15pm PDT Construction of DAG Models for Autonomous Systems
5:15pm - 5:30pm PDT ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance