Mitigating Resource Contention for Responsive On-device Machine Learning Inferences
Description

On-device machine learning applications are increasingly deployed in dynamic and open environments, where resource availability can fluctuate unpredictably. This variability, combined with limited computing resources, poses significant challenges to achieving high responsiveness. Existing frameworks such as TensorFlow Lite rely on static, coarse-grained resource allocation, which degrades performance under contention. To address this, we propose FlexOn, a framework that combines fine-grained model segmentation with dynamic resource selection to adapt to highly dynamic runtime conditions. A prototype built on TensorFlow Lite improves average and tail latency by up to 54% and 58%, respectively, across three embedded devices under resource constraints.
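The two ideas in the abstract, segmenting a model into fine-grained units and picking a processor for each segment at runtime, can be illustrated with a minimal sketch. This is not the FlexOn implementation: the segment names, the processor list, and the `probe_load` function are all invented stand-ins for whatever load measurement the real system performs.

```python
# Hypothetical sketch of fine-grained segmentation + dynamic resource
# selection. Everything here (segment names, processors, load values)
# is illustrative, not FlexOn's actual API or scheduling policy.

SEGMENTS = ["conv_block_1", "conv_block_2", "fc_head"]  # toy model segments

def probe_load(processor):
    """Stand-in for a runtime load probe (e.g. sampled utilization).
    A real system would measure this; fixed values are used for illustration."""
    return {"cpu": 0.7, "gpu": 0.2, "npu": 0.4}[processor]

def select_processor(processors):
    # Dynamic resource selection: pick the least-contended unit right now,
    # rather than binding the whole model to one processor up front.
    return min(processors, key=probe_load)

def plan_inference(segments, processors=("cpu", "gpu", "npu")):
    # Decide placement per segment; a real runtime would dispatch each
    # segment to its chosen processor here.
    return [(seg, select_processor(processors)) for seg in segments]

print(plan_inference(SEGMENTS))
```

Because selection happens per segment, a shift in load between segments (say, the GPU becoming busy mid-inference) would redirect only the remaining segments, which is the flexibility a static, whole-model allocation cannot offer.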