Toast Lab has had one full paper accepted by the 31st IEEE International Conference on Parallel and Distributed Systems (ICPADS 2025). It is titled "Layer Fusion-accelerated Online Scheduling for Multi-Tenancy on Heterogeneous DNN Accelerators". Mr Lei Jia is the first author. Chundong and Professor Siting Liu are the corresponding authors.
Today, hardware accelerators are being deployed in cloud and edge computing to serve the DNN inference jobs that multiple tenants continuously issue. Heterogeneous multi-core accelerator systems have been considered for this purpose. However, the intricate nature of such a system and the way multi-tenant jobs change over time make scheduling a complex problem. In this paper, the authors study layer fusion techniques and propose a new scheduling algorithm named Lucas. Lucas aims to meet the Quality of Service (QoS) requirements of all tenants as far as possible when mapping their inference jobs onto heterogeneous accelerators. After breaking DNN layers into fine-grained units, Lucas decides online whether to perform multi-core layer fusion, single-core layer fusion, or layer-by-layer execution, based on factors such as memory bandwidth consumption, the layers awaiting execution, and the cost of layer fusion. Evaluation shows that, compared to state-of-the-art schedulers, Lucas achieves significantly higher Service Level Agreement (SLA) compliance across various workloads.
The 31st IEEE International Conference on Parallel and Distributed Systems (ICPADS) will be held in Hefei, China, on December 14-17, 2025.