Multi-objective management and optimization of servers and data centers

Research Partners

Huawei Cloud Business Unit (Huawei CloudBU)
Heating Bits
EPFL EcoCloud Center (EcoCloud)

Sources of Funding

RECIPE H2020
MANGO H2020
Compusapien
DeepHealth H2020
EPFL / Huawei Cloud


This research line focuses on multi-objective resource management of heterogeneous high-performance computing (HPC) servers and data centers through machine learning-based approaches.

Our research leverages system-level resource management techniques, such as dynamic voltage and frequency scaling (DVFS), task scheduling and allocation, and thread migration, to simultaneously satisfy design-time and run-time objectives and constraints, including power and energy consumption, temperature, performance, and Quality-of-Service (QoS).
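
As a rough illustration of the trade-off described above, the following minimal sketch selects a DVFS operating point under a power cap and a deadline while balancing energy against runtime. It is not the method used in any of the publications listed on this page. The cubic power model, the coefficients, and all names (`OperatingPoint`, `model_point`, `select_frequency`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One candidate DVFS setting with its modeled power and runtime."""
    freq_ghz: float
    power_w: float
    runtime_s: float

    @property
    def energy_j(self) -> float:
        return self.power_w * self.runtime_s


def model_point(freq_ghz, work_gcycles=100.0, static_w=20.0, dyn_coeff=5.0):
    """Toy model (assumption): P = static + c * f^3, runtime = work / f."""
    power_w = static_w + dyn_coeff * freq_ghz ** 3
    runtime_s = work_gcycles / freq_ghz
    return OperatingPoint(freq_ghz, power_w, runtime_s)


def select_frequency(freqs_ghz, power_cap_w, deadline_s, energy_weight=0.5):
    """Return the feasible point with the lowest weighted cost, or None.

    Constraints: power <= power_cap_w and runtime <= deadline_s.
    Objective: weighted sum of energy and runtime, each normalized by
    its maximum over the feasible set.
    """
    points = [model_point(f) for f in freqs_ghz]
    feasible = [
        p for p in points
        if p.power_w <= power_cap_w and p.runtime_s <= deadline_s
    ]
    if not feasible:
        return None

    max_energy = max(p.energy_j for p in feasible)
    max_runtime = max(p.runtime_s for p in feasible)

    def cost(p):
        return (energy_weight * p.energy_j / max_energy
                + (1.0 - energy_weight) * p.runtime_s / max_runtime)

    return min(feasible, key=cost)


if __name__ == "__main__":
    best = select_frequency([1.0, 1.5, 2.0, 2.5, 3.0],
                            power_cap_w=100.0, deadline_s=80.0)
    print(best)
```

With these assumed coefficients, equal weighting selects the 2.0 GHz point. Setting `energy_weight=1.0` favors 1.5 GHz, and `energy_weight=0.0` favors 2.5 GHz, the fastest setting that still respects the power cap.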

Darong Huang and David Atienza

News

Predicting the future with CloudProphet



Related Publications

Is the powersave governor really saving power?
Huang, Darong; Costero Valero, Luis Maria; Atienza Alonso, David
2024-02-12. Conference Paper. Publication funded by EPFL / Huawei Cloud (Intelligent Cloud Technologies Initiative).
CloudProphet: A Machine Learning-Based Performance Prediction for Public Clouds
Huang, Darong; Costero Valero, Luis Maria; Pahlevan, Ali; Zapater Sancho, Marina; Atienza Alonso, David
2024-01-23. IEEE Transactions on Sustainable Computing. Publication funded by RECIPE H2020 (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems). Publication funded by EPFL / Huawei Cloud (Intelligent Cloud Technologies Initiative).
Reinforcement Learning-Based Joint Reliability and Performance Optimization for Hybrid-Cache Computing Servers
Huang, Darong; Pahlevan, Ali; Costero, Luis; Zapater Sancho, Marina; Atienza Alonso, David
2022-03-07. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Publication funded by RECIPE H2020 (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems). Publication funded by DeepHealth H2020 (Deep-Learning and HPC to Boost Biomedical Applications for Health). Publication funded by Compusapien (Next-gen computing systems inspired by the human brain).
Resource Management for Power-Constrained HEVC Transcoding Using Reinforcement Learning
Costero, Luis; Iranfar, Arman; Zapater Sancho, Marina; D. Igual, Francisco; Olcoz, Katzalin; Atienza Alonso, David
2020. IEEE Transactions on Parallel and Distributed Systems. Publication funded by Compusapien (Next-gen computing systems inspired by the human brain). Publication funded by DeepHealth H2020 (Deep-Learning and HPC to Boost Biomedical Applications for Health).
A Machine Learning-Based Framework for Throughput Estimation of Time-Varying Applications in Multi-Core Servers
Iranfar, Arman; Silva De Souza, Wellington; Zapater Sancho, Marina; Olcoz, Katzalin; Xavier de Souza, Samuel; Atienza Alonso, David
2019. Conference Paper. Publication funded by RECIPE H2020 (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems). Publication funded by Compusapien (Next-gen computing systems inspired by the human brain).
A Machine Learning-Based Strategy for Efficient Resource Management of Video Encoding on Heterogeneous MPSoCs
Iranfar, Arman; Simon, William Andrew; Zapater Sancho, Marina; Atienza Alonso, David
2018. Conference Paper. Publication funded by MANGO H2020 (Exploring Manycore Architectures for Next-GeneratiOn HPC systems). Publication funded by Compusapien (Next-gen computing systems inspired by the human brain).
Machine Learning-Based Quality-Aware Power and Thermal Management of Multistream HEVC Encoding on Multicore Servers
Iranfar, Arman; Zapater Sancho, Marina; Atienza Alonso, David
2018. Journal of IEEE Transactions on Parallel and Distributed Systems (TPDS). Publication funded by MANGO H2020 (Exploring Manycore Architectures for Next-GeneratiOn HPC systems). Publication funded by Compusapien (Next-gen computing systems inspired by the human brain).