TWIN4FT – Towards self-healing large-scale distributed infrastructures: Artificial Intelligence and Digital Twins
- Project funded by Région Auvergne-Rhône-Alpes
- Partners: ATOS, UGA
Distributed infrastructures include a growing number of computing resources and a wide variety of hardware components. Such large-scale systems also experience failures and performance bugs at both the hardware and software levels. They are therefore difficult to operate, and making the best use of the computing resources they provide is harder still.
The TWIN4FT project studies how ML approaches can be used to operate large-scale distributed infrastructures efficiently. We follow an approach inspired by the digital twin concept.
Our objective is to model each component of a distributed infrastructure with a simple ML model and to aggregate the outputs of all these models into a precise model of the whole system. This model will be continuously fed with data from the real system and will be used to detect anomalies or make reconfiguration decisions.
We consider large-scale microservice applications as an example of distributed systems, and we build a model of each service that estimates resource consumption based on the number of requests received by each service.
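
To make the idea concrete, here is a minimal sketch of how per-service models could be fitted and aggregated. It is an illustration under our own assumptions, not the project's implementation: the linear model, the service names (`frontend`, `cart`), the sample values, the use of each service's own request count as the only input, and the 20% tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceModel:
    """Hypothetical per-service model: cpu = slope * requests + intercept."""
    slope: float
    intercept: float

    def predict(self, requests: float) -> float:
        return self.slope * requests + self.intercept


def fit_service_model(requests, cpu):
    """Fit the linear model by ordinary least squares on observed samples."""
    n = len(requests)
    mean_x = sum(requests) / n
    mean_y = sum(cpu) / n
    sxx = sum((x - mean_x) ** 2 for x in requests)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(requests, cpu))
    slope = sxy / sxx
    return ServiceModel(slope, mean_y - slope * mean_x)


def predict_system(models, requests_per_service):
    """Aggregate the per-service estimates into a system-wide estimate (sum)."""
    return sum(models[name].predict(r) for name, r in requests_per_service.items())


def is_anomalous(predicted, observed, tolerance=0.2):
    """Flag an observation that deviates from the prediction by more than
    `tolerance` (relative). The threshold is an arbitrary illustrative choice."""
    return abs(observed - predicted) > tolerance * predicted


# Made-up training samples: (requests per second, CPU cores used).
models = {
    "frontend": fit_service_model([100, 200, 300], [1.0, 2.0, 3.0]),
    "cart": fit_service_model([50, 100, 150], [1.0, 1.5, 2.0]),
}

# Estimate total CPU usage for a new load, then compare with measurements.
predicted = predict_system(models, {"frontend": 400, "cart": 200})
normal = is_anomalous(predicted, observed=6.8)
suspicious = is_anomalous(predicted, observed=9.0)
```

In this sketch the aggregate is a plain sum of per-service estimates. A real digital twin would likely need richer models and an aggregation that accounts for interactions between services.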