Speaker
Dr Andrea Borghesi (UniBO)
Description
Predictive maintenance in High-Performance Computing (HPC) systems is crucial for ensuring reliability, performance, and energy efficiency. By leveraging AI-driven models, we can detect anomalies and predict potential faults before they disrupt operations, minimizing downtime and repair costs. This talk will explore advanced anomaly and fault detection techniques in HPC environments, utilizing machine learning algorithms to monitor system behaviour and identify irregularities. Additionally, we will examine how AI can optimize energy usage, reducing the carbon footprint of these systems while maintaining high computational performance.