Real-Time System Monitoring and Training Progress Visualisations for High-Performance Computing Systems (HPCs)
In machine learning and high-performance computing environments, it is crucial to monitor system performance, hardware usage, and the efficiency of ongoing training processes. This webpage provides a dynamic, interactive interface that presents real-time data and historical trends, and gives insight into key system components and signals such as CPU and GPU utilisation, memory usage, HBM usage, and log information, among others.