Compute Node 12 Hardware Failure

July 3, 10:31 pm - July 4, 2:31 am

Affected Area

Compute Node 12 

Incident Type

Device Failure/Malfunction

Root Cause

Compute Node 12 shut down unexpectedly, affecting the customer’s cloud instance hosted on the node. The root cause was identified as a faulty control cable set; the motherboard and other components remained functional.

Resolution

Spare hardware was prepared for replacement. We first tried replacing the motherboard, RAID card, network card, and power supply unit, but the compute node continued to experience issues. After further inspection, the team moved all components to a new server chassis. The server and cloud instance were then brought back online.

Prevention