Degraded server

Incident Report for Tribal Habits

Resolved

A bad deployment to one of our AWS servers this morning left that server unstable. As platform usage grew through the morning, the server became degraded at about 9 am AEST. From that point, users on that server may have encountered 502 Bad Gateway errors. Reloading the page would often fix the issue, but the error may have returned as the server continued to degrade.

Our load balancer was triggered to reassign users to healthy servers at around 10 am AEST, after which the degraded server was terminated and replacement servers were built. By 10:15 am, all users on the degraded server had been moved to healthy servers and the environment was stabilised.

This situation is extremely rare with AWS, but our health alarms and load balancer worked as intended and resolved it quickly. We apologise for any inconvenience caused to users.

Posted Sep 22, 2023 - 09:00 AEST