We rolled out a new backend deployment containing a database migration that had unforeseen side effects and led to excessive locking of a frequently accessed table. These locks eventually caused the database to become unresponsive.
Although the migration was tested locally, in CI, and on staging, the problem wasn’t caught earlier because it only manifested under production-level traffic.
A backend database migration unexpectedly locked a table, which caused in-flight requests to pile up without ever completing.
Aborting the migration and then scaling in and back out resolved the issue.
Deployment of a new backend version containing a migration that led to excessive locking under high load.