At 1:15 AM UTC on March 13th, we updated a value in our production Consul server to remove a MIME-type blacklist entry. The change itself was expected and approved, but an error occurred during the manual update process. That error did not surface as a problem until our database credentials rotated on their regular schedule. Once our monitoring systems detected the problem, our SREs responded. A timeline of the response is detailed below (all times are in UTC).
We have implemented new procedures for all future updates to our Consul server. We have also identified two improvements to our dynamic configuration system. These changes will make it more resilient to errors and will notify us immediately when an error occurs.