Elevated Request Latency
Incident Report for SCORM Cloud
Postmortem

First, we apologize for the unexpected service issues on June 9, 2020. Between 2:36 PM UTC and 2:45 PM UTC, SCORM Cloud experienced elevated response latency. Customers may have encountered request timeouts during this time. Response times were especially high at approximately 2:38 PM UTC.

This service degradation was caused by an inefficient database query in one of our API calls. The query was invoked only for customers on our Trial (free) plan, and its execution time grew in proportion to an account's historical registration count.
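The report does not include the query itself, but the failure mode it describes follows a familiar pattern, sketched below: a Trial-plan limit check that counts every historical registration on each API call. This is an illustration only; the table, column, and function names are hypothetical, the cap value is assumed, and SQLite stands in for the production database.

```python
import sqlite3  # illustration only: a stand-in for the production database

TRIAL_REGISTRATION_CAP = 10  # assumed Trial-plan limit, purely for illustration


def trial_limit_exceeded(conn: sqlite3.Connection, app_id: str) -> bool:
    """Unbounded check: scans the account's entire registration history.

    COUNT(*) visits every row matching app_id, so execution time grows
    linearly with the account's historical registration count: cheap for
    a small trial account, but very slow for a large account that has
    just been downgraded to Trial.
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM registrations WHERE app_id = ?",
        (app_id,),
    ).fetchone()
    return count >= TRIAL_REGISTRATION_CAP
```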

On June 9, a large customer account was downgraded to Trial status for billing reasons. After the downgrade, the account's API traffic continued at its usual volume, but those calls now triggered the Trial-only query against a very large registration history. The resulting load overwhelmed backend database resources.

After identifying the issue, we implemented temporary mitigation measures. In parallel, we rewrote the inefficient query so that it executes in a constant amount of time, regardless of account size. We deployed the new query at 6:12 PM UTC, resolving the issue.
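The report does not describe the rewrite, but one common way to make a check like the one sketched above run in constant time is to bound the scan at the plan's cap: the check only needs to know whether the cap was reached, not the exact historical total. Again, all names here are hypothetical, and the fixed cost assumes an index on the account column.

```python
import sqlite3  # illustration only, as above

TRIAL_REGISTRATION_CAP = 10  # assumed Trial-plan limit, purely for illustration


def trial_limit_exceeded(conn: sqlite3.Connection, app_id: str) -> bool:
    """Bounded check: reads at most TRIAL_REGISTRATION_CAP matching rows.

    The inner LIMIT stops the scan as soon as the cap is reached, so
    (given an index on registrations.app_id) the query touches a fixed
    number of rows no matter how many registrations the account
    accumulated before being downgraded.
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT 1 FROM registrations WHERE app_id = ? LIMIT ?"
        ") capped",
        (app_id, TRIAL_REGISTRATION_CAP),
    ).fetchone()
    return count >= TRIAL_REGISTRATION_CAP
```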

This specific problem will not recur. Going forward, our code review process should prevent unbounded queries from being added to API methods. Additionally, we will audit older parts of our code base for similar problems.

Again, we apologize for the inconvenience.

Posted Jun 10, 2020 - 20:12 UTC

Resolved
A fix for this incident has been deployed. We will attach a postmortem with more information to our status page later.
Posted Jun 09, 2020 - 18:14 UTC
Identified
The issue has been identified and a fix is being implemented. In the meantime, we have a workaround and we're monitoring the system.
Posted Jun 09, 2020 - 15:19 UTC
Investigating
Engineers are investigating unexpectedly high request latency.
Posted Jun 09, 2020 - 14:57 UTC
This incident affected: SCORM Cloud Website and SCORM Cloud API.