Over the past few days we saw delays in our API and queuing processes. Creating a transaction could take around 30 seconds or time out. We apologize for any problems our customers experienced during this outage, and with this post we want to inform you about the steps we took to prevent it from recurring. We are committed to guaranteeing the high uptime you are used to from our platform.
Problem
In the past week our platform experienced high load three times, each lasting around 30 minutes from start to end. During these periods transactions were created after a delay, and on some occasions transaction creation resulted in timeout messages for our customers in both the portal and the API.
We have been investigating these issues closely, as they were similar in nature and resulted in degraded performance on our systems. We found that a certain service rebooted just before these issues occurred and, on reboot, placed a very high load on our systems.
Fix
First, we will make sure that this service reboots in a controlled fashion. This gives us granular control over the database load during a reboot, and we therefore expect our system to remain operational.
Second, we are introducing more granular control over this service, so that when a reboot occurs its load can more easily be spread out, and so that we can more easily identify any further bottlenecks in subservices.
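For readers who want a picture of what spreading out load on reboot can mean, here is a minimal sketch. It is an illustration under assumptions, not our actual implementation: it assumes the service works through a backlog of pending jobs when it starts, and the names `controlled_startup`, `process_batch`, `batch_size`, `base_delay` and `jitter` are hypothetical.

```python
import random
import time
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def controlled_startup(
    pending_jobs: Sequence[str],
    process_batch: Callable[[Sequence[str]], None],
    batch_size: int = 100,
    base_delay: float = 1.0,
    jitter: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process the startup backlog in small batches instead of all at once.

    Hypothetical example: between batches the service pauses for
    base_delay seconds plus a random amount of up to jitter seconds,
    so the database sees a steady trickle rather than one large burst.
    Returns the number of batches processed.
    """
    batches = 0
    for batch in batched(pending_jobs, batch_size):
        if batches > 0:
            # Pause before every batch except the first one.
            sleep(base_delay + random.uniform(0, jitter))
        process_batch(batch)
        batches += 1
    return batches
```

The random jitter matters mostly when several instances restart at the same moment: without it they would all pause and resume in lockstep and still hit the database together.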
These fixes have been partially introduced, and the rollout will continue today.
Mitigation
As communicated before, we are in the midst of a migration to a new hosting party and new hosting technology. This migration is expected to be completed by the end of this summer. The new hosting technology will enable us to handle events and services in such a way that these load issues no longer occur. A big part of our migration plan, and the reason we are migrating to new technology, is to further streamline our database and queuing processes, so we will take these findings and fixes into account.
This will help us guarantee our high uptime and keep our customers’ environments operational and responsive in the future.