Users experience error 500 when creating transactions

Incident Report for Signhost Verified Signing

Postmortem

Summary

Starting September 14, at around 15:00 UTC+2, the flow of events on our platform slowed down considerably. As a result, the platform sometimes did not accept new transactions and users were shown error messages. After 10 minutes this behaviour went away again. It recurred over the following days, each time at around the same time (15:00 UTC+2). We diagnosed and solved the problem in close contact with our hosting party.

Problem

On several occasions, hard disk activity on our database spiked without a clear cause. We started our analysis right away, but because the spikes occurred only intermittently, lasted a few minutes, and then went away again, the cause was hard to pin down. Searching our own processes and logging did not reveal it directly.
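
As an illustration only, and not the tooling used during this incident, a recurring daily pattern in intermittent spikes can be checked by grouping the samples that exceed a threshold by hour of day. The sketch below assumes hypothetical disk activity samples and a hypothetical threshold; the function name `spike_hours` and all values are made up for the example.

```python
from collections import Counter
from datetime import datetime


def spike_hours(samples, threshold):
    """Count samples whose value exceeds `threshold`, grouped by hour of day."""
    counts = Counter()
    for timestamp, value in samples:
        if value > threshold:
            counts[timestamp.hour] += 1
    return counts


# Hypothetical samples: (local time, disk operations per second).
samples = [
    (datetime(2022, 9, 14, 15, 2), 950),
    (datetime(2022, 9, 14, 18, 0), 120),
    (datetime(2022, 9, 15, 9, 30), 140),
    (datetime(2022, 9, 15, 15, 4), 910),
    (datetime(2022, 9, 16, 15, 1), 980),
]

counts = spike_hours(samples, threshold=500)
print(counts.most_common(1))  # the hour of day with the most spikes
```

If most spikes fall in a single hour, that points towards a scheduled or time-bound cause rather than regular user load.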

We contacted our hosting party with all the logging and information we had gathered. During the investigation on September 23rd, we also switched off non-essential platform services, such as user creation, to confirm that the behaviour was caused by the area we suspected. This investigation showed that some configuration on the hosting party's end had been reset to default values, so our hard disk behaviour was no longer optimized as it had been before. Restoring this configuration initially caused some additional downtime, because an emergency reboot was needed.

Solution

On September 23rd we restored the configuration together with our hosting party. This solved the issues.

Mitigation

We have made clearer agreements with our hosting party about the specific configuration in use, so that situations like this, with load spiking at specific moments, will not occur again. Furthermore, we increased our disk I/O capacity to have more headroom when high hard disk load does occur.

Posted Sep 27, 2022 - 15:52 CEST

Resolved

This incident has been resolved.

Posted Sep 23, 2022 - 14:12 CEST

Identified

We are experiencing sudden peaks in processing times and traffic. This results in longer loading times and, in some cases, errors where our system is not reachable, in order to prevent transactions in process from piling up further.

However, we do see that practically all transactions come through and that most transactions will be created and signed.

We are working hard on a solution and will keep you updated on this matter.

Posted Sep 23, 2022 - 12:53 CEST

Investigating

We see that some users may experience error 500 while creating transactions.
We are currently investigating this matter and will keep you updated.

Posted Sep 23, 2022 - 11:09 CEST

This incident affected: API and Portal.