Issues with creating transactions and accessing the web portal.

Incident Report for Signhost Verified Signing

Postmortem

What happened?

On September 2 and 3, 2025, our platform experienced major service outages that significantly impacted availability and transaction processing. The disruption was caused by a code deployment containing a bug that resulted in excessive duplicate message processing. This saturated our database connections and led to downtime on both days.

What we did

On September 2, we deployed a change and observed an outage shortly afterwards. The team initiated a rollback to restore functionality. After reviewing the change, we redeployed it later that day, believing the outage was unrelated to it. However, on September 3, a second major outage occurred around the same time. During this second outage, all SQL queries across the Signhost services failed due to database saturation.

After analyzing message logs, we concluded that the code change was the root cause. The deployment intermittently triggered bursts of hundreds of thousands of duplicate messages for the same transaction events. These spikes occurred after hours of normal operation, making the issue difficult to detect early. Once the cause was confirmed, we permanently rolled back to the previous stable version. The system has remained stable since.
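As an illustration of the kind of log analysis that can surface such bursts, the sketch below counts messages per transaction event in fixed time windows and flags any key that exceeds a threshold. The record layout (`timestamp`, `transaction_id`, `event_type`), the window size, and the threshold are assumptions made for this example; they do not describe the actual Signhost log format or tooling.

```python
from collections import Counter
from datetime import datetime


def find_duplicate_bursts(records, window_seconds=60, threshold=1000):
    """Return (window_start, transaction_id, event_type, count) tuples for
    every transaction event seen more than `threshold` times in one window.

    `records` is an iterable of dicts with hypothetical keys
    'timestamp' (ISO 8601 string), 'transaction_id', and 'event_type'.
    """
    counts = Counter()
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        # Bucket the timestamp into a fixed-size window.
        epoch = int(ts.timestamp())
        window_start = epoch - (epoch % window_seconds)
        counts[(window_start, rec["transaction_id"], rec["event_type"])] += 1

    # Keep only the keys whose count exceeds the threshold.
    return sorted(
        (w, tx, ev, n) for (w, tx, ev), n in counts.items() if n > threshold
    )
```

A low threshold is useful when testing on a small sample; in practice the threshold would be tuned to the normal number of messages per transaction event.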

What caused the issue?

The root cause was a still-unidentified bug in the change we deployed on September 2 that sporadically generated massive volumes of duplicate messages. These bursts overwhelmed the database, causing complete service outages. The duplicates were not caused by infinite loops but occurred in sudden, high-volume spikes.

What are we doing next?

To prevent similar incidents and improve our deployment safety for these kinds of changes, we are taking the following steps:

Deployment safeguards:

  • Gradual rollout strategy: Implement phased rollouts with real-time monitoring to detect anomalies before full deployment.
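A phased rollout of this kind can be sketched as a gate that decides, after each stage, whether to widen the rollout or roll back. This is a minimal sketch under stated assumptions: the stage sizes, the `StageMetrics` fields, and the thresholds are hypothetical and are not our actual deployment configuration.

```python
from dataclasses import dataclass


@dataclass
class StageMetrics:
    error_rate: float           # fraction of failed requests, 0.0 to 1.0
    messages_per_second: float  # observed message throughput in this stage


def next_traffic_share(
    current_share,
    metrics,
    baseline_mps,
    stages=(0.05, 0.25, 0.5, 1.0),  # hypothetical rollout steps
    max_error_rate=0.01,            # hypothetical threshold
    max_mps_ratio=3.0,              # hypothetical threshold
):
    """Return the next traffic share for the new version, or 0.0 to roll back."""
    unhealthy = (
        metrics.error_rate > max_error_rate
        or metrics.messages_per_second > baseline_mps * max_mps_ratio
    )
    if unhealthy:
        return 0.0  # anomaly detected: send all traffic to the old version
    for share in stages:
        if share > current_share:
            return share  # healthy: advance to the next stage
    return current_share  # already fully rolled out
```

Because the spikes in this incident appeared only after hours of normal operation, each stage would need to run long enough for such a check to be meaningful.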

Monitoring improvements:

  • Monitoring system: We are transitioning to a more robust monitoring system, which will provide improved insights, faster anomaly detection, and more precise alerting across our infrastructure.
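One simple form of anomaly detection for message volume compares each new sample against a rolling baseline. The sketch below is illustrative only: the class name, the history length, and the sensitivity settings are assumptions and do not describe the monitoring system we are moving to.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Flag samples that far exceed a rolling baseline of recent values."""

    def __init__(self, history=30, sigmas=4.0, min_samples=5):
        self.samples = deque(maxlen=history)  # recent normal samples
        self.sigmas = sigmas
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if `value` is anomalous relative to recent history."""
        is_spike = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            spread = pstdev(self.samples)
            # Floor the spread so a very flat baseline still tolerates noise.
            limit = baseline + self.sigmas * max(spread, 0.05 * baseline, 1.0)
            is_spike = value > limit
        if not is_spike:
            # Only normal samples feed the baseline, so a burst cannot mask itself.
            self.samples.append(value)
        return is_spike
```

Fed with a per-minute message count, `observe` returns False for ordinary fluctuation and True for a sudden burst, which could then trigger an alert.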

Root cause analysis:

  • Deep dive: Continue investigating the specific bug that caused the duplicate message generation to prevent recurrence in future deployments.
Posted Sep 12, 2025 - 15:30 CEST

Resolved

This incident has been resolved.
Posted Sep 03, 2025 - 18:31 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Sep 03, 2025 - 18:08 CEST

Update

We are continuing to investigate this issue.
Posted Sep 03, 2025 - 17:38 CEST

Update

We have placed the platform in maintenance as we continue our investigation.
Posted Sep 03, 2025 - 17:38 CEST

Investigating

We are experiencing issues with creating transactions and entering the portal.
We are investigating the issues.
Posted Sep 03, 2025 - 17:18 CEST
This incident affected: API, UI / View, and Portal.