Transaction in status progress

Incident Report for Signhost Verified Signing

Postmortem

What happened?

On July 9, 2025, between 20:59 and 22:01 CEST, our platform experienced a service disruption that impacted availability. During this time, customers were unable to access the platform, and transactions were temporarily halted.

What we did

Our team noticed an increase in specific errors on the platform and immediately began investigating the root cause. We observed that queues were building up rapidly and that the platform was not processing the signing of transactions as expected. To prevent further degradation and protect data integrity, we proactively placed the platform into maintenance mode.

During this window, we identified a misconfiguration in our new containerized environment: the platform was running on only one document signing service. A second document signing service was active, but it was not receiving any traffic. Once the issue was identified, we corrected the configuration and restored the affected server.

The API was operational again within approximately 15 minutes, as there were no issues with creating transactions. However, access to the portal and the signing of transactions remained unavailable. The platform was brought fully back online only after the queues had been processed and stability was verified.

What caused the issue?

The root cause was a configuration error introduced during our transition to a containerized environment. As a result, only one document signing service was actively handling traffic. When that server failed, the second server, although healthy, was not utilized, leading to a service outage.

What are we doing next?

To prevent similar issues from happening again and to ensure service continuity, we are taking the following steps:

  1. Configuration audit:
    Conducting a thorough audit of our containerized environment configuration to identify and correct any potential misconfigurations.
  2. Prominent validation checks:
    Adding this scenario as a key validation checkpoint in our containerization rollout process to ensure traffic routing and failover mechanisms are correctly configured.
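
As an illustration only, a validation check of this kind could be sketched as follows. This is not our actual tooling: the function name, the service names, the request counts, and the 5% threshold are all hypothetical. The sketch takes per-service request counts and reports any healthy service that receives little or no traffic, which is the condition that went undetected in this incident.

```python
from typing import Dict, List, Set


def find_idle_backends(
    request_counts: Dict[str, int],
    healthy: Set[str],
    min_share: float = 0.05,  # hypothetical threshold, not a production value
) -> List[str]:
    """Return healthy backends whose share of observed traffic is below min_share."""
    total = sum(request_counts.get(name, 0) for name in healthy)
    if total == 0:
        # No traffic observed at all: treat every healthy backend as idle.
        return sorted(healthy)
    idle = []
    for name in sorted(healthy):
        share = request_counts.get(name, 0) / total
        if share < min_share:
            idle.append(name)
    return idle


if __name__ == "__main__":
    # Hypothetical counts mirroring the incident: one service takes all traffic.
    counts = {"signing-a": 1200, "signing-b": 0}
    print(find_idle_backends(counts, {"signing-a", "signing-b"}))
```

Run as part of a rollout checklist, a non-empty result would indicate that a healthy service is being bypassed by routing and that failover should not be assumed to work.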

Moving to a containerized architecture is a critical part of our broader cloud migration strategy. Our ultimate vision is to fully transition to Entrust’s EU hosting infrastructure. This migration will allow us to leverage high-availability capabilities and failover mechanisms to ensure service continuity. It will also provide a scalable and resilient platform that can meet evolving customer needs while adhering to strict regional compliance requirements, so that data resides in designated jurisdictions.

We expect to finalize the migration to a fully containerized environment this quarter.

Posted Jul 11, 2025 - 17:05 CEST

Resolved

This incident has been resolved.
Posted Jul 09, 2025 - 22:30 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 09, 2025 - 22:01 CEST

Identified

The issue has been identified and a fix is being implemented.
Posted Jul 09, 2025 - 21:49 CEST

Update

We are continuing to investigate this issue.
Posted Jul 09, 2025 - 21:11 CEST

Investigating

We are currently investigating this issue.
Posted Jul 09, 2025 - 20:59 CEST

This incident affected: API and Portal.