Summary
On Friday, 30 December, our postback delivery became progressively slower. To mitigate this, we parked all pending postbacks so that ongoing processes could proceed. As a result, postbacks were sent out with delays from approximately 13:30 until 18:30 CET, and the postbacks that were in the queue when we parked it were not sent out at all, because processing times had become too slow. The transaction status for these postbacks had to be retrieved by performing a GET call instead.
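For reference, retrieving the transaction status via a GET call looks roughly like the sketch below. The base URL, endpoint path, header, and response field shown here are illustrative assumptions, not a definitive description of our API; consult the API documentation for the actual call.

    import requests

    # Hypothetical base URL and endpoint -- consult the API documentation
    # for the actual transaction-status call.
    API_BASE = "https://api.example.com"

    def fetch_transaction_status(transaction_id: str, api_token: str) -> str:
        """Fetch the current status of a transaction with a GET call,
        as a fallback when its postback was never delivered."""
        response = requests.get(
            f"{API_BASE}/v1/transactions/{transaction_id}/status",
            headers={"Authorization": f"Bearer {api_token}"},
            timeout=10,
        )
        response.raise_for_status()
        # Hypothetical response field; e.g. "PAID", "PENDING", "CANCELLED".
        return response.json()["status"]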
Incident
Due to a high volume of postbacks to send out, our postback service environment became less responsive and slowed down considerably. As a result, postbacks were sometimes sent out too slowly and a queue formed. Detection of these postback delays was hampered by reduced availability of colleagues due to the holidays. Once we noticed the delays, we parked the existing queue, which allowed new postbacks to be sent out in real time again. In the following days we closely monitored postback performance and saw the problems recur. We considered re-introducing the parked postback queue, but as this would introduce considerable risk to our ongoing transactions and operations, we decided not to.
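For clarity, "parking" the queue conceptually means flipping every pending postback, in a single step, to a state that the delivery workers ignore, so that only newly created postbacks are picked up and sent in real time. The sketch below illustrates the idea against a hypothetical single-table queue; our actual schema and mechanism differ.

    import sqlite3

    # Hypothetical single-table queue: status is PENDING, PARKED or SENT.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE postbacks ("
        " id INTEGER PRIMARY KEY,"
        " transaction_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'PENDING')"
    )

    def park_pending_postbacks(conn: sqlite3.Connection) -> int:
        """Move every pending postback out of the active queue in one
        step; delivery workers only select PENDING rows, so only
        postbacks created after this point are sent in real time."""
        cur = conn.execute(
            "UPDATE postbacks SET status = 'PARKED' WHERE status = 'PENDING'"
        )
        conn.commit()
        return cur.rowcount  # number of postbacks parked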
Mitigation
Over the past few days we introduced multiple efficiency fixes to improve postback handling and speed. We also optimized our database to speed up postback-related queries. Furthermore, we improved our monitoring to alert us sooner when postback slowdowns occur. Lastly, we investigated how to re-introduce parked postbacks into our system safely, so that if this situation recurs we can re-introduce them without risk.
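As an illustration of the improved monitoring, the sketch below alerts on the two signals that mattered in this incident: queue depth and the age of the oldest pending postback. The table layout and thresholds are hypothetical, and our production alerting runs in our monitoring stack rather than in application code.

    import sqlite3
    import time

    # Illustrative thresholds; the real alerting values are internal.
    MAX_QUEUE_DEPTH = 10_000
    MAX_PENDING_AGE_SECONDS = 300  # oldest pending postback, 5 minutes

    def check_postback_health(conn: sqlite3.Connection) -> list[str]:
        """Return alert messages when the postback queue grows too deep or
        its oldest entry waits too long. Assumes a hypothetical postbacks
        table with status and created_at (epoch seconds) columns."""
        alerts = []
        depth, oldest = conn.execute(
            "SELECT COUNT(*), MIN(created_at) FROM postbacks"
            " WHERE status = 'PENDING'"
        ).fetchone()
        if depth > MAX_QUEUE_DEPTH:
            alerts.append(f"queue depth {depth} exceeds {MAX_QUEUE_DEPTH}")
        if oldest is not None and time.time() - oldest > MAX_PENDING_AGE_SECONDS:
            alerts.append(
                f"oldest pending postback is {time.time() - oldest:.0f}s old"
            )
        return alerts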