Slow internal message queue
Incident Report for Onomondo
Resolved
As of 16:00 we are all caught up with signalling logs.
All systems are back to normal and stable.
Posted Mar 30, 2024 - 19:08 CET
Update
We are still catching up on signalling logs, but otherwise all systems are back to normal and stable.

The current estimate for when signalling logs will be caught up is 2024-03-30 09:00 UTC.

We will leave this incident in place until we are caught up, but no more updates are expected until then.
Posted Mar 29, 2024 - 14:45 CET
Monitoring
We have finished migrating all services, and all systems look good. We are still catching up on signalling logs, but otherwise we are up to date.

Some connector messages and some webhooks may have been lost during the migration.
We will keep monitoring, and will share more updates as they become available. Thank you for your continued patience.
Posted Mar 29, 2024 - 09:56 CET
Update
Platform and connectors are still being affected by the slow consumption of our message queues.

We are in the process of moving the last core services over to the new message broker. Devices might still be kicked from the network until this has been completed.

We will share more updates as they become available. Thank you for your continued patience.
Next update in 1 hour.
Posted Mar 29, 2024 - 08:56 CET
Identified
Platform and connectors are still being affected by the slow consumption of our message queues.

The mitigations we put in place earlier are not enough for us to keep up. We are forced to upgrade the message broker, and we are starting that work now. Devices will be kicked from the network while we migrate core features to the new message broker.

The mitigations are in the process of being deployed. We will share more updates as they become available. Thank you for your continued patience.
Posted Mar 29, 2024 - 06:21 CET
Update
Platform and connectors are still being affected by the slow consumption of our message queues. The mitigations are in the process of being deployed. We are still catching up, and are looking at several hours before we are through the backlog of messages. No device data loss has occurred. We will keep monitoring and will share more updates as they become available. Thank you for your continued patience.
The next update will be in 2 hours.
Posted Mar 29, 2024 - 04:49 CET
Update
Platform and connectors are still being affected by slow consumption of our message queues. The mitigations we have put in place have been confirmed to be working. We are still catching up, and are looking at several hours before we are through the backlog of messages. We will keep monitoring, and will share more updates as they become available. Thank you for your continued patience.
Next update will be in 2 hours.
Posted Mar 29, 2024 - 02:27 CET
Update
Platform and connectors are still being affected by slow consumption of our message queues. The mitigations we have put in place have been confirmed to be working. We are still catching up, and are looking at several hours before we are through the backlog of messages. We will keep monitoring, and will share more updates as they become available. Thank you for your continued patience.
Next update will be in 2 hours.
Posted Mar 28, 2024 - 23:24 CET
Update
Platform and connectors are still being affected by slow consumption of our message queues. The mitigations we have put in place have been confirmed to be working. We are still catching up, and are looking at several hours before we are through the backlog of messages. We will keep monitoring, and will share more updates as they become available. Thank you for your continued patience.
Next update will be in 2 hours.
Posted Mar 28, 2024 - 21:19 CET
Update
Platform and connectors are still being affected by slow consumption of our message queues. The mitigations we have put in place seem to be working, but we are still looking at several hours before we are caught up. We will keep monitoring, and will share more updates as they become available. Thank you for your continued patience.
Posted Mar 28, 2024 - 19:24 CET
Monitoring
Platform and connectors are still being affected by slow consumption of our message queues. We have implemented some mitigations and are monitoring to gauge their impact on our systems. We will share more updates as they become available. Thank you for your continued patience.
Posted Mar 28, 2024 - 18:18 CET
Identified
We are facing issues with our system message queue. We are working hard to address the problem.
Impact:
- Webhooks might be delayed
- Messages for platform connectors might be delayed
- Signalling logs might be delayed
Posted Mar 28, 2024 - 17:12 CET
This incident affected: Connectors (TLS, HTTPS, AWS IoT Core, Microsoft Azure DPS (beta), MQTT (beta)) and Network, SMS, Webhooks, App, API.