Awell - Notice history

100% - uptime

[EU] Design - GraphQL API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[EU] Orchestration - GraphQL API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 99.98%

[EU] Score API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[EU] Score Browser App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[EU] Hosted pages - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[EU] Care - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[EU] Studio - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[EU] Awell Platform - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%
100% - uptime

[UK] Design - GraphQL API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Orchestration - GraphQL API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Score API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Score Browser App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Care - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Studio - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Awell Platform - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[UK] Hosted Pages - Operational

100% - uptime

[US] Design - GraphQL API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[US] Orchestration - GraphQL API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 99.87% · Aug 2024: 100.0%
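The monthly figures can be translated into concrete downtime. The sketch below assumes each figure covers a full calendar month and that the headline uptime is a time-weighted average of the months shown; the status page does not state its roll-up method, so both are assumptions. Under them, July's 99.87% for [US] Orchestration - GraphQL API corresponds to about 0.13% of 44,640 minutes, roughly 58 minutes, while the June to August average of about 99.96% still rounds to the 100% headline. Because the displayed percentage is itself rounded to two decimals, the true downtime could differ by a few minutes.

```python
from calendar import monthrange

# Monthly uptime as displayed for [US] Orchestration - GraphQL API.
# Assumption: each figure covers the full calendar month.
MONTHLY_UPTIME = {(2024, 6): 100.0, (2024, 7): 99.87, (2024, 8): 100.0}


def minutes_in_month(year: int, month: int) -> int:
    """Total minutes in a calendar month."""
    days = monthrange(year, month)[1]
    return days * 24 * 60


def downtime_minutes(uptime_pct: float, total_minutes: int) -> float:
    """Minutes of downtime implied by an uptime percentage."""
    return total_minutes * (100.0 - uptime_pct) / 100.0


def weighted_uptime(monthly: dict) -> float:
    """Time-weighted uptime across several months (assumed roll-up method)."""
    total = sum(minutes_in_month(y, m) for (y, m) in monthly)
    up = sum(minutes_in_month(y, m) * pct / 100.0 for (y, m), pct in monthly.items())
    return 100.0 * up / total


if __name__ == "__main__":
    july = minutes_in_month(2024, 7)
    print(f"July downtime: {downtime_minutes(MONTHLY_UPTIME[(2024, 7)], july):.1f} min")
    overall = weighted_uptime(MONTHLY_UPTIME)
    print(f"Jun-Aug uptime: {overall:.3f}% (rounds to {round(overall)}%)")
```

A time-weighted average is used rather than a plain mean of the three percentages so that longer months count proportionally; with months of 30 and 31 days the difference is negligible here.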

[US] Score API - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[US] Score Browser App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[US] Hosted pages - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[US] Care - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[US] Studio - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%

[US] Awell Platform - Web App - Operational

100% - uptime
Jun 2024: 100% · Jul 2024: 100.0% · Aug 2024: 100.0%
100% - uptime

Mailgun Outbound Delivery - Operational

Third Party: Retool → Infrastructure - Operational

Third Party: Retool → Resource Queries - Operational

Third Party: Retool → Web Application and APIs - Operational

Third Party: Retool → Source Control - Operational

Notice history

Aug 2024

Jul 2024

[Sandbox] Orchestration - GraphQL API outage
  • Resolved

    Early this morning, our Sandbox environment experienced an outage due to a new data replication feature intended to improve the availability of the database cluster. During a routine maintenance operation, a database server failed to shut down gracefully because of the data replication configuration.

    Although the server eventually restarted, it was unable to synchronize with the other servers. This led to a continuous loop of failed synchronization attempts that ultimately made the cluster unresponsive.

    The Production environments are not at risk from this issue because they use a different data replication configuration. The Sandbox environment's configuration has been aligned with the Production environments to eliminate the risk of a recurrence.

    The issue has now been fully resolved. We apologize for any inconvenience caused and appreciate your understanding.

  • Monitoring

    The database backup has been successfully restored. All services are back online. We will keep investigating the root cause of this issue and will post updates as we find more information.

  • Identified

    Our database cluster started experiencing issues at around 4:15 AM UTC. Despite our best efforts, we were unable to return it to a healthy state, so to restore service we decided to restore the data from the latest backup. This operation is ongoing and should complete within the next hour. More information will be posted once service is restored.

  • Investigating

    [Sandbox] Orchestration - GraphQL API cannot be accessed at the moment. This incident was created by an automated monitoring service.

Jun 2024

[US] Orchestration - GraphQL API outage
  • Update

    We’re happy to let you know that we’ve wrapped up all the steps to fix the recent outage. Our team has identified and resumed 26 stuck care flows.

    Here’s what we’ve done:

    1. Fixing Bottlenecks: We found the root causes of the bottlenecks and made the necessary changes to sort them out.

    2. Better Alerts: We’ve set up new alerting policies that will notify us proactively if something similar happens again.

    3. Improved Recovery: We’ve added more recovery strategies to make sure care flows don’t get stuck after incidents like this.
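Finding stuck care flows, as described above, generally comes down to spotting flows that should be progressing but have been idle too long. The sketch below is a hypothetical illustration of that idea, not Awell's recovery mechanism: the `CareFlow` record, its `status` values, the one-hour `idle_threshold`, and the `resume` hook are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class CareFlow:
    """Hypothetical care flow record; not Awell's actual data model."""
    flow_id: str
    status: str  # assumed values: "active" or "completed"
    last_activity: datetime  # time of the flow's most recent step


def find_stuck_flows(flows: List[CareFlow], now: datetime,
                     idle_threshold: timedelta) -> List[CareFlow]:
    """Return active flows that have made no progress for longer than idle_threshold."""
    return [
        flow for flow in flows
        if flow.status == "active" and now - flow.last_activity > idle_threshold
    ]


def resume(flow: CareFlow, now: datetime) -> None:
    """Hypothetical resume hook: a real system would re-dispatch the flow's
    pending step; here we only record that the flow was touched."""
    flow.last_activity = now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    flows = [
        CareFlow("flow-a", "active", now - timedelta(minutes=5)),
        CareFlow("flow-b", "active", now - timedelta(hours=3)),
        CareFlow("flow-c", "completed", now - timedelta(days=2)),
    ]
    stuck = find_stuck_flows(flows, now, idle_threshold=timedelta(hours=1))
    for flow in stuck:
        resume(flow, now)
    print("Resumed:", [flow.flow_id for flow in stuck])
```

Only `flow-b` qualifies in this example: `flow-a` is still recent and `flow-c` is already completed. Choosing the threshold is the main design decision; too short and healthy but slow flows get resumed twice, too long and stuck flows linger.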

    Thank you for your patience and understanding as we worked through this. We’re committed to providing you with reliable service and will continue to improve our systems.

  • Update

    An unexpected increase in product usage put two key systems under stress: the message broker and the application database. We have identified the bottleneck in each system (a memory leak in the message broker and CPU throttling in the application database) and made the necessary changes to remove them.

    In addition, we've created new alerting policies that will proactively inform us should a similar scenario play out. This will enable us to take mitigation actions early enough to prevent system failures.
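A common shape for such a proactive alerting policy is a sustained-threshold rule: fire only when a metric stays above a limit for several consecutive samples, so short spikes are ignored but a steady climb (such as a memory leak) is caught early. The sketch below illustrates that pattern; the metric name `broker_memory_pct`, the 80% threshold, and the three-sample window are hypothetical and do not describe the actual policies Awell configured.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class SustainedThresholdAlert:
    """Fire when a metric stays above a threshold for N consecutive samples.

    Hypothetical rule: metric names, thresholds and window size are
    illustrative, not a real alerting configuration.
    """

    def __init__(self, name: str, threshold: float, consecutive: int) -> None:
        self.name = name
        self.threshold = threshold
        # Keeps only the most recent `consecutive` samples.
        self.window: Deque[float] = deque(maxlen=consecutive)

    def observe(self, value: float) -> Optional[str]:
        """Record a sample; return an alert message if the rule fires."""
        self.window.append(value)
        full = len(self.window) == self.window.maxlen
        if full and all(v > self.threshold for v in self.window):
            return f"{self.name} above {self.threshold} for {len(self.window)} samples"
        return None


def run(rule: SustainedThresholdAlert, samples: Iterable[float]) -> List[str]:
    """Feed samples to a rule and collect every alert it raises."""
    alerts = []
    for sample in samples:
        message = rule.observe(sample)
        if message is not None:
            alerts.append(message)
    return alerts


if __name__ == "__main__":
    rule = SustainedThresholdAlert("broker_memory_pct", threshold=80.0, consecutive=3)
    for alert in run(rule, [60.0, 70.0, 82.0, 85.0, 88.0, 90.0]):
        print("ALERT:", alert)
```

With these sample values the rule fires twice, once the last three readings are all above 80. The same class could watch a database CPU-throttling metric with its own threshold; monitoring systems such as Prometheus express the same idea with a `for` duration on an alerting rule.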

    We are still investigating the impact of this incident and will post another update when it has been identified.

  • Update

    The team is investigating the root cause to ensure the issue does not recur. Further updates will be provided once the investigation is complete.

  • Resolved

    [US] Orchestration - GraphQL API is now operational! This update was created by an automated monitoring service.

  • Investigating

    [US] Orchestration - GraphQL API cannot be accessed at the moment. This incident was created by an automated monitoring service.

Jun 2024 to Aug 2024
