High Availability
Temporal Cloud's High Availability features use asynchronous replication across multiple isolation domains to provide enhanced resilience and a 99.99% SLA. When you enable High Availability features, Temporal deploys your primary Namespace and its replica in separate isolation domains, and you control the location of both. This redundancy, combined with failover capability, improves availability during outages.
For an in-depth guide covering everything from why you need High Availability to setting it up in production and advanced options, read the High Availability White Paper.
Built-in reliability
Even without High Availability features, Temporal Cloud provides robust reliability and a 99.9% contractual Service Level Agreement (SLA) guarantee against service errors.
Each standard Temporal Namespace uses replication across three Availability Zones (AZs) to ensure high availability. An Availability Zone is akin to an isolated datacenter managed by a cloud hyperscaler, with independent power, networking, and cooling infrastructure.
Replication across AZs makes sure that any changes to Workflow state or History are saved in all three AZs before the Temporal Service acknowledges a change back to the Client. As a result, your standard Temporal Namespace stays operational even if one of its three AZs becomes unavailable. This provides the basis of the 99.9% service level agreement for Temporal Cloud Namespaces.
However, some critical use cases, such as customer-facing applications, require even better availability. That is where Temporal Cloud's High Availability features come in.
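To put the two SLA figures in perspective, the following sketch converts an availability percentage into a downtime budget. It is only arithmetic: the 30-day window and the `downtime_budget` helper are assumptions made for illustration and do not describe how Temporal measures or reports its SLA.

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, period: timedelta) -> timedelta:
    """Maximum total downtime that still meets the SLA over the given period."""
    unavailable_fraction = 1 - sla_percent / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


# Assumption for illustration: a 30-day measurement window.
MONTH = timedelta(days=30)

for sla in (99.9, 99.99):
    minutes = downtime_budget(sla, MONTH).total_seconds() / 60
    print(f"{sla}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, moving from 99.9% to 99.99% shrinks the monthly budget by a factor of ten, from roughly 43 minutes to roughly 4 minutes.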
High Availability features
High Availability features extend Temporal Cloud's replication offering across even more disparate isolation domains:
| Deployment | Description |
|---|---|
| Multi‑region Replication | Namespace is replicated across two cloud regions |
| Multi‑cloud Replication | Namespace is replicated across different cloud providers |
Key features
- Real-time replication — Temporal replicates your Namespace across distant isolation domains with no performance impact to your Workers or Workflows.
- Automatic failover with 20-minute RTO — Temporal manages failover with a 20-minute RTO. You can also trigger failover manually at any time, for example for testing.
- Transparent DNS routing — On failover, DNS reroutes your Namespace Endpoint to the active region. Requests that reach the replica are forwarded to the active region automatically.
- Sub-1-minute RPO — In a failover during an outage, the Recovery Point Objective is under one minute.
- Real-time lag monitoring — Monitor your Namespace's replication lag in real time to understand your current RPO.
- Conflict resolution — If the two regions are not fully in sync at the time of failover, Temporal's conflict resolution process reconciles discrepancies and ensures data integrity.
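The relationship between replication lag and RPO in the list above can be sketched as follows. This is a minimal illustration, not a Temporal API: `LagSample`, `worst_case_rpo`, and `exceeds_rpo` are hypothetical names, and the samples are hard-coded where a real setup would read them from your monitoring system.

```python
from dataclasses import dataclass

# The sub-1-minute RPO stated above, expressed in seconds.
RPO_TARGET_SECONDS = 60.0


@dataclass(frozen=True)
class LagSample:
    timestamp: float    # seconds since the start of the observation window
    lag_seconds: float  # replication lag observed at that moment


def worst_case_rpo(samples: list[LagSample]) -> float:
    """Largest lag in the window: the most recent data that could be
    unreplicated if a failover happened at the worst possible moment."""
    return max((s.lag_seconds for s in samples), default=0.0)


def exceeds_rpo(samples: list[LagSample],
                target: float = RPO_TARGET_SECONDS) -> bool:
    """True if any sample in the window reached or passed the RPO target."""
    return worst_case_rpo(samples) >= target


# Hypothetical samples for illustration only.
samples = [LagSample(0, 2.5), LagSample(30, 12.0), LagSample(60, 4.0)]
print(worst_case_rpo(samples), exceeds_rpo(samples))
```

Tracking the maximum rather than the average is a deliberate choice here, because an RPO is a bound on the worst case, not on typical behavior.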
You can usually choose your replica region, but the replica must be on the same continent as the primary region. This means that a few Temporal Cloud regions do not yet support Multi-region Replication and/or Multi-cloud Replication. See Regions for a full list of supported replica regions.
You can't enable both Multi-region Replication and Multi-cloud Replication on the same Namespace at the same time.
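The placement rules above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `Location`, `ReplicationMode`, and `validate_replica` are hypothetical names, the provider, region, and continent labels are illustrative, and the check does not consult the actual list of supported replica regions.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationMode(Enum):
    MULTI_REGION = "multi-region"
    MULTI_CLOUD = "multi-cloud"


@dataclass(frozen=True)
class Location:
    provider: str   # illustrative label, e.g. "aws" or "gcp"
    region: str
    continent: str


def validate_replica(primary: Location, replica: Location,
                     modes: set[ReplicationMode]) -> list[str]:
    """Return the rule violations; an empty list means the plan is consistent."""
    errors = []
    if len(modes) != 1:
        errors.append("enable exactly one of Multi-region or Multi-cloud Replication")
    if replica.continent != primary.continent:
        errors.append("replica must be on the same continent as the primary")
    if ReplicationMode.MULTI_CLOUD in modes and replica.provider == primary.provider:
        errors.append("Multi-cloud Replication needs a different cloud provider")
    same_place = (replica.provider, replica.region) == (primary.provider, primary.region)
    if ReplicationMode.MULTI_REGION in modes and same_place:
        errors.append("Multi-region Replication needs a different region")
    return errors


# Hypothetical locations for illustration only.
primary = Location("aws", "us-east-1", "north-america")
nearby = Location("aws", "us-west-2", "north-america")
distant = Location("aws", "eu-west-1", "europe")

print(validate_replica(primary, nearby, {ReplicationMode.MULTI_REGION}))
print(validate_replica(primary, distant, {ReplicationMode.MULTI_REGION}))
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation at once.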
Multi-cloud Replication
Multi-cloud Replication spreads a Namespace across entirely different cloud providers, keeping your Namespace running even during a cloud-wide outage. If a provider outage, service disruption, or network issue occurs, traffic automatically shifts to the replica.
Replicated data is encrypted and transmitted across the public internet between cloud providers. This internet connectivity also allows Workers in one cloud to reach the replica in a different cloud during failover. If you use private connectivity, additional architecture work may be required to ensure your Workers can reach the replica region.
When you adopt Temporal's High Availability features, also consider the reliability of your own Workers, infrastructure, and dependencies. Issues such as network outages, hardware failures, or misconfigurations in your own systems can still affect your application's performance.
For the highest level of reliability, distribute your dependencies across regions and use Multi-region Replication or Multi-cloud Replication. Using physically separated regions improves the fault tolerance of your application.
Service levels and recovery objectives
Namespaces using High Availability have a 99.99% uptime SLA with a sub-1-minute RPO and a 20-minute RTO.
Failover
High Availability Namespaces can automatically or manually fail over to the replica if the primary is unavailable or unhealthy.
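As a toy model of this behavior, the sketch below tracks which region is active, swaps roles on failover, and forwards requests that arrive at the replica. `ReplicatedNamespace` and the region labels are hypothetical; the model leaves out DNS, health checks, and conflict resolution entirely.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedNamespace:
    active: str   # region currently serving the Namespace
    replica: str  # region holding the replicated copy

    def route(self, entry_region: str) -> str:
        """Return the region that serves a request entering at entry_region.

        Requests that reach the replica are forwarded to the active region.
        """
        if entry_region not in (self.active, self.replica):
            raise ValueError(f"unknown region: {entry_region}")
        return self.active

    def failover(self) -> None:
        """Swap roles: the replica becomes active and the old active becomes the replica."""
        self.active, self.replica = self.replica, self.active


# Hypothetical region labels for illustration only.
ns = ReplicatedNamespace(active="region-a", replica="region-b")
print(ns.route("region-b"))  # forwarded to region-a
ns.failover()
print(ns.route("region-a"))  # now forwarded to region-b
```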
Target workloads
High Availability Namespaces are well suited to workloads where an outage would cause:
- Revenue loss
- Poor customer experience
- Problems stemming from policy/legal requirements that demand high availability
These are often major concerns for financial services, e-commerce, gaming, global SaaS platforms, bookings & reservations, delivery & shipping, and order management.
Same-region Replication
In selected regions, you can add a replica to a Namespace in the same region. Temporal operates a "cell architecture" and will replicate the Namespace across multiple cells in that region. This feature is currently in Public Preview in selected regions.