[doc][ybm] DR for Aeon #26325

Open
wants to merge 13 commits into
base: master
@@ -0,0 +1,86 @@
---
title: Configure disaster recovery for an Aeon cluster
headerTitle: Disaster Recovery
linkTitle: Disaster recovery
description: Enable disaster recovery for clusters
headContent: Fail over to a replica cluster in case of unplanned outages
tags:
  feature: early-access
menu:
  preview_yugabyte-cloud:
    parent: cloud-clusters
    identifier: disaster-recovery-aeon
    weight: 500
type: indexpage
showRightNav: true
---

Use xCluster Disaster Recovery (DR) to recover from an unplanned outage (failover) or to perform a planned switchover. Planned switchover is commonly used for business continuity and disaster recovery testing, and for failback after a failover.

A DR configuration consists of the following:

- a Source cluster, which serves both reads and writes.
- a Target cluster, which can also serve reads.

## RPO and RTO for failover and switchover

Data from the Source is replicated asynchronously to the Target (which is read-only). Because replication is asynchronous, DR failover results in a non-zero recovery point objective (RPO). In other words, data not yet committed on the Target _can be lost_ during a failover. The amount of data loss depends on the replication lag, which in turn depends on the network characteristics between the clusters. By contrast, during a switchover the RPO is zero and no data is lost, because the switchover waits for all data to be committed on the Target _before_ switching over.
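As a rough illustration of the difference, the following Python sketch estimates the failover data-loss window as the gap between the time the Source failed and the safe time on the Target. The function name and the timestamps are hypothetical and are not part of any YugabyteDB API; treat this as a way to reason about RPO, not as how the product computes it.

```python
from datetime import datetime, timedelta, timezone


def potential_data_loss(source_failure_time: datetime,
                        target_safe_time: datetime) -> timedelta:
    """Estimate the window of writes that may be lost on failover.

    Hypothetical helper: writes committed on the Source after the Target's
    safe time have not been replicated yet, so they fall in the loss window.
    """
    return max(source_failure_time - target_safe_time, timedelta(0))


# Hypothetical timestamps: the Target trails the Source by two seconds.
failure = datetime(2025, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
safe_time = datetime(2025, 1, 1, 12, 0, 3, tzinfo=timezone.utc)

failover_rpo = potential_data_loss(failure, safe_time)
# A switchover waits for the Target to catch up, so its RPO is zero.
switchover_rpo = timedelta(0)

print(failover_rpo.total_seconds())    # 2.0
print(switchover_rpo.total_seconds())  # 0.0
```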

The recovery time objective (RTO) for failover or switchover is very low, and is determined by how long it takes applications to switch their connections from one cluster to the other. Design your applications so that this switch happens as quickly as possible.
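One way to keep the application-side switch short is to resolve the write endpoint from a single place each time a connection is opened. The following Python sketch assumes this design; the `DrEndpoints` class and the host names are hypothetical and are not provided by YugabyteDB Aeon.

```python
from dataclasses import dataclass


@dataclass
class DrEndpoints:
    """Hypothetical holder for the two cluster endpoints in a DR configuration."""
    source: str
    target: str

    def swap_roles(self) -> None:
        # After a failover or switchover, the former Target becomes the Source.
        self.source, self.target = self.target, self.source


# Hypothetical host names, for illustration only.
endpoints = DrEndpoints(source="cluster-a.example.internal",
                        target="cluster-b.example.internal")


def write_endpoint() -> str:
    # Looked up on every new connection, so a role change takes effect
    # without redeploying the application.
    return endpoints.source


before = write_endpoint()
endpoints.swap_roles()
after = write_endpoint()
print(before, "->", after)
```

With this approach, the time to redirect traffic is bounded by how quickly the application opens new connections after the roles change.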

DR further allows for the role of each cluster to switch during planned switchover and unplanned failover scenarios.

![Disaster recovery](/images/yb-platform/disaster-recovery/disaster-recovery.png)

{{<lead link="../../../yugabyte-platform/back-up-restore-universes/disaster-recovery/#xcluster-dr-vs-xcluster-replication">}}
[xCluster DR vs xCluster Replication](../../../yugabyte-platform/back-up-restore-universes/disaster-recovery/#xcluster-dr-vs-xcluster-replication)
{{</lead>}}

&nbsp;

{{<index/block>}}

{{<index/item
title="Set up Disaster Recovery"
body="Designate a cluster to act as a Target."
href="disaster-recovery-setup/"
icon="fa-thin fa-umbrella">}}

{{<index/item
title="Unplanned failover"
body="Fail over to the Target in case of an unplanned outage."
href="disaster-recovery-failover/"
icon="fa-thin fa-cloud-bolt-sun">}}

{{<index/item
title="Planned switchover"
body="Switch over to the Target for planned testing and failback."
href="disaster-recovery-switchover/"
icon="fa-thin fa-toggle-on">}}

{{<index/item
title="Add and remove tables and indexes"
body="Perform DDL changes to databases in replication."
href="disaster-recovery-tables/"
icon="fa-thin fa-plus-minus">}}

{{</index/block>}}

## Schema changes

Perform table and index-level schema changes in the same order on both clusters, as follows:

1. The Source cluster.
2. The Target cluster.

You don't need to make any changes to the DR configuration.
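The ordering above can be sketched in Python as follows. This is a minimal illustration, assuming you supply your own functions for running a statement on each cluster; `apply_schema_changes` and the sample statements are hypothetical, and the linked page describes the complete procedure.

```python
from typing import Callable, List, Sequence, Tuple


def apply_schema_changes(statements: Sequence[str],
                         run_on_source: Callable[[str], None],
                         run_on_target: Callable[[str], None]) -> None:
    """Apply each DDL statement to the Source first, then to the Target.

    Statements keep the same order on both clusters. If a statement fails
    on the Source, the exception propagates and the Target is not touched.
    """
    for ddl in statements:
        run_on_source(ddl)
        run_on_target(ddl)


# Stand-in executors that only record what would be run.
log: List[Tuple[str, str]] = []
apply_schema_changes(
    ["CREATE TABLE orders (id INT PRIMARY KEY)",
     "CREATE INDEX orders_id_idx ON orders (id)"],
    run_on_source=lambda ddl: log.append(("source", ddl)),
    run_on_target=lambda ddl: log.append(("target", ddl)),
)

for cluster, ddl in log:
    print(cluster, ddl)
```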

{{<lead link="./disaster-recovery-tables/">}}
To learn more, refer to [Manage tables and indexes](./disaster-recovery-tables/).
{{</lead>}}

## Limitations

- Currently, automatic replication of DDL (SQL-level changes such as creating or dropping tables or indexes) is not supported. For more details on how to propagate DDL changes from the Source to the Target, see [Manage tables and indexes](./disaster-recovery-tables/).

- If a database operation requires a full copy, any application sessions on that database on the Target are interrupted while the database is dropped and recreated. Your application should either retry connections or redirect reads to the Source.
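A minimal Python sketch of the retry-or-redirect behavior follows. It assumes the application wraps its reads in callables that raise `ConnectionError` when a session is interrupted; `read_with_redirect` and the stand-in functions are hypothetical and do not use a real database driver.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def read_with_redirect(query: str,
                       read_from_target: Callable[[str], T],
                       read_from_source: Callable[[str], T],
                       target_attempts: int = 2) -> T:
    """Retry a read on the Target, then redirect it to the Source."""
    for _ in range(target_attempts):
        try:
            return read_from_target(query)
        except ConnectionError:
            continue  # Session interrupted; try the Target again.
    # The Target is still unavailable, so serve the read from the Source.
    return read_from_source(query)


calls: List[str] = []


def interrupted_target(query: str):
    calls.append("target")
    raise ConnectionError("database is being dropped and recreated")


def healthy_source(query: str):
    calls.append("source")
    return [("row", 1)]


rows = read_with_redirect("SELECT 1", interrupted_target, healthy_source)
print(calls, rows)
```

In a real application you would likely add a delay between attempts; it is omitted here to keep the sketch short.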
@@ -0,0 +1,73 @@
---
title: Unplanned failover to a target Aeon cluster
headerTitle: Unplanned failover
linkTitle: Failover
description: Unplanned failover to a target cluster
headContent: Failover of application traffic to the DR target
menu:
  preview_yugabyte-cloud:
    parent: disaster-recovery-aeon
    identifier: disaster-recovery-failover-aeon
    weight: 30
type: docs
---

Unplanned failover is the process of switching application traffic to the Target cluster in case the Source cluster becomes unavailable. One of the common reasons for such a scenario is an outage of the primary region.

## Perform failover

Use the following procedure to perform an unplanned failover to the Target and resume applications.

If the Source is terminated or otherwise becomes unavailable, do the following:

1. Stop the application traffic to ensure no more updates are attempted.

1. Navigate to your Source cluster **Disaster Recovery** tab.

1. Note the **Potential data loss on failover** to understand the extent of possible data loss as a result of the outage, and determine if the extent of data loss is acceptable for your situation.

    - The potential data loss is computed as the safe time lag that existed at the current safe time on the Target.
    - Use the **Tables** tab to understand which specific tables have the highest safe time lag and replication lag.

    For more information on replication metrics, refer to [Replication](../../../../launch-and-manage/monitor-and-alert/metrics/replication/).

1. To proceed, click **Switchover** and choose **Failover**.

1. Enter the name of the Target and click **Failover**.

1. Click **Restart Replication**.

1. Resume the application traffic on the new Source.

At this point, the DR configuration is halted and needs to be repaired.

![Disaster recovery failed](/images/yb-platform/disaster-recovery/disaster-recovery-failed.png)
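The data-loss check in step 3 of the failover procedure can be reasoned about with a small Python sketch. The table names, lag values, and thresholds below are made up, and treating the largest per-table safe time lag as the bound on data loss is an assumption made for illustration; the value shown on the **Disaster Recovery** tab remains the authoritative figure.

```python
from typing import Dict, List, Tuple


def tables_by_lag(safe_time_lag_ms: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (table, lag) pairs sorted from highest to lowest lag."""
    return sorted(safe_time_lag_ms.items(), key=lambda item: item[1], reverse=True)


def loss_is_acceptable(safe_time_lag_ms: Dict[str, float],
                       max_acceptable_ms: float) -> bool:
    """Assumption: the worst per-table lag bounds the potential data loss."""
    return max(safe_time_lag_ms.values(), default=0.0) <= max_acceptable_ms


# Hypothetical per-table safe time lag, in milliseconds.
lags = {"orders": 850.0, "customers": 120.0, "audit_log": 2400.0}

worst_table, worst_lag = tables_by_lag(lags)[0]
print(worst_table, worst_lag)
print(loss_is_acceptable(lags, max_acceptable_ms=5000.0))
print(loss_is_acceptable(lags, max_acceptable_ms=1000.0))
```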

## Repair DR after failover

There are two options to repair a DR configuration that has failed over:

- If the original Source has recovered and is fully functional with no active alerts, you can configure DR to use the cluster as a Target.
- If the original Source cannot be recovered, create a new cluster and configure it to act as the Target (see [Prerequisites](../disaster-recovery-setup/#prerequisites)).

In both cases, repairing DR involves making a full copy of the databases through the backup-restore process.
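The choice between the two options can be summarized as a simple decision rule. The following Python sketch is illustrative only; `choose_repair_option` is a hypothetical helper, and the returned strings mirror the option labels in the **Repair DR** dialog.

```python
def choose_repair_option(original_source_recovered: bool,
                         active_alerts: int) -> str:
    """Pick a repair option based on the state of the original Source."""
    if original_source_recovered and active_alerts == 0:
        # The original Source is healthy, so it can act as the Target.
        return "Reuse the current Target"
    # Otherwise, a new cluster must take over the Target role.
    return "Select a new cluster as Target"


print(choose_repair_option(original_source_recovered=True, active_alerts=0))
print(choose_repair_option(original_source_recovered=True, active_alerts=2))
print(choose_repair_option(original_source_recovered=False, active_alerts=0))
```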

To repair DR, do the following:

1. Navigate to your (new) Source cluster **Disaster Recovery** tab.

1. Click **Repair DR** to display the **Repair DR** dialog.

    ![Repair DR](/images/yb-platform/disaster-recovery/disaster-recovery-repair.png)

1. If the current Target (formerly the Source) has recovered and is fully functional with no active alerts, choose **Reuse the current Target**.

    To use a new cluster as the Target, choose **Select a new cluster as Target** and select the cluster.

1. Click **Initiate Repair**.

After the repair is complete, if you want the Target (that is, the former Source if you chose to reuse it, or the new cluster you added to DR as the Target) to become the Source, follow the steps for [Planned switchover](../disaster-recovery-switchover/).

{{< warning title="Important" >}}
Do not attempt a switchover until you have repaired DR.
{{< /warning >}}