This feature is in preview and subject to change. To share feedback and/or issues, contact Support.
Physical cluster replication is supported in CockroachDB self-hosted clusters.
New in v23.2: Cutover in physical cluster replication (PCR) allows you to switch from the active primary cluster to the passive standby cluster that has ingested replicated data. When you complete the replication stream to initiate a cutover, the job stops replicating data from the primary, sets the standby virtual cluster to a point in time (in the past or future) where all ingested data is consistent, and then makes the standby virtual cluster ready to accept traffic.
Cutback uses a new PCR stream to switch operations back to the original primary cluster (or a new cluster) after a cutover event.
This page describes:
- Cutover from the primary cluster to the standby cluster.
- Cutback from the original standby cluster (after it was promoted during cutover) to the original primary cluster.
- Job management after a cutover or cutback.
Cutover and cutback do not redirect traffic automatically to the standby cluster. Once the cutover or cutback is complete, you must redirect application traffic to the standby (new) cluster. If you do not redirect traffic manually, writes to the primary (original) cluster may be lost.
Cutover
The cutover is a two-step process on the standby cluster:
- Step 1. Initiate the cutover.
- Step 2. Complete the cutover.
Before you begin
During PCR, jobs running on the primary cluster will replicate to the standby cluster. Before you cut over to the standby cluster, or cut back to the original primary cluster, consider how you will manage running (replicated) jobs between the clusters. Refer to Job management for instructions.
Step 1. Initiate the cutover
To initiate a cutover to the standby cluster, specify the point in time for the standby's promotion. After cutover, the standby cluster's live data reflects this point in time. Refer to the following sections for steps:
- LATEST: The most recent replicated timestamp.
- Point-in-time:
  - Past: A past timestamp within the cutover window.
  - Future: A future timestamp for planning a cutover.
Cut over to the most recent replicated time
To initiate a cutover to the most recent replicated timestamp, you can specify LATEST when you start the cutover. The latest replicated time may be behind the actual time if there is replication lag in the stream. Replication lag is the time between the most up-to-date replicated time and the actual time.
To view the current replication timestamp, use:
SHOW VIRTUAL CLUSTER application WITH REPLICATION STATUS;
id | name | data_state | service_mode | source_tenant_name | source_cluster_uri | replication_job_id | replicated_time | retained_time | cutover_time
---+------+------------+--------------+--------------------+--------------------+--------------------+-----------------+---------------+-------------
5 | application | replicating | none | application | postgresql://user:redacted@host?options=-ccluster%3Dsystem&sslmode=verify-full&sslrootcert=redacted | 911803003607220225 | 2023-10-26 17:36:52.27978+00 | 2023-10-26 14:36:52.279781+00 | NULL
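As a rough illustration of replication lag, the following Python sketch computes the lag from a replicated_time value like the one above. It assumes the value is rendered in UTC with a +00 suffix, as in this output; parse_utc_timestamp and replication_lag are hypothetical helpers, not part of CockroachDB.

```python
from datetime import datetime, timedelta, timezone


def parse_utc_timestamp(ts: str) -> datetime:
    """Parse a timestamp such as '2023-10-26 17:36:52.27978+00'.

    Assumes the value ends in '+00' (UTC), as in the SHOW output above.
    """
    if not ts.endswith("+00"):
        raise ValueError(f"expected a UTC timestamp ending in '+00': {ts!r}")
    # %f accepts 1 to 6 fractional digits and right-pads them with zeros.
    naive = datetime.strptime(ts[:-3], "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def replication_lag(replicated_time: str, now: datetime) -> timedelta:
    """Replication lag: how far the replicated time trails the actual time."""
    return now - parse_utc_timestamp(replicated_time)


# Suppose the actual time is 10 seconds after the replicated time shown above.
now = datetime(2023, 10, 26, 17, 37, 2, 279780, tzinfo=timezone.utc)
print(replication_lag("2023-10-26 17:36:52.27978+00", now))  # 0:00:10
```

In practice, `now` would come from your own clock, so the result is only as accurate as the clock skew between your client and the cluster allows.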
Run the following from the standby cluster's SQL shell to start the cutover:
ALTER VIRTUAL CLUSTER application COMPLETE REPLICATION TO LATEST;
The cutover_time is the timestamp at which the replicated data is consistent. The cluster will revert any replicated data above this timestamp to ensure that the standby is consistent with the primary at that timestamp:
cutover_time
----------------------------------
1695922878030920020.0000000000
(1 row)
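The cutover_time is shown as a decimal HLC (hybrid logical clock) timestamp. As a sketch, assuming the integer part is the wall-clock time in nanoseconds since the Unix epoch and the fractional digits are the logical counter, the following Python converts it to a readable UTC time. decode_hlc is a hypothetical helper. Decoded this way, the value above corresponds to 2023-09-28 17:41:18.03092 UTC, which matches the replicated_time in the Step 2 example output.

```python
from datetime import datetime, timedelta, timezone


def decode_hlc(hlc: str) -> tuple[datetime, int]:
    """Split a decimal HLC timestamp into (UTC wall time, logical counter).

    Assumes the integer part is nanoseconds since the Unix epoch and the
    fractional digits are the logical counter.
    """
    wall_part, _, logical_part = hlc.partition(".")
    seconds, nanos = divmod(int(wall_part), 1_000_000_000)
    # datetime has microsecond precision, so sub-microsecond nanoseconds are dropped.
    wall_time = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=nanos // 1_000
    )
    logical = int(logical_part) if logical_part else 0
    return wall_time, logical


wall_time, logical = decode_hlc("1695922878030920020.0000000000")
print(wall_time.isoformat(), logical)  # 2023-09-28T17:41:18.030920+00:00 0
```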
Cut over to a point in time
You can control the point in time that the PCR stream will cut over to.
To select a specific time in the past, first view the replication status:
SHOW VIRTUAL CLUSTER application WITH REPLICATION STATUS;
The retained_time in the response is the earliest time to which you can cut over:
id | name | data_state | service_mode | source_tenant_name | source_cluster_uri | replication_job_id | replicated_time | retained_time | cutover_time
---+------+------------+--------------+--------------------+--------------------+--------------------+-----------------+---------------+-------------
3 | application | replicating | none | application | postgresql://{user}:redacted@{hostname}:26257?options=-ccluster%3Dsystem&sslmode=verify-full&sslrootcert=redacted | 899090689449132033 | 2023-09-11 22:29:35.085548+00 | 2023-09-11 16:51:43.612846+00 | NULL
(1 row)
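Before choosing a timestamp, it can help to check it against the cutover window. The following Python sketch classifies a proposed target against the retained_time and replicated_time values above. The three-way classification is an assumption drawn from this page (retained_time is the earliest allowed time, and a future cutover proceeds once replication reaches the target); the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(ts: str) -> datetime:
    """Parse a '+00'-suffixed UTC timestamp from SHOW ... WITH REPLICATION STATUS."""
    if not ts.endswith("+00"):
        raise ValueError(f"expected a UTC timestamp ending in '+00': {ts!r}")
    return datetime.strptime(ts[:-3], "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)


def classify_cutover_target(target: datetime, retained_time: str, replicated_time: str) -> str:
    """Classify a proposed cutover timestamp against the cutover window.

    Assumed semantics (a sketch, not CockroachDB's implementation):
    - before retained_time: outside the window, so it cannot be used
    - between retained_time and replicated_time: the data is already replicated
    - after replicated_time: the cutover waits until replication reaches it
    """
    retained = parse_utc_timestamp(retained_time)
    replicated = parse_utc_timestamp(replicated_time)
    if target < retained:
        return "too early"
    if target <= replicated:
        return "within window"
    return "waits for replication"


retained, replicated = "2023-09-11 16:51:43.612846+00", "2023-09-11 22:29:35.085548+00"
one_hour_ago = datetime(2023, 9, 11, 21, 29, 35, tzinfo=timezone.utc)
print(classify_cutover_target(one_hour_ago, retained, replicated))  # within window
```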
Specify a timestamp:
ALTER VIRTUAL CLUSTER application COMPLETE REPLICATION TO SYSTEM TIME '-1h';
Refer to Using different timestamp formats for more information.
Similarly, to cut over to a specific time in the future:
ALTER VIRTUAL CLUSTER application COMPLETE REPLICATION TO SYSTEM TIME '+5h';
A future cutover will proceed once the replicated data has reached the specified time.
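Relative values such as '-1h' and '+5h' are offsets from the current time. The following Python sketch resolves such an offset into an absolute UTC timestamp, which can help when planning a cutover. It assumes the offset is resolved against the time the statement runs, and it handles only a hypothetical subset of formats (a sign, an integer, and one of h, m, or s); CockroachDB accepts a wider range of formats, described in Using different timestamp formats.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, deliberately small subset: "+5h", "-1h", "-30m", "+90s".
_OFFSET_RE = re.compile(r"([+-])(\d+)([hms])")
_UNIT_TO_KWARG = {"h": "hours", "m": "minutes", "s": "seconds"}


def resolve_offset(offset: str, now: datetime) -> datetime:
    """Turn a relative offset like '-1h' into an absolute timestamp."""
    match = _OFFSET_RE.fullmatch(offset)
    if match is None:
        raise ValueError(f"unsupported offset: {offset!r}")
    sign, amount, unit = match.groups()
    delta = timedelta(**{_UNIT_TO_KWARG[unit]: int(amount)})
    return now + delta if sign == "+" else now - delta


now = datetime(2023, 9, 11, 22, 30, tzinfo=timezone.utc)
print(resolve_offset("-1h", now))  # 2023-09-11 21:30:00+00:00
print(resolve_offset("+5h", now))  # 2023-09-12 03:30:00+00:00
```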
To monitor for when the replication stream completes, use SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS to find the replication stream's replication_job_id, which you can pass to SHOW JOB WHEN COMPLETE job_id as the job_id. Refer to the SHOW JOBS page for details and an example.
Step 2. Complete the cutover
The completion of the replication is asynchronous; to monitor its progress, use:
SHOW VIRTUAL CLUSTER application WITH REPLICATION STATUS;
id | name | data_state | service_mode | source_tenant_name | source_cluster_uri | replication_job_id | replicated_time | retained_time | cutover_time
---+------+------------+--------------+--------------------+--------------------+--------------------+-----------------+---------------+-------------
4 | application | replication pending cutover | none | application | postgresql://{user}:{password}@{hostname}:26257?options=-ccluster%3Dsystem&sslmode=verify-full&sslrootcert=redacted | 903895265809498113 | 2023-09-28 17:41:18.03092+00 | 2023-09-28 16:09:04.327473+00 | 1695922878030920020.0000000000
(1 row)
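Because completion is asynchronous, a client can poll the data_state column until the cutover finishes. The following Python sketch shows such a polling loop. fetch_data_state is a hypothetical callable you would implement with your SQL driver to run SHOW VIRTUAL CLUSTER application WITH REPLICATION STATUS and return data_state; the assumption that a completed cutover reports ready should be checked against the Data state reference.

```python
import time
from typing import Callable


def wait_for_cutover(
    fetch_data_state: Callable[[], str],
    poll_interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll data_state until it reports 'ready' or the timeout expires.

    Assumes 'ready' means the cutover finished.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_data_state()
        if state == "ready":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cutover still in state {state!r}")
        sleep(poll_interval_s)


# Usage with a stubbed fetcher that replays states instead of querying a cluster.
states = iter(["replication pending cutover", "replication pending cutover", "ready"])
print(wait_for_cutover(lambda: next(states), sleep=lambda _: None))  # ready
```

Injecting `sleep` keeps the loop testable without real delays; in production, the default `time.sleep` applies.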
Refer to Physical Cluster Replication Monitoring for the Responses and Data state descriptions of the SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS fields.
Once the cutover is complete, bring the standby's virtual cluster online with:
ALTER VIRTUAL CLUSTER application START SERVICE SHARED;
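Listing the virtual clusters (for example, with SHOW VIRTUAL CLUSTERS;) then shows the application virtual cluster as ready with a shared service mode:
id | name | data_state | service_mode
---+-------------+------------+-------------
1 | system | ready | shared
2 | template | ready | none
3 | application | ready | shared
(3 rows)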
To make the standby's virtual cluster the default for connection strings, set the following cluster setting:
SET CLUSTER SETTING server.controller.default_target_cluster='application';
At this point, the primary and standby clusters are entirely independent. You will need to use your own network load balancers, DNS servers, or other network configuration to direct application traffic to the standby (now primary). To manage replicated jobs on the promoted standby, refer to Job management.
To enable PCR again from the new primary to the original primary (or a completely different cluster), refer to Cut back to the primary cluster.
Cut back to the primary cluster
After cutting over to the standby cluster, you may need to move back to the original primary cluster, or a completely different cluster. This process is manual and requires starting a new PCR stream.
For example, if you had set up PCR between a primary and standby cluster and then cut over to the standby, the workflow to cut back to the original primary cluster would be as follows:
- Original primary cluster = Cluster A
- Original standby cluster = Cluster B
- Cluster B is now serving application traffic after the cutover.
- Drop the application virtual cluster from Cluster A with DROP VIRTUAL CLUSTER.
- Start a PCR stream that sends updates from Cluster B to Cluster A. Refer to Start replication.
- Cut over to Cluster A by completing the replication stream on Cluster A. Refer to Cutover.
At this point, Cluster A is once again the primary and Cluster B is once again the standby. The clusters are entirely independent. To direct application traffic to Cluster A, you will need to use your own network load balancers, DNS servers, or other network configuration. To manage replicated jobs on the promoted standby, refer to Job management.
To enable physical cluster replication again, from the primary to the standby (or a completely different cluster), refer to Set Up Physical Cluster Replication.
Job management
During a replication stream, jobs running on the primary cluster will replicate to the standby cluster. Once you have completed a cutover (or a cutback), refer to the following sections for details on resuming jobs on the promoted cluster.
Backup schedules
Backup schedules will pause after cutover on the promoted cluster. Take the following steps to resume jobs:
- Verify that there are no other schedules running backups to the same collection of backups, i.e., the schedule that was running on the original primary cluster.
- Resume the backup schedule on the promoted cluster.
If your backup schedule was created on a cluster in v23.1 or earlier, it will not pause automatically on the promoted cluster after cutover. In this case, you must pause the schedule manually on the promoted cluster and then take the outlined steps.
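The backup schedule steps above can be sketched as a small Python helper that returns, in order, the statements to run on the promoted cluster. PAUSE SCHEDULE and RESUME SCHEDULE are CockroachDB statements; the helper name, its parameters, and the version comparison are assumptions for illustration. Verifying that no other schedule writes to the same backup collection remains a manual step, so the helper only emits a SQL comment for it.

```python
def backup_schedule_plan(schedule_id: int, created_version: tuple[int, int]) -> list[str]:
    """Return the statements to run on the promoted cluster, in order.

    created_version is the (major, minor) version the schedule was created on,
    for example (23, 1). Schedules created on v23.1 or earlier do not pause
    automatically after cutover.
    """
    plan: list[str] = []
    if created_version <= (23, 1):
        # Older schedules keep running after cutover, so pause them by hand first.
        plan.append(f"PAUSE SCHEDULE {schedule_id};")
    # Manual check: the original primary's schedule must no longer write to this collection.
    plan.append("-- verify no other schedule backs up to the same collection")
    plan.append(f"RESUME SCHEDULE {schedule_id};")
    return plan


for statement in backup_schedule_plan(42, (23, 1)):
    print(statement)
```

The schedule ID 42 is a placeholder; you can look up real IDs with SHOW SCHEDULES.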
Changefeeds
Changefeeds will fail on the promoted cluster immediately after cutover to avoid two clusters running the same changefeed to one sink. We recommend that you recreate changefeeds on the promoted cluster.
Scheduled changefeeds will continue on the promoted cluster. You will need to manage pausing or canceling the schedule on the promoted standby cluster to avoid two clusters running the same changefeed to one sink.
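To catch the case of two clusters running the same changefeed to one sink, a simple audit can flag any sink that receives data from running changefeeds on more than one cluster. The following Python sketch assumes you have already collected (cluster, sink URI, status) records from each cluster, for example from the sink_uri and status columns of SHOW CHANGEFEED JOBS; the record type, function name, and sink URIs are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ChangefeedRecord(NamedTuple):
    cluster: str
    sink_uri: str
    status: str  # for example "running" or "paused"


def sinks_with_duplicate_writers(records: Iterable[ChangefeedRecord]) -> dict[str, set[str]]:
    """Map each sink URI to the clusters running changefeeds into it, if more than one."""
    clusters_by_sink: dict[str, set[str]] = defaultdict(set)
    for record in records:
        if record.status == "running":
            clusters_by_sink[record.sink_uri].add(record.cluster)
    return {sink: clusters for sink, clusters in clusters_by_sink.items() if len(clusters) > 1}


records = [
    ChangefeedRecord("cluster-a", "kafka://broker:9092", "running"),
    ChangefeedRecord("cluster-b", "kafka://broker:9092", "running"),
    ChangefeedRecord("cluster-b", "s3://bucket/feed", "running"),
]
print(sinks_with_duplicate_writers(records))
```

Any sink in the result is a candidate for pausing or canceling the changefeed or its schedule on one of the clusters.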