Merge pull request #666 from EnterpriseDB/DOCS-3050

mpfuster · web-flow · commit 40bd239e09cf · 2026-01-30T10:01:46.000Z
WarehousePG Copy documentation
diff --git a/advocacy_docs/supported-open-source/warehousepg/index.mdx b/advocacy_docs/supported-open-source/warehousepg/index.mdx
@@ -6,6 +6,7 @@ navigation:
 - edbpggpsupp
 - observability
 - flowserver
+- whpg-copy
 directoryDefaults:
     iconName: BigData
 navRootedTo: /edb-postgres-ai/analytics/
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/index.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/index.mdx
@@ -0,0 +1,15 @@
+---
+title: WarehousePG Copy
+description: Covers the usage WarehousePG Copy to copy objects from one WarehousePG cluster to another.
+navigation:
+- release_notes
+- overview
+- installing
+- using
+- reference
+navRootedTo: /supported-open-source/warehousepg/
+---
+
+WarehousePG (WHPG) Copy, or `wgph-copy` is a high-performance utility designed for migrating data between a source and a destination WarehousePG (WHPG) cluster. By transferring data directly between segment hosts, it bypasses the coordinator bottleneck to achieve massive throughput.
+
+WarehousePG Copy is versioned independently of WarehousePG. Refer to [Supported platforms](overview/supported_platforms/) for compatibility information.
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/installing.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/installing.mdx
@@ -0,0 +1,70 @@
+---
+title: Installing WarehousePG Copy
+navTitle: Installing
+description: Learn how to install WarehousePG Copy.
+---
+
+You must install WarehousePG (WHPG) Copy on each host in your source and destination WarehousePG clusters. Optionally, you can also install and run the utility from a separate host, as long as it has access to both the source and destination WHPG coordinators.
+
+## Downloading and installing the utility
+
+1. On both source and destination coordinators, download the package from the EDB repository:
+
+    <TabContainer syncKey="download">
+    <Tab title="RHEL 8, 9">
+
+    ```bash
+    export EDB_SUBSCRIPTION_TOKEN=<your-token>
+    export EDB_REPO=gpsupp
+    curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash
+    sudo dnf download edb-whpg-copy
+    ```
+    
+    </Tab>
+    <Tab title="RHEL 7">
+
+    ```bash
+    export EDB_SUBSCRIPTION_TOKEN=<your-token>
+    export EDB_REPO=gpsupp
+    curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash
+    sudo yumdownloader edb-whpg-copy
+    ```
+
+    </Tab>
+    </TabContainer>
+
+1. On both source and destination coordinators, create a file `all_hosts` which lists all hosts in each WHPG cluster. For example:
+
+    ```ini
+    cdw
+    scdw
+    sdw1
+    sdw2
+    sdw3
+    ```
+
+1. From each coordinator, use the `gpsync` utility to transfer the package to all hosts in the cluster and then use use the `gpssh` utility to install the package:
+
+    <TabContainer syncKey="install">
+    <Tab title="RHEL 8, 9">
+
+    ```bash 
+    gpsync -f all_hosts <whpg-copy-package-name> =:/tmp
+    gpssh -f all_hosts -e 'sudo dnf install -y /tmp/<whpg-copy-package-name>'
+    ```
+    
+    Where `<whpg-copy-package-name>` is the name of the WarehousePG Copy package file you downloaded.
+
+    </Tab>
+    <Tab title="RHEL 7">
+
+    ```bash 
+    gpsync -f all_hosts <whpg-copy-package-name> =:/tmp
+    gpssh -f all_hosts -e 'sudo yum install -y /tmp/<whpg-copy-package-name>'
+    ```
+
+    Where `<whpg-copy-package-name>` is the name of the WarehousePG Copy package file you downloaded.
+
+    </Tab>
+    </TabContainer>
+
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/overview/index.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/overview/index.mdx
@@ -0,0 +1,16 @@
+---
+title: Overview of WarehousePG Copy
+navTitle: Overview
+description: Overview of WarehousePG Copy.
+navigation:
+- supported_platforms
+- known_issues
+---
+
+The WarehousePG (WHPG) Copy utility, or `whpg-copy`, is a high-performance migration tool designed to move data between source and destination WHPG clusters. By executing transfers directly between segment hosts, it bypasses the coordinator node bottleneck to achieve massive throughput.
+
+WarehousePG Copy streamlines the migration of database objects and datasets by managing the following critical tasks:
+- **Schema migration:** Automatically replicates schemas and table structures on the destination cluster if they are not already present.
+- **High-velocity data transfer:** Moves data efficiently using parallelized workers and optional compression to optimize network bandwidth.
+- **Dependency resolution:** Intelligently analyzes table relationships—such as foreign keys—to ensure data is loaded in the correct logical sequence.
+- **Operational resilience:** Provides comprehensive success and failure reporting, and automatically generates retry configurations to handle interrupted or failed tasks without restarting the entire process.
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/overview/known_issues.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/overview/known_issues.mdx
@@ -0,0 +1,11 @@
+---
+title: Known Issues
+navTitle: Known Issues
+description: Learn about known issues in WarehousePG Copy version 1.
+---
+
+These are the currently known issues and limitations identified in the WarehousePG Copy release. Where applicable, we have included workarounds to help you mitigate the impact of these issues. These issues are actively tracked and are planned for resolution in a future release.
+
+- **Support for password authentication**: `whpg-copy` currently only supports password authentication through a [password file](https://www.postgresql.org/docs/12/libpq-pgpass.html). Use `.pgpass` or the `PGPASSFILE` environment variable to set a password for the connection. If you specify a password with the `whpg-copy` command or in the TOML-based configuration file, it will be ignored.
+- **Views and materialized views:** `whpg-copy` supports the creation of both regular views and materialized views on the destination cluster. However, it does not perform a refresh of materialized views after the copy operation. You must manually execute `REFRESH MATERIALIZED VIEW` on the destination if you need the data to be up-to-date.
+- **Version compatibility:** Migration is supported between clusters of the same version or when upgrading from WarehousePG 6.x to WarehousePG 7.x. However, downward migration from version 7.x to 6.x is not supported.
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/overview/supported_platforms.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/overview/supported_platforms.mdx
@@ -0,0 +1,15 @@
+---
+title: Supported platforms
+navTitle: Supported platforms
+description: Provides information for determining the platform support for WarehousePG Copy.
+---
+
+WarehousePG Copy 1.0 is compatible with the following versions of WarehousePG:
+
+- WarehousePG (WHPG) version 6.x running on RH7 or RH8.
+- WarehousePG version 7.x running on RH8 or RH9.
+
+!!! Important
+    Migration is supported between clusters of the same version or when upgrading from WarehousePG 6.x to WarehousePG 7.x. However, downward migration from version 7.x to 6.x is not supported.
+
+
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/reference/config_file.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/reference/config_file.mdx
@@ -0,0 +1,196 @@
+---
+title: WarehousePG Copy configuration file
+navTitle: whpg-copy configuration file
+description: The complete reference to the WarehousePG Copy configuration file.
+---
+
+This is the TOML-based configuration file reference for WarehousePG Copy TOML-based.
+
+!!! Note 
+    Command-line arguments take precedence over settings defined in a TOML file.
+
+You can generate a sample configuration file using the [whpg-copy config-example](whpg-copy-utility#config-example) command.
+
+```yaml
+# ==============================================================================
+# whpg-copy Configuration Example
+# ==============================================================================
+# Source database connection URL.
+# Supports standard PostgreSQL connection strings.
+src_url = "postgres://gpadmin:@10.0.0.1:5432/source_db"
+# Destination database connection URL.
+dst_url = "postgres://gpadmin:@10.0.0.2:5432/target_db"
+# Tables to include in the copy operation. To have uppercase letters or other
+# special characters in schema or table names, follow PostgreSQL's qualified
+# identifier rules to quote them.
+# Default: None
+include_table = [
+    "public.users",
+    "\"Inventory\".\"StockItems\""
+]
+# Tables to exclude.
+# Default: None
+exclude_table = [
+    "public.temp_cache",
+    "sales.test_data"
+]
+# Enable compression during data transfer to reduce network bandwidth usage.
+# Recommended for transfers over WAN or slow networks.
+# Default: true
+compression = true
+# Copy partitioned tables through their leaf partitions in parallel.
+# Disable this to enforce data goes through the root partition table or
+# intermediate partition table.
+# Default: true
+through_partition_leaves = true
+# How to handle existing tables on the destination:
+# "append"        : Insert data into existing tables.
+# "truncate"      : Clear the destination table before copying.
+# "skip-existing" : Do not copy if the table already exists.
+# Default: append
+target_mode = "append"
+# Validation method to perform after copying data:
+# "none"     : No validation.
+# "count"    : Compare row counts between source and destination.
+# "checksum" : Calculate and compare data hashes.
+# Default: none
+validate_method = "count"
+# Number of parallel workers to run.
+# Default: 4
+workers = 4
+# Listening port range on the destination for data transferring. Those
+# ports need to be enabled to be accessed from the source segments.
+# whpg-copy will try to start listening on the port one by one from
+# the range. At least one port is required in the range.
+port_range = "60000-60001"
+# Define rules to rename schemas or tables during the copy process.
+# Each rule requires at least a source pattern.
+#
+# public.sales -> new_schema.sales
+[[mapping_rules]]
+src_table= "sales"
+dst_schema = "new_schema"
+# old_schema.raw_logs -> new_schema.processed_logs
+[[mapping_rules]]
+src_schema = "old_schema"
+src_table = "raw_logs"
+dst_schema = "new_schema"
+dst_table = "processed_logs"
+[[mapping_rules]]
+src_schema = "old_schema(\\d+)"
+dst_schema = "new_schema${1}"
+src_table = "old_table(\\d+)"
+dst_table = "new_table${1}"
+# Run the operation without actually changing any data on the destination.
+dry_run = false
+```
+
+## Keywords and values
+
+
+**src_url**
+
+Connection string for the source database. Supports standard PostgreSQL connection strings. It follows the format `postgres://[user@]host[:port][/dbname]`.
+
+**dst_url**
+
+Connection string for the destination database. It follows the format `postgres://[user@]host[:port][/dbname]`.
+
+**include_table**
+
+Specifies tables to include. Use the format `schema.table` to specify the relations. If you are using special characters, follow PostgreSQL's qualified identifier rules to quote them.
+
+**exclude_table**
+
+Specifies tables to exclude. Uses the same format as `include-table`.
+
+**compression** 
+
+Enables or disables ZSTD compression during data transfer. Recommended for transfers over WAN or slow networks. Default is `true`.
+
+**through_partition_leaves**
+
+If `true` (default), copies data directly between leaf partitions in parallel. If `false`, data goes through the specified root/intermediate partition table.
+
+**target_mode**
+
+Determines how to handle existing tables on the destination. The supported options are:
+- **append**: (Default) Inserts data into existing tables.
+- **truncate**: Truncates the destination table before copying.
+- **skip-existing** : Skips the copy operation if the table already exists.
+
+**validate_method**
+
+Validation to perform after copying. The supported options are:
+- **none**: (Default) No validation.
+- **count**: Compares row counts.
+- **checksum**: Calculates and compares data hashes.
+
+**workers**
+
+Specifies the number of concurrent worker tasks. Default is 4.
+
+**port_range**
+
+Defines the ports on the destination cluster used to receive data from the source segments. `whpg-copy` scans this range sequentially and binds to the first available port it finds. You must specify at least one port and ensure the entire range is accessible from the source segment hosts.
+
+**mapping_rules**
+
+Mapping rules allow for powerful renaming and selection logic using regular expressions (Regex). You can define multiple `[[mapping_rules]]` blocks in your configuration file. The supported options are:
+
+- **src_schema**, **scr_table**: Define the source objects. These fields use standard Rust regular expression patterns to match your source objects. Patterns are automatically anchored (wrapped in `^` and `$`). For example, `src_table = "users"` matches only the table `"users"`, not `"super_users"`. To match multiple tables, use the `.*` wildcard.
+- **dst_schema**, **dst_table**: Define the destination objects. These fields support Regex's Capture Groups. If your source pattern contains groups in parentheses `()`, you can reference them in the destination using `${1}`, `${2}`, etcetera.  
+- **sql**: Custom SQL query to use for extracting data from the source table. Instead of copying the entire table, `whpg-copy` will execute this SQL and copy its result. Supports placeholders: `${src_schema}` and `${src_table}`, the utility will automatically replace them with the escaped source objects. This is ideal for joining tables, masking sensitive data, or changing data types on the fly.
+
+!!! Note
+   If your mapping rule involves a rename (the destination schema or table name is different from the source), `whpg-copy` cannot automatically create the table on the destination cluster. You must ensure the destination table exists with the correct schema before initiating the copy.
+!!!
+
+**dry_run**
+
+Run the operation without changing any data on the destination (default is `false`).
+
+## Examples
+
+- Copy the table `public.users` to a table named `public.customers`:
+
+    ```
+    [[mapping_rules]]
+    src_schema = "public"
+    src_table = "users"
+    dst_table = "customers"
+    ```
+
+- Copy all tables in schema `legacy` to the schema `archived`:
+
+    ```
+    [[mapping_rules]]
+    src_schema = "legacy"
+    src_table = ".*"        # Match all tables in 'legacy' schema
+    dst_schema = "archived"
+    ```
+
+- Copy using Capture Groups to dynamically rename tables during the copy operation. 
+
+    By using parentheses `()` in your `src_table` pattern, you can "save" parts of the table name and "paste" them into the new name.
+
+    ```
+    [[mapping_rules]]
+    src_table = "data_(\\d+)_(\\d+)"
+    dst_table = "record_${1}_v${2}"
+    ```
+
+    The `src_table` pattern looks for tables starting with `data_`, followed by two groups of digits.
+    
+    The `dst_table` template defines how the new table should be named using the saved groups.
+
+    As a result, a table named `data_2023_01` will be copied as `record_2023_v01`.
+
+
+- Use a custom SQL only to copy orders from 2024 onwards:
+
+    ```
+    [[mapping_rules]]
+    src_table = "orders"
+    sql = "SELECT * FROM ${src_schema}.${src_table} WHERE order_date >= '2024-01-01'"
+    ```
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/reference/index.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/reference/index.mdx
@@ -0,0 +1,16 @@
+---
+title: WarehousePG Copy reference
+navTitle: Reference
+description: The complete reference to WarehousePG Copy commands and configuration files.
+deepToc: true
+navigation:
+- whpg-copy-utility 
+- config_file
+---
+
+Command reference for WarehousePG Copy.
+
+Reference information for WarehousePG Copy command options and configuration file:
+
+- [whpg-copy command](whpg-copy-utility)
+- [whpg-copy configuration file](config_file)
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/reference/whpg-copy-utility.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/reference/whpg-copy-utility.mdx
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/release_notes/1.0.0_rel_notes.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/release_notes/1.0.0_rel_notes.mdx
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/release_notes/index.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/release_notes/index.mdx
diff --git a/advocacy_docs/supported-open-source/warehousepg/whpg-copy/using.mdx b/advocacy_docs/supported-open-source/warehousepg/whpg-copy/using.mdx