
Commit 40bd239

Merge pull request #666 from EnterpriseDB/DOCS-3050
WarehousePG Copy documentation
2 parents 5936cda + e59f097 commit 40bd239


12 files changed

+624
-0
lines changed


advocacy_docs/supported-open-source/warehousepg/index.mdx

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ navigation:
66
- edbpggpsupp
77
- observability
88
- flowserver
9+
- whpg-copy
910
directoryDefaults:
1011
iconName: BigData
1112
navRootedTo: /edb-postgres-ai/analytics/
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
---
2+
title: WarehousePG Copy
3+
description: Covers the usage of WarehousePG Copy to copy objects from one WarehousePG cluster to another.
4+
navigation:
5+
- release_notes
6+
- overview
7+
- installing
8+
- using
9+
- reference
10+
navRootedTo: /supported-open-source/warehousepg/
11+
---
12+
13+
WarehousePG (WHPG) Copy, or `whpg-copy`, is a high-performance utility designed for migrating data between a source and a destination WarehousePG (WHPG) cluster. By transferring data directly between segment hosts, it bypasses the coordinator bottleneck to achieve high throughput.
14+
15+
WarehousePG Copy is versioned independently of WarehousePG. Refer to [Supported platforms](overview/supported_platforms/) for compatibility information.
Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
---
2+
title: Installing WarehousePG Copy
3+
navTitle: Installing
4+
description: Learn how to install WarehousePG Copy.
5+
---
6+
7+
You must install WarehousePG (WHPG) Copy on each host in your source and destination WarehousePG clusters. Optionally, you can also install and run the utility from a separate host, as long as it has access to both the source and destination WHPG coordinators.
8+
9+
## Downloading and installing the utility
10+
11+
1. On both source and destination coordinators, download the package from the EDB repository:
12+
13+
<TabContainer syncKey="download">
14+
<Tab title="RHEL 8, 9">
15+
16+
```bash
17+
export EDB_SUBSCRIPTION_TOKEN=<your-token>
18+
export EDB_REPO=gpsupp
19+
curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash
20+
sudo dnf download edb-whpg-copy
21+
```
22+
23+
</Tab>
24+
<Tab title="RHEL 7">
25+
26+
```bash
27+
export EDB_SUBSCRIPTION_TOKEN=<your-token>
28+
export EDB_REPO=gpsupp
29+
curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash
30+
sudo yumdownloader edb-whpg-copy
31+
```
32+
33+
</Tab>
34+
</TabContainer>
35+
36+
1. On both source and destination coordinators, create a file named `all_hosts` that lists all hosts in the respective WHPG cluster. For example:
37+
38+
```ini
39+
cdw
40+
scdw
41+
sdw1
42+
sdw2
43+
sdw3
44+
```
45+
46+
1. From each coordinator, use the `gpsync` utility to transfer the package to all hosts in the cluster, and then use the `gpssh` utility to install the package:
47+
48+
<TabContainer syncKey="install">
49+
<Tab title="RHEL 8, 9">
50+
51+
```bash
52+
gpsync -f all_hosts <whpg-copy-package-name> =:/tmp
53+
gpssh -f all_hosts -e 'sudo dnf install -y /tmp/<whpg-copy-package-name>'
54+
```
55+
56+
Where `<whpg-copy-package-name>` is the name of the WarehousePG Copy package file you downloaded.
57+
58+
</Tab>
59+
<Tab title="RHEL 7">
60+
61+
```bash
62+
gpsync -f all_hosts <whpg-copy-package-name> =:/tmp
63+
gpssh -f all_hosts -e 'sudo yum install -y /tmp/<whpg-copy-package-name>'
64+
```
65+
66+
Where `<whpg-copy-package-name>` is the name of the WarehousePG Copy package file you downloaded.
67+
68+
</Tab>
69+
</TabContainer>
70+
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
---
2+
title: Overview of WarehousePG Copy
3+
navTitle: Overview
4+
description: Overview of WarehousePG Copy.
5+
navigation:
6+
- supported_platforms
7+
- known_issues
8+
---
9+
10+
The WarehousePG (WHPG) Copy utility, or `whpg-copy`, is a high-performance migration tool designed to move data between source and destination WHPG clusters. By executing transfers directly between segment hosts, it bypasses the coordinator node bottleneck to achieve massive throughput.
11+
12+
WarehousePG Copy streamlines the migration of database objects and datasets by managing the following critical tasks:
13+
- **Schema migration:** Automatically replicates schemas and table structures on the destination cluster if they are not already present.
14+
- **High-velocity data transfer:** Moves data efficiently using parallelized workers and optional compression to optimize network bandwidth.
15+
- **Dependency resolution:** Intelligently analyzes table relationships—such as foreign keys—to ensure data is loaded in the correct logical sequence.
16+
- **Operational resilience:** Provides comprehensive success and failure reporting, and automatically generates retry configurations to handle interrupted or failed tasks without restarting the entire process.
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
---
2+
title: Known Issues
3+
navTitle: Known Issues
4+
description: Learn about known issues in WarehousePG Copy version 1.
5+
---
6+
7+
These are the currently known issues and limitations identified in the WarehousePG Copy release. Where applicable, we have included workarounds to help you mitigate the impact of these issues. These issues are actively tracked and are planned for resolution in a future release.
8+
9+
- **Support for password authentication:** `whpg-copy` currently supports password authentication only through a [password file](https://www.postgresql.org/docs/12/libpq-pgpass.html). Use `.pgpass` or the `PGPASSFILE` environment variable to set a password for the connection. If you specify a password with the `whpg-copy` command or in the TOML-based configuration file, it is ignored.
10+
- **Views and materialized views:** `whpg-copy` supports the creation of both regular views and materialized views on the destination cluster. However, it does not perform a refresh of materialized views after the copy operation. You must manually execute `REFRESH MATERIALIZED VIEW` on the destination if you need the data to be up-to-date.
11+
- **Version compatibility:** Migration is supported between clusters of the same version or when upgrading from WarehousePG 6.x to WarehousePG 7.x. However, downward migration from version 7.x to 6.x is not supported.
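
As a minimal sketch of the password-file workaround described above, the following commands append one entry per cluster to a password file and restrict its permissions, which libpq requires on Unix systems before it will read the file. The hosts, ports, database names, and user mirror the sample connection URLs used elsewhere in this documentation and are assumptions; the passwords are placeholders.

```bash
# Use PGPASSFILE if it's already set; otherwise fall back to the default location.
export PGPASSFILE="${PGPASSFILE:-$HOME/.pgpass}"

# Each entry has the form hostname:port:database:username:password.
# The values below are illustrative placeholders, not real credentials.
cat >> "$PGPASSFILE" <<'EOF'
10.0.0.1:5432:source_db:gpadmin:source-password-placeholder
10.0.0.2:5432:target_db:gpadmin:target-password-placeholder
EOF

# libpq ignores the password file unless group and world access are disallowed.
chmod 600 "$PGPASSFILE"
```

With entries like these in place, the connection URLs can omit the password entirely.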
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
---
2+
title: Supported platforms
3+
navTitle: Supported platforms
4+
description: Provides information for determining the platform support for WarehousePG Copy.
5+
---
6+
7+
WarehousePG Copy 1.0 is compatible with the following versions of WarehousePG:
8+
9+
- WarehousePG (WHPG) version 6.x running on RHEL 7 or RHEL 8.
10+
- WarehousePG version 7.x running on RHEL 8 or RHEL 9.
11+
12+
!!! Important
13+
Migration is supported between clusters of the same version or when upgrading from WarehousePG 6.x to WarehousePG 7.x. However, downward migration from version 7.x to 6.x is not supported.
14+
15+
Lines changed: 196 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,196 @@
1+
---
2+
title: WarehousePG Copy configuration file
3+
navTitle: whpg-copy configuration file
4+
description: The complete reference to the WarehousePG Copy configuration file.
5+
---
6+
7+
This is the reference for the TOML-based WarehousePG Copy configuration file.
8+
9+
!!! Note
10+
Command-line arguments take precedence over settings defined in a TOML file.
11+
12+
You can generate a sample configuration file using the [whpg-copy config-example](whpg-copy-utility#config-example) command.
13+
14+
```toml
15+
# ==============================================================================
16+
# whpg-copy Configuration Example
17+
# ==============================================================================
18+
# Source database connection URL.
19+
# Supports standard PostgreSQL connection strings.
20+
src_url = "postgres://gpadmin:@10.0.0.1:5432/source_db"
21+
# Destination database connection URL.
22+
dst_url = "postgres://gpadmin:@10.0.0.2:5432/target_db"
23+
# Tables to include in the copy operation. To have uppercase letters or other
24+
# special characters in schema or table names, follow PostgreSQL's qualified
25+
# identifier rules to quote them.
26+
# Default: None
27+
include_table = [
28+
"public.users",
29+
"\"Inventory\".\"StockItems\""
30+
]
31+
# Tables to exclude.
32+
# Default: None
33+
exclude_table = [
34+
"public.temp_cache",
35+
"sales.test_data"
36+
]
37+
# Enable compression during data transfer to reduce network bandwidth usage.
38+
# Recommended for transfers over WAN or slow networks.
39+
# Default: true
40+
compression = true
41+
# Copy partitioned tables through their leaf partitions in parallel.
42+
# Disable this to enforce data goes through the root partition table or
43+
# intermediate partition table.
44+
# Default: true
45+
through_partition_leaves = true
46+
# How to handle existing tables on the destination:
47+
# "append" : Insert data into existing tables.
48+
# "truncate" : Clear the destination table before copying.
49+
# "skip-existing" : Do not copy if the table already exists.
50+
# Default: append
51+
target_mode = "append"
52+
# Validation method to perform after copying data:
53+
# "none" : No validation.
54+
# "count" : Compare row counts between source and destination.
55+
# "checksum" : Calculate and compare data hashes.
56+
# Default: none
57+
validate_method = "count"
58+
# Number of parallel workers to run.
59+
# Default: 4
60+
workers = 4
61+
# Listening port range on the destination for data transferring. Those
62+
# ports need to be enabled to be accessed from the source segments.
63+
# whpg-copy will try to start listening on the port one by one from
64+
# the range. At least one port is required in the range.
65+
port_range = "60000-60001"
66+
# Define rules to rename schemas or tables during the copy process.
67+
# Each rule requires at least a source pattern.
68+
#
69+
# public.sales -> new_schema.sales
70+
[[mapping_rules]]
71+
src_table = "sales"
72+
dst_schema = "new_schema"
73+
# old_schema.raw_logs -> new_schema.processed_logs
74+
[[mapping_rules]]
75+
src_schema = "old_schema"
76+
src_table = "raw_logs"
77+
dst_schema = "new_schema"
78+
dst_table = "processed_logs"
79+
[[mapping_rules]]
80+
src_schema = "old_schema(\\d+)"
81+
dst_schema = "new_schema${1}"
82+
src_table = "old_table(\\d+)"
83+
dst_table = "new_table${1}"
84+
# Run the operation without actually changing any data on the destination.
85+
dry_run = false
86+
```
87+
88+
## Keywords and values
89+
90+
91+
**src_url**
92+
93+
Connection string for the source database. Supports standard PostgreSQL connection strings. It follows the format `postgres://[user@]host[:port][/dbname]`.
94+
95+
**dst_url**
96+
97+
Connection string for the destination database. It follows the format `postgres://[user@]host[:port][/dbname]`.
98+
99+
**include_table**
100+
101+
Specifies tables to include. Use the format `schema.table` to specify the relations. If you are using special characters, follow PostgreSQL's qualified identifier rules to quote them.
102+
103+
**exclude_table**
104+
105+
Specifies tables to exclude. Uses the same format as `include_table`.
106+
107+
**compression**
108+
109+
Enables or disables ZSTD compression during data transfer. Recommended for transfers over WAN or slow networks. Default is `true`.
110+
111+
**through_partition_leaves**
112+
113+
If `true` (default), copies data directly between leaf partitions in parallel. If `false`, data goes through the specified root/intermediate partition table.
114+
115+
**target_mode**
116+
117+
Determines how to handle existing tables on the destination. The supported options are:
118+
- **append**: (Default) Inserts data into existing tables.
119+
- **truncate**: Truncates the destination table before copying.
120+
- **skip-existing**: Skips the copy operation if the table already exists.
121+
122+
**validate_method**
123+
124+
Validation to perform after copying. The supported options are:
125+
- **none**: (Default) No validation.
126+
- **count**: Compares row counts.
127+
- **checksum**: Calculates and compares data hashes.
128+
129+
**workers**
130+
131+
Specifies the number of concurrent worker tasks. Default is 4.
132+
133+
**port_range**
134+
135+
Defines the ports on the destination cluster used to receive data from the source segments. `whpg-copy` scans this range sequentially and binds to the first available port it finds. You must specify at least one port and ensure the entire range is accessible from the source segment hosts.
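
Because the entire range must be reachable from the source segment hosts, it can help to enumerate the ports before reviewing firewall rules. The following sketch expands a `port_range` value into individual ports. The `expand_port_range` function is a hypothetical helper written for this illustration and is not part of `whpg-copy`.

```bash
# expand_port_range is a hypothetical helper, not a whpg-copy command.
# It prints every port in a "first-last" range, one per line.
expand_port_range() {
  range=$1
  first=${range%-*}   # text before the dash
  last=${range#*-}    # text after the dash
  port=$first
  while [ "$port" -le "$last" ]; do
    echo "$port"
    port=$((port + 1))
  done
}

# Uses the sample value from the configuration example.
expand_port_range "60000-60001"
```

You can feed the resulting list to whatever firewall or network tooling your environment uses.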
136+
137+
**mapping_rules**
138+
139+
Mapping rules allow for powerful renaming and selection logic using regular expressions (Regex). You can define multiple `[[mapping_rules]]` blocks in your configuration file. The supported options are:
140+
141+
- **src_schema**, **src_table**: Define the source objects. These fields use standard Rust regular expression patterns to match your source objects. Patterns are automatically anchored (wrapped in `^` and `$`). For example, `src_table = "users"` matches only the table `"users"`, not `"super_users"`. To match multiple tables, use the `.*` wildcard.
142+
- **dst_schema**, **dst_table**: Define the destination objects. These fields support regex capture groups. If your source pattern contains groups in parentheses `()`, you can reference them in the destination using `${1}`, `${2}`, and so on.
143+
- **sql**: Custom SQL query to use for extracting data from the source table. Instead of copying the entire table, `whpg-copy` executes this SQL and copies its result. The query supports the placeholders `${src_schema}` and `${src_table}`, which the utility automatically replaces with the escaped source object names. This is ideal for joining tables, masking sensitive data, or changing data types on the fly.
144+
145+
!!! Note
146+
If your mapping rule involves a rename (the destination schema or table name is different from the source), `whpg-copy` cannot automatically create the table on the destination cluster. You must ensure the destination table exists with the correct schema before initiating the copy.
147+
!!!
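
To get a feel for the anchoring and capture-group behavior described above, you can approximate a rule with `sed`. This is an illustration only, under stated assumptions: `whpg-copy` uses Rust regular expressions, whereas `sed -E` uses POSIX extended regular expressions, so `\d` is written as `[0-9]`, the anchors are added by hand, and groups are referenced as `\1` and `\2` instead of `${1}` and `${2}`. The `map_table` function is a hypothetical helper and not part of `whpg-copy`.

```bash
# map_table is a hypothetical helper that mimics a rule with
#   src_table = "data_(\\d+)_(\\d+)"  and  dst_table = "record_${1}_v${2}"
# It prints the mapped name, or nothing if the whole name doesn't match.
map_table() {
  printf '%s\n' "$1" | sed -nE 's/^data_([0-9]+)_([0-9]+)$/record_\1_v\2/p'
}

map_table "data_2023_01"       # prints record_2023_v01
map_table "old_data_2023_01"   # prints nothing: the anchored pattern rejects the prefix
```

The second call shows why anchoring matters: without `^` and `$`, the pattern would match inside a longer table name.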
148+
149+
**dry_run**
150+
151+
Run the operation without changing any data on the destination (default is `false`).
152+
153+
## Examples
154+
155+
- Copy the table `public.users` to a table named `public.customers`:
156+
157+
```
158+
[[mapping_rules]]
159+
src_schema = "public"
160+
src_table = "users"
161+
dst_table = "customers"
162+
```
163+
164+
- Copy all tables in schema `legacy` to the schema `archived`:
165+
166+
```
167+
[[mapping_rules]]
168+
src_schema = "legacy"
169+
src_table = ".*" # Match all tables in 'legacy' schema
170+
dst_schema = "archived"
171+
```
172+
173+
- Use capture groups to rename tables dynamically during the copy operation:
174+
175+
By using parentheses `()` in your `src_table` pattern, you can "save" parts of the table name and "paste" them into the new name.
176+
177+
```
178+
[[mapping_rules]]
179+
src_table = "data_(\\d+)_(\\d+)"
180+
dst_table = "record_${1}_v${2}"
181+
```
182+
183+
The `src_table` pattern looks for tables starting with `data_`, followed by two groups of digits.
184+
185+
The `dst_table` template defines how the new table should be named using the saved groups.
186+
187+
As a result, a table named `data_2023_01` will be copied as `record_2023_v01`.
188+
189+
190+
- Use a custom SQL query to copy only orders from 2024 onward:
191+
192+
```
193+
[[mapping_rules]]
194+
src_table = "orders"
195+
sql = "SELECT * FROM ${src_schema}.${src_table} WHERE order_date >= '2024-01-01'"
196+
```
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
---
2+
title: WarehousePG Copy reference
3+
navTitle: Reference
4+
description: The complete reference to WarehousePG Copy commands and configuration files.
5+
deepToc: true
6+
navigation:
7+
- whpg-copy-utility
8+
- config_file
9+
---
10+
11+
Command reference for WarehousePG Copy.
12+
13+
Reference information for WarehousePG Copy command options and configuration file:
14+
15+
- [whpg-copy command](whpg-copy-utility)
16+
- [whpg-copy configuration file](config_file)
