Then we create the instantiator instance and set the number of multistarts to 32.
```python
from qfactorjax.qfactor_sample_jax import QFactorSampleJax
```
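A minimal sketch of this step, assuming BQSKit's usual instantiate-options dictionary (the variable names and default constructor arguments below are illustrative, not confirmed by this excerpt):

```python
from qfactorjax.qfactor_sample_jax import QFactorSampleJax

# Hypothetical sketch: create the sample-based instantiator and request 32 multistarts.
qfactor_sample_instantiator = QFactorSampleJax()

instantiate_options = {
    'method': qfactor_sample_instantiator,  # run QFactor-Sample on the GPU
    'multistarts': 32,                      # number of random starting points tried
}
```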
For other usage examples, please refer to the [examples directory](https://github.com/BQSKit/bqskit-qfactor-jax/tree/main/examples).
## Setting Up a Multi-GPU Environment
To run BQSKit with multiple GPUs, you need to set up the BQSKit runtime properly. Each worker should be assigned to a specific GPU by leveraging NVIDIA's CUDA_VISIBLE_DEVICES environment variable. Several workers can share the same GPU by utilizing [NVIDIA's MPS](https://docs.nvidia.com/deploy/mps/). You can set up the runtime on a single server (or an interactive node on a cluster) or across several nodes using SBATCH. You can find scripts to help you set up the runtime in [this directory](https://github.com/BQSKit/bqskit-qfactor-jax/tree/main/examples/bqskit_env_scripts).
You may configure the number of GPUs to use on each server and the number of workers on each GPU. If you run too many workers on the same GPU, you will encounter an out-of-memory exception. If you are using QFactor, you may use the following table as a starting configuration and adjust the number of workers according to your specific circuit, unitary size, and GPU performance. If you are using QFactor-Sample, start with a single worker and increase the count if memory permits. You can use the `nvidia-smi` command to check GPU usage during execution; it reports the utilization of both the memory and the execution units (see the example after the table).
| Unitary Size | Workers per GPU |
|----------------|------------------|
| 7 | 2 |
| 8 and more | 1 |
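For example, one convenient way to monitor utilization, refreshing the readout every second while a compilation runs:

```bash
watch -n 1 nvidia-smi
```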
Make sure that in your Python script you are creating the compiler object with the appropriate IP address. When running on the same node as the server, you can use `localhost` as the IP address.
```python
from bqskit.compiler import Compiler

with Compiler('localhost') as compiler:
    ...  # submit compilation tasks as usual; they run on the runtime's GPU workers
```
### Single Server Multiple GPUs Setup
This section of the guide explains the main concepts in the [single_server_env.sh](https://github.com/BQSKit/bqskit-qfactor-jax/blob/main/examples/bqskit_env_scripts/single_server_env.sh) script template and how to use it. The script creates a GPU-enabled BQSKit runtime and is easily configured for any system.
After you configure the template (replacing every <> with an appropriate value), run it, and then in a separate shell execute your Python script that uses this runtime environment.
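For example (the Python file name below is a placeholder for your own script):

```bash
# Shell 1: start the GPU-enabled BQSKit runtime
bash single_server_env.sh

# Shell 2: run your compilation flow against that runtime
python your_compilation_script.py
```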
The environment script has the following parts:
1. Variable configuration - choosing the number of GPUs to use and the number of workers per GPU. The scratch directory path is also configured here, to be used later for logging.
```bash
#!/bin/bash
hostname=$(uname -n)
# ... (GPU count, workers per GPU, and scratch-dir variables are set here) ...

# Helper used later: block until the BQSKit server connects to the manager.
# The loop condition below is an assumed sketch; the original check is elided.
wait_for_server_to_connect(){
    while ! grep -q "connected" "$manager_log_file"; do
        sleep 1
    done
}
```
3. Creating the log directory and deleting any old log files that conflict with the current run's logs.
```bash
mkdir -p $scratch_dir/bqskit_logs
# ... ($unique_id and the log-file variables are set in an elided part of the script) ...
echo "Will start bqskit runtime with id $unique_id gpus = $amount_of_gpus and workers per gpu = $amount_of_workers_per_gpu"
rm -f $manager_log_file
rm -f $server_log_file
```
4. Starting NVIDIA MPS to allow efficient execution of multiple workers on a single GPU.
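A typical way to start the MPS daemon looks like the following sketch (the directory paths are placeholders, not the template's exact code):

```bash
# Start NVIDIA MPS so several workers can share each GPU efficiently.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
nvidia-cuda-mps-control -d  # -d launches the control daemon in the background
```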
6. Starting the BQSKit server, indicating that there is a single manager on the current server, and waiting until the server connects to the manager before starting the workers.
```bash
echo "starting BQSKit server"
bqskit-server $hostname -vvv &>> $server_log_file &
server_pid=$!

wait_for_server_to_connect
```
7. Starting the workers, each seeing only a specific GPU.
```bash
echo "Starting $total_amount_of_workers workers on $amount_of_gpus gpus"
# ... (the loop that starts the workers is shown as a sketch below) ...
```
### Multiple Nodes Multiple GPUs Setup

This section of the guide explains the main concepts in the [init_multi_node_multi_gpu_slurm_run.sh](https://github.com/BQSKit/bqskit-qfactor-jax/blob/main/examples/bqskit_env_scripts/init_multi_node_multi_gpu_slurm_run.sh) and [run_workers_and_managers.sh](https://github.com/BQSKit/bqskit-qfactor-jax/blob/main/examples/bqskit_env_scripts/run_workers_and_managers.sh) scripts and how to use them. After configuring the scripts (updating every <>), place both of them in the same directory and initiate an SBATCH command. These scripts assume a SLURM environment but can be easily ported to other distribution systems.
```bash
sbatch init_multi_node_multi_gpu_slurm_run.sh
```
The rest of this section explains both of the scripts in detail.
#### init_multi_node_multi_gpu_slurm_run
This is a SLURM batch script for running a multi-node BQSKit task across multiple GPUs. It manages job submission, environment setup, launching the BQSKit server and workers on different nodes, and the execution of the main application.
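A typical header for such a batch script might look like this (the directives and values are placeholders, not the template's exact contents):

```bash
#!/bin/bash
#SBATCH --job-name=bqskit_multi_node_run
#SBATCH --nodes=<number_of_nodes>
#SBATCH --gpus-per-node=<gpus_per_node>
#SBATCH --time=<walltime>
#SBATCH --output=<temp_dir>/slurm_%j.log
```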
1. Variable configuration - among other values, the scratch directory used for logs and temporary files is set here:

```bash
scratch_dir=<temp_dir>
```
229
229
230
-
2. Shell environment setup - Please consulte with your HPC system admin to choose the apropriate modules to load that will enable you to JAX on NVDIA's GPUs. You may use NERSC's Perlmutter [documentation](https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#jax) as a reference.
230
+
2. Shell environment setup - Please consult with your HPC system admin to choose the appropriate modules to load that will enable you to run JAX on NVIDIA's GPUs. You may use NERSC's Perlmutter [documentation](https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#jax) as a reference.
```bash
### load any modules needed and activate the conda environment
module load <module1>
module load <module2>
conda activate <conda-env-name>

# ... later in the script: wait until all of the managers have started ...
while [ "$(cat "$managers_started_file" | wc -l)" -lt "$n" ]; do
    sleep 1  # assumed polling delay; the loop body is elided in the source
done
```
5. Starting the BQSKit server on the main node, using SLURM's `SLURM_JOB_NODELIST` environment variable to tell the BQSKit server the hostnames of the managers.
```bash
echo "starting BQSKit server on the main node"
bqskit-server $(scontrol show hostnames "$SLURM_JOB_NODELIST" | tr '\n' ' ') &> $scratch_dir/bqskit_logs/server_${SLURM_JOB_ID}.log &
server_pid=$!
```

#### run_workers_and_managers

Among its helper functions, this script defines one that stops the NVIDIA MPS servers at the end of the run:

```bash
stop_mps_servers() {
    # ... stop the MPS server on each of the node's GPUs ...
}
```
Finally, the script checks whether GPUs are needed. If they are not, it spawns the manager with its default behavior; otherwise, using the "-x" argument, it instructs the manager to wait for workers to connect.
```bash
if [ $amount_of_gpus -eq 0 ]; then
    echo "Will run manager on node $node_id with n args of $amount_of_workers_per_gpu"
    # ...
fi
```