Commit 98b4491

change cuda version
1 parent ed41781 commit 98b4491

2 files changed: 23 additions, 27 deletions

README.md

Lines changed: 15 additions & 12 deletions
@@ -1,23 +1,26 @@
 # ComputeEngineGPU-Terraform
 
 ## Usage
-1. Move to the `src/s3_init` directory.
+### First-time use
 ```bash
-cd src/s3_init
+make init
 ```
-2. Create the S3 bucket and DynamoDB table that manage tfstate remotely.
+### Subsequent use
 ```bash
-terraform init
-terraform plan
-terraform apply
+make
 ```
-3. Move to the `src` directory.
+### To delete only the GCP instance
 ```bash
-cd ..
+make clean
 ```
-4. Create the instance. -> tests at 1 and 3 o'clock succeeded, test at 4 o'clock succeeded
+
+### To delete the AWS S3 resources as well (full reset)
+```bash
+make fclean
+```
+
+### Tips
+To see the available deep learning images:
 ```bash
-terraform init
-terraform plan
-terraform apply
+gcloud compute images list --project deeplearning-platform-release | grep cu123
 ```
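The image names in this commit carry the CUDA version as a `cuNNN` tag (`cu118` for 11.8, `cu123` for 12.3), which is what the `grep cu123` tip relies on. As a minimal sketch, a helper like the one below could turn such a tag back into a dotted version when checking that the boot image and the Docker image tag agree. The function name `cuda_version_from_family` is hypothetical and not part of the repository, and the sketch assumes the last digit of the tag is the minor version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): derive the CUDA version from
# an image family name by reading its cuNNN tag.
# Assumption: the last digit of the tag is the minor version (cu123 -> 12.3).
cuda_version_from_family() {
  local family="$1" tag digits
  # Pull out the first "cu<digits>" token, e.g. "cu123".
  tag="$(printf '%s\n' "$family" | grep -oE 'cu[0-9]+' | head -n 1 || true)"
  digits="${tag#cu}"
  # Need at least two digits to split into major and minor parts.
  if [ "${#digits}" -lt 2 ]; then
    echo "no usable CUDA tag found in: $family" >&2
    return 1
  fi
  # Split "123" into major "12" and minor "3".
  printf '%s.%s\n' "${digits%?}" "${digits#"${digits%?}"}"
}
```

For example, `cuda_version_from_family common-cu123-notebooks-ubuntu-2204` prints `12.3`. The family names could be fed in from `gcloud compute images list --project deeplearning-platform-release --format="value(family)"`.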

src/modules/worker/main.tf

Lines changed: 8 additions & 15 deletions
@@ -11,8 +11,8 @@ resource "google_compute_instance" "gpu_instance" {
 
   boot_disk {
     initialize_params {
-      // cuda image that works with cuda 11.8
-      image = "deeplearning-platform-release/pytorch-latest-cu118"
+      // cuda image that works with cuda 12.3
+      image = "deeplearning-platform-release/common-cu123-notebooks-ubuntu-2204"
       type = "pd-ssd"
       size = 150
     }
@@ -103,7 +103,11 @@ resource "google_compute_instance" "gpu_instance" {
       "sudo systemctl start docker",
       "sudo systemctl enable docker",
       "echo ${var.dockerhub_pwd} | docker login -u ${var.dockerhub_id} --password-stdin", # dockerhub login
-      "docker pull falconlee236/rl-image:graph-mamba" # pull train docker image
+      "docker pull falconlee236/rl-image:parco-cuda123", # pull train docker image
+      "sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key C99B11DEB97541F0",
+      "sudo apt-add-repository https://cli.github.com/packages",
+      "sudo apt update",
+      "sudo apt install gh", # github cli install
     ]
     connection {
       type = "ssh"
@@ -113,15 +117,4 @@ resource "google_compute_instance" "gpu_instance" {
       host = google_compute_instance.gpu_instance.network_interface[0].access_config[0].nat_ip
     }
   }
-}
-
-/*
-Steps
-1. 121 cuda version succeeded (o)
-2. github action destroy succeeded (x)
-3. gcp cloud storage bucket creation + mount succeeded (o)
-4. modifying the docker image container volume succeeded
-5. build the docker image for version 12.1, push + tag
-6. build the docker image for version 11.7, push + tag
-7. github action apply succeeded
-*/
+}
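
The provisioner commands added in this commit run over SSH with nobody at the keyboard, so `apt-add-repository` and `apt install` can stall or abort if they stop at a confirmation prompt. The sketch below shows the same four steps with the standard `-y` flags and `apt-get` in place of `apt`. The `run` and `install_gh` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration; the key ID and repository URL are copied from the diff and not otherwise verified here.

```shell
#!/usr/bin/env bash
# Sketch only: the GitHub CLI install steps from the diff, with flags that
# suppress interactive prompts. DRY_RUN=1 (the default here) only prints the
# commands so the sequence can be reviewed before anything is executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_gh() {
  # Key ID and repository URL are taken verbatim from the commit.
  run sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key C99B11DEB97541F0
  run sudo apt-add-repository -y https://cli.github.com/packages
  run sudo apt-get update
  run sudo apt-get install -y gh
}
```

Calling `install_gh` with the default `DRY_RUN=1` prints the four commands, which could then be copied into the `inline` list; setting `DRY_RUN=0` would execute them on an Ubuntu host.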
