[GPU] Configuring RKE2 and GPU Testing on RockyLinux 9.4 - IV - Nvidia Container To…
Hello.
This is 꿈꾸는여행자.
Continuing from the previous posts in this series.
This installment covers the prerequisites check and installation of the NVIDIA Container Toolkit.
Details are as follows.
Thank you.
> Below
________________
Table of Contents
2. Installing the NVIDIA Container Toolkit - [Options]
2.1. Installation
2.1.1. Prerequisites
2.1.2. Installing with Yum or Dnf
2.1.2.1. Configure the production repository:
2.1.2.2. Install the NVIDIA Container Toolkit packages:
2.1.2.3. Verify GPU status:
2.2. Configuration
2.2.1. Prerequisites
2.2.2. Configuring containerd (for Kubernetes)
2.2.2.1. Configure the container runtime by using the nvidia-ctk command:
2.2.2.2. Restart containerd:
2.2.2.3. Verify that the NVIDIA runtime is configured
________________
2. Installing the NVIDIA Container Toolkit - [Options]
Installing the NVIDIA Container Toolkit
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf
2.1. Installation
2.1.1. Prerequisites
Install the NVIDIA GPU driver for your Linux distribution. NVIDIA recommends installing the driver by using the package manager for your distribution.
For information about installing the driver with a package manager, refer to the NVIDIA Driver Installation Quickstart Guide.
Alternatively, you can install the driver by downloading a .run installer. Refer to the NVIDIA Official Drivers page.
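For reference, a minimal sketch of a package-manager driver installation on RHEL/Rocky 9 follows; the repository URL and module stream are assumptions based on NVIDIA's CUDA repository layout, so verify them against the quickstart guide for your environment:
# Add the NVIDIA CUDA repository for RHEL/Rocky 9 (assumed repository URL; requires dnf-plugins-core)
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
# Install the driver from the DKMS module stream (stream name may differ)
sudo dnf module install nvidia-driver:latest-dkms
# Reboot so the new kernel modules are loaded
sudo reboot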
2.1.2. Installing with Yum or Dnf
2.1.2.1. Configure the production repository:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Optionally, configure the repository to use experimental packages:
sudo yum-config-manager --enable nvidia-container-toolkit-experimental
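As a quick sanity check (not part of the official steps), confirm the repository file was written and is visible to dnf:
# The repo file should exist and the repository should appear in the repo list
cat /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf repolist | grep nvidia-container-toolkit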
2.1.2.2. Install the NVIDIA Container Toolkit packages:
sudo dnf remove nvidia-container-toolkit
sudo dnf install nvidia-container-toolkit
[root@host ~]# sudo dnf install nvidia-container-toolkit
Last metadata expiration check: 0:19:49 ago on Thu 17 Oct 2024 01:50:08 PM KST.
Package nvidia-driver-3:560.35.03-1.el9.x86_64 is already installed.
Dependencies resolved.
===================================================================================================================================
Package Architecture Version Repository Size
===================================================================================================================================
Installing:
nvidia-container-toolkit x86_64 1.16.2-1 cuda-rhel8-x86_64 1.2 M
Installing dependencies:
libnvidia-container-tools x86_64 1.16.2-1 cuda-rhel8-x86_64 39 k
libnvidia-container1 x86_64 1.16.2-1 cuda-rhel8-x86_64 1.0 M
nvidia-container-toolkit-base x86_64 1.16.2-1 cuda-rhel8-x86_64 5.6 M
Transaction Summary
===================================================================================================================================
Install 4 Packages
Total download size: 7.8 M
Installed size: 26 M
Downloading Packages:
(1/4): libnvidia-container-tools-1.16.2-1.x86_64.rpm 134 kB/s | 39 kB 00:00
(2/4): libnvidia-container1-1.16.2-1.x86_64.rpm 1.8 MB/s | 1.0 MB 00:00
(3/4): nvidia-container-toolkit-1.16.2-1.x86_64.rpm 2.2 MB/s | 1.2 MB 00:00
(4/4): nvidia-container-toolkit-base-1.16.2-1.x86_64.rpm 13 MB/s | 5.6 MB 00:00
-----------------------------------------------------------------------------------------------------------------------------------
Total 11 MB/s | 7.8 MB 00:00
Running transaction check
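After the transaction completes, the installed packages and the toolkit CLI can be checked as follows (versions will differ by environment):
# Confirm the installed packages and the nvidia-ctk CLI version
rpm -q nvidia-container-toolkit libnvidia-container-tools
nvidia-ctk --version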
2.1.2.3. Verify GPU status:
nvidia-smi
[root@host ~]# nvidia-smi
Thu Oct 17 14:49:37 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro T1000 Off | 00000000:01:00.0 Off | N/A |
| N/A 49C P8 1W / 50W | 59MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2150 G /usr/libexec/Xorg 50MiB |
| 0 N/A N/A 3638 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------------------+
[root@host ~]#
________________
2.2. Configuration
2.2.1. Prerequisites
You installed a supported container engine (Docker, Containerd, CRI-O, Podman).
You installed the NVIDIA Container Toolkit.
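Before continuing, both prerequisites can be confirmed on the host with a few simple checks (a sketch; the containerd unit name assumes a standalone containerd service as used below):
# Driver and toolkit are present
nvidia-smi --query-gpu=name,driver_version --format=csv
nvidia-ctk --version
# Container runtime is running
systemctl is-active containerd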
2.2.2. Configuring containerd (for Kubernetes)
2.2.2.1. Configure the container runtime by using the nvidia-ctk command:
sudo nvidia-ctk runtime configure --runtime=containerd
[root@host ~]# sudo nvidia-ctk runtime configure --runtime=containerd
INFO[0000] Config file does not exist; using empty config
WARN[0000] could not infer options from runtimes [runc]; using defaults
INFO[0000] Wrote updated config to /etc/containerd/config.toml
INFO[0000] It is recommended that containerd daemon be restarted.
[root@host ~]#
The nvidia-ctk command modifies the /etc/containerd/config.toml file on the host. The file is updated so that containerd can use the NVIDIA Container Runtime.
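Optionally, the same command can also mark the NVIDIA runtime as the default runtime in config.toml; the --set-as-default flag is documented for nvidia-ctk runtime configure, but verify it against your toolkit version:
# Register the nvidia runtime and make it the default runtime for containerd
sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default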
2.2.2.2. Restart containerd:
* Takes about 3 minutes
sudo systemctl restart containerd
sudo systemctl restart rke2-server
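To confirm the restart succeeded (simple checks; the unit names match the commands above):
# Both services should report "active"
systemctl is-active containerd
systemctl is-active rke2-server
# Recent containerd logs, in case the restart failed
sudo journalctl -u containerd --since "5 minutes ago" --no-pager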
2.2.2.3. Verify that the NVIDIA runtime is configured
To verify that the NVIDIA runtime has been configured correctly for containerd, check the configuration with the following command.
* The NVIDIA runtime configuration should appear as follows:
sudo cat /etc/containerd/config.toml | grep -A 2 nvidia
[root@host ~]# sudo cat /etc/containerd/config.toml | grep -A 2 nvidia
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
--
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
[root@host ~]#
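As a final smoke test, the new runtime can be exercised directly with ctr by running nvidia-smi inside a CUDA container. This is a hedged example: the image tag is an assumption (use any CUDA base image available to you), and the runtime binary path matches the BinaryName shown above.
# Pull a CUDA base image (example tag; adjust to an available one)
sudo ctr image pull docker.io/nvidia/cuda:12.6.0-base-ubuntu22.04
# Run nvidia-smi in a container through the NVIDIA container runtime
sudo ctr run --rm -t \
  --runc-binary=/usr/bin/nvidia-container-runtime \
  --env NVIDIA_VISIBLE_DEVICES=all \
  docker.io/nvidia/cuda:12.6.0-base-ubuntu22.04 \
  gpu-smoke-test nvidia-smi
If the host's nvidia-smi output is reproduced inside the container, the toolkit and runtime configuration are working.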
________________