A Simple Introduction to ROCm: Accelerating PyTorch with an AMD GPU

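ROCm is now the second major GPU parallel computing platform after CUDA. As far as PyTorch is concerned, the ROCm build uses the same semantics at the Python API level (it even keeps the torch.cuda namespace), so migrating existing code to the ROCm build of PyTorch requires almost no changes. ROCm may give up some performance compared with CUDA, but the relatively low price of AMD GPUs makes the AMD + ROCm combination a very strong value-for-money choice for AI work.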

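In this post I (暗雨冥) will briefly show how to use ROCm to accelerate PyTorch on an AMD GPU, and fill in a few details that the official tutorials leave out. Let's get started~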

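Hardware and system configuration

My setup is an AMD Radeon RX 7800 XT + AMD Ryzen 5 9600X + 32 GB DDR5. This configuration is for reference only; for the actual hardware requirements, see AMD's official documentation.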

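On the OS side, AMD officially supports three mainstream commercial Linux distributions: Ubuntu, Red Hat Enterprise Linux (RHEL), and SUSE Linux Enterprise Server (SLES). Closely related distributions such as Linux Mint, Rocky Linux, and openSUSE will most likely work as well. AMD does seem to prefer that users run Ubuntu, though (quite a few documents only cover Ubuntu), so to avoid potential problems I went with Zorin OS 17.2, which is based on Ubuntu 22.04 LTS (mostly because it looks nice ヾ(≧▽≦*)o).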

*Note: Windows is not supported for now. To use ROCm on Windows you need to go through WSL2; for that, refer directly to AMD's official documentation.

Installing ROCm

Installing ROCm is actually very simple. Following AMD's official documentation, on Ubuntu you only need to run the following commands:

sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # add the current user to the render and video groups so the AMD GPU can be accessed without root (takes effect at the next login)
wget https://repo.radeon.com/amdgpu-install/6.2.2/ubuntu/jammy/amdgpu-install_6.2.60202-1_all.deb # jammy is the codename of Ubuntu 22.04; for Ubuntu 24.04 and its derivatives, replace jammy with noble
sudo apt install ./amdgpu-install_6.2.60202-1_all.deb
sudo apt update
sudo apt install amdgpu-dkms rocm
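
The usermod step above is easy to get wrong without noticing, because the new groups only apply to sessions started after the change. As a minimal sketch (the helper name `missing_groups` is made up for this post and is not part of ROCm), the system group database can be checked from Python with the standard `grp` and `pwd` modules:

```python
import grp
import os
import pwd


def missing_groups(user, required=("render", "video")):
    """Return the groups from `required` that `user` does not belong to."""
    try:
        primary_gid = pwd.getpwnam(user).pw_gid
    except KeyError:
        primary_gid = None  # unknown user: treat as a member of nothing
    missing = []
    for name in required:
        try:
            entry = grp.getgrnam(name)
        except KeyError:
            missing.append(name)  # the group does not exist on this system
            continue
        # A user belongs to a group via the member list or via the primary GID.
        if user not in entry.gr_mem and entry.gr_gid != primary_gid:
            missing.append(name)
    return missing


# Assumption: LOGNAME or USER identifies the account that will run PyTorch.
user = os.environ.get("LOGNAME") or os.environ.get("USER") or "root"
print(f"{user}: missing groups -> {missing_groups(user) or 'none'}")
```

This only reads the group database, so an empty result means the account is configured correctly, not that the current shell has already picked the groups up; logging out and back in (or rebooting) is still required.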

After that, a little extra configuration is needed.

Configuring ld

sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
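
To see whether the linker configuration took effect, one option is to ask Python's `ctypes.util.find_library` to resolve a couple of ROCm libraries. This is only a sketch: the two library names below are my assumption about what a typical ROCm 6.x install places under /opt/rocm/lib, and `locate_libraries` is a hypothetical helper name.

```python
import ctypes.util


def locate_libraries(names=("amdhip64", "hsa-runtime64")):
    """Map each library name to what the system library lookup finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


for name, found in locate_libraries().items():
    print(f"lib{name}: {found or 'not found'}")
```

If both entries come back as "not found" even though ROCm is installed, re-running sudo ldconfig and checking the contents of /etc/ld.so.conf.d/rocm.conf is a reasonable first step.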

Adding the ROCm executables to PATH

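Plan A: use update-alternatives. Most Linux distributions ship the update-alternatives tool, which helps manage multiple versions of a command or program. For more information on update-alternatives, see the Linux man pages. Configure it as follows:
1. List all supported ROCm commands:
```
update-alternatives --list rocm
```
2. If multiple ROCm versions are installed, update-alternatives selects the newest one. To choose the ROCm version you want, use this command:
```
update-alternatives --config rocm
```
Plan B: use environment-modules. The environment-modules tool simplifies shell initialization by letting you modify the session environment through module files. For more information, see the Environment Modules documentation. Configure it as follows:
1. List the available ROCm versions:
```
module avail
```
2. If multiple ROCm versions are installed, select the one you want with:
```
module load rocm/<version>
```
Plan C: configure it manually. The ROCm module files are located under /opt/rocm-/lib/rocmmod. If neither of the methods above meets your needs, you can add the ROCm executables to PATH by hand, for example by adding the following to .bashrc:
```
export PATH=$PATH:/opt/rocm-6.2.2/bin
```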
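
Whichever plan you pick, the result can be checked the same way: the ROCm tools should resolve on PATH in a new shell. A minimal sketch using the standard library follows; the tool list is my own choice of commonly used ROCm commands, and `locate_tools` is a hypothetical helper name.

```python
import shutil

# Assumed set of tools worth checking; adjust to whatever you actually use.
ROCM_TOOLS = ("rocminfo", "rocm-smi", "clinfo", "hipcc")


def locate_tools(tools=ROCM_TOOLS):
    """Map each tool name to its resolved path on PATH, or None if absent."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in locate_tools().items():
    print(f"{tool:10s} -> {path or 'not on PATH'}")
```

Note that clinfo may also be provided by a distribution package, so a hit outside /opt/rocm does not necessarily mean the ROCm copy is the one being used.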

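Verifying the kernel driver, ROCm, and package installation

dkms status
rocminfo
clinfo
apt list --installed # this step may list a large number of installed packages

See the end of this post for reference output.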
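
The rocminfo output is long, and usually only a few fields per agent matter. As a rough sketch (the parser below is my own and assumes the "Agent N" section layout shown in the reference output at the end of this post), the interesting fields can be pulled out like this:

```python
import re
import shutil
import subprocess

WANTED = ("Name", "Marketing Name", "Device Type")


def parse_agents(rocminfo_text):
    """Return one dict per 'Agent N' section with the first occurrence of each wanted field."""
    agents = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"Agent \d+", stripped):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key = key.strip()
        # Keep the first match only: the ISA section repeats "Name" later on.
        if key in WANTED and key not in current:
            current[key] = value.strip()
    return agents


if shutil.which("rocminfo"):
    output = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    for agent in parse_agents(output):
        if agent.get("Device Type") == "GPU":
            print(agent.get("Name"), "|", agent.get("Marketing Name"))
else:
    print("rocminfo is not on PATH")
```

If no GPU agent is printed even though rocminfo runs, the driver or the group membership is the first thing to look at.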

Reboot to make sure the ROCm configuration takes effect

reboot

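Installing PyTorch

AMD officially recommends using its Docker images, which are easier to manage; see AMD's official documentation for that route. I couldn't be bothered to install Docker, so I went straight for a pip install. You can simply follow the instructions on the PyTorch website and run the following command: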

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

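If you run into network problems, consider fetching the matching whl files with a download tool and then installing them with pip. If nothing goes wrong, PyTorch is now installed~ A quick way to check is to run the following in Python: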

import torch
x = torch.rand(5, 3)
print(x)

The output should look similar to this (the values are random, so yours will differ):

tensor([[0.3380, 0.3845, 0.3217],
        [0.8337, 0.9050, 0.2650],
        [0.2979, 0.7141, 0.9069],
        [0.1449, 0.1132, 0.1375],
        [0.4675, 0.3947, 0.1426]])

Run the following in Python to verify that ROCm is working:

import torch
torch.cuda.is_available()

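If you get True, congratulations, you are all done. If, unfortunately, ROCm is not available, read on. Run the following commands to view the logs, look for suspicious output, and make good use of a search engine: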

export AMD_LOG_LEVEL=7
python3 -c "import torch;print(torch.cuda.is_available())"
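
The log can get very noisy, so it may help to capture it and filter it. The following is a sketch under my own assumptions: `run_with_amd_log` and `suspicious_lines` are hypothetical helper names, and the keyword pattern is only a starting guess at what "suspicious" looks like.

```python
import os
import re
import subprocess
import sys


def run_with_amd_log(code, level="7"):
    """Run `code` in a fresh Python process with AMD_LOG_LEVEL set; return the combined output."""
    env = dict(os.environ, AMD_LOG_LEVEL=str(level))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams so no log line is missed
        text=True,
    )
    return result.stdout


def suspicious_lines(output, pattern=r"error|fail|denied|not found"):
    """Keep only the lines that match an adjustable keyword pattern."""
    return [line for line in output.splitlines() if re.search(pattern, line, re.IGNORECASE)]


if __name__ == "__main__":
    log = run_with_amd_log("import torch; print(torch.cuda.is_available())")
    for line in suspicious_lines(log):
        print(line)
```

Setting the variable only for the child process keeps the verbose logging out of the rest of your shell session.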

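One thing worth mentioning: if tools such as rocm-smi report nothing abnormal, the cause is very likely that your user is not in the render group. Run the following command to add the user to the render and video groups again:

sudo usermod -a -G render,video $LOGNAME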

Reboot the system when you are done:

reboot
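
Once everything is in place, a slightly more thorough check than torch.cuda.is_available() is to confirm that the installed wheel is a ROCm build and that a small computation actually runs on the GPU. This is a minimal sketch; `rocm_report` is a hypothetical helper name, and the 1024x1024 matrix size is arbitrary.

```python
def rocm_report():
    """Collect a few facts about the PyTorch install; does not raise if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Set on ROCm builds; None (or absent) on CUDA and CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # The ROCm build reuses the "cuda" device name for AMD GPUs.
        a = torch.rand(1024, 1024, device="cuda")
        b = torch.rand(1024, 1024, device="cuda")
        c = a @ b
        torch.cuda.synchronize()  # wait for the kernel to finish
        report["matmul_ok"] = bool(torch.isfinite(c).all().item())
    return report


for key, value in rocm_report().items():
    print(f"{key}: {value}")
```

A hip_version of None together with gpu_available being False usually means a CPU-only or CUDA wheel was installed instead of the one from the ROCm index.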

Appendix

Reference output of the commands for verifying the kernel driver, ROCm, and package installation

# dkms status
amdgpu/6.8.5-2041575.22.04, 6.8.0-49-generic, x86_64: installed (original_module exists)
amdgpu/6.8.5-2041575.22.04, 6.8.0-52-generic, x86_64: installed (original_module exists)

# rocminfo
ROCk module version 6.8.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 5 9600X 6-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 5 9600X 6-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      49152(0xc000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   5484
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            12
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Memory Properties:
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    31870192(0x1e64cf0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    31870192(0x1e64cf0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    31870192(0x1e64cf0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1100
  Uuid:                    GPU-3fbe3742bc309e9e
  Marketing Name:          AMD Radeon RX 7800 XT
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      4096(0x1000) KB
    L3:                      65536(0x10000) KB
  Chip ID:                 29822(0x747e)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2169
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            60
  SIMDs per CU:            2
  Shader Engines:          3
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Memory Properties:
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 232
  SDMA engine uCode::      22
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*******
Agent 3
*******
  Name:                    gfx1100
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    2
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      256(0x100) KB
  Chip ID:                 5056(0x13c0)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2200
  BDFID:                   5376
  Internal Node ID:        2
  Compute Unit:            2
  SIMDs per CU:            2
  Shader Engines:          1
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 21
  SDMA engine uCode::      9
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    15935096(0xf32678) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    15935096(0xf32678) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***

# clinfo
Number of platforms:         1
  Platform Profile:         FULL_PROFILE
  Platform Version:         OpenCL 2.1 AMD-APP (3625.0)
  Platform Name:         AMD Accelerated Parallel Processing
  Platform Vendor:         Advanced Micro Devices, Inc.
  Platform Extensions:         cl_khr_icd cl_amd_event_callback


  Platform Name:         AMD Accelerated Parallel Processing
Number of devices:         2
  Device Type:           CL_DEVICE_TYPE_GPU
  Vendor ID:           1002h
  Board name:           AMD Radeon RX 7800 XT
  Device Topology:         PCI[ B#3, D#0, F#0 ]
  Max compute units:         30
  Max work items dimensions:       3
    Max work items[0]:         1024
    Max work items[1]:         1024
    Max work items[2]:         1024
  Max work group size:         256
  Preferred vector width char:       4
  Preferred vector width short:       2
  Preferred vector width int:       1
  Preferred vector width long:       1
  Preferred vector width float:       1
  Preferred vector width double:     1
  Native vector width char:       4
  Native vector width short:       2
  Native vector width int:       1
  Native vector width long:       1
  Native vector width float:       1
  Native vector width double:       1
  Max clock frequency:         2169Mhz
  Address bits:           64
  Max memory allocation:       14588628168
  Image support:         Yes
  Max number of images read arguments:     128
  Max number of images write arguments:     8
  Max image 2D width:         16384
  Max image 2D height:         16384
  Max image 3D width:         16384
  Max image 3D height:         16384
  Max image 3D depth:         8192
  Max samplers within kernel:       16
  Max size of kernel argument:       1024
  Alignment (bits) of base address:     1024
  Minimum alignment (bytes) for any datatype:   128
  Single precision floating point capability
    Denorms:           Yes
    Quiet NaNs:           Yes
    Round to nearest even:       Yes
    Round to zero:         Yes
    Round to +ve and infinity:       Yes
    IEEE754-2008 fused multiply-add:     Yes
  Cache type:           Read/Write
  Cache line size:         64
  Cache size:           32768
  Global memory size:         17163091968
  Constant buffer size:         14588628168
  Max number of constant args:       8
  Local memory type:         Local
  Local memory size:         65536
  Max pipe arguments:         16
  Max pipe active reservations:       16
  Max pipe packet size:         1703726280
  Max global variable size:       14588628168
  Max global variable preferred total size:   17163091968
  Max read/write image args:       64
  Max on device events:         1024
  Queue on device max size:       8388608
  Max on device queues:         1
  Queue on device preferred size:     262144
  SVM capabilities:
    Coarse grain buffer:       Yes
    Fine grain buffer:         Yes
    Fine grain system:         No
    Atomics:           No
  Preferred platform atomic alignment:     0
  Preferred global atomic alignment:     0
  Preferred local atomic alignment:     0
  Kernel Preferred work group size multiple:   32
  Error correction support:       0
  Unified memory for Host and Device:     0
  Profiling timer resolution:       1
  Device endianess:         Little
  Available:           Yes
  Compiler available:         Yes
  Execution capabilities:
    Execute OpenCL kernels:       Yes
    Execute native function:       No
  Queue on Host properties:
    Out-of-Order:         No
    Profiling :           Yes
  Queue on Device properties:
    Out-of-Order:         Yes
    Profiling :           Yes
  Platform ID:           0x7e6eab7f0ff0
  Name:             gfx1101
  Vendor:           Advanced Micro Devices, Inc.
  Device OpenCL C version:       OpenCL C 2.0
  Driver version:         3625.0 (HSA1.1,LC)
  Profile:           FULL_PROFILE
  Version:           OpenCL 2.0
  Extensions:           cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program


  Device Type:           CL_DEVICE_TYPE_GPU
  Vendor ID:           1002h
  Board name:           AMD Radeon Graphics
  Device Topology:         PCI[ B#21, D#0, F#0 ]
  Max compute units:         1
  Max work items dimensions:       3
    Max work items[0]:         1024
    Max work items[1]:         1024
    Max work items[2]:         1024
  Max work group size:         256
  Preferred vector width char:       4
  Preferred vector width short:       2
  Preferred vector width int:       1
  Preferred vector width long:       1
  Preferred vector width float:       1
  Preferred vector width double:     1
  Native vector width char:       4
  Native vector width short:       2
  Native vector width int:       1
  Native vector width long:       1
  Native vector width float:       1
  Native vector width double:       1
  Max clock frequency:         2200Mhz
  Address bits:           64
  Max memory allocation:       13869907552
  Image support:         Yes
  Max number of images read arguments:     128
  Max number of images write arguments:     8
  Max image 2D width:         16384
  Max image 2D height:         16384
  Max image 3D width:         16384
  Max image 3D height:         16384
  Max image 3D depth:         8192
  Max samplers within kernel:       16
  Max size of kernel argument:       1024
  Alignment (bits) of base address:     1024
  Minimum alignment (bytes) for any datatype:   128
  Single precision floating point capability
    Denorms:           Yes
    Quiet NaNs:           Yes
    Round to nearest even:       Yes
    Round to zero:         Yes
    Round to +ve and infinity:       Yes
    IEEE754-2008 fused multiply-add:     Yes
  Cache type:           Read/Write
  Cache line size:         64
  Cache size:           16384
  Global memory size:         16317538304
  Constant buffer size:         13869907552
  Max number of constant args:       8
  Local memory type:         Local
  Local memory size:         65536
  Max pipe arguments:         16
  Max pipe active reservations:       16
  Max pipe packet size:         985005664
  Max global variable size:       13869907552
  Max global variable preferred total size:   16317538304
  Max read/write image args:       64
  Max on device events:         1024
  Queue on device max size:       8388608
  Max on device queues:         1
  Queue on device preferred size:     262144
  SVM capabilities:
    Coarse grain buffer:       Yes
    Fine grain buffer:         Yes
    Fine grain system:         No
    Atomics:           No
  Preferred platform atomic alignment:     0
  Preferred global atomic alignment:     0
  Preferred local atomic alignment:     0
  Kernel Preferred work group size multiple:   32
  Error correction support:       0
  Unified memory for Host and Device:     1
  Profiling timer resolution:       1
  Device endianess:         Little
  Available:           Yes
  Compiler available:         Yes
  Execution capabilities:
    Execute OpenCL kernels:       Yes
    Execute native function:       No
  Queue on Host properties:
    Out-of-Order:         No
    Profiling :           Yes
  Queue on Device properties:
    Out-of-Order:         Yes
    Profiling :           Yes
  Platform ID:           0xxxxxxxxxxxxx
  Name:             gfx1036
  Vendor:           Advanced Micro Devices, Inc.
  Device OpenCL C version:       OpenCL C 2.0
  Driver version:         3625.0 (HSA1.1,LC)
  Profile:           FULL_PROFILE
  Version:           OpenCL 2.0
  Extensions:           cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

# apt list --installed
Listing...
...
amd-smi-lib/jammy,now 24.6.3.60202-116~22.04 amd64 [installed,automatic]
amd64-microcode/jammy-updates,jammy-security,now 3.20191218.1ubuntu2.3 amd64 [installed,automatic]
amdgpu-core/jammy,jammy,now 1:6.2.60202-2041575.22.04 all [installed,automatic]
amdgpu-dkms-firmware/jammy,jammy,now 1:6.8.5.60202-2041575.22.04 all [installed,automatic]
amdgpu-dkms/jammy,jammy,now 1:6.8.5.60202-2041575.22.04 all [installed]
amdgpu-install/jammy,jammy,now 6.2.60202-2041575.22.04 all [installed]
amdgpu-lib/jammy,now 1:6.2.60202-2041575.22.04 amd64 [installed,automatic]
...
rocm-cmake/jammy,now 0.13.0.60202-116~22.04 amd64 [installed]
rocm-core-asan/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-core/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-dbgapi/jammy,now 0.76.0.60202-116~22.04 amd64 [installed]
rocm-debug-agent/jammy,now 2.0.3.60202-116~22.04 amd64 [installed]
rocm-dev/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-developer-tools/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-device-libs/jammy,now 1.0.0.60202-116~22.04 amd64 [installed]
rocm-gdb/jammy,now 14.2.60202-116~22.04 amd64 [installed]
rocm-hip-libraries/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-hip-runtime-dev/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-hip-runtime/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-hip-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-language-runtime/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-llvm/jammy,now 18.0.0.24355.60202-116~22.04 amd64 [installed]
rocm-ml-libraries/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-ml-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-opencl-dev/jammy,now 2.0.0.60202-116~22.04 amd64 [installed]
rocm-opencl-icd-loader/jammy,now 1.2.60202-116~22.04 amd64 [installed]
rocm-opencl-runtime/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-opencl-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-opencl/jammy,now 2.0.0.60202-116~22.04 amd64 [installed]
rocm-openmp-sdk/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm-smi-lib/jammy,now 7.3.0.60202-116~22.04 amd64 [installed]
rocm-utils/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocm/jammy,now 6.2.2.60202-116~22.04 amd64 [installed]
rocminfo/jammy,now 1.0.0.60202-116~22.04 amd64 [installed]
...
hip-dev/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hip-doc/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hip-runtime-amd/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hip-samples/jammy,now 6.2.41134.60202-116~22.04 amd64 [installed,automatic]
hipblas-dev/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipblas/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipblaslt-dev/jammy,now 0.8.0.60202-116~22.04 amd64 [installed,automatic]
hipblaslt/jammy,now 0.8.0.60202-116~22.04 amd64 [installed,automatic]
hipcc/jammy,now 1.1.1.60202-116~22.04 amd64 [installed,automatic]
hipcub-dev/jammy,now 3.2.0.60202-116~22.04 amd64 [installed,automatic]
hipfft-dev/jammy,now 1.0.15.60202-116~22.04 amd64 [installed,automatic]
hipfft/jammy,now 1.0.15.60202-116~22.04 amd64 [installed,automatic]
hipfort-dev/jammy,now 0.4.0.60202-116~22.04 amd64 [installed,automatic]
hipify-clang/jammy,now 18.0.0.60202-116~22.04 amd64 [installed,automatic]
hiprand-dev/jammy,now 2.11.0.60202-116~22.04 amd64 [installed,automatic]
hiprand/jammy,now 2.11.0.60202-116~22.04 amd64 [installed,automatic]
hipsolver-dev/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipsolver/jammy,now 2.2.0.60202-116~22.04 amd64 [installed,automatic]
hipsparse-dev/jammy,now 3.1.1.60202-116~22.04 amd64 [installed,automatic]
hipsparse/jammy,now 3.1.1.60202-116~22.04 amd64 [installed,automatic]
hipsparselt-dev/jammy,now 0.2.1.60202-116~22.04 amd64 [installed,automatic]
hipsparselt/jammy,now 0.2.1.60202-116~22.04 amd64 [installed,automatic]
hiptensor-dev/jammy,now 1.3.0.60202-116~22.04 amd64 [installed,automatic]
hiptensor/jammy,now 1.3.0.60202-116~22.04 amd64 [installed,automatic]
...

暗雨冥的花田
