Week 4 Report for Refining ROCm Packages in Gentoo

The fourth week of packaging ROCm went quite smoothly. There were some bug fixes, and also major improvements to rocm.eclass.

Bug fixes cover rocBLAS and rocFFT. For rocBLAS, I backported a patch to sci-libs/rocBLAS-5.0.2-r1 and dev-util/Tensile-r1 that passes `-j N` from ${MAKEOPTS} to TensileCreateLibrary when building rocBLAS, which fixed [1]. As for rocFFT, I corrected its BDEPEND [2], added the missing sys-libs/omp dependency for omp.h [3], and made it depend on dev-util/rocm-cmake-5.0.2-r1, which does not install files to unexpected paths [4]. However, now that gcc-12.1.0 has landed, bugs have emerged about clang expanding the __noinline__ macro in g++-v12/bits/shared_ptr_base.h [5,6]. Details are in [5], and I’m working on a fix (see PR [7]).

For rocm.eclass, I finished drafts of three major pieces: USE_EXPAND, src_configure and src_test. I also wrote the get_amdgpu_flags function, which src_configure uses.

As for the USE_EXPAND: I haven’t written profiles/desc/amdgpu_targets.desc yet, so the flag descriptions are missing.
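To make the USE_EXPAND idea more concrete, here is a minimal shell sketch, not the code in rocm.eclass. It assumes each target becomes a flag named `amdgpu_targets_<target>`, that default-enabled targets get the usual `+` prefix in IUSE, and that the build system wants the enabled targets as a semicolon-separated list. The function names, the two target lists and the list format are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative values only; the real lists live in rocm.eclass.
ALL_TARGETS="gfx900 gfx906 gfx908 gfx1030"   # every selectable target (assumed)
DEFAULT_TARGETS="gfx906 gfx908"              # targets enabled by default (assumed)

# Build an IUSE-style string: one amdgpu_targets_* flag per target,
# with a leading '+' on the targets that are enabled by default.
sketch_iuse() {
	local target dflt prefix iuse=""
	for target in ${ALL_TARGETS}; do
		prefix=""
		for dflt in ${DEFAULT_TARGETS}; do
			if [ "${target}" = "${dflt}" ]; then
				prefix="+"
			fi
		done
		iuse="${iuse:+${iuse} }${prefix}amdgpu_targets_${target}"
	done
	printf '%s\n' "${iuse}"
}

# Hypothetical stand-in for get_amdgpu_flags: join the enabled targets
# (passed as arguments) with ';', the separator CMake uses for lists.
sketch_amdgpu_flags() {
	local target flags=""
	for target in "$@"; do
		flags="${flags:+${flags};}${target}"
	done
	printf '%s\n' "${flags}"
}

sketch_iuse
sketch_amdgpu_flags gfx906 gfx1030
```

In a real ebuild the enabled targets would come from Portage rather than from function arguments, and the resulting string would be passed on to the package's configure step in src_configure.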

My latest work on rocm.eclass is located at https://github.com/littlewu2508/gentoo/blob/rocm-5.1.3/eclass/rocm.eclass. Below are its status and some questions I’d like to share:

1. Default architectures. Now that I have implemented the AMDGPU_TARGETS USE_EXPAND, I need to specify a default value for each flag. The straightforward way is to enable all targets by default, but that can be **extremely** slow and disk-hungry when compiling ROCm libraries such as rocBLAS or rocFFT (expect several hours of compilation if the CPU is not powerful enough). Currently I define a variable OFFICIAL_AMDGPU_TARGETS, taken from the ROCm installation documents [8]. Although the range of hardware that actually works is much larger, and different components have their own support matrices, AMD promises to fully support these enterprise cards. Enterprise users can simply emerge ROCm packages without setting any specific USE flag and get an out-of-the-box experience on Gentoo. Users with consumer cards can read the wiki page (covered later in my GSoC project) for instructions on setting the correct USE flags.

2. Whether to set -DSKIP_RPATH=true in mycmakeargs. Previously this was set to avoid embedding an rpath when USE=benchmark is enabled for ROCm packages like sci-libs/roc-* and sci-libs/hip-*. The test and benchmark executables are called “clients” (taking rocBLAS as an example, clients are programs that call its functions and link against librocblas.so). So that tests and benchmarks can run before the libraries are installed to the system, an rpath is set on these executables. But Gentoo does not have a src_benchmark phase, so the benchmark binaries are simply installed and the user can run them afterwards (I actually use them in my research to tune algorithms). Installed benchmark binaries should therefore not carry an rpath, which is achieved by setting -DSKIP_RPATH=true. However, with this setting the test programs cannot execute, because their rpath is eliminated as well, so I have to specify LD_LIBRARY_PATH manually in src_test. A second solution is to keep the rpath but run chrpath on the affected binaries, which means maintainers have to write a dedicated src_install and remember to add a chrpath command for every new executable when bumping versions. The third solution is to patch CMakeLists.txt to set the rpath only on test programs, but this also introduces more maintenance work. What’s your opinion?

3. Detecting AMDGPU in src_test. This blocks https://bugs.gentoo.org/817440, and I also raised questions in the bug report. Tinderbox cannot run tests on ROCm packages like rocBLAS, because no AMDGPU is available there. I implemented a detection mechanism, with one problem left: if no GPU is available, should the test fail or exit normally? Personally, I think the best solution is to detect the AMDGPU during the pretend or setup phase, and turn off the test USE flag if no GPU is available or if the compiled architectures do not match the detected GPU. But is it possible to change USE flags inside ebuild phase functions?
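Questions 2 and 3 both end up in src_test, so here is a rough shell sketch of how the two pieces could fit together. It is not the eclass code. It assumes the GPU is reached through the amdkfd device node /dev/kfd, that a readable and writable node is a good enough sign of a usable GPU, and that a missing GPU leads to a skip rather than a failure (which is exactly the open question above). All function names and the library directory are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only if the node is a character device we can read and write.
# The default path /dev/kfd is the amdkfd device node.
sketch_have_amdgpu() {
	local node="${1:-/dev/kfd}"
	[ -c "${node}" ] && [ -r "${node}" ] && [ -w "${node}" ]
}

# Prepend a build-tree library directory to the existing search path,
# replacing the rpath that -DSKIP_RPATH=true removed.
sketch_ld_path() {
	printf '%s\n' "${1}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Run a test command with the adjusted library path, or report a skip
# when no usable device node is found. Returning non-zero here instead
# would turn the skip into a test failure.
sketch_run_test() {
	local node="$1" libdir="$2"
	shift 2
	if ! sketch_have_amdgpu "${node}"; then
		echo "no usable GPU at ${node}, skipping tests"
		return 0
	fi
	LD_LIBRARY_PATH="$(sketch_ld_path "${libdir}")" "$@"
}

# Hypothetical usage: "${PWD}/lib" stands in for the build-tree library dir.
sketch_run_test /dev/kfd "${PWD}/lib" echo "tests would run here"
```

The same check could in principle run in an earlier phase, but whether the result can influence the test USE flag is the question I raised above.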

Despite these issues, I managed to get a working version of rocm.eclass and used it on rocBLAS. The USE_EXPAND works successfully, and src_test properly detects hardware and executes both in sandboxed vanilla Gentoo and in non-sandboxed Gentoo Prefix. There are still things to work on in rocm.eclass:

1. ROCM_USEDEP, similar to PYTHON_USEDEP. For example, if hipBLAS is built for architectures gfx906 and gfx1030, then its dependency, rocBLAS, must also be built for gfx906 and gfx1030.
2. SRC_URI.
3. A way to automatically add PORTAGE_USERNAME to the render group, so that it can access the AMDGPU and perform src_test. I don’t have any clue about this yet; maybe a meta package in acct-group can do this?
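Since ROCM_USEDEP does not exist yet, the following is only a guess at what it might expand to, modelled on the `flag(-)?` form that PYTHON_USEDEP uses. The function name and the flag naming scheme are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a ROCM_USEDEP-like string: one
# "amdgpu_targets_<target>(-)?" entry per target, comma-separated.
# "flag(-)?" means: if the flag is enabled here, the dependency must
# enable it too, and a missing flag counts as disabled.
sketch_rocm_usedep() {
	local target usedep=""
	for target in "$@"; do
		usedep="${usedep:+${usedep},}amdgpu_targets_${target}(-)?"
	done
	printf '%s\n' "${usedep}"
}

# Example from item 1: hipBLAS depending on rocBLAS for gfx906 and gfx1030.
printf 'sci-libs/rocBLAS[%s]\n' "$(sketch_rocm_usedep gfx906 gfx1030)"
```

With such a string, a package like hipBLAS could force rocBLAS to be built for at least the same architectures.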

In the coming week I’ll finish rocm.eclass as planned and send it out for early review. Meanwhile I’ll continue fixing bugs [5,6,9,10], answering questions about enabling ROCm in packages [11,12], and preparing to land ROCm-5.1.3. One of my friends is also plugging a Radeon VII into their arm64 server, and if everything goes well I can try ROCm on arm64 (according to the kernel documentation, the GPGPU driver, amdkfd, supports amd64, arm64 and ppc64) and add the ~arm64 keyword in the future.

[1] https://bugs.gentoo.org/852236
[2] https://bugs.gentoo.org/836248
[3] https://bugs.gentoo.org/850937
[4] https://bugs.gentoo.org/836274
[5] https://bugs.gentoo.org/857126
[6] https://bugs.gentoo.org/857660
[7] https://github.com/gentoo/gentoo/pull/26311
[8] https://docs.amd.com/bundle/ROCm-Getting-Started-Guide-v5.1.3/page/Overview_of_ROCm_Installation.html
[9] https://bugs.gentoo.org/842366
[10] https://bugs.gentoo.org/836275
[11] https://github.com/gentoo/gentoo/pull/25836
[12] https://github.com/gentoo/gentoo/pull/25837
