Week 2 Report for Refining ROCm Packages in Gentoo

The second week of refining ROCm ebuilds was quite busy. I deployed Docker to perform clean builds, which uncovered two hidden bugs in hip, and I also made progress on completing rocm-5.1.3 against vanilla llvm/clang.

After the lesson of bug #853184, I realized that a clean environment for building and testing is essential for finding hidden bugs, especially missing dependencies. I found the cause of #853184 and fixed it in [1]. With the help of clean builds, I found bug #853718 and fixed it in [2]. I also reproduced bug #843263 and provided a fix in [3].

I also fixed an old bug #853184 with [12].

Another bug fix is [4], for a serious issue with incorrect Manifests (bugs #851792 and #851795). Andrew Ammerlaan also pointed out a QA issue: scripts are executed by calling python3 directly instead of using ${EPYTHON}. I will look into that in week 3.
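The EPYTHON point is about not hard-coding the interpreter name. In an ebuild the fix is simply to call "${EPYTHON}" where python3 was used; as a rough analogue, here is a minimal Python sketch, assuming a build helper that launches generator scripts. The function `python_script_cmd` and the script name are hypothetical; only EPYTHON itself (the variable set by Gentoo's python eclasses) comes from the report.

```python
import os


def python_script_cmd(script, *args):
    """Build the argv for running a helper script with the selected interpreter.

    Hypothetical helper: it follows the ebuild convention of invoking
    "${EPYTHON}" rather than a hard-coded "python3".
    """
    interpreter = os.environ.get("EPYTHON")
    if not interpreter:
        # Fail loudly instead of silently guessing "python3", which may not
        # be the Python implementation the package was configured for.
        raise RuntimeError("EPYTHON is not set; refusing to guess the interpreter")
    return [interpreter, script, *args]
```

With EPYTHON=python3.10, `python_script_cmd("tools/gen.py", "-o", "out.h")` gives `["python3.10", "tools/gen.py", "-o", "out.h"]`, which can be handed to `subprocess.run(..., check=True)`. Raising instead of falling back keeps the failure visible, which is the point of the QA check.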

Next is the progress on rocm-5.1.3 against vanilla llvm/clang. The major achievements are:

1. Michał Górny explained Gentoo's policy on packaging llvm/clang to me, so the invasive clang patch in [6] is not suitable. I studied the patch and found it unnecessary, as long as we pass `--rocm-path=/usr` and `--hip-device-lib-path=/usr/lib/amdgcn/bitcode` when calling clang to compile hip sources. So I patched hipcc.pl in dev-util/hip and comgr-compiler.cpp in dev-libs/rocm-comgr to explicitly add `--rocm-path=/usr`. Note that the need for the rocm-comgr patch is not obvious: a test called "compile_hip_test_in_process" only appears, and only fails, when dev-util/hip is already merged (hip depends on rocm-comgr, but rocm-comgr does not depend on hip). I guess that's why Debian and Fedora have not encountered this issue. I suppose they are also packaging hip and will meet similar problems, so it would be really helpful if the ROCm teams of the major distributions could discuss and share information on packaging hip.

2. I packaged dev-util/hip-5.1.3; it is currently in [7]. It works, although I'm not satisfied with the tens of sed commands and ten patches it needs: hip upstream is currently not distribution-friendly. I fixed the cmake issue mentioned in week 1's report, which is also mentioned in [8]. I also hit a bug when trying to turn on USE=profile; the solution is to backport two patches (see details in [9]), which means that this release of hip cannot build itself because some important fixes were not included. Add the hard-coded clang-runtime include paths and the abuse of `-isystem`, and I really find hip the most challenging of the ROCm packages.

3. Blender still works after removing the clang patch mentioned in item 1; details can be found in [10]. I also tried backporting a patch to enable HIP in Cycles (Blender's render engine) on the Radeon VII, but it failed with a GPU memory access error, which indicates that hip needs further tuning [11].

4. The 5.1.3 ebuilds are in good shape [7] and waiting for a PR, including the low-level runtimes {roct-thunk-interface, rocr-runtime, rocminfo} and the toolchain {rocm-device-libs, rocm-comgr, rocm-cmake, hip}. The commits are squashed, but you can see my original history of battling against hip in the unrebased tree [16]. rocBLAS has also been bumped to 5.1.3 and is running tests, but I have decided to rewrite it later to make use of rocm.eclass.

5. The rocm-comgr upstream has noticed my bug report [17].
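To make the flags from item 1 concrete, here is a minimal Python sketch that assembles a clang++ command line for one HIP source file. The helper `hip_compile_cmd` is hypothetical (it is not code from hipcc.pl or rocm-comgr); `-x hip`, `--offload-arch=gfx906` (the Radeon VII target) and the file names are illustrative assumptions, and only the two paths come from the text above. It builds the argument list without running the compiler.

```python
def hip_compile_cmd(source, output, offload_arch,
                    rocm_path="/usr",
                    device_lib_path="/usr/lib/amdgcn/bitcode"):
    """Return a clang++ argv for compiling one HIP source (hypothetical helper)."""
    return [
        "clang++",
        "-x", "hip",                                 # treat the input as HIP source
        f"--offload-arch={offload_arch}",            # GPU target, e.g. gfx906
        f"--rocm-path={rocm_path}",                  # ROCm install prefix clang searches
        f"--hip-device-lib-path={device_lib_path}",  # rocm-device-libs bitcode directory
        "-c", source,
        "-o", output,
    ]
```

For example, `hip_compile_cmd("kernel.hip", "kernel.o", "gfx906")` yields a list containing `--rocm-path=/usr` and `--hip-device-lib-path=/usr/lib/amdgcn/bitcode`, i.e. the paths are supplied on the command line rather than patched into clang itself.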

So now hip-5.1.3 seems to be ready, and my test system shows no bugs. I'll open a PR for my rocm-5.1.3 branch [7] right after [3] gets merged.

Next week I shall land hip-5.1.3 in ::gentoo and prepare a draft of rocm.eclass. There will also be bug fixes, concentrating on rocBLAS not respecting MAKEOPTS (#852236), the rocprofiler QA issue [5], and the rocFFT build issue with hip-5.1.3 [13]. In the longer term, I'll also investigate the embedded headers in libamdhip64.so and libhiprtc-builtins.so, which block CuPy, and how well vanilla libomp supports ROCm OpenMP offloading compared to aomp (llvm-roc), which is related to rocSPARSE [14].
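For context on the MAKEOPTS item: respecting MAKEOPTS means deriving a build's parallelism from the user's setting rather than choosing a job count on its own. In ebuilds, multiprocessing.eclass already provides `makeopts_jobs` for this; the Python function below is only a hypothetical sketch of the same idea, e.g. for a Python-driven build step, and is not taken from rocBLAS or from the eclass.

```python
import shlex


def makeopts_jobs(makeopts, unlimited=999):
    """Return the job count requested by a MAKEOPTS string (hypothetical helper).

    Recognizes -jN, -j N, --jobs=N and --jobs N. A bare -j or --jobs
    (make's "no limit") is mapped to the caller-supplied `unlimited` value.
    Returns 1 when no job option is present.
    """
    jobs = 1
    tokens = shlex.split(makeopts)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        value = None
        if tok in ("-j", "--jobs"):
            # The count may be a separate token, or omitted entirely.
            if i + 1 < len(tokens) and tokens[i + 1].isdigit():
                value = int(tokens[i + 1])
                i += 1
            else:
                value = unlimited
        elif tok.startswith("--jobs="):
            count = tok[len("--jobs="):]
            value = int(count) if count.isdigit() else unlimited
        elif tok.startswith("-j") and tok[2:].isdigit():
            value = int(tok[2:])
        if value is not None:
            jobs = value  # if -j appears more than once, keep the last one
        i += 1
    return jobs
```

For example, `makeopts_jobs("-j8 -l8")` returns 8 and `makeopts_jobs("")` returns 1. Where exactly rocBLAS's build should consume such a value is the subject of #852236 and is not shown here.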

Summary: I fixed the existing bugs in ::gentoo, so the blockers are gone [15]. I finished dev-util/hip-5.1.3 and its 5.1.3 dependencies. Too many hacks have to be applied to hip; it would be helpful to share information with other distribution developers, and to report these issues or open PRs upstream.

[1] https://github.com/gentoo/gentoo/pull/26018
[2] https://gitweb.gentoo.org/repo/gentoo.git/commit/93ff73188c29fe12088f6166df669847cde9b2b4
[3] https://github.com/gentoo/gentoo/pull/26090
[4] https://github.com/gentoo/gentoo/pull/25891
[5] https://github.com/gentoo/gentoo/pull/25891#issuecomment-1163481516
[6] https://github.com/gentoo/gentoo/pull/25999
[7] https://github.com/littlewu2508/gentoo/tree/rocm-5.1.3-submit
[8] https://bugs.gentoo.org/693200#c23
[9] https://github.com/ROCm-Developer-Tools/hipamd/issues/18#issuecomment-1167198811
[10] https://bugs.gentoo.org/693200#c24
[11] https://developer.blender.org/D15242
[12] https://github.com/gentoo/gentoo/pull/26039
[13] https://bugs.gentoo.org/693200#c25
[14] https://github.com/gentoo/gentoo/pull/25318
[15] https://github.com/justxi/rocm/issues/8#issuecomment-1166165426
[16] https://github.com/littlewu2508/gentoo/tree/rocm-5.1.3
[17] https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45

This entry was posted in ROCm Packages.