Week 1 Report for Refining ROCm Packages in Gentoo

This is my first week of GSoC at Gentoo, and I found it unexpectedly exciting. The first week centered on making dev-util/hip rely on vanilla clang. In https://github.com/littlewu2508/gentoo/tree/blender-rocm, I bumped rocm-device-libs, rocm-comgr, and hip to 5.1.3 and switched them to vanilla llvm/clang as the backend; after that I bumped blender to 3.2.0 and enabled its HIP cycles, and it worked on a Radeon 6700XT (see [1])! That means I made a good start on replacing llvm-roc with the system llvm, which was originally the last item in my GSoC proposal. So I changed the plan a bit and moved the last week’s plan forward.

The story began when I heard that blender 3.2.0 had finally been released with HIP cycles support on Linux, so I decided to try it out. I also searched Bugzilla and noticed a proposal to use llvm.eclass and a rocm USE flag [1].

After a quick bump of media-gfx/blender and its required dependencies, I enabled HIP cycles in the ebuild and started emerging. The build was surprisingly smooth, since the build commands simply call hipcc without many arguments, and hipcc was already in good shape. However, blender aborted when I tried to use HIP cycles at runtime, and the error suggested that more than one copy of the llvm libraries was linked in. I realized that some dependencies, like mesa, link against vanilla llvm, while blender itself has to link against llvm-roc because it has components compiled with hipcc. I reported my trial in [1], and Sebastian Parborg confirmed the reason for the failure, so I opened another bug about llvm-roc at [2]. There I described the situation and gave two possible solutions: use vanilla clang as hip’s backend, or make llvm-roc another slot of llvm/clang. That was actually the plan for the last week of my GSoC proposal, but when I wrote it I didn’t realize how important it is to make llvm-roc compatible with the system llvm, since I had never encountered a package that uses both llvm and HIP. In the bug report I stated that the second solution should be easier and that I therefore preferred it, but in my heart I think the first one is more elegant, so I decided to try it first and fall back to the second solution if I failed. As a result, I started my journey of removing llvm-roc from the ROCm dependency tree.
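To illustrate the kind of conflict described above, here is a minimal Python sketch that looks for more than one LLVM shared library among the files mapped into a process. It is not the method I used at the time; it only assumes the usual `/proc/<pid>/maps` line layout on Linux, and the sample text, addresses, and library paths below are made up for illustration.

```python
import os

# Hypothetical excerpt in /proc/<pid>/maps format; all paths are illustrative only.
SAMPLE_MAPS = """\
7f0000000000-7f0000100000 r-xp 00000000 08:01 1001 /usr/lib/llvm/14/lib64/libLLVM-14.so
7f0000100000-7f0000200000 r--p 00100000 08:01 1001 /usr/lib/llvm/14/lib64/libLLVM-14.so
7f0000300000-7f0000400000 r-xp 00000000 08:01 1002 /usr/lib/llvm/roc/lib/libLLVM-14git.so
7f0000500000-7f0000600000 r-xp 00000000 08:01 1003 /lib64/libc.so.6
7f0000700000-7f0000800000 rw-p 00000000 00:00 0
"""


def llvm_libs_in_maps(maps_text):
    """Return the sorted, de-duplicated paths of mapped files named libLLVM*."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional pathname.
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no backing file
        path = parts[5].strip()
        if os.path.basename(path).startswith("libLLVM"):
            found.add(path)
    return sorted(found)


def has_llvm_conflict(maps_text):
    """True when more than one distinct LLVM library file is mapped."""
    return len(llvm_libs_in_maps(maps_text)) > 1


if __name__ == "__main__":
    for lib in llvm_libs_in_maps(SAMPLE_MAPS):
        print(lib)
    print("conflict:", has_llvm_conflict(SAMPLE_MAPS))
```

On a real system the text would come from reading `/proc/<pid>/maps` of the running process, which also covers libraries loaded after startup.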

The first step was to modify rocm-device-libs. With the help of Michał Górny (who pointed out in [2] that packages should not assume llvm is built with “BUILD_SHARED_LIBS=ON” and link individual llvm components, knowledge++), I patched the source so that it relies only on llvm:14 (Fedora developers have also discussed this, and they would like to upstream their patches). Next came rocm-comgr, where I encountered serious problems. With help from Yuyi Wang, I figured out a patch [3] (although I do not understand why Debian and Fedora don’t need it), and I plan to upstream it to the ROCm team in the future. After that, only four test failures remained. Debugging them took a long time, and neither the Debian and Fedora teams nor I have found a solution yet, so I opened a GitHub issue upstream at https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45. While writing the ebuild I used llvm.eclass to determine the llvm prefix and `clang -print-resource-dir` to locate the CLANG_RESOURCE_DIR, which is `/usr/lib/clang/<version>` rather than the default relative path in llvm-project. That makes knowledge+=2.
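The resource directory lookup mentioned above can be sketched outside an ebuild as well. The following Python snippet is only an illustration, not what the ebuild does: it runs `clang -print-resource-dir` and returns the printed path. The function name and the `run` parameter are my own (hypothetical) choices; `run` exists only so the call can be replaced in tests.

```python
import subprocess


def clang_resource_dir(clang="clang", run=subprocess.run):
    """Return the path printed by `<clang> -print-resource-dir`.

    `run` defaults to subprocess.run; it is a parameter only so that the
    external call can be swapped out when clang is not installed.
    """
    result = run(
        [clang, "-print-resource-dir"],
        capture_output=True,
        text=True,
        check=True,  # raise if clang exits with a non-zero status
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        print(clang_resource_dir())
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not query clang:", err)
```

The returned string could then be handed to whatever part of the build expects CLANG_RESOURCE_DIR, instead of relying on a path relative to the llvm-project tree.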

Then it was all about HIP. I ran into many issues with finding the correct include locations and fixed them one by one. In the end I arrived at a new hipvars.pm and a patch to hipcc.pl, disabling the poisoning `-isystem` and correcting many paths. Now calling hipcc directly works, and blender rendered successfully using HIP cycles! I was amazed at this result.

I then continued testing by compiling rocBLAS-5.1.3 with the new ROCm toolchain. Sadly, there are paths in the cmake files that need to be corrected. I have made some fixes, but more are needed before rocBLAS can be configured. Bumping the high-level libraries with this new toolchain will be the major task of the coming week. Another job is to finalize the low-level runtimes and toolchain and push them into ::gentoo via PRs, starting from https://github.com/gentoo/gentoo/pull/25785. I’ll also fix existing bugs when I bump the versions of the packages in sci-libs. For https://bugs.gentoo.org/852236 I already have a solution. The bugs about not respecting CFLAGS/LDFLAGS I still need to investigate, and I think they share a cause with https://bugs.gentoo.org/851792. I’ll check them one by one.

So the plan has changed as follows:

I am currently about halfway through week 11’s task, so the plan for week 11 is merged into week 1, meaning that the tasks of weeks 1-10 are postponed by one week.

Also, since I’m using ROCm-5.1.3 as the testbed for the new toolchain, I would like to make use of rocm.eclass, if possible. That means the original weeks 5-8 would move to after week 2 (between CuPy and TensorFlow).

In conclusion, in the first week I was persuaded by [1] that [2] is an important blocker, so the task in week 11 is no longer optional but essential, and it has been promoted accordingly. The good news is that I’m making good progress on this issue, and I believe I’m the first Gentoo user to package and use blender-3.2 with HIP cycles. The bad news is that I’m not finished with hacking the cmake modules for HIP.

[1] https://bugs.gentoo.org/693200
[2] https://bugs.gentoo.org/851702
[3] https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45#issuecomment-1155975910

Also, here is a summary of bug reports in week 1:
1. 822828, 693200, 851702, 842405, 842405

The summary of closed pull requests during week 1:
1. https://github.com/gentoo/gentoo/pull/25861

The summary of currently opened pull requests:

1. rocprofiler QA fixes: https://github.com/gentoo/gentoo/pull/25891 Status: open for review
2. dev-libs/ocl-icd prefix adoption: https://github.com/gentoo/gentoo/pull/25785 Status: fixing
3. sys-devel/clang ROCm patch: https://github.com/gentoo/gentoo/pull/25999 Status: open for review
4. dev-util/premake prefix adoption (this is related to https://github.com/GPUOpen-LibrariesAndSDKs/HIPRTSDK): https://github.com/gentoo/gentoo/pull/25825 Status: open for review
