This week I have leant a lot from Ulrich’s comments on rocm.eclass. I polished the eclass to v3 and send to gentoo-dev mailing list. However, I observed another error introduced in v3, and I’ll include a fix for it in the v4 in the following days.
Another half of my time is spent on testing sci-libs/roc-* packages on various platforms, utilizing rocm.eclass. I can say that rocm.eclass did its job as expected, so I believe after v4 it can be merged.
With src_test enabled, I have found various test failures. rocBLAS-5.1.3 fails 3 tests on Radeon RX 6700XT, slightly exceeding tolerance, which seems not a big issue; rocFFT-5.1.3 fails 16 suites on Radeon VII [1], which is serious and confirmed by upstream, so I suggest masking <code>amdgpu_targets_gfx906</code> USE flag for rocFFT-5.1.3; just today I observe MIOpen is failing many tests, probably due to vanilla clang. I’ll open issues and report those test failures to upstream. Running tests suite takes a lot of time, and often drain the GPU. It may takes more than 15 hours testing rocBLAS, even on performant CPU like Ryzen 5950X. If I use the GPU to render graphics (run a desktop environment) and do test simultaneously, it often result in amdgpu driver failure. I hope one day we can have a testing farm for ROCm packages, but that would be expensive because there are a lot of GPU architectures, and the compilation takes a lot of time.
I planned to finish the draft of wiki pages [2,3], but turns out I’m running out of time. I’ll catch up in week 11. My mentor is also busy in week 10, so my PR about rocm-opencl-runtime is still pending for review. Now we are working on solving the dependency issue of ROCm packages — gcc-12 and gcc-11.3.0 incompatibilities. Due to two bugs, the current stable gcc, gcc-11.3.0 cannot compile some ROCm packages [4], and the current unstable gcc, gcc-12, is unable to compile nearly all ROCm packages [5].
I’ll continue to do what’s postponed in week 10 — landing rocm.eclass and sci-libs packages, preparing cupy, fixing bugs, and writing the wiki pages. I’ll investigate MIOpen’s situation as well.
[1] https://github.com/ROCmSoftwarePlatform/rocFFT/issues/369
[2] https://wiki.gentoo.org/wiki/ROCm
[3] https://wiki.gentoo.org/wiki/HIP
[4] https://bugs.gentoo.org/842405
[5] https://bugs.gentoo.org/857660