10 June 2019

10th of June

Farm status
Intel GPUs
Running Asteroids overnight

Nvidia GPUs
Two running Seti

Raspberry Pis
All running Einstein BRP4 work


Other news
The two GTX 1660 Ti equipped Ryzens are running Seti 24/7. One of them locked up and I had to power it off. When it came back up it trashed a bunch of CPU work units that had been in progress. I have seen this behaviour from them before, where they just lock up for no apparent reason. Given all four Ryzen machines have done this, it seems to be something specific to them. They've had BIOS upgrades and two of them have also had GPU upgrades. I've even seen it when they are idle. I haven't been able to pinpoint the cause. Given these four machines are slated for replacement soon, I am not going to waste any more time trying to debug the issue.


AMD Linux driver experiment
I have an old HD 7770 graphics card that has been sitting in its box for a few years now. The HD 7770 was released in February 2012, so it is ancient in GPU terms. I decided to fire up one of the i3s as an AMD cruncher to see just how bad it is getting AMD's drivers to work under Linux. AMD GPUs are good at running OpenCL apps, much better than Nvidia.

I'll point out I am running Debian and AMD only release their Linux drivers for Red Hat or Ubuntu. Ubuntu is based on Debian, so how hard could it be? The hardware part is simple: just fit the card into the PCIe slot and plug a 6 pin power cable in. The machine is happy to display through its DVI port without any issue. I installed a clean copy of Debian and it complained about missing AMD firmware. Debian have a package called firmware-amd-graphics which fixes that. I installed it, rebooted, and I now have a high-res desktop working.

The next part is to get OpenCL going, which is where it all falls apart. First you need to install a few packages from the Debian repo:

sudo apt install build-essential dkms

Now we need to download the latest amdgpu-pro drivers, which I did on a Windows machine and then copied across on a USB thumb drive. At the time of writing the archive is called amdgpu-pro-17.40-492261.tar.xz, so it needs to be unpacked using the command:

tar -xJpf amdgpu-pro-*.tar.xz

At this point you'll have a bunch of .deb files and an install script. You'll notice they have their driver version number (17.40-492261) in all the file names. When they bring out a new version expect these numbers to change. After this we then need to install them one by one, but we don't need all of them just to get OpenCL. We would do the following:

sudo dpkg -i amdgpu-pro-core_17.40-492261_all.deb
sudo dpkg -i libopencl1-amdgpu-pro_17.40-492261_amd64.deb
sudo dpkg -i clinfo-amdgpu-pro_17.40-492261_amd64.deb
sudo dpkg -i opencl-amdgpu-pro-icd_17.40-492261_amd64.deb
sudo dpkg -i amdgpu-pro-dkms_17.40-492261_all.deb
sudo dpkg -i libdrm2-amdgpu-pro_2.4.82-492261_amd64.deb
sudo dpkg -i ids-amdgpu-pro_1.0.0-492261_all.deb
sudo dpkg -i libdrm-amdgpu-pro-amdgpu1_2.4.82-492261_amd64.deb
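
Since the version numbers change with every release, a small helper can resolve each package name to whatever .deb is actually sitting in the unpacked directory. This is only a sketch: the function name opencl_install_cmds is made up for illustration, the package list and order are simply the ones used above, and it prints the dpkg commands rather than running them.

```shell
# Hypothetical helper: print (not run) the dpkg commands for the OpenCL
# subset, matching each package name against the .deb files in a directory.
opencl_install_cmds() {
    dir="${1:-.}"
    for name in amdgpu-pro-core libopencl1-amdgpu-pro clinfo-amdgpu-pro \
                opencl-amdgpu-pro-icd amdgpu-pro-dkms libdrm2-amdgpu-pro \
                ids-amdgpu-pro libdrm-amdgpu-pro-amdgpu1; do
        # The trailing underscore stops one name matching a longer one.
        for deb in "$dir/${name}"_*.deb; do
            if [ -e "$deb" ]; then
                echo "sudo dpkg -i $deb"
            else
                # An unmatched glob stays literal, so report it as missing.
                echo "# missing: $name" >&2
            fi
        done
    done
}
```

Run it from the unpacked directory as opencl_install_cmds . and check the list before pasting or piping the commands into a shell.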

I got as far as the dkms package when it failed to build. AMD only support the current long term release kernel, so the module fails to build under the 4.19 kernel. I think Ubuntu are on the 4.18 kernel at the moment, so there isn't much I can do about this.
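
A quick check of the running kernel before attempting the dkms package would have saved some time. The sketch below compares the major.minor part of the running kernel against a cut-off. The 4.18 cut-off is purely my guess based on what Ubuntu ships, not a figure published by AMD, and the two function names are made up for illustration. It relies on the -V (version sort) option of GNU sort.

```shell
# Reduce a kernel release string such as "4.19.0-5-amd64" to "4.19".
kernel_major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed if version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# ASSUMPTION: 4.18 is a guess at the newest kernel the driver builds against.
max_supported="4.18"
running="$(kernel_major_minor "$(uname -r)")"

if version_le "$running" "$max_supported"; then
    echo "kernel $running: the dkms build has a chance"
else
    echo "kernel $running is newer than $max_supported: expect the dkms build to fail"
fi
```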

Looking at the install script, ids-amdgpu-pro* isn't referenced so I suspect it's not needed, but seeing as the install failed before that point I can't tell.
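
Checking whether a package is mentioned in the install script is a one-line grep. A minimal sketch follows. The helper name script_mentions is made up, and the script file name amdgpu-pro-install is an assumption on my part, so substitute whatever the script in the unpacked directory is actually called.

```shell
# Succeed if the package name appears anywhere in the given script.
# -F treats the name as a fixed string, -q suppresses the matching lines.
script_mentions() {
    grep -qF -- "$1" "$2"
}

# ASSUMPTION: the install script is called amdgpu-pro-install.
script="./amdgpu-pro-install"
if [ -f "$script" ]; then
    if script_mentions ids-amdgpu-pro "$script"; then
        echo "ids-amdgpu-pro is referenced"
    else
        echo "ids-amdgpu-pro is not referenced"
    fi
fi
```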

I will be sticking with Nvidia because at least their drivers are simple enough to install (yes, they have a dkms component as well) and work on current release kernels. AMD really need to get their act together with their drivers. They could be moving so much more hardware if they fixed their software.
