19 April 2019

A journey I would rather not go on

The two ASUS GTX 1660 Ti cards arrived. Being eager to get the new toy going, I went to install one of them. Hardware installation was fine. I already had a PCIe power cable with a 6+2 pin power connector running the older graphics card, so I swapped the cards over with no issue there. I powered up the machine and got an ASUS logo followed by the Debian desktop. All looking good so far.

I went to check the BOINC logs and it couldn’t work out what model GPU it was, so time to reinstall the current (410) driver. It got part of the way through before complaining about unmet dependencies. But it’s from the repo, so why does it have unmet dependencies? I decided to try removing it and rebooting. Oh great, no desktop now. At least I can log into the box remotely.

Next I try installing the driver from Debian Buster (the next release). No, that has unmet dependencies as well. Let’s try the version from Debian Experimental (418.56) as it’s more up to date. It wants to install 800 updates. Okay, last resort before I give up and put the old card back into the machine: let’s do a dist-upgrade to get to Debian Buster. Two hours later it’s finished. Reboot and we have an ASUS logo and the new dark-themed (more like a grey camouflage look) desktop. It still doesn’t recognise the GPU though. Debian Buster still has the 410 driver.
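For anyone repeating this, a simulated install shows how many packages a release would drag in before anything is changed. The sketch below is not the exact commands used here. It assumes the package is called nvidia-driver, that the target release (experimental in this example) is already listed in the apt sources, and it uses a made-up run wrapper that only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# PACKAGE and RELEASE are illustrative assumptions, not copied from the post.
PACKAGE=nvidia-driver
RELEASE=experimental

# Show which version each configured release would provide.
run apt-cache policy "$PACKAGE"

# Simulate the install from one release; nothing is changed on disk.
run apt-get install --simulate -t "$RELEASE" "$PACKAGE"
```

Run as-is it only prints the two commands. With DRY_RUN=0 the second command lists everything apt would install or upgrade, which is where a surprise like 800 updates would show up.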

Okay, now try installing the driver from Experimental. It installed okay. Hold your breath, cross your fingers and reboot. I get an ASUS logo and the camo desktop. Well, that bit is still working at least. I check the BOINC log and now it recognises the GPU. Hooray.

Let’s see if it can be used for compute. I set BOINC to no cache and allow it to fetch work; it downloads 16 CPU tasks and one GPU task. I disable work fetch and watch. The GPU task isn’t moving. Uh oh. Let’s give it a bit of time. After about 30 seconds it jumps to 23% done and slowly starts counting up. Looking good. It gets to about 50% and, oh crap, it’s gone back to 0% and started counting up again. I keep watching as it gets past 50%, makes its way up to 100% and then uploads. I’m not too sure what happened there but it looks like it worked. The one thing I do know is that the driver update has taken us from CUDA 10.0 to 10.1.
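A quicker check than reading the BOINC log is to ask the driver itself which version is loaded. This is a rough sketch rather than what was actually run. The driver_at_least helper is hypothetical, and the minimum of 418 is an assumption based only on 410 failing and 418.56 working on this card.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the driver's major version is at least the wanted major.
driver_at_least() {
    have_major=${1%%.*}
    want_major=${2%%.*}
    # Reject empty or non-numeric input rather than letting the comparison error out.
    case $have_major in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$have_major" -ge "$want_major" ]
}

# Assumption: 418 as the minimum, because 410 failed and 418.56 worked in this post.
MIN_DRIVER=418

if command -v nvidia-smi >/dev/null 2>&1; then
    # Take the first GPU's driver version only.
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
    if driver_at_least "$version" "$MIN_DRIVER"; then
        echo "driver $version is new enough"
    else
        echo "driver '$version' is missing or older than $MIN_DRIVER"
    fi
else
    echo "nvidia-smi not found; is the driver installed?"
fi
```

It only compares the major version number, which is enough to tell a 410 driver from a 418 one.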

The following morning, once the CPU tasks have finished, I try to shut it down. I log in as root and run “shutdown now”: command not found. Oh wonderful. A bit of googling and I find out we have to use “systemctl poweroff” and “systemctl reboot” now. The service command is also gone; we use “systemctl stop xxx” or “systemctl start xxx” to stop or start services.
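One thing worth ruling out with a “command not found” like this (an assumption, not something confirmed above) is that the command is still installed but sits in an sbin directory that is not on the current PATH. The find_admin_cmd helper below is hypothetical and simply reports which of the two cases applies.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a command is on PATH,
# only present in an sbin directory, or not installed at all.
find_admin_cmd() {
    name=$1
    if command -v "$name" >/dev/null 2>&1; then
        echo "on PATH"
        return 0
    fi
    for dir in /sbin /usr/sbin /usr/local/sbin; do
        if [ -x "$dir/$name" ]; then
            echo "off PATH: $dir/$name"
            return 0
        fi
    done
    echo "not installed"
    return 1
}

# Check the two commands mentioned above; "|| true" keeps going if one is missing.
for cmd in shutdown service; do
    printf '%s: ' "$cmd"
    find_admin_cmd "$cmd" || true
done
```

If it reports “off PATH”, calling the command by its full path, or becoming root with “su -” so that root’s own PATH is used, should work. Either way the systemctl commands do the job.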

Where to now? Next I will update the SETI Multi-beam app. The one I have is built for CUDA 9 and there is a CUDA 10.1 version. Hopefully that will work, but don’t hold your breath...

Update 25 April
I raised a bug with Debian. They seem to have fixed the driver dependencies in Experimental and moved the driver up to Sid. The drivers in Stretch and Stretch-backports are still broken.

I tried re-installing Stretch, upgrading to Buster and then installing the driver from Sid. The machine hangs at boot time and won't display the desktop at all.

I have also tried downloading the driver directly from Nvidia. However, to install it you need to sort out gcc and various other dependencies by hand.

Update 11 May
Debian have pushed the 418.56-2 driver through to stretch-backports. This works and I have finally got the GTX 1660 Ti running. I even upgraded the driver on the GTX 1060 machines and they are running fine as well.
