24 November 2019

24th of November

Farm status
Intel GPUs
All running Einstein O2MD1 work

Nvidia GPUs
All running Einstein O2MD1 work

Raspberry Pis
All running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Other news
The farm had a few days off during the week. Currently it's cool enough to have everything running. The x64 nodes are all doing the O2MD1 search on their CPUs. I am back to running the AMD machines on half the cores as they are now down to 6 hours and 45 minutes per work unit.


Einstein SSL/TLS security
They are updating their site security and so the minimum versions of the BOINC client that will work with them are:
From the 27th of Nov 2019 - BOINC 7.4.36
From the 25th of May 2020 - BOINC 7.10

If you're still running an older BOINC client you'll have to update to a newer one.
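
If you want to check which version a machine is running from the command line, boinccmd (which ships with the client) will tell you; the apt line below just pulls whatever version Debian currently packages, which may or may not be new enough:

# report the version of the locally running client (run on the machine itself; it may ask for the RPC password)
boinccmd --client_version

# upgrade to whatever version the distro currently packages
sudo apt update && sudo apt install --only-upgrade boinc-client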


SuperHost
The original idea is documented HERE

In order to simplify things I would drop the idea of the SuperHost doing verification and leave it as it happens today (ie the projects verify the results).

I posted to the BOINC dev mailing list that I could provide some funding for a developer to work on this. That was in response to a BOINC wish list/roadmap email. I also dropped a couple of private emails to people in the BOINC community on the subject. I didn't get any response so have to assume there is no interest.

17 November 2019

17th of November

Farm status
Intel GPUs
All off

Nvidia GPUs
All running Einstein O2MD1 search

Raspberry Pis
All running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Catastrophic fire danger
Last week saw a catastrophic fire danger declared so everything was switched off. There has been a smoke haze over the city for the last couple of weeks from bush fires up north. There were also a number of fires around Sydney's suburbs during the week, most of which were dealt with quickly.

Despite this we had a couple of cool days so I had most of the x64 computers running the Einstein O2MD1 work on their CPUs. They take around 15-17 hours when running on all threads.


More Intel bugs
Announced this week were a couple more security bugs in the Intel CPUs. The bugs have been known for a while but were held back until mitigations were available and could be announced publicly. With all the mitigations applied, the Intel CPUs are now somewhat slower than the AMD CPUs.


Debian Buster point release
Debian did a 10.2 point release this weekend.

I haven't upgraded the Nvidia GPU machines from Stretch because of things not working. I'll need to try upgrading one to see if they have fixed the issues yet. My main issue is that Buster forces the screen into 1024x768 resolution and it can't be changed. This doesn't occur under Stretch. I raised a Debian bug in September but it doesn't appear to have made any progress.
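
When I next try an upgrade I'll grab the mode list for the bug report with xrandr; a quick diagnostic sketch (the output name varies per machine, HDMI-1 here is just an example):

# list connected outputs and the modes the driver is exposing
xrandr --query

# if the right mode is listed but not selected, it can be forced for the session
xrandr --output HDMI-1 --mode 1920x1080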

03 November 2019

3rd of November

Farm status
Intel GPUs
One running Einstein O2MD1 work

Nvidia GPUs
All running Einstein O2MD1 work

Raspberry Pis
All running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Einstein work
Most of the week was hot so everything except the Pis was off. Now it's a little cooler the farm is running Einstein O2MD1 (Observation Run 2 Multi-Directed search 1) work on the CPUs. The run times vary between 14-17 hours on the AMD machines and 19-22 hours on the Intel machines. Some work units are faster than others, even on the same machine.

27 October 2019

27th of October

Farm status
Intel GPUs
All off

Nvidia GPUs
One running Einstein gravity wave work

Raspberry Pis
All running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Other news
It's been warm for the last fortnight so most of the farm has been off. I had all of the Nvidia GPU machines running Einstein gravity wave work on their CPUs last week.


Pimp my storage server
While it was warm I decided to pimp, I mean upgrade, one of the storage servers. It's a 2U rack mount with 12 x 3.5" HDD drive bays. I currently have 4 x 8TB drives which, after allowing for single-drive redundancy, gives a theoretical 24TB of usable space; in practice it's about 21TB (24 decimal terabytes is roughly 21.8 TiB, and filesystem overhead takes a little more).

I've got lots of spare HDDs but they are of varying sizes, from 320GB up to 4TB. These days the recommendation for a storage server is to use the largest drives available.

The first upgrade was memory. When running ZFS the recommendation is to increase the memory first as ZFS will use it for caching; this is referred to as the ARC (adaptive replacement cache). I got 8 x 16GB sticks (128GB) which filled all the available RAM sockets. The server supports 32GB memory sticks, but given it's ECC memory that would have cost quite a bit more.
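
For the record, the ARC usage can be checked and capped from Linux; a rough sketch (ZFS on Linux paths, and the 96GiB figure is just an example cap):

# current ARC size and ceiling, in bytes
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

# cap the ARC so the OS keeps some headroom; takes effect when the module is next loaded
echo "options zfs zfs_arc_max=103079215104" | sudo tee /etc/modprobe.d/zfs.conf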

The second upgrade was an Intel Optane 900P 280GB SSD as a cache drive. In ZFS terms it's called the L2ARC (level 2 adaptive replacement cache). It's a half-height PCIe x4 card so it fits in a 2U server. They're recommended by Serve The Home as cache drives at the moment.
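
Adding it as a cache device is a one-liner once Linux can see it; a sketch only, as the pool name and device path below are placeholders:

# attach the Optane to the pool as an L2ARC device
sudo zpool add tank cache /dev/nvme0n1

# keep an eye on how much of the cache device actually gets used
zpool iostat -v tank 5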

07 October 2019

7th of October

Farm status
Intel GPUs
All off

Nvidia GPUs
Running Einstein gravity wave work overnight

Raspberry Pis
All except two running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Other news
The weather is warm during the day so I'm only crunching overnight.


Storage servers
I spent the weekend working on the tower case storage server. The case has room for 10 drives, although the top two drive bays are intended for a CD burner; the other 8 drive bays are designed for 3.5" drives. I currently have 4 x 4TB WD SE drives in there. Originally this machine ran Windows Server 2008 with a RAID controller; I swapped it over to Linux last year.

This week I removed the RAID controller card and plugged the drives into the motherboard SATA controller to simplify things. I updated the OS to the latest Debian and reinstalled ZFS onto it. I then went to restore from the Drobo, which decided one of its drives was missing. A quick power off and reseat of the 3 drives got it going in a degraded state, and I copied the files back from the Drobo while it was repairing itself. That has the tower case storage server up and running.
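
For anyone wanting to do the same, recreating a single-parity pool on the four drives is along these lines (a sketch; the pool name and sdX names are examples, and in practice /dev/disk/by-id names are a better idea):

# create a raidz (single parity) pool across the four drives
sudo zpool create -o ashift=12 tank raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde

# check the pool layout and health
zpool status tank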

For the rack mount (disk based) storage server I ordered more memory. It currently has the 32GB that was in the tower case storage server and I want it back. The rack mount will get 8 x 16GB of ECC memory bringing it up to 128GB.

I looked at adding a cache drive to the storage servers but the recommendation for ZFS seems to be to increase the memory to its maximum first, and only add an SSD as a cache drive once you get above 64GB of memory, because the L2ARC (cache drive) itself consumes main memory to index its contents.

28 September 2019

28th of September

Farm status
Intel GPUs
One running Einstein Gravity Wave work

Nvidia GPUs
All running Einstein Gravity Wave work

Raspberry Pis
All except two running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Other news
Last week I mentioned issues installing ZFS. I raised a bug report with Debian, a workaround was posted for the installer issue, and I have now got ZFS running on one of the storage servers. I haven't tried the other one yet but expect the same process will work on it.
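
For reference, the usual path for ZFS on Debian is to build the kernel module via DKMS from the contrib section; roughly (assuming contrib is already enabled in sources.list):

# kernel headers are needed so DKMS can build the module
sudo apt install linux-headers-$(uname -r)

# zfs-dkms and the userland tools live in contrib
sudo apt install zfs-dkms zfsutils-linux

# load the module and confirm it built
sudo modprobe zfs
modinfo zfs | grep -i version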

Also last week I tried updating one of the Nvidia GPU machines to Debian Buster, their current release. It locks up while booting but can be accessed via ssh, and from there I can install the Nvidia drivers. It will boot after that but the screen will only do 1024x768 resolution. I raised another bug for Debian. In the meantime I have gone back to running Debian Stretch, which works fine.

CPDN announced on the 18th of June they were going to run the OpenIFS climate models. Normally they run 32 bit versions of the UK Met Office climate models. Since then I increased the memory in the Intel GPU machines to be able to run these. We're still waiting for these climate models to become available.

14 September 2019

14th of September

Farm status
Intel GPUs
One running Einstein Gravity Wave work

Nvidia GPUs
All four running Einstein Gravity Wave work

Raspberry Pis
All except two running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Other news
I've been working on one of two storage servers for the past two weeks. It's a hard-disk-based server in a 2U rack mount case. Hardware-wise it's all working, but the software is causing problems: I can't get ZFS installed on it because the kernel module fails to build. I've tried different Debian releases and all of them seem to be broken. I bought this one almost a year ago. It was intended to replace my current (tower case) server, which has fewer drive bays and thus limited scope for expansion.

I mentioned two storage servers. The other one is an NVMe-based one in a 1U rack mount case. Apart from the fans being really loud and annoying, its hardware also works fine. It too needs to run ZFS, so I have been trying to get the quieter one going first.

I have updated the Intel GPU machines to Debian Buster. Unfortunately Buster locks up on the Nvidia GPU machines and won't even give a logon prompt. Time to raise another bug with Debian.

25 August 2019

25th of August

Farm status
Intel GPUs
All off

Nvidia GPUs
Have been running Seti, now running Einstein.

Raspberry Pis
Running Einstein BRP4 work


Other news
I assembled the 4th Ryzen 3600 (aka GPU compute node v3) machine and have had it burning in. I also have it plugged into a power meter at the power point to see how much juice these machines use. At peak load with the CPU and GPU fully utilised it jumped to 240 watts (at a supposed 240 volts; according to the meter it's getting 236 volts). Running Milkyway work on the GPU is less taxing and it barely gets to 200 watts. Running just the CPU at full load it's using 140 watts. Idle was 60 watts.

It was a bit warmer yesterday afternoon so I had to stop GPU crunching, hence running Einstein gravity wave work on the CPU only. The Ryzen 3600 is faster than my i7-8700 machines doing the same work. Both CPUs have the same core count but the Intel has a lower clock speed (3.2GHz base) and doesn't support memory faster than 2666MHz.


11 August 2019

11th of August

Farm status
Intel GPUs
All off

Nvidia GPUs
Three running Seti

Raspberry Pis
All running Einstein BRP4 work


GPU compute nodes
Version 2 of the GPU compute nodes was a Ryzen 1700 with a GTX 1660 Ti (they originally had GTX 1060's but I upgraded them to the 1660 Ti when they came out). Version 3 is a Ryzen 3600 with the same GTX 1660 Ti.

I got another two GTX 1660 Ti's and have assembled another of the v3 GPU compute nodes. I also decommissioned another v2 as well as an i3-6100t that was acting as a GPU compute node (with a GTX 1060). I still have one more v3 to assemble and one more Ryzen 1700 to decommission. I will be selling off the older compute nodes.

The first v3 GPU compute node that I built has managed to get its recent average up to 73,400 so far for Seti. They take a bit over a month to reach their highest average (BOINC's recent average credit has a one-week half-life, so it takes roughly five weeks to settle within a few percent of the steady-state value) and this one has been going just shy of one month. The second v3 GPU compute node that I built has got up to a recent average of 53,300 and it's only been running for two weeks.

With just three GPU compute nodes running I've climbed to 61st place on the Seti rankings. I don't expect it to last too long though as once the first v3 GPU compute node is a month old I will get it doing some Einstein CPU work.

28 July 2019

Another one bites the dust

Farm status
Intel GPUs
All off

Nvidia GPUs
Three running Seti work

Raspberry Pis
All running Einstein BRP4 work


GPU compute nodes
The 3 new GPU compute nodes arrived as parts. I have assembled one, which has replaced another of the Ryzen 1700 GPU compute nodes. That is two done - the ones with GTX 1660 Tis. I will work on the other two during the week. I also need to buy a couple more GTX 1660 Ti graphics cards for these last two.

I have Darik's Boot and Nuke running on one of the former GPU compute nodes at the moment erasing the hard disk. It's only been going for 16 hours so far, with an estimate of another 6 hours. I then have to repeat the process on the other node before selling them.

The former GPU compute nodes got their RAC (Recent Average Credit) up to 75,600 for Seti so the new ones need to beat that, despite the fact they have fewer cores. The first one I replaced is currently up to 58,600 and still climbing.


The old and the new
The "old" GPU compute nodes (v2) consisted of an ASUS X370-Pro motherboard, Ryzen 1700 CPU, 32GB of memory, 2TB HDD, GTX 1060 3GB graphics and a 750 watt bronze rated power supply. These pull 235 watts at the wall socket (240 volts) under load.

The new GPU compute nodes (v3) consist of an ASUS X570-P motherboard, Ryzen 3600 CPU, 16GB of memory, 1TB NVMe SSD, GTX 1660 Ti graphics and a 550 watt gold rated power supply.

I need to plug in the power meter to get an idea of how much electricity the new ones use. In theory they should be about the same as the old GPU compute nodes but as you can see there are a few differences between them.

16 July 2019

GPU compute node v3

Here are some pictures of the latest build. It's a GPU compute node. The last picture is the old and new compute nodes side by side.

Parts list
Case: Fractal Design Meshify C
Power supply: Seasonic 550W gold rated
Motherboard: ASUS X570-P/CSM
CPU: Ryzen 5 3600
Cooler: Noctua NH-U9S
Memory: HyperX Predator 16GB (2 x 8GB) DDR4-3200MHz
Storage: Samsung 970 EVO Plus 1TB
GPU: ASUS GTX 1660 Ti (6GB)
[Photos: the v3 GPU compute node build, and the old and new compute nodes side by side]

14 July 2019

14th of July

Farm status
Intel GPUs
All off

Nvidia GPUs
Three running Seti

Raspberry Pis
Twelve running Einstein, four being upgraded to Buster.


New GPU compute node
I got the parts for my new GPU compute nodes. They're Ryzen 3600's on X570 motherboards and will be used to replace the existing Ryzen 1700's. I have assembled the first one and it's been running for 2 days now. The graphics card was swapped over from the older machine. Despite having two fewer CPU cores than the machines they replace, they're faster, so the amount of work being produced seems to be about the same.

I have an issue where the memory is detected as slower than it really is: I purchased DDR4-3200 and the machine is telling me it's DDR4-2400. I have raised a bug report with ASUS regarding this. The issue of later kernels (5.0 or later) not being able to boot doesn't affect me as Debian is still using a 4.x kernel at the moment. I expect there will be a few BIOS updates to resolve issues like this.
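
The mismatch is easy to confirm from Linux without rebooting into the BIOS; dmidecode shows both what the modules support and what they are actually being run at (needs root):

# "Speed" is what the DIMM supports, "Configured Clock Speed" is what it's running at
sudo dmidecode -t memory | grep -iE 'speed|part number'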

Given this success I will order parts for the other 3 GPU compute nodes. I also need to order a couple more GTX 1660 Ti's as I only got two last time and I have four GPU compute nodes. The supplier has run out of stock of Ryzen 3600's and X570 motherboards already so it might take a while for the parts to arrive. I will post some pictures of the new builds in a separate blog post.


Another storage server
I ordered a 1U rack mount U.2 storage server. You can never have fast enough storage, right? It only has 8 drive bays but that is enough for a fast storage server. It will take a while to arrive.

The idea is to have this backing up to the slower (disk-based) storage server which has a much larger capacity. The disk-based storage server has more drive bays and hard disks hold more data. Both servers have dual 10GbE networking.

06 July 2019

6th of July

Farm status
Intel GPUs
Five running Einstein O2AS20 work overnight

Nvidia GPUs
Two running Seti work

Raspberry Pis
Twelve running Einstein BRP4 work
Four running Seti work


3rd generation Ryzen launch
Tomorrow (Sunday the 7th of July) we're expecting the official launch of the 3rd generation Ryzen. AMD have already released some information but not the full line up so it will be interesting to see what models they have available.


Debian Buster release
Today is meant to be the official release of Debian Buster, even though the Raspberry Pi foundation released their version of it, Raspbian Buster, last week.

I downloaded the latest installer (RC2 at the time) from Debian and clean installed it on one of the i3's that I am using as a guinea pig. It couldn't display the desktop properly, so I've raised a bug for that. I also had a problem with ntp always using the DHCP-supplied server, which was raised as a bug as well. I suspect they have changed the way they start ntp, so it's possible it's just a configuration issue. I can live with the ntp issue for the time being, but the desktop display I consider a show-stopper, so I will be sticking with Debian Stretch for a while.
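
If it does turn out to be the DHCP hook rather than ntp itself, one possible workaround (untested on Buster, so treat it as a guess) is to stop dhclient asking for NTP servers in the first place:

# remove "ntp-servers" from the "request" statement in /etc/dhcp/dhclient.conf
sudoedit /etc/dhcp/dhclient.conf

# then renew the lease so ntp stops being handed the DHCP server (interface name will differ)
sudo dhclient -r enp2s0 && sudo dhclient enp2s0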


Other purchases
I am sounding out a supplier or two around a flash-based storage server to add to the network. The idea is to have a fast storage server and a slower storage server. I already have a disk-based storage server (ie the slower one) and this will allow for quick file access. I'm waiting on a couple of quotes, but I am sure it won't be cheap, even with a minimal number of SSDs.

I was reading a review of a Gigabyte branded flash-based server R181-NA0 here which prompted me to ask for quotes. It seems nobody sells the Gigabyte ones in Australia.

I had considered the ASUS Hyper M.2 card but even if you put 4 x 2TB NVMe SSD's on it and run them in a RAID configuration with single-drive redundancy you can't get past 6TB usable (4 x 2TB minus one drive's worth of parity) unless you have multiple cards. The Hyper M.2 only works on their motherboards and needs a PCIe x16 slot.

22 June 2019

22nd of June

Farm status
Intel GPUs
Four running Asteroids work.

Nvidia GPUs
Two running Seti work
One running Asteroids work

Raspberry Pis
All running Einstein work


Memory upgrades
It seems CPDN have finally decided it's time to start offering 64-bit apps. I think this was driven in part by most Linux distributions dropping support for 32-bit, and by some new apps that want large amounts of memory. Now that they have done so I added memory to the Intel GPU machines to bring them up to 32GB.

I had the memory for quite a while, the same brand/model as I originally put into them, but waited until now to install it as none of the projects needed large memory. Run times on tasks seem to have improved slightly as a result of the memory upgrades. I'm not sure why, as the CPUs have dual-channel memory controllers and they were already using two memory slots. I'm not complaining, I would just like to know why it made a difference.


Guinea Pig
The i3/AMD cruncher that I mentioned in my previous post has been used as a bit of a guinea pig for things. It had a 1TB SSHD in there that I swapped out for a Samsung 850 Pro SSD. I swapped out the HD 7770 graphics card for a GTX 1060 so it's now an Nvidia GPU machine. I put Debian Buster on it to see how it behaves.

I am not sure if it's Debian Buster or the way I have the screen hooked up; it's plugged into the on-board VGA that the i3 provides rather than the ports on the back of the GTX 1060, as I don't have any spare DisplayPort or DVI-I adapters. It doesn't seem to display correctly. NTP doesn't work properly, iptables has been replaced (by nftables), the service command is replaced by systemctl commands, and they don't use Xorg any more. Debian have announced they'll be releasing Buster on the 6th of July.

I think I might hold off on upgrading all the machines to Buster until they have had their first point release, which is usually a bunch of fixes that didn't make it in time for the official release.


Asteroids wasn't working
While using the i3 I found out that my app_info for Asteroids doesn't work, so I trashed a bunch of work over the last couple of days until I managed to fix it. When I checked, none of the GPU machines had done any GPU work for Asteroids; they had only been doing CPU work.

The only reason I use an app_info is that their server insists on sending an app built for the SSE3 CPU instructions instead of the AVX one, and the AVX app is quite a bit faster. The apps are the same ones supplied by the project, but this way I can specify which one it uses.
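
As a rough sketch of how that works (the directory and file names below are illustrative, not the exact ones the project uses): BOINC's anonymous platform mechanism reads an app_info.xml in the project directory and will only run the executables listed in it, so you point it at the project-supplied AVX binary:

# on a Debian-packaged client the project directory lives under /var/lib/boinc-client
cd /var/lib/boinc-client/projects/asteroidsathome.net_boinc/
ls period_search_*                    # see which project-supplied executables are present
editor app_info.xml                   # reference the AVX executable in an <app_version> block
sudo systemctl restart boinc-client   # restart the client so it re-reads app_info.xml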

10 June 2019

10th of June

Farm status
Intel GPUs
Running Asteroids overnight

Nvidia GPUs
Two running Seti

Raspberry Pis
All running Einstein BRP4 work


Other news
The two GTX 1660 Ti equipped Ryzens are running Seti 24/7. I had one of them lock up and had to power it off. When it came back up it decided to trash a bunch of CPU work units that had been in progress. I have seen this behaviour from them before, where they just lock up for no apparent reason. Given all four Ryzen machines have done this, it seems to be something specific to them. They've had BIOS upgrades and two of them have also had GPU upgrades. I've even seen it when they are idle, and I haven't been able to pinpoint the cause. As these four machines are slated for replacement soon, I am not going to waste any more time trying to debug the issue.


AMD Linux driver experiment
I have an old HD 7770 graphics card that has been sitting in its box for a few years now. They were released in February 2012 so they are ancient in GPU terms. I decided to fire up one of the i3's as an AMD cruncher in order to see just how hard it is to get AMD's drivers working under Linux. AMD GPUs are good at running OpenCL apps, much better than Nvidia.

I'll point out I am running Debian and AMD only release their Linux drivers for Red Hat or Ubuntu. Ubuntu is based on Debian, so how hard could it be? The hardware part is simple: just fit the card into the PCIe slot and plug a 6-pin power cable in. The machine is happy to display through its DVI port without any issue. I install a clean copy of Debian and it complains about missing AMD firmware. Debian have a package called firmware-amd-graphics which fixes that. I install it, reboot, and I now have a high-res desktop working.
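
For anyone following along, that firmware package lives in Debian's non-free section, so non-free has to be listed in sources.list before it will install:

# e.g. deb http://deb.debian.org/debian stretch main contrib non-free
sudo apt update
sudo apt install firmware-amd-graphics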

The next part is to get OpenCL going which is when it all falls apart. First you need to install a few packages from the Debian repo:

sudo apt install build-essential dkms

Now we need to download the latest amdgpu-pro drivers, which I did on a Windows machine and then stuck on a USB thumb drive to copy across. At the time of writing they are called amdgpu-pro-17.40-492261.tar.xz, so they need to be unpacked using the command:

tar -xJpf amdgpu-pro-*.tar.xz

At this point you'll have a bunch of .deb files and an install script. You'll notice they have the driver version number (17.40-492261) in all the file names; when they bring out a new version expect these numbers to change. After this we need to install them one by one, but we don't need all of them just to get OpenCL. We would do the following:

# from inside the extracted driver directory; note the .deb extension on each package
sudo dpkg -i amdgpu-pro-core_17.40-492261_all.deb
sudo dpkg -i libopencl1-amdgpu-pro_17.40-492261_amd64.deb
sudo dpkg -i clinfo-amdgpu-pro_17.40-492261_amd64.deb
sudo dpkg -i opencl-amdgpu-pro-icd_17.40-492261_amd64.deb
sudo dpkg -i amdgpu-pro-dkms_17.40-492261_all.deb
sudo dpkg -i libdrm2-amdgpu-pro_2.4.82-492261_amd64.deb
sudo dpkg -i ids-amdgpu-pro_1.0.0-492261_all.deb
sudo dpkg -i libdrm-amdgpu-pro-amdgpu1_2.4.82-492261_amd64.deb

I got as far as the dkms when it failed to build. AMD only support the current long term release kernel and so it fails under the 4.19 kernel. I think Ubuntu are on the 4.18 kernel at the moment so there isn't much I can do about this.

Looking at the install script the ids-amdgpu-pro* package isn't referenced, so I suspect it's not needed, but seeing as the install failed before that point I can't tell.

I will be sticking with Nvidia because at least their drivers are simple enough to install (yes they have a dkms component as well) and work on current release kernels. AMD really need to get their act together with their drivers, they could be moving so much more hardware if they fixed their software.

01 June 2019

1st of June

Farm status
Intel GPUs
Running Einstein O2AS20 work overnight

Nvidia GPUs
Two running Seti work

Raspberry Pis
Running Einstein BRP4 work


BIOS updates
The motherboard manufacturers will update the CPU firmware via a BIOS update. There are other ways of patching them as well such as using the intel-microcode package under Debian.
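
On Debian that amounts to installing the package (it lives in non-free, and the Ryzens have an amd64-microcode equivalent) and checking the kernel picked it up at boot:

# microcode packages are in the non-free section
sudo apt install intel-microcode

# confirm an update was applied at boot
dmesg | grep -i microcode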

The Intel CPUs have another security issue, referred to as MDS or more commonly known as Zombieload. The AMD machines don't have this particular issue, but their motherboards are getting BIOS updates to support the 3rd generation Ryzen CPUs, even the older boards. I took the opportunity to update all the machines.


Other news
We got a bit of a cold snap in the weather so the two machines with GTX 1660 Ti cards have been running 24/7. This has greatly improved their output.

I resurrected my Milkyway@home account, which I hadn't used since 2012, and did a burst of GPU work for them. Their GPU app is written in OpenCL and so is slower on Nvidia cards. The Milkyway simulations took slightly under 4 minutes to complete on the GTX 1660 Ti.


Ryzen upgrades
At Computex 2019 (last week) AMD announced 5 of their 3rd generation Ryzen CPUs. The official specs of these CPUs were somewhat different to the leaks on the internet. We're expecting more official announcements on the 7th of July as that is the release date. Only another 5 weeks to go...

At the moment I am looking at ASUS X570-Pro motherboards with DDR4-3200MHz memory, but I am not sure how much memory because that depends on how many cores they have. Which CPU they'll get is undecided until the rest of the Ryzen line-up is officially announced. I will swap all 4 machines out, reusing only the GPUs, so that will mean new cases, power supplies, CPU coolers and NVMe SSDs to replace the hard disks.

The Ryzen 1700's that I currently use are 65 watts and have 8 cores/16 threads. For a GPU cruncher I was hoping for a lower wattage CPU, probably with a lower core count. The X570 chipset uses 15 watts so that will eat any power saved.

19 May 2019

19th of May

Farm status
Intel GPUs
Running Einstein O2AS20 work overnight

Nvidia GPUs
Two running Seti work overnight

Raspberry Pis
Twelve running Einstein BRP4 work.
Four running Seti work.


Other news
The farm has been running overnight as it is still warm during the day. I had the Intel GPUs run some Asteroids work for a couple of days as they are only on 55M credits compared to Einstein on 59M and Seti on 61M. I am trying to spread the project work evenly.

I had two of the Nvidia GPU machines run some Einstein O2AS20 work, but only running 8 tasks at a time, which means they took 16 hours to complete. They are first generation Ryzens and the hyper-threading doesn't give much of a gain. They are next on my list of upgrades.

GPUgrid are looking at beta testing an updated science app. The new app is to support the Turing-based GPUs such as my GTX 1660 Ti cards.

Asteroids unfortunately doesn't have a compatible app; their CUDA Period Search app was compiled with CUDA 5.5 and doesn't work with the Turing-based GPUs. Nvidia are currently on CUDA 10.1.


Mate on Debian
Debian have some newer versions of software in a repository called stretch-backports. In there they happen to have the next release of Mate, the desktop that I use. Stretch has 1.16.2 and stretch-backports has 1.20, so I put it on one machine to see how it looks. The desktop didn't change much but I think BOINC Manager looks a lot worse. I have sent a couple of screenshots off to the BOINC developers mailing list suggesting we might need to "tweak" the manager.

Below you can see the same version of BOINC Manager (7.10.2) under Mate 1.16.2 and then under Mate 1.20. Which one do you think is better?

[Screenshots: BOINC Manager 7.10.2 under Mate 1.16.2 and under Mate 1.20]


11 May 2019

11th of May

Farm status
Intel GPUs
All running Einstein O2AS20 work

Nvidia GPUs
Two running Seti

Raspberry Pis
All running Einstein BRP4 work


Other news
Debian pushed their 418.56 Nvidia driver through to stretch-backports, so I now have the GTX 1660 Ti going. I upgraded the other Nvidia GPU machines to this version as well, and updated the Seti Multi-beam GPU app to the latest version (it needed CUDA 10.1, which the 418.56 driver provides) on all of them. The newer Seti app gives a bit of a performance boost as well as reducing the number of validation errors.
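
For reference, pulling the driver from backports is the usual two steps, assuming the stretch-backports line is already in sources.list:

# backports packages have to be requested explicitly with -t
sudo apt update
sudo apt -t stretch-backports install nvidia-driver

# the BOINC client logs the detected GPU and CUDA version at startup
grep -i cuda /var/lib/boinc-client/stdoutdae.txt | tail -n 3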


Debian buster testing
Currently I am installing Debian Buster so I can document the no-desktop bug for Debian. It seems tied to the Nvidia drivers, but I wasn't sure if it's also tied to the GTX 1660 Ti or not, so I installed it on a machine with a GTX 1060 to see if it works. And it did, so it's something to do with the newer Turing GPUs and the 418.56 driver.

Next I will swap in the GTX 1660 Ti and see if I can break Debian. I will then have to remote into the machine so I can get details for a bug report.


AMD CPUs on sale
There has been a price drop on the 2nd generation Ryzen chips. That's due to the impending release of the 3rd generation ones. The current rumor has AMD announcing them at Computex 2019, which is in June, but more importantly some (but not all) models will be available for purchase in June with the others coming in the 3rd quarter of 2019. If the rumors are correct the lowest-end chip will still have 6 cores/12 threads and the top end is a 16 core/32 thread chip.

05 May 2019

5th of May

Farm status
Intel GPUs
All running Einstein O2AS20 work

Nvidia GPUs
Two running Seti overnight

Raspberry Pis
All running Einstein BRP4 work


Other news
Einstein have resumed their O2AS20 search and I have it running on my fastest machines. They're taking 12 hours 45 minutes each. They also have a 1-week deadline, which means you can't mix them with other projects. The credit is also on the low side given how long they run for. I am sure the project will address some of these points, although they probably can't do much about the run time.

The GTX 1660 Ti isn't doing anything as I can't install the Nvidia drivers in Debian. See my earlier post "A journey I would rather not go on" for all the details about that. I can't even reinstall the GTX 1060 that I used before because the old driver is broken.

On a brighter note, I did install Mate desktop 1.20, which is the same version as included in Debian Buster (the next Debian release), on one machine and it worked fine.

28 April 2019

28th of April

Farm status
Intel GPUs
Were running Seti, now all on Einstein.

Nvidia GPUs
Two running Seti

Raspberry Pis
All running Einstein BRP4 work


Other news
The weather cooled off over the weekend so I had all the Intel GPU and two of the Nvidia GPU machines running Seti. They reached the goal of 60 million credits, so I have now put the Intel GPUs back to running Einstein work.

Einstein have released (or should that be re-released) their O2AS20 continuous gravity wave search. We started this late last year, then some issues were raised and it was stopped. Only one of the Intel GPU machines has received any; the others got O1OD1 work. Initial estimates to complete an O2AS20 task are 10 to 11 hours (on an i7-8700 at a stock speed of 3.2GHz).
 

GTX 1660 Ti
As you would have gathered from my previous post, Debian seem to have issues with their Nvidia drivers and I can't get the card going. Just as well I only installed one of them. Hopefully they'll fix their mess soon.

Nvidia also released the GTX 1650 last week so there is now a 430.09 driver to support them under Linux. Nvidia refer to it as a "beta" driver so I suspect Debian will ignore it until there is a release version available.

19 April 2019

A journey I would rather not go on

The two ASUS GTX 1660 Ti cards arrived. Being eager to get the new toy going I went to install one of them. Hardware installation was fine: I already had a PCIe power cable with a 6+2 pin power connector running the older graphics card, so I swapped the card out with no issue, powered up the machine and got an ASUS logo followed by the Debian desktop. All looking good so far.

I went to check the BOINC logs and it couldn't work out what model GPU it was, so time to reinstall the current (410) driver. It got part of the way through before complaining about unmet dependencies. But it's from the repo, so why does it have unmet dependencies? I decided to try removing it and rebooting. Oh great, no desktop now. At least I can log into the box remotely.

Next I try installing the driver from Debian Buster (the next release). No, that has unmet dependencies as well. Let's try the version from Debian Experimental (418.56) as it's more up to date. It wants to install 800 updates. Okay, last resort before I give up and put the old card back into the machine: let's do a dist-upgrade to get to Debian Buster. Two hours later it's finished. Reboot and we have an ASUS logo and the new dark-themed (more like a grey camouflage look) desktop. It still doesn't recognise the GPU though; Debian Buster still has the 410 driver.

Okay, now try installing the driver from Experimental. It installed okay. Let's hold our breath, cross our fingers and reboot. I get an ASUS logo and the camo desktop, so that bit is still working at least. I check the BOINC log and now it recognises the GPU. Hooray. Let's see if it can be used for compute. I set BOINC to no cache and allow it to fetch work; it downloads 16 CPU and one GPU task. I disable work fetch and watch. The GPU task isn't moving. Uh oh. Let's give it a bit of time. After about 30 seconds it jumps to 23% done and slowly starts counting up. Looking good. It gets to about 50% and, oh crap, it's gone back to 0% and started counting up again. I keep watching as it gets past 50% and makes its way up to 100% and then uploads. I'm not too sure what happened there but it looks like it worked. I know we've gone from CUDA 10.0 to 10.1 with the driver update.

I try to shut it down the following morning once the CPU tasks have finished. I log in as root and try to shut it down: "shutdown now" - command not found. Oh wonderful. A bit of googling and I find out we have to use "systemctl poweroff" and "systemctl reboot" now. The service command is also gone; we use "systemctl stop xxx" or "systemctl start xxx" to stop or start services.
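
For my own reference, the old commands and their systemd equivalents (boinc-client is just the example service here):

systemctl poweroff             # was: shutdown now
systemctl reboot               # was: reboot
systemctl stop boinc-client    # was: service boinc-client stop
systemctl start boinc-client   # was: service boinc-client start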

Where to now? Next I will update the Seti Multi-beam app. The one I have is CUDA 9 and there is a CUDA 10.1 version. Hopefully that will work, but don’t hold your breath...


Update 25 April 
I raised a bug for Debian. They seem to have fixed the driver dependencies for Experimental and moved it up to Sid. The drivers at Stretch and Stretch-backports are still broken.

I tried re-installing Stretch, upgrading to Buster and then the driver from Sid - The machine hangs at boot time and won't display the desktop at all.

I have also tried downloading the driver directly from Nvidia however to install it you need to get gcc and various other dependencies sorted out by hand.


Update 11 May
Debian have pushed the 418.56-2 driver through to stretch-backports. This works and I have finally got the GTX 1660 Ti running. I even upgraded the driver on the GTX 1060 machines and they are running fine as well.

14 April 2019

14th of April

Farm status
Intel GPUs
All running Seti overnight

Nvidia GPUs
Two running Seti overnight

Raspberry Pis
All running Einstein


Other news
Seti is on 59,418,000 credits so I have the farm concentrating on it so I can reach 60 million.

Seti have 20th anniversary T shirts being organised so I have ordered one.


GPU ordering
The EVGA GTX 1660 Ti that I wanted (the XC Ultra) still isn't available in Australia so I might have to buy another brand. ASUS have a dual-slot card, or maybe I will just get the EVGA 2.75-slot one. I will check what is available tomorrow and put an order in this week.

The bad news is the Turing-based cards don't work with GPUgrid or Asteroids@home, which will restrict which projects I can use them on.

I am thinking I will go back to my original idea of having a dedicated GPU machine, possibly with two graphics cards. Probably an i5-8500T (6 cores/6 threads) with a TDP of 35 watts and two GTX 1660 Tis. For the moment I will just get a pair of 1660 Tis and swap out two of my GTX 1060s. The dedicated machine can come later.

31 March 2019

31st of March

Farm status
Intel GPUs
Five running Seti

Nvidia GPUs
Two running Seti

Raspberry Pis
Twelve running Einstein and four running Seti


Other news
It's been hot and it's been cool. As it's cool at the moment I have most of the farm running.

Einstein O1OD1 work seems to have run out. I had one of the Intel GPU machines running it and the others running Seti. The farm is currently running Seti work while Einstein test a GPU version of their gravity wave app.


AMD Ryzen
They will be officially announcing the 3rd generation Ryzen line-up and pricing on the 7th of July, simply because it's the 7th day of the 7th month and they are 7nm CPUs. I will be waiting to see what the official specs are as I look to replace my Ryzen 1700's.

16 March 2019

16th of March

Farm status
Intel GPUs
Running Seti work

Nvidia GPUs
Two running Seti work

Raspberry Pis
12 running Einstein BRP4 work
4 running Seti work


Other news
Nvidia released the GTX 1660 (non-Ti). It's a bit faster than a GTX 1060 and of course cheaper than the 1660 Ti. It has GDDR5 memory, presumably to keep the price down, and it also has fewer CUDA cores (1408) than the Ti (1536).

I completed the install of the six NVMe SSDs into the Intel GPU machines. I swapped out their SSHDs and had to reinstall Linux and BOINC on them all, which was a little time-consuming.

Einstein seems to have finished off their O1OD1 search which I had the farm concentrating on. While they sort out another search I will concentrate on the other projects for a while.

09 March 2019

9th of March

Farm status
Intel GPUs
5 running Seti. Have been running Einstein

Nvidia GPUs
Off

Raspberry Pis
Off


Crunching news
I have been concentrating on the Einstein O1OD1 crunching. To even things out a bit I am currently running Seti work before going back to Einstein. The remaining Einstein O1OD1 data is expected to be processed in around 12 days. When that happens I will concentrate on the other projects until Einstein get another search going.


Hardware upgrades
I have ordered a bunch of 1TB NVMe SSD's to put into the Intel GPU machines. After the success of the proxy server I decided the Intel GPU machines would be next. They've already shipped and should arrive next week.

On the hardware front I am waiting on the EVGA GTX 1660 Ti XC Ultra to become available. There doesn't seem to be anyone selling them in Australia at the moment. I have been told that they would take 4-5 weeks after their release to become available.

The next thing on the upgrade list are the Ryzen machines. I am waiting on AMD to release the 3rd generation Ryzen CPUs around the middle of the year. I will be replacing my four existing Ryzen 1700's. I am still deciding which Ryzen CPU to replace them with.

24 February 2019

24th of February

Farm status
Intel GPUs
Running Einstein O1OD1 work overnight

Nvidia GPUs
Two running Einstein O1OD1 work overnight

Raspberry Pis
Off


Crunching news
We had a couple of cooler days with showers so I had the Intel GPU machines doing Einstein work with bursts of Seti thrown in.

I also had a couple of the Ryzens doing Einstein work overnight, as well as bursts of GPUgrid (short tasks) and Seti.


Other news
The proxy server had its motherboard swapped out. It's got an NVMe SSD, which took a couple of goes at installing Linux before behaving. It seems you have to tell grub to use UEFI mode to be able to boot from the SSD. I also had to flash the BIOS as the SSD wasn't appearing.
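
From memory the fix boiled down to booting the installer in UEFI mode and making sure grub went onto the EFI system partition rather than the MBR; roughly (assuming the EFI partition is mounted at /boot/efi):

# install grub as a UEFI boot loader and regenerate the config
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian
sudo update-grub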

The 6th generation i7's were all sold.

I reinstalled Linux on one of the Ryzen 7 machines due to it getting confused about the package dependencies for the Nvidia driver.


GTX 1660Ti officially released
Nvidia finally announced the GTX 1660 Ti. They look like a good replacement for my GTX 1060's so once they become available in quantity I will get some. The current driver in Debian doesn't support them but there is a driver available directly from Nvidia. I don't think they'll be able to get it into the Debian Buster release (expected in March or April) but it may become available soon after via their backports repo.

EVGA seem to have 3 versions of them, so I will probably go with the XC Ultra as it has a dual-slot cooler with dual fans (the other two versions have triple-slot coolers with a single fan). It's also got the highest boost speed of the three. There doesn't appear to be any stock of the XC Ultra at the moment so it might be a while before I can place an order.

03 February 2019

3rd of February

Farm status
Intel GPUs
All off

Nvidia GPUs
All off

Raspberry Pis
Idle


Weather impacting processing
We have flooding in the north of the country and bush fires in the south. Sydney has been hot, but we had two days of cooler weather so I managed to run the Raspberry Pis and Intel GPUs for a short period. It's hot again so they are off. It doesn't look like the weather is going to cool down any time soon.

In other news the parts for the proxy server are awaiting assembly. I also have someone interested in the 6th generation i7’s that I replaced so hopefully he’ll take the lot and that should make some room.

13 January 2019

13th of January

Farm status
Intel GPUs
Off. Have been doing bursts of Einstein work overnight

Nvidia GPUs
Off

Raspberry Pis
Idle. Have been doing bursts of Einstein work overnight


Other news
Parts were ordered last year for the Proxy server, but the CPU got back-ordered. The CPU has now been changed from an i3-8300 to an i3-8100 as they are in stock. The parts are supposed to arrive next week.

Linus Tech Tips did a video called "Using 6000 CPU cores for Science" which was about the LIGO gravity wave detectors. Unfortunately they didn't cover the Einstein project and how we process the data. The link is https://www.youtube.com/watch?v=sCuKuUgNfjA

The weather is affecting processing, so I am down to turning machines on in the evening, doing a burst of work and powering them off the following morning.


New GPU announcements
As expected Nvidia announced an RTX 2060 graphics card at CES 2019. The specs look quite good apart from the fact the Turing GPUs and the Einstein GPU apps won't work together. There is also talk of a GTX 1160, which is basically the same card without the ray tracing and Tensor cores. We'll have to wait for some of these to come to market but they sound like a better alternative for number crunching.

AMD also announced the Radeon VII graphics card, which could be useful for number crunching. It was targeted at gamers and comes with 16GB of video memory. It's built on 7 nanometer fabrication so hopefully it won't use too much power.

I expect I will be replacing GPUs in the 1st quarter of this year, so now it's just a matter of finding the right one to pick.