25 April 2018

25th of April

Farm status
Intel GPUs
Running Einstein and Seti work

Nvidia GPUs
Two running Seti work

Raspberry Pis
All running Einstein BRP4 work


Other news
I have been able to run some bursts of Seti and Einstein work overnight (their tasks take 9 to 10 hours). Einstein has started processing LIGO's 2nd observing run of gravitational-wave data, so that will keep things busy for 3 or 4 months. They have access to the Atlas cluster as well as whatever additional processing power the rest of us add.

AMD released the Zen+ CPUs, which use a slightly smaller 12 nanometre process, allowing them to increase clock speeds while using the same power as before. They have also tweaked the design a bit to improve cache hit rates. I don't think it's worthwhile upgrading just for a speed increase.

I still haven't managed to swap out the i7-6700s for the i7-8700s as my PC installer guy seems to have disappeared. Time to find another one, I think. At the moment I have one i7-8700 running and seven i7-6700s, which are a bit faster but have fewer cores. There are another five i7-8700s sitting in boxes.

The Nvidia GPU machines got an updated driver, so I have installed it on one machine. The last time they pushed out an updated driver it kept dropping the GPU into low-power mode, so I am a bit wary of driver updates as it's difficult to get back to the previous version.

08 April 2018

8th of April

Farm status
Intel GPUs
All off

Nvidia GPUs
All off

Raspberry Pis
All running Einstein BRP4 work


Other news
This last fortnight has been all about the Raspberry Pis. It's still too hot to be running the other machines, so I have been concentrating on the little ones.

First off was the arrival of the 11 Pi3 Model B+ boards and swapping out the Pi3 Model Bs. The first problem was a lack of heatsinks. I put as many into service as I could (5 of them) and ordered more heatsinks. Once the heatsinks arrived I decided I would use new SD cards rather than reusing the ones from the older Pis; a trip to the shops fixed that. Then came a late night imaging a bunch of SD cards, firing up each Pi3B+ and installing the software.
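Imaging the cards is just a block-for-block copy of the Raspbian image onto each one. Below is a minimal Python sketch of that step, with an assumed image filename and a placeholder device path; the device path has to be double-checked before running anything like this, because writing to the wrong device will wipe it.

```python
#!/usr/bin/env python3
"""Write a Raspbian image to an SD card in chunks (sketch only)."""
import os
import shutil
import sys

IMAGE = "raspbian-stretch-lite.img"   # assumed image filename
DEVICE = "/dev/sdX"                   # placeholder -- replace with the actual SD card device

def write_image(image_path: str, device_path: str, chunk_size: int = 4 * 1024 * 1024) -> None:
    """Copy the image onto the block device in 4 MiB chunks, then flush to disk."""
    with open(image_path, "rb") as src, open(device_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()
        os.fsync(dst.fileno())

if __name__ == "__main__":
    if DEVICE.endswith("X"):
        sys.exit("Edit DEVICE to point at the real SD card first.")
    write_image(IMAGE, DEVICE)
    print("Image written; safe to remove the card once it has synced.")
```

Doing it this way (or with dd) is the same for every card, so the late night was mostly waiting for the copies to finish rather than anything clever.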

Because I now had a bunch of spare Pi3 Model Bs, I decided to use one of them as an NFS server in conjunction with the PiDrive that wasn't doing anything. That has made life a lot easier, as I can now just copy the various config files from it into the appropriate directories instead of what I used to do (manually edit each file and cut and paste). I tried setting up an NFS server a couple of years ago but it wasn't reliable; this time it seems a lot better.
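Once the share is mounted on a Pi, the copying step is trivial. Here's a sketch of what I mean; the mount point, file names and destination paths are examples I've made up for illustration rather than the actual files used on the farm (though /var/lib/boinc-client is where the Debian/Raspbian BOINC client keeps its files).

```python
#!/usr/bin/env python3
"""Copy shared config files from an NFS mount into their local locations (sketch)."""
import shutil
from pathlib import Path

NFS_ROOT = Path("/mnt/nfs/pi-configs")   # assumed mount point for the NFS share

# Example mapping of shared file -> local destination (not the real list).
FILES = {
    "cc_config.xml": Path("/var/lib/boinc-client/cc_config.xml"),
    "ntp.conf": Path("/etc/ntp.conf"),
}

def deploy() -> None:
    for name, dest in FILES.items():
        src = NFS_ROOT / name
        if src.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)   # preserves timestamps
            print(f"copied {src} -> {dest}")
        else:
            print(f"missing on NFS share: {src}")

if __name__ == "__main__":
    deploy()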

At the moment I have upgraded 9 of the 10 compute nodes and one support node. I have one more compute node left to swap over; it is finishing off the work it has, which takes around 11 hours.

I looked at the 3rd Pi^4 case that I had and thought: why not put the two other compute nodes, currently in official Pi cases, into the Pi^4 case and get another two Pis? And while I am at it, let's replace the Pi3B that is running the NFS server with a 3B+ as well. I can feel the need to order more parts.

I broke a stand-off in one of the Pi^4 cases because the screw holding the Pi3B in got stuck. The head of the screw was stripped so the screwdriver couldn't get a grip, and in the end I had to deliberately break the stand-off to get the old Pi out. The M2.5 screws are so tiny and the metal isn't hard, so it's easy to strip their heads. It took half an hour just to get the piece of stand-off and the screw separated. Needless to say that screw got thrown away. I will have to glue the stand-off into the case now.


HTCondor
I have been using the freed-up Pi3Bs to experiment a bit with HTCondor. It's the software run on real clusters for scheduling batch jobs, and it's available in the Raspbian and Debian repositories. The HT stands for High Throughput. All was going fine until I enabled the firewall; after that the components couldn't talk to each other, so I am trying to resolve that.
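As a first debugging step I can at least check whether the central manager's collector port is reachable from a worker Pi with the firewall up. The sketch below assumes HTCondor's default port of 9618 and a made-up hostname for the central manager.

```python
#!/usr/bin/env python3
"""Check that the HTCondor collector port is reachable through the firewall (sketch)."""
import socket

CENTRAL_MANAGER = "condor-master.local"   # assumed hostname of the central manager
COLLECTOR_PORT = 9618                     # HTCondor's default collector/shared port

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if port_open(CENTRAL_MANAGER, COLLECTOR_PORT):
        print(f"{CENTRAL_MANAGER}:{COLLECTOR_PORT} is reachable")
    else:
        print(f"{CENTRAL_MANAGER}:{COLLECTOR_PORT} is blocked -- check the firewall rules")
```

If that port is blocked, opening it (plus whatever port range the daemons are configured to use) on each node is the obvious place to start.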

A number of compute clusters run HTCondor with BOINC as a backfill task; that is, if the cluster doesn't have anything else to run, it starts up a single instance of BOINC for each available core on each compute node. I don't think that's going to work too well on the Pis due to the lack of memory, however it should work on the larger machines, which don't have the memory constraints.
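The back-of-the-envelope numbers show why. The Pi figures below are the real specs (1 GB of RAM shared between 4 cores); the larger machine is an assumed example rather than one of my actual boxes.

```python
#!/usr/bin/env python3
"""Rough RAM-per-core comparison for one-BOINC-instance-per-core backfill (sketch)."""
MACHINES = {
    "Raspberry Pi 3B+": {"ram_mb": 1024, "cores": 4},          # real spec
    "Example desktop (assumed 16 GB, 6 cores)": {"ram_mb": 16384, "cores": 6},
}

for name, spec in MACHINES.items():
    per_core = spec["ram_mb"] / spec["cores"]
    print(f"{name}: {per_core:.0f} MB of RAM per core")

# The Pi ends up with only ~256 MB per core, which is tight for many BOINC
# science apps, whereas the desktop has well over 2 GB per core to play with.
```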