08 April 2018

8th of April

Farm status
Intel GPUs
All off

Nvidia GPUs
All off

Raspberry Pis
All running Einstein BRP4 work


Other news
This last fortnight has been all about the Raspberry Pis. It’s still too hot to be running the other machines, so I have been concentrating on the little ones.

First off was the arrival of the 11 Pi3 model B+ boards and swapping out the Pi3 model Bs. The first problem was a lack of heatsinks. I put as many into service as I could (5 of them) and ordered more heatsinks. Once the heatsinks arrived I decided to use new SD cards rather than reusing the ones from the older Pis. A trip to the shops fixed that. Then came a late night imaging a bunch of SD cards, firing up each Pi3B+ and installing the software.

Because I now had a bunch of spare Pi3 model Bs I decided to use one of them as an NFS server in conjunction with the PiDrive that wasn’t doing anything. That made life a lot easier, as I can now just copy the various config files from it into the appropriate directories instead of what I used to do (manually edit each file and cut and paste). I know I tried setting up an NFS server a couple of years ago but it wasn’t reliable. This time it seems a lot better.
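
The copy step can be sketched in Python. This is only an illustration, assuming the NFS export is already mounted on the Pi. The function name `deploy_configs` and the file-to-directory mapping are hypothetical and not taken from the actual setup.

```python
import shutil
from pathlib import Path


def deploy_configs(share, mapping):
    """Copy named files from a mounted NFS share into their target directories.

    share   -- Path to the mounted share (hypothetical, e.g. /mnt/nfs/configs)
    mapping -- dict of file name -> destination directory (illustrative only)
    Returns the list of destination paths that were written.
    """
    copied = []
    for name, dest_dir in mapping.items():
        src = Path(share) / name
        if not src.is_file():
            continue  # skip anything that is not present on the share
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps timestamps and permission bits along with the contents
        copied.append(Path(shutil.copy2(src, dest_dir / name)))
    return copied
```

A call might look like `deploy_configs("/mnt/nfs/configs", {"cc_config.xml": "/var/lib/boinc-client"})`, where both paths are assumptions about a typical Raspbian BOINC install rather than the ones used here.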

At the moment I have upgraded 9 out of 10 compute nodes and one support node. I have one more compute node left to swap over; it is finishing off the work it has, which takes around 11 hours.

I looked at the 3rd Pi^4 case that I had and thought, why not put the two other compute nodes, currently in official Pi cases, into the Pi^4 case and get another two Pis? And while I am at it, let’s replace the Pi3B that is running the NFS server with a 3B+ as well. I can feel the need to order more parts.

I broke a stand-off in one of the Pi^4 cases because the screw holding the Pi3B in got stuck. The head of the screw was stripped so the screwdriver couldn’t get a grip. In the end I had to deliberately break the stand-off to get the old Pi out. The M2.5 screws are so tiny and the metal isn’t hard, so it’s easy to strip the head on them. It took me half an hour just to get the piece of stand-off and the screw separated. Needless to say that screw got thrown away. I will have to glue the stand-off into the case now.


HT Condor
I have been using the freed-up Pi3Bs to experiment a bit with HT Condor. It’s the software they run on real clusters for scheduling batch jobs, and it’s available in the Raspbian and Debian repositories. The HT stands for High Throughput. All was going fine until I enabled the firewall. Since then I haven’t been able to get the components to talk to each other, so I am trying to resolve that.
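
One way to narrow down a firewall problem like this is to test whether a TCP connection to the central manager succeeds from each node. The sketch below assumes the collector is listening on port 9618, which is HTCondor’s default collector port. The helper name `can_connect` and the host name in the usage note are hypothetical. Other daemons may use additional ports depending on the configuration, so a successful check here does not prove the whole pool can communicate.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

Running `can_connect("pi-master.local", 9618)` from a compute node with the firewall on and then off would show whether the firewall rules are what is blocking that particular port.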

A number of compute clusters run HT Condor with BOINC as a backfill task: if the cluster doesn’t have anything else to run, it starts a single instance of BOINC for each available core on each compute node. I don’t think that’s going to work too well with the Pis due to their lack of memory, but it should work on the larger machines, which don’t have the same memory constraints.
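
The memory concern can be put into rough numbers. The sketch below is illustrative arithmetic only. The function name `backfill_instances` is hypothetical, and the per-task and reserved memory figures in the example are assumptions, not measurements from these machines. The Pi3B and Pi3B+ figures of 4 cores and 1 GB of RAM are the published specifications.

```python
def backfill_instances(cores, ram_mb, task_mb, reserved_mb=0):
    """Estimate how many one-per-core backfill instances fit in memory.

    cores       -- CPU cores on the node
    ram_mb      -- total RAM in MB
    task_mb     -- assumed memory needed by one instance, in MB
    reserved_mb -- memory held back for the OS and other services, in MB
    """
    usable = max(ram_mb - reserved_mb, 0)
    by_memory = usable // task_mb  # whole instances that fit in usable RAM
    # one instance per core at most, fewer if memory runs out first
    return min(cores, by_memory)


# Pi3B+: 4 cores, 1024 MB RAM; 400 MB per task and 200 MB reserved are assumed
print(backfill_instances(4, 1024, 400, 200))
```

With those assumed figures only two of the four cores could be kept busy, whereas a larger machine with 8 cores and 16 GB would be limited by its core count rather than its memory.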
