16 May 2015

16th of May

Farm news
The Intel-GPU machines have all been doing Climate Prediction work. Well, they were up until this afternoon, when the CPDN database went off-line, so I can't report work or do trickles. I have suspended them and we're concentrating on Asteroids work now.

I noticed a lot of the CPDN work units taking 165+ hours to complete when they normally take 110 hours. I put this down to CPU cache contention from running 8 at a time, so I am now running only 4 at a time per machine.

I did manage to do some GPUgrid work during the week. I was also testing a fix to BOINC to do with reuse of slot directories. The official fix will be coming in 7.6; at the moment we are testing the user option settings.

The Raspberry Pis still haven't officially switched to Debian Jessie yet, despite it becoming the stable release a week ago. Hopefully they will get there in a couple of weeks.


File server upgrade
The file server is off in the shop for an upgrade. The shop have already encountered their first problem: the CPU cooler doesn't have the right mounting bracket for an Intel socket 2011-v3, so they will need to get one in. As a result they now have it until next weekend. The case was full of dust despite regular cleaning and dust filters. I am taking the opportunity to replace all the case fans. They are still working; it's more preventative maintenance, seeing as the server runs 24x7.

I still need to find a suitable RAID controller for the file server that can do RAID 6 or better. The motherboard supports RAID 5. Statistically there is a good chance that a second drive will fail around the same time as the first one, and RAID 5 can only handle losing one drive, so the usual recommendation these days is RAID 6 or better.
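
To put a rough number on the second-failure worry, here is a minimal sketch. It assumes drives fail independently at a constant rate, which is a simplification: drives of the same vintage in the same case probably fail together more often than this suggests. The function name, the 5% annual failure rate and the 24 hour rebuild time are all made-up illustration values, not measurements from my drives.

```python
def p_failure_during_rebuild(remaining_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving drive fails while the array rebuilds.

    Assumes independent failures at a constant rate (a simplification).
    """
    hours_per_year = 24 * 365
    # Chance that a single drive fails within the rebuild window.
    p_one = 1 - (1 - annual_failure_rate) ** (rebuild_hours / hours_per_year)
    # Chance that at least one of the surviving drives fails in that window.
    return 1 - (1 - p_one) ** remaining_drives


# Hypothetical example: a 4 drive array that has lost one drive (3 left),
# an assumed 5% annual failure rate and an assumed 24 hour rebuild.
risk = p_failure_during_rebuild(3, 0.05, 24)
print(f"Chance of a second failure during the rebuild: {risk:.4%}")
```

With RAID 5 any failure in that window loses the array, whereas RAID 6 can ride out one more. Bigger drives mean longer rebuilds, so the window only gets wider.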

02 May 2015

2nd of May

Farm news
This week had a bit of the usual crunching, some Linux upgrades and a hard disk failure. Oh, and finalising the file server upgrade.

Crunching is continuing for the Climate Prediction ANZ work units which take about 5 days to complete (each). The Intel GPU part of the cluster is running them. I also have been running a few GPUgrid long work units that take about 11 hours on the GTX970 and Asteroids work on the CPU cores.

Debian Jessie was released to the public along with Ubuntu Vivid. I was already running Jessie on the Raspberry Pis, but they haven't officially updated to Jessie yet. I tried upgrading the Parallellas to Vivid by going to Utopic (14.10) and then upgrading to Vivid (15.04), but that failed. I had to reimage the SD card back to 14.04 and then update to Utopic. I suspect the kernel is too old as it has been stuck on 3.12.0 for quite a while.

The hard disk in one of the GPUgrid crunchers failed after it had been running overnight. Fortunately I have a few spares, so it's been swapped out with another of the same vintage. I had to reinstall Windows, BOINC and a few other apps. It had a WD Black manufactured in June 2012 and they have a 5 year warranty. The on-line retailer has gone out of business so the only option is to return it to Malaysia. Given the postage cost, it's not going to happen. I am surprised that WD don't have an Australian distributor or a collection point.


File server upgrade
The other thing this week had me chasing up the computer shop regarding the file server. I can't get the CPU I was originally after. Intel only shows them being available in trays, which means buying 50 or 100. I settled on the next best CPU, a 6 core/12 thread Xeon with an 83 watt power rating. Memory unfortunately has to be ECC and DDR4, so that is costing a bit.

I have also ordered some 4TB WD Se drives to go into the file server. I can then reduce the number of drives and still have more disk space. They will be in a 3 drive RAID 5 configuration, which should give around 8TB of usable space.
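
The capacity arithmetic is simple enough to sketch: RAID 5 gives up one drive's worth of space to parity and RAID 6 gives up two. The helper names below are my own for illustration, and the figures ignore filesystem overhead.

```python
def raid_usable_tb(drive_count, drive_tb, parity_drives=1):
    """Usable capacity in TB: parity_drives=1 for RAID 5, 2 for RAID 6."""
    # RAID 5 needs at least 3 drives, RAID 6 at least 4.
    if drive_count < parity_drives + 2:
        raise ValueError("not enough drives for this RAID level")
    return (drive_count - parity_drives) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (as sold) to binary tebibytes (as many tools report)."""
    return tb * 10**12 / 2**40


new_array = raid_usable_tb(3, 4)   # the new 3 x 4TB RAID 5
old_array = raid_usable_tb(4, 2)   # the existing 4 x 2TB RAID 5
print(f"New array: {new_array} TB ({tb_to_tib(new_array):.2f} TiB)")
print(f"Old array: {old_array} TB ({tb_to_tib(old_array):.2f} TiB)")
```

So one fewer drive still gives more space than the current 4 x 2TB setup. Going to RAID 6 with 4TB drives would take a fourth drive to get the same usable space.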

24 April 2015

23rd of April

Farm news
For the last few days Sydney has been hammered by a storm with strong winds and rain.

Meanwhile crunching continues. It's quite appropriate that I am running climate models. All the Intel GPU machines are running CPDN models, mostly the ANZ (Australian and New Zealand) ones that take around 120 hours, with a few short models that only take 50 hours.

I managed to also get a burst of GPUgrid work done. They keep running out of their short work units which take around 3 hours so I opted in to the long work units that take 9-12 hours. I ran a few before I had to button up the house due to the storm.


More upgrades
The continual upgrades keep happening. This time I am looking at the file server. It will get a new motherboard and CPU, and I will steal the memory out of one of the 6 core/12 thread machines, which in return will get faster memory. The rest of the parts will get reused.

The end result will be a file server that can expand, as the new motherboard has 10 SATA III ports (the old one has 6 SATA II ports and they're all in use). It has a few PCIe slots (the old one only has one), it supports disks larger than 2TB and the CPU power drops from 120W to 85W.

I will update the hard disks at some later date to bigger capacity ones, but fewer of them. It only takes 3 drives to make a RAID 5 array. There is no urgency to replace the existing drives, which are 4 x 2TB.

17 April 2015

17th of April

Farm news
The weather has been cooler for most of this week so I have had the Intel GPU machines running climate models (still going; they're up to 113 hours so far) and some Asteroids work. Asteroids have finally fixed their missing files issue so work is now flowing again.

CPDN announced that they will only target 1 particular platform (Windows, Linux or Mac) for each type of climate model in future to save on development and improve their reliability. I would think that it may be easier to issue the work units as VirtualBox VM images so they don't need to get involved in which operating system to target.


Intel driver update
Intel released driver 10.18.10.4176 for the HD4000 so I was trying it with Einstein. It actually seems to work. The last few releases from Intel haven't worked. I didn't do many work units but managed to get the BRP4 work units done and then some Parkes PMPS XT (aka BRP6) work units. The bad news is it's quite a bit slower than the (recommended) 10.18.10.3621 driver. I didn't try it with Seti and have since gone back to the 3621 driver as it's faster.


BOINC testing
We got an early look at the preference changes in 7.5.0. They seemed to work fine but I have suggested some cosmetic changes. Others have also asked for additional settings such as an "in use" and a "not in use" set of preferences. No word yet on whether they are coming or not.


Windows updates
Got a few fixes again for Patch Tuesday, as it's known. There was the usual run around to update the farm. Also a few for the Raspberry Pis (Debian Jessie).

While that has been going on I have been trying to get the Windows time service (w32time) to behave and keep the PCs' clocks more accurate. Microsoft chose to do their own version of the NTP client that works somewhat differently from the standard NTP software. Anyway, after fiddling with a few things and using Google a lot I have them working as they should be.