11 November 2018

11th of the 11th (month)

Farm status
Intel GPUs
All have been running Einstein O1OD1 work overnight

Nvidia GPUs
All off

Raspberry Pis
Twelve running Einstein BRP4 work


Other news
This week has been all about the file server and ZFS.

The Samsung 970 Pro SSD arrived along with the Silverstone ECM21 adapter card. The only gripe I have with it is they didn't include a screw to hold the SSD in place, even though there are holes for it on the card. Cheapskates. I don't have any in my stockpile that fit, so I ended up using a cable tie. I have since gone onto eBay and ordered a packet of 12 screws which I hope are the right size.

Installation was straightforward: no drivers are needed, and Linux saw it as a disk device and gave it an nvme device name. I just used gparted to create a partition table and format it. My original idea was to split it into two partitions and have one as a ZFS intent log (aka ZIL) and the other as a cache drive; however, ZFS likes to deal in whole drives rather than partitions, so it refused to mount the second partition. I tried the drive both as a ZIL and as a cache device. A ZIL doesn't need to be big: about 8GB is enough even for a 10GbE network, as it only needs to hold around five seconds' worth of writes.
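For anyone wanting to try the same, adding a whole drive as a log or cache device is a couple of commands. This is a sketch only: the pool name tank and the device name /dev/nvme0n1 are illustrative, and in practice a /dev/disk/by-id path is safer.

```shell
# Use the NVMe drive as a separate intent log (SLOG)
zpool add tank log /dev/nvme0n1

# ...or take it out again and re-add it as an L2ARC cache device instead
zpool remove tank /dev/nvme0n1
zpool add tank cache /dev/nvme0n1

# Either way, it should show up under a "logs" or "cache" heading
zpool status tank
```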

I tested by dragging a 3.7GB test folder on and off my Drobo 5N2, deleting it from the destination before each run.

The Drobo has 2 x 1GbE ports bonded together, 4 x 4TB WD Red hard disks and a 256GB mSATA SSD. The storage server has a 10GbE NIC and 4 x 4TB WD SE (enterprise grade) hard drives plugged into an Intel controller in JBOD mode. They're both plugged into the same Netgear switch. One 10GbE port on the switch is an uplink to a 10GbE switch, which in turn has a 1GbE link to the router. The other 10GbE port is plugged into the storage server, and the Drobo is plugged into two of the 1GbE ports. That makes the theoretical maximum speed 1 gigabit/sec, i.e. the speed of the slowest link in the path. Allowing roughly 10 bits on the wire for every byte of data (line encoding and framing overhead), we should be able to push 100MB (that's right, megabytes) a second.
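A quick sanity check on that arithmetic (the 10-bits-per-byte figure is just a rough allowance for overhead, not an exact protocol number):

```shell
# 1 gigabit/sec link, ~10 bits on the wire per byte of payload
echo $(( 1000000000 / 10 ))           # bytes/sec  -> 100000000
echo $(( 1000000000 / 10 / 1000000 )) # MB/sec     -> 100
```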

The best I could get was 49MB/sec, which drops down to around 20MB/sec before climbing back up, probably the SSD on the Drobo buffering data. That was going from the Drobo to the storage server. Setting the NVMe up as a cache drive or as a ZIL didn't really make any difference, regardless of which direction I was copying the files. As far as I understand it, the ZIL is hardly used because these are asynchronous writes, and the cache only helps with data being read a second or subsequent time. If I had lots of users or random reads happening then it possibly would have made a difference, but for copying stuff in and out it doesn't change the speed. Ideally I should have used two devices that can do 10GbE so as to remove the network speed from the equation.
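ZFS will actually show you this directly if you want to confirm it on a live system. A sketch, again assuming the pool is called tank:

```shell
# Per-device activity, refreshed every 5 seconds while a copy runs;
# the "logs" and "cache" sections show whether the SSD is being touched
zpool iostat -v tank 5

# With the default sync=standard, asynchronous writes bypass the ZIL,
# which is why the SLOG sits idle during bulk copies
zfs get sync tank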

I know the reviews recommend the Intel Optane 900P as a ZIL device, but they're AUD $529 for the 280GB model, compared to the $250 that I spent on the Silverstone adapter and Samsung 970 Pro (512GB). That's double the price for half the capacity. You really need two mirrored devices for the ZIL to ensure data integrity as well.

I really liked Linus Tech Tips' NVMe-based server (a Supermicro SSG 2028R, I believe) connected to his hard disk based server, which sounds cool. Just connect the two with that 40GbE Mellanox controller and a short bit of fibre optic cable, and move the files off the fast storage to the slower one at lightning speed, or at least as fast as the hard disks can go. Maybe I should just buy an SSG 2028R...

If anyone has any suggestions on how to optimise it without throwing too much more hardware at it let me know. As always I'm happy to hear any comments.

03 November 2018

3rd of November

Farm status
Intel GPUs
All off

Nvidia GPUs
All off

Raspberry Pis
12 crunching Einstein BRP4 work


File server
I’ve converted it over to Linux, and that is when the problems started. The RAID controller wasn’t recognised, even in JBOD mode, so I removed it and plugged the drives directly into the motherboard SATA ports. I set up ZFS on the 4 drives without any issues. When I copied the files back it took ages, indicating only about 75MB/sec write speed, which I think is pretty slow. Maybe the ZFS overheads are such that that is a good figure.
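For the record, the pool setup itself is a one-liner. This is a sketch assuming a single-parity raidz layout; the pool name and sdX device names are illustrative, and /dev/disk/by-id paths are the safer choice in practice:

```shell
# Pool the four 4TB drives into a single-parity raidz vdev
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd

# Verify layout, health and usable space
zpool status tank
zfs list tank
```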

I managed to get the RAID controller going, but write speeds are still 75MB/sec. I have ordered a PCIe-to-NVMe adapter and an NVMe SSD to use as a cache and logging drive. NVMe SSDs are somewhat faster than SATA, which only goes up to 6 gigabit/sec. Hopefully this will speed things up. I'm not sure if I need a second one (i.e. one for cache and another for logging).


Networking updated
I ordered and received a couple of 10GbE network switches (the ones with 2 x 10GbE and 8 x 1GbE ports) and put them in. I also got a full 8 x 10GbE switch to plug into the router, so I can use the higher speed down to the smaller switches and into the file server. I still have an extra 10GbE network card that I could put into the file server.


New LIGO search
Einstein have a new search they are conducting on the LIGO O1 data. It's CPU-only. I did some testing to see how it performs on the Intel GPU machines using all available threads versus using half of them. It certainly does more work using all threads.
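For the all-threads-versus-half test, one way to do it without touching the web preferences is BOINC's cc_config.xml override. A sketch, telling the client on a 6-core/12-thread i7-8700 to act as if it only has six CPUs:

```xml
<cc_config>
  <options>
    <!-- report 6 CPUs to the scheduler instead of the 12 the i7-8700 presents -->
    <ncpus>6</ncpus>
  </options>
</cc_config>
```

The client picks this up on restart, or via the manager's read-config-files option.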

I started doing the same on the AMD machines, but after the first batch running on half the threads there wasn't any new work available. Being 1st-generation Ryzens, they are pretty poor on the simultaneous multithreading front, whereas Intel has had quite a few generations to optimise their designs.

21 October 2018

21st of October

Farm status
Intel GPUs
Six i7-8700's running Asteroids and Seti

Nvidia GPUs
Off

Raspberry Pis
12 running Einstein BRP4 work


Other news
I got a PC assembler out and have now got all six of the i7-8700's swapped in. The i7-6700's that I used before are now decommissioned. That brings the Intel GPU part of the farm up to 36 cores/72 threads. They're slower than the 6th generation but due to the increased core count still produce more work.

With the new machines I also had to install Linux, which is easy enough to do when I follow my own step-by-step instructions. I then discovered a slight problem with the way I'd set up BOINC on them, so I had to reinstall it on the existing machines as well. I am now working around a quirk of the Asteroids project server giving some machines the SSE2 or SSE3 app, which is slower than the AVX version.

The ECC memory for the file server arrived and has been installed. There is no obvious benefit at the moment; that will come once I get it running Linux and ZFS, which is the next step. I may get an SSD to use as a cache drive. I did use one of the other machines and a couple of external USB drives to practice getting ZFS running. Things left to try before I convert it are setting up Windows file shares using Samba and getting the UPS recognised.
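For the Samba part of that to-do list, a minimal share definition is enough for a first test from Windows. The share name, path and username below are placeholders:

```ini
# added to smb.conf (typically /etc/samba/smb.conf, but the path varies by distro)
[tank]
   path = /tank/share
   read only = no
   valid users = someuser
```

After adding the user to Samba (something like `smbpasswd -a someuser`) and restarting smbd, the share should be reachable from the Windows machines.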


The new router/gateway device also showed up so that is something else for me to work on.

13 October 2018

13th of October

Farm status
Intel GPUs
Two i7-8700’s running Seti
Three i7-6700’s running Asteroids and Seti

Nvidia GPUs
Two running GPUgrid and Seti

Raspberry Pis
Twelve running Einstein BRP4 work


Supermicro BIOS update
I haven’t updated the Supermicro BIOS since I got the machine. It came with BIOS version 1 and they are up to version 3 now. I decided it's about time it got updated, due to all the Intel Spectre patching that is happening, and downloaded the latest from their website. The instructions say to put it on a DOS boot disk and run it. How do you create a DOS boot disk in this day and age? I have a USB floppy drive and MS-DOS 6 upgrade diskettes, but even that didn't work.

After a lot of googling through suggestions that didn't work, I ran across one that did: using Rufus to create the boot disk. After that it was fairly straightforward, apart from the couple of times when the machine rebooted and appeared to have died (it powers off); leaving it to do its thing, it got there. I expected a reboot after applying the BIOS, but not two while booting up again.


Load balanced internet
After last week's issue where the Telstra internet connection wasn't working to US destinations but the other one was, I had a bit of a look at load balancers. It seems they are usually combined with a network firewall appliance, so I might get one of those. I don't think the firewall built into my current routers is particularly effective, so it should also improve network security.

I need to get the Telstra internet connection changed back to ADSL and get another modem to get it going. Personally I would like to get rid of the home phone, which would solve the problem of scam phone calls from overseas, but the wife wants to keep it. This would all change if/when the NBN comes around, as there is no point in having two phone lines when the NBN offers speeds up to 100Mbit on a single connection. They don't expect to be doing my area until 2019.


File server changes
One of the other projects I have is to replace the file server; however, I am looking at reusing the existing one running under some derivative of Linux, in order to move away from Microsoft products. That was one of the reasons to update the BIOS. I plan on setting the RAID controller to JBOD mode and then using ZFS for the file system. The server doesn't have ECC memory, which is recommended for ZFS; it can take it, but I didn't buy any at the time due to cost.