Farm status
Intel GPUs
All have been running Einstein O1OD1 work overnight
Nvidia GPUs
All off
Raspberry Pis
Twelve running Einstein BRP4 work
Other news
This week has been all about the file server and ZFS.
The Samsung 970 Pro SSD arrived along with the Silverstone ECM21 adapter card. The only gripe I have is that they didn't include a screw to hold the SSD in place, even though there are holes for it on the card. Cheapskates. I don't have any in my stockpile that fit, so I ended up using a cable tie. I have since gone onto eBay and ordered a packet of 12 screws which I hope are the right size.
Installation was straightforward: no drivers are needed, and Linux saw it as a disk device and gave it an nvme device name. I just used gparted to create a partition table and format it. My original idea was to split it into two partitions and use one as a ZFS intent log (aka ZIL) and the other as a cache drive, however ZFS prefers to deal in whole drives rather than partitions, so it refused to use the second partition. I tried it both as a ZIL and as a cache drive. A ZIL doesn't need to be big; about 8GB is enough even for a 10GbE network, as it only needs to hold about 5 seconds worth of writes.
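As a rough sanity check on that sizing, here's a back-of-the-envelope calculation. The 5 seconds is the figure quoted above, and the 10-bits-per-byte fudge factor is the same rule of thumb I use for the network maths further down; it's just a sketch, not gospel:

```python
# Back-of-the-envelope ZIL (SLOG) sizing: it only has to absorb a few
# seconds worth of incoming writes before ZFS flushes them to the pool.
def zil_size_gb(link_gbits, seconds=5, bits_per_byte=10):
    """Worst-case data arriving over the link in `seconds`, in gigabytes."""
    bytes_per_sec = link_gbits * 1e9 / bits_per_byte
    return bytes_per_sec * seconds / 1e9

for speed in (1, 10):
    print(f"{speed}GbE link: {zil_size_gb(speed):.2f} GB")
# 1GbE link: 0.50 GB
# 10GbE link: 5.00 GB  -> so 8GB is plenty
```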
I tested by dragging a 3.7GB test folder on and off my Drobo 5N2, deleting it from the destination before running each test.
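If you want to put a number on the copy rather than eyeballing the file manager, a quick and dirty timing script like the one below does the job. This is just a sketch; the paths are made up and you'd point them at the Drobo share and the ZFS dataset:

```python
import shutil
import time
from pathlib import Path

SRC = Path("/mnt/drobo/testfolder")     # hypothetical mount points,
DST = Path("/tank/scratch/testfolder")  # adjust to suit

def timed_copy(src: Path, dst: Path) -> float:
    """Copy a folder and return the average throughput in MB/sec."""
    if dst.exists():
        shutil.rmtree(dst)  # delete the destination before each run, as per the test
    start = time.time()
    shutil.copytree(src, dst)
    elapsed = time.time() - start
    total_bytes = sum(f.stat().st_size for f in dst.rglob("*") if f.is_file())
    return total_bytes / elapsed / 1e6

print(f"{timed_copy(SRC, DST):.1f} MB/sec")
```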
The Drobo has 2 x 1GbE ports bonded together, 4 x 4TB WD Red hard disks and a 256GB mSATA SSD. The storage server has a 10GbE NIC and 4 x 4TB WD SE (enterprise grade) hard drives plugged into an Intel controller in JBOD mode. They're both plugged into the same Netgear switch: one 10GbE port on the switch is an uplink to a 10GbE switch, which in turn has a 1GbE link to the router, and the other 10GbE port goes to the storage server. The Drobo is plugged into two of the 1GbE ports. That makes the theoretical maximum speed 1 gigabit/sec, i.e. the speed of the slowest device in the path. Assuming roughly 10 bits on the wire for every byte of data (to allow for line coding and protocol overhead), that means we should be able to push about 100MB (that's right, megabytes) a second.
The best I could get was 49MB/sec, which drops down to around 20MB/sec before climbing back up, which is probably the SSD on the Drobo buffering data. That was going from the Drobo to the storage server. Setting the NVMe drive up as a cache drive or as a ZIL didn't really make any difference, regardless of which direction I was copying the files. As far as I understand it, the ZIL is hardly used because the writes are asynchronous (it only comes into play for synchronous writes), and the cache only helps with data being read a second or subsequent time. If I had lots of users or lots of random reads happening then it possibly would have made a difference, but for copying stuff in and out it doesn't change the speed. Ideally I should have used two devices that can do 10GbE so as to remove the network speed from the equation.
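Putting the numbers side by side, here's a rough comparison of the theoretical ceiling against what I actually saw, using the same 10-bits-per-byte assumption as above:

```python
link_gbits = 1            # slowest link in the path (the Drobo's 1GbE)
bits_per_byte = 10        # rough allowance for line coding/overhead
measured_mb_per_sec = 49  # best observed, Drobo -> storage server

theoretical = link_gbits * 1e9 / bits_per_byte / 1e6  # MB/sec
print(f"Theoretical max : {theoretical:.0f} MB/sec")
print(f"Measured        : {measured_mb_per_sec} MB/sec "
      f"({measured_mb_per_sec / theoretical:.0%} of theoretical)")
print(f"3.7GB test folder at that rate: {3.7e3 / measured_mb_per_sec:.0f} seconds")
```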
I know the reviews recommend the Intel Optane 900P as a ZIL device, but it's AUD $529 for the 280GB model, compared to the $250 I spent on the Silverstone adapter and the Samsung 970 Pro (512GB). That's double the price for half the capacity. You really need two devices (mirrored) for the ZIL to ensure data integrity as well.
I really liked Linus Tech Tips' NVMe-based server (a Supermicro SSG 2028R, I believe) connected to his hard-disk-based server, which sounds cool. Just connect the two with that 40GbE Mellanox controller and a short length of fibre optic cable and move the files off the fast storage to the slower one at lightning speed, or at least as fast as the hard disks can go. Maybe I should just buy an SSG 2028R...
If anyone has any suggestions on how to optimise it without throwing too much more hardware at it, let me know. As always, I'm happy to hear any comments.