28 December 2020
28th of December
CPU only
Running Einstein work
Intel GPUs
Off
Nvidia GPUs
Off
Raspberry Pis
Running Einstein work
Farm news
Another hot week so not much crunching done.
Rosetta has some issue where it doesn't have any work to send, despite the server status page saying there are 11 million work units ready. In the meantime, everything that is running is doing Einstein work.
19 December 2020
19th of December
CPU only
Running Rosetta
Intel GPUs
Four running Rosetta
Nvidia GPUs
Off
Raspberry Pis
Six running Einstein. Three running Einstein and Rosetta (50/50 split).
Other news
It's been a hot and humid week, so everything except the Pis has been off. Today is a somewhat cooler 24 degrees C with drizzle on and off (i.e. 100% humidity). I've fired up a few of the Intel GPU machines and have them doing Rosetta.
I am still waiting for the 2nd Ryzen 5900X to be shipped. The supplier status is "We are expecting a small shipment this week. Once we have confirmed numbers, we will update with an expected new queue position. Queue is just shy of 50% filled since launch". I ordered the second 5900X on the 6th of November, so it's been six weeks so far. Maybe Santa will bring a new CPU for Christmas.
Next year plans
I was thinking of rationalizing the farm a bit by bringing it up to 4 x Ryzen 5900X machines and getting rid of the Intel GPU machines completely (6 x i7-8700). That reduces the number of physical machines without losing too many CPU cores.
Another area that I would like to look at is an AMD GPU machine or two. Currently there are 4 x Ryzen 3600 machines with Nvidia GTX 1660 Ti cards in the farm. I haven't used AMD GPUs before due to issues with their drivers under Linux.
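Here is a quick check of the core counts as a Python sketch. It counts physical cores only, so SMT threads are ignored. It also leaves out the four Nvidia GPU machines, which aren't part of the change.

```python
from __future__ import annotations

# Physical cores per CPU model (SMT threads ignored).
CORES = {"Ryzen 5900X": 12, "Ryzen 3900X": 12, "i7-8700": 6}


def total_cores(machines: dict[str, int]) -> int:
    """Sum physical cores over a {cpu_model: machine_count} mapping."""
    return sum(CORES[cpu] * count for cpu, count in machines.items())


# Current state: one 3900X already swapped for a 5900X, plus the six
# Intel GPU machines. The Nvidia GPU machines are left out of the comparison.
current = {"Ryzen 5900X": 1, "Ryzen 3900X": 1, "i7-8700": 6}
proposed = {"Ryzen 5900X": 4}

if __name__ == "__main__":
    before, after = total_cores(current), total_cores(proposed)
    print(f"{sum(current.values())} machines, {before} cores -> "
          f"{sum(proposed.values())} machines, {after} cores "
          f"({before - after} fewer, {100 * (before - after) / before:.0f}% less)")
```

On those assumptions, going from 8 machines to 4 costs 12 of 60 cores, about 20%.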
Ampere Altra
They are now shipping and some reviews have started to surface. Apparently the 80-core 3.3GHz model is faster than an AMD Threadripper at compute performance. I made inquiries about getting one but never heard back from my server supplier. I will have to ask again.
12 December 2020
12th of December
CPU only
Running Rosetta
Intel GPUs
Off
Nvidia GPUs
Ran some Rosetta. Currently off
Raspberry Pis
Pi3s running Einstein. Pi4s running Einstein and Rosetta.
For news on the Raspberry Pis see Marks Rpi Cluster
GPU issues
I went to use a couple of the Nvidia GPU machines and found that the GPU was missing in BOINC. This is caused by the "GPL Condom" code added to the 5.9 Linux kernel. It blocks drivers that use both GPL and proprietary symbols. All of my Nvidia GPU machines were on the 5.9.6 kernel.
The display part works fine; it's the CUDA and OpenCL capabilities that are blocked, which means I can't use the GPUs for compute work. To work around this I ended up downgrading two of the machines to the 4.19 kernel that Debian buster is currently running. While it's easy to select an earlier kernel on the Grub boot menu, it's another thing to remove the newer kernel(s) from the machine without reinstalling, and that took a bit of experimenting.
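Working out which kernel packages to purge is the fiddly part. The Python sketch below picks out the linux-image-* packages that are newer than the running kernel from a list of installed package names. It makes two assumptions. First, the list would come from something like dpkg-query -W -f='${Package}\n' 'linux-image-*'. Second, Debian package names carry the kernel ABI version (for example 5.9.0-...) rather than the full upstream version, which is good enough for an ordering check. The package names in the example are hypothetical. If the linux-image-amd64 meta-package is installed from backports, it may pull a newer kernel back in on the next upgrade.

```python
from __future__ import annotations

import re

# Leading "major.minor[.patch]" of a kernel release or package name,
# e.g. "4.19.0" in "linux-image-4.19.0-13-amd64".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def kernel_version(text: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) found in a kernel release or package name."""
    m = _VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"no kernel version in {text!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def newer_kernel_packages(packages: list[str], running: str) -> list[str]:
    """Return the linux-image-* packages whose version is newer than `running`."""
    current = kernel_version(running)
    newer = []
    for pkg in packages:
        if not pkg.startswith("linux-image-"):
            continue
        try:
            version = kernel_version(pkg)
        except ValueError:
            # Meta-packages such as "linux-image-amd64" carry no version number.
            continue
        if version > current:
            newer.append(pkg)
    return newer


if __name__ == "__main__":
    # Hypothetical package list; on a real machine it would come from dpkg-query.
    installed = [
        "linux-image-4.19.0-13-amd64",
        "linux-image-5.9.0-0.bpo.2-amd64",
        "linux-image-amd64",
    ]
    print(newer_kernel_packages(installed, "4.19.0-13-amd64"))
```

Once the machine has booted into the older kernel, the packages this reports could be removed with apt purge.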
24 November 2020
Ryzen 5900X issues
Ryzen 5900X update
As I mentioned in my previous post I received one Ryzen 5900X CPU. I have replaced one of the Ryzen 3900X with it. Before swapping out the old CPU I did a BIOS update to get the latest AGESA (1.1.0.0) to support the Ryzen 5000 series. The Linux kernel doesn't like it. When it boots up it throws quite a few of these messages:
EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
It still manages to start up though. I haven't tried doing any crunching with it.
From internet searches it would seem there are upstream EDAC patches for device 0x1650, but I'm not sure what kernel one needs to be running to get them. The machine is running a 5.8.10 kernel. I tried 5.9.6 from Debian bullseye but that didn't help. There is a 5.9.9 in testing which I haven't tried. If you run Linux, check that you can get a (very) recent kernel before you install your nice shiny new Ryzen.
Update 1 Dec 2020
I should point out that the machine still works despite the 24 EDAC errors above.
I have raised a bug with Debian but haven't had any response. I don't know which kernel will contain a fix. I tried the 5.9.9 kernel and that didn't help.
Debian have an upcoming release of bullseye, expected in March 2021. The 5.10 kernel has been listed as an LTS (Long Term Support) version so I wouldn't be surprised if they put the 5.10 kernel into bullseye and hopefully it contains the necessary patch.
Update 19 Dec 2020
Debian closed the bug. The patch is in the 5.10-rc7 kernel. The kernel team has released 5.10, but it had a couple of bugs in the RAID code, so they have already released 5.10.1. Debian experimental has this at the moment and I expect it will be pushed up into bullseye in time for its release.
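Here is a small Python check along those lines, for anyone about to drop a Ryzen 5000 into a Linux box. Going by the bug closure above, it assumes that any 5.10 or later kernel carries the EDAC fix. Release candidates before 5.10-rc7 would not have it, and distribution kernels may backport it to older versions, so treat the answer as a hint rather than proof.

```python
import platform
import re

# Assumption: the EDAC patch for device 0x1650 is present in 5.10 and later.
FIXED_IN = (5, 10)


def has_edac_fix(release: str) -> bool:
    """True if a kernel release string such as '5.10.0-1-amd64' is 5.10 or newer."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= FIXED_IN


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    try:
        verdict = "should have the fix" if has_edac_fix(release) else "needs 5.10 or later"
    except ValueError:
        verdict = "could not be parsed"
    print(f"kernel {release}: {verdict}")
```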
21 November 2020
21st of November
CPU only
Running Rosetta work.
Intel GPU
Off
Nvidia GPU
Off
Raspberry Pis
Three running Rosetta and Einstein. Six running Einstein.
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
Most of the week has been hot so the farm was off, apart from the Pis. A cool change came through yesterday so I have the CPU only machines running Rosetta.
I received one of the Ryzen 5900X CPUs that I ordered. When one of the CPU only machines finishes its work I will swap out its Ryzen 3900X. I have not had any update from the other supplier about the second CPU that I ordered.
The ELCB (Earth Leakage Circuit Breaker), aka safety switch, has tripped a few times this week. I have two power circuits that have safety switches and it's always the same one. Isolating which power point is causing it has proved impossible; all I know is that it's on the circuit that powers all the computers, most of which have been off this past week. I should split the computers across two circuits to spread the load, but that means rewiring the place, which I am trying to avoid.
06 November 2020
Ryzen 5900X sold out on day one
Last night I stayed up until 1AM for the official availability of the 4th generation Ryzen. I used two of my regular PC suppliers' websites.
I was searching for the Ryzen 5900X and around 1:05AM I got a hit, added it to my cart and went straight to the checkout. I got as far as the payment method and then it told me there was no stock, even though it was in PreOrder status. I tried again with the same result, so I guess the bots got them all.
There was one up on eBay for $1600 plus $42 shipping from Germany around 1:10AM.
The other PC supplier's website didn't show them until I checked again at 7:30AM this morning. It had them as Order Only status and wanted $20 above the recommended retail price. It's now 10:45AM and the price is back to the recommended retail of $859 AUD, so I guess I will order one now.
Update 8 Nov 2020
The following is from one computer shop that I placed an order with around 11AM. Apparently they have plenty of 5600X and 5800X. It looks like I am going to have to wait quite a while for a 5900X though.
Ryzen 9 5900X
Very high demand and limited stock. Our shipment arriving early next week will cover only ~15% of orders placed up to 9am Friday. We are waiting on updates from AMD for what we can expect to receive throughout November.
Ryzen 9 5950X
High demand and limited stock. Our next shipment due early next week will cover ~60% of orders placed up to 9am Friday. We are waiting on updates from AMD for what we can expect to receive throughout November.
31 October 2020
31st of October
CPU only
Running some Einstein and Asteroids work
Intel GPUs
Off. Did some Rosetta work earlier in the week.
Nvidia GPUs
Off. Did some GPUgrid and Rosetta earlier in the week
Raspberry Pis
Running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
Rosetta ran out of work. One of the nice things about their work units, when you can get them, is that they run for 8 hours, which makes it easier to run work overnight.
I mentioned in my last blog post that I was looking at updating the SSDs in the CPU only machines. I have ordered a couple of 1TB Sabrent Rocket SSDs to go in them. Hopefully they will alleviate the I/O bottleneck when running Rosetta work.
Linux bugs
I updated the Intel GPU machines with the latest kernel available in buster-backports, which is 5.8. After applying the updates the machines now switch to 640x480 resolution when one logs out. It switches back to the correct resolution (1680x1050) when I log in, but it's really big (and annoying) on a 22 inch monitor. The Nvidia GPU machines aren't doing this.
There is a 5.9 kernel already in the testing release, so hopefully that will fix it. The 5.10 kernel is meant to be a long-term release (that is, they'll fix bugs for a couple of years), so I think they will push the 5.10 kernel through fairly soon now that development on it has completed.
Update 3rd of November
Sabrent Rocket SSDs installed in both of the Ryzen 3900X machines. They seem faster than before.
25 October 2020
25th of October
CPU only
Running Rosetta overnight only.
Intel GPUs
Off
Nvidia GPUs
Two running GPUgrid and Milkyway overnight
Raspberry Pis
Three running Rosetta and Einstein. Ten running Einstein.
Other news
It's been warm, so the farm is running work overnight when it's cool enough.
When starting up 24 Rosetta tasks there is quite a delay before they get going. I believe this is an I/O bottleneck, because if I stagger the start-up there isn't any delay. I have been looking at the Sabrent Rocket 4 NVMe SSD to replace the existing gen 3 SSDs in the CPU only machines.
I am looking at getting Ryzen 5900Xs to replace the 3900X CPUs when they become available on the 5th of November. I'm not sure if I will be able to get two of them, as I expect they'll be very hard to come by straight after launch. I should be able to swap out the CPU and keep everything else in the machines as is.
27 September 2020
27th of September
CPU only
Running Rosetta work.
Intel GPUs
Off. Did some Rosetta overnight.
Nvidia GPUs
Running bursts of GPUgrid and Milkyway work.
Raspberry Pis
Three running Einstein and Rosetta work. Ten running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
Updated CUDA
Nvidia released CUDA 11.1 to support the RTX30 cards. Not that anyone can get them at the moment.
There are reports that some of the 3rd party cards are having issues due to changes the manufacturers made to the power delivery sub-system. JayzTwoCents did a video on this in which he tears down a number of RTX30 cards so you can see what some of the manufacturers did. The Nvidia Founders Edition cards are not affected.
There are reports that the GPUgrid ACEMD3 app fails with an "unknown arch" error, so hopefully they won't take too long to update it.
Einstein (OpenCL) apps work although may not be as fast as they could be.
So where does that leave me? I tend to buy GPUs that are around the 100-120 watt power level, which the current cards are well above. There is a rumored RTX 3060 which might fit my needs. It will be some months until supply is sorted out and we don't have any details of the 3060, which means more waiting.
19 September 2020
19th of September
CPU only
Running Einstein gravity wave work overnight
Intel GPUs
Off
Nvidia GPUs
Two running Einstein work overnight and some Milkyway work.
Raspberry Pis
Running Rosetta and Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
BOINC 7.16.11 released
The Windows and MacOS versions of the BOINC client were released on the 8th of September. I asked the Linux maintainers to push the 7.16.11 client to buster-backports for Debian. They already had it in bullseye (the next release of Debian). Gianfranco has a PPA for Ubuntu.
I have updated all of my machines today, except for the Pi3s, which are still on BOINC 7.14.2 because they don't point to buster-backports.
Other news
I spent last weekend building both the proxy server and submit nodes. I had quite a bit of trouble with the submit node trying to get the hard disk drives in.
The submit node is in a Fractal Design Meshify C case that can hold an ATX size motherboard. The drive cage doesn't leave enough room at its rear for the power cables. The cage itself is removable and held in by 4 screws from the bottom of the case. I had to move it approximately 1 inch to the right. I ended up having to drill two holes in the bottom of the case and two into the drive cage for the screws.
05 September 2020
5th of September
CPU only
Running Rosetta work
Intel GPUs
Off
Nvidia GPUs
Off
Raspberry Pis
Three running Rosetta and Einstein (50/50). Ten running Einstein
For news on the Raspberry Pis see Marks Rpi Cluster
The last week
We had a couple of warm days (it got up to 30 degrees C) where only the Raspberry Pis were running. It is only the start of spring. I have since got the CPU only machines going on Rosetta work.
New additions
I have ordered parts for two new machines, some of which have already arrived. One will be a replacement for the proxy server and the other is intended to be used as an HTCondor submit node.
Proxy server
Part | Desc |
Case | Fractal Design Node 1100 (from old one) |
PSU | Seasonic (from old one) |
Motherboard | Asus B550M-K |
CPU | Ryzen 3300X |
Cooler | Noctua NH-U9S |
Memory | HyperX Predator DDR4-3200 2 x 8GB (spare) |
Network | 2.5GbE NIC (spare) |
Storage | 512GB Samsung 970Pro (from old one) |
Submit node
Part | Desc |
Case | Fractal Design Meshify C |
PSU | Seasonic G series 360W (spare) |
Motherboard | Asus X570-P/CSM |
CPU | Ryzen 3600 (spare) |
Cooler | Noctua NH-U9S |
Memory | Kingston DDR4-2400 ECC 2 x 16GB |
Network | 10GbE Asus XG-C100C (spare) |
Storage | 512GB Samsung 970Pro and 2 x 8TB Toshiba X300 HDD (spare) |
The ECC memory seems to be stuck at the courier depot. They say they attempted to deliver yesterday, but with 3 people at home none of us heard anything. They usually leave a card in the letterbox but didn't this time so I suspect they didn't even try.
As you can see I have reused parts where possible to keep the cost down. I might get a Meshify C Mini case for the proxy server rather than reusing the old case. I have 3 Toshiba X300 drives as spares but the case only has two 3.5" HDD bays.
Update: 10 Sep 2020
The ECC memory turned up the following day without me having to do anything. The other parts have arrived. I forgot the X570-P motherboard so had to place another order for it and the Meshify C Mini case. I am now waiting on them to arrive.
30 August 2020
August 30
CPU only
Running Einstein work.
Intel GPUs
Idle
Nvidia GPUs
Off
Raspberry Pis
Running Einstein and the occasional Rosetta work.
For news on the Raspberry Pis see Marks Rpi Cluster
Radio BOINC Network
They do regular podcasts about BOINC and the various projects. To quote their website "BOINC Radio is a participatory podcast hosted on the BOINC Network Discord server". You can join in and ask questions if you want, or just listen to the podcasts at your leisure. Their website is https://www.boinc.network/
Idling along
The weather is warming up. It's almost spring in Sydney, so the farm is idle during the day and processing overnight when it's cooler.
As mentioned in my last post the workload is transitioning from all Rosetta to a mix of Rosetta and other projects.
23 August 2020
23rd of August
CPU only
All running Rosetta work.
Intel GPUs
All running Rosetta work.
Nvidia GPUs
Off.
Raspberry Pis
Three running Rosetta. Ten running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
Replacement KVMs and cables arrived. I swapped out one of the old KVMs and I can now see! It still has an issue with the colors on one of the machines, but if I wiggle the VGA plug on the DVI to VGA adapter it works, so I think that might be the adapter. It looks like I have to get another couple of DVI to VGA adapters.
Rosetta doesn't seem to be doing much Covid-19 related work these days. They have moved onto other research judging from the names of the work units. In the next week or two I will resume the other projects that I paused. I will still run Rosetta work but it will get less resource share.
Oh, and there is an asteroid designated 2018 VP1 that could potentially impact the Earth on the 2nd of November. It's only 2 meters in diameter and currently rated as a 0.41% chance according to NASA. That might change as they make more observations.
16 August 2020
16th of August
CPU only
Running Rosetta work.
Intel GPU
Idle. Did some Asteroids and Rosetta.
Nvidia GPU
Off.
Raspberry Pis
Three running Rosetta. Ten running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
I can't see
I went to install some software from a USB key on the Intel GPU machines but the 8 port KVM (Keyboard Video Mouse) switch has stopped working. When powered on it doesn't display its usual 00 and I can't select any of the computers. I have ordered a couple of replacements (working on the assumption the other one will also stop working soon). They are both around 5 years old.
I had an idea there might be a problem last week, as the colors started going off on two of the machines when I selected them. Thinking it might have been the cables, I ordered a set of 4 cables earlier in the week, but the fact that I can't select any computer now means it's the KVM that is faulty rather than the cables.
Fortunately I can ssh into them and do the install using the command line. It's a lot simpler to drag and drop files using a GUI than it is using the command line.
Idling
The weather has been warm during the day so I have had to set the Intel GPU machines to idle during the day and have them working overnight. It is still winter in Sydney.
I am trying to keep the two Ryzen 3900X machines running 24/7 as they do as much work as the six Intel GPU machines.
I managed to get to position 50 in the Rosetta Top Participants list before the NBN installers came around on Thursday last week. Since then my results per day have fallen dramatically.
08 August 2020
8th of August
CPU only
Running Rosetta work
Intel GPUs
Five running Rosetta work
Nvidia GPUs
Off
Raspberry Pis
Three running Rosetta work. Ten running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
Software updates
As you can tell, the farm is still concentrating on Rosetta work. It's currently up to 55th place in the Top Participants list. Despite the "BootHole" patching (twice) and a Debian point release during the week, the recent average credit is still climbing.
More interruptions
The NBN installers are back late next week to finish off the fibre connection so that will mean a break in production while they do their bit. Hopefully that will be the last interruption for a while.
Internet connectivity has been good since switching from DSL to the NBN network. At the moment they have fibre to the building and then they're using the existing copper phone line for the last hop. The highest speed plan they offer using this method is an NBN50 plan (50Mbit downloads).
30 July 2020
30th of July
20 July 2020
Second Ryzen 3900X build
12 July 2020
12th of July
05 July 2020
5th of July
27 June 2020
27th of June
21 June 2020
20th of June
CPU only
Running Rosetta work
Intel GPUs
Running Rosetta work
Nvidia GPUs
Two running Rosetta work
Raspberry Pis
Three running Rosetta work. Ten running Einstein work.
Networking changes
I am in the process of switching to the NBN network. It should provide a higher speed internet connection. I've signed up for a 50 Mbit plan (50 Mbit download and 10 Mbit upload).
At the moment I have two DSL connections with two separate modem/routers which makes two separate local area networks. One has the x64 machines and the other has the Raspberry Pis. I will have to combine them into a single network.
Other news
I got a shipping notification for the 2nd Ryzen 3900X build. The rest of the parts should arrive next week. I already have the power supply and memory.
I've made it to 60th place on the Rosetta@home top participant list. I will be dropping back because today was rather warm (30 degrees C in my computer room) and I have to reduce the heat being produced by the computers. I've set all of the Intel GPU and Nvidia GPU machines to no new tasks so they finish off the work they currently have.
07 June 2020
7th of June
CPU only
Running Rosetta work.
Intel GPUs
All running Rosetta work.
Nvidia GPUs
Two running Rosetta work.
Raspberry Pis
Two running Rosetta work. Ten running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
The farm has climbed to 73rd place in the Rosetta top participant list. I don't think I will be able to get much higher. Most of their top 100 participants are running multiple servers with lots of cores. The top one that I could look at (in 4th place) had 95 computers with 12, 24 and 40 core CPUs.
The Ryzen 3900X that is on back-order got delayed again. The problem part seems to be the motherboard. All the other parts are in stock and have been allocated to the order. The power supply and memory that I ordered from other suppliers arrived weeks ago.
30 May 2020
30th of May
For news on the Raspberry Pis see Marks Rpi Cluster
Type | Number | CPU | GPU | Memory |
CPU only | 1 | AMD Ryzen 3900X (12C/24T) | GT710 | 64GB |
Intel GPU | 6 | Intel i7-8700 (6C/12T) | Built-in | 32GB |
Nvidia GPU | 4 | AMD Ryzen 3600 (6C/12T) | GTX 1660 Ti | 32GB |
23 May 2020
23rd of May
CPU only
Running Rosetta work.
Intel GPUs
All running Rosetta work
Nvidia GPUs
Two running Rosetta work
Raspberry Pis
Two running Rosetta. Six running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
The 64GB memory kits arrived. I've installed one kit in the CPU only machine and removed the 4 sticks of 16GB that it had. They've been reinstalled into the remaining Nvidia GPU machines. That brings all the 6 core/12 thread machines up to 32GB of memory. The 64GB kits have RGB lighting which I find rather annoying.
The other CPU only machine that I ordered is delayed. The motherboard seems to have been back ordered and keeps changing dates as to when it will arrive.
2.5GbE networking
The 2.5GbE network cards arrived. They only need a PCIe gen3 x1 slot, but when I plugged one into an x1 slot the kernel started spamming the console with messages about PCIe bus errors. After moving it to a free x16 slot it worked fine, so it is probably something to do with how the PCIe slots are wired up.
The first card went into the proxy server and the second card is in the CPU only machine. I still need to try some speed tests between them and the 10GbE machines (the storage servers).
I used to have the gigabit ports bonded so they share the network traffic but have now switched to running the 2.5 gigabit card as primary and the motherboard (gigabit) port as an active backup.
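On Debian with ifupdown and the ifenslave package, an active-backup bond with the 2.5GbE card as primary can be described in /etc/network/interfaces. The Python sketch below only renders such a stanza. The interface names (enp5s0 for the 2.5GbE card, eno1 for the onboard port), DHCP addressing and the 100 ms link-monitoring interval are assumptions for illustration.

```python
def active_backup_stanza(bond: str, primary: str, backup: str,
                         miimon_ms: int = 100) -> str:
    """Render an /etc/network/interfaces stanza for an active-backup bond.

    The faster NIC is named as bond-primary so it carries traffic whenever
    its link is up; the backup port only takes over if the primary fails.
    """
    lines = [
        f"auto {bond}",
        f"iface {bond} inet dhcp",
        f"    bond-slaves {primary} {backup}",
        "    bond-mode active-backup",
        f"    bond-primary {primary}",
        f"    bond-miimon {miimon_ms}",  # link check interval in milliseconds
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names: enp5s0 = 2.5GbE card, eno1 = onboard gigabit port.
    print(active_backup_stanza("bond0", "enp5s0", "eno1"), end="")
```

Unlike the earlier bonded setup, which shared traffic across both gigabit ports, active-backup only ever uses one link at a time. That suits a mix of port speeds.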
Debian don't have a driver for the Realtek RTL8125. Support for the chip was added in August 2019 but the Debian firmware-realtek package is from July 2019. I ended up getting the driver directly from the Linux kernel web site.
I've ordered some more of these EDUP cards off eBay. The original seller increased the price to 69 AUD but I found another seller with them listed in GBP which worked out to 60 AUD each. I only need one more for the next CPU only machine.
Running two network ports per machine now means I need more network cables. Something else to buy.
Update: 26 May 2020
I managed to get one of them to do close to 2.5Gbit: according to iperf3 it was doing 2.27Gbit. That was from a 10Gbit machine, through a few switches, into a machine with a 2.5Gbit card. I tested in both directions.
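For context, 2.27Gbit isn't far off the practical ceiling. Ethernet framing and TCP/IP headers take part of the line rate, so iperf3, which reports payload throughput, can never show the full 2.5Gbit. The Python sketch below works out that ceiling. It assumes a standard 1500-byte MTU, IPv4 without options, TCP with the 12-byte timestamp option and no VLAN tag. On those assumptions the ceiling is about 2.35Gbit, which puts the measured result at roughly 96% of it.

```python
# Per-frame overheads in bytes (assumed: no VLAN tag, IPv4 without options).
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
IP_HEADER = 20
TCP_HEADER = 20 + 12  # base header plus the timestamp option


def tcp_goodput_gbit(line_rate_gbit: float, mtu: int = 1500) -> float:
    """Maximum TCP payload rate for a given link speed and MTU."""
    payload = mtu - IP_HEADER - TCP_HEADER
    on_wire = mtu + PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP
    return line_rate_gbit * payload / on_wire


if __name__ == "__main__":
    ceiling = tcp_goodput_gbit(2.5)
    measured = 2.27  # iperf3 result from the test above
    print(f"ceiling {ceiling:.2f} Gbit/s, measured {measured} Gbit/s "
          f"({100 * measured / ceiling:.0f}% of ceiling)")
```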
16 May 2020
16th of May
CPU only
Running Rosetta work
Intel GPUs
All running Rosetta work
Nvidia GPUs
One running Rosetta work
Raspberry Pis
Two running Rosetta work. Six running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
The farm continues to concentrate on Rosetta work on as many cores as possible (within power limitations).
Another Ryzen 3900X build
I ordered parts for another Ryzen 3900X system as they are on sale. It's the same spec as before. The power supply has already arrived. It came from a different supplier. The remaining parts should arrive next week.
The 64GB memory kits (2 x 32GB sticks) came back into stock so I ordered two kits, one for each of the Ryzen 3900X builds. These are from yet another supplier.
The PCH fan on the current Ryzen 3900X system is not spinning. It doesn't seem to be causing any problem at the moment. It's part of the motherboard so I might have to drop the machine off for repair. I would expect the shop would have to replace the motherboard as the fan is soldered on. I will wait until I have built the 2nd Ryzen 3900X before doing this.
2.5GbE networking
I found some EDUP network cards on eBay for 55 AUD each. They have a Realtek chip. I bought two. They are coming from China so I wouldn't be surprised if they fail after a few months or they are really 1GbE network cards. I should be able to get my money back if either happens.
I was thinking of putting one in the proxy server and another in the Ryzen 3900X build. Now that I have a second Ryzen 3900X on order I will probably put one in each of them. I will wait and see how they work before buying any more.
10 May 2020
10th of May
CPU only
The Ryzen 3900X is running Rosetta work
Intel GPUs
All running Rosetta work
Nvidia GPUs
Off
Raspberry Pis
Two running Rosetta work. Six running Einstein BRP4 work.
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
Most of last fortnight was taken up with the Ryzen 3900X build, getting it going and doing a burn-in on different projects.
I spent a little time on the proxy server. I put a spare 10GbE network adapter into it which didn't make any noticeable difference. The hardware side was simple, however changing the bonding to adaptive failover took most of my time. There are examples on the internet and most of them don't work any more. Rosetta changed their project comms to HTTPS so that means we cannot cache downloads for them. In the end I went back to the dual 1GbE network adapter that I have used for the last year and bonded the ports.
2.5GbE networking
I did a bit of research into 2.5GbE network adapters. There are a few (mostly based upon Realtek chips). Intel has an i225 chip but it has a bug where the inter-packet gap is incorrect, and it requires a respin of the silicon to fix. Aquantia also had one but seem to sell it for more than the 10GbE network adapters.
None of the computer stores are selling them in Australia. There are USB 3 to 2.5GbE external adapters available. I found one PCIe card on Amazon coming from the US. It was 87 AUD. There was confusion over its warranty with Amazon stating it had an "International warranty". Of the 10 reviews two people complained their card failed after 3 months.
Ryzen 3900X build
The first issue was with the thermal paste squirting over the side of the CPU which I cleaned up as best I could.
The second turned out to be the additional power connectors on the motherboard. I had plugged in both of the aux 12 volt power connectors and it refused to POST. The fans wouldn't even spin up. I thought that maybe the thermal paste had shorted something out, but apparently Noctua thermal paste is electrically non-conductive. After checking one of my other Ryzen builds I unplugged the 2nd aux 12 volt power connector and that got it going.
Rather than taking pictures of the computer I thought I would share the BoincTasks view of it running 23 Rosetta tasks. I've left one thread free for the operating system/other programs.
And how top sees the same workload.
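One way to leave a thread free is BOINC's "Use at most N % of the CPUs" computing preference (max_ncpus_pct in global_prefs_override.xml). The Python sketch below works out the percentage needed to run 23 of 24 threads. It assumes the client truncates threads × pct / 100 to a whole number, so the percentage has to be rounded up. On that assumption, 95.83% would give only 22 tasks.

```python
import math


def max_ncpus_pct(threads: int, leave_free: int = 1) -> float:
    """Smallest percentage (2 decimal places) that runs threads - leave_free tasks."""
    wanted = threads - leave_free
    # Round up in hundredths of a percent so truncation can't drop a task.
    return math.ceil(wanted * 10000 / threads) / 100


def usable_cpus(threads: int, pct: float) -> int:
    """Assumed client behaviour: truncate, but never drop below one CPU."""
    return max(1, int(threads * pct / 100))


if __name__ == "__main__":
    pct = max_ncpus_pct(24)
    print(pct, usable_cpus(24, pct), usable_cpus(24, 95.83))
```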
I wanted to get a 64GB memory kit (2 x 32GB) for it, which would allow me to put the existing memory into the Ryzen 3600s, which is what it was intended for. Unfortunately the supplier selling the 64GB memory kit doesn't have any stock. It's G.Skill Trident Z DDR4-3200 memory, which I haven't used before.
Another upgrade I would have liked to do was a 2.5GbE network adapter, but none of the computer places seem to sell them in Australia. I could get 10GbE but they cost 180 AUD. In the meantime I will just put a spare 1GbE adapter in it and bond the ports together.
I might build another one of these because the price on the Ryzen 3900X has dropped again, in fact all of the Ryzen 3000 series seem to be on sale at the moment. They might be clearing stocks before the Ryzen 4000 desktop CPUs come out.
26 April 2020
26th of April
Intel GPUs
Running Rosetta work overnight
Nvidia GPUs
Two running Rosetta work overnight
Raspberry Pis
Two running Rosetta. Six running Einstein work.
For news on the Raspberry Pis see Marks Rpi Cluster
Parts orders
All my orders came at once. Computer parts, that is: Noctua 140mm fans, memory kits and a replacement X10SRi-F motherboard.
I installed two of the memory kits (2x16GB) into the two Nvidia GPU machines that are running Rosetta work, leaving them with two free memory slots. I didn't realize they had HyperX Predator originally which meant I couldn't use all 4 memory slots because the heat spreader on them is so tall it gets in the way of the CPU fan. The new kits are HyperX Fury which has a lower profile heat spreader.
I installed a Noctua fan in the front of one of the Intel GPU machines and it was louder than the one it replaced, so I went back to using the original case fan. The Noctua fans run at 1200 RPM but the original fans run at 1000 RPM. I didn't try the low noise or ultra low noise adapters that the Noctuas come with because they simply slow down the fan, reducing its airflow. I might try the Noctuas in the back of the case where it's not so loud.
I haven't even looked at the Supermicro X10SRi-F motherboard yet.
Another build
While checking my usual PC parts online shops I noticed the Ryzen 9 3900X is on sale, so I decided I would build one as a CPU compute node.
Part | Desc |
Case | Fractal Design Meshify C |
CPU | Ryzen 3900X |
Cooler | Noctua NH-U12S |
Disk/SSD | Samsung EVO Plus 1TB |
Graphics | ASUS GT710 2GB |
Memory | |
Motherboard | ASUS X570-P/CSM |
PSU | Seasonic Focus Gold 550 |
I didn't buy memory. I will use the two 32GB memory kits that were going to go into the Nvidia GPU machines. The GT710 is a passively cooled card but it only needs to display a desktop.
19 April 2020
19th of April
Intel GPUs
All running Rosetta work
Nvidia GPUs
Two running Rosetta
Raspberry Pis
Two running Rosetta. Six running Einstein.
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
As you can tell most of the farm is running Rosetta as much as possible trying to help out with Covid-19 research.
I have ordered six Noctua cooling fans to replace the case fans in the front of the Fractal Design Arc Midi cases that contain the i7-8700 motherboards. The existing fans all work fine and I even have a couple of spares, however the Noctua fans have a higher airflow, are quieter and last longer.
Storage server
The first part of the storage server upgrade was to increase its disk capacity. I posted about it in March. When I did that upgrade I found the motherboard had a problem with one of the memory sockets (or the controller). I ordered a replacement motherboard but I am still waiting for it to be shipped.
The second part of the upgrade will be CPU and memory. I want to give it 128GB of memory (it supports up to 256GB). It has a Xeon E5-2620 v3 CPU at the moment that I will upgrade to a v4 Xeon, which is the latest the motherboard supports.
11 April 2020
Easter Saturday 2020
Intel GPUs
All running Rosetta work
Nvidia GPUs
All off
Raspberry Pis
Two running Rosetta, Twenty running Einstein.
For news on the Raspberry Pis see Marks Rpi Cluster
Rosetta
I have all the Intel GPU machines running Rosetta, along with a couple of the Pi4s. They've been running 24/7 for the last couple of weeks. Rosetta are looking at proteins in relation to Covid-19. While they have recently had a large influx of users, they can always use more computers.
It's a single-threaded, CPU-based app available for x64 machines (Linux, OSX and Windows) and they also have an app for ARM64, also known as aarch64 (Android and Linux). The Pi4s are running the aarch64 app.
BOINC 7.16
The BOINC developers have released the 7.16.5 app for Windows and 7.16.6 for Linux and OSX. I have been running 7.16.1 on all the farm for a couple of months now. I only had one issue which is listed as having been fixed, however I need to wait for Debian to make the later version available.
04 April 2020
4th of April
Intel GPUs
All running Rosetta work
Nvidia GPUs
All off
Raspberry Pis
One running Rosetta work, six running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Replacements
After running memtest86+ on the i7-8700 that was giving memory errors, it seems I have a failed stick of memory. I bought another couple of memory kits (4 x 8GB sticks) to replace all the memory in it. The failed pair were sent back under warranty and the remaining pair were removed once the replacements arrived. That got the machine back to 32GB, which it needs to run some of the Rosetta work.
I have an order in for a replacement for the X10SRi-F motherboard in the storage server. It has a failed memory socket or controller. Hopefully it's not the controller, because that is built into the CPU. The order has been delayed; I have been told it won't arrive for a fortnight but suspect it will be much longer. In the meantime the machine still works as long as I don't use the failed socket.
Project news - Rosetta
They have been overwhelmed by the number of new users (much like Folding@home). They released an updated app along with a new app they refer to as "Rosetta for Portable devices" which has both Android and Linux (aarch64) versions.
I have all six i7-8700s running Rosetta at the moment, searching for various proteins to do with Covid-19. I could run the Ryzens as well, but some of the work units need 1.5GB of memory each and the Ryzens only have 16GB installed. I would have to limit them to running on half the cores or upgrade the memory.
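The memory arithmetic behind that choice can be sketched in a few lines of Python. This is a rough estimate only: the 2GB reserve for the OS and desktop is my assumption, and the generated file uses BOINC's `app_config.xml` element `project_max_concurrent`, which caps how many tasks from one project run at once. Check the BOINC client configuration documentation for your version before relying on it.

```python
import math
import xml.etree.ElementTree as ET


def max_tasks(ram_gb: float, per_task_gb: float, threads: int, reserve_gb: float = 2.0) -> int:
    """How many tasks fit in RAM, capped at the thread count.

    reserve_gb is an assumed allowance for the OS and desktop, not a BOINC setting.
    """
    usable = max(ram_gb - reserve_gb, 0.0)
    return min(threads, math.floor(usable / per_task_gb))


def app_config_xml(limit: int) -> str:
    """Build a minimal app_config.xml that caps concurrent tasks for one project."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # A 16GB Ryzen with 12 threads, using the 1.5GB per task figure from the text.
    n = max_tasks(16, 1.5, 12)
    print(n)  # 9
    print(app_config_xml(n))
```

With 1.5GB tasks this allows 9 at once; if tasks grow to 2GB the same function gives 7, which is why running on roughly half the threads is the safer setting on a 16GB machine.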
29 March 2020
29th of March
Intel GPUs
Four running Rosetta work
Nvidia GPUs
Three running Einstein gravity wave work
Raspberry Pis
Eight running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Folding@home
It's all about the coronavirus these days. Some tech commentators have suggested running Folding@home; however, they have so many new users that they can't generate enough work to keep up with demand.
I gave their Linux app a try. It's a stand-alone app. It gave errors on install because it uses Python 2, which is deprecated, but the CPU client did work. It's a multi-threaded app and used 11 out of 12 threads on the machine I tried it on. It didn't find the OpenCL run-time library, so it wouldn't use the GPU; it's probably looking in the wrong place for it. I removed it. They need to update it for current Linux distributions, which use Python 3 and place the OpenCL libraries differently.
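One way to see what an OpenCL app should be able to find is to check for the ICD loader library and the vendor `.icd` files. This is a quick Python sketch assuming the usual Linux layout, where vendor drivers register themselves in `/etc/OpenCL/vendors` (Debian's `nvidia-opencl-icd` package typically does this); it doesn't show where Folding@home itself looks.

```python
import ctypes.util
import glob
import os


def find_opencl(vendor_dir: str = "/etc/OpenCL/vendors") -> dict:
    """Report where (or whether) the OpenCL ICD loader and vendor ICDs can be found."""
    return {
        # find_library returns a name like "libOpenCL.so.1", or None if not found
        "loader": ctypes.util.find_library("OpenCL"),
        # Each .icd file names a vendor driver library for the ICD loader to open
        "icd_files": sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))),
    }


if __name__ == "__main__":
    info = find_opencl()
    print("ICD loader:", info["loader"] or "not found")
    for path in info["icd_files"]:
        with open(path) as f:
            print(path, "->", f.read().strip())
```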
Rosetta@home
Another project studying coronavirus proteins, but this one is BOINC-based. They only have a CPU app and it uses a single thread, though you can run multiple instances. There are two apps: Rosetta Mini, which uses around 400-500MB of memory, and the standard Rosetta app, which can use up to 2GB of memory per thread. I've been running it on the Intel GPU machines as they have 32GB. Each task runs for a set length of time, which defaults to 8 hours, but you can make it more or less via the project preferences on the web site.
One of the Intel GPU machines failed half its work units with memory errors, so once it finishes its current work I will have to have a look at it. Because the other half work, I suspect one or two sticks of memory aren't seated properly.
GPUgrid
They've said they will also be doing some coronavirus protein studies, but don't currently have any. They have an Nvidia GPU app. I have been running some of their current work recently.
28 March 2020
X10SRi Storage Server rebuild
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:09 with 0 errors on Sat Mar 28 03:59:10 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        pool1                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca298c13b97  ONLINE       0     0     0
            wwn-0x5000cca298c143ed  ONLINE       0     0     0
            wwn-0x5000cca298c13f04  ONLINE       0     0     0
            wwn-0x5000cca298c16623  ONLINE       0     0     0
            wwn-0x5000cca298c14c7f  ONLINE       0     0     0
            wwn-0x5000cca298c16afa  ONLINE       0     0     0
            wwn-0x5000cca298c152cd  ONLINE       0     0     0
            wwn-0x5000cca298c1431f  ONLINE       0     0     0

errors: No known data errors

# df /pool1
Filesystem       1K-blocks      Used   Available Use% Mounted on
pool1          75325838848 162769536 75163069312   1% /pool1
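To put the df figures into more familiar units, here is a small Python sketch. The 14TB drive size is an assumption based on the 14TB drives ordered in March; the gap between the nominal raidz2 figure and what df reports is roughly ZFS overhead (parity padding, metadata and reserved space).

```python
def df_summary(blocks_1k: int, used_1k: int, avail_1k: int) -> dict:
    """Convert df 1K-block figures into decimal TB and binary TiB."""
    size_bytes = blocks_1k * 1024
    return {
        "size_tb": size_bytes / 1e12,
        "size_tib": size_bytes / 2**40,
        "used_gb": used_1k * 1024 / 1e9,
        # df rounds Use% up, which is why a fraction of a percent shows as 1%
        "used_pct": 100 * used_1k / blocks_1k,
        "adds_up": used_1k + avail_1k == blocks_1k,
    }


def raidz2_nominal_tib(drives: int, drive_tb: float) -> float:
    """Nominal data capacity of one raidz2 vdev: two drives' worth goes to parity."""
    return (drives - 2) * drive_tb * 1e12 / 2**40


if __name__ == "__main__":
    s = df_summary(75325838848, 162769536, 75163069312)
    print(f"{s['size_tb']:.1f} TB = {s['size_tib']:.2f} TiB, {s['used_gb']:.1f} GB used")
    print(f"raidz2 nominal: {raidz2_nominal_tib(8, 14):.2f} TiB")
```

For this pool it works out to about 77.1TB (70.15TiB) usable, against a nominal 76.40TiB for eight 14TB drives in raidz2.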
15 March 2020
15th of March
Intel GPUs
One running Einstein gravity wave work
Nvidia GPUs
Two running GPUgrid work
Raspberry Pis
Eight running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Project news - GPUgrid
They've got an experiment going with lots of work units. My GTX 1660 Tis are taking about an hour and a half per work unit. I'm only running two machines so I don't trip the circuit breaker. There is no shortage of work at the moment.
Einstein gravity wave work
I have been running the gravity wave work on CPU for a while, and some of the frequencies are now using quite a bit of memory. So much so that if I try to run 12 at once on the Nvidia GPU machines, only 7 run and the others end up waiting for memory. They have 16GB of memory, but some of the work units are using 2GB each.
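A plausible explanation for the "7 of 12" figure, assuming the BOINC client's memory preference is at what I believe is the default of using at most 90% of RAM when the computer is idle, and that every task needs a full 2GB: 90% of 16GB is 14.4GB, which fits 7 tasks. The Python sketch below is a simplified model of that check, not the client's actual scheduling code.

```python
def admit_tasks(ram_mb: int, limit_pct: int, task_mb: list) -> tuple:
    """Greedily start tasks until the next one would exceed the memory budget.

    Simplified model: real BOINC scheduling also considers deadlines and
    measured (not just estimated) working-set sizes.
    """
    budget = ram_mb * limit_pct // 100
    running, waiting = [], []
    used = 0
    for mb in task_mb:
        if used + mb <= budget:
            running.append(mb)
            used += mb
        else:
            waiting.append(mb)  # what BOINC reports as "waiting for memory"
    return running, waiting


if __name__ == "__main__":
    # 16GB machine, 12 tasks of ~2GB each, assumed 90% memory limit
    running, waiting = admit_tasks(16 * 1024, 90, [2048] * 12)
    print(len(running), "running,", len(waiting), "waiting")  # 7 running, 5 waiting
```

The same model with 32GB admits all 12, which matches the Intel GPU machines being able to run them.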
I even got one of the Intel GPU machines running them because they have 32GB of memory, but they're a lot slower.
Storage server
I ordered some larger hard disks for one of the storage servers. It only had 32GB of memory, so I swapped in the memory from the other one. I found that the X10SRi-F motherboard has a fault in DIMM socket C1, so I can't populate the memory as recommended. Supermicro stopped making the X10 motherboards, so I might have to try getting a second-hand one to replace it. I will also see if it's possible to get the motherboard repaired. In the meantime I have put the 32GB back in the sockets to the left of the CPU, which work (the C1 socket is to the right of the CPU).
I was inspired by Linus (of Linus Tech Tips), who built a relatively cheap storage server in a Fractal Design "Define" case and managed to stuff 20 hard disks into it: https://www.youtube.com/watch?v=FAy9N1vX76o
I wouldn't recommend going above 16 drives, as the case only has 16 drive bays. He attached drives to the top and back of the case, which doesn't do much for reliability. My file server with 32GB of memory is in a Fractal Design Define R2 case, an older model with 8 + 2 drive bays. I think 8 drives is enough for my purposes. It currently has a SATA SSD as the boot drive and 4 x 4TB drives.
Update 17 Mar 2020
I need to make a couple of corrections to the Storage server details above.
1. The X10SRi-F motherboards are still available.
2. The case Linus used was a Fractal Design Define 7 XL, which is larger than the standard Define 7.
I still wouldn't recommend bolting drives onto the top or back of the case though, just use the drive bays that it comes with.
Meanwhile I've ordered 4 x 14TB HDDs, and today I ordered 3 more. The price went up $51 between my first and second orders (6 days apart). Oh, and they have been delayed.
07 March 2020
7th of March
Intel GPUs
One running Einstein gravity wave work
Nvidia GPUs
All running Einstein gravity wave work
Raspberry Pis
Eight running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Project news - Seti
Sad news this week. They announced they are going into "hibernation" from the 31st of March. That generally means the end of the project.
Other news
I've been doing some GPUgrid work on a couple of the Nvidia GPU machines after we got updated drivers via the Debian buster-backports repo. I also did some Milkyway work. Since then I have updated all of the Nvidia GPU machines to the 440.59 drivers.
The Intel "neo" drivers have finally made it into a Debian repo. I have installed them on one of the Intel GPU machines but haven't tried any GPU crunching on it yet. Generally it slows the whole CPU down, so it's not a good idea to use both at the same time.
I was looking at building a Ryzen 3950X machine; however, with the news that Seti is closing down, I have decided not to proceed with it. Einstein CPU work is very demanding on the memory system, so a 16-core CPU sharing two memory channels wouldn't suit their app.
And in more news this week, there is yet another security bug in Intel's Management Engine (ME) and they don't think they can fix it. Supposedly the 10th generation CPUs aren't affected, but I can't see why anyone would buy them given all the security flaws.
23 February 2020
23rd of February
Intel GPUs
All off
Nvidia GPUs
Had all of them doing Einstein and Milkyway work
Raspberry Pis
All running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Other news
I had a couple of wins this week. Firstly I managed to get Debian Buster to work on the Nvidia GPU machines and second I got some work from GPUgrid after almost a year of not processing anything from them.
This may seem like a repeat of my last post, but after getting the upgrade to Buster working I wiped the machine and redid it, and that failed again. Then the 5.4.13 kernel was released, which seems to have got it going again.
Debian Buster
There were two issues with Debian Buster.
The 4.19 kernel locks up at boot time waiting for entropy. That was addressed by none other than Linus Torvalds in the 5.4.8 kernel. The buster-backports repo has the 5.4.13 kernel as I write this.
The other was the need to explicitly install the nvidia-driver package to be able to display at the monitor's native resolution; otherwise the desktop is stuck at 1024x768. Once it's installed the display works properly. This didn't need to be done under Stretch.
GPUgrid work
GPUgrid have been trying different apps in order to get one that would work with modern graphics cards (ie the Turing architecture). It has taken them almost a year to sort out. They now have a large batch of work that uses the new app.
The download was huge (one of the many files is 100MB), but once the files are on the machine they don't need to be downloaded again. Once running, the tasks immediately take the GPU up to 99% load and its power cap (120 watts on my GTX 1660 Ti cards). Run time is around 90 minutes on the GTX 1660 Ti. You'll need a driver that supports CUDA 10.0 or later to run them.
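A quick estimate of what each task costs in GPU energy, using the 120 watt cap and roughly 90 minute run time from the text. This Python sketch assumes the card sits at its cap for the whole run and ignores the rest of the system (CPU, fans, power supply losses), so treat it as a lower bound.

```python
def energy_per_task_kwh(watts: float, minutes: float) -> float:
    """Energy drawn by the GPU alone for one task, assuming it runs at its power cap."""
    return watts * (minutes / 60) / 1000


if __name__ == "__main__":
    kwh = energy_per_task_kwh(120, 90)
    print(f"{kwh:.2f} kWh per task, {24 * 60 / 90:.0f} tasks per GPU per day")
    # prints: 0.18 kWh per task, 16 tasks per GPU per day
```

Multiply by your own electricity tariff to get a cost per task.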
Seti woes
Despite the recent fund-raising drive to replace the hard disks in their database server, they're still having issues supplying enough work because the database doesn't fit into memory. It was suggested years ago that they should shorten the deadlines, but they haven't tried that yet. Generally I can't get work from them, so I have been concentrating on other projects.
Also in the news this week, Breakthrough Listen released 2PB of data, most of which we don't process at all. It's being handled by the SETI Institute, which is not related to the Seti@home project.
16 February 2020
Mid February
Intel GPUs
All off. Have been doing bursts of work
Nvidia GPUs
All off. Had 3 doing bursts of Seti and Milkyway work
Raspberry Pi
Running bursts of work as weather permits
For news on the Raspberry Pis see Marks Rpi Cluster
Buster and Nvidia-driver
After updating one of the Nvidia GPU machines to Debian Buster and trying all sorts of things to get it to display properly (it would only do 1024x768 resolution) I have finally got it going.
Under Stretch (the previous release) I installed nvidia-kernel-dkms, nvidia-opencl-icd and nvidia-smi, which pulled in the GLX driver as well as the CUDA and OpenCL components. Not so under Buster: you have to install nvidia-driver to get the GLX driver. It seems the packaging has changed with Buster for some reason.
Now that I have worked that out, I can look at upgrading the other machines. I raised a bug with Debian in September 2019.
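Before upgrading the other machines, the package difference can be checked with a short script. This is a Python sketch assuming a Debian-based system with `dpkg-query` available; the package list comes from the text, and the script only reports what is missing rather than installing anything.

```python
import shutil
import subprocess

# Packages from the text: the Stretch set, plus nvidia-driver, which Buster
# also needed to pull in the GLX driver (and get native resolution).
BUSTER_PACKAGES = ["nvidia-kernel-dkms", "nvidia-opencl-icd", "nvidia-smi", "nvidia-driver"]


def is_installed(package: str) -> bool:
    """Ask dpkg whether a package is installed (Debian and derivatives only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Status}", package],
        capture_output=True,
        text=True,
    )
    # Installed packages report "install ok installed"; unknown ones make dpkg-query fail.
    return result.returncode == 0 and result.stdout.split()[-1:] == ["installed"]


def missing_packages(packages: list) -> list:
    """Return the packages from the list that are not currently installed."""
    if shutil.which("dpkg-query") is None:
        raise RuntimeError("dpkg-query not found; this check only works on Debian-based systems")
    return [p for p in packages if not is_installed(p)]


if __name__ == "__main__":
    missing = missing_packages(BUSTER_PACKAGES)
    if missing:
        print("apt install " + " ".join(missing))
    else:
        print("all NVIDIA packages present")
```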
Weather news
Or Weather or not to run. We've gone from bush fires to floods. The weather is presenting challenges, hence the bursts of work when I can. Typically a window is just a few hours, so I run Seti and Milkyway, as their work units are fairly short (an hour or two on the CPU and minutes on the GPU). If it looks like it will be cooler for longer, I run Einstein CPU work, which takes 7 to 14 hours depending on how many I run at once.
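That rule of thumb can be written as a tiny decision function. This is only a Python sketch: the hour figures come from the text, while the "pick the longest work that fits the cool window" rule and the workload names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    name: str
    hours: float  # rough upper bound on task length, from the text


# Hypothetical catalogue built from the figures in the text.
WORKLOADS = [
    Workload("Einstein CPU", 14),      # 7 to 14 hours depending on concurrency
    Workload("Seti/Milkyway CPU", 2),  # an hour or two on the CPU
]


def pick_workload(cool_hours: float) -> Optional[str]:
    """Return the longest workload whose tasks finish within the cool window."""
    for w in sorted(WORKLOADS, key=lambda w: w.hours, reverse=True):
        if w.hours <= cool_hours:
            return w.name
    return None  # window too short for anything; leave the farm off


if __name__ == "__main__":
    for window in (1, 3, 16):
        print(window, "hours ->", pick_workload(window))
```

Using the upper bound of 14 hours for Einstein is deliberately cautious; running fewer tasks at once shortens them toward 7 hours.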
27 January 2020
27th of January
Intel GPUs
All running Asteroids and Seti work
Nvidia GPUs
Two running Einstein gravity wave work. Two off.
Raspberry Pis
Seven running Einstein BRP4 work
For news on the Raspberry Pis see Marks Rpi Cluster
Weather
It's been hot and humid with a few drops of rain. Too hot to run the farm. Today was a bit of a break in the weather, so I have most of the farm running. The rest of this week looks like it will be hot and humid, so I don't expect to get much work done.
Project news - Seti
The project increased the allowed number of work units per computer. That promptly blew out the size of the database with the number of work units "out in the field". Last week they limited the result creation rate in order to get it back down in size. The project needs to fit the entire database table into memory and the machine already has the maximum amount of memory that it can hold. They resized the table last week but we still seem to have issues getting work.
In addition to the database issues, Seti@home ran a fund-raiser to replace its 120 x 2TB disk drives with 26 x 16TB disk drives (two of them spares). It was supposed to run for two weeks but met its target in the first week.
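A quick check of what that upgrade means in raw capacity, treating two of the 26 new drives as spares. This Python sketch ignores the RAID layout, which the announcement details here don't cover.

```python
def raw_tb(count: int, size_tb: int) -> int:
    """Raw capacity before any RAID or filesystem overhead."""
    return count * size_tb


old = raw_tb(120, 2)       # existing array of 2TB drives
new = raw_tb(26 - 2, 16)   # 26 x 16TB drives, two kept as spares
print(old, new, f"{new / old:.1f}x capacity with {120 / 24:.0f}x fewer drives")
# prints: 240 384 1.6x capacity with 5x fewer drives
```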
11 January 2020
New year. New decade
Intel GPUs
All off
Nvidia GPUs
All running Einstein gravity wave work (on CPU)
Raspberry Pis
Two running, the rest are off.
New year
Another year ticks by and we're still crunching. Recently I've been looking at an industrial unit, but the prices are too high in Sydney. Maybe something further away.
The bush fires keep affecting the air quality, and the air quality and temperatures affect my ability to run the computers. Fortunately I haven't been directly impacted by the fires.
Funding
I mentioned in my 24th of November post that I made an offer to fund the SuperHost idea but I had no feedback from any of the key people in the BOINC world. I still haven't had any response. I have some money that I put aside for such development work but seeing as there is no interest in SuperHost I may as well use it for another purpose. I have two other things I was interested in:
1. A monitoring tool for a farm of BOINC machines that would highlight which machines need attention. The idea is to use it in conjunction with BOINCtasks. I did run the idea past Fred Efmer (the author of BT) but he felt that BT was able to issue alerts and wasn't inclined to do further development.
2. Power9 CPU support. Now these aren't exactly your high-end desktop; they are more of a server-grade CPU. Raptor Computing make desktop and server-grade machines, as do IBM and probably a few more companies. The US has a couple of clusters (Summit and Sierra) running them. Debian Buster can run on them, so there is already Linux and BOINC support. That just leaves porting the project apps to the ppc64el architecture, as it's known.
I expect there are more people who would be interested in point 1 than there are people who run a Power9 CPU so I am more inclined towards point 1.
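To make point 1 a bit more concrete, here is a minimal Python sketch of the "which machines need attention" logic. Everything in it is hypothetical: the `HostStatus` record, the thresholds, and the host names are illustrations only, and how the data would actually be collected (for example via BOINC's GUI RPC or the project web pages) is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class HostStatus:
    # Hypothetical per-host snapshot; the collection method is not specified.
    name: str
    last_contact: datetime
    tasks_running: int
    errors_last_day: int


def needs_attention(host: HostStatus, now: datetime,
                    max_silence: timedelta = timedelta(hours=6),
                    max_errors: int = 3) -> List[str]:
    """Return the reasons a host should be looked at (empty list if it looks healthy)."""
    reasons = []
    if now - host.last_contact > max_silence:
        reasons.append("not reporting")
    if host.tasks_running == 0:
        reasons.append("idle")
    if host.errors_last_day > max_errors:
        reasons.append(f"{host.errors_last_day} errored tasks")
    return reasons


if __name__ == "__main__":
    now = datetime(2020, 1, 11, 12, 0)
    farm = [
        HostStatus("ryzen1", now - timedelta(minutes=5), 12, 0),
        HostStatus("i7-3", now - timedelta(minutes=10), 12, 6),
        HostStatus("pi07", now - timedelta(days=2), 0, 0),
    ]
    for host in farm:
        reasons = needs_attention(host, now)
        if reasons:
            print(host.name + ": " + ", ".join(reasons))
    # prints:
    # i7-3: 6 errored tasks
    # pi07: not reporting, idle
```

Returning a list of reasons rather than a yes/no flag means a dashboard could show why a machine was highlighted, such as one erroring out half its work units versus one that was simply switched off for the weather.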