30 December 2023

30th of December 2023

Farm status
CPU only
Off

Nvidia GPUs
Off

Raspberry Pis
On and off depending on how hot it gets.


Blackout killed UPS
We had a blackout a fortnight ago. That cost me a CyberPower 650VA UPS that only had to power the cable modem and the ISP's router. It appears to have totally killed the battery inside, and the UPS wouldn't power on. I have purchased another one (they are cheap), but if the power goes off for more than an hour it seems it's too stupid to shut itself down and totally drains the battery. It has a lead-acid battery, so if the charge gets below about 50% that is the end of the battery.

Fortunately most things were off or, in the case of the Raspberries, were idle.


Debian 12.3 point release
Following the issues with the Debian 12.3 point release they decided to stop the release. A day later they did a 12.4 point release which included an updated kernel that didn't have the ext4 corruption issue. They then followed that up with another kernel update a few days later due to issues with WiFi drivers.

Raspberry Pi OS still seems to be running an affected kernel that has the ext4 data corruption issue.


Parts orders
In my last post I mentioned getting Contact Frames (or Secure Frames) for my Ryzen 7900s. I decided to get Thermal Paste Guards instead. I've ordered them, so I should be able to complete the builds in the new year.


Altra server issue
I decided to upgrade my Ampere Altra to Debian Bookworm. Unfortunately the installer fails with a "grub install dummy" failed message. This seems to be related to booting in UEFI mode, which is how it appears to be booting, so I am not sure if I need to create legacy boot media for it. I'll make another attempt when the house is empty, because of the noise the server makes.

10 December 2023

10th of December

Farm status
CPU only
Off

Nvidia GPUs
Off

Raspberry Pis
Running overnight.

For more information on the Raspberry Pis see Marks Rpi Cluster

 

File corruption bugs
Debian discovered the kernel they are pushing out in their 12.3 point release (kernel 6.1.64) has an ext4 file system corruption bug. It was fixed in the 6.1.66 kernel but Debian haven't updated to it yet.
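To make the version range concrete, here is a minimal Python sketch that flags a kernel version string as suspect. It assumes everything from 6.1.64 up to, but not including, 6.1.66 is affected, which is my reading of the versions above rather than an official statement. The function names and sample strings are made up for illustration. Note that on Debian `uname -r` reports an ABI name rather than the upstream version, so the string to check is the package version (it usually shows up in `uname -v`).

```python
import re

# Version numbers as given in the post: 6.1.64 has the bug, 6.1.66 has the fix.
# Treating 6.1.65 as affected too is an assumption.
FIRST_AFFECTED = (6, 1, 64)
FIRST_FIXED = (6, 1, 66)


def parse_kernel_version(text):
    """Return the first x.y.z number found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def looks_affected(text):
    """True if the version found in text falls in the assumed affected range."""
    version = parse_kernel_version(text)
    if version is None:
        return False
    return FIRST_AFFECTED <= version < FIRST_FIXED


# Illustrative version strings, not captured from a real machine.
for sample in ("Debian 6.1.64-1", "Debian 6.1.66-1"):
    print(sample, "->", looks_affected(sample))
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "6.1.9" would sort after "6.1.64".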

OpenZFS also has a file corruption bug which is fixed with OpenZFS 2.1.14 (or 2.2.2 if you are running a 2.2 version). Strangely, Debian have put OpenZFS 2.1.14 into the bookworm-backports repo. One would have thought they would have included it in the Debian 12.3 point release that came out on the 9th of December, or offered it as a security fix for bookworm.

The bad news is I have a few servers with the affected version of OpenZFS. However, to hit the bug one needs to be rewriting files on the disks too fast for the underlying device(s), which I don't do. I have applied the Debian 12.3 point release to a number of machines so I likely have the ext4 issue. I haven't seen any problems so far, so maybe it is only an issue under some combination of conditions.
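As a rough sketch of the "fixed in 2.1.14, or 2.2.2 on the 2.2 branch" rule, the Python below checks an OpenZFS version string against the fix for its branch. The function name and the sample version strings are hypothetical, and branches other than 2.1 and 2.2 are deliberately reported as unknown because nothing above says where they stand.

```python
import re

# First fixed release per branch, as stated in the post.
FIXED_BY_BRANCH = {
    (2, 1): (2, 1, 14),
    (2, 2): (2, 2, 2),
}


def has_fix(version_text):
    """Return True/False for the 2.1 and 2.2 branches, None for anything else."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    version = tuple(int(part) for part in match.groups())
    fixed = FIXED_BY_BRANCH.get(version[:2])
    if fixed is None:
        return None  # branch not covered by the post
    return version >= fixed


# Illustrative version strings only.
for sample in ("2.1.11-1", "2.1.14-1", "2.2.2"):
    print(sample, "->", has_fix(sample))
```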


Other news
I still haven't assembled the Ryzen 7900 machines. I need to get a couple of Contact Frames (sometimes called Secure Frames) for the CPU socket before I install them.

I went on a cruise for a few weeks so the farm was off during that period. Most of the farm is off due to hot weather at the moment. Yesterday hit 39 degrees C. Unfortunately this is one of the joys of an Australian summer coupled with global warming.


28 October 2023

Hiatus

I've had the larger crunchers powered off to try to reduce my electricity bill. It doesn't seem to have had much effect as the last bill was almost $900 for the quarter.

This week it was cool for a few days so I got all of the x64 machines going. For the CPU only machines (a pair of Ryzen 5900X) I ran a few hours of FGRP5 work. Most of the Einstein work is now GPU-based so I fired up the GPU crunchers (four Ryzen 3600 with an RTX 3060Ti in each) and had them running for a day. The farm is back off as things warm up again.

The Raspberry Pis continue to crunch. For more information on them see Marks Rpi Cluster

 

13 August 2023

13th of August

Suffering from "bill shock" as they call it. My last electricity bill was over $800 for the quarter, so I haven't had the farm running apart from the Raspberry Pis. See Marks Rpi Cluster for details on it.

I have even taken to unplugging the 2.5G network switches to try to reduce power consumption, although I don't think they use much power. It's the Ryzen 5900X and the GPU machines that consume the most. The Chia storage servers also contribute to my electricity use, although I haven't plugged a meter in to see how much they actually use. The power board is under a wire rack that they are sitting on, so it's difficult to get to.

I received the parts for the Ryzen 7900 builds but haven't got to them yet. They are slated to replace the 5900X machines. The good news is the Ryzen 7900 uses less power than the 5900X. Also I don't need to use (or power) a discrete graphics card. Apparently the CPU features list is too long for BOINC, so we need an updated BOINC client and updated server software, which is one reason why I haven't replaced them yet.


25 June 2023

25th of June

Farm status
CPU only
Running Einstein BRP4G work overnight.

Nvidia GPUs
Off

Raspberry Pis
Running Asteroids@home and Einstein.


New CPU only build
I've ordered parts to make a couple of CPU only machines to replace the Ryzen 5900Xs. Each new machine gets a Ryzen 7900, 64GB of DDR5-5200 memory, a 2TB WD Black M.2 and an ASUS Prime B650 motherboard. I'll reuse the existing case and power supply.

The main reason for the upgrade is they have a 65 watt TDP, which should reduce my power bills a little. I was originally looking at the Ryzen 7900X but they have a 170 watt TDP which is more than the current machines. I expect it will take a week or two for parts to arrive.

I was going to get my regular online shop to build it, but when I added an aftermarket CPU cooler their website decided to charge me another $50 for assembly. I complained but they were unwilling to reduce the build price. It already had a CPU cooler so it's not really an extra part to assemble. In the end I decided to assemble it myself and save the cost of a new case and power supply as well. This will be my first AM5 build.


Milkyway no longer using GPU
The Milkyway project have finished their research using the Separation GPU app and decided to stop using it. They will publish their findings in due course. They still have a multi-threaded Nbody CPU app (it uses up to 16 cores). That means they only have a CPU based app so I added Milkyway to the CPU only crunchers and removed the project from the Nvidia GPU machines.

12 June 2023

12th of June

Farm status
CPU Only
Two Ryzen 5900X running Einstein work.

Nvidia GPUs
Off.

Raspberry Pis
Running Einstein BRP4 work.

Debian 12 (Bookworm) release
The Debian project released their latest version of the operating system. Early signs were promising when I tested the beta version. There were 150 known bugs when they released on the 10th of June. The CPU only machines have been updated without any issues. At the moment, though, I can't install the Nvidia drivers (which worked fine on the beta). I think the problem is that the drivers are still listed as part of the testing release: a search for the package nvidia-kernel-dkms on debian.org lists the (older) Bullseye version as being stable, even though we are now on Bookworm.

ntp is no longer included; they've switched to using systemd-timesyncd instead. If you try to install ntp it will install ntpsec instead. I was already using ntpsec on some machines so switching to it wasn't an issue for me.

Pi Hole doesn't work. It comes up and says Bookworm is an unsupported operating system when you try to install so we will need them to provide an updated version.

Raspberry Pi OS, despite using the 6.1 kernel that Bookworm is using, is still based on Debian Bullseye, so it needs updating.


Electricity prices
Electricity prices are going up on the 1st of July by approximately 20-25%, so I probably won't be using the Nvidia GPU machines much. I still need to organise replacements for the Ryzen 5900Xs as well. I think the replacements should use less power.


Update 13 June 2023
Pi Hole has been updated and now supports Debian 12 (Bookworm).


Update 17 June 2023
I worked out why the nvidia-driver wouldn't install. It seems parts of it are in the non-free archive, which I had selected via /etc/apt/sources.list, but other parts are in the contrib archive. It will install if you have both listed but not if you only have one of them.
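A quick way to spot this is to check which components the `deb` lines actually enable. The Python below is a minimal sketch that only understands the classic one-line sources.list format (not the newer deb822 `.sources` files). The helper names and the sample file contents are made up for illustration.

```python
import re

# Per the post, nvidia-driver needs both of these components enabled.
REQUIRED = {"contrib", "non-free"}


def enabled_components(sources_text):
    """Collect the components named on binary 'deb' lines (deb-src is ignored)."""
    components = set()
    for raw_line in sources_text.splitlines():
        line = raw_line.split("#", 1)[0]         # drop comments
        line = re.sub(r"\[[^\]]*\]", " ", line)  # drop [option=value ...] blocks
        fields = line.split()
        # Layout: deb <uri> <suite> <component> [<component> ...]
        if len(fields) >= 4 and fields[0] == "deb":
            components.update(fields[3:])
    return components


def missing_components(sources_text):
    """Return the required components that no 'deb' line enables, sorted."""
    return sorted(REQUIRED - enabled_components(sources_text))


# Illustrative file contents, not my actual sources.list.
SAMPLE = """\
deb http://deb.debian.org/debian bookworm main non-free
deb-src http://deb.debian.org/debian bookworm main contrib
"""
print(missing_components(SAMPLE))  # ['contrib']
```

To check a real machine, read /etc/apt/sources.list into a string and pass it to `missing_components`; an empty list means both archives are listed.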

07 May 2023

7th of May

Farm status
CPU only
Both Ryzen 5900X's doing Einstein and Universe@home work in the mornings

Nvidia GPUs
Off

Raspberry Pis
All running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Other news
It's still warm during the day but nights are getting cooler as it's autumn in Sydney at the moment. I have been running the Ryzen 5900Xs first thing in the morning so they can warm the place up before the sun takes over.

I am waiting for the larger DDR5 memory modules to become available (and the prices to drop) before starting a couple of Ryzen 7900 (non-X) builds. My Ryzen 5900X machines have 4 x 16GB memory modules, but with DDR5 they don't recommend using 4 sticks of memory because of timing issues. They recommend either dropping the memory speed or only using 2 sticks.

I saw some tech news items talking about Ryzen 7900X3Ds catching fire, with reports MSI has already issued a BIOS update to prevent overclocking. No news yet if this issue also affects the other CPUs in the Ryzen 7000 series. As always with new tech (new CPUs and DDR5 memory) it is best to wait until they iron out the issues before buying.

10 April 2023

Easter weekend 2023

After a 4-month hiatus we're back to crunching. The main issue was the hot weather, as I won't use air conditioning. My electricity discount also went from 21% down to 5% and the rates went up around 20%, so I am being frugal with my crunching.

While I was gone the Pis have mostly been running, but even they had a break for almost a month. If you want to read about them see Marks Rpi Cluster
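For a sense of what those two changes add up to, here is a small worked calculation in Python. It assumes the rise is exactly 20% (the real figure is "around" that), that the discount applies to the whole usage charge, and that fixed daily supply charges are ignored. The function name is just for illustration.

```python
def effective_rate(base_rate, discount):
    """Price actually paid per unit after a percentage discount."""
    return base_rate * (1 - discount)


# Work in relative terms: the old undiscounted rate is 1.00.
old_paid = effective_rate(1.00, 0.21)  # 21% discount on the old rate
new_paid = effective_rate(1.20, 0.05)  # 5% discount on a rate 20% higher

increase = new_paid / old_paid - 1
print(f"Effective price increase: {increase:.0%}")  # about 44%
```

So losing most of the discount turns a 20% rate rise into roughly a 44% rise in what each kWh actually costs under these assumptions.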

 

Current config
As of writing (10th of April 2023) the farm consists of:
2 x Ryzen 5900X machines
1 x Ampere Altra
4 x Ryzen 3600 with RTX 3060Ti GPUs
and Marks Rpi Cluster.

 

Future upgrades
Last time I posted I was looking at updating the Ryzen 5900X machines to Ryzen 7900X machines. Since then AMD have released the non-X CPUs, which are faster than the 5900X but have a 65 watt TDP (they actually use more). My current thinking is to maybe get the non-X CPUs instead, but that still needs a new motherboard for the AM5 socket and DDR5 memory, which isn't cheap.

Nvidia released their 4000 series of GPUs, which are disappointing. Apart from using much more power, the pricing is also somewhat higher, so I will probably skip this entire generation of GPUs and see what comes next.