30 December 2023

30th of December 2023

Farm status
CPU only
Off

Nvidia GPUs
Off

Raspberry Pis
On and off depending on how hot it gets.


Blackout killed UPS
We had a blackout a fortnight ago. That cost me a CyberPower 650VA UPS that only had to power the internet cable modem and the ISP's router. It appears to have totally killed the battery inside, and the UPS wouldn't power on. I have purchased another one (they are cheap), but if the power goes off for more than an hour it seems it's too stupid to shut itself down and totally drains the battery. It has a lead-acid battery, so if it gets below about 50% that is the end of the battery.

Fortunately most things were off or, in the case of the Raspberry Pis, were idle.
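
As a rough illustration of the 50% rule of thumb for lead-acid batteries, here is a minimal Python sketch that estimates how long a load can run before the battery drops to a chosen cutoff. The function name, its parameters and the figures in the example are all hypothetical and are not measurements from my UPS. It is a simple linear estimate that ignores inverter losses and the way lead-acid capacity shrinks at higher discharge rates.

```python
def minutes_until_cutoff(capacity_wh, load_w, cutoff_fraction=0.5):
    """Estimate minutes of runtime before the battery falls to cutoff_fraction.

    Hypothetical helper: a linear estimate only. It ignores inverter
    losses and the reduced capacity of lead-acid cells under heavy load.
    """
    if capacity_wh <= 0 or load_w <= 0:
        raise ValueError("capacity_wh and load_w must be positive")
    if not 0 <= cutoff_fraction < 1:
        raise ValueError("cutoff_fraction must be in the range [0, 1)")
    # Only the energy above the cutoff is treated as usable.
    usable_wh = capacity_wh * (1 - cutoff_fraction)
    return usable_wh / load_w * 60


if __name__ == "__main__":
    # Illustrative numbers only: a 100 Wh battery feeding a 25 W load.
    print(minutes_until_cutoff(100, 25))
```

A UPS that shut itself down at that point, rather than running until flat, would in principle leave the battery reusable.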


Debian 12.3 point release
Following the issues with the Debian 12.3 point release, Debian decided to stop the release. A day later they did a 12.4 point release, which included an updated kernel that didn't have the ext4 corruption issue. They then followed that up with another kernel update a few days later due to issues with WiFi drivers.

Raspberry Pi OS still seems to be running an affected kernel that has the ext4 data corruption issue.


Parts orders
In my last post I mentioned getting Contact Frames (or Secure Frames) for my Ryzen 7900s. I decided to get Thermal Paste Guards instead. I've ordered them, so I should be able to complete the builds in the new year.


Altra server issue
I decided to upgrade my Ampere Altra to Debian Bookworm. Unfortunately the installer fails with a "grub install dummy" failed message. This seems to be related to booting in UEFI mode, which is how it appears to be booting, so I am not sure if I need to create legacy boot media for it. I'll make another attempt when the house is empty, due to the noise that the server makes.
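
To confirm the boot mode rather than guess, one option is to check from a running Linux system (or the installer's shell, if Python is available there) whether the kernel exposes the /sys/firmware/efi directory, which is only present after a UEFI boot. A minimal sketch follows. The function name and its parameter are my own, not part of any tool.

```python
from pathlib import Path


def booted_in_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the given directory exists.

    On Linux, /sys/firmware/efi is only present when the system was
    booted via UEFI. The efi_dir parameter is there so the check can be
    pointed at another path for testing.
    """
    return Path(efi_dir).is_dir()


if __name__ == "__main__":
    mode = "UEFI" if booted_in_uefi() else "legacy BIOS (or not Linux)"
    print(f"Boot mode: {mode}")
```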

10 December 2023

10th of December

Farm status
CPU only
Off

Nvidia GPUs
Off

Raspberry Pis
Running overnight.

For more information on the Raspberry Pis see Marks Rpi Cluster

 

File corruption bugs
Debian discovered that the kernel they are pushing out in their 12.3 point release (kernel 6.1.64) has an ext4 file system corruption bug. It was fixed in the 6.1.66 kernel, but Debian haven't updated to it yet.
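
For anyone wanting to check a machine, here is a minimal Python sketch that pulls a kernel version out of a string and flags it if it falls between the two versions mentioned above. The function names are my own. It also assumes that anything from 6.1.64 up to, but not including, 6.1.66 is suspect, which is my reading of the fix version rather than something Debian stated. Note that on Debian `uname -r` reports the ABI name (for example 6.1.0-14-amd64), so the sketch reads the version string instead, which carries the package version.

```python
import platform
import re

# Versions taken from the post: 6.1.64 has the bug, 6.1.66 has the fix.
FIRST_BAD = (6, 1, 64)
FIRST_FIXED = (6, 1, 66)


def parse_kernel_version(text):
    """Return the first x.y.z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def ext4_bug_suspected(text):
    """Return True if the version in text is in the assumed affected range."""
    version = parse_kernel_version(text)
    if version is None:
        return False
    # Assumption: every release from FIRST_BAD up to the fix is affected.
    return FIRST_BAD <= version < FIRST_FIXED


if __name__ == "__main__":
    # platform.uname().version is the same string that `uname -v` prints.
    print(ext4_bug_suspected(platform.uname().version))
```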

OpenZFS also has a file corruption bug, which is fixed in OpenZFS 2.1.14 (or 2.2.2 if you are running a 2.2 version). Strangely, Debian have put OpenZFS 2.1.14 into the bookworm-backports repo. One would have thought they would have included it in the Debian 12.3 point release that came out on the 9th of December, or offered it as a security fix for bookworm.
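
Because the fix landed in a different patch release on each branch, a plain "greater than" comparison isn't enough. The sketch below is one way to check a version string against the two fixed releases named above. The function name and the table are my own, and it deliberately returns None for any branch other than 2.1 and 2.2 rather than guess.

```python
import re

# Fixed releases named in the post, keyed by (major, minor) branch.
FIXED_IN_BRANCH = {(2, 1): 14, (2, 2): 2}


def openzfs_has_fix(version_text):
    """Return True or False for a known branch, None if it cannot be judged."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    minimum_patch = FIXED_IN_BRANCH.get((major, minor))
    if minimum_patch is None:
        # Branch not mentioned in the post, so no claim is made about it.
        return None
    return patch >= minimum_patch


if __name__ == "__main__":
    for candidate in ("zfs-2.1.11-1", "2.1.14", "2.2.1", "2.2.2"):
        print(candidate, openzfs_has_fix(candidate))
```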

The bad news is I have a few servers with the affected version of OpenZFS. However, to trigger the bug one needs to be rewriting files on the disks too fast for the underlying device(s), which I don't do. I have applied the Debian 12.3 point release to a number of machines, so I likely have the ext4 issue. I haven't seen any problems so far, so maybe it is only an issue under some combination of conditions.


Other news
I still haven't assembled the Ryzen 7900 machines. I need to get a couple of Contact Frames (sometimes called Secure Frames) for the CPU socket before I install them.

I went on a cruise for a few weeks so the farm was off during that period. Most of the farm is off due to hot weather at the moment. Yesterday hit 39 degrees C. Unfortunately this is one of the joys of an Australian summer coupled with global warming.