12 July 2020

12th of July

Farm status
CPU only
Running Rosetta work

Intel GPUs
Running Rosetta work

Nvidia GPUs
Off. Had two running Rosetta during the week

Raspberry Pis
Three running Rosetta. Ten running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
The farm has been running Rosetta flat out as they continue their Covid-19 research. When it rains I have to shut the window, which means the heat builds up, so I need to reduce the number of machines running; that's why the Nvidia GPU machines are off quite often. The weather forecast for next week has rain for the whole week, heavy at times.

With the electricians coming next week I need to clear some space around power and network points around the house, so I haven't had a chance to look at the Ryzen 3900X build or the storage server. I had to adjust one of the shelving racks that sits above one of the power and network points, which means moving computers around. They will also need access to the roof space, so that means clearing the access points, one of which has a pile of (empty) suitcases in the way.

I will need to power everything down, so all of the machines have to finish off whatever work they have. With Rosetta that can take up to 16 hours.
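Draining the machines can be scripted. As a sketch, telling the BOINC client not to fetch any more work (so the cached tasks simply run out) looks like this, assuming the standard boinccmd tool and Rosetta's master URL:

```shell
# Finish cached Rosetta work but fetch nothing new
boinccmd --project https://boinc.bakerlab.org/rosetta/ nomorework
```

Once the last task reports, the machine can be shut down.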


ARM based server
One of the ideas I am looking at is an ARM based server for compute purposes now that ARM is getting more support from developers. Both Ampere Computing and Marvell have new multi-core chips. The Ampere Altra boasts 80 cores and the Marvell ThunderX3 has 96 cores/384 threads. These are aimed at the data center market at the moment, although usually it's a matter of time before they make a workstation version.

It's a bit early (and expensive) to get one at the moment, but something to keep an eye on as things develop. Both of these support DDR4-3200 memory, have disk/SSD support and PCIe expansion. The number one position in the Top500 supercomputer list as of June 2020 is the Fujitsu Fugaku, which is ARM based.

05 July 2020

5th of July

Farm status
CPU only
Running Rosetta work

Intel GPUs
Running Rosetta work

Nvidia GPUs
Two running Rosetta work

Raspberry Pis
Three running Rosetta. Ten running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Pauses
We had a couple of pauses in computing since my last update. One due to the weather and the other due to electrical work.

The electricians need to replace network cabling in the walls/roof as it's stopped working. The network cable tester thinks about half the wires are broken or disconnected, and that was after replacing the sockets. The original cabling was done about 20 years ago and was Cat 5e, so it will get upgraded to Cat 6a in the process.


Repairs and Outstanding work
I had to replace a case fan in one of the Intel GPU machines as it was making a fair bit of noise. The machine next to it is also loud, so I need to replace its fan as well. It's a fairly quick fix, but I need to let the machine finish off its current work and that takes up to 16 hours.

The second Ryzen 3900X build is still sitting in boxes waiting to be assembled. Given it's the weekend I might get time to look at it today, depending on family commitments.

The replacement server motherboard is still waiting to be installed. It's at the bottom of my list at the moment, as the existing server is still working even with the memory in the wrong slots.


27 June 2020

27th of June

Farm status
CPU only
Running Rosetta work

Intel GPUs
Running Rosetta work

Nvidia GPUs
Two running Rosetta work

Raspberry Pis
Three running Rosetta and 10 running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Rosetta news
Rosetta announced via Twitter that they have created antiviral proteins. To quote their tweet from the 26th of June:

"We have some BIG NEWS: Researchers @UWproteindesign have succeeded in creating antiviral proteins that neutralize the new coronavirus in the lab. (These experimental drugs are being optimized for animal trials now)"

Most of last week they had little or no work, so the farm was basically idle, possibly due to completing that task. More work has been forthcoming, so the farm is running it now.


Other news
Most of this week was occupied with getting the two networks combined into one new NBN connection. I ended up reinstalling the Raspberry Pis due to firewall issues on the Pis; without a screen and keyboard attached I couldn't access them. The Intel and AMD machines were easier because they have a screen and keyboard.

SpeedTest says I can get 45 Mbit downloads and 10 Mbit uploads now. File transfers are quicker but I don't feel they are 5 times faster than before.

21 June 2020

20th of June

Farm status
CPU only
Running Rosetta work

Intel GPUs
Running Rosetta work

Nvidia GPUs
Two running Rosetta work

Raspberry Pis
Three running Rosetta work. Ten running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Networking changes
I am in the process of switching to the NBN network. It should provide a higher speed internet connection. I've signed up for a 50 Mbit plan (50 Mbit download and 10 Mbit upload).

At the moment I have two DSL connections with two separate modem/routers which makes two separate local area networks. One has the x64 machines and the other has the Raspberry Pis. I will have to combine them into a single network.


Other news
I got a shipping notification for the 2nd Ryzen 3900X build. The rest of the parts should arrive next week. I already have the power supply and memory.

I've made it to 60th place on the Rosetta@home top participant list. I will be dropping back because today was rather warm (30 degrees in my computer room) and I have to reduce the heat being produced by the computers. I've set all of the Intel GPU and Nvidia GPU machines to no new tasks so they finish off the work they currently have.

07 June 2020

7th of June

Farm status
CPU only
Running Rosetta work.

Intel GPUs
All running Rosetta work.

Nvidia GPUs
Two running Rosetta work.

Raspberry Pis
Two running Rosetta work. Ten running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
The farm has climbed to 73rd place in the Rosetta top participant list. I don't think I will be able to get much higher. Most of their top 100 participants are running multiple servers with lots of cores. The top one that I could look at (in 4th place) had 95 computers with 12, 24 and 40 core CPUs.

The Ryzen 3900X that is on back-order got delayed again. The problem part seems to be the motherboard. All the other parts are in stock and have been allocated to the order. The power supply and memory that I ordered from other suppliers arrived weeks ago.

30 May 2020

30th of May

Farm status
CPU only
Running Rosetta work.

Intel GPUs
All running Rosetta work.

Nvidia GPUs
Two running Rosetta overnight.

Raspberry Pis
Two running Rosetta. Six running Einstein.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
I made it into the top 100 participants for Rosetta. As of this morning I was in 86th place.

The extra network cables arrived, so I have used some on the machines that have multiple network cards. They now have two connections into different switches. I also had a 1 metre cable running between two switches that was pulling on one end, so that got replaced by a longer cable.

I had a look around for some simple 2.5GbE switches and only managed to find one, from D-Link. Most others have what they call multi-gig ports, which support 1, 2.5, 5 or 10GbE speeds. The switches I am using at the moment have 8 x 1GbE ports and 2 multi-gig ports, one of which gets used as the up-link. You'd think there would be something like a 4 x 2.5GbE port switch with a high speed up-link, or even an 8 port version.


Farm makeup
At the moment the compute portion of the farm consists of:
Type        Number  CPU                        GPU          Memory
CPU only    1       AMD Ryzen 3900X (12C/24T)  GT710        64GB
Intel GPU   6       Intel i7-8700 (6C/12T)     Built-in     32GB
Nvidia GPU  4       AMD Ryzen 3600 (6C/12T)    GTX 1660 Ti  32GB

That gives a total of 72 cores/144 threads and 384GB of memory. There is another Ryzen 3900X on order.
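As a quick sanity check, the totals can be recomputed from the table above:

```python
# Recompute the farm totals from the table: 72 cores, 144 threads, 384GB
machines = [
    {"type": "CPU only",   "count": 1, "cores": 12, "threads": 24, "mem_gb": 64},
    {"type": "Intel GPU",  "count": 6, "cores": 6,  "threads": 12, "mem_gb": 32},
    {"type": "Nvidia GPU", "count": 4, "cores": 6,  "threads": 12, "mem_gb": 32},
]
cores   = sum(m["count"] * m["cores"]   for m in machines)
threads = sum(m["count"] * m["threads"] for m in machines)
memory  = sum(m["count"] * m["mem_gb"]  for m in machines)
print(cores, threads, memory)  # 72 144 384
```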

23 May 2020

23rd of May

Farm status
CPU only
Running Rosetta work.

Intel GPUs
All running Rosetta work

Nvidia GPUs
Two running Rosetta work

Raspberry Pis
Two running Rosetta. Six running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
The 64GB memory kits arrived. I've installed one kit in the CPU only machine and removed the 4 sticks of 16GB that it had. They've been reinstalled into the remaining Nvidia GPU machines. That brings all the 6 core/12 thread machines up to 32GB of memory. The 64GB kits have RGB lighting which I find rather annoying.

The other CPU only machine that I ordered is delayed. The motherboard seems to have been back ordered and keeps changing dates as to when it will arrive.


2.5GbE networking
The 2.5GbE network cards arrived. They only need a PCIe gen3 x1 slot, but when I plugged one into an x1 slot the kernel started spamming the console with messages about PCIe bus errors. After swapping it to a free x16 slot it worked fine, so it's probably something to do with how the PCIe slots are wired up.

The first card went into the proxy server and the second card is in the CPU only machine. I still need to try some speed tests between them and the 10GbE machines (the storage servers).

I used to have the gigabit ports bonded so they share the network traffic but have now switched to running the 2.5 gigabit card as primary and the motherboard (gigabit) port as an active backup.
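For anyone wanting to do the same, an active-backup bond on Debian (using the ifenslave package) looks roughly like this; enp1s0 and eno1 are stand-in names for the 2.5GbE card and the on-board gigabit port:

```
# /etc/network/interfaces sketch: 2.5GbE card primary, on-board gigabit as backup
auto bond0
iface bond0 inet dhcp
    bond-slaves enp1s0 eno1
    bond-mode active-backup
    bond-primary enp1s0
    bond-miimon 100
```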

Debian doesn't have a driver for the Realtek RTL8125. Support for the chip was added in August 2019, but the Debian firmware-realtek package is from July 2019. I ended up getting the driver directly from the Linux kernel web site.

I've ordered some more of these EDUP cards off eBay. The original seller increased the price to 69 AUD but I found another seller with them listed in GBP which worked out to 60 AUD each. I only need one more for the next CPU only machine.

Running two network ports per machine now means I need more network cables. Something else to buy.

Update: 26 May 2020
I managed to get one of them to do around 2.5Gbit. According to iperf3 they were doing 2.27Gbit. That was from a machine with a 10Gbit card, through a few switches, into a machine with a 2.5Gbit card. I tested in both directions.
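The test itself is simple: iperf3 runs as a server on one end and a client on the other, with -R to reverse the direction. The host name here is a stand-in:

```shell
# On the far (10Gbit) machine
iperf3 -s

# On the 2.5Gbit machine: test one direction, then the other
iperf3 -c storage-server
iperf3 -c storage-server -R
```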

16 May 2020

16th of May

Farm status
CPU only
Running Rosetta work

Intel GPUs
All running Rosetta work

Nvidia GPUs
One running Rosetta work

Raspberry Pis
Two running Rosetta work. Six running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
The farm continues to concentrate on Rosetta work on as many cores as possible (within power limitations).


Another Ryzen 3900X build
I ordered parts for another Ryzen 3900X system as they are on sale. It's the same spec as before. The power supply, which came from a different supplier, has already arrived. The remaining parts should arrive next week.

The 64GB memory kits (2 x 32GB sticks) came back into stock so I ordered two kits, one for each of the Ryzen 3900X builds. These are from yet another supplier.

The PCH fan on the current Ryzen 3900X system is not spinning. It doesn't seem to be causing any problems at the moment. It's part of the motherboard, so I might have to drop the machine off for repair. I would expect the shop will have to replace the motherboard, as the fan is soldered on. I will wait until I have built the 2nd Ryzen 3900X before doing this.


2.5GbE networking
I found some EDUP network cards on eBay for 55 AUD each. They have a Realtek chip. I bought two. They are coming from China so I wouldn't be surprised if they fail after a few months or they are really 1GbE network cards. I should be able to get my money back if either happens.

I was thinking of putting one in the proxy server and another in the Ryzen 3900X build. Now that I have a second Ryzen 3900X on order I will probably put one in each of them. I will wait and see how they work before buying any more.

10 May 2020

10th of May

Farm status
CPU only
The Ryzen 3900X is running Rosetta work

Intel GPUs
All running Rosetta work

Nvidia GPUs
Off

Raspberry Pis
Two running Rosetta work. Six running Einstein BRP4 work.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
Most of last fortnight was taken up with the Ryzen 3900X build, getting it going and doing a burn-in on different projects.

I spent a little time on the proxy server. I put a spare 10GbE network adapter into it, which didn't make any noticeable difference. The hardware side was simple; however, changing the bonding to adaptive failover took most of my time. There are examples on the internet and most of them don't work any more. Rosetta changed their project comms to HTTPS, which means we cannot cache downloads for them. In the end I went back to the dual 1GbE network adapter that I have used for the last year, with the ports bonded.


2.5GbE networking
I did a bit of research into 2.5GbE network adapters. There are a few (mostly based upon Realtek chips). Intel has an i225 chip, but it has a bug where the inter-packet gap is incorrect and it requires a respin of the silicon to fix. Aquantia also has one, but seems to sell it for more than 10GbE network adapters.

None of the computer stores in Australia are selling them. There are USB 3 to 2.5GbE external adapters available. I found one PCIe card on Amazon, shipped from the US, for 87 AUD. There was confusion over its warranty, with Amazon stating it had an "International warranty". Of the 10 reviews, two people complained their card failed after 3 months.

Ryzen 3900X build

In my previous post I mentioned the Ryzen 3900X build that I was going to do. All the parts arrived and I put it all together. I had a couple of issues.

The first issue was with the thermal paste squirting over the side of the CPU which I cleaned up as best I could.

The second turned out to be the additional power connectors on the motherboard. I had plugged in both of the aux 12 volt power connectors and it refused to POST. The fans wouldn't even spin up. I thought that maybe the thermal paste had shorted something out, but apparently Noctua thermal paste is not electrically conductive. After checking one of my other Ryzen builds I unplugged the 2nd aux 12 volt power connector and that got it going.

Rather than taking pictures of the computer I thought I would share the BOINCtasks view of it running 23 Rosetta tasks. I've left one thread free for the operating system/other programs.
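For those wondering how to leave a thread free: it's just the BOINC CPU usage preference. As a sketch, a global_prefs_override.xml along these lines caps the client at 23 of the 24 threads (96% of 24 rounds down to 23):

```xml
<global_preferences>
   <max_ncpus_pct>96.0</max_ncpus_pct>
</global_preferences>
```

The file goes in the BOINC data directory and can be picked up without a restart via boinccmd --read_global_prefs_override.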

And how top sees the same workload.

I wanted to get a 64GB memory kit (2 x 32GB) for it, which would then allow me to put the existing memory into the Ryzen 3600's, which is what it was intended for. Unfortunately the supplier selling the 64GB memory kit doesn't have any stock. It's G.Skill Trident Z DDR4-3200 memory, which I haven't used before.

Another upgrade I would have liked to do was a 2.5GbE network adapter, but none of the computer places seem to sell them in Australia. I could get 10GbE but they cost 180 AUD. In the meantime I will just put a spare 1GbE adapter in it and bond the ports together.

I might build another one of these because the price on the Ryzen 3900X has dropped again, in fact all of the Ryzen 3000 series seem to be on sale at the moment. They might be clearing stocks before the Ryzen 4000 desktop CPUs come out.

26 April 2020

26th of April

Farm status
Intel GPUs
Running Rosetta work overnight

Nvidia GPUs
Two running Rosetta work overnight

Raspberry Pis
Two running Rosetta. Six running Einstein work.

For news on the Raspberry Pis see Mark's RPi Cluster


Parts orders
All my orders came at once. Computer parts, that is. That was Noctua 140 mm fans, memory kits and a replacement X10SRi-F motherboard.

I installed two of the memory kits (2 x 16GB) into the two Nvidia GPU machines that are running Rosetta work, leaving them with two free memory slots. I didn't realize they originally had HyperX Predator memory, which meant I couldn't use all 4 memory slots because the heat spreader on them is so tall it gets in the way of the CPU fan. The new kits are HyperX Fury, which has a lower profile heat spreader.

I installed a Noctua fan in the front of one of the Intel GPU machines and it was louder than the one it replaced, so I went back to the original case fan. The Noctua fans run at 1200 RPM but the original fans run at 1000 RPM. I didn't try the low noise or ultra low noise adapters that the Noctuas come with because they simply slow the fan down, reducing its airflow. I might try the Noctuas in the back of the case, where it's not so loud.

I haven't even looked at the Supermicro X10SRi-F motherboard yet.


Another build
While checking my usual PC parts online shops I noticed the Ryzen 9 3900X is on sale, so I decided I would build one as a CPU compute node.

Part         Desc
Case         Fractal Designs Meshify C
CPU          Ryzen 3900X
Cooler       Noctua NH-U12S
Disk/SSD     Samsung EVO Plus 1TB
Graphics     ASUS GT710 2GB
Memory
Motherboard  ASUS X570-P/CSM
PSU          Seasonic Focus Gold 550

I didn't buy memory. I will use the two 32GB memory kits that were going to go into the Nvidia GPU machines. The GT710 is a passively cooled card but it only needs to display a desktop.

19 April 2020

19th of April

Farm status
Intel GPUs
All running Rosetta work

Nvidia GPUs
Two running Rosetta

Raspberry Pis
Two running Rosetta. Six running Einstein.

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
As you can tell most of the farm is running Rosetta as much as possible trying to help out with Covid-19 research.

I have ordered six Noctua cooling fans to replace the case fans in the front of the Fractal Designs ARC Midi cases, which house the i7-8700 machines. The existing fans all work fine and I even have a couple of spares; however the Noctua fans have higher airflow, are quieter and last longer.


Storage server
The first part of the storage server upgrade was to increase its disk capacity. I posted about it in March. When I did that upgrade I found the motherboard had a problem with one of the memory sockets (or the controller). I ordered a replacement motherboard but I am still waiting for it to be shipped.

The second part of the upgrade will be CPU and memory. I want to give it 128GB of memory (it supports up to 256GB). It has a Xeon E5 2620v3 CPU at the moment that I will upgrade to a v4 Xeon, which is the latest the motherboard supports.

11 April 2020

Easter Saturday 2020

Farm status
Intel GPUs
All running Rosetta work

Nvidia GPUs
All off

Raspberry Pis
Two running Rosetta, Twenty running Einstein.

For news on the Raspberry Pis see Mark's RPi Cluster


Rosetta
I have all the Intel GPU machines running Rosetta, along with a couple of the Pi4's. They've been running 24/7 for the last couple of weeks. Rosetta are looking at proteins in relation to Covid-19. While they have recently had a large influx of users they can always use more computers.

It's a single-thread CPU based app available for x64 machines (Linux, OSX and Windows), and they also have an app for ARM64, also known as aarch64 (Android and Linux). The Pi4's are running the aarch64 app.


BOINC 7.16
The BOINC developers have released 7.16.5 for Windows and 7.16.6 for Linux and OSX. I have been running 7.16.1 on all of the farm for a couple of months now. I only had one issue, which is listed as having been fixed; however I need to wait for Debian to make the later version available.


04 April 2020

4th of April

Farm status
Intel GPUs
All running Rosetta work

Nvidia GPUs
All off

Raspberry Pis
One running Rosetta work, Six running Einstein BRP4 work

For news on the Raspberry Pis see Mark's RPi Cluster


Replacements
After running memtest86+ on the i7-8700 that was giving memory errors, it seems I have a failed stick of memory. I bought another couple of memory kits (4 x 8GB sticks) to replace all the memory in it. The failed pair were sent back under warranty, and the remaining pair were removed once the replacements arrived. That got the machine back to 32GB, which it needs to run some of the Rosetta work.

I have an order in for a replacement for the X10SRi-F motherboard in the storage server, which has a failed memory socket or controller. Hopefully it's not the controller, because that is built into the CPU. The order has been delayed: I have been told it won't arrive for a fortnight, but I suspect it will be much longer. In the meantime the machine still works as long as I don't use the failed socket.


Project news - Rosetta
They have been overwhelmed by the number of new users (much like Folding@home). They released an updated app along with a new app they refer to as "Rosetta for Portable devices" which has both Android and Linux (aarch64) versions.

I have all six i7-8700's running Rosetta at the moment searching for various proteins to do with Covid-19. I could run the Ryzens as well but some of the work units need 1.5GB of memory each and the Ryzens only have 16GB installed. I would have to limit them to running on half the cores or upgrade the memory.

29 March 2020

29th of March

Farm status
Intel GPUs
Four running Rosetta work

Nvidia GPUs
Three running Einstein gravity wave work

Raspberry Pis
Eight running Einstein BRP4 work

For news on the Raspberry Pis see Mark's RPi Cluster


Folding@home
It's all about the coronavirus these days. Some tech commentators have suggested running Folding@home; however they have so many new users they can't generate enough work to keep up with demand.

I gave their Linux app a try. It's a stand-alone app. It gave errors on install because it uses Python 2, which is deprecated, but the CPU part did work. It's a multi-thread app and used 11 of the 12 threads on the machine I tried it on. It didn't find the OpenCL run-time library so it wouldn't use the GPU; it's probably looking in the wrong place. I removed it. They need to update it for current Linux distributions, which use Python 3 and a different placement of the OpenCL libraries.


Rosetta@home
Another project studying the coronavirus protein; however, this one is a BOINC-based project. They only have a CPU app and it uses a single thread, although you can run multiple instances. They have two apps: Rosetta Mini, which uses around 400-500MB of memory, and the standard Rosetta app, which can use up to 2GB per thread. I've been running it on the Intel GPU machines as they have 32GB. Tasks take a target amount of time, which defaults to 8 hours, but you can adjust it to more or less via the web site project preferences.

One of the Intel GPU machines failed half the work units, complaining about memory errors, so once it finishes its current work I will have to have a look at it. I suspect one or two sticks of memory aren't seated properly, because half of them seem to work.


GPUgrid
They've said they will also be doing some coronavirus protein studies, but don't currently have any. They have a Nvidia GPU app. I have been running some of their current work recently.

28 March 2020

X10SRi Storage Server rebuild

I felt inspired by the Linus Tech Tips storage-server-on-the-cheap build; see my previous post for a link. So, in a true send-up of Linus, here is my version. Unlike Linus I don't have any sponsors, and have to pay for my own hardware and reuse parts where possible.

The case is a 2015-vintage Fractal Designs Define R2, which has excellent build quality and lots of room to stuff things inside. We're going to need it. It also weighs a hefty 12 kilograms.

You can see the sound-dampening foam on the front panel. It also has thicker padding on the side panels (not shown) as well as the top of the case.

It has louvered doors. There's nylon mesh in front of the fans to keep out the dust. As Linus pointed out with his build, this arrangement usually means the airflow isn't the best. To counter that I have replaced the original fans with Noctua ones all round: better airflow, better reliability, and they are fairly quiet.

This is what we're starting with. It had 4 x 4TB drives. They started off life running Windows Server 2008 R2 with the RAID controller in hardware mode; more recently it's been switched to Linux with the controller in JBOD mode and ZFS on Linux providing the RAID functionality.

So it's out with the old. The bits on the side are the drive caddies.

And in with the new.

But wait, there's more. We're doubling the number of drives in order to fill all 8 drive bays (I'm not counting the 5.25" drive bays at the top of the case). Besides, these cost me a small fortune.

Here is the finished product. Yes, I know it needs cable management, Linus.

And what does all this look like in Linux, I hear you ask? Like this:

# zpool status
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:09 with 0 errors on Sat Mar 28 03:59:10 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        pool1                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca298c13b97  ONLINE       0     0     0
            wwn-0x5000cca298c143ed  ONLINE       0     0     0
            wwn-0x5000cca298c13f04  ONLINE       0     0     0
            wwn-0x5000cca298c16623  ONLINE       0     0     0
            wwn-0x5000cca298c14c7f  ONLINE       0     0     0
            wwn-0x5000cca298c16afa  ONLINE       0     0     0
            wwn-0x5000cca298c152cd  ONLINE       0     0     0
            wwn-0x5000cca298c1431f  ONLINE       0     0     0

errors: No known data errors

# df /pool1
Filesystem       1K-blocks      Used   Available Use% Mounted on
pool1          75325838848 162769536 75163069312   1% /pool1


That's 71TB of usable space, probably more because I turned lz4 compression on for the pool. I went with two-drive redundancy, so I can lose any two drives and still not lose my data.
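For reference, a pool like the one shown above can be built with two commands. This is a sketch rather than the exact commands I ran, reusing the device names from the zpool status output:

```shell
# 8-drive raidz2 pool: any two drives can fail without losing data
zpool create pool1 raidz2 \
    /dev/disk/by-id/wwn-0x5000cca298c13b97 /dev/disk/by-id/wwn-0x5000cca298c143ed \
    /dev/disk/by-id/wwn-0x5000cca298c13f04 /dev/disk/by-id/wwn-0x5000cca298c16623 \
    /dev/disk/by-id/wwn-0x5000cca298c14c7f /dev/disk/by-id/wwn-0x5000cca298c16afa \
    /dev/disk/by-id/wwn-0x5000cca298c152cd /dev/disk/by-id/wwn-0x5000cca298c1431f

# Turn on lz4 compression for the whole pool
zfs set compression=lz4 pool1
```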

Well, that's it for this post. If you liked it give me the thumbs up, and if you didn't, well, it doesn't matter because this isn't YouTube.

15 March 2020

15th of March

Farm status
Intel GPUs
One running Einstein Gravity wave work

Nvidia GPUs
Two running GPUgrid work

Raspberry Pis
Eight running Einstein BRP4 work

For news on the Raspberry Pis see Mark's RPi Cluster


Project news - GPUgrid
They've got an experiment going with lots of work units. My GTX 1660 Ti's are taking about an hour and a half each. I'm only running two machines so I don't trip the circuit breaker. There is no shortage of work at the moment.


Einstein gravity wave work
I have been running the gravity wave work on CPU for a while, and some of the frequencies are now using quite a bit of memory. So much so that if I try to run 12 on the Nvidia GPU machines, I can only get 7 running and the others end up waiting for memory. They have 16GB of memory but some of the work units are using 2GB each.

I even got one of the Intel GPU machines running them because they have 32GB of memory, but they're a lot slower.


Storage server
I ordered some larger hard disks for one of the storage servers. It only has 32GB of memory, so I swapped the memory out of the other one. I found that the X10SRi-F motherboard has a fault in DIMM socket C1, so I can't populate the memory as recommended. Supermicro stopped making the X10 motherboards, so I might have to get a second-hand one to replace it. I will also see if it's possible to get the motherboard repaired. In the meantime I have put the 32GB back in the sockets to the left of the CPU, which work (the C1 socket is to the right of the CPU).

I was inspired by Linus (of Linus Tech Tips) who made a relatively cheap storage server using a Fractal Designs "Define" case and he managed to stuff 20 hard disks in it here: https://www.youtube.com/watch?v=FAy9N1vX76o

I wouldn't recommend going above 16 drives, as the case only had 16 drive bays. He attached drives to the top and back of the case, which doesn't do much for reliability. My file server with 32GB of memory is in a Fractal Designs Define R2 case, an older version with 8 + 2 drive bays. I think 8 drives is enough for my purposes. It's currently got a SATA SSD as the boot drive and 4 x 4TB drives.


Update 17 Mar 2020
I need to make a couple of corrections to the Storage server details above.
1. The X10SRi-F motherboards are still available.
2. The case Linus used was a Fractal Designs Define 7 XL which is larger than the Define.

I still wouldn't recommend bolting drives onto the top or back of the case though, just use the drive bays that it comes with.

Meanwhile I've ordered 4 x 14TB HDDs, and today ordered 3 more. The price went up $51 between my first and second orders (6 days apart). Oh, and they have been delayed.

07 March 2020

7th of March

Farm status
Intel GPUs
One running Einstein gravity wave work

Nvidia GPUs
All running Einstein gravity wave work

Raspberry Pis
Eight running Einstein BRP4 work

For news on the Raspberry Pis see Mark's RPi Cluster


Project news - Seti
Sad news this week. They announced they are going into "hibernation" from the 31st of March. That generally means the end of the project.


Other news
I've been doing some GPUgrid work on a couple of the Nvidia GPU machines after we got updated drivers via the Debian buster-backports repo. I also did some Milkyway work. Since then I have updated all of the Nvidia GPU machines to the 440.59 drivers.

The Intel "neo" drivers have finally made it into a Debian repo. I have installed them on one of the Intel GPU machines but haven't tried any GPU crunching on it yet. Generally GPU work slows the whole CPU down, so it's not a good idea to use both at the same time.

I was looking at building a Ryzen 3950X machine; however, with the news that Seti is closing down I have decided not to proceed with it. Einstein CPU work is very demanding on the memory system, so it wouldn't suit their app.

And in more news this week, there is yet another security bug in Intel's ME module, and they don't think they can correct it. Supposedly the 10th generation CPUs aren't affected, but I can't see why anyone would buy them given all the security flaws.

23 February 2020

23rd of February

Farm status
Intel GPUs
All off

Nvidia GPUs
Had all of them doing Einstein and Milkyway work

Raspberry Pis
All running Einstein BRP4 work

For news on the Raspberry Pis see Mark's RPi Cluster


Other news
I had a couple of wins this week. Firstly I managed to get Debian Buster to work on the Nvidia GPU machines, and secondly I got some work from GPUgrid after almost a year of not processing anything from them.

While it may seem to be a repeat of my last post: after getting the upgrade to Buster working, I went and wiped the machine and re-did it. That failed again. Then the 5.4.13 kernel got released, which seems to have got it going again.


Debian Buster
There were two issues with Debian Buster.

The 4.19 kernel locks up at boot time waiting for entropy. That was addressed by none other than Linus Torvalds in the 5.4.8 kernel. The buster-backports repo has the 5.4.13 kernel as I write this.

The other was the need to explicitly install the nvidia-driver package to be able to display at the monitor's native resolution; otherwise the desktop is stuck at 1024x768. Once installed it displays properly. This didn't need to be done under Stretch.


GPUgrid work
GPUgrid have been trying different apps in order to get one that would work with modern graphics cards (ie the Turing architecture). It has taken them almost a year to sort out. They now have a large batch of work that uses the new app.

The download was huge; one of the many files is 100MB, but once it's on the machine it doesn't need to be downloaded again. Once running, they immediately take the GPU up to 99% load and its power cap (120 watts on my GTX 1660 Ti cards). Run time is around 90 minutes on the GTX 1660 Ti. You'll need a CUDA 10.0 or later driver to run them.


Seti woes
Despite the recent fund-raising drive to replace the hard disks in their database server, they're still having issues supplying enough work due to the database not fitting into memory. It was suggested years ago that they should shorten the deadlines, but they haven't tried that yet. Generally I can't get work from them, so I have been concentrating on other projects.

Also in the news this week: Breakthrough Listen released 2PB of data, most of which we don't process at all. The data are being handled by the Seti Institute, which is not related to the Seti@home project.

16 February 2020

Mid February

Farm status
Intel GPUs
All off. Have been doing bursts of work

Nvidia GPUs
All off. Had three doing bursts of Seti and Milkyway work

Raspberry Pi
Running bursts of work as weather permits

For news on the Raspberry Pis see Marks Rpi Cluster


Buster and Nvidia-driver
After updating one of the Nvidia GPU machines to Debian Buster and trying all sorts of things to get it to display properly (it would only do 1024x768 resolution) I have finally got it going.

Under Stretch (the prior release) I installed nvidia-kernel-dkms, nvidia-opencl-icd and nvidia-smi, and that pulled in the glx driver as well as the CUDA and OpenCL components. Not so under Buster: you have to install nvidia-driver explicitly to get the glx driver. It seems the packaging changed with Buster for some reason.

Now that I have worked that out I can look at upgrading the other machines. I raised a bug with Debian in September 2019.


Weather news
Or: weather or not to run. We've gone from bush fires to floods. The weather is presenting challenges, hence the bursts of work when I can. Typically a run will only be a few hours, so I do Seti and Milkyway work as their work units are fairly short (an hour or two on the CPU and minutes on the GPU). If it looks like it will stay cooler for longer, I run Einstein CPU work, which takes 7 to 14 hours depending on how many tasks I run at once.

27 January 2020

27th of January

Farm status
Intel GPUs
All running Asteroids and Seti work

Nvidia GPUs
Two running Einstein gravity wave work. Two off.

Raspberry Pis
Seven running Einstein BRP4 work

For news on the Raspberry Pis see Marks Rpi Cluster


Weather
It's been hot and humid, with a few drops of rain. Too hot to run the farm. Today was a bit of a break in the weather, so I have most of the farm running. The rest of this week looks like it will be hot and humid, so I don't expect to get much work done.


Project news - Seti
The project increased the allowed number of work units per computer. That promptly blew out the size of the database with the number of work units "out in the field". Last week they limited the result creation rate in order to get it back down in size. The project needs to fit the entire database table into memory and the machine already has the maximum amount of memory that it can hold. They resized the table last week but we still seem to have issues getting work.

In addition to the database issues, Seti@home ran a fund-raiser to replace the 120 x 2TB disk drives with 26 x 16TB disk drives (two of which are spares). It was supposed to run for two weeks but met its target in the first week.

11 January 2020

New year. New decade

Farm status
Intel GPUs
All off

Nvidia GPUs
All running Einstein gravity wave work (on CPU)

Raspberry Pis
Two running, the rest are off.


New year
Another year ticks by and we're still crunching. Recently I've been looking at an industrial unit, but the prices are too high in Sydney. Maybe something further away.

The bush fires keep affecting the air quality. The air quality and temperatures affect the ability to run the computers. Fortunately I haven't been directly impacted by the fires.


Funding
I mentioned in my 24th of November post that I made an offer to fund the SuperHost idea, but had no feedback from any of the key people in the BOINC world. I still haven't had any response. I have some money put aside for such development work, but seeing as there is no interest in SuperHost I may as well use it for another purpose. There are two other things I am interested in:

1. A monitoring tool for a farm of BOINC machines that would highlight which machines need attention. The idea is to use it in conjunction with BOINCtasks. I did run the idea past Fred Efmer (the author of BT) but he felt that BT was able to issue alerts and wasn't inclined to do further development.

2. Power9 CPU support. These aren't exactly high-end desktop CPUs; they are more server grade. Raptor Computing makes desktop and server-grade machines, as do IBM and probably a few other companies. The US has a couple of clusters (Summit and Sierra) running them. Debian Buster runs on them, so Linux and BOINC support already exists. That just leaves the project apps needing to be ported to the ppc64el architecture, as it's known.

I expect there are more people who would be interested in point 1 than there are people who run a Power9 CPU so I am more inclined towards point 1.
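To make point 1 more concrete, here is a minimal sketch of the kind of check I have in mind. The host names, timestamps and threshold below are all made up; a real tool would pull the last-contact times from BOINC itself (for example via boinccmd) rather than a hard-coded list.

```shell
#!/bin/sh
# Minimal sketch of the monitoring idea in point 1: flag any host not
# heard from within a threshold. Input lines are
# "hostname last_contact_epoch"; all values here are hypothetical.
THRESHOLD=3600   # seconds of silence before a host needs attention

flag_stale() {
    # $1 = "current" epoch time to compare against
    awk -v now="$1" -v limit="$THRESHOLD" \
        '(now - $2) > limit { print $1, "needs attention" }'
}

printf '%s\n' \
    'pi-node-01 1000' \
    'gpu-box-01 4800' \
    'gpu-box-02 4900' | flag_stale 5000
# prints: pi-node-01 needs attention
```

The output could then be fed into an alert alongside BOINCtasks, which is the "highlight which machines need attention" part of the idea.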