24 May 2016

24th of May

Farm Status
Intel GPUs
All 6 i7-6700's are running Einstein O1 search (CPU only)

Nvidia GPUs
Both 6 core/12 thread machines are running Einstein O1 search (CPU only)

Raspberry Pis
All 7 Pi3's are running Einstein BRP4 search


Status
Pretty much everything that can run the Einstein O1 search is doing so, apart from two machines. It was warmer earlier in the week, so they were only running overnight. The weather has turned cooler, so they're running 24/7 at the moment.

I had one Intel GPU machine process some Climate Prediction Weather at Home 2 tasks. They took around 220 hours to complete. Not exactly quick. In comparison, the Einstein O1 tasks take 12 to 13.5 hours depending on which machine is running them.

I tried Intel's beta video driver 4444 on one of the i7-6700's last week to see if they had finally fixed their OpenCL errors. Nope. I raised a bug with Intel in November 2015. I was looking at a 16-core Xeon cruncher, but it and some other purchases are on hold until Intel fix their bugs.


Pi stuff
The Raspberry Pi's have reached a recent average credit of 3000 after a couple of weeks running 24/7. I have had to replace an SD card in one Pi as it's been getting slower and slower. Hopefully the new SD card will resolve the issue.

I can't update Raspbian Stretch. Every time I do, the Pi loses the network interface after rebooting and I have to reimage the SD card. I reported it two weeks ago in the Raspberry Pi forums.

I am still waiting on the Raspberry Pi cases and Noctua fans. The cases have been on back-order for a while now. The computer shop didn't order the fans so I had to remind them this week. Three of the Pi3's are naked and just sitting on the cardboard boxes they came in. I am thinking of getting another Pi3 to add to the farm.

14 May 2016

14th of May

Farm Status
Intel GPUs
Running Einstein O1 search overnight. One machine is running 4 Weather at Home tasks (70% done after 170 hours).

Nvidia GPUs
The 6 core/12 thread machines are running Einstein O1 search overnight.

Raspberry Pis
Running Einstein BRP4 work 24/7.


BOINC testing
We've had 7.6.32 for a little while. I finally put it onto the Windows machines today. I am still waiting for it to come through for the Raspberry Pi's, which are on .31 at the moment. The main change is to allow for multiple download servers and, if a download fails, to try the next server on the list.


Parallella's and Pi2's
As I mentioned a couple of posts back, I was considering retiring them. They were removed from the farm. The Parallella's had a nice powder-coating of dust, due to the fan on top blowing air straight in, so I had to give them a clean. Currently they're off.

The Pi2's were also removed from the farm and I have used them for the Beowulf cluster that I wrote the tutorial for. Currently they're off.


Other stuff
With the above changes the Pi part of the farm consists of 7 Raspberry Pi3's. Due to the number of power boards and power adapters, I am looking at the Anker USB hubs to rationalise the power side of things.

Nvidia has a new GPU chip, called Pascal, coming soon. There will be different versions of it. We're expecting the GTX1080 as a replacement for the GTX980 at the end of May. There is also a GTX1070 due out in June. If I'm replacing any of the GPU cards I would probably get rid of my GTX970's first and then look at upgrading the GTX750Ti's, but we will have to see how they perform.

08 May 2016

Raspberry Pi Beowulf Cluster

A few weekends ago I spent a bit of time trying to make sense of the various instructions for setting up a Beowulf cluster using Raspberry Pi's. Below are the steps I took, with a bit of trial and error, to get it going.

With Beowulf you have a Head or Master node that is used to control the compute nodes. You'll need a minimum of 2 Pi's: one for the head node and one for a compute node. You can have as many compute nodes as you wish. In this example I am just doing a cluster with a single compute node.


Parts
a. Raspberry Pi's: one per compute node, plus 1 for the head node
b. microSD cards: one per node (minimum 4GB)
c. Network cables: one per node
d. Power adapters/cables: one per node

If you're not comfortable using the Linux command line then this isn't the best project for you as there is no GUI when using SSH.

I have a Windows computer that I use to access the Pi's via SSH and it has an SD card writer. The software I use is Putty for accessing the Pi's and Win32DiskImager to read/write images to the SD cards.

As I only did two nodes I updated each one from the Jessie release of Raspbian to the Stretch release. If you are doing a larger number of nodes you might want to write Jessie-Lite onto the SD card, get it upgraded to Stretch and then take a copy of that image and use it for the other nodes.


Create SD card image
1. Download the Raspbian image and unpack it. I started with the Jessie Lite version from March 2016 as it was the latest available version and it doesn't come with too much extra stuff.

2. Write the Raspbian image to the microSD card.

3. Insert microSD card into the Pi and plug all the other bits in and power it up.

4. At this point I have a Pi called "raspberrypi" on my network and the router has automatically given it an IP address of 192.168.0.150. I need to give it a different name from the default and a fixed IP address. I can see it via my router and assign a specific IP address there; I am setting the router up to use 192.168.0.100. When the Pi is rebooted it will get this new IP address.

Login to the Pi over SSH. The default user is "pi" and the password is "raspberry" (without the quotes). At the command prompt run raspi-config by typing "sudo raspi-config".
- Expand the filesystem
- Change the user password
- Change the name of the Pi
- Change the memory split (I usually set it to 16)
- Set the locale
- Set the timezone
And reboot

For the first one I called it HeadNode as it will be the head of the cluster.

5. Login to the Pi again using your new password and we can now update it. Edit /etc/apt/sources.list to point to the Stretch release (change the word jessie to stretch). I use nano but there are other text editors. Comment out all the lines in /etc/apt/sources.list.d/raspi.list by putting a # symbol in the first column.
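The exact mirror URL in sources.list varies between image versions, so treat the URL below as a placeholder; the change itself is just the release name. The line goes from something like:

deb http://mirrordirector.raspbian.org/raspbian/ jessie main contrib non-free rpi

to:

deb http://mirrordirector.raspbian.org/raspbian/ stretch main contrib non-free rpi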

6. Type "sudo apt-get update" and it should  fetch the latest list of programs. This next bit takes some time, maybe an hour or two. Type "sudo apt-get dist-upgrade -y" to upgrade everything to the latest versions from the Raspbian repository to the stretch release. Once done you can reboot it.

7. Write the Jessie-Lite image to another microSD card. Insert it into the next Pi. This one is going to be our compute node. Power it up and repeat step 4. For this one I have called it ComputeNode1. Again I have assigned a specific IP address on the router, 192.168.0.101. Update it as per steps 5 and 6.

8. At this point we should have one Pi called HeadNode with an IP address of 192.168.0.100 and one called ComputeNode1 with an IP address of 192.168.0.101.

9. Login to the head node; we'll need to provide the names of the other machines on the network we want to use. We need to edit the /etc/hosts file, so type in "sudo nano /etc/hosts" and add the IP addresses of the nodes.

Remove the 127.0.1.1 HeadNode (or ComputeNode1) line.
Add a line for each one at the end that has the IP address and the hostname. Add:
192.168.0.100 HeadNode
192.168.0.101 ComputeNode1

This way each machine will know the IP address of the others. Now let's check the connectivity by pinging each one. Type "ping ComputeNode1" and it should say "64 bytes from ComputeNode1 (192.168.0.101)" and a response time. Press Ctrl-C to stop it.
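For reference, after the edit the /etc/hosts file on each node ends up looking roughly like this (the existing localhost and IPv6 lines that come with Raspbian can stay as they are):

127.0.0.1       localhost
192.168.0.100   HeadNode
192.168.0.101   ComputeNode1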

10. Login to ComputeNode1 and repeat the hosts file edit and the ping test.


Setup NFS share
1. On headnode we'll create a shared directory that all nodes can access. We start by installing the nfs-server software by typing "sudo apt-get install nfs-kernel-server". Enable the services by typing "sudo update-rc.d rpcbind enable && sudo update-rc.d nfs-common enable" and then "sudo reboot".

2. Let's create a directory and set the owner to user pi. Type "sudo mkdir /mirror". Then "sudo chown -R pi:pi /mirror".

3. We now need to export it so the other nodes can see it. Type "sudo nano /etc/exports" to edit the file. At the end we need to add a line that reads "/mirror  ComputeNode1(rw,sync,no_subtree_check)".
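If you don't want to add every compute node to /etc/exports by name as the cluster grows, exports also accepts a subnet. Something like the line below should cover the whole 192.168.0.x range; adjust it to match your own network, and treat it as an untested alternative to the per-node line above:

/mirror  192.168.0.0/24(rw,sync,no_subtree_check)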

4. Restart the nfs-kernel-server by typing "sudo service nfs-kernel-server restart". Export the details by typing "sudo exportfs -a" and check it's exporting by typing "sudo exportfs", which should list the details from /etc/exports.

5. Over to computenode1 and we'll set it up now. On computenode1 we need to create a mount point and set the owner to user pi, type "sudo mkdir /mirror" followed by "sudo chown -R pi:pi /mirror".

6. Do a "showmount -e headnode" command. It should show the export list. If it gives an error then the rpcbind service isn't starting automatically. This seems to be a bug in Jessie and is resolved in Stretch, which is why we updated.

7. Mount the drive by typing "sudo mount headnode:/mirror /mirror". Now let's check it worked by doing a "df -h" command; it should be listed. To check permissions type "touch /mirror/test.txt". Go back to headnode and let's see if we can see the file by listing the directory: type "ls -lh /mirror", which should show our test.txt file.

8. On computenode1 we want it to automatically mount at startup instead of doing it manually. Unmount it by typing "sudo umount /mirror". Edit the fstab file by typing "sudo nano /etc/fstab" and add the following: "headnode:/mirror  /mirror  nfs". To test, do a "sudo mount -a" command.
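Note that fstab entries usually carry a few more fields. If the short form above gives you trouble, a fuller entry along these lines is what I'd try (the "defaults 0 0" part is the usual fstab boilerplate rather than something specific to this setup):

headnode:/mirror  /mirror  nfs  defaults  0  0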

It seems that the mount sometimes fails on the computenode, especially if headnode hasn't booted up first, so you may need to do the mount command manually. In other tutorials I have seen the use of autofs, which will mount the directory when it's first accessed. I won't go into details here.


Setup password-less SSH
1. Generate an ssh key to allow password-less login by typing "ssh-keygen -t rsa" and, when prompted for a file location and passphrase, just press enter.

2. Copy the generated public key to the other nodes by typing "cat ~/.ssh/id_rsa.pub | ssh pi@<IP Address> 'cat >> .ssh/authorized_keys'" where <IP Address> is the IP address of the other node(s).

3. SSH into the other machine manually by typing "ssh" followed by its hostname (for example "ssh ComputeNode1") and see if it will let you log on without having to type in your password.

Repeat for each node.
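As a concrete example using the names and addresses from this tutorial, the whole sequence on headnode looks something like this (192.168.0.101 being ComputeNode1):

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh pi@192.168.0.101 'cat >> .ssh/authorized_keys'
ssh ComputeNode1

Then repeat the same thing on ComputeNode1, pointing it at 192.168.0.100.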


Install MPICH
1. On both machines we'll need MPICH, so type in "sudo apt-get install mpich". To make sure it installed correctly type "which mpiexec" and "which mpirun".

2. On HeadNode change directory to our shared one by typing "cd /mirror".

3. Create a file listing all our compute nodes. Type "nano /mirror/machinefile" and add the following:

computenode1:4  # spawn 4 processes on computenode1
headnode:2 # spawn 2 processes on headnode

This says ComputeNode1 can run 4 tasks (at a time) and HeadNode can run 2. As you add more compute nodes, repeat the computenode lines with the correct names and number of tasks allowed. You can mix different machines, so a Raspberry Pi B or B+ would only execute 1 task while Pi2's and Pi3's could execute 4 tasks at a time.

If you want a node to run only one task at a time then omit the colon and number. If it's listed in the machinefile then it's assumed to be able to run at least one task.
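As an example, a machinefile for a larger mixed farm might look something like this (the extra hostnames are made up for illustration):

headnode:2
computenode1:4    # Pi3, 4 cores
computenode2:4    # Pi2, 4 cores
computenode3      # original Pi B+, one task at a time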

4. Let's create a simple program called mpi_hello, so on headnode type "nano mpi_hello.c" and paste the following in:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv) {
    int myrank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello from processor %d of %d\n", myrank, nprocs);
    MPI_Finalize();
    return 0;
}

5. Compile it by typing "mpicc mpi_hello.c -o mpi_hello".

6. Run it by typing "mpirun -n 4 -f machinefile -wdir /mirror ./mpi_hello". The number following the -n tells it how many processes to run and the machinefile is the list of machines we created above. If it works we should get something like this as output:

Hello from processor 0 of 4
Hello from processor 1 of 4
Hello from processor 2 of 4
Hello from processor 3 of 4

Try different numbers after -n. For example, -n 6 says to run 6 tasks, which, if we allowed headnode to run tasks, would all run at the same time. If we specify more than we have CPU cores then they will run one after the other. If you allow headnode to run tasks you will notice they complete quicker than on the compute node.

The "-wdir /mirror" tells it the working directory. If you get errors check that its mounted and that all nodes have access. All the nodes need to be able to access it.

Some other suggestions
1. Use an external hard disk for additional disk space. WD make a PiDrive designed for the Raspberry Pi, but any USB hard disk that has its own power source should work.

2. There is a program called ClusterSSH that can be used to login to all the nodes at once and repeat the commands on each node. This can make maintenance a whole lot easier with multiple nodes.

3. Use a powered USB hub to power the Pi's and peripherals instead of using lots of power adapters.