08 May 2016

Raspberry Pi Beowulf Cluster

A few weekends ago I spent a bit of time trying to make sense of the various instructions for setting up a Beowulf cluster using Raspberry Pis. What I have below are the steps I took, with a bit of trial and error, to get it going.

With Beowulf you have a head or master node that is used to control the compute nodes. You'll need a minimum of two Pis: one for the head node and one as a compute node. You can have as many compute nodes as you wish. In this example I am just using a single compute node.


Parts
a. Raspberry Pis: 1 per compute node + 1 for the head node
b. microSD cards: 1 per node (minimum 4 GB)
c. Network cables: 1 per node
d. Power adapters/cables: 1 per node

If you're not comfortable using the Linux command line then this isn't the best project for you as there is no GUI when using SSH.

I have a Windows computer that I use to access the Pis via SSH and it has an SD card writer. The software I use is PuTTY for accessing the Pis and Win32DiskImager to read/write images to the SD cards.

As I only did two nodes I updated each one from the Jessie release of Raspbian to the Stretch release. If you are doing a larger number of nodes you might want to write Jessie-Lite onto the SD card, get it upgraded to Stretch and then take a copy of that image and use it for the other nodes.


Create SD card image
1. Download the Raspbian image and unpack it. I started using the Jessie Lite version from March 2016 as it was the latest available version and doesn't come with too much extra stuff.

2. Write the Raspbian image to the microSD card.

3. Insert microSD card into the Pi and plug all the other bits in and power it up.

4. At this point I have a Pi called "raspberrypi" on my network and the router has automatically given it an IP address of 192.168.0.150. I need to give it a name different from the default and a fixed address. I can see it via my router and assign a specific IP address there; I am setting the router up to give it 192.168.0.100. When the Pi is rebooted it will get this new IP address.

Login to the Pi over SSH. The default user is "pi" and the password is "raspberry" (without the quotes). At the command prompt run raspi-config by typing "sudo raspi-config".
- Expand the filesystem
- Change the user password
- Change the name of the Pi
- Change the memory split (I usually set it to 16)
- Set the locale
- Set the timezone
And reboot.

For the first one I called it HeadNode as it will be the head of the cluster.

5. Login to the Pi again using your new password and we can now update it. Edit /etc/apt/sources.list to point to the stretch release (change the word "jessie" to "stretch"; the release names are lowercase in the file). I use nano but there are other text editors. Comment out all the lines in /etc/apt/sources.list.d/raspi.list by putting a # symbol in the first column.
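If you'd rather script these edits than use nano, sed can do them. A sketch, shown here on a scratch file first (the one-line sources.list contents below are my assumption based on the stock Jessie Lite image; check your own file):

```shell
# Demonstrate the substitution on a scratch copy first.
printf 'deb http://mirrordirector.raspbian.org/raspbian/ jessie main contrib non-free rpi\n' > /tmp/sources.list
sed -i 's/jessie/stretch/' /tmp/sources.list
cat /tmp/sources.list
# On the Pi itself you would run:
#   sudo sed -i 's/jessie/stretch/' /etc/apt/sources.list
#   sudo sed -i 's/^/#/' /etc/apt/sources.list.d/raspi.list   # comment out every line
```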

6. Type "sudo apt-get update" and it should fetch the latest list of programs. This next bit takes some time, maybe an hour or two. Type "sudo apt-get dist-upgrade -y" to upgrade everything to the latest versions from the Raspbian repository, i.e. the stretch release. Once done you can reboot it.

7. Write the Jessie-Lite image to another microSD card and insert it into the next Pi. This one is going to be our compute node. Power it up and repeat step 4. For this one I have called it ComputeNode1, and again I have assigned it a specific IP address on the router, 192.168.0.101. Update it as per steps 5 and 6.

8. At this point we should have one Pi called HeadNode with an IP address of 192.168.0.100 and one called ComputeNode1 with an IP address of 192.168.0.101.

9. Login to the head node; we need to tell it the names of the other machines on the network we want to use by editing the /etc/hosts file. Type in "sudo nano /etc/hosts" and we'll add the IP addresses of the nodes.

Remove the "127.0.1.1 HeadNode" (or "127.0.1.1 ComputeNode1") line.
Add a line at the end for each node with its IP address and hostname:
192.168.0.100 HeadNode
192.168.0.101 ComputeNode1

This way each machine will know the IP address of the others. Now let's check the connectivity by pinging each one. Type "ping ComputeNode1" and it should say "64 bytes from ComputeNode1 (192.168.0.101)" with a response time. Press Ctrl-C to stop it.

10. Login to ComputeNode1 and repeat the hosts file edit and ping test.
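With more than a couple of nodes, typing the hosts entries by hand gets tedious. A small sketch that prints them for you; the 192.168.0.10x addresses match the ones used above, but adjust NODES and the range to suit your network, then append the output to /etc/hosts on each machine:

```shell
NODES=3   # number of compute nodes
{
    printf '192.168.0.100 HeadNode\n'
    i=1
    while [ "$i" -le "$NODES" ]; do
        printf '192.168.0.%d ComputeNode%d\n' $((100 + i)) "$i"
        i=$((i + 1))
    done
} > /tmp/hosts.add
cat /tmp/hosts.add   # review before appending to /etc/hosts
```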


Setup NFS share
1. On headnode we'll create a shared directory that all the nodes can access. We start by installing the NFS server software by typing "sudo apt-get install nfs-kernel-server". Enable the services by typing "sudo update-rc.d rpcbind enable && sudo update-rc.d nfs-common enable" and then "sudo reboot".

2. Let's create the directory and set the owner to user pi. Type "sudo mkdir /mirror", then "sudo chown -R pi:pi /mirror".

3. We now need to export it so the other nodes can see it. Type "sudo nano /etc/exports" to edit the file. At the end we need to add a line that reads "/mirror  ComputeNode1(rw,sync,no_subtree_check)".
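If you end up with a lot of compute nodes, /etc/exports also accepts a subnet in place of a hostname so a single line covers them all. A sketch, assuming the 192.168.0.x network used above:

```
/mirror  192.168.0.0/24(rw,sync,no_subtree_check)
```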

4. Restart the NFS server by typing "sudo service nfs-kernel-server restart". Export the details by typing "sudo exportfs -a" and check it's exporting by typing "sudo exportfs", which should list the details from /etc/exports.

5. Over to computenode1 and we'll set it up now. On computenode1 we need to create a mount point and set the owner to user pi, type "sudo mkdir /mirror" followed by "sudo chown -R pi:pi /mirror".

6. Do a "showmount -e headnode" command. It should show the export list. If it gives an error then the rpcbind service isn't starting automatically. This seems to be a bug in Jessie and is resolved in Stretch, which is why we updated.

7. Mount the share by typing "sudo mount headnode:/mirror /mirror". Now let's check it worked by doing a "df -h" command; it should be listed. To check permissions, type "touch /mirror/test.txt". Go back to headnode and let's see if we can see the file by listing the directory: type "ls -lh /mirror", which should show our test.txt file.

8. On computenode1 we want it to automatically mount at start up instead of doing it manually. Unmount it by typing "sudo umount /mirror". Edit the fstab file by typing "sudo nano /etc/fstab" and add the following line: "headnode:/mirror  /mirror  nfs  defaults  0  0". To test, do a "sudo mount -a" command.

It seems that the mount sometimes fails on the compute node, especially if headnode hasn't booted up first, so you may need to run the mount command manually. In other tutorials I have seen use of autofs, which will mount the directory when it's first accessed. I won't go into details here.


Setup password-less SSH
1. Generate an SSH key to allow password-less login by typing "ssh-keygen -t rsa". When prompted for a file location and a passphrase just press Enter.

2. Copy the generated public key to the other nodes by typing "cat ~/.ssh/id_rsa.pub | ssh pi@<IP address> 'mkdir -p .ssh && cat >> .ssh/authorized_keys'", where <IP address> is the IP address of the other node (the mkdir makes sure the .ssh directory exists on the other side).

3. SSH into the other machine manually by typing "ssh pi@ComputeNode1" (or the relevant hostname) and see if it will let you logon without having to type in your password.

Repeat for each node.
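With several nodes a loop saves some typing. A sketch using ssh-copy-id (part of the openssh-client package; it appends the key and sets the permissions on the other side for you). It is shown here as a dry run that just prints the commands; remove the echo to actually run them:

```shell
# Dry run: prints one ssh-copy-id command per node.
# Extend the host list as you add nodes.
for host in ComputeNode1 ComputeNode2; do
    echo ssh-copy-id pi@"$host"
done
```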


Install MPICH
1. On both machines we'll need MPICH, so type in "sudo apt-get install mpich". To make sure it installed correctly type "which mpiexec" and "which mpirun".

2. On HeadNode change directory to our shared one by typing "cd /mirror".

3. Create a file listing the nodes we want to run tasks on. Type "nano /mirror/machinefile" and add the following:

computenode1:4  # spawn 4 processes on computenode1
headnode:2 # spawn 2 processes on headnode

This says ComputeNode1 can run 4 tasks (at a time) and HeadNode can run 2. As you add more compute nodes, repeat the computenode line with the correct name and number of tasks allowed. You can mix different machines, so a Raspberry Pi B or B+ would only execute 1 task while Pi 2s and Pi 3s could execute 4 tasks at a time.

If you want a node to run only one task at a time then omit the colon and number. If it's listed in the machinefile then it's assumed to be able to run at least one task.
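For a bigger cluster the machinefile can be generated rather than typed. A sketch assuming three Pi 2/3 compute nodes at 4 tasks each plus 2 tasks on the head node; adjust the names and counts to your hardware:

```shell
NODES=3   # number of compute nodes
{
    i=1
    while [ "$i" -le "$NODES" ]; do
        echo "computenode$i:4"
        i=$((i + 1))
    done
    echo "headnode:2"
} > /tmp/machinefile
cat /tmp/machinefile   # review before copying to /mirror/machinefile
```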

4. Let's create a simple program called mpi_hello, so on headnode type "nano mpi_hello.c" and paste the following in:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv) {
    int myrank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello from processor %d of %d\n", myrank, nprocs);
    MPI_Finalize();
    return 0;
}

5. Compile it by typing "mpicc mpi_hello.c -o mpi_hello".

6. Run it by typing "mpirun -n 4 -f machinefile -wdir /mirror ./mpi_hello". The number following -n tells it how many processes to run, and the machinefile is the list of machines we created above. If it works we should get something like this as output:

Hello from processor 0 of 4
Hello from processor 1 of 4
Hello from processor 2 of 4
Hello from processor 3 of 4

Try different numbers after -n; for example, -n 6 says to run 6 tasks, which would all run at the same time if we allowed headnode to run tasks. If we specify more tasks than we have CPU cores then they will run one after the other. If you allow headnode to run tasks you will notice they complete quicker than those on the compute node.

The "-wdir /mirror" option tells it the working directory. If you get errors, check that the share is mounted and that all the nodes can access it.

Some other suggestions
1. Use an external hard disk for additional disk space. WD make a PiDrive designed for the Raspberry Pi, but any USB hard disk that has its own power source should work.

2. There is a program called ClusterSSH that can be used to login to all the nodes at once and repeat the commands on each node. This can make maintenance a whole lot easier with multiple nodes.

3. Use a powered USB hub to power the Pis and peripherals instead of using lots of power adapters.

4 comments:

Mark G James said...

Update: 8th of May 2016

It seems the latest releases of Raspbian Stretch will no longer work. I've reported it in the Rpi forums. It seems the eth0 (Ethernet) interface is no more, possibly as part of their kernel 4.4 and firmware upgrades.

Mark G James said...

Update 14th of May 2016

The Raspbian Stretch release is working again. Jessie Lite was updated on the 10th of May 2016 and includes kernel 4.4 now.

Unknown said...

Hi!
Does this Raspberry cluster work with BOINC and SETI@home too?

Mark G James said...

In reply to Unknown’s question:

You don't need a Beowulf cluster to run BOINC. The RPi can run BOINC straight from the repo and you can run Asteroids, Einstein or Seti on them without any issue apart from cooling. The Pi 3s require a fan and heatsink to keep cool.

I have separated my blog into two now. See marksrpicluster.blogspot.com for updates on it.