Parallel shell with pdsh

Shell Games

© Lead Image © Kasia biel, 123RF.com

Author(s): Jeff Layton

The most fundamental tool needed to administer a cluster is a parallel shell, which allows you to run the same command on a series of nodes. In this article, we look at pdsh.

A parallel shell allows you to run the same command on designated nodes in the cluster, so you don't have to log in to each node to run the command. This tool can be useful in many ways, but I like to use it when performing administrative tasks, such as:

  • Checking the versions of particular software packages on each node
  • Checking the OS version on all nodes
  • Checking the kernel version on all nodes
  • Searching the system logs on each node (if you don't store them centrally)
  • Examining the CPU usage on each node
  • Examining local I/O (if the nodes are doing local I/O)
  • Checking whether any nodes are swapping
  • Spot-monitoring the compute nodes

The complete list of possible tasks is extensive, but anything you want to do on a single node can be done on a large number of nodes using a parallel shell tool.

If you try to use a parallel shell on a 50,000-node cluster, however, the time skew (the spread between when the first and the last nodes actually run the command) could be large enough to make the results meaningless. Although certain techniques allow parallel commands to run on a large number of nodes, parallel shells are better suited to a modest number of nodes or to gathering slowly varying data. Parallel shells also work well for administering cloud instances, such as those on Amazon Web Services (AWS).

Many parallel shells are available – including DSH [1], PyDSH [2], PPSS [3], PSSH [4], pdsh [5], PuSSH [6], sshpt [7], and mqsh [8] – and each tool has its pros and cons. (Note: I have not tested all of these tools, so I can't vouch for them.) Several of these tools are written in Python, which has become a very popular language for DevOps.

In this article, I'll select one of the parallel shells to illustrate its possibilities. Other tools are fairly similar, with some syntactical differences and various sets of features. The tool I'm going to talk about here is pdsh.

Introduction to pdsh

Pdsh is arguably one of the most popular parallel shell tools. As of this writing, the most recent version on SourceForge is 2.26, dated 2011-05-01. Code development appears to have moved to Google Code, where the most recent version is 2.29, updated February 2013. I'll be using that version in this article.

Pdsh is very interesting because it can run commands on multiple nodes using only SSH. Because SSH is generally already installed, you don't need to install any extra software on the compute nodes. However, you do need to be able to SSH to any node without a password ("passwordless SSH").
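Passwordless SSH is usually set up with a key pair. The following is a minimal sketch, assuming OpenSSH's ssh-keygen and ssh-copy-id are available on the head node. The function names are hypothetical, and the node addresses in the usage comment are simply this article's two test systems.

```shell
#!/bin/bash
# Hypothetical helpers for setting up passwordless SSH from the head node.

# Create a passphrase-less Ed25519 key pair unless the key file already exists.
ensure_ssh_key() {
  local key="${1:-$HOME/.ssh/id_ed25519}"
  mkdir -p "$(dirname "$key")"
  if [ ! -f "$key" ]; then
    ssh-keygen -q -t ed25519 -N "" -f "$key"
  fi
}

# Copy the public half of the key to each node given as an argument.
# ssh-copy-id asks for each node's password once.
push_ssh_key() {
  local key="$1"
  shift
  local node
  for node in "$@"; do
    ssh-copy-id -i "${key}.pub" "$node"
  done
}

# Usage:
#   ensure_ssh_key
#   push_ssh_key ~/.ssh/id_ed25519 192.168.1.4 192.168.1.250
```

Afterward, a command such as ssh 192.168.1.250 uname -r should run without a password prompt. Running ensure_ssh_key again leaves an existing key untouched.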

Building and Installing pdsh

Building and installing pdsh is really simple if you've built code with GNU Autoconf's configure scripts before. The steps are:

./configure --with-ssh --without-rsh
make
make install

This puts the binaries into /usr/local/, which is fine for testing purposes. For production work, I would put it in /opt or something like that – just be sure it's in your path.
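For a production install under /opt, the same steps can use Autoconf's standard --prefix option. The sketch below assumes a hypothetical prefix of /opt/pdsh (the function name build_pdsh is also just for illustration) and shows one way to keep that prefix's bin directory on your PATH. Installing into /opt will usually require root privileges for the make install step.

```shell
#!/bin/bash
# Hypothetical production prefix; adjust to your site's conventions.
PDSH_PREFIX=/opt/pdsh

# Run from the top of the unpacked pdsh source tree.
# --prefix is the standard Autoconf option for the install location.
build_pdsh() {
  ./configure --prefix="$PDSH_PREFIX" --with-ssh --without-rsh &&
    make &&
    make install
}

# Put $PDSH_PREFIX/bin at the front of PATH, but only once, so this is
# safe to keep in .bashrc even if the file is sourced repeatedly.
case ":$PATH:" in
  *":$PDSH_PREFIX/bin:"*) ;;
  *) export PATH="$PDSH_PREFIX/bin:$PATH" ;;
esac
```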

You might notice that I used the --without-rsh option in the configure command. By default, pdsh uses rsh, which is not really secure, so I chose to exclude it from the configuration. In the output in Listing 1, you can see the pdsh rcmd modules (rcmd is the remote command used by pdsh). Notice that the "available rcmd modules" at the end of the output lists only ssh and exec. If I didn't exclude rsh, it would be listed here, too, and it would be the default. To override rsh and make ssh the default, you just add the following line to your .bashrc file:

export PDSH_RCMD_TYPE=ssh

Listing 1

rcmd Modules

Be sure to "source" your .bashrc file (i.e., source ~/.bashrc) to set the environment variable, or log out and log back in. If, for some reason, you see the following when you try running pdsh,

$ pdsh -w 192.168.1.250 ls -s
pdsh@home4: 192.168.1.250: rcmd: socket: Permission denied

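then pdsh was built with rsh and is using it as the default rcmd module. (The rsh module needs a privileged local port, which an ordinary user cannot open, hence the error.) You can rebuild pdsh without rsh, set the environment variable in your .bashrc file, or do both.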
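Before digging further, it can help to confirm which remote command method pdsh will use. Here is a small sketch (the function name is hypothetical) that checks PDSH_RCMD_TYPE and, if pdsh is on your PATH, runs pdsh -V, which prints the pdsh version along with the rcmd modules it was built with.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the pdsh remote command (rcmd) setup.
check_pdsh_rcmd() {
  if [ "${PDSH_RCMD_TYPE:-}" != "ssh" ]; then
    echo "PDSH_RCMD_TYPE is '${PDSH_RCMD_TYPE:-unset}', not ssh" >&2
    return 1
  fi
  echo "PDSH_RCMD_TYPE=ssh"
  # If pdsh is installed, show its version and available rcmd modules.
  if command -v pdsh >/dev/null 2>&1; then
    pdsh -V
  fi
}
```

Run check_pdsh_rcmd after sourcing .bashrc. A non-zero return status means the variable still isn't set to ssh in the current shell.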

First pdsh Commands

To begin, I'll try to get the kernel version of a node by using its IP address:

$ pdsh -w 192.168.1.250 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

The -w option specifies the node(s) that will run the command. In this case, I specified the IP address of the node (192.168.1.250). After the list of nodes, I add the command I want to run, which is uname -r in this case. Notice that pdsh begins each output line with the name of the node that produced it.

If you need to mix rcmd modules in a single command, you can specify which module to use in the command line,

$ pdsh -w ssh:laytonjb@192.168.1.250 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

by putting the rcmd module name and a colon before the node name. In this case, I used the ssh module with the usual user@host ssh syntax.

A very common way of using pdsh is to set the environment variable WCOLL to point to a file that contains the list of hosts you want pdsh to use. For example, I created a subdirectory named PDSH and, in it, a file named hosts that lists the hosts I want to use:

[laytonjb@home4 ~]$ mkdir PDSH
[laytonjb@home4 ~]$ cd PDSH
[laytonjb@home4 PDSH]$ vi hosts
[laytonjb@home4 PDSH]$ more hosts
192.168.1.4
192.168.1.250

I'm using only two nodes: 192.168.1.4 and 192.168.1.250. The first is my test system (like a cluster head node), and the second is my test compute node. Instead of one host per line, you can also list hosts in the file as you would on the command line, separated by commas. Be sure not to put a blank line at the end of the file, because pdsh will try to connect to it. You can put the environment variable WCOLL in your .bashrc file:

export WCOLL=/home/laytonjb/PDSH/hosts

As before, you can source your .bashrc file, or you can log out and log back in.
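Because a stray blank line in the host file causes trouble, it can be worth generating or cleaning the file with a script. This is a minimal sketch with two hypothetical helper functions; the directory and node addresses in the usage comment follow this article's example.

```shell
#!/bin/bash
# Hypothetical helper: write the given hosts, one per line, to DIR/hosts,
# skipping empty entries, and point WCOLL at that file.
make_wcoll() {
  local dir="$1"
  shift
  mkdir -p "$dir"
  printf '%s\n' "$@" | grep -v '^[[:space:]]*$' > "$dir/hosts"
  export WCOLL="$dir/hosts"
}

# Hypothetical helper: remove blank or whitespace-only lines from an
# existing host file so pdsh does not try to connect to an empty name.
strip_blank_lines() {
  local file="$1" tmp
  tmp=$(mktemp) || return 1
  grep -v '^[[:space:]]*$' "$file" > "$tmp"
  mv "$tmp" "$file"
}

# Usage:
#   make_wcoll "$HOME/PDSH" 192.168.1.4 192.168.1.250
#   strip_blank_lines "$HOME/PDSH/hosts"
```

The export inside make_wcoll affects only the current shell, so the export WCOLL line in .bashrc is still what makes the setting permanent.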

Specifying Hosts

There are several other ways to specify a list of nodes. I won't list them all, because the pdsh website [9] discusses virtually all of them, but some of the methods are pretty handy. The simplest way to specify the nodes on the command line is with the -w option:

$ pdsh -w 192.168.1.4,192.168.1.250 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

In this case, I specified the node names separated by commas. You can also use a range of hosts as follows:

pdsh -w 'host[1-11]' uname -r
pdsh -w 'host[1-4,8-11]' uname -r

In the first case, pdsh expands the host range to host1, host2, host3, …, host11. In the second case, it expands the hosts similarly (host1, host2, host3, host4, host8, host9, host10, host11). Quoting the expression keeps your local shell from treating the brackets as a filename pattern. You can go to the pdsh website for more information on hostlist expressions [10].
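To make the expansion rule concrete, here is a small bash sketch that expands a single bracketed range expression as described above. It is an illustration only, not pdsh's own parser: the function name is hypothetical, and it handles just one bracket group of unpadded integers (no zero-padded ranges such as node[01-10] and no multiple bracket groups).

```shell
#!/bin/bash
# Illustration only: expand "prefix[a-b,c,d-e]suffix" into one name per line.
# Supports one bracket group of unpadded integers.
expand_hostlist() {
  local expr="$1"
  local prefix="${expr%%\[*}"        # text before the first '['
  if [ "$prefix" = "$expr" ]; then   # no brackets: a single host name
    echo "$expr"
    return
  fi
  local rest="${expr#*\[}"           # text after the first '['
  local ranges="${rest%%\]*}"        # text between the brackets
  local suffix="${rest#*\]}"         # text after the closing ']'
  local -a parts
  local part lo hi i
  IFS=',' read -r -a parts <<< "$ranges"
  for part in "${parts[@]}"; do
    lo="${part%-*}"                  # "8-11" -> "8"; "5" -> "5"
    hi="${part#*-}"                  # "8-11" -> "11"; "5" -> "5"
    for ((i = lo; i <= hi; i++)); do
      echo "${prefix}${i}${suffix}"
    done
  done
}

# Example:
#   expand_hostlist 'host[1-4,8-11]'
```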

Another option is to have pdsh read the hosts from files other than the one to which WCOLL points. The command shown in Listing 2 tells pdsh to take the hostnames from the files /tmp/hosts and /tmp/hosts2; each filename follows -w ^ (with no space between the "^" and the filename), and multiple host files are separated by commas.

Listing 2

Read Hosts from File

$ more /tmp/hosts
192.168.1.4
$ more /tmp/hosts2
192.168.1.250
$ pdsh -w ^/tmp/hosts,^/tmp/hosts2 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64
192.168.1.250: 2.6.32-431.11.2.el6.x86_64

You can also exclude hosts from a list:

$ pdsh -w -192.168.1.250 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64

The option -w -192.168.1.250 excluded node 192.168.1.250 from the host list (here, the hosts in WCOLL), so pdsh returned output only for 192.168.1.4. You can also exclude nodes using a node file:

$ pdsh -w -^/tmp/hosts2 uname -r
192.168.1.4: 2.6.32-431.17.1.el6.x86_64

In this case, /tmp/hosts2 contains 192.168.1.250, which isn't included in the output. Using the -x option with a hostname,

$ pdsh -x 192.168.1.4 uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
$ pdsh -x ^/tmp/hosts uname -r
192.168.1.250: 2.6.32-431.11.2.el6.x86_64
$ more /tmp/hosts
192.168.1.4

or with a file of hostnames to exclude (^/tmp/hosts) also works.

More Useful pdsh Commands

Now I can shift into second gear and try some fancier pdsh tricks. First, I want to run a more complicated command on all of the nodes (Listing 3). Notice that I put the entire command in quotes. This means the entire command is run on each node, including the first (cat /proc/cpuinfo) and second (grep bogomips) parts.

Listing 3

Quotation Marks 1

 

In the output, the node name precedes the command results, so you can tell which output is associated with which node. Notice that the BogoMips values differ between the two nodes, which is perfectly understandable because the systems are different. The first node has four cores with Hyper-Threading (eight logical cores), and the second node has four cores.

You can use this command across a homogeneous cluster to make sure all the nodes are reporting back the same BogoMips value. If the cluster is truly homogeneous, this value should be the same. If it's not, then I would take the offending node out of production and check it.
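With many nodes, eyeballing the values gets tedious. A minimal sketch, assuming pdsh's usual "node: output" line format and one line per node (for example, from grep -m1 bogomips /proc/cpuinfo), is a small awk filter that prints the nodes whose output differs from the value most nodes report. The function name is hypothetical, and a tie for the most common value is broken arbitrarily.

```shell
#!/bin/bash
# Hypothetical filter: read "node: value" lines (one per node) on stdin and
# print the nodes whose value differs from the most common value.
find_outliers() {
  awk '
    {
      node = $0
      sub(/: .*/, "", node)                  # node name = text before first ": "
      value = substr($0, length(node) + 3)   # value = everything after it
      val[node] = value
      count[value]++
    }
    END {
      best = 0
      for (v in count) if (count[v] > best) { best = count[v]; mode = v }
      for (n in val) if (val[n] != mode) print n
    }'
}

# Usage (hosts taken from WCOLL):
#   pdsh "grep -m1 bogomips /proc/cpuinfo" | find_outliers
```

Any node names it prints are candidates for the "take it out of production and check it" treatment.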

A slightly different command shown in Listing 4 runs the first part contained in quotes, cat /proc/cpuinfo, on each node and the second part of the command, grep bogomips, on the node on which you issue the pdsh command.

Listing 4

Quotation Marks 2

 

The point here is that you need to be careful on the command line. In this example, the differences are trivial, but other commands could have differences that might be difficult to notice.
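The difference is easy to miss, so here are two command forms along the lines of those described above, written side by side as hypothetical wrapper functions. In the first, the quoted pipeline goes to each node's shell; in the second, only cat runs remotely, and your local shell pipes pdsh's combined output into grep.

```shell
#!/bin/bash
# Hypothetical wrappers contrasting where the pipe runs (hosts from WCOLL).

# Whole pipeline quoted: each node runs "cat ... | grep ..." itself and
# sends back only the matching lines.
bogomips_grep_remote() {
  pdsh "cat /proc/cpuinfo | grep bogomips"
}

# Pipe outside the quotes: each node sends back all of /proc/cpuinfo, and
# grep runs locally on pdsh's combined, node-prefixed output.
bogomips_grep_local() {
  pdsh "cat /proc/cpuinfo" | grep bogomips
}
```

Both print the same lines in this case, but the second moves far more data over the network, and a local grep pattern also sees the node-name prefix that pdsh adds, so a pattern that happens to match a hostname would match every line from that node.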

One very important thing to note is that pdsh does not guarantee that output returns in any particular order. If you have a list of 20 nodes, the output does not necessarily start with node 1 and proceed in order to node 20. For example, in Listing 5, I run vmstat 1 2 on each node and get four lines of output from each node (two header lines and two samples).

Listing 5

Order of Output

[laytonjb@home4 ~]$ pdsh vmstat 1 2
192.168.1.4: procs  ------------memory------------   ---swap-- -----io---- --system--  -----cpu-----
192.168.1.4:  r  b     swpd   free    buff   cache     si   so    bi    bo   in    cs  us sy id wa st
192.168.1.4:  1  0        0 30198704  286340  751652    0    0     2     3   48    66   1  0 98  0  0
192.168.1.250: procs -----------memory------------   ---swap-- -----io---- --system-- ------cpu------
192.168.1.250:  r  b   swpd   free    buff   cache     si   so    bi    bo   in    cs us sy  id wa st
192.168.1.250:  0  0      0 7248836   25632  79268      0    0    14     2   22    21  0  0  99  0  0
192.168.1.4:    1  0      0 30198100  286340 751668     0    0     0     0  412   735  1  0  99  0  0
192.168.1.250:  0  0      0 7249076   25632  79284      0    0     0     0   90    39  0  0 100  0  0

At first, it looks like the results from the first node are output first, but then the second node creeps in with its results. You need to expect that the output from a command that returns more than one line per node could be mixed. My best advice is to grab the output, put it into an editor, and rearrange the lines, remembering that the lines for any specific node are in the correct order.

Maybe someone with some serious pdsh-fu has a simple solution (please let me know if you have a technique). The other option is to issue only commands that return a single line of output. The results might not return in node order, but it would be easier to sort them.
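One simple approach, sketched here under the assumption that node names contain no colons (true for the IPv4 addresses used in this article), is a stable sort on the node-name field. Because the lines for any one node already arrive in the correct order, a stable sort groups them by node without reordering them within a node. The pdsh source also ships a dshbak script for regrouping output by host, which may be worth a look.

```shell
#!/bin/bash
# Group pdsh output by node, keeping each node's lines in their original order.
#   -s     stable sort: lines with equal keys keep their input order
#   -t:    split fields on ':'
#   -k1,1  sort on field 1 only, the node name that pdsh prepends
group_by_node() {
  sort -s -t: -k1,1
}

# Usage:
#   pdsh vmstat 1 2 | group_by_node
```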

You can easily use pdsh to run scripts or commands on each node. For example, if you have read my past articles on processor and memory metrics [11] or processes, networks, and disk metrics [12], you can use those scripts to gather metrics quickly and easily on each node. However, you might want to modify the scripts so you only get one line of output (or maybe add switches in the code so you can specify the output) to make it easier to sort the results.
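As an illustration of the one-line idea, here is a hypothetical script (not one of the scripts from those articles) that condenses the load averages and free memory from Linux's /proc into a single line. Because each node then returns exactly one line, a plain sort puts the results in node order. The shared path in the usage comment is an assumption.

```shell
#!/bin/bash
# Hypothetical one-line metrics script, e.g., saved as node_metrics.sh on a
# file system every node can see. Reads Linux /proc files.
node_metrics() {
  local load1 load5 load15 memfree
  # /proc/loadavg starts with the 1-, 5-, and 15-minute load averages.
  read -r load1 load5 load15 _ < /proc/loadavg
  # MemFree in /proc/meminfo is reported in kB.
  memfree=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
  echo "load=${load1},${load5},${load15} memfree_kB=${memfree}"
}

node_metrics

# Usage (hypothetical shared path):
#   pdsh /shared/bin/node_metrics.sh | sort
```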

pdsh Modules

Previously, I mentioned that pdsh uses rcmd modules to access nodes. The authors have extended this to create modules for various specific situations. The pdsh modules page [13] lists other modules that can be built as part of pdsh, including:

  • machines
  • genders
  • nodeupdown
  • slurm
  • torque
  • dshgroup
  • netgroup

These modules extend the functionality of pdsh. For example, the SLURM module allows you to run the command only on nodes allocated to currently running SLURM jobs. When pdsh is run with the SLURM module and no other hosts are specified, it reads the job ID from the SLURM_JOBID environment variable and targets that job's nodes. Running pdsh with the -j jobid option gets the list of hosts from the specified job instead.
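As a sketch of how that might look in practice, the hypothetical wrapper below passes the current job's ID with -j when SLURM_JOBID is set and otherwise falls back to the hosts in WCOLL. It assumes pdsh was built with the SLURM module.

```shell
#!/bin/bash
# Hypothetical wrapper; assumes pdsh was built with the SLURM module.
# Inside a job (SLURM_JOBID set), target that job's nodes explicitly with -j;
# otherwise, run on the hosts listed in the file named by WCOLL.
pdsh_job_or_wcoll() {
  if [ -n "${SLURM_JOBID:-}" ]; then
    pdsh -j "$SLURM_JOBID" "$@"
  else
    pdsh "$@"
  fi
}

# Usage:
#   pdsh_job_or_wcoll uptime
```

Passing -j explicitly makes it obvious from the command itself which nodes are being targeted.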

Summary

Even experienced admins use parallel shell tools to understand the states of their systems. These tools are easily scriptable, so you can store the data in a flat file or in a database.

Although you have the choice of several parallel shells, arguably the most popular tool of this kind is pdsh, which I briefly showed how to build, install, and use. Pdsh is not very difficult to employ and can be used in conjunction with commands or scripts to gather information about compute nodes. A range of command options gives it a tremendous amount of flexibility for almost any scenario you can imagine.

The Author

Jeff Layton has been in the HPC business for almost 25 years (starting when he was 4 years old). He can be found lounging around at a nearby Fry's enjoying the coffee and waiting for sales.