Redirect data streams with pipes

Listeners

The tail -f command is useful as the reader in this scenario because it does not quit when it encounters an end of file (EOF); it only terminates on instruction. Simply redirecting the output of a single command will not do, because an EOF crops up after the command has been processed. Using a plain text file as a transport will not work either, because the location of the data keeps moving and the tail command is left without a valid position to read from.

You can test this by running tail -f against an empty file in one session and redirecting the output of a command to that file in a second session: You will see a file truncated message. The only way to make a text file work as a transport would be to append all the information you want the remote machine to process, but the file would keep growing, causing unnecessary bloat and memory consumption.
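The effect is easy to reproduce (an illustrative test; the file name is arbitrary):

$ touch /tmp/transport
$ tail -f /tmp/transport          # session 1
$ ls > /tmp/transport             # session 2
tail: /tmp/transport: file truncated   # reported in session 1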

Named pipes offer a solution. Because a process can only write data to the pipe while another process is reading from it at the same time, the pipe requires no additional memory; the tail command always reads the data from the pipe at the same point. To establish communications between the two computers, you need two named pipes. The first one receives the data stream from the remote computer, and the second one sends the processed data to the target system.
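You create named pipes with mkfifo; for example (the names are placeholders, and /tmp matches the location the functions script uses):

$ mkfifo /tmp/SEND-PIPE /tmp/RECEIVE-PIPE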

An anonymous pipe forwards the output from tail -f to an SSH session, which writes it to the receiving pipe on the remote machine. This setup can be implemented using the call shown in Listing 8.

Listing 8

Generating the Data Stream

$ tail -f SEND-PIPE | ssh -q TARGET "cat > RECEIVE-PIPE"

The next step is to determine the format in which the data reaches the target system. A relatively simple option is to define a function that embeds the commands to be executed in an appropriate sequence. The rcmd function in Listing 6 (line 2) does this; you just pass the command you want executed to rcmd as an argument.

For the target system to understand what should happen to the received data, the transfer starts with the BEGIN_CMD text marker, followed by the command to be executed. When a line arrives, the target system checks what kind of string the first argument contains. If it is BEGIN_CMD, the target system redirects the second argument to a shell, which in turn writes the result to the pipe that sends the data back to the sender.
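For example, a request to run ls -l on the target machine travels through the pipe as a line like this (illustrative):

BEGIN_CMD ls -l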

The advantage of this method is that the command does not need to be processed by the program itself. In Listing 9, the string is written to a subshell, which then executes the commands and prints the result on screen. In the two-machine setup, this output ends up on the remote computer.

Listing 9

Outputting the Results

$ echo "cd /var/tmp; ls -l | wc -l" | /bin/bash
29

If you want to run multiple commands on the target system, but not in the system's default directory, keep in mind that each command sequence opens its own subshell and therefore runs in an identical, fresh environment (Listing 10). As you can see, the change to the /root directory is only valid until the command sequence has been processed; the second call again displays the current working directory of the parent process.

Listing 10

Environment Path

# echo "cd /root ; pwd" | /bin/bash
/root
 echo "pwd"| /bin/bash
/var/tmp

To work around this problem for communications between two systems, the target system receives both the actual command and an instruction to write the current working directory to a temporary text file. The next time a command is executed, the process checks whether this temporary file exists, parses it if necessary, and changes to the matching directory (Listing 11).

Listing 11

Setting the Directory

# type rcmd
rcmd is a function
rcmd ()
{
    setenv;
    chkpipes;
    if [ $? -ne 0 ]; then
        return 1;
    fi;
    echo "BEGIN_CMD $@ ; { pwd >/tmp/lastpwd.$lhost ;}" > $sendpipe;
    return $?
}
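
On the receiving side, the listener can evaluate this file before running the next command, roughly like this (a sketch; the lastpwd file name follows Listing 11, while $rhost, the name of the remote peer, is an assumption):

# restore the last recorded working directory, if any
if [ -f /tmp/lastpwd.$rhost ]; then
    cd "$(cat /tmp/lastpwd.$rhost)"
fi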

So far so good, but you want more than the ability to execute commands on a remote machine; you also want to transfer files. To do this, there is one more hurdle to overcome: echo cannot pass a line of arbitrary length, because the length of a command's arguments is limited. Moreover, to prevent the receiver from interpreting the bytes you send as control characters, you need to convert the binary data to plain vanilla ASCII.

In earlier Unix variants, the uuencode command and its uudecode counterpart handled this job. Because these commands are no longer universally available, you should use a Python module to prevent the characters from being misinterpreted.
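Python's base64 module is one way to do this; an encode call could look like the following one-liner (a sketch, not necessarily the exact call the scripts use; data.bin is a placeholder):

$ python3 -c 'import base64,sys; base64.encode(sys.stdin.buffer, sys.stdout.buffer)' < data.bin

The receiving side calls base64.decode in the same way to restore the original bytes.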

Program Flow

The listener reads the receive pipe and initiates appropriate actions. If it receives a line containing a command to be executed, it first checks whether there is a text file containing the last working directory (Figure 4). In the next step, it writes the BEGIN_CMD_OUT string to the outbound data stream.

Figure 4: Program flow for sending files to a remote computer.

Once this marker has been transferred, the process changes to the desired directory and writes the commands to be executed to a subshell. The command output is written to the outbound data stream; the end of the output is marked by the END_CMD_OUT string. If the received line starts with the END_COMMUNICATION string, the process quits.

If the transfer starts with BEGIN_FILETRANSFER, the function reads arguments 2 (the filename) and 3 (the destination directory). Using awk, it forwards the data to a Python module that handles the decoding, until the END_FILETRANSFER line arrives.

Finally, the function checks whether the received line contains the BEGIN_CMD_OUT string. If so, it outputs all subsequent lines on screen until it receives the END_CMD_OUT string. Figure 5 shows a flow chart of the listener.

Figure 5: Program flow for the listener.
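
Put together, the listener's dispatch logic might look roughly like this (a condensed sketch, not the article's functions script; the variable names $receivepipe, $sendpipe, and $rhost are assumptions, and the file transfer branch decodes with Python's base64 module as described above):

while read -r marker rest; do
    case "$marker" in
    BEGIN_CMD)
        # restore the last working directory, if recorded
        [ -f "/tmp/lastpwd.$rhost" ] && cd "$(cat /tmp/lastpwd.$rhost)"
        { echo "BEGIN_CMD_OUT"
          echo "$rest" | /bin/bash
          echo "END_CMD_OUT"
        } > "$sendpipe"
        ;;
    BEGIN_FILETRANSFER)
        set -- $rest    # $1: filename, $2: destination directory
        # collect encoded lines up to the end marker and decode them
        while read -r line && [ "$line" != "END_FILETRANSFER" ]; do
            echo "$line"
        done | python3 -c 'import base64,sys; base64.decode(sys.stdin.buffer, sys.stdout.buffer)' > "$2/$1"
        ;;
    BEGIN_CMD_OUT)
        # print command output until the end marker arrives
        while read -r line && [ "$line" != "END_CMD_OUT" ]; do
            echo "$line"
        done
        ;;
    END_COMMUNICATION)
        exit 0
        ;;
    esac
done < "$receivepipe"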

Extensible

Table 1 provides a summary of all the functions in the functions script (Listing 7). Because this example only shows the possibilities that named pipes offer, the functions script does not claim to cover all possible use cases.

Table 1

Functions at a Glance

Function       Explanation
setenv         Sets all required environment variables.
chkpipes       Checks whether the required pipes exist in /tmp.
createpipes    Generates all required pipes in /tmp.
removepipes    Deletes all pipes on the remote host.
listen         Generates a listener that reads and processes incoming files.
establish      Generates a data stream in the direction of the second computer.
killall        Terminates all required background processes.
rcmd           Runs a command on the remote host.
sendfile       Copies a file to the remote system.

Currently, the killall function in Listing 7 occasionally does not terminate all processes in the first round. If that happens, you have to call it again. Also, when starting the listener, the script in Listing 7 does not check whether another process may already be reading from the pipes, which can lead to errors. Finally, Listing 7 lacks a function that checks whether the target directory exists when copying a file.

If the local hostname does not match the alias of the assigned IP address, incorrect behavior occurs because the remote side is not aware of the mismatch.

There are further options for extending the functions. For example, it would be conceivable to establish communications between any number of computers. To do so, you would need to communicate the sender's ID to the remote machine so that the remote machine writes the output to the correct pipes in each case.

It would also make sense to add a compression tool for faster transmission. The target side would then need to decompress the data stream. Another interesting possibility would be to transmit the return value of the executed command to the sender.
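For example, compression could be spliced into the transport from Listing 8 along these lines (an untested sketch; gzip must exist on both machines):

$ tail -f SEND-PIPE | gzip -c | ssh -q TARGET "gunzip -c > RECEIVE-PIPE"

Keep in mind that gzip buffers its input, so short messages might not arrive at the far end until enough data has accumulated or the stream is closed.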

Note that by default, the process expects the read.awk file (Listing 12) to be in the /var/tmp/ directory. If it is in some other location, you need to adjust the awkfile variable in Listing 7.

Listing 12

read.awk

{
 if ($0 != "END_COMMUNICATION") {
  print $0 >pipe
  fflush(pipe)
 }
 else {
  close (pipe)
  exit 0
 }
}
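
A typical invocation feeds the incoming lines to the script on stdin and names the pipe that the decoder reads from (illustrative; /tmp/DECODE-PIPE is a placeholder):

$ awk -v pipe=/tmp/DECODE-PIPE -f /var/tmp/read.awk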
