Manipulating Binary Data with Bash

Bit Trip

© Lead Image © Yewkeo, 123RF.com

Author(s):

Bash is known for admin utilities and text manipulation tools, but the venerable command shell included with most Linux systems also has some powerful commands for manipulating binary data.

One of the most versatile scripting environments available on Linux is the Bash shell. The core functionality of Bash includes many mechanisms for tasks such as string processing, mathematical computation, data I/O, and process management. When you couple Bash with the countless command-line utilities available for everything from image processing to virtual machine (VM) management, you have a very powerful scripting platform.

One thing that Bash is not generally known for is its ability to process data at the bit level. However, Bash's built-in arithmetic, together with a few standard command-line utilities, gives you powerful ways to manipulate and edit binary data. This article describes some of these tools and shows them at work in some practical situations.

Viewing and Converting Data

Two tools you can use to represent data in hexadecimal and binary format are hexdump and xxd. The hexdump utility provides many options for outputting hexadecimal data. Although xxd lacks some of the options available with hexdump, it does have one key feature hexdump lacks: In addition to letting you output hexadecimal values, xxd also has the ability to convert a hexadecimal string into binary data. The following command:

command> echo -n "hello" | xxd -p
output> 68656c6c6f

outputs the binary values of the ASCII string "hello" as an ASCII string of hexadecimal values (refer to Table 1).

Table 1

ASCII Lowercase Alphabet

Char   Decimal   Hexadecimal
a      97        61
b      98        62
c      99        63
d      100       64
e      101       65
f      102       66
g      103       67
h      104       68
i      105       69
j      106       6A
k      107       6B
l      108       6C
m      109       6D
n      110       6E
o      111       6F
p      112       70
q      113       71
r      114       72
s      115       73
t      116       74
u      117       75
v      118       76
w      119       77
x      120       78
y      121       79
z      122       7A
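As a cross-check on Table 1, the following sketch prints each character of a string with its decimal and hexadecimal codes using only Bash builtins. The function name charcodes is hypothetical, and the sketch assumes ASCII input. It relies on the printf rule that a numeric argument beginning with a single quote is converted to the code of the character that follows.

```shell
# charcodes (hypothetical name): print each character of an ASCII string
# with its decimal and uppercase hexadecimal code, as in Table 1.
charcodes() {
  local s=$1 i c
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    # A leading ' makes printf convert the character to its numeric code
    printf '%s %d %X\n' "$c" "'$c" "'$c"
  done
}

charcodes jk   # prints "j 106 6A" and "k 107 6B"
```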

In the following command:

command> echo -n "6162" | xxd -p -r
output> ab

With -r, xxd reverses a hex dump: the hexadecimal string "6162", which is the kind of text-only, "mail-safe" ASCII representation the xxd man page describes, is converted back into binary data. Because hexadecimal 61 and 62 correspond to ASCII characters "a" and "b," respectively, the resulting bytes are displayed as the string "ab."

Bash makes it easy to build these commands into functions. The following functions, pack and unpack (named for their similarity to PHP's pack and unpack functions), use the preceding commands to convert a hexadecimal string to binary and to convert binary data back to hex.

pack() {
  echo -n "$1" | xxd -p -r
}
unpack() {
  echo -n "$1" | xxd -p
}
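If xxd is not installed, the pack direction can be approximated with Bash's own printf, whose %b conversion expands \xHH escapes. The following is a minimal sketch under the assumption that the input is an even-length string of hexadecimal digits; the name hex2bytes is hypothetical, and unlike xxd it performs no error checking.

```shell
# hex2bytes (hypothetical name): xxd-free counterpart of pack.
# Assumes an even-length string of hex digits, e.g. 6162.
hex2bytes() {
  local hex=$1 i
  for (( i = 0; i < ${#hex}; i += 2 )); do
    # Turn each pair of digits into a \xHH escape and let %b expand it
    printf '%b' "\\x${hex:i:2}"
  done
}

hex2bytes 68656c6c6f; echo   # prints hello
```

A pair such as 00 does write a NUL byte, but Bash variables cannot hold NULs, so send the output to a file or pipe rather than capturing it with $(...).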

All the examples so far have used the -p switch, which selects plain hexdump style and represents each 8-bit byte as a two-digit hexadecimal number. To output data as a binary string of 1s and 0s, you need different switches: -b dumps bits instead of hex digits, and -g0 removes the spaces between groups, so awk can print the bit string (the second field of each output line) without separators. The following function returns a binary string representation of binary data, along with a sample command and output:

tobin() {
  echo -n "$1" | xxd -b -g0 | awk '{ printf("%s",$2) }'
}
command> tobin ab
output> 0110000101100010

The two-character string "ab" is converted into a binary string containing two 8-bit values, 01100001 and 01100010, which are the binary forms of 97 and 98, the decimal ASCII codes for "a" and "b" (see Table 1).
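The same bit string can also be produced without xxd or awk. This sketch, with the hypothetical name tobin_builtin, looks up each character's code with printf and then extracts its bits from most to least significant with the shift (>>) and AND (&) operators described later in this article. It assumes ASCII input.

```shell
# tobin_builtin (hypothetical name): pure-Bash variant of tobin.
tobin_builtin() {
  local s=$1 out='' i bit code
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v code '%d' "'${s:i:1}"    # character code, e.g. a -> 97
    for (( bit = 7; bit >= 0; bit-- )); do
      out+=$(( (code >> bit) & 1 ))    # append one bit, high bit first
    done
  done
  printf '%s\n' "$out"
}

tobin_builtin ab   # prints 0110000101100010
```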

Transforming Binary Data

Bash can also transform data with mathematical and logical operations through a mechanism called arithmetic expansion. In the following example:

command> echo $(( 4*5 ))
output> 20

the output of the echo command is the value calculated by the statement enclosed in $(( and )).
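Arithmetic expansion works only on integers, and it accepts numbers written in several bases, which makes it handy for binary data. The lines below illustrate this; bin2dec is a hypothetical helper name, not a Bash builtin.

```shell
echo $(( 7 / 2 ))        # 3: integer division truncates
echo $(( 0x1F ))         # 31: the 0x prefix marks hexadecimal
echo $(( 017 ))          # 15: a leading 0 marks octal
echo $(( 2#1010 + 1 ))   # 11: base#digits accepts bases 2 through 64

# bin2dec (hypothetical helper): convert a string of 1s and 0s to decimal
bin2dec() {
  echo $(( 2#$1 ))
}

bin2dec 01100001         # 97, the code for "a" in Table 1
```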

Another tool for performing mathematical operations is bc. bc is a command-line calculator that can read statements from standard input. For example:

command> echo "4*5" | bc
output> 20

Note that the preceding operation is performed in base 10. bc can read and write other bases through its ibase and obase variables, but there are pitfalls: hexadecimal digits must be uppercase, and once ibase is changed, every later number, including a value assigned to obase, is read in the new base. A simple way around this is to convert the input to base 10 first and let bc handle only the output base. For instance, if you want to perform simple addition and subtraction operations on hexadecimal numbers, use a function like the following:

hexadd() {
  echo "obase=16;ibase=A;$((16#$1))+$2" | bc
}
command> hexadd A 2
output> C

The hexadd function adds a number to a hexadecimal value in a single command; to subtract, pass a negative number. The function pipes a string of statements into bc. The first statement, obase=16, sets the base in which results are output (in this case, hexadecimal, or base 16). The second statement, ibase=A, sets the base used to read input. The input base is set to A, which corresponds to base 10 (in GNU bc, a single digit such as A always has the value 10, whatever the current input base). The third and final statement adds two values: an arithmetic expansion of the first function argument and the second function argument, which is therefore read as a decimal number. The arithmetic expansion uses the base#number notation, which interprets the digits after the # in the base named before it and yields a base 10 result. In the sample command, the hexadecimal value "A" is converted to decimal 10 by arithmetic expansion, added to 2 by bc, and then printed in hexadecimal by bc as "C."
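Because arithmetic expansion already reads base 16 input and printf can write hexadecimal output, a bc-free version of hexadd is also possible. This is a sketch with the hypothetical name hexadd_builtin; as in hexadd, the first argument is hexadecimal and the second is a decimal offset. One difference: printf shows a negative result as a large two's-complement value rather than with a minus sign, so this sketch suits non-negative results.

```shell
# hexadd_builtin (hypothetical name): bc-free variant of hexadd.
# $1 is a hexadecimal number, $2 a decimal offset (negative to subtract).
hexadd_builtin() {
  # 16#$1 reads $1 as base 16; %X prints the sum as uppercase hex
  printf '%X\n' "$(( 16#$1 + $2 ))"
}

hexadd_builtin A 2    # C
hexadd_builtin FF 1   # 100
hexadd_builtin 10 -1  # F
```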

You can also perform bitwise operations on the data, including AND, OR, XOR, and shift. The AND operation (using the & operator) returns the bits that are set in both numbers. The OR operation (using the | operator) returns the bits that are set in either number. The XOR operation (using the ^ operator) returns the bits that are set in exactly one of the two numbers.

In Listing 1, the first command performs a bitwise AND on 3 (0011 in binary) and 6 (0110 in binary). Counting bit positions from the right, the only bit the two numbers share is the second bit, 2 (0010 in binary), so the output of the command is 2. The second command performs an OR operation on the same two numbers. Between them, the two numbers use the lower three bits, so 7 (0111 in binary) is returned. The third command performs an XOR on the same two numbers. The second and fourth bits are the same in both numbers, but the first and third bits differ, so 5 (0101 in binary) is returned.

Listing 1

Bitwise Operations

logicand() {
  echo $(($1&$2))
}

logicor() {
  echo $(($1|$2))
}

logicxor() {
  echo $(($1^$2))
}

command> logicand 3 6
output> 2
command> logicor 3 6
output> 7
command> logicxor 3 6
output> 5
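One practical use of these operators follows from the ASCII codes: each lowercase code in Table 1 is exactly 32 (hex 20) higher than the matching uppercase code (A is 65, or hex 41), so the two cases differ in a single bit. XOR with 0x20 therefore flips a letter's case, AND with 0xDF forces uppercase, and OR with 0x20 forces lowercase. The sketch below, with the hypothetical name flipcase, assumes its argument is a single ASCII letter.

```shell
# flipcase (hypothetical name): switch the case of one ASCII letter.
flipcase() {
  local code hex
  printf -v code '%d' "'$1"                 # e.g. a -> 97 (hex 61)
  printf -v hex '%x' "$(( code ^ 0x20 ))"   # flip the bit worth 32: 61 -> 41
  # To force a case instead, use code & 0xDF (upper) or code | 0x20 (lower)
  printf '%b\n' "\\x$hex"                   # print the character with that code
}

flipcase a   # A
flipcase Q   # q
```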

The remaining bitwise operation is the shift, which moves the bits of a number to the left or right. Listing 2 shows functions for shifting bits right (using the >> operator) and left (using the << operator).

Listing 2

Shifting Bits

shiftr() {
  echo $(($1>>$2))
}

shiftl() {
  echo $(($1<<$2))
}

command> shiftl 4 2
output> 16
command> shiftr 16 4
output> 1

The first command in Listing 2 shifts the bits in 4 (0100) left by two positions, returning a value of 16 (10000). The second command shifts the bits in 16 (10000) right by four positions, returning a value of 1 (0001). As you might have noticed, each position shifted to the left multiplies the number by 2, and each position shifted to the right divides it by 2, discarding any remainder.
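Shifts are most often combined with AND and OR to read or change a single bit. In the sketch below, the helper names setbit, clearbit, and getbit are hypothetical, and bits are numbered from 0 at the right (the prose above counts from 1). The last line shows that a right shift discards the bit shifted out, so halving an odd number rounds down.

```shell
# Hypothetical single-bit helpers; bit 0 is the least significant bit.
setbit()   { echo $(( $1 | (1 << $2) )); }    # force bit $2 to 1
clearbit() { echo $(( $1 & ~(1 << $2) )); }   # force bit $2 to 0
getbit()   { echo $(( ($1 >> $2) & 1 )); }    # print bit $2 (0 or 1)

setbit 4 0     # 5  (0100 -> 0101)
clearbit 7 1   # 5  (0111 -> 0101)
getbit 6 2     # 1  (0110: bit 2 is set)
echo $(( 13 >> 1 ))   # 6, not 6.5: the low bit is discarded
```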

Examples

Switching from binary to hexadecimal, and moving bits around to change an A into a C, is certainly interesting, but does this capability have any uses in the real world? The following examples offer some hints for how you could use these tools in practical ways.

ASCII to Unicode

In some situations, you might need to convert ASCII text into Unicode. ASCII is a 7-bit character set, normally stored one character per byte, whereas the UTF-16 encoding of Unicode uses 16-bit code units. Converting from ASCII to Unicode might seem complicated, but it is actually quite simple thanks to the backward compatibility built into the Unicode standard: the first 128 Unicode code points are identical to the ASCII characters. To convert ASCII to big-endian UTF-16, you just need to prepend a zero byte to each ASCII character, making it a 16-bit character (see Listing 3).

Listing 3

ASCII to Unicode

ascii2unicode() {
  echo "$1" | sed 's/\(.\)/\1\n/g' | awk '/^.$/{ printf("%c%c",0,$0) }'
}

command> ascii2unicode jello
output> jello

In Listing 3, the output of the command appears unchanged, because the terminal does not display the zero bytes. To get a better view of the binary data behind this text, pipe the output into xxd:

command> ascii2unicode jello | xxd
output> 0000000: 006a 0065 006c 006c 006f      .j.e.l.l.o

As you can see, each ASCII value is now preceded by a zero byte (00), which turns it into a 16-bit Unicode character. Take a closer look at Listing 3 to see what's happening: The output of the echo statement is piped into the sed statement, which places each character of the output on a separate line. The awk command reads the input from the sed command line by line, and when a line contains a single character, it prints a NUL byte (character code 0) followed by that character.
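The same big-endian UTF-16 output can be produced with Bash's printf alone, because a \0 escape in the printf format string writes a NUL byte. This is a sketch with the hypothetical name ascii2utf16be and assumes ASCII input; for other text, a general converter such as iconv (for example, iconv -f UTF-8 -t UTF-16BE) is the better choice. Piping into od makes the zero bytes visible, much as xxd does.

```shell
# ascii2utf16be (hypothetical name): emit each ASCII character as a
# 16-bit big-endian code unit, i.e., a zero byte followed by the character.
ascii2utf16be() {
  local s=$1 i
  for (( i = 0; i < ${#s}; i++ )); do
    printf '\0%s' "${s:i:1}"   # \0 in the format string writes a NUL byte
  done
}

ascii2utf16be hi | od -An -tx1   # bytes: 00 68 00 69
```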

URL Encoding and Decoding

Hexadecimal data is something you see every day, but it often goes unnoticed. When data is passed as a query string in a URL, it may be percent-encoded: each encoded byte is written as a percent sign followed by its two-digit hexadecimal value. For example, the URL-encoded string "%61%62%63," when decoded, becomes "abc." Listing 4 shows functions for performing URL encoding and decoding.

Listing 4

URL Encoding and Decoding

urlencode() {
  echo -n "$1" | xxd -p | tr -d '\n' | sed 's/\(..\)/%\1/g'
}

urldecode() {
  tr -d '%' <<< "$1" | xxd -r -p
}

command> urlencode name
output> %6e%61%6d%65
command> urldecode %64%6f%6e%65
output> done

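The functions in Listing 4 rely on the standard functionality of xxd. When encoding a string, tr removes the line breaks that xxd -p inserts into long output, and the sed command splits the hexadecimal string into two-digit (1-byte) chunks, prefixing each with a "%". When decoding, all percent signs are stripped and the result is piped into xxd to convert the hexadecimal string back to ASCII. Note that urldecode expects every byte to be encoded, as urlencode produces; in a partially encoded string such as "a%20b," the literal characters would be misread as hexadecimal digits.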
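For input that mixes literal and encoded characters, a character-by-character approach is one option. The sketch below uses two hypothetical function names and assumes ASCII text. The encoder leaves the RFC 3986 "unreserved" characters (letters, digits, hyphen, period, underscore, and tilde) as they are and encodes everything else; the decoder converts each %HH sequence, treats "+" as a space as HTML form data does, and copies all other characters unchanged.

```shell
# urlencode_unreserved (hypothetical name): percent-encode everything
# except RFC 3986 unreserved characters, using uppercase hex digits.
urlencode_unreserved() {
  local s=$1 out='' i c
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;   # e.g. space -> %20
    esac
  done
  printf '%s\n' "$out"
}

# urldecode_mixed (hypothetical name): decode %HH sequences and '+',
# copying all other characters unchanged. Assumes well-formed input.
urldecode_mixed() {
  local s="${1//+/ }" out='' i=0 c
  while (( i < ${#s} )); do
    c=${s:i:1}
    if [[ $c == % ]]; then
      printf -v c '%b' "\\x${s:i+1:2}"   # two hex digits after % -> one byte
      i=$(( i + 3 ))
    else
      i=$(( i + 1 ))
    fi
    out+=$c
  done
  printf '%s\n' "$out"
}

urlencode_unreserved 'a b/c'   # a%20b%2Fc
urldecode_mixed 'a%20b+c'      # a b c
```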

Calculating IP Subnets

On an IP network, the subnet mask specifies how many bits of the IP address are dedicated to the network ID and how many are used for the host ID. The number of host bits determines the size of the address block: h host bits give 2^h addresses, of which the all-zeros (network) and all-ones (broadcast) addresses are normally not assigned to hosts. Listing 5 shows how to convert the subnet mask to a binary string and determine the host address count.

Listing 5

Converting a Subnet Mask

subnetcalc() {
  echo -n "$1" | \
  awk 'BEGIN { FS="." ; printf("obase=2;ibase=A;") } { printf("%s;%s;%s;%s;\n",$1,$2,$3,$4) }' | \
  bc | sed 's/^0$/00000000/g;s/\(.\)/\1\n/g' | \
  awk 'BEGIN { ht = 0; nt = 0; }
       /[01]/ { if ($0=="1") nt++; if ($0=="0") ht++; }
       END { printf("Network bits: %s\nHost bits: %s\nHost IP Count: %d\n",nt,ht,2^ht); }'
}
command> subnetcalc 255.255.192.0
output> Network bits: 18
output> Host bits: 14
output> Host IP Count: 16384

The output of the echo statement is fed into the first awk statement, which generates the statements piped into the following bc command: the obase and ibase settings, followed by each individual octet of the subnet mask. Once bc evaluates the statements, it returns four lines, one for each octet in binary. The following sed statement finds lines containing only "0" and extends them to eight zeros; in a valid mask, every other octet value (128, 192, and so on up to 255) already has eight binary digits, so no other padding is needed. The sed statement also puts each bit on a line by itself, so that awk can count the bits one at a time. The final awk statement has three sections. The first section initializes the ht and nt variables, which store the host total bits and network total bits, respectively. The next section processes lines containing 0 or 1: If the value is 1, the network total is incremented, and if the value is 0, the host total is incremented. The last section prints the summary data for the network: the network and host bit counts, along with the host IP count, calculated as 2 raised to the number of host bits.
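The same count can be reached entirely with Bash arithmetic, using the shift and OR operators from Listings 1 and 2 to pack the four octets into one 32-bit integer and then adding up its 1 bits. This sketch uses the hypothetical name subnetcalc_builtin, assumes a valid dotted-quad mask without leading zeros, and prints the same three lines as Listing 5.

```shell
# subnetcalc_builtin (hypothetical name): bc/awk-free variant of subnetcalc.
# Assumes a valid dotted-quad subnet mask such as 255.255.192.0.
subnetcalc_builtin() {
  local o1 o2 o3 o4 mask nt=0 i
  IFS=. read -r o1 o2 o3 o4 <<< "$1"                     # split into octets
  mask=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))   # one 32-bit value
  for (( i = 0; i < 32; i++ )); do
    nt=$(( nt + ((mask >> i) & 1) ))                     # count the 1 bits
  done
  printf 'Network bits: %d\nHost bits: %d\nHost IP Count: %d\n' \
    "$nt" "$(( 32 - nt ))" "$(( 1 << (32 - nt) ))"
}

subnetcalc_builtin 255.255.192.0
```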

Conclusion

Together with a few standard command-line utilities, the versatile Bash shell lets you output, transform, and manipulate binary values. This article introduced you to the xxd and bc commands, as well as Bash's own arithmetic expansion. You also learned how to wrap these commands in custom Bash functions to build your own tools for performing practical tasks like decoding URLs and analyzing subnet masks.