Manipulating Binary Data with Bash

ASCII to Unicode

In some situations, you might need to convert an ASCII string into Unicode. ASCII is a 7-bit character set whose characters are normally stored one per byte, whereas the UTF-16 encoding of Unicode uses 16-bit code units. Converting from ASCII to UTF-16 might seem complicated, but it is actually quite simple thanks to the backward compatibility built into the Unicode standard: the first 128 Unicode code points are identical to ASCII. To convert an ASCII string to big-endian UTF-16, you just need to prepend a zero byte to each ASCII character, making it a 16-bit character (see Listing 3).

Listing 3

ASCII to Unicode

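ascii2unicode() {
  # sed puts each character on its own line (\n in the replacement is a GNU sed feature);
  # awk then prints a zero byte followed by the character for every one-character line.
  echo "$1" | sed 's/\(.\)/\1\n/g' | awk '/^.$/{ printf("%c%c",0,$0) }'
}

command> ascii2unicode jello
output> jello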

In Listing 3, the output appears unchanged because the terminal does not display the zero bytes. To get a better view of the binary data behind this text, pipe the output into xxd:

command> ascii2unicode jello | xxd
output> 0000000: 006a 0065 006c 006c 006f      .j.e.l.l.o

As you can see, each ASCII byte is now preceded by a 00 byte, turning it into a 16-bit (big-endian UTF-16) character. Take a closer look at Listing 3 to see what's happening: The output of the echo command is piped into sed, which places each character on a separate line. The awk command reads sed's output line by line, and whenever a line contains exactly one character, it prints a byte with the value 0 followed by that character.
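
If GNU sed or an awk that can print a zero byte is not available, the same conversion can be sketched in pure Bash. The function names ascii2utf16be and utf16be2ascii are hypothetical, and the sketch assumes ASCII-only input: the \0 escape in printf's format string emits the zero byte, and tr -d '\000' deletes the zero bytes again to reverse the conversion.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: a pure-Bash sketch, assuming ASCII-only input.

# Emit each character of $1 preceded by a zero byte (big-endian UTF-16).
ascii2utf16be() {
  local s=$1 i
  for (( i = 0; i < ${#s}; i++ )); do
    # '\0' in the format is a NUL byte; %s prints the next character of the string.
    printf '\0%s' "${s:i:1}"
  done
}

# Reverse the conversion by deleting every zero byte read from standard input.
utf16be2ascii() {
  tr -d '\000'
}
```

Piping ascii2utf16be jello into xxd should show the same bytes as Listing 3, and a round trip returns the original text:

command> ascii2utf16be jello | utf16be2ascii
output> jello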

URL Encoding and Decoding

Hexadecimal data is something you see every day, but it often goes unnoticed. When data is passed as a query string in a URL, it may be percent-encoded: each encoded byte is written as a percent sign followed by its two-digit hexadecimal value. For example, the URL-encoded string "%61%62%63" decodes to "abc." Listing 4 shows functions for performing URL encoding and decoding.

Listing 4

URL Encoding and Decoding

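urlencode() {
  # xxd -p dumps the bytes as plain hex (wrapped at 60 digits), tr joins the lines,
  # and sed prefixes every two-digit byte with a percent sign.
  echo -n "$1" | xxd -p | tr -d '\n' | sed 's/\(..\)/%\1/g'
}

urldecode() {
  # Strip the percent signs and let xxd -r -p turn the hex pairs back into bytes.
  tr -d '%' <<< "$1" | xxd -r -p
}

command> urlencode name
output> %6e%61%6d%65
command> urldecode %64%6f%6e%65
output> done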

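The functions in Listing 4 rely on the standard functionality of xxd. When encoding, xxd -p prints the string as plain hexadecimal, and sed splits that output into two-digit (1-byte) chunks and prefixes each one with a "%". When decoding, all percent signs are stripped and the remaining hexadecimal string is piped into xxd -r -p, which converts it back to ASCII. Note that urldecode expects every character to be percent-encoded; a partially encoded string such as "a%20b" will not decode correctly.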
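
Listing 4 encodes every byte, even letters and digits that can appear in a URL unchanged, and its decoder needs fully encoded input. A more selective pair can be sketched in pure Bash, assuming ASCII input. The encoder leaves the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", and "~") as they are, and the decoder converts only valid %XX sequences, passing everything else through. The names urlencode_rfc3986 and urldecode_mixed are hypothetical, and treating "+" as a space is an assumption that matches HTML form-style query strings.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: a pure-Bash sketch, assuming ASCII input.

# Percent-encode everything except RFC 3986 unreserved characters.
urlencode_rfc3986() {
  local LC_ALL=C s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      # printf "'c" yields the character's numeric code; %02X prints it as hex.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Decode %XX sequences and leave unencoded characters untouched.
urldecode_mixed() {
  local LC_ALL=C s=${1//+/ } out='' c i=0   # assumption: "+" means space
  while (( i < ${#s} )); do
    c=${s:i:1}
    if [[ $c == % && ${s:i+1:2} == [0-9A-Fa-f][0-9A-Fa-f] ]]; then
      printf -v c "\\x${s:i+1:2}"           # two hex digits -> one byte
      i=$(( i + 3 ))
    else
      i=$(( i + 1 ))                        # ordinary character, copy as-is
    fi
    out+=$c
  done
  printf '%s\n' "$out"
}
```

Unlike Listing 4, these functions produce readable URLs and accept mixed input:

command> urlencode_rfc3986 'a b&c'
output> a%20b%26c
command> urldecode_mixed 'a%20b%26c'
output> a b&c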

Calculating IP Subnets

On an IP network, the subnet mask specifies how many bits of the IP address are dedicated to the network ID and how many are used for the host ID. The size of the host ID space tells you how many addresses the subnet contains: with h host bits, there are 2^h addresses, two of which (the network address and the broadcast address) are normally not assigned to hosts. Listing 5 shows how to convert the subnet mask to a binary string and determine the host ID count.

Listing 5

Converting a Subnet Mask

subnetcalc() {
  # The first awk turns the mask into a bc program that prints each octet in base 2,
  # sed pads a lone "0" to eight zeros and puts every bit on its own line,
  # and the final awk counts the 1 bits (network) and 0 bits (host).
  echo -n "$1" | \
  awk 'BEGIN { FS="." ; printf("obase=2;ibase=A;") } { printf("%s;%s;%s;%s;\n",$1,$2,$3,$4) }' | \
  bc | sed 's/^0$/00000000/g;s/\(.\)/\1\n/g' | \
  awk 'BEGIN { ht = 0; nt = 0; }
       /[01]/ { if ($0=="1") nt++; if ($0=="0") ht++; }
       END { printf("Network bits: %s\nHost bits: %s\nHost IP Count: %d\n",nt,ht,2^ht); }'
}

command> subnetcalc 255.255.192.0
output> Network bits: 18
output> Host bits: 14
output> Host IP Count: 16384

The echo command feeds the mask into the first awk command, which splits it at the dots and generates a small bc program: obase=2 sets the output base to binary, ibase=A sets the input base to decimal (bc reads the single digit A as 10 regardless of the current input base), and each octet follows as a separate expression. bc evaluates the program and returns four lines, one binary number per octet. Because bc drops leading zeros, an octet of 0 comes back as a single "0", so the sed command expands such lines to eight zeros; every other valid mask octet is at least 128 and therefore already has eight binary digits. sed also places each bit on a line by itself, which the final step needs in order to count the host bits.

The last awk command has three sections. The BEGIN section initializes the ht and nt variables, which hold the host bit total and the network bit total, respectively. The main section matches lines containing 0 or 1: a 1 increments the network total, and a 0 increments the host total. The END section prints the summary for the network, including the network and host bit counts and the host IP count, computed as 2 raised to the number of host bits.
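
The bc and sed stages can also be replaced by Bash's own bitwise arithmetic. In the sketch below (subnetcalc_arith is a hypothetical name), each of the eight bits of every octet is tested with a right shift and & 1, and the host count is computed as 1 << ht, which equals 2 raised to the number of host bits. It assumes a well-formed dotted-decimal mask with four octets and, like Listing 5, does not check that the 1 bits are contiguous.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts mask bits with Bash arithmetic instead of bc.
# Assumes a dotted-decimal mask with four octets in the range 0-255.
subnetcalc_arith() {
  local IFS=. o bit nt=0 ht=0
  local -a octets
  read -r -a octets <<< "$1"            # split "255.255.192.0" at the dots
  for o in "${octets[@]}"; do
    for (( bit = 7; bit >= 0; bit-- )); do
      # 10# forces base 10 so an octet written as "08" is not read as octal
      if (( (10#$o >> bit) & 1 )); then
        nt=$(( nt + 1 ))                 # a 1 bit belongs to the network ID
      else
        ht=$(( ht + 1 ))                 # a 0 bit belongs to the host ID
      fi
    done
  done
  printf 'Network bits: %d\nHost bits: %d\nHost IP Count: %d\n' \
    "$nt" "$ht" "$(( 1 << ht ))"
}
```

For the mask used in Listing 5, the output should match:

command> subnetcalc_arith 255.255.192.0
output> Network bits: 18
output> Host bits: 14
output> Host IP Count: 16384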
