Programming in Go

All Systems Go

© Lead Image © nomadsoul1, 123RF.com

The Go programming language combines type safety with manageable syntax and an extensive library. We take you through a programming example.

The Go programming language recently celebrated its fourth birthday and is enjoying increasing popularity. The language has very few limits when it comes to coding Unix daemons, networking code, parallelized programs, and the like, although it is probably less well suited to writing an operating system kernel. The Docker container virtualization project and the Ubuntu Juju tool are two examples of projects written in Go.

Designed as an heir to the C programming language [1], Go offers many of its predecessor's strengths, while simplifying syntax and supporting secure programming (e.g., through strong typing). The unsafe package makes Go resemble C more closely, although it does compromise type safety, as the package name suggests.

The standard Go library [2] is extensive, offering many useful system programming packages for data compression, cryptography, binary file formats (ELF, Mach-O), and so forth. In this article, I will create a simple tool written in Go that works along the lines of the ps process status tool in Linux.

Go Projects

Go projects have a somewhat peculiar directory structure. To compile source code files, you must follow this layout so that go build and the other subcommands of the go tool work properly.

The GOPATH environment variable specifies the directory in which all Go projects reside. Below that are pkg, bin, and src, the last of which contains the source code under a directory that uniquely identifies a package or project. In principle, this can be any string but is typically your domain name, followed by a project name (e.g., linux-magazine.com/<project>; Figure 1). The go tool also loads projects off the Internet (e.g., from GitHub), which then end up in $GOPATH/src/github.com/<user>/<project>.

Figure 1: The structure of Go projects: a top-level directory containing directories for binaries and source code projects.

In my example, the projects reside in $HOME/gocode. The environment variable is set like this:

export GOPATH=$HOME/gocode

The following call creates the project directory for the tool I will be programming, lap (i.e., list all processes):

mkdir -p $GOPATH/src/linux-magazine.com/lap

If you now store a small "Hello World" file here [1], you can compile the project as follows:

go build linux-magazine.com/lap

If you take a look at the $GOPATH/bin directory, you will discover no files there. The go tool only copies the binary to this location if you use the install command; it thus makes sense to do this straight away, instead of detouring via build. Program libraries end up in $GOPATH/pkg. The object files can be removed by issuing go clean <package>; with the additional -i switch, go clean -i also removes the installed binaries.

The idea behind lap is quite simple. The /proc filesystem on Linux contains a globally visible directory for each running process, named after the process ID. Below this are a number of virtual files with information about the process, including the stat and status files, which contain, for example, the parent process ID, the owner, the start time, and so on.

Processing Files

The first task is therefore to list the files in /proc and filter out the ones with the process information. This can be done quite easily with the filepath.Glob function, which returns all the file names that match a certain pattern in a slice. The following approach takes a little detour to demonstrate a few aspects of loops and string processing in Go:

entries, err := ioutil.ReadDir(procDir)
for index, proc := range entries {
    // do something with proc
}

The directory contents are returned by a call to the ReadDir function from the ioutil package. To go through all the entries, use the for loop with a range expression that lets you iterate over arrays, slices, strings, maps, and channels. With only one control variable, range assigns the index of the current element to it. If you specify a second variable, Go assigns the content of the corresponding element to it.

The Go compiler only accepts this if you actually do something with the index variable. If you don't, you can use the blank identifier, _, Go's wildcard for variables. A proc in this example is of the os.FileInfo type, an interface that includes the Name() method for reading the file name.
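The two ideas from this section, the blank identifier in a range loop and the filepath.Glob shortcut mentioned earlier, can be sketched together as follows. This is only an illustration: the function names baseNames and pidDirs are made up for this example, and the pattern [0-9]* matches ASCII digits only.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// baseNames strips the directory part from each path. The index that
// range yields first is not needed, so it is discarded with the blank
// identifier.
func baseNames(paths []string) []string {
	names := make([]string, 0, len(paths))
	for _, p := range paths {
		names = append(names, filepath.Base(p))
	}
	return names
}

// pidDirs (hypothetical helper) returns the names of all entries below
// dir that start with an ASCII digit, using a glob pattern instead of
// reading and filtering the whole directory.
func pidDirs(dir string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(dir, "[0-9]*"))
	if err != nil {
		return nil, err
	}
	return baseNames(matches), nil
}

func main() {
	names, err := pidDirs("/proc")
	if err != nil {
		fmt.Println("glob failed:", err)
		return
	}
	fmt.Println(len(names), "entries start with a digit")
}
```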

Unicode Support

Finding out whether the first character is a digit is a little more difficult because Go uses UTF-8 for character encoding, which is actually a good thing. Because UTF-8 uses between one and four bytes per character, it is not clear from the outset how many bytes the first character occupies. Instead of working through the bytes one by one and deciding where each UTF-8 character starts, you can let Go do the decoding in one fell swoop with a type conversion. This results in a slice of "runes," as single Unicode code points are called in Go, whose first element you read with an index. The check itself is a simple call to the unicode.IsDigit() function, which expects a rune as an argument:

if unicode.IsDigit([]rune(proc.Name())[0]) {...

Incidentally, the small example program shown in Figure 2 demonstrates that this holds not just for the familiar Arabic numerals but also for the digits of other scripts in the Unicode world.

Figure 2: The unicode.IsDigit function determines whether a character is a digit. This even works with the digits of other scripts.
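A minimal sketch of this check, wrapped in a function, might look as follows. The name startsWithDigit is an assumption made for illustration; the empty-string guard is added because indexing an empty rune slice would cause a run-time panic.

```go
package main

import (
	"fmt"
	"unicode"
)

// startsWithDigit (hypothetical helper) reports whether the first
// character of name is a Unicode decimal digit.
func startsWithDigit(name string) bool {
	runes := []rune(name) // decodes the UTF-8 bytes into code points
	if len(runes) == 0 {
		return false
	}
	return unicode.IsDigit(runes[0])
}

func main() {
	// "\u0663\u0664" spells two Arabic-Indic digits (three and four).
	for _, s := range []string{"1234", "self", "\u0663\u0664", ""} {
		fmt.Printf("%q: %v\n", s, startsWithDigit(s))
	}
}
```

Converting the whole name to runes is convenient but decodes every character; for long strings, decoding only the first rune would be cheaper.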

The next task is to open the virtual files in procfs and process their content. Conveniently, the ioutil package again offers a function (ReadFile) that reads a file completely and returns the contents as a byte slice:

stat, err := ioutil.ReadFile(filename)

As you can see, the function returns two values: the content of the file and an error value. This is typical of Go functions and certainly more structured than, for example, returning a null pointer for the content in the case of an error. If you assign the error to a variable, you also need to use that variable; otherwise, the compiler will report an unused variable and refuse to compile the file. If you want to ignore the return value, you can again assign it to the blank identifier, _. To handle the error, you would typically test whether the err variable is equal to nil. If so, no error has occurred.
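The usual pattern for handling the second return value can be sketched like this. The function name readStat is hypothetical; newer Go releases also offer os.ReadFile with the same signature as ioutil.ReadFile.

```go
package main

import (
	"fmt"
	"io/ioutil"
)

// readStat (hypothetical helper) reads a file completely and returns
// its content as a string, passing any error on to the caller.
func readStat(path string) (string, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		// Wrap the error so that the caller sees which file failed.
		return "", fmt.Errorf("reading %s: %v", path, err)
	}
	return string(data), nil
}

func main() {
	stat, err := readStat("/proc/self/stat")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(stat)
}
```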

The /proc/PID/stat file contains only one line in a fixed format, with the individual fields separated by spaces. Unfortunately, the format is not documented anywhere in a useful way, not even in the Linux kernel documentation of procfs [2].

Ultimately, your only resource, if you want to know exactly what is going on, is to take a look at the Linux source code. The PID comes first no matter what (and this is the same as the directory name), followed by the process name in parentheses, the process state, and the parent PID.

Regular Expressions

The fields can be processed with the fmt.Sscanf function, which works in much the same way as sscanf in C: A format string specifies the layout of the input line and the data types of the fields. Alternatively, but possibly a bit more slowly, you can do the same thing with regular expressions. To this end, Go offers the regexp package [3], which implements the RE2 regular expression syntax [4].
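As a rough sketch of the format-string variant, the first four fields of a stat line could be read as shown below. The function name parseStat is an assumption; the sketch also assumes that the process name contains no spaces, because %s stops at the first whitespace character.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStat (hypothetical helper) extracts the PID, process name,
// state, and parent PID from the beginning of a /proc/PID/stat line.
// It does not handle process names that contain spaces.
func parseStat(line string) (pid int, name, state string, ppid int, err error) {
	_, err = fmt.Sscanf(line, "%d %s %s %d", &pid, &name, &state, &ppid)
	if err != nil {
		return 0, "", "", 0, err
	}
	// The name is stored as "(name)"; strip the parentheses.
	name = strings.Trim(name, "()")
	return pid, name, state, ppid, nil
}

func main() {
	pid, name, state, ppid, err := parseStat("1234 (bash) S 1 1234 1234 0 -1")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(pid, name, state, ppid)
}
```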

The functions provided by the package are a bit confusing but do follow a system. First, you'll see two variants of each function: one that processes a byte slice and one with the word String in the name, which deals with strings. Then, you'll see functions that return only the first match and others that return all matches (All). If you use multiple capture groups in a regular expression and want to assign the strings they capture to variables, you will want to use a function with Submatch in its name.

In this example, you need the FindStringSubmatch function because the regex pattern is designed so that it can match only once anyway, while storing all the captured groups in one slice. The byte slice from ReadFile is converted to a string with a type conversion. If you were to use the byte slice variants of the regular expression functions, you would then continue to work with byte slices or convert them.

Several functions are available to compile the search pattern: Compile, CompilePOSIX, MustCompile, and MustCompilePOSIX. The POSIX variants restrict the syntax to POSIX extended regular expressions and use leftmost-longest matching, whereas the Must variants panic instead of returning an error if the pattern is invalid, which makes them convenient for initializing variables. To prevent the regular expression from being constantly recompiled, you can define it as a top-level variable. The corresponding code is shown in Listing 1.

Listing 1

Regular Expressions

 

Line 14 of the code creates a new ProcData object and assigns the first two submatches to its attributes pid and name. In line 9, the regular expression is enclosed in backticks (`) because a backslash (\) is treated as an escape character in double-quoted strings, so you would otherwise have to type a second backslash in front of each one you really needed. In a raw string literal delimited by backticks, you can save yourself the trouble.
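A minimal sketch of this approach, not the original listing, might look as follows. ProcData with its pid and name fields follows the description above; the pattern itself, the variable statRe, and the function parseStatLine are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// ProcData holds the fields extracted from /proc/PID/stat.
type ProcData struct {
	pid  string
	name string
}

// statRe is compiled once at program start. The raw string literal in
// backticks avoids doubling every backslash. (Assumed pattern.)
var statRe = regexp.MustCompile(`^(\d+) \((.*)\) `)

// parseStatLine (hypothetical helper) fills a ProcData from the first
// two submatches; index 0 of the result holds the whole match.
func parseStatLine(stat string) (*ProcData, bool) {
	m := statRe.FindStringSubmatch(stat)
	if m == nil {
		return nil, false
	}
	return &ProcData{pid: m[1], name: m[2]}, true
}

func main() {
	if p, ok := parseStatLine("42 (my proc) S 1 42 42 0"); ok {
		fmt.Println(p.pid, p.name)
	}
}
```

Unlike a whitespace-delimited scan, the greedy group between the parentheses also copes with process names that contain spaces.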

The procedure for the /proc/PID/status file, which you also need to read, is similar because the user ID of the process does not appear in the stat file. However, the status file is a little more difficult to parse because it contains many individual lines. You could process the file line by line or use a regular expression in multiline mode, enabled by the m flag at the start of the regex. This alone, however, was not quite enough; I also needed the s flag, which ensures that the dot metacharacter (.) in the regular expression also matches the newline character (\n). The regex for reading the user and group ID from the status file thus looks like this:

(?sm)^Uid:\t(\d+).*^Gid:\t(\d+)

The UID is now assigned to a field in the ProcData object.
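Applied to a shortened, made-up status file, the expression can be tried out as in the following sketch. The names idRe and parseIDs are hypothetical; the sample text only imitates the layout of the Uid and Gid lines, whose first number is the real ID.

```go
package main

import (
	"fmt"
	"regexp"
)

// idRe uses the m flag so that ^ matches at every line start and the
// s flag so that the dot also matches newlines between the two lines.
var idRe = regexp.MustCompile(`(?sm)^Uid:\t(\d+).*^Gid:\t(\d+)`)

// parseIDs (hypothetical helper) returns the first number of the Uid
// and Gid lines of a /proc/PID/status file.
func parseIDs(status string) (uid, gid string, ok bool) {
	m := idRe.FindStringSubmatch(status)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// Made-up excerpt in the layout of a status file.
	sample := "Name:\tbash\nUid:\t1000\t1000\t1000\t1000\nGid:\t100\t100\t100\t100\n"
	if uid, gid, ok := parseIDs(sample); ok {
		fmt.Println("uid", uid, "gid", gid)
	}
}
```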

The remaining task is to determine the user name for the UID, for example, by parsing the passwd file. However, Go can save you this work because it has the user.LookupId function in the os/user package for precisely this task. It returns a user object containing the user name, among other things. All told, the procedure looks like this:

u, err := user.LookupId(procData.uid)
if err == nil {
    procData.user = u.Username
}

One option of the lap tool determines whether to output the user name or the UID. For easy implementation of command-line options, Go includes the flag package. Like everything else in Go, flags are typed and are available in various flavors (e.g., integer, string, and Boolean). A Boolean switch is defined and parsed as follows:

var realname = flag.Bool("r", false, "show real user name")
flag.Parse()

The first parameter specifies the name of the switch, followed by the default value and the explanatory text. Because flag.Bool returns a pointer to a bool, you query the value later in the program by dereferencing the variable, as in if *realname.
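The following sketch shows the switch in a form that is easy to try out with different argument lists. It uses flag.NewFlagSet instead of the package-level functions so that the parsing can be called repeatedly; the function name parseArgs is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs (hypothetical helper) parses the given arguments and
// reports whether the -r switch was set.
func parseArgs(args []string) (bool, error) {
	fs := flag.NewFlagSet("lap", flag.ContinueOnError)
	realname := fs.Bool("r", false, "show real user name")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	// Bool returns a *bool, so the value must be dereferenced.
	return *realname, nil
}

func main() {
	showName, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if showName {
		fmt.Println("would print user names")
	} else {
		fmt.Println("would print numeric UIDs")
	}
}
```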

When called with -r, lap then returns the user name; otherwise, it returns the UID (Figure 3). The main features of the program are shown in Listing 2; the complete listing is available online [5].

Listing 2

lap.go

 

Figure 3: The result of all the hard work: The lap process tool displaying running processes and user names.

Missing Bits

With relatively little effort, a small Go tool has been developed to identify the processes on a Linux system and display them at the command line. However, you could improve this by adding a cache for username lookup, for example. The run-time information of the processes (i.e., the start times, etc.) is also missing and could be added to the tool.
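One conceivable way to add such a cache is a map from UID to user name that is filled on the first lookup. The sketch below is only one possible design; the type nameCache and the functions newNameCache and systemLookup are hypothetical, and the lookup function is passed in so that it can be replaced for testing.

```go
package main

import (
	"fmt"
	"os/user"
)

// nameCache (hypothetical type) remembers user names already resolved.
type nameCache struct {
	names  map[string]string
	lookup func(uid string) (string, error)
}

func newNameCache(lookup func(uid string) (string, error)) *nameCache {
	return &nameCache{names: make(map[string]string), lookup: lookup}
}

// name returns the cached user name for uid and only calls the lookup
// function if the UID has not been seen before.
func (c *nameCache) name(uid string) (string, error) {
	if n, ok := c.names[uid]; ok {
		return n, nil
	}
	n, err := c.lookup(uid)
	if err != nil {
		return "", err
	}
	c.names[uid] = n
	return n, nil
}

// systemLookup resolves a UID with the os/user package.
func systemLookup(uid string) (string, error) {
	u, err := user.LookupId(uid)
	if err != nil {
		return "", err
	}
	return u.Username, nil
}

func main() {
	cache := newNameCache(systemLookup)
	for _, uid := range []string{"0", "0"} {
		n, err := cache.name(uid)
		if err != nil {
			fmt.Println("lookup failed:", err)
			continue
		}
		fmt.Println(uid, "->", n)
	}
}
```

Failed lookups are deliberately not cached here, so a UID without a passwd entry is looked up again each time.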

Infos

  1. "Go Programming Language" by Oliver Frommel, ADMIN, issue 11, 2013, pg. 90
  2. The /proc filesystem: https://www.kernel.org/doc/Documentation/filesystems/proc.txt
  3. Go regexp package: http://golang.org/pkg/regexp/
  4. RE2 syntax: https://code.google.com/p/re2/wiki/Syntax
  5. Listings: ftp://ftp.linux-magazin.com/pub/listings/magazine/162