Search for processes by start time

Ghost Hunter

© Lead Image © batareykin, 123RF.com

Author(s): Frank Hofmann, Axel Beckert

How do you find a process running on a Linux system by start time? The question sounds trivial, but the answer is trickier than it first appears.

As the maintainer of a computing cluster [1], Frank also provides his users with commercial software for calculations based on the fair-use principle. A limited number of license keys are available for this software (e.g., 10 keys for the MATLAB [2] simulation software).

Some of these calculations can take up to a week. When a calculation is finished and the process terminates, the license key is automatically returned to the pool of free keys and can be grabbed by another user. However, if users forget to end their processes, the keys remain allocated and no more can be handed out. To prevent this, the admins want to search automatically for processes that are older than 10 days. If they find a process matching this criterion, they can check with the users to clarify what should happen to the process.

The Linux kernel manages processes and makes information relating to them available to the user in the /proc filesystem. At the command line, ps is the reliable interface to process management. Unfortunately, ps has dozens of options, and its output is often not very clear either. This can be remedied with a little shell code or possibly a scripting language. This article compares several potential solutions using Bash, Python and Perl scripts, and the Go programming language.

Our goal is to find a solution that detects processes that are still running and were launched at least 10 days ago and then outputs the results in a list sorted by start time, oldest first. The output will also include the user's login name or user ID, the PID, the executed program, and the time when the respective process began. If possible, we want to use only on-board tools. For the solutions based on Bash, you will need the ancient procps 3.3.0 release or newer (earlier versions lack some of the features used here).

Bash Variant 1

The first obvious solution is based on the ps command in combination with awk, date, sed, and sort. ps supports an optional output field lstart, which outputs a process's start time (and date) in a uniform, long format. Additionally, the option -h must be used to completely suppress the headers in the ps output.

While finding and implementing the solution (Listing 1) was quick, parsing ps's output is not trivial, which makes the script relatively unreadable as well as quite long. We encountered the following problems with this solution:

  • You have to set the LC_TIME environment variable to make sure that localized month names do not suddenly appear (env LC_TIME=C).
  • The day of the month has additional spaces before the single-digit numbers. To sort, you have to replace them with a zero using the sed parameter (lines 5 and 6, Listing 1).
  • The start date contains the months in letters instead of numbers; you have to convert them to digits. This can be done with sed as shown in lines 7 through 18.
  • The order of the date components is not suitable for sorting (first month, then day, then time, and finally the year). awk changes the order of these four components.
  • The same applies to filtering from a certain date, since awk can also compare strings with <.
  • The script uses date to generate the appropriate comparison date right at the outset, because date can also calculate dates from relative specifications. The specification "10 days ago from now" is returned by calling:
date -d 'now -10 days'

date can format the output very flexibly.

  • If you do not specify any parameters when calling the script, it shows all processes older than 10 days.
  • All numeric fields must be explicitly specified in sort; otherwise sort will only consider the first field as numeric.
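
The month-name and field-order juggling described above can also be handed to a date parser. The following Python sketch is not part of the original scripts. It assumes lstart strings in the C-locale form shown in the comment, and the helper names parse_lstart and older_than are made up for illustration.

```python
from datetime import datetime, timedelta

# Layout of the ps "lstart" field in the C locale, e.g. "Fri Apr  3 22:32:34 2020".
LSTART_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_lstart(text):
    """Turn an lstart string into a datetime (hypothetical helper)."""
    # strptime treats a space in the format as "one or more whitespace
    # characters", so the padded single-digit day needs no special handling.
    return datetime.strptime(text.strip(), LSTART_FORMAT)


def older_than(text, days, now):
    """Return True if the process started more than `days` days before `now`."""
    return parse_lstart(text) < now - timedelta(days=days)


if __name__ == "__main__":
    now = datetime(2020, 4, 20, 12, 0, 0)  # fixed reference time for the demo
    print(parse_lstart("Fri Apr  3 22:32:34 2020"))
    print(older_than("Fri Apr  3 22:32:34 2020", 10, now))
```

Because the parsed values are real datetime objects, they compare and sort correctly without converting month names or reordering fields by hand.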

Listing 1

First Bash Attempt

01 #!/bin/sh
02 if [ -n "$1" ]; then limit=$1; else limit=10; fi
03 date="$(date '+%Y %m %d %T' -d "now -$limit days")"
04 env LC_TIME=C ps -eaxho pid,lstart,user,cmd | \
05   sed -e 's/^ *//;
06           s/  \([1-9]\) / 0\1 /;
07           s/Jan/01/;
08           s/Feb/02/;
09           s/Mar/03/;
10           s/Apr/04/;
11           s/May/05/;
12           s/Jun/06/;
13           s/Jul/07/;
14           s/Aug/08/;
15           s/Sep/09/;
16           s/Oct/10/;
17           s/Nov/11/;
18           s/Dec/12/' | \
19   awk '$6" "$3" "$4" "$5" "$1 < "'"$date"'" {print $6" "$3" "$4" "$5" "$1" "$7" "$8}' | \
20   sort -n -k1 -k2 -k3 -k4 -k5

Without the -k options in the sort call, the output from Listing 1 looks like Listing 2 on a computer that was last booted on April 3, 2020.

Listing 2

Output from Listing 1

$ ./list-processes1.sh | head
2020 04 03 22:32:34 1 root init
2020 04 03 22:32:34 10 root [ksoftirqd/0]
2020 04 03 22:32:34 104 root [kintegrityd]
2020 04 03 22:32:34 105 root [kblockd]
2020 04 03 22:32:34 106 root [blkcg_punt_bio]
2020 04 03 22:32:34 11 root [rcu_sched]
2020 04 03 22:32:34 12 root [migration/0]
2020 04 03 22:32:34 13 root [cpuhp/0]
2020 04 03 22:32:34 14 root [cpuhp/1]
2020 04 03 22:32:34 15 root [migration/1]

In Listing 2, you can immediately see that the sequence of the processes cannot be correct. This is because the time stamps in the field lstart are only accurate to the second, not to the micro- or nanosecond. Sorting the output by process numbers at the very end solves this problem for the most part. You have to specify all fields up to and including the process number in the sort call, as shown in line 20 of Listing 1. The output now looks like Listing 3.

Listing 3

Sorted Output

$ ./list-processes1.sh | head
2020 04 03 22:32:34 1 root init
2020 04 03 22:32:34 2 root [kthreadd]
2020 04 03 22:32:34 3 root [rcu_gp]
2020 04 03 22:32:34 4 root [rcu_par_gp]
2020 04 03 22:32:34 6 root [kworker/0:0H-kblockd]
2020 04 03 22:32:34 9 root [mm_percpu_wq]
2020 04 03 22:32:34 10 root [ksoftirqd/0]
2020 04 03 22:32:34 11 root [rcu_sched]
2020 04 03 22:32:34 12 root [migration/0]
2020 04 03 22:32:34 13 root [cpuhp/0]

Now the script only fails if so many processes are started within a single second that the process numbers are reassigned starting from the beginning. For a long time, the default limit for this was 32,768 process IDs (PIDs), but Linux systems can now also cope with far larger PIDs.

Bash Variant 2

An in-depth study of the ps man page reveals other fields that are useful for the task at hand, such as the etimes output field. etimes tells you the number of seconds since the process was started, reducing the complexity considerably because you no longer have to parse month names or re-sort fields. This shrinks the command so it can be written in one line. Listing 4 returns all processes that are more than two days old.

Listing 4

Compact Bash Variant

$ ps -eaxho etimes,pid,user,cmd | sort -nr | awk '$1 > 2*24*60*60 {print}' | head
 227081  106 root  [blkcg_punt_bio]
 227081  105 root  [kblockd]
 227081  104 root  [kintegrityd]
 227081   57 root  [khugepaged]
 227081   56 root  [ksmd]
 227081   55 root  [kcompactd0]
 227081   54 root  [writeback]
 227081   53 root  [oom_reaper]
 227081   52 root  [khungtaskd]
 227081   51 root  [kauditd]

However, this variant also works with an accuracy of one second. Because the code sorts in descending order, this is even more noticeable: PID 1 does not appear at the beginning of the list. This can be patched up by setting the sort options such that, if the process age is identical, the PID is used as the sort criterion in ascending order. This is ensured by the parameter specification k1nr,2n (Listing 5).

Listing 5

Improved Compact Bash Variant

$ ps -eaxho etimes,pid,user,cmd | sort -k1nr,2n | awk '$1 > 2*24*60*60 {print}' | head
 226597   1 root  init [2]
 226597   2 root  [kthreadd]
 226597   3 root  [rcu_gp]
 226597   4 root  [rcu_par_gp]
 226597   6 root  [kworker/0:0H-kblockd]
 226597   9 root  [mm_percpu_wq]
 226597  10 root  [ksoftirqd/0]
 226597  11 root  [rcu_sched]
 226597  12 root  [migration/0]
 226597  13 root  [cpuhp/0]

The previous call contains the calculation of seconds by awk in detailed form: 2*24*60*60 corresponds to two times 24 hours of 60 minutes each with 60 seconds each. Instead, the value can also be written directly as 172800.
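
The combination of a descending sort by age, an ascending tie-break by PID, and the two-day threshold can be sketched in Python as follows. This is only an illustration, not one of the article's scripts: the function names are hypothetical, and the rows (including the bash entry) are invented sample data in the column order etimes, pid, user, cmd.

```python
def sort_by_age_then_pid(rows):
    """Sort (etimes, pid, user, cmd) tuples: oldest first, ties by ascending PID."""
    # Negating the age gives descending order for the first key while the
    # second key (the PID) stays ascending.
    return sorted(rows, key=lambda row: (-row[0], row[1]))


def older_than_days(rows, days):
    """Keep only rows whose age in seconds exceeds `days` days."""
    limit = days * 24 * 60 * 60  # for days=2 this is 172800 seconds
    return [row for row in rows if row[0] > limit]


if __name__ == "__main__":
    # Invented sample rows for the demo
    sample = [
        (226597, 10, "root", "[ksoftirqd/0]"),
        (226597, 1, "root", "init [2]"),
        (90000, 4242, "frank", "bash"),
        (226597, 2, "root", "[kthreadd]"),
    ]
    for etimes, pid, user, cmd in sort_by_age_then_pid(older_than_days(sample, 2)):
        print(f"{etimes:7d} {pid:5d} {user} {cmd}")
```

Expressing both criteria in one key tuple makes the tie-break explicit instead of relying on how the sort tool falls back when the first key is equal.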

The value 86400 is useful for the number of seconds per day when parameterizing the script. Listing 6 expects a parameter for the number of days. You then multiply the passed numerical value by 86,400.

Listing 6

Number of Days as a Parameter

#!/bin/sh
if [ -n "$1" ]; then
  limit=$1;
else
  limit=10;
fi
ps -eaxho etimes,pid,user,cmd | sort -k1nr,2n | awk '$1 > '"$limit"'*86400 {print}'

If you do not pass a call parameter, the script uses the default value of 10 (10 days).

Bash Variant 3

The fact that split seconds were missing induced us to make a third attempt. Instead of the ps command, entries from the /proc filesystem are used as the basis here.

The required specification is found in field number 22 (starttime) of the /proc/<pid>/stat file. It tells you when the process was launched, expressed as the number of clock ticks since the Linux kernel started up. Specifying the clock ticks is tricky; it is based on the assumption of a clock speed of 100Hz (i.e., 100 ticks per second [3]):

$ getconf CLK_TCK
100

Not all distributions adhere to this: Some use 250 or 1000Hz internally instead. However, they always outwardly report 100Hz. We could not clarify why this is the case. On Debian GNU/Linux, the two values are identical: 100Hz.
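
The value that getconf CLK_TCK prints can also be queried from a script. The following Python sketch uses os.sysconf() from the standard library. The two function names are hypothetical, and the 468-tick value is simply the init start time mentioned later in this article.

```python
import os


def clock_ticks_per_second():
    """Return the tick rate reported to user space (what getconf CLK_TCK prints)."""
    return os.sysconf("SC_CLK_TCK")


def ticks_to_seconds(ticks, clk_tck):
    """Convert a tick count such as field 22 of /proc/<pid>/stat to seconds."""
    return ticks / clk_tck


if __name__ == "__main__":
    clk_tck = clock_ticks_per_second()
    print("CLK_TCK:", clk_tck)
    print("468 ticks =", ticks_to_seconds(468, clk_tck), "seconds")
```

Querying the value at run time avoids hardwiring 100 into a script, even though that is the value typically reported.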

Like the previous shell scripts, the one in Listing 7 first reads a parameter and, if no time span was specified, assumes 10 days as the default. It then calls awk twice. The first call reads field 22 from the stat file of the script's own shell process (whose PID the shell provides in $$); because the script has only just started, this value serves as the current time in clock ticks since the computer booted. The second call reads two fields, 1 and 22 (the PID and the start time in clock ticks), for all processes.

Listing 7

Bash Script with Clock Ticks

#!/bin/sh
if [ -n "$1" ]; then
  limit=$1;
else
  limit=10;
fi
now=$(awk '{print $22}' /proc/$$/stat)
awk '$22 < '$now'-(100*86400*'$limit') {printf "Sec. since boot: %.2f - PID: %i\n", $22/100, $1}' /proc/[1-9]*/stat | sort -n -k4 -k7

Then awk reads the stat files of all running processes; this is done by specifying:

/proc/[1-9]*/stat

The number of clock ticks per second (100) and seconds per day (86,400) are hardwired values here for simplicity's sake.
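
The threshold that the awk comparison computes can be written out as a small calculation. The Python sketch below takes the tick rate as a parameter instead of hardwiring it. As a deliberate variation on Listing 7, it takes the current time from the first field of /proc/uptime rather than from the start time of the script's own process. All function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def cutoff_ticks(uptime_seconds, days, clk_tck):
    """Tick count (since boot) before which a process counts as too old."""
    now_ticks = uptime_seconds * clk_tck
    return now_ticks - days * SECONDS_PER_DAY * clk_tck


def read_uptime_seconds(path="/proc/uptime"):
    """Read the seconds since boot from the first field of /proc/uptime."""
    with open(path) as handle:
        return float(handle.read().split()[0])


if __name__ == "__main__":
    # Worked example with fixed numbers: 12 days of uptime, 10-day limit,
    # 100 ticks per second.
    print(cutoff_ticks(12 * SECONDS_PER_DAY, 10, 100))
```

A process is older than the limit if its starttime value is smaller than the result. If the machine has been up for less time than the limit, the result is negative and no process qualifies.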

Since we wanted the output as a floating-point number to look nice, the output is restricted to just two decimal places using printf; the clock ticks are no more accurate than this anyway. sort then numerically sorts the two relevant fields as columns. The first numeric column lists the start time in seconds since boot (clock ticks divided by 100), while the second lists the PID.

The solution comes quite close to our objective, but cannot display the usernames for the processes. In addition, some processes that were definitely started long after the system booted (for example, the Tor Browser) unexpectedly appear as if they were started zero seconds after the system booted. The init process, on the other hand, did not start until 468 clock ticks or 4.68 seconds after startup. In the test case, this was probably because the hard disk encryption password had to be entered first.
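
One plausible explanation for the zero-second entries, which we have not verified for the cases described above, is a known pitfall of the stat format: The command name in field 2 is enclosed in parentheses and may itself contain spaces, which shifts every subsequent field when the line is split on whitespace. The Python sketch below shows a parser that tolerates this. The function name and the sample line are invented for illustration.

```python
# Invented sample line: the command name "Web Content" contains a space.
SAMPLE = ("4242 (Web Content) S 1 4242 4242 0 -1 4194560 100 0 0 0 "
          "5 3 0 0 20 0 1 0 123456 1000000 200")


def parse_stat_line(line):
    """Return (pid, comm, starttime_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    rest = tail.split()
    # rest starts at field 3 (state), so field 22 (starttime) is rest[19].
    return int(pid_text), comm, int(rest[19])


if __name__ == "__main__":
    print("naive field 22: ", SAMPLE.split()[21])
    print("robust parsing: ", parse_stat_line(SAMPLE))
```

With the naive split, the extra word in the command name moves a neighboring field into position 22, and in this sample that field happens to be zero.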

Removing awk from the code and specifying the matching fields 22 and 1 directly as parameters of the sort command makes everything a bit easier. Unfortunately, the result is unreadable output with a huge volume of data.

Annoyingly, the time data is still too imprecise to do without a final sort by PID. In theory, the data should be more precise than in the previous versions, because clock ticks provide more precise information than whole seconds. However, the problem of inaccuracy in case of a PID overflow obviously still exists. All in all, the variants with ps seem to be the better approach.

Python

We used the psutil [4] library for an attempt with Python. psutil provides a large number of functions and delivers bundles of information about processes (e.g., a process's PID, run time, owner, and memory requirements). As was revealed when we read the library's source code, psutil also ultimately accesses information from the /proc filesystem.

The script in Listing 8 includes two functions. The first function, getListOfProcesses(), scans the process list and collects the matching processes in a global list. Each list entry contains four data fields: PID, program or call name, time of creation, and username. The second function, calculateTimestamp(), calculates the cutoff timestamp that is used to filter out irrelevant processes.

Listing 8

Python Variant

import psutil
import datetime

# Define global variables
listOfProcessNames = []

def getListOfProcesses(createTime=10):
  # Collect running processes as dictionaries containing
  # PID, program name, creation time, and process owner

  # Define upper limit of interval
  intervalTime = calculateTimestamp(createTime)

  for proc in psutil.process_iter():
    pInfoDict = proc.as_dict(attrs=['pid', 'name', 'create_time', 'username'])
    # Read the creation time of the current process
    currentCreateTime = pInfoDict["create_time"]

    # Is process outside of time interval?
    if currentCreateTime < intervalTime:
      listOfProcessNames.append(pInfoDict)
  return

def calculateTimestamp(daysValue=10):
  # Compute time interval (default: ten days)

  # Determine current timestamp
  currentTimestamp = datetime.datetime.now()

  # Compute time interval
  dateRange = datetime.timedelta(days=daysValue)
  targetTimestamp = currentTimestamp - dateRange
  unixTime = targetTimestamp.timestamp()

  # Return as UNIX timestamp
  return unixTime

# Main program: fill the global process list
getListOfProcesses()

# Sort list by create time and PID
listOfProcessNames = sorted(
  listOfProcessNames,
  key = lambda i: (i['create_time'], i['pid'])
)

# Process the list entries one by one
for currentProcess in listOfProcessNames:
  # Extract process details
  username = currentProcess["username"]
  pid = currentProcess["pid"]
  creationTime = currentProcess["create_time"]
  creationTimeString = datetime.datetime.fromtimestamp(creationTime).strftime('%d.%m.%Y %H:%M:%S')
  processName = currentProcess["name"]

  # Output process information
  print(
    "User name: %s, PID: %8i, Program: %s" % (username, pid, processName),
    ", created on",
    creationTimeString
  )

The main program first calls the getListOfProcesses() function and then sorts the list of processes by their creation times and PIDs. This results in the output shown in Listing 9, which contains all the processes identified with their owners, PIDs, program names, and creation times. If you want to search for all Bash processes in the results, grep can help you filter the output.

Listing 9

Python Script Output

$ python3 list-processes2.py | grep bash
User name: frank, PID:   3428, Program: bash , created on 08.03.2020 21:49:09
User name: frank, PID:  10438, Program: bash , created on 16.03.2020 21:12:18
User name: frank, PID:   5919, Program: bash , created on 25.03.2020 12:13:29
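
The filtering and sorting steps of the Python script can also be written as functions that return their result instead of filling a global list. The following sketch is a variation for illustration, not the script itself. It uses only the standard library and expects dictionaries with the pid and create_time keys that the as_dict() call in Listing 8 requests. The function names and the sample data are invented.

```python
import time

SECONDS_PER_DAY = 86400


def cutoff_timestamp(days=10, now=None):
    """Return the UNIX timestamp `days` days before `now` (default: current time)."""
    if now is None:
        now = time.time()
    return now - days * SECONDS_PER_DAY


def select_old(processes, cutoff):
    """Return processes created before `cutoff`, sorted by creation time and PID."""
    old = [proc for proc in processes if proc["create_time"] < cutoff]
    return sorted(old, key=lambda proc: (proc["create_time"], proc["pid"]))


if __name__ == "__main__":
    # Invented sample data with a fixed "now" so the demo is reproducible
    sample = [
        {"pid": 5919, "create_time": 1_000_000.0},
        {"pid": 3428, "create_time": 100_000.0},
        {"pid": 10438, "create_time": 100_000.0},
    ]
    for proc in select_old(sample, cutoff_timestamp(10, now=1_500_000.0)):
        print(proc["pid"], proc["create_time"])
```

Passing the reference time as an optional argument keeps the cutoff calculation easy to check with fixed numbers.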

Perl

As with Python, you would not want to program everything yourself in Perl, although this would certainly be possible by browsing the /proc filesystem. Instead, you should first look at the Comprehensive Perl Archive Network (CPAN) [5], since there may already be a Perl module for accessing the process table. And, lo and behold, there is: Proc::ProcessTable [6].

In Perl, you first create an instance of Proc::ProcessTable and retrieve a reference to a data structure with the entire process table in it. You could certainly iterate through the table with loops. However, if you like functional programming (à la Lisp), you can use a Schwartzian transform [7]. This works almost like a pipe at the command line or in shell scripts, only backwards: The data source is at the end (Listing 10).

Listing 10

Perl Variant

#!/usr/bin/perl

# Boilerplate to avoid bugs
use strict;
use warnings;

# Use modern "say" instead of "print"
use 5.010;

# Minimal parameter parsing: If a number is passed as parameter
# output this number of processes, otherwise 10.
my $max = @ARGV ? $ARGV[0] : 10;

# Use the Proc::ProcessTable module
use Proc::ProcessTable;

# Create a disposable object and save the process table it generated
my $table = Proc::ProcessTable->new->table;

# Schwartzian transform of table
my @result =
  # Sort the list, first by start time and then by PID
  sort { ($a->[0] <=> $b->[0]) or ($a->[1] <=> $b->[1]) }
  # Use only the start time, PID and UID of the process
  map { [ $_->start, $_->pid, $_->uid ] }
  # The array following the dereferenced scalar is the data source
  @$table;

# Output the results by classical iteration
foreach my $p (@result[0..$max-1]) {
  say sprintf('PID: %6i  |  Start: %s  |  UID: %s',
              $p->[1], ''.localtime($p->[0]), $p->[2]);
}
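
The general pattern usually associated with the term Schwartzian transform is decorate, sort, undecorate: Each item is paired with its precomputed sort key, the pairs are sorted, and the keys are stripped off again. The Python sketch below illustrates this pattern on its own terms rather than translating Listing 10 line by line. The function name and the sample records are made up.

```python
def schwartzian_sort(items, key_func):
    """Sort items by key_func using decorate-sort-undecorate."""
    # Decorate: compute each key exactly once; the index breaks ties so
    # that the items themselves never have to be compared.
    decorated = [(key_func(item), index, item) for index, item in enumerate(items)]
    # Sort: tuples compare element by element, key first.
    decorated.sort()
    # Undecorate: drop key and index, keep the original items.
    return [item for _, _, item in decorated]


if __name__ == "__main__":
    # Invented sample records: start time in seconds and PID
    processes = [
        {"start": 20, "pid": 5},
        {"start": 10, "pid": 7},
        {"start": 10, "pid": 2},
    ]
    for proc in schwartzian_sort(processes, lambda p: (p["start"], p["pid"])):
        print(proc)
```

As in the Perl pipeline, the data flows from the source list through a mapping step into the sort; in Python the steps simply read top to bottom.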

Listing 11 shows a more compact Perl variant without comments, boilerplate, or command-line parsing (it outputs all processes, sorted), all of this in just one Schwartzian transform.

Listing 11

Compact Perl Variant

#!/usr/bin/perl

use Proc::ProcessTable;

print
  map { sprintf("PID: %6i  |  Start: %s  |  UID: %s\n",
                $_->[1], ''.localtime($_->[0]), $_->[2]) }
  sort { ($a->[0] <=> $b->[0]) or ($a->[1] <=> $b->[1]) }
  map { [ $_->start, $_->pid, $_->uid ] }
  @{ Proc::ProcessTable->new->table };

Go

The Go programming language has recently gained in popularity among developers [8], which is why we offer an appropriate solution in Go. Our solution is based on two modules, go-ps [9] and go-sysconf [10], which provide functions for reading processes and system information. The program also reads further information directly from the /proc filesystem, because neither of the two modules currently supports it.

Our Go script has about 150 lines; we have split it into several listings for clarity. The first step (Listing 12) contains the package definition and imports the required modules. The following steps, which are part of the main function, include the variable definitions and parameters (Listing 13), time frame and boot time (Listing 14), CLK_TCK (Listing 15), and routines for retrieving (Listing 16) and evaluating the process list (Listing 17).

Listing 12

Import Required Modules

package main

import (
  // import standard modules
  "bufio"
  "fmt"
  "io/ioutil"
  "log"
  "os"
  "strconv"
  "strings"
  "time"
  // import additional modules
  ps "github.com/mitchellh/go-ps"
  "github.com/tklauser/go-sysconf"
)

func main() {
  ...
}

Listing 13

Variables

var bootTime string
var userId string

// Suppress date and time output in log.
log.SetFlags(0)

// Set default value of ten days
timeLimit64 := int64(10)

// Read command line parameters
args := os.Args[1:]
if len(args) > 0 {
  // Convert string to number
  timeLimitArg, err := strconv.ParseInt(args[0], 10, 64)
  if err != nil {
    log.Fatalf("Error: %v\n", err)
  }
  timeLimit64 = timeLimitArg
}
log.Printf("Set time limit to %d days\n", timeLimit64)

Listing 13 covers the definition of the required variables and evaluation of the command-line parameters. If nothing else is specified, the program sets the default value to 10.

With the limit determined, the code computes the time boundary that defines which processes are relevant. It then determines the boot time as the number of seconds since January 1, 1970 (Listing 14). To evaluate time stamps correctly, the number of clock ticks per second is determined with the sysconf module as shown in Listing 15.

Listing 14

Time Frame

// Compute time frame
// Current time - days * 24h * 60min * 60s
timeBoundary := time.Now().Unix() - timeLimit64*24*60*60

// Determine boot time from /proc/stat in seconds since 1.1.1970
// available in /proc/stat in the line starting with btime
fileHandle, err := os.Open("/proc/stat")
if err != nil {
  log.Fatalf("Error calling os.Open(): %v\n", err)
}
defer fileHandle.Close()

scanner := bufio.NewScanner(fileHandle)
for scanner.Scan() {
  currentLine := scanner.Text()
  if strings.HasPrefix(currentLine, "btime") {
    dataFields := strings.Fields(currentLine)
    bootTime = dataFields[1]
    break
  }
}

// Convert string to numeric value
bootTime64, err := strconv.ParseInt(bootTime, 10, 64)
if err != nil {
  log.Fatalf("Error: %v\n", err)
}

Listing 15

CLK_TCK

// Reference value stored for CLK_TCK
// Values per second
clkTck, err := sysconf.Sysconf(sysconf.SC_CLK_TCK)
if err != nil {
  log.Fatalf("Error calling Sysconf")
}
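
The arithmetic that ties these values together is short: The boot time from the btime line plus the process start time in whole seconds gives the start time as a UNIX timestamp. The following Python sketch restates that calculation for illustration. The function names are hypothetical, and the /proc/stat excerpt and all numbers are invented sample values.

```python
# Invented excerpt in the style of /proc/stat
SAMPLE_PROC_STAT = "cpu  1 2 3 4\nbtime 1585945954\nprocesses 1234\n"


def parse_btime(proc_stat_text):
    """Return the boot time (seconds since 1.1.1970) from /proc/stat content."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime"):
            return int(line.split()[1])
    raise ValueError("no btime line found")


def start_epoch(start_ticks, clk_tck, btime):
    """Convert a start time in clock ticks since boot into a UNIX timestamp."""
    # Integer division keeps whole seconds only.
    return start_ticks // clk_tck + btime


if __name__ == "__main__":
    btime = parse_btime(SAMPLE_PROC_STAT)
    print(start_epoch(468, 100, btime))
```

A process is then older than the limit if this timestamp is smaller than the time boundary computed from the current time.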

In the next step, the program scans the processes and creates a list (Listing 16). A for loop then iterates over this list and analyzes each process with regard to the user and the process start time. If a process was started before the time boundary, information to that effect is displayed (Listing 17).

Listing 16

Process List

// Get process list
processList, err := ps.Processes()
if err != nil {
  log.Fatalf("Error in call to ps.Processes()")
}

Listing 17

Analyze Processes

// Iterate through process list
for _, process := range processList {
  // Extract PID and executed program
  pid := process.Pid()
  exec := process.Executable()

  // Read user ID from /proc/<pid>/status
  // Available in column 2 of the line starting with Uid
  // Go counts with an index of 0; therefore data field 1
  statusPath := fmt.Sprintf("/proc/%d/status", pid)
  fileHandle, err := os.Open(statusPath)
  if err != nil {
    // The process may have ended since the list was read; skip it
    continue
  }

  scanner = bufio.NewScanner(fileHandle)
  for scanner.Scan() {
    currentLine := scanner.Text()
    if strings.HasPrefix(currentLine, "Uid") {
      uidFields := strings.Fields(currentLine)
      userId = uidFields[1]
      break
    }
  }
  // Close the file right away; a defer inside the loop would
  // keep every file open until main() returns
  fileHandle.Close()

  // Read process status from /proc/<pid>/stat
  procPath := fmt.Sprintf("/proc/%d/stat", pid)
  dataBytes, err := ioutil.ReadFile(procPath)
  if err != nil {
    // The process may have ended in the meantime; skip it
    continue
  }
  // Break line down into data fields
  dataFields := strings.Fields(string(dataBytes))

  // Compute process start time
  // Read number of clock ticks since the system booted
  // Available in column 22 of /proc/<pid>/stat
  // Go counts with an index of 0; therefore data field 21
  executionTime := dataFields[21]
  executionTime64, err := strconv.ParseInt(executionTime, 10, 64)
  if err != nil {
    log.Fatalf("Error: %v\n", err)
  }

  // Divide the number of clock ticks passed by the stored kernel value
  // Gives you seconds since booting
  // And add the boot time
  executionTime64 = (executionTime64 / clkTck) + bootTime64

  // Check time frame
  if executionTime64 < timeBoundary {
    // Compute the start time as a date
    startDate := time.Unix(executionTime64, 0)

    // Output the information for the process
    fmt.Printf("User ID: %s, Process ID: %8d, Program name: %s, Started on %s\n", userId, pid, exec, startDate)
  }
}
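
The user ID extracted from the Uid line can be mapped to a login name with the system's password database. The Python sketch below shows the idea with the standard pwd module. It is not part of the Go program, the function names are hypothetical, and the status excerpt is invented.

```python
import pwd

# Invented excerpt in the style of /proc/<pid>/status
SAMPLE_STATUS = "Name:\tbash\nUid:\t1000\t1000\t1000\t1000\nGid:\t1000\t1000\t1000\t1000\n"


def parse_uid(status_text):
    """Return the real user ID from the Uid line of /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            # The first value after the label is the real UID.
            return int(line.split()[1])
    raise ValueError("no Uid line found")


def user_name(uid):
    """Look up the login name for a UID; fall back to the number as text."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


if __name__ == "__main__":
    uid = parse_uid(SAMPLE_STATUS)
    print(uid, user_name(uid))
```

Falling back to the numeric ID keeps the output usable for accounts that no longer exist in the password database.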

Listing 18 shows the output, where the script was called with the parameter 1 and then processed via a pipe with grep to find all Bash instances called in the process list.

Listing 18

Go Script Output

$ ./list-processes 1 | grep bash
User ID: 1000, Process ID:   604, Program name: bash, Started on 2020-04-14 11:31:51 +0200 CEST
User ID: 1000, Process ID:  5318, Program name: bash, Started on 2020-04-16 16:15:21 +0200 CEST
User ID: 1000, Process ID:  6984, Program name: bash, Started on 2020-04-16 19:04:52 +0200 CEST
User ID: 1000, Process ID:  6998, Program name: bash, Started on 2020-04-16 19:08:57 +0200 CEST

Conclusions

If you compare all the solutions with regard to functionality and our original objectives, all variants with the exception of Bash variant 3 provide useful results. In terms of program size, Bash variant 2 wins; the Go variant is the longest with more than 140 lines. The Python implementation falls in the lower middle range.

Opinions differ significantly on comprehensibility and readability. Especially with Listing 11 (the compact Perl variant), even die-hard Perl programmers need a moment (and the documentation for the module we used) to understand it. The implementations in Python or Go may be longer, but can be more quickly understood even by beginners.

In terms of run time, we found no significant differences in the solutions; all of them usually delivered a result within one to a maximum of one and a half seconds. This is fine for everyday use.

The Python, Perl, and Go implementations all make use of a matching library that offers easy access to the process information. The libraries for Python and Perl proved to be the most comprehensive. The Go library, however, still has room for extension. Functions that are already integrated in the Python library had to be built in the Go variant.

Thanks

We would like to thank Tobias Klauser for his go-sysconf package and support in optimizing the Go variant.

The Author

Frank Hofmann mostly works on the road, preferably from Berlin, Geneva, and Cape Town, as a developer, trainer, and author. He is currently the Linux system administrator for the scientific computing cluster at the Mésocentre de Calcul at the Université de Franche-Comté in Besançon.

Axel Beckert works as a Linux system administrator and specialist for network security with the ETH Zurich's central IT services. He is also a volunteer with the Debian GNU/Linux distribution, the Linux User Group Switzerland (LUGS), the Hackerfunk radio show and podcast, and in various open source projects.

Hofmann and Beckert have also authored a Debian package management book [11].