Search for processes by start time
Ghost Hunter
How do you find a process running on a Linux system by start time? The question sounds trivial, but the answer is trickier than it first appears.
As the maintainer of a computing cluster [1], Frank also provides his users with commercial software for calculations based on the fair-use principle. A limited number of license keys are available for this software (e.g., 10 keys for the MATLAB [2] simulation software).
Some of these calculations can take up to a week. When a calculation is finished and the process terminates, the license key is automatically returned to the pool of free keys and can be grabbed by another user. However, if users forget to end their processes, no more keys can be handed out, because all of them have been allocated. To prevent this, the admins want to automatically search for processes that are older than 10 days. If they find a process matching this criterion, they can check with the users to clarify what should happen to the process.
The Linux kernel manages processes and makes information about them available to the user in the /proc filesystem. At the command line, ps is the reliable interface to process management. Unfortunately, ps has dozens of options, and its output is often not very clear either. This can be remedied with a little shell code or possibly a scripting language. This article compares several potential solutions using Bash, Python, and Perl scripts, and the Go programming language.
Our goal is to find a solution that detects processes that are still running and were launched at least 10 days ago, and then outputs the results as a list sorted by start time, oldest process first. The output will also include the user's login name or user ID, the PID, the executed program, and the time when the respective process began. If possible, we want to use only on-board tools. For the solutions based on Bash, you will need the ancient procps 3.3.0 release or newer (earlier versions lack some of the features used here).
Bash Variant 1
The first obvious solution is based on the ps command in combination with awk, date, sed, and sort. ps supports an optional output field lstart, which outputs a process's start time (and date) in a uniform, long format. Additionally, the option -h must be used to completely suppress the headers in the ps output.
While finding and implementing the solution (Listing 1) was quick, parsing the output of ps is not trivial, which makes the script relatively unreadable as well as quite long. We encountered the following problems with this solution:
- You have to set the LC_TIME environment variable to make sure that localized month names do not suddenly appear (env LC_TIME=C).
- The day of the month has an additional space before single-digit numbers. For sorting, you have to replace it with a zero using sed (lines 5 and 6, Listing 1).
- The start date contains the months as names instead of numbers, so you have to convert them to digits. This can be done with sed, as shown in lines 7 through 18.
- The order of the date components (first month, then day, then time, and finally the year) is not suitable for sorting. awk rearranges these four components.
- The new order also enables filtering from a certain date on, since awk can compare strings with <, too.
- The script uses date to generate the appropriate comparison date right at the outset, since date can also calculate dates from relative specifications. The specification "10 days ago from now" is returned by calling:
date -d 'now -10 days'
date can format the output very flexibly.
- If you do not specify any parameters when calling the script, it shows all processes older than 10 days.
- All numeric fields must be explicitly specified in sort; otherwise, sort will only treat the first field as numeric.
Listing 1
First Bash Attempt
01 #!/bin/sh
02 if [ -n "$1" ]; then limit=$1; else limit=10; fi
03 date="$(date '+%Y %m %d %T' -d "now -$limit days")"
04 env LC_TIME=C ps -eaxho pid,lstart,user,cmd | \
05 sed -e 's/^ *//;
06 s/ \([1-9]\) / 0\1 /;
07 s/Jan/01/;
08 s/Feb/02/;
09 s/Mar/03/;
10 s/Apr/04/;
11 s/May/05/;
12 s/Jun/06/;
13 s/Jul/07/;
14 s/Aug/08/;
15 s/Sep/09/;
16 s/Oct/10/;
17 s/Nov/11/;
18 s/Dec/12/' | \
19 awk '$6" "$3" "$4" "$5" "$1 < "'"$date"'" {print $6" "$3" "$4" "$5" "$1" "$7" "$8}' | \
20 sort -k1,1n -k2,2n -k3,3n -k4,4 -k5,5n
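Most of the sed and awk work in Listing 1 only exists to turn lstart into something sortable. For comparison, here is a minimal Python sketch (not one of the solutions we benchmarked) that parses an lstart value directly. It assumes ps ran with LC_TIME=C, so day and month names are in English; the function name parse_lstart is our own choice:

```python
from datetime import datetime


def parse_lstart(lstart: str) -> datetime:
    """Turn a ps lstart value such as 'Fri Apr  3 22:32:34 2020' into a datetime.

    Assumes English (C locale) day and month names, i.e. ps ran with LC_TIME=C.
    strptime treats whitespace in the format as "one or more whitespace
    characters", so the extra space before single-digit days needs no fix-up.
    """
    return datetime.strptime(lstart, "%a %b %d %H:%M:%S %Y")


start = parse_lstart("Fri Apr  3 22:32:34 2020")
print(start.strftime("%Y %m %d %H:%M:%S"))  # 2020 04 03 22:32:34
```

Because datetime objects compare chronologically, filtering against a cutoff such as datetime.now() - timedelta(days=10) needs neither a month-name table nor reordered fields.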
If sort in Listing 1 is called with just -n instead of the -k keys, the output on a computer that was last booted on April 3, 2020, looks like Listing 2.
Listing 2
Output from Listing 1
$ ./list-processes1.sh | head
2020 04 03 22:32:34 1 root init
2020 04 03 22:32:34 10 root [ksoftirqd/0]
2020 04 03 22:32:34 104 root [kintegrityd]
2020 04 03 22:32:34 105 root [kblockd]
2020 04 03 22:32:34 106 root [blkcg_punt_bio]
2020 04 03 22:32:34 11 root [rcu_sched]
2020 04 03 22:32:34 12 root [migration/0]
2020 04 03 22:32:34 13 root [cpuhp/0]
2020 04 03 22:32:34 14 root [cpuhp/1]
2020 04 03 22:32:34 15 root [migration/1]
In Listing 2, you can immediately see that the order of the processes cannot be correct. This is because the time stamps in the lstart field are only accurate to the second, not to the micro- or nanosecond. For processes started within the same second, the remaining text decides the order, so the PIDs end up in text order (1, 10, 104, ...) rather than numeric order. Sorting the output by process number as the final key solves this problem for the most part. You have to specify all fields up to and including the process number in the sort call, as shown in Listing 1. The output now looks like Listing 3.
Listing 3
Sorted Output
$ ./list-processes1.sh | head
2020 04 03 22:32:34 1 root init
2020 04 03 22:32:34 2 root [kthreadd]
2020 04 03 22:32:34 3 root [rcu_gp]
2020 04 03 22:32:34 4 root [rcu_par_gp]
2020 04 03 22:32:34 6 root [kworker/0:0H-kblockd]
2020 04 03 22:32:34 9 root [mm_percpu_wq]
2020 04 03 22:32:34 10 root [ksoftirqd/0]
2020 04 03 22:32:34 11 root [rcu_sched]
2020 04 03 22:32:34 12 root [migration/0]
2020 04 03 22:32:34 13 root [cpuhp/0]
Now the script only fails if so many processes are started within a single second that the PID counter wraps around and starts assigning low numbers again. For a long time, the default PID limit (/proc/sys/kernel/pid_max) was 32,768, but now Linux systems can also cope with much larger process IDs (PIDs).
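The difference between Listing 2 and Listing 3 comes down to whether the PID is compared as text or as a number. The following Python sketch, using a few sample lines in the format of Listing 1's output, shows both orderings side by side:

```python
# Sample lines in the format of Listing 1: year month day time PID user command
lines = [
    "2020 04 03 22:32:34 10 root [ksoftirqd/0]",
    "2020 04 03 22:32:34 104 root [kintegrityd]",
    "2020 04 03 22:32:34 2 root [kthreadd]",
    "2020 04 03 22:32:34 1 root init",
]


def pids(sorted_lines):
    # The fifth whitespace-separated field is the PID
    return [line.split()[4] for line in sorted_lines]


# Text comparison: "1" < "10" < "104" < "2", as in Listing 2
print(pids(sorted(lines)))  # ['1', '10', '104', '2']


def sort_key(line: str):
    # Date and time fields are zero-padded and compare correctly as text;
    # the PID is converted to a number to break ties within one second
    year, month, day, time_of_day, pid = line.split()[:5]
    return (year, month, day, time_of_day, int(pid))


print(pids(sorted(lines, key=sort_key)))  # ['1', '2', '10', '104']
```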
Bash Variant 2
An in-depth study of the ps man page reveals other fields that are useful for the task at hand, such as the etimes output field. etimes tells you the number of seconds since the process was started, which reduces the complexity considerably because you no longer have to parse month names or re-sort fields. This shrinks the command so that it fits on one line. Listing 4 returns all processes that are more than two days old.
Listing 4
Compact Bash Variant
$ ps -eaxho etimes,pid,user,cmd | sort -nr | awk '$1 > 2*24*60*60 {print}' | head
227081 106 root [blkcg_punt_bio]
227081 105 root [kblockd]
227081 104 root [kintegrityd]
227081 57 root [khugepaged]
227081 56 root [ksmd]
227081 55 root [kcompactd0]
227081 54 root [writeback]
227081 53 root [oom_reaper]
227081 52 root [khungtaskd]
227081 51 root [kauditd]
However, this variant is also only accurate to the second. Since the code sorts in reverse order, this is even more noticeable: PID 1 does not appear at the beginning of the list. You can patch this up by adjusting the sort options so that, for processes of identical age, the PID serves as the sort criterion in ascending order. This is ensured by the key specification -k1nr,2n (Listing 5).
Listing 5
Improved Compact Bash Variant
$ ps -eaxho etimes,pid,user,cmd | sort -k1nr,2n | awk '$1 > 2*24*60*60 {print}' | head
226597 1 root init [2]
226597 2 root [kthreadd]
226597 3 root [rcu_gp]
226597 4 root [rcu_par_gp]
226597 6 root [kworker/0:0H-kblockd]
226597 9 root [mm_percpu_wq]
226597 10 root [ksoftirqd/0]
226597 11 root [rcu_sched]
226597 12 root [migration/0]
226597 13 root [cpuhp/0]
The previous call spells out the calculation of seconds in awk in detail: 2*24*60*60 corresponds to two times 24 hours of 60 minutes each with 60 seconds each. Instead, the value can also be written directly as 172800.
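The filter-and-sort logic of Listing 5 (older than two days, largest elapsed time first, equal ages ordered by ascending PID) can also be expressed in a few lines of Python. This sketch works on hypothetical sample text in the format of ps -eaxho etimes,pid,user,cmd rather than calling ps itself:

```python
TWO_DAYS = 2 * 24 * 60 * 60  # 172800 seconds

# Hypothetical sample in the format of `ps -eaxho etimes,pid,user,cmd`
ps_output = """\
 226597    10 root     [ksoftirqd/0]
 226597     1 root     init [2]
   5120  6984 frank    bash
 226597     2 root     [kthreadd]
"""


def parse(line: str):
    # Split at most three times: the command itself may contain spaces
    etimes, pid, user, cmd = line.split(None, 3)
    return int(etimes), int(pid), user, cmd


rows = [parse(line) for line in ps_output.splitlines() if line.strip()]
old = [row for row in rows if row[0] > TWO_DAYS]
# Largest elapsed time first; equal ages ordered by ascending PID
old.sort(key=lambda row: (-row[0], row[1]))
for etimes, pid, user, cmd in old:
    print(etimes, pid, user, cmd)
```

Negating the elapsed time in the key gives a descending order for the first criterion while the PID stays ascending, without a second sort pass.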
The value 86400, the number of seconds per day, comes in handy when parameterizing the script. Listing 6 expects the number of days as a parameter and multiplies the passed value by 86,400.
Listing 6
Number of Days as a Parameter
#!/bin/sh
if [ -n "$1" ]; then
  limit=$1;
else
  limit=10;
fi
ps -eaxho etimes,pid,user,cmd | sort -k1nr,2n | awk '$1 > '"$limit"'*86400 {print}'
If you do not pass a parameter, the script uses 10 as the default value (10 days).
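Listing 6 hands its argument to awk without checking it, so a typo such as ten instead of 10 goes unnoticed. A sketch of stricter argument handling in Python, keeping the default of 10 days (the function name age_limit_seconds is hypothetical):

```python
from typing import Sequence

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def age_limit_seconds(args: Sequence[str], default_days: int = 10) -> int:
    """Convert the optional day count in args[0] into seconds.

    Uses default_days when no argument is given and raises ValueError
    for anything that is not a non-negative whole number.
    """
    if not args:
        return default_days * SECONDS_PER_DAY
    days = int(args[0])  # int("ten") raises ValueError
    if days < 0:
        raise ValueError("number of days must not be negative")
    return days * SECONDS_PER_DAY


# In a script: limit = age_limit_seconds(sys.argv[1:])
```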
Bash Variant 3
The lack of fractions of a second prompted us to make a third attempt. Instead of the ps command, it uses entries from the /proc filesystem as its basis.
The required information is found in field 22 (starttime) of the /proc/<pid>/stat file. It states when the process was launched, measured in clock ticks since the Linux kernel started up. Interpreting the clock ticks is tricky; the calculation assumes a rate of 100Hz (i.e., 100 ticks per second [3]):
$ getconf CLK_TCK
100
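Not all distributions adhere to this: Some configure the kernel to use 250 or 1000Hz internally instead. However, the kernel always reports 100Hz to the outside, because it converts values such as starttime from its internal tick rate (HZ) into the fixed user-space unit USER_HZ, which is 100 on common architectures such as x86. On Debian GNU/Linux, getconf likewise reports 100Hz.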
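Converting the starttime field into a calendar time takes three values: the tick count, the tick rate, and the boot time, which the kernel publishes in the btime line of /proc/stat. A minimal Python sketch of the arithmetic, with a hypothetical boot time in the example call:

```python
import os


def start_time_epoch(starttime_ticks: int, clk_tck: int, btime: int) -> float:
    """Convert field 22 of /proc/<pid>/stat to seconds since 1970-01-01.

    starttime_ticks -- clock ticks between boot and process start
    clk_tck         -- ticks per second, as printed by `getconf CLK_TCK`
    btime           -- boot time in seconds since the epoch (/proc/stat)
    """
    return btime + starttime_ticks / clk_tck


# The value that `getconf CLK_TCK` prints (typically 100)
clk_tck = os.sysconf("SC_CLK_TCK")

# Hypothetical boot time; 468 ticks at 100Hz are 4.68 seconds after boot
print(start_time_epoch(468, 100, 1585945954))
```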
Like the previous shell scripts, the one in Listing 7 first reads a parameter and, if no time span was specified, assumes 10 days as the default. It then calls awk twice to read fields 1 and 22 (the PID and the start time in clock ticks). The first call reads field 22 of the script's own shell process, whose PID the shell provides in $$. Because the script has only just started, this value serves as the current time in clock ticks since the computer booted.
Listing 7
Bash Script with Clock Ticks
#!/bin/sh
if [ -n "$1" ]; then
  limit=$1;
else
  limit=10;
fi
now=$(awk '{print $22}' /proc/$$/stat)
awk '$22 < '$now'-(100*86400*'$limit') {printf "Sec. since boot: %.2f - PID: %i\n", $22/100, $1}' /proc/[1-9]*/stat | sort -n -k4 -k7
Then awk reads the stat files of all running processes, which are selected by the pattern:
/proc/[1-9]*/stat
The number of clock ticks per second (100) and seconds per day (86,400) are hardwired values here for simplicity's sake.
Since we wanted the output as a floating-point number to look nice, printf restricts it to two decimal places; the clock ticks are no more accurate than this anyway. sort then numerically sorts the two relevant columns: The first numeric column contains the seconds since boot (the clock ticks divided by 100), and the second contains the PID.
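The solution comes quite close to our objective, but cannot display the usernames for the processes. In addition, some processes that were definitely started long after the system booted (for example, the Tor Browser) unexpectedly appear as if they had started zero seconds after boot. A likely cause is field 2 of the stat file: It contains the program name in parentheses, and if that name contains a space (as with the Web Content processes of Firefox-based browsers), awk's field numbering shifts by one. $22 then actually holds field 21 (itrealvalue), which current kernels always report as 0. The init process, on the other hand, did not start until 468 clock ticks, or 4.68 seconds, after startup. In the test case, this was probably because the hard disk encryption password had to be entered first.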
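Splitting a stat line on whitespace alone breaks as soon as the program name contains a space. A sketch of a more robust parser in Python, which splits at the last closing parenthesis instead; the sample line is made up, and only its field positions matter:

```python
def parse_stat(stat_line: str) -> dict:
    """Return PID, program name, and start time from a /proc/<pid>/stat line.

    Field 2 (the program name) is wrapped in parentheses and may contain
    spaces or parentheses itself, so everything up to the *last* ")" is
    treated as PID plus name, and only the remainder is split on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    rest = tail.split()  # rest[0] is field 3 (state), so field n is rest[n - 3]
    return {"pid": int(pid), "comm": comm, "starttime": int(rest[22 - 3])}


# Made-up line for a process whose name contains a space
line = ("6984 (Web Content) S 1 6984 6984 0 -1 4194560 100 0 0 0 "
        "5 3 0 0 20 0 1 0 123456 1000000 500")

print(parse_stat(line))  # {'pid': 6984, 'comm': 'Web Content', 'starttime': 123456}
print(line.split()[21])  # 0: naive splitting yields field 21 (itrealvalue)
```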
Removing awk from the code and passing the matching fields 22 and 1 directly to the sort command as sort keys makes everything a bit easier. Unfortunately, the result is unreadable output with a huge volume of data.
Annoyingly, the time data is still too imprecise to do without a final sort by PID. In theory, the data should be more precise than in the previous versions, because clock ticks are finer-grained than whole seconds. However, the problem of PID wraparound obviously still exists. All in all, the variants with ps seem to be the better approach.
Python
We used the psutil [4] library for an attempt with Python. psutil provides a large number of functions and delivers bundles of information about processes (e.g., a process's PID, run time, owner, and memory requirements). As we discovered when reading the library's source code, psutil ultimately also takes its information from the /proc filesystem.
The script in Listing 8 includes two functions. The first function, getListOfProcesses(), scans the process list and collects the processes that are older than the limit in the global list listOfProcessNames. Each list entry contains four data fields: PID, program name, creation time, and username. The second function, calculateTimestamp(), calculates the cutoff time that is used to filter out irrelevant processes.
Listing 8
Python Variant
import psutil
import datetime

# Define global variables
listOfProcessNames = []

def getListOfProcesses(createTime=10):
    # Collect running processes older than createTime days as dictionaries
    # with PID, program name, creation time and process owner

    # Define upper limit of interval
    intervalTime = calculateTimestamp(createTime)

    # With attrs, process_iter() fills proc.info and skips processes
    # that terminate while the list is being read
    for proc in psutil.process_iter(attrs=['pid', 'name', 'create_time', 'username']):
        pInfoDict = proc.info
        # Read creation time (UNIX timestamp)
        currentCreateTime = pInfoDict["create_time"]

        # Is process outside of time interval?
        if currentCreateTime < intervalTime:
            listOfProcessNames.append(pInfoDict)
    return

def calculateTimestamp(daysValue=10):
    # Compute time interval (default: ten days)

    # Determine current timestamp
    currentTimestamp = datetime.datetime.now()

    # Compute time interval
    dateRange = datetime.timedelta(days=daysValue)
    targetTimestamp = currentTimestamp - dateRange
    unixTime = targetTimestamp.timestamp()

    # Return as UNIX timestamp
    return unixTime

getListOfProcesses()

# Sort list by create time and PID
listOfProcessNames = sorted(
    listOfProcessNames,
    key=lambda i: (i['create_time'], i['pid'])
)

# Output the collected processes
for currentProcess in listOfProcessNames:
    # Extract process details
    username = currentProcess["username"]
    pid = currentProcess["pid"]
    creationTime = currentProcess["create_time"]
    creationTimeString = datetime.datetime.fromtimestamp(creationTime).strftime('%d.%m.%Y %H:%M:%S')
    processName = currentProcess["name"]

    # Output process information
    print(
        "User name: %s, PID: %8i, Program: %s" % (username, pid, processName),
        ", created on",
        creationTimeString
    )
The main program first calls the getListOfProcesses() function and then sorts the list of processes by creation time and PID. This results in the output shown in Listing 9, which contains all the processes identified, with their owners, PIDs, program names, and creation times. If you only want to see the Bash processes in the results, grep can filter the output.
Listing 9
Python Script Output
$ python3 list-processes2.py | grep bash
User name: frank, PID: 3428, Program: bash , created on 08.03.2020 21:49:09
User name: frank, PID: 10438, Program: bash , created on 16.03.2020 21:12:18
User name: frank, PID: 5919, Program: bash , created on 25.03.2020 12:13:29
Perl
As with Python, you would not want to program everything yourself in Perl, although this would certainly be possible by browsing the /proc filesystem. Instead, you should first look at the Comprehensive Perl Archive Network (CPAN) [5], since there may already be a Perl module for accessing the process table. And, lo and behold, there is: Proc::ProcessTable [6].
In Perl, you first create an instance of Proc::ProcessTable and retrieve a reference to a data structure with the entire process table in it. You could certainly iterate through the table with loops. However, if you like functional programming (à la Lisp), you can use a Schwartzian transform [7]. This works almost like a pipe at the command line or in shell scripts, only backwards: The data source is at the end (Listing 10).
Listing 10
Perl Variant
#!/usr/bin/perl

# Boilerplate to avoid bugs
use strict;
use warnings;

# Use modern "say" instead of "print"
use 5.010;

# Minimal parameter parsing: If a number is passed as parameter,
# output this number of processes, otherwise 10.
my $max = @ARGV ? $ARGV[0] : 10;

# Use the Proc::ProcessTable module
use Proc::ProcessTable;

# Create a disposable object and save the process table it generated
my $table = Proc::ProcessTable->new->table;

# Schwartzian transform of table
my @result =
  # Sort the list, first by start time and then by PID
  sort { ($a->[0] <=> $b->[0]) or ($a->[1] <=> $b->[1]) }
  # Use only the start time, PID and UID of the process
  map { [ $_->start, $_->pid, $_->uid ] }
  # Data source: the array referenced by $table
  @$table;

# Do not read past the end if there are fewer processes than requested
my $last = $max < @result ? $max - 1 : $#result;

# Output the results by classical iteration
foreach my $p (@result[0..$last]) {
  say sprintf('PID: %6i | Start: %s | UID: %s',
    $p->[1], ''.localtime($p->[0]), $p->[2]);
}
Listing 11 shows a more compact Perl variant without comments, boilerplate, or command-line parsing (it outputs all processes, sorted), all in a single Schwartzian transform.
Listing 11
Compact Perl Variant
#!/usr/bin/perl

use Proc::ProcessTable;

print
  map { sprintf("PID: %6i | Start: %s | UID: %s\n",
                $_->[1], ''.localtime($_->[0]), $_->[2]) }
  sort { ($a->[0] <=> $b->[0]) or ($a->[1] <=> $b->[1]) }
  map { [ $_->start, $_->pid, $_->uid ] }
  @{ Proc::ProcessTable->new->table };
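The decorate-sort-undecorate idea behind the Schwartzian transform is not tied to Perl. A Python sketch of the same three steps, using a hypothetical namedtuple in place of the objects that Proc::ProcessTable returns:

```python
from collections import namedtuple

# Hypothetical stand-in for a process table entry
Proc = namedtuple("Proc", "pid uid start")

table = [
    Proc(pid=604, uid=1000, start=1586856711),
    Proc(pid=2, uid=0, start=1585945954),
    Proc(pid=1, uid=0, start=1585945954),
]

# Step 1 (decorate): prefix each entry with its sort criteria
decorated = [(p.start, p.pid, p) for p in table]
# Step 2 (sort): tuples compare by start time first, then by PID
decorated.sort()
# Step 3 (undecorate): strip the criteria to get the entries back
result = [p for _, _, p in decorated]

for p in result:
    print(f"PID: {p.pid:6d} | Start: {p.start} | UID: {p.uid}")
```

In everyday Python, sorted(table, key=lambda p: (p.start, p.pid)) performs the decoration internally, so the explicit three steps mainly serve to illustrate the technique.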
Go
The Go programming language has recently gained popularity among developers [8], which is why we also offer a solution in Go. It is based on two modules, go-ps [9] and go-sysconf [10], which provide functions for reading processes and system information. In addition, the program reads information from the /proc filesystem that neither of the two modules currently supports.
Our Go program has about 150 lines; we have split it into several listings for clarity. The first step (Listing 12) contains the package definition and imports the required modules. The following steps, which are part of the main function, include the variable definitions and parameters (Listing 13), time frame and boot time (Listing 14), CLK_TCK (Listing 15), and routines for retrieving (Listing 16) and evaluating the process list (Listing 17).
Listing 12
Import Required Modules
package main

import (
	// import standard modules
	"bufio"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"strconv"
	"strings"
	"time"
	// import additional modules
	ps "github.com/mitchellh/go-ps"
	"github.com/tklauser/go-sysconf"
)

func main() {
	...
}
Listing 13
Variables
var bootTime string
var userId string

// Suppress date and time output in log.
log.SetFlags(0)

// Set default value of ten days
timeLimit64 := int64(10)

// Read command line parameters
args := os.Args[1:]
if len(args) > 0 {
	// Convert string to number
	timeLimitArg, err := strconv.ParseInt(args[0], 10, 64)
	if err != nil {
		log.Fatalf("Error: %v\n", err)
	}
	timeLimit64 = timeLimitArg
}
log.Printf("Set time limit to %d days\n", timeLimit64)
Listing 13 covers the definition of the required variables and the evaluation of the command-line parameters. If nothing else is specified, the program sets the default value to 10 days.
With the parameter evaluated, the code computes the time boundary that determines which processes are relevant. It then reads the boot time, in seconds since January 1, 1970, from /proc/stat (Listing 14). To evaluate the start times, which the kernel reports in clock ticks, the code determines the number of clock ticks per second with the sysconf module, as shown in Listing 15.
Listing 14
Time Frame
// Compute time frame
// Current time - days * 24h * 60min * 60s
timeBoundary := time.Now().Unix() - timeLimit64*24*60*60

// Determine boot time in seconds since 1.1.1970,
// available in /proc/stat in the line starting with btime
fileHandle, err := os.Open("/proc/stat")
if err != nil {
	log.Fatalf("Error calling os.Open(): %v\n", err)
}
defer fileHandle.Close()

scanner := bufio.NewScanner(fileHandle)
for scanner.Scan() {
	currentLine := scanner.Text()
	if strings.HasPrefix(currentLine, "btime") {
		dataFields := strings.Fields(currentLine)
		bootTime = dataFields[1]
		break
	}
}

// Convert string to numeric value
bootTime64, err := strconv.ParseInt(bootTime, 10, 64)
if err != nil {
	log.Fatalf("Error: %v\n", err)
}
Listing 15
CLK_TCK
// Reference value stored for CLK_TCK
// Values per second
clkTck, err := sysconf.Sysconf(sysconf.SC_CLK_TCK)
if err != nil {
	log.Fatalf("Error calling Sysconf")
}
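Listings 14 and 15 collect the reference values that the conversion needs. For experimenting outside Go, here is the equivalent logic as a Python sketch, written as functions on the file contents so that they can be tried on sample text (the function names are our own):

```python
import time


def boot_time(proc_stat_text: str) -> int:
    """Return the boot time in seconds since 1970-01-01 from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        if line.startswith("btime"):
            return int(line.split()[1])
    raise ValueError("no btime line found")


def time_boundary(now: float, days: int) -> float:
    # Processes that started before this instant are older than `days` days
    return now - days * 24 * 60 * 60


boundary = time_boundary(time.time(), 10)

# On a Linux system:
#   with open("/proc/stat") as f:
#       btime = boot_time(f.read())
```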
In the next step, the program retrieves the list of processes (Listing 16). A for loop then walks through this list and determines the owner and start time of each process. If a process started before the time boundary, the program prints its details (Listing 17).
Listing 16
Process List
// Get process list
processList, err := ps.Processes()
if err != nil {
	log.Fatalf("Error in call to ps.Processes()")
}
Listing 17
Analyze Processes
// Iterate through process list
for _, process := range processList {
	// Extract PID and executed program
	pid := process.Pid()
	exec := process.Executable()

	// Read user ID from /proc/<pid>/status
	// Available in column 2 of the line starting with Uid
	// Go counts with an index of 0; therefore data field 1
	statusPath := fmt.Sprintf("/proc/%d/status", pid)
	fileHandle, err := os.Open(statusPath)
	if err != nil {
		// The process may have terminated in the meantime; skip it
		continue
	}
	scanner = bufio.NewScanner(fileHandle)
	for scanner.Scan() {
		currentLine := scanner.Text()
		if strings.HasPrefix(currentLine, "Uid") {
			uidFields := strings.Fields(currentLine)
			userId = uidFields[1]
			break
		}
	}
	// Close right away: a defer inside the loop would keep one
	// file open per process until main() returns
	fileHandle.Close()

	// Read process status from /proc/<pid>/stat
	procPath := fmt.Sprintf("/proc/%d/stat", pid)
	dataBytes, err := ioutil.ReadFile(procPath)
	if err != nil {
		continue
	}
	// The program name (column 2) is enclosed in parentheses and may
	// contain spaces, so only split the part after the last ")"
	statLine := string(dataBytes)
	dataFields := strings.Fields(statLine[strings.LastIndex(statLine, ")")+1:])

	// Compute process start time
	// Read number of clock ticks since the system booted
	// Available in column 22 of /proc/<pid>/stat; after the cut,
	// column 3 has index 0, therefore column 22 has index 19
	executionTime := dataFields[19]
	executionTime64, err := strconv.ParseInt(executionTime, 10, 64)
	if err != nil {
		log.Fatalf("Error: %v\n", err)
	}

	// Divide the number of clock ticks passed by the stored kernel value
	// Gives you seconds since booting
	// And add the boot time
	executionTime64 = (executionTime64 / clkTck) + bootTime64

	// Check time frame
	if executionTime64 < timeBoundary {
		// Compute the start time as a date
		startDate := time.Unix(executionTime64, 0)

		// Output the information for the process
		fmt.Printf("User ID: %s, Process ID: %8d, Program name: %s, Started on %s\n", userId, pid, exec, startDate)
	}
}
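Listing 17 takes the second column of the Uid line, which is the real user ID. Since our objective also allowed login names, here is a Python sketch of both steps; the name lookup via the standard pwd module is our own addition and not part of the Go program:

```python
import pwd


def real_uid(status_text: str) -> int:
    """Return the real UID from the text of /proc/<pid>/status.

    The Uid line holds four values: real, effective, saved set, and
    filesystem UID. The first one is the real UID.
    """
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            return int(line.split()[1])
    raise ValueError("no Uid line found")


def login_name(uid: int) -> str:
    # Fall back to the numeric ID if no passwd entry exists
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


sample = "Name:\tbash\nUid:\t1000\t1000\t1000\t1000\nGid:\t100\t100\t100\t100\n"
print(real_uid(sample), login_name(real_uid(sample)))
```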
Listing 18 shows the output when the program is called with the parameter 1 and its output is piped to grep to find all Bash instances in the process list.
Listing 18
Go Script Output
$ ./list-processes 1 | grep bash
User ID: 1000, Process ID: 604, Program name: bash, Started on 2020-04-14 11:31:51 +0200 CEST
User ID: 1000, Process ID: 5318, Program name: bash, Started on 2020-04-16 16:15:21 +0200 CEST
User ID: 1000, Process ID: 6984, Program name: bash, Started on 2020-04-16 19:04:52 +0200 CEST
User ID: 1000, Process ID: 6998, Program name: bash, Started on 2020-04-16 19:08:57 +0200 CEST
Conclusions
If you compare all the solutions with regard to functionality and our original objectives, all variants with the exception of Bash variant 3 provide useful results. In terms of program size, Bash variant 2 wins; the Go variant is the longest with more than 140 lines. The Python implementation falls in the lower middle range.
Opinions differ significantly on comprehensibility and readability. Especially with Listing 11 (the compact Perl variant), even die-hard Perl programmers need a moment (and the documentation for the module we used) to understand it. The implementations in Python or Go may be longer, but can be more quickly understood even by beginners.
In terms of run time, we found no significant differences between the solutions; all of them usually delivered a result within one to one and a half seconds. This is fine for everyday use.
The Python, Perl, and Go implementations all rely on a library that offers easy access to the process information. The libraries for Python and Perl proved to be the most comprehensive. The Go library, however, still has room for extension: Functions that are already built into the Python library had to be implemented by hand in the Go variant.
Thanks
We would like to thank Tobias Klauser for his go-sysconf package and support in optimizing the Go variant.
Infos
[1] Mésocentre de calcul at the Université de Franche-Comté in Besançon: http://meso.univ-fcomte.fr/
[2] MATLAB: https://www.mathworks.com/products/matlab.html
[3] "Linux process execution time": https://www.softprayog.in/tutorials/linux-process-execution-time
[4] psutil: https://psutil.readthedocs.io/en/latest/
[5] MetaCPAN: https://metacpan.org
[6] Proc::ProcessTable: https://metacpan.org/pod/Proc::ProcessTable
[7] Schwartzian transform: https://en.wikipedia.org/wiki/Schwartzian_transform
[8] "10 Best Programming Languages to Learn in 2020": https://hackr.io/blog/best-programming-languages-to-learn-2020-jobs-future
[9] go-ps: https://github.com/mitchellh/go-ps
[10] go-sysconf: https://github.com/tklauser/go-sysconf
[11] Debian package management book: https://www.dpmb.org/index.en.html