Filesystem monitoring
Core Technology
Notification APIs in Linux.
If you were using Linux back in 2006, you will remember the desktop search tool Beagle (Figure 1), which was notified when you changed your files so it could re-index them. Modern file managers also rely on notifications to update their displays when files are created, deleted, or renamed (Figure 2), unlike earlier file managers that counted on the user to refresh the display (Figure 3). Now, think of the ClamAV open source antivirus software. If you try to open a file containing malware, you expect an on-access scanner to ban it. In this case, notifications aren't enough; ClamAV needs to be an active part of the process, allowing or denying certain operations. Happily, Linux can handle both cases. The downside is, it does so with two separate APIs, and you can't just choose one over another.
Inotify
The first API I'll look at, and the first one to appear in Linux 2.6, is inotify. It went mainline with Linux 2.6.13 in 2005. Inotify was the workhorse behind Beagle's indexer. It replaced an older filesystem monitoring technology (see the "dnotify" sidebar), improving it in several ways.
dnotify
Inotify was introduced with early 2.6 kernels, but filesystem monitoring in Linux is really much older. Its first incarnation, dnotify, appeared in Linux 2.4.0 back in 2001.
The dnotify API was a step forward: Earlier approaches to the problem involved polling directories for changes, which was very inefficient. However, dnotify's design made the API cumbersome to use. Although later approaches introduced separate system calls, dnotify relied on fcntl(2).
Signals are used for notifications, and they are somewhat difficult to handle correctly because they convey little information (not even the name of the file that triggered the event), and you can't integrate them easily into event loops, although signalfd(2) file descriptors mitigate the issue to some extent. Dnotify forces you to keep an open descriptor for each filesystem object you monitor. Moreover, it has no notion of events that rename a file, leaving programmers to figure it out by comparing two directory trees. If I recall correctly, Dropbox once offered a similar puzzle to candidates seeking an engineering position within the company (i.e., it's not trivial).
Dnotify is still available in the latest kernels, but with inotify and fanotify, there is little sense in using it except in legacy code.
First, inotify replaced a cumbersome signal-based notification mechanism with a pollable file descriptor from which you just read events. This makes event loop integration a breeze. It also removed the need to keep an open file descriptor for each directory you monitor. To do so, inotify introduced three new system calls: inotify_init(2), inotify_add_watch(2), and inotify_rm_watch(2).
Your code starts with inotify_init(), which returns a file descriptor acting as a handle to an in-kernel event queue. A newer variant, inotify_init1(), accepts an extra flags argument. Passing IN_NONBLOCK here opens the descriptor in non-blocking mode, saving you an fcntl(2) call. The IN_CLOEXEC flag is a similar shortcut:
    int fd = inotify_init();
    if (fd < 0) {
        /* Handle the error */
    }
Then, you add watches for filesystem objects of interest with the inotify_add_watch() system call. It accepts three arguments: an inotify file descriptor, a pathname, and a set of flags (or "mask") specifying the events in which you are interested. IN_CREATE fires when an entry (think a file or a subdirectory) is created in the directory you watch. IN_OPEN is reported when a file is opened, followed by IN_ACCESS or IN_MODIFY when the contents are read or changed. Later, when the file is closed, the kernel sends either IN_CLOSE_WRITE (if the file was opened for writing) or IN_CLOSE_NOWRITE. You can capture both with IN_CLOSE.
Some flags carry a _SELF suffix, as in IN_DELETE_SELF. They apply to the monitored directory itself, not its children. In particular, IN_DELETE_SELF is reported when you remove a watched directory. The kernel then reports an IN_IGNORED event for it. Moving a directory can also generate IN_DELETE_SELF if it occurs across filesystem boundaries, but normally, it produces a sequence of two events: IN_MOVED_FROM and IN_MOVED_TO.
The inotify(7) man page lists all supported flags. Here, I just pass IN_ALL_EVENTS, which, you guessed it, captures everything:
    int wd = inotify_add_watch(fd, argv[1], IN_ALL_EVENTS);
    if (wd < 0) {
        /* Handle this one as well */
    }
The inotify_add_watch() call returns a so-called "watch descriptor." It matches events to watched filesystem objects and can be used to "unmonitor" them later with inotify_rm_watch(). If anything goes wrong, inotify_add_watch() returns -1 and errno is set appropriately.
Now you wait for the inotify descriptor to become readable. In a real application, this happens in an event loop. In Listing 1, I just spin in read():
Listing 1
Read Event
For performance reasons, it makes sense to use large buffers capable of storing multiple events. A struct inotify_event represents a single inotify event. The wd field contains the watch descriptor, which you can map back to a pathname (see inotify(7) for details).
    if (ev->mask & IN_OPEN)
        printf("IN_OPEN ");
    /* Handle other events here */
The mask tells what exactly has happened. Besides the IN_* flags you supply to inotify_add_watch(), it may contain the aforementioned IN_IGNORED, or IN_Q_OVERFLOW if the in-kernel queue has overflowed. The queue can store up to 16,384 events by default. Although the size is adjustable via /proc/sys/fs/inotify/max_queued_events, there is always a limit. Otherwise, you would leave your system open to a local denial of service (DoS) attack. When a queue overflows, the kernel discards further events (keeping memory usage constrained) until an application empties the queue or destroys it.
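Decoding the mask is mostly a matter of testing bits. The sketch below turns a mask into a space-separated list of flag names, including the kernel-generated IN_IGNORED and IN_Q_OVERFLOW; describe_mask() and the mask_names table are hypothetical helpers, and the table covers only the flags discussed here plus a few close relatives:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>

static const struct { uint32_t bit; const char *name; } mask_names[] = {
    { IN_ACCESS, "IN_ACCESS" },           { IN_MODIFY, "IN_MODIFY" },
    { IN_ATTRIB, "IN_ATTRIB" },           { IN_CLOSE_WRITE, "IN_CLOSE_WRITE" },
    { IN_CLOSE_NOWRITE, "IN_CLOSE_NOWRITE" }, { IN_OPEN, "IN_OPEN" },
    { IN_MOVED_FROM, "IN_MOVED_FROM" },   { IN_MOVED_TO, "IN_MOVED_TO" },
    { IN_CREATE, "IN_CREATE" },           { IN_DELETE, "IN_DELETE" },
    { IN_DELETE_SELF, "IN_DELETE_SELF" }, { IN_MOVE_SELF, "IN_MOVE_SELF" },
    { IN_IGNORED, "IN_IGNORED" },         { IN_Q_OVERFLOW, "IN_Q_OVERFLOW" },
    { IN_ISDIR, "IN_ISDIR" },
};

/* Writes the names of all known bits set in mask into out, separated by
 * spaces. The result is always NUL-terminated and may be truncated. */
void describe_mask(uint32_t mask, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(mask_names) / sizeof(mask_names[0]); i++) {
        if (!(mask & mask_names[i].bit))
            continue;
        int n = snprintf(out + used, size - used, "%s%s",
                         used ? " " : "", mask_names[i].name);
        if (n < 0 || (size_t)n >= size - used)
            break;              /* no room left; stop appending */
        used += (size_t)n;
    }
}
```

An application that sees IN_Q_OVERFLOW has lost events and typically has to rescan the watched directories to resynchronize its state.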
    printf("%s ", ev->mask & IN_ISDIR ? "directory" : "file");
    if (ev->len)
        printf("%s", ev->name);
    printf("\n");
If the event pertains to an entry within the watched directory, name is its name, and len is the size of the name field (including any padding null bytes). For subdirectories, IN_ISDIR is also set in the mask. Finally, cookie is an arbitrary but unique integer that links IN_MOVED_FROM and IN_MOVED_TO events (not shown here) together.
Despite all its goodies, the inotify API is somewhat limited: It doesn't support recursive operations, so to monitor a directory, including all its children, you'd have to add watches one by one. Keep in mind that the directory can change while you install watches, so your code should anticipate possible races.
Rename events pose another difficulty. Their twofold split is natural from the kernel's point of view, but it leaves your code guessing as to whether it will get matching IN_MOVED_FROM and IN_MOVED_TO events. What if you don't monitor the directory to which the object moved? A common solution is to wait a few milliseconds for IN_MOVED_TO. If it's absent, you conclude it won't appear at all. Although not fully robust, this approach is reported to produce accurate results 95%-99% of the time.
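The wait-and-pair heuristic can be sketched as a small state machine driven by the cookie. This is an illustration only: it assumes at most one move is pending at a time, takes timestamps from the caller, and uses an arbitrary 10ms grace period; struct move_tracker, track_move(), move_expired(), and GRACE_MS are all hypothetical names:

```c
#include <stdint.h>
#include <sys/inotify.h>

#define GRACE_MS 10   /* arbitrary wait for the matching IN_MOVED_TO */

enum move_result { MOVE_UNRELATED, MOVE_PENDING, MOVE_RENAMED, MOVE_ARRIVED };

struct move_tracker {
    int pending;          /* non-zero while an IN_MOVED_FROM awaits its pair */
    uint32_t cookie;      /* cookie of the pending IN_MOVED_FROM */
    long long since_ms;   /* when the pending event was seen */
};

/* Feeds one event into the tracker and classifies it. */
enum move_result track_move(struct move_tracker *t, uint32_t mask,
                            uint32_t cookie, long long now_ms)
{
    if (mask & IN_MOVED_FROM) {
        t->pending = 1;
        t->cookie = cookie;
        t->since_ms = now_ms;
        return MOVE_PENDING;
    }
    if (mask & IN_MOVED_TO) {
        if (t->pending && t->cookie == cookie) {
            t->pending = 0;
            return MOVE_RENAMED;   /* both halves seen: a rename */
        }
        return MOVE_ARRIVED;       /* moved in from an unwatched place */
    }
    return MOVE_UNRELATED;
}

/* Returns 1 once if the pending IN_MOVED_FROM has waited longer than
 * GRACE_MS, meaning the object most likely left the watched tree. */
int move_expired(struct move_tracker *t, long long now_ms)
{
    if (t->pending && now_ms - t->since_ms > GRACE_MS) {
        t->pending = 0;
        return 1;
    }
    return 0;
}
```

An event loop would call move_expired() whenever poll() times out and treat a positive result the same way as a deletion.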
Inotify doesn't convey a PID for the process doing changes and provides no mechanism for access permission decisions. This doesn't mean inotify is flawed or should be deprecated. Much the opposite, many Linux applications rely on it; yet, there is some room for a more advanced API that addresses at least some of these issues.
Fanotify
Fanotify is such an API. It made a debut in Linux 2.6.36, almost five years after inotify. Conceptually, fanotify is similar to inotify, yet somewhat closer to the low-level kernel API they both use. It introduces a set of system calls to obtain a file descriptor and "marks" filesystem objects as being watched. Unlike inotify, fanotify can monitor a mounted filesystem as a whole. Monitoring a single directory recursively is still impossible, though. Whereas the inotify file descriptor is read-only, its fanotify counterpart is writable, which is how you tell fanotify your access permission decisions.
To create a fanotify file descriptor, use the fanotify_init() system call:
    int fd = fanotify_init(FAN_CLASS_CONTENT, O_RDONLY);
    if (fd < 0) {
        /* You know what to do */
    }
Unlike inotify_init(), calling fanotify_init() usually requires root privileges. The call accepts two bitmask arguments. The first defines fanotify behavior; FAN_CLASS_CONTENT or FAN_CLASS_PRE_CONTENT is required to handle permission events. If a single file has multiple watchers, FAN_CLASS_PRE_CONTENT wins and gets a chance to modify the file's data; FAN_CLASS_CONTENT runs next, so it sees the contents in their final form (hence the name). The default, FAN_CLASS_NOTIF, runs last and can't be used with permission events.
If you want a fanotify descriptor to be non-blocking, add FAN_NONBLOCK. To lift the limits on the in-kernel event queue size and the number of marks, use FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS, respectively. Keep DoS attack scenarios in mind if you use these flags.
When a file produces an event, fanotify opens a new file descriptor and hands it over to the userspace code. The second argument to fanotify_init() tells how exactly to do it. It's the same as flags in open(2). Here, you're not going to modify files, so read-only access is sufficient.
Next, you start adding marks. The fanotify_mark() system call multiplexes adding, removing, and flushing marks. Fanotify has no equivalent of a watch descriptor, so the call just returns zero if everything went okay:
    int err = fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                            FAN_OPEN_PERM, AT_FDCWD, argv[1]);
    if (err) {
        /* Sorry, this didn't work out this time */
    }
The fanotify file descriptor is fd, and FAN_MARK_ADD adds a mark. The whole mounted filesystem (FAN_MARK_MOUNT) is being monitored for file open permission events (FAN_OPEN_PERM). The last two arguments define the object to monitor. You'll find quite a few more possibilities in the fanotify_mark(2) man page. AT_FDCWD is a special value that stands for the current working directory. If argv[1] is not an absolute path, it is treated as being relative to the current directory (.). This matters little when installing a mount mark, because any path residing on the filesystem has the same effect.
Compared with inotify, fanotify's assortment of events might feel limited. At present, create, delete, and move events are not supported: You can watch files and directories being opened, accessed, and closed, and that's it. Moreover, mmap() generates no events. Fanotify isn't an inotify replacement; instead, it focuses on cases such as malware scanning and hierarchical storage management.
Now you can start looping for events again. Fanotify represents events as struct fanotify_event_metadata. In theory, it varies in size, so fanotify provides some macros to aid iteration (Listing 2).
Listing 2
Looping for Events
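A minimal sketch of iterating with those macros follows; it is an illustration, not necessarily the code of Listing 2. It assumes the buffer was filled by a read() on the fanotify descriptor, and walk_events() is a hypothetical name:

```c
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Walks a buffer filled by read() on a fanotify descriptor. Prints one
 * line per event, closes the per-event descriptor, and returns the number
 * of events seen. */
int walk_events(const void *buf, ssize_t len)
{
    const struct fanotify_event_metadata *metadata = buf;
    int count = 0;

    /* FAN_EVENT_OK checks that a whole event fits into the remaining
     * bytes; FAN_EVENT_NEXT advances the pointer and decrements len. */
    while (FAN_EVENT_OK(metadata, len)) {
        if (metadata->vers != FANOTIFY_METADATA_VERSION)
            break;              /* kernel and headers disagree on the layout */

        printf("mask=0x%llx pid=%d fd=%d\n",
               (unsigned long long)metadata->mask,
               (int)metadata->pid, (int)metadata->fd);
        count++;

        /* Queue overflow events carry FAN_NOFD instead of a descriptor */
        if (metadata->fd >= 0)
            close(metadata->fd);

        metadata = FAN_EVENT_NEXT(metadata, len);
    }
    return count;
}
```

Closing each event's descriptor matters in practice, because a long-running listener would otherwise exhaust its file descriptor limit. A listener that handles permission events would write its response before closing.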
Among the struct fanotify_event_metadata fields, three are most interesting: mask, fd, and pid. The mask bitfield stores FAN_* flags, as already discussed, plus FAN_Q_OVERFLOW, indicating an overflow event (unless the queue is unlimited). The pid field identifies the process that generated the event, and fd is a file descriptor for the object to which the event pertains. The second argument to fanotify_init() dictates whether it is writable or not. There is no metadata->name field; to recover the pathname from metadata->fd, you just read the corresponding symbolic link in /proc/self/fd:
    char path[PATH_MAX], real_path[PATH_MAX];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", metadata->fd);
    /* readlink() does not NUL-terminate, so leave room for the terminator */
    ssize_t path_len = readlink(path, real_path, sizeof(real_path) - 1);
    if (path_len < 0) {
        /* That's an error */
    } else {
        real_path[path_len] = '\0';
    }
Imagine you want to ban access to files containing EvilSignature in their first 1KB of data. Real antivirus software is far more sophisticated than that, but the big picture stays the same (Listing 3).
Listing 3
Banning Files
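The two building blocks of such a scanner can be sketched as follows; this is an illustration under stated assumptions, not necessarily the code of Listing 3. The helpers has_signature() and send_verdict() are hypothetical names, and a read error is treated as "no signature," which means the sketch fails open:

```c
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

#define SIGNATURE  "EvilSignature"
#define SCAN_BYTES 1024

/* Returns 1 if the first SCAN_BYTES of the file behind fd contain
 * SIGNATURE, and 0 otherwise (including on read errors). */
int has_signature(int fd)
{
    char buf[SCAN_BYTES];
    const size_t sig_len = strlen(SIGNATURE);
    ssize_t len = pread(fd, buf, sizeof(buf), 0);

    if (len < (ssize_t)sig_len)
        return 0;               /* read error or file too short to match */
    for (size_t i = 0; i + sig_len <= (size_t)len; i++)
        if (memcmp(buf + i, SIGNATURE, sig_len) == 0)
            return 1;
    return 0;
}

/* Writes the permission decision for event_fd to the fanotify descriptor
 * fan_fd. Returns 0 on success and -1 on failure. */
int send_verdict(int fan_fd, int event_fd, int allow)
{
    struct fanotify_response r = {
        .fd = event_fd,
        .response = allow ? FAN_ALLOW : FAN_DENY,
    };
    return write(fan_fd, &r, sizeof(r)) == (ssize_t)sizeof(r) ? 0 : -1;
}
```

For each event with FAN_OPEN_PERM set in its mask, the event loop would call send_verdict(fan_fd, metadata->fd, !has_signature(metadata->fd)) and then close metadata->fd.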
Operations on metadata->fd don't generate further fanotify events (there would be an infinite loop otherwise). To convey a permission decision, you write an instance of struct fanotify_response to the fanotify file descriptor. You must set r->fd to the file descriptor in question, and r->response is either FAN_ALLOW or FAN_DENY. Fanotify handles permission events in first-in, first-out fashion, and until you reply, the requesting process remains blocked. If access is denied, the process gets an EPERM error. However, when your fanotify application terminates, all unhandled permission events are granted implicitly. Keep this in mind if you design security software.
The above example was intentionally simple. Refer to the fanotify(7) man page for a more elaborate version.
Flying Higher
Having a native API for filesystem monitoring is good, but sometimes not exactly what you want. It might be too low-level for Python code, or it could be too Linux-specific for cross-platform development. For these situations, higher level libraries wrap platform specifics in an easy-to-use interface.
If getting filesystem notifications in Python code is all you need, consider pyinotify [3], best described as "inotify, the Python way." It supports both Python 2 and Python 3. To install watches with the add_watch() method, you use a pyinotify.WatchManager instance (usually a singleton). The pyinotify.Notifier class is a central dispatcher hub. Pyinotify provides simple blocking and threaded notifiers and integrations for popular Python asynchronous frameworks, such as the asyncore/asyncio modules and Tornado. When an event fires, pyinotify runs a defined handler (a Python callable). If you want to chain handlers, consider using pyinotify.ProcessEvent instead of plain functions or lambdas.
On the other hand, python-fanotify [4] is best described as Python bindings to the native C API. This module comes from Google and has zero Python code inside except for setup.py. Documentation is missing as well (save for two examples), which is probably not a big deal. The API stays the same, except you prefix identifiers with fanotify plus a dot and rename them to match Python standards; so, fanotify_init() becomes fanotify.Init() and FAN_EVENT_NEXT() translates to fanotify.EventNext(), because Python has no notion of macros.
For dessert, try Watchdog [5]. As opposed to the first two libraries, which wrap a single API, this one abstracts several OS-dependent mechanisms, such as inotify on Linux and kqueue on FreeBSD. To use the library, you create a watchdog.observers.Observer thread object. Watchdog detects your target platform and chooses the appropriate notification mechanism automatically. Then, you implement an event handler, which is a class that inherits from watchdog.events.FileSystemEventHandler and overrides instance methods like on_moved() or on_created(), which are probably self-explanatory. Now, you "schedule" monitoring for a specific directory with observer.schedule(). Watchdog can monitor recursively if you pass the recursive=True keyword argument. Finally, spawn the monitoring thread with observer.start() and start receiving notifications.
Watchdog also provides the watchmedo command-line tool for your shell scripts. Watchmedo executes shell commands in response to various filesystem events and serves as a reference for how to use the library in a real-world project.
Command of the Month: inotifywait
Watchmedo isn't the only command to mate filesystem notifications with shell scripts. The inotifywait command, along with its cousin inotifywatch, is the de facto standard in Linux. Both come in a single package, often called inotify-tools.
The purpose of these tools is to wait for filesystem events in selected directories then dump some statistics. The difference is inotifywait's output is easy to parse (and is configurable with --format and --csv), whereas inotifywatch prints a human-readable table (Figure 4).
The command-line syntax is also rather similar. You supply a path you want to monitor (either a file or a directory). The -r switch enables recursive operation. To exclude certain pathnames, use the --exclude option, which accepts a regular expression. Events to monitor are specified with -e. By default, inotifywait exits after capturing a single event, but you can override this with --monitor/-m. In this mode, the command runs forever. To do the same thing, but dump events to a file rather than stdout, use --daemon/-d. This doesn't apply to inotifywatch, which runs until you interrupt it with Ctrl+C or until a timeout, specified with -t and the number of seconds, expires.
Study this snippet from an inotifywait session:
    $ inotifywait -rm -e create,access /tmp
    /tmp/ CREATE tmpfBnccrk
    ...
    /tmp/mc-val/ CREATE extfs1C1AQYMathJax.js
    /tmp/mc-val/ ACCESS extfs1C1AQYMathJax.js
    ...
Here, I instructed Midnight Commander to open a ZIP archive and viewed a file in it. The output spans a few dozen lines: /tmp is a busy place on a live Linux system.
Infos
[1] A screenshot of Beagle's UI: https://commons.wikimedia.org/wiki/File:Beagle-search.png
[2] Apache License Version 2.0, January 2004: https://www.apache.org/licenses/LICENSE-2.0
[3] pyinotify: https://github.com/seb-m/pyinotify
[4] python-fanotify: https://github.com/google/python-fanotify
[5] Watchdog: http://github.com/gorakhargosh/watchdog