Building virtual filesystems
Cmdfs builds a filtered virtual filesystem based on a source directory tree. You can even integrate other programs to convert data on the fly.
It’s a common task: You want users to see only certain files on a file server, and the files may also need to be changed dynamically when they are accessed. From a large collection of material on the server, perhaps you want the back office staff to see only the office documents and the graphic artists only the graphics.
Cmdfs is a handy tool that builds a virtual filesystem by filtering the contents of an existing directory tree. With cmdfs, you can create a virtual filesystem containing only the parts of the source filesystem you want to make available to users. However, cmdfs can do much more, including transforming files to your specifications as they are read. For instance, you could use cmdfs to scale down oversized digital images or to convert files to an alternative format.
Cmdfs, a FUSE-based filesystem, lets you build this filtered view without the administrator having to go through the time-consuming and error-prone task of setting up a complex structure of directories and links. Although the latest cmdfs release dates back to 2010, the software works well, with just a few minor quirks.
Unless the repository for your Linux distribution offers a cmdfs package, installation requires some manual work with the source code archive. (So far, cmdfs hasn’t found its way into the Debian or Ubuntu repositories.)
To build the source code, you need the FUSE developer package. Unfortunately, the ./configure script does not warn you if the package is missing, but a simple
aptitude install libfuse-dev
command (run as root) installs it.
For the cmdfs source code, see the cmdfs site. Once the FUSE developer package is installed, the familiar sequence of download, unpack, ./configure, make, and make install sets up cmdfs. If you want the package manager to keep track of the installed files, use checkinstall instead of make install.
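Because ./configure stays silent about missing FUSE headers, a quick check before building can save you a failed make run. The following is a minimal POSIX shell sketch; it assumes that pkg-config is installed and that the libfuse-dev package registers a pkg-config module named fuse (the usual FUSE 2.x packaging, but verify this on your system). The function names have_module and check_fuse_dev are hypothetical.

```shell
# Preflight check before building cmdfs: ./configure does not complain
# when the FUSE development files are missing, so check up front.
# Assumption: the FUSE 2.x dev package provides a pkg-config module "fuse".

# Succeed only if pkg-config exists and knows the named module.
have_module() {
    command -v pkg-config >/dev/null 2>&1 || return 1
    pkg-config --exists "$1"
}

# Report whether the FUSE development files are available.
check_fuse_dev() {
    if have_module fuse; then
        echo "FUSE development files found (version $(pkg-config --modversion fuse))"
    else
        echo "FUSE development files missing; install libfuse-dev first" >&2
        return 1
    fi
}
```

In the unpacked source directory, check_fuse_dev && ./configure && make only starts the build when the headers are present.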
After you complete the installation, users (who need to be members of the fuse group on some distributions) can use cmdfs to create alternative views of files or directories with a syntax modeled on the mount command. Cmdfs presents a read-only directory structure; the user or administrator controls what it contains with parameters at mount time.
As an example of cmdfs at work, suppose I want to create a filesystem that draws files from a source directory but only shows files of a specific type, and I also want it to hide empty directories – that is, directories that don’t contain files of the specified type. If the file type I am interested in is .jpg, I would enter:
cmdfs ~/Data ~/test -o extension=jpg,hide-empty-dirs
The extension match is not case sensitive. To filter for multiple file types, put a semicolon-separated list in quotes:
cmdfs ~/Data ~/test -o extension="JPG;PNG",hide-empty-dirs
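Before mounting, it can help to preview which files an extension filter such as extension="JPG;PNG" would leave visible. The following POSIX shell sketch only mimics the behavior described above (a case-insensitive match on the file suffix) using find; it is not how cmdfs implements the filter, and preview_extensions is a hypothetical helper name. The -iname test is supported by GNU and BSD find but is not part of POSIX.

```shell
# Preview the files a cmdfs extension filter would show.
# Usage: preview_extensions SOURCE_DIR "JPG;PNG"
preview_extensions() {
    src=$1
    exts=$2                  # semicolon-separated, as in extension="JPG;PNG"
    set --                   # reuse "$@" to build the find expression
    old_ifs=$IFS
    IFS=';'
    for ext in $exts; do
        if [ $# -gt 0 ]; then
            set -- "$@" -o   # OR the individual suffix tests together
        fi
        set -- "$@" -iname "*.$ext"   # case-insensitive, like cmdfs
    done
    IFS=$old_ifs
    find "$src" -type f \( "$@" \) | sort
}
```

For example, preview_extensions ~/Data "JPG;PNG" lists the candidates for the mount shown above.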
The hide-empty-dirs option tells cmdfs to hide directories that contain no files matching the filter. Filtering only by file extension has some disadvantages: Files without an extension, or with a misspelled one, are not included, and if you want to see all possible image formats, your filter list is going to be pretty long. The resulting view would be hard to understand and probably incomplete. In this scenario, filtering by MIME type is a better choice (Figure 1):
cmdfs ~/Data ~/test -o mime-re=image/*,hide-empty-dirs
To identify the correct Mime type for a specific file, run the file command with the --mime-type option:
file --mime-type LibreOfficeText.odt
LibreOfficeText.odt: application/vnd.oasis.opendocument.text
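The same preview idea works for MIME filters: Ask file for each type and test it against the pattern. In this sketch, preview_mime and mime_matches are hypothetical helpers, and matching with grep -E is an assumption about how cmdfs evaluates mime-re. If the pattern is treated as an unanchored regular expression, image/* actually means "image followed by any number of slashes," so it matches every type that merely contains the word image; ^image/ is the stricter way to say "all image types."

```shell
# Succeed if MIME type $2 matches the extended regular expression $1.
# Assumption: cmdfs matches mime-re roughly the way grep -E does.
mime_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Print "path: type" for every file below $2 whose MIME type matches $1.
preview_mime() {
    pattern=$1
    src=$2
    find "$src" -type f | sort | while IFS= read -r f; do
        type=$(file --mime-type -b "$f")   # e.g. image/jpeg
        if mime_matches "$pattern" "$type"; then
            printf '%s: %s\n' "$f" "$type"
        fi
    done
}
```

For example, preview_mime '^image/' ~/Data shows which files a mime-re filter for images would pick up.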
Editing Files On the Fly
Cmdfs can do more than just filter data. You can also use external programs to modify the files. For example, you can use the ImageMagick toolkit to transform digital images. ImageMagick [http://www.imagemagick.org/] is a comprehensive collection of tools for manipulating graphic images. With the convert tool, which is part of the ImageMagick collection, you could reduce images to a maximum width of 800 pixels or convert them to sepia.
Because the ImageMagick tools accept input from standard input and can write to standard output, you can integrate them seamlessly with cmdfs to, for example, automatically convert all of your JPGs to PNG files. (One potential problem with this technique is that all of your PNG files would still have a .jpg suffix, which would confuse many applications.)
Because convert can handle simple scaling, as well as perform many other operations, it is a perfect partner for cmdfs. The following command line outputs all the image files at the mountpoint with a maximum height and width of 800 pixels:
cmdfs ~/Data ~/test "-omime-re=image/*,hide-empty-dirs,command=convert - -resize 800x800\> -"
Smaller images keep their original size because of the > flag in the convert geometry; you need to escape (mask) this character with a backslash in the option string you pass to cmdfs. If all this works as intended, an on-the-fly conversion to sepia should be easy enough, too:
cmdfs ~/Data ~/test "-omime-re=image/*,hide-empty-dirs,command=convert - -sepia-tone 90% -"
Integrating programs like this requires some care: The command must read the file from standard input (the first -) and write the result to standard output (the last -), and special characters in the option string have to be escaped.
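The > flag in the geometry 800x800> is what keeps small images unchanged. The arithmetic behind it can be sketched as a small shell function: Larger images are shrunk until they fit into the 800x800 box while keeping their aspect ratio, and images that already fit are passed through. This is an approximation for illustration only (fit_within is a hypothetical name); ImageMagick's own rounding may differ by a pixel.

```shell
# Approximate the output size of "convert -resize MAXxMAX>" for a
# WIDTH x HEIGHT image: shrink to fit a square box, never enlarge.
# Usage: fit_within WIDTH HEIGHT [MAX]   (MAX defaults to 800)
fit_within() {
    w=$1
    h=$2
    max=${3:-800}
    if [ "$w" -le "$max" ] && [ "$h" -le "$max" ]; then
        echo "${w}x${h}"                             # '>' flag: small images stay as they are
    elif [ "$w" -ge "$h" ]; then
        echo "${max}x$(( (h * max + w / 2) / w ))"   # width limits; round height
    else
        echo "$(( (w * max + h / 2) / h ))x${max}"   # height limits; round width
    fi
}
```

For example, fit_within 1600 1200 prints 800x600, while fit_within 640 480 prints 640x480.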
Cache and Options
Cmdfs uses a cache to speed up some processes. On Debian systems, this cache resides in /usr/local/var/cache/cmdfs/username; you can change the location with the -o cache-dir parameter. During testing, it is a good idea to delete the contents of the cache directory regularly, because cmdfs only regenerates a cached file when the source file’s last-modified timestamp changes. If you change the command but not the source, the cache keeps serving the old output.
Although you can set an expiry time at mount time with -o cache-expiry, doing so can cause load peaks: Once the expiry time has elapsed, cmdfs recreates all the files in the cache the next time the mountpoint is accessed.
Beyond cache management, the -o parameter offers several other useful options; Table 1 lists some of the most important ones.
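Clearing the cache between test runs is easy to script. This sketch assumes the cache location given above for a source install, with the login name as the last path component; pass the directory instead if you set a different one with -o cache-dir. The function name clear_cmdfs_cache is hypothetical, and it is probably safest to unmount the virtual filesystem first so that cmdfs is not writing to the cache at the same time.

```shell
# Delete everything inside the cmdfs cache directory, keeping the directory.
# Default path is an assumption based on a source install under /usr/local.
clear_cmdfs_cache() {
    dir=${1:-/usr/local/var/cache/cmdfs/$(id -un)}
    if [ ! -d "$dir" ]; then
        echo "no cache directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 spares the directory itself; -delete works depth-first
    find "$dir" -mindepth 1 -delete
}
```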
All of the examples thus far have been static, meaning that cmdfs ignores new files and directories at the source. However, if you add the monitor parameter to the -o option, cmdfs monitors the sources. To do this, cmdfs relies on the Linux kernel’s inotify mechanism.
Inotify reports changes as they happen, and you can use these events to trigger automated actions. For example, you can inform your users of changes to files in the source directory, or you can automatically launch an OCR tool on newly scanned documents. If you want to use this mechanism in your own scripts, you will need to install the inotify-tools package from your distribution’s repositories.
For permanent use, add an entry to your /etc/fstab file so that the virtual filesystem is mounted automatically. For the sepia example, the entry could look like this:
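With inotify-tools installed, the inotifywait command turns these kernel events into lines of text that a script can act on. The sketch below watches a source directory recursively and hands every file that was written and closed, or moved in, to on_change. Both function names are hypothetical, and on_change is only a placeholder that prints a message; the OCR branch merely marks where you would call your own tool.

```shell
# Placeholder action for a changed file; replace the printf lines with
# real work (mail to users, OCR run, ...).
on_change() {
    case $1 in
        *.pdf|*.PDF) printf 'new scan, queue for OCR: %s\n' "$1" ;;
        *)           printf 'changed: %s\n' "$1" ;;
    esac
}

# Watch a directory tree and call on_change for each finished file.
watch_source() {
    # -m: keep running, -r: recursive, -q: no startup banner,
    # close_write: file written and closed, moved_to: file moved in,
    # %w%f: watched directory plus file name
    inotifywait -m -r -q -e close_write -e moved_to --format '%w%f' "$1" |
    while IFS= read -r path; do
        on_change "$path"
    done
}
```

Start it with watch_source ~/Data; it keeps running until you stop it with Ctrl+C.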
[...] cmdfs#/Data /home/Username/test fuse user,mime-re=image/*,hide-empty-dirs,command=convert\040-\040-sepia-tone\04090%\040- 0 0
As the entry shows, every blank in the command line has to be replaced with \040; otherwise, the entry will not mount as intended.
In our lab, the only problems we encountered occurred when the source directory and the mountpoint resided in the same parent directory, for example, ~/Documents and ~/test. In this case, cmdfs seemed to enter an infinite loop on some distributions when we accessed the mountpoint, for example, with ls -l.
We contacted the developer, but his troubleshooting efforts failed to come up with an explanation for this behavior before this issue went to press. One possible cause could be incorrect permissions for the cache directory; errors of this kind show up in the syslog.
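Escaping the blanks by hand is error-prone, so a small helper can generate the field. In this sketch, fstab_escape and fstab_entry are hypothetical names; the helper only handles spaces (tabs would need \011), and it reuses the cmdfs#source syntax of the entry above.

```shell
# Replace every blank with the octal escape \040 that fstab expects.
fstab_escape() {
    printf '%s\n' "$1" | sed 's/ /\\040/g'
}

# Print a complete fstab line for a cmdfs mount.
# Usage: fstab_entry SOURCE MOUNTPOINT OPTIONS
fstab_entry() {
    src=$1
    mnt=$2
    opts=$3
    printf 'cmdfs#%s %s fuse %s 0 0\n' \
        "$(fstab_escape "$src")" "$(fstab_escape "$mnt")" "$(fstab_escape "$opts")"
}
```

Calling fstab_entry /Data /home/Username/test 'user,mime-re=image/*,hide-empty-dirs,command=convert - -sepia-tone 90% -' reproduces the sepia entry shown above.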
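Until the cause is known, you can simply avoid the layout that triggered the problem in our lab. The following sketch (check_layout is a hypothetical name) resolves both paths and refuses to proceed if the source directory and the mountpoint share the same parent directory; it checks only for that one observed pattern and does not diagnose the underlying bug.

```shell
# Refuse a layout where source and mountpoint live in the same parent
# directory, the situation that caused hangs in the tests described above.
check_layout() {
    src=$(CDPATH= cd -- "$1" && pwd -P) || return 2   # resolve symlinks
    mnt=$(CDPATH= cd -- "$2" && pwd -P) || return 2
    if [ "$(dirname "$src")" = "$(dirname "$mnt")" ]; then
        echo "warning: $1 and $2 share the parent $(dirname "$src")" >&2
        return 1
    fi
    echo "layout ok"
}
```

For example, check_layout ~/Data ~/mnt/test && cmdfs ~/Data ~/mnt/test -o extension=jpg,hide-empty-dirs only mounts when the two directories have different parents.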
If you prefer not to use your own files as test candidates and don’t want to create test files of your own, you can download sample files from the web.
Sites such as Digital Corpora offer a large collection of test data, including image files, with sizes of up to 35GB.
Dennis Schreiber is a computer forensics investigator with the fiscal authorities in Thüringen, Germany. He prefers to use Linux for acquiring and evaluating data in his work. When he is not sitting in front of his computer, Dennis spends most of his time with his family and friends – and on his motorbike.