Tracking down hard disk space consumption with Go
Programming Snapshot – Disk Space Analyzer
This month, Mike Schilli writes a quick terminal UI in Go to help track down space hogs on his hard disk.
When installing a new hard disk, users think they are in seventh heaven – it seems almost impossible to ever fill up the drive. Given today's huge capacities, though, virtually everyone becomes a thoughtless space waster sooner or later. As the disk fills up, space needs to be freed up to keep things going. Most of the time, the culprit is huge files no longer in use that are still hanging around in forgotten directories and causing overcrowding. The most effective strategy is often a simple one of tracking down the biggest space wasters and then deleting the unneeded files.
The spacetree
Go program helps with this housecleaning by pointing to the most storage-intensive paths starting from a root directory and provides their disk space usage in bytes. As Figure 1 shows, the Go binary, which you call with the directory to be analyzed, switches the terminal to graphics mode and displays the storage paths as an interactive tree. You can expand the green elements in the tree with a mouse click or by pressing the Enter key to determine which folders contain the fattest files.
Biggest Fish
In Figure 1, the Go spacetree
binary takes a look at the ~/git/
directory and discovers some 14GB of data there. Using the arrow keys, their vi equivalents J and K, or even the mouse if you fancy, you can navigate to the directories highlighted in green and expand them by pressing the Enter key. Figure 2 shows that, for example, the articles/
directory, which contains the articles from 26 years of this column, occupies a total of almost 4GB.
Most of it, some 850MB, resides in the internal Git directory, .git
, which I can't touch without disrupting its inner workings. However, for the biggest disposable space waster, check out the subdirectory with the wifi
network checker from the October 2023 issue: This directory still has the screencast video in it! By deleting it, I've saved another 850MB. On top of that, the March 2019 progressbar
article contains the unused foo
and foo.bak
files that really need to be removed.
Smarter with Heap
But how does the Go program traverse the deeply nested file structure to count the drive bytes used by each file and print a short list of the largest files? It finds the 10 biggest files from a huge number of files, which can easily be millons on a heavily used system. The simple sounding solution of dropping all the entries into an array, sorting the array in descending order by file size, and then outputting the first 10 entries is not an option. Even a highly effective sorting algorithm would take quite some time to process a million entries, not to mention the RAM wasted by such a huge array. After all, only 10 entries are of interest at the end of the day.
The so-called "min heap" proves to be an efficient data structure for problems like this. The nodes of this binary tree always satisfy the rule that the entries below a given node always have larger values than the node itself. In Figure 3, the node with the value 3 is at the top of the tree root and is smaller than all other nodes in the tree. Note, though, that the rule applies only to directly or indirectly connected nodes. In Figure 3, for example, it is perfectly okay for the first element of the second row from the top, which belongs to the left subtree, to contain a value of 8 while the third row of the right subtree contains a value of 7. They're not in a direct or indirect parent/child relationship, so the rule doesn't apply.
Minimal Heap
Software programs represent the tree as a simple array which, as shown in Figure 3, creates subsets of one, two, four, eight, and so on that border on each other seamlessly, each subset representing one horizontal row within the tree. It is now relatively easy to find and remove the smallest entry in the tree, because it is always at the top, in the root element of the tree. Removing the node leaves an empty spot and disrupts the tree order, but the standardized heap algorithm can reorder the tree so that another node takes the place of the removed one. This causes subsequent nodes to bubble up, until the tree again satisfies the heap rules.
This makes the data structure ideal for an actively maintained list of the N largest files found so far. If the tree initially contains less than N nodes, all newly found files are moved into it without checking. On the other hand, if the tree grows to N+1 nodes when a new file is inserted, it needs to be shrunk back to N nodes. This works best if the algorithm removes the root node with the smallest file and lets the remaining nodes move up by repairing the tree.
Figure 4 shows the output of the test program in Listing 1. It first fills the heap with the paths to eight files including their sizes in bytes. However, it limits the maximum size of the heap to six entries. The result is the six largest files from the entire collection, sorted in descending order by file size. As expected, the abc
and ghi
entries with byte counts of 123 and 124 are dropped as the smallest files.
Listing 1
heap-main.go
01 package main 02 import ( 03 "fmt" 04 ) 05 func main() { 06 entries := []FsEntry{ 07 {path: "abc", used: 123}, 08 {path: "def", used: 456}, 09 {path: "ghi", used: 124}, 10 {path: "jkl", used: 457}, 11 {path: "mno", used: 125}, 12 {path: "pqr", used: 458}, 13 {path: "stu", used: 126}, 14 {path: "vwx", used: 999}, 15 } 16 17 h := NewTopN(6) 18 for _, e := range entries { 19 h.Add(e) 20 } 21 22 h.Iterate(func(e FsEntry) { 23 fmt.Printf("%s %d\n", e.path, e.used) 24 }) 25 }
Listing 1 uses NewTopN(6)
in line 17 to create a min heap with a maximum of six nodes. The for
loop starting in line 18 uses Add()
in each round to place the next entry from the entries
array slice onto the min heap. Finally, Iterate()
works its way through the winning entries in the data structure starting in line 22, from the file with the highest byte count to the smallest of the top six.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Latest Cinnamon Desktop Releases with a Bold New Look
Just in time for the holidays, the developer of the Cinnamon desktop has shipped a new release to help spice up your eggnog with new features and a new look.
-
Armbian 24.11 Released with Expanded Hardware Support
If you've been waiting for Armbian to support OrangePi 5 Max and Radxa ROCK 5B+, the wait is over.
-
SUSE Renames Several Products for Better Name Recognition
SUSE has been a very powerful player in the European market, but it knows it must branch out to gain serious traction. Will a name change do the trick?
-
ESET Discovers New Linux Malware
WolfsBane is an all-in-one malware that has hit the Linux operating system and includes a dropper, a launcher, and a backdoor.
-
New Linux Kernel Patch Allows Forcing a CPU Mitigation
Even when CPU mitigations can consume precious CPU cycles, it might not be a bad idea to allow users to enable them, even if your machine isn't vulnerable.
-
Red Hat Enterprise Linux 9.5 Released
Notify your friends, loved ones, and colleagues that the latest version of RHEL is available with plenty of enhancements.
-
Linux Sees Massive Performance Increase from a Single Line of Code
With one line of code, Intel was able to increase the performance of the Linux kernel by 4,000 percent.
-
Fedora KDE Approved as an Official Spin
If you prefer the Plasma desktop environment and the Fedora distribution, you're in luck because there's now an official spin that is listed on the same level as the Fedora Workstation edition.
-
New Steam Client Ups the Ante for Linux
The latest release from Steam has some pretty cool tricks up its sleeve.
-
Gnome OS Transitioning Toward a General-Purpose Distro
If you're looking for the perfectly vanilla take on the Gnome desktop, Gnome OS might be for you.